HMM-3: Finding the Most Probable State Sequence (Viterbi)

3 Finding the Most Probable State Sequence

Given the model parameters \(\lambda\) and an observation sequence \(O\):

3.1 Viterbi

The goal is:

\[ \bar{Q} = \underset{Q}{\arg \max} P(Q, O | \lambda) \]

  • Definition\( ^{[2]} \)

\[ \begin{align}
\delta_t(i) &= \underset{q_1,q_2,\cdots,q_{t-1}}{\max} P(q_1,q_2,\cdots,q_t=i, O_1, O_2, \cdots, O_t | \lambda) \\
\delta_{t+1}(j) &= \underset{i}{\max} [\delta_t(i) a_{ij}] b_j(O_{t+1})
\end{align} \]

At each step, take the maximum over the previous probabilities multiplied by the transition probability as the best probability so far; as long as we also record which state \( S_i \) achieved that maximum at the previous step, we can recover the most probable state sequence.

  • Initialization

\[ \begin{align}
\delta_1(i) &= \pi_i b_i(O_1) \\
\psi_1(i) &= null
\end{align} \]

The first time step has no previous state.

  • Recursion

\[ \begin{align}
\delta_t(j) &= \underset{i}{\max} [\delta_{t-1}(i) a_{ij}] b_j(O_t) \\
\psi_t(j) &= \underset{i}{\arg \max} [\delta_{t-1}(i) a_{ij}]
\end{align} \]
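
As a sketch, one step of this recursion can be vectorized with NumPy broadcasting instead of the explicit loops used in the code experiment in 3.3 (the names `viterbi_step`, `delta_prev`, and `o_t` are illustrative, not from the listing below):

import numpy as np

def viterbi_step(delta_prev, A, B, o_t):
  r''' one recursion step; delta_prev holds \delta_{t-1} as a length-N vector '''
  scores = delta_prev[:, None] * A          # scores[i, j] = \delta_{t-1}(i) a_{ij}
  psi_t = scores.argmax(axis=0)             # \psi_t(j): best predecessor i for each j
  delta_t = scores.max(axis=0) * B[:, o_t]  # multiply in the emission b_j(O_t)
  return delta_t, psi_t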

  • Termination

\[ \begin{align}
\bar{P} &= \underset{i}{\max} \delta_T(i) \\
\bar{q}_T &= \underset{i}{\arg \max} \delta_T(i)
\end{align} \]

\( \bar{P} \) is the joint probability of the best state sequence and the observation sequence, \( \underset{Q}{\max} P(Q, O | \lambda) \).

  • Backtracking to recover the most probable state-sequence path

\[ \bar{q}_t = \psi_{t+1}( \bar{q}_{t+1} ) \qquad \text{where} \qquad t=T-1,T-2,\cdots,2,1 \]

3.2 Numerical Stability

Replace the chained multiplications with sums of logarithms\( ^{[1]} \), with \( \delta \) now holding log-probabilities:

\[ \delta_{t+1}(j) = \underset{i}{\max} [\delta_t(i) + \log a_{ij}] + \log b_j(O_{t+1}) \]
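
The matching log-space step, again as a sketch (assuming `logA = np.log(A)` and `logB = np.log(B)` are precomputed, mirroring the hypothetical `viterbi_step` above):

def viterbi_logstep(logdelta_prev, logA, logB, o_t):
  r''' one recursion step in log space; logdelta_prev holds \log \delta_{t-1} '''
  scores = logdelta_prev[:, None] + logA          # scores[i, j] = \log \delta_{t-1}(i) + \log a_{ij}
  psi_t = scores.argmax(axis=0)                   # \psi_t(j) is unchanged by the log transform
  logdelta_t = scores.max(axis=0) + logB[:, o_t]  # add the log emission \log b_j(O_t)
  return logdelta_t, psi_t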

3.3 Code Experiment

import numpy as np
 
np.random.seed(42)
 
T,N,M = 30,10,20
 
O = np.random.randint(low=0, high=M, size=T).flatten() # O = { O_1,O_2,\cdots,O_T } 
Pi = np.random.dirichlet(np.ones(N), size=1).flatten() # \Pi = [\pi_i]
A = np.random.dirichlet(np.ones(N), size=N) # A = [a_{ij}]
B = np.random.dirichlet(np.ones(M), size=N) # B = [b_j(m)]
Lambda = (A, B, Pi) # \lambda = (A,B,\Pi)
 
def P(Q, O):
  r''' P(Q,O | \lambda) = \prod^T_{t=1} p(q_t | q_{t-1}, \lambda) p(O_t | q_t, \lambda), with p(q_1 | q_0) = \pi_{q_1} '''
  return np.prod([(A[Q[t-1], Q[t]] if t>0 else Pi[Q[t]]) * B[Q[t],O[t]] for t in range(T)])
 
def logP(Q, O):
  r''' \log P(Q,O | \lambda) = \sum^T_{t=1} [ \log p(q_t | q_{t-1}, \lambda) + \log p(O_t | q_t, \lambda) ] '''
  logPi,logA,logB = np.log(Pi), np.log(A), np.log(B)
  return np.sum([(logA[Q[t-1], Q[t]] if t>0 else logPi[Q[t]]) + logB[Q[t],O[t]] for t in range(T)])
 
delta_cache = np.zeros(shape=(T, N))
psi_cache = np.zeros(shape=(T, N), dtype=int) # backpointers \psi_t(j)
def delta(t):
  r''' recursion in probability space; array index t=0 is time 1, ..., t=T-1 is time T '''
  if t==0:
    delta_cache[t] = Pi * B[:, O[t]]
    psi_cache[t] = -1
  else:
    for j in range(N):
      max_delta = max([(delta_cache[t-1, i] * A[i, j] * B[j, O[t]], i) for i in range(N)], key=lambda x:x[0])
      delta_cache[t, j] = max_delta[0]
      psi_cache[t, j] = max_delta[1]
 
for t in range(T):
  delta(t)
 
Q = np.zeros(shape=T, dtype=int)
P_bar = np.max(delta_cache[T-1])
Q[T-1] = np.argmax(delta_cache[T-1])
for t in range(T-2, -1, -1):
  Q[t] = psi_cache[t+1, Q[t+1]]
 
logdelta_cache = np.zeros(shape=(T, N))
logpsi_cache = np.zeros(shape=(T, N), dtype=int)
def logdelta(t):
  r''' recursion in log space; array index t=0 is time 1, ..., t=T-1 is time T '''
  if t==0:
    logdelta_cache[t] = np.log(Pi) + np.log(B[:, O[t]])
    logpsi_cache[t] = -1
  else:
    for j in range(N):
      max_delta = max([(logdelta_cache[t-1, i] + np.log(A[i, j]) + np.log(B[j, O[t]]), i) for i in range(N)], key=lambda x:x[0])
      logdelta_cache[t, j] = max_delta[0]
      logpsi_cache[t, j] = max_delta[1]
 
for t in range(T):
  logdelta(t)
 
logQ = np.zeros(shape=T, dtype=int)
logP_bar = np.max(logdelta_cache[T-1])
logQ[T-1] = np.argmax(logdelta_cache[T-1])
for t in range(T-2, -1, -1):
  logQ[t] = logpsi_cache[t+1, logQ[t+1]]
 
print(P_bar, r'\bar{P}')
print(P(Q, O), Q, 'P(Q, O)')
print(logP_bar, r'\log \bar{P}')
print(logP(logQ, O), logQ, r'\log P(Q, O)')

output:

\[ \begin{align}
\bar{P} &= 9.511542024402762e-53 \\
P(Q, O) &= 9.511542024402753e-53 \\
Q &= [3 7 8 2 3 9 1 8 2 2 3 9 1 3 9 1 6 0 1 9 7 1 8 2 3 4 4 5 0 1] \\
\log \bar{P} &= -119.78450391759523 \\
\log P(Q, O) &= -119.78450391759522 \\
logQ &= [3 7 8 2 3 9 1 8 2 2 3 9 1 3 9 1 6 0 1 9 7 1 8 2 3 4 4 5 0 1]
\end{align} \]
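
For small enough T and N, the Viterbi result can be cross-checked by brute force. A minimal sketch, reusing the globals from the listing above (the names `joint_prob` and `T_small` are illustrative; `T_small` must stay small since the enumeration costs N**T_small):

from itertools import product

def joint_prob(q, o):
  ''' P(Q, O | lambda) for a length-len(q) prefix, same factorization as P above '''
  p = Pi[q[0]] * B[q[0], o[0]]
  for t in range(1, len(q)):
    p *= A[q[t-1], q[t]] * B[q[t], o[t]]
  return p

T_small = 5  # N**T_small = 10**5 sequences, still enumerable
best = max(product(range(N), repeat=T_small),
           key=lambda q: joint_prob(q, O[:T_small]))
# best should match a Viterbi run on the truncated observations O[:T_small]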

References:
[1] Rabiner, L. R. 1989. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77:257–286.
[2] Jurafsky, Daniel, and James H. Martin. Speech and Language Processing, §15.3 “Hidden Markov Models,” p. 421.
