HMM-1-Introduction

1 Introduction

HMM is the abbreviation of Hidden Markov Models. In the 1960s, Baum and his colleagues published a series of papers on the topic; in the 1970s, Baker at CMU applied HMMs to speech processing, and Jelinek and his colleagues at IBM used them for speech processing as well\(^{[1]}\).


Key assumption: the current state depends only on the immediately preceding state, and is independent of all states before that.
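The assumption can be sketched as a sampling loop in which each step reads only the previous state (the 2-state transition matrix below is an invented illustration, not from the text):

```python
import numpy as np

np.random.seed(0)

# Hypothetical 2-state transition matrix: row i is P(next state | current state = i).
A = np.array([[0.9, 0.1],
              [0.4, 0.6]])

def sample_chain(A, q0, T):
    ''' Sample q_1..q_T; each draw conditions only on the immediately previous state. '''
    q = [q0]
    for _ in range(T - 1):
        q.append(np.random.choice(len(A), p=A[q[-1]]))  # depends on q[-1] only
    return q

print(sample_chain(A, q0=0, T=10))
```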

1.1 Notation

  1. \( N \): number of states in the model\(^{[2]}\).
    \[ S = \{S_1,S_2,\cdots,S_N\} \]

  2. \( M \): number of distinct observation symbols\(^{[2]}\).
    \[ V = \{v_1,v_2,\cdots,v_M\} \]

  3. \( T \): sequence length\(^{[2]}\).
    \[ \begin{align}
    O &= \{ O_1,O_2,\cdots,O_T \} \qquad \text{observation sequence} \\
    Q &= \{ q_1,q_2,\cdots,q_T \} \qquad \text{state sequence} \\
    \end{align} \]

  4. State transition probabilities\(^{[2]}\):
    \[ A = [a_{ij} ] \quad \text{where} \quad a_{ij} \equiv P(q_{t+1} = S_j | q_t = S_i) \]

  5. Observation probabilities\(^{[2]}\):
    \[ B = [b_j(m)] \quad \text{where} \quad b_j(m) \equiv P(O_t = v_m | q_t = S_j) \]

  6. Initial state distribution\(^{[2]}\):
    \[ \Pi = [\pi_i] \quad \text{where} \quad \pi_i \equiv P(q_1 = S_i) \]

  7. Model parameters\(^{[2]}\).
    \[ \lambda = (A,B,\Pi) \]
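As a quick sanity check on the definitions above, the following sketch draws random parameters \( \lambda = (A,B,\Pi) \) for hypothetical sizes N, M and verifies the stochastic constraints (each row of A and B, and \( \Pi \) itself, sums to 1):

```python
import numpy as np

np.random.seed(7)
N, M = 4, 6  # illustrative sizes, not from the text

Pi = np.random.dirichlet(np.ones(N))    # \Pi = [\pi_i]: initial state distribution
A = np.random.dirichlet(np.ones(N), N)  # A = [a_{ij}]: row i is a distribution over next states
B = np.random.dirichlet(np.ones(M), N)  # B = [b_j(m)]: row j is a distribution over symbols

assert np.isclose(Pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
print(A.shape, B.shape, Pi.shape)
```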

1.2 Model

\( \require{AMScd} \)
\[ \text{HMM}^{[2]} \\
\begin{CD}
q_1 @>>> q_2 @. \cdots @. q_{t-1} @>>> q_t \\
@VVV @VVV @. @VVV @VVV \\
O_1 @. O_2 @. @. O_{t-1} @. O_t
\end{CD} \\
P(Q,O | \lambda)=\prod^T_{t=1}p(q_t|q_{t-1}, \lambda)p(O_t|q_t, \lambda), \qquad p(q_1|q_0, \lambda) \equiv \pi_{q_1} \\
\]
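To make the factorization concrete, here is a tiny worked example on a hypothetical 2-state, 2-symbol model (all numbers invented for illustration), with \( p(q_1|q_0,\lambda) \) taken to be \( \pi_{q_1} \):

```python
import numpy as np

# Hypothetical parameters: 2 states, 2 observation symbols.
Pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])

Q = [0, 0, 1]  # state sequence q_1, q_2, q_3
O = [0, 1, 1]  # observation sequence O_1, O_2, O_3

# P(Q,O|lambda) = pi_{q1} b_{q1}(O1) * a_{q1 q2} b_{q2}(O2) * a_{q2 q3} b_{q3}(O3)
p = Pi[Q[0]] * B[Q[0], O[0]]
for t in range(1, len(Q)):
    p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]

print(p)  # 0.6*0.9 * 0.7*0.1 * 0.3*0.7 ≈ 0.007938
```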


1.3 Numerical Stability

Evaluating the model multiplies many probabilities smaller than 1.0, so the result quickly falls far below the smallest value a machine floating-point number can represent. Example:
\[ \begin{align}
\text{suppose:} \quad T,N,M &= 100,100,1000 \\
\text{subject to:} \quad \sum_{q_t}^N p(q_t|q_{t-1}, \lambda) &= 1.0 \\
\sum_{O_t}^M p(O_t|q_t, \lambda) &= 1.0 \\
\text{thus:} \quad P(Q,O | \lambda) & \approx \left[ \frac{1}{MN} \right]^T = 10^{-500}
\end{align} \]

In IEEE 754, the relative precision of a double-precision float\(^{[3]}\) is \( 2^{-53} \approx 1.11 \times 10^{-16} \), and the smallest positive (subnormal) double is about \( 4.9 \times 10^{-324} \), so a value like \( 10^{-500} \) underflows to zero.
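The underflow is easy to reproduce: 100 factors of \( 10^{-5} \) (the \( 1/(MN) \) term above) multiply to \( 10^{-500} \), which a double cannot represent, while the log-space sum stays a modest, representable number:

```python
import numpy as np

factors = np.full(100, 1e-5)    # 100 factors of 1/(MN) = 1e-5
print(np.prod(factors))         # 0.0 — the true value 1e-500 underflows
print(np.sum(np.log(factors)))  # ≈ -1151.29, perfectly representable
```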

Therefore, log probabilities are commonly used in practice.
\[ \log P(Q,O | \lambda)=\sum^T_{t=1} \left[ \log p(q_t|q_{t-1}, \lambda) + \log p(O_t|q_t, \lambda) \right] \]

1.4 Code Experiment

import numpy as np
 
np.random.seed(42)
 
T,N,M = 10,20,30
 
O = np.random.randint(low=0, high=M, size=T).flatten() # O = { O_1,O_2,\cdots,O_T } 
Q = np.random.randint(low=0, high=N, size=T).flatten() # Q = { q_1,q_2,\cdots,q_T }
Pi = np.random.dirichlet(np.ones(N), size=1).flatten() # \Pi = [\pi_i]
A = np.random.dirichlet(np.ones(N), size=N) # A = [a_{ij}]
B = np.random.dirichlet(np.ones(M), size=N) # B = [b_j(m)]
Lambda = (A, B, Pi) # \lambda = (A,B,\Pi)
 
def P(Q, O):
  r''' P(Q,O | \lambda)=\prod^T_{t=1}p(q_t|q_{t-1}, \lambda)p(O_t|q_t, \lambda) '''
  return np.prod([(A[Q[t-1], Q[t]] if t > 0 else Pi[Q[t]]) * B[Q[t], O[t]] for t in range(T)])
 
def logP(Q, O):
  r''' \log P(Q,O | \lambda)=\sum^T_{t=1} \log p(q_t|q_{t-1}, \lambda) + \log p(O_t|q_t, \lambda) '''
  logPi, logA, logB = np.log(Pi), np.log(A), np.log(B)
  return np.sum([(logA[Q[t-1], Q[t]] if t > 0 else logPi[Q[t]]) + logB[Q[t], O[t]] for t in range(T)])
 
print( P(Q, O), np.exp(logP(Q, O)), logP(Q, O) )

output: 1.3708753719444907e-38 1.3708753719444917e-38 -87.1827840403565

References:
[1] Rabiner, L. R. 1989. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition." Proceedings of the IEEE 77: 257–286.
[2] Jurafsky, Daniel, and James H. Martin. "Speech and Language Processing," §15.3 Hidden Markov Models, p. 421.
[3] "Double-precision floating-point format," Wikipedia.
