4 linear chain CRF Viterbi

Find the output sequence $ y^\ast $ with the largest probability for a given input sequence $ x $.

The search is the same kind of Viterbi procedure used for HMM/MEMM, except that in a CRF there is no need to compute the normalized probability.

$$ \begin{align} y^\ast &= \arg\max_y p(y|x) \\ &= \arg\max_y \frac{ \exp( w \cdot F(y,x) ) }{Z_w(x)} \\ &= \arg\max_y \exp( w \cdot F(y,x) ) \\ &= \arg\max_y w \cdot F(y,x) \\ &= \arg\max_y \sum_k w_k f_k(y, x) \\ &= \arg\max_y \sum_k w_k \sum_i f_k(y_{i-1}, y_i, x_i) \\ &= \arg\max_y \sum_i \sum_k w_k f_k(y_{i-1}, y_i, x_i) \\ &= \arg\max_y \sum_i f^\prime(y_{i-1}, y_i, x_i) \end{align} $$

$$ f^\prime(y_{i-1}, y_i, x_i)=\sum_k w_k f_k(y_{i-1}, y_i, x_i) $$

In other words, since $ \exp(\cdot) $ is monotone and $ Z_w(x) $ does not depend on $ y $, decoding reduces to finding the label path with the largest accumulated sum of weighted feature functions.
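As a minimal, illustrative sketch (the feature functions, weights, and labels below are made up for demonstration, not taken from the text), the unnormalized path score $ w \cdot F(y, x) $ can be evaluated directly, without ever touching $ Z_w(x) $:

```python
# Toy setup: two binary feature functions f_k(y_prev, y_curr, x_i) and their weights w_k.
feature_functions = [
    lambda y_prev, y_curr, x_i: 1.0 if (y_prev == "B" and y_curr == "I") else 0.0,
    lambda y_prev, y_curr, x_i: 1.0 if (y_curr == "B" and x_i == "New") else 0.0,
]
w = [1.5, 2.0]  # one weight per feature function

def f_prime(y_prev, y_curr, x_i):
    """f'(y_{i-1}, y_i, x_i) = sum_k w_k * f_k(y_{i-1}, y_i, x_i)."""
    return sum(w_k * f_k(y_prev, y_curr, x_i)
               for w_k, f_k in zip(w, feature_functions))

def path_score(y, x, start="start"):
    """Unnormalized score w . F(y, x) of a candidate labeling y for input x."""
    score, y_prev = 0.0, start
    for y_i, x_i in zip(y, x):
        score += f_prime(y_prev, y_i, x_i)
        y_prev = y_i
    return score

print(path_score(["B", "I"], ["New", "York"]))  # 3.5
```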

Note that in the linear chain only the adjacent labels $ y_{i-1}, y_i$ depend on each other, so the procedure is analogous to HMM Viterbi, with products replaced by sums (a runnable sketch of the full procedure follows the three steps below).
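Spelling this out, $ \delta_i(s_l) $ in the recursion below can be read as the best accumulated score over all partial labelings that end in state $ s_l $ at position $ i $ (an explicit restatement added here for clarity):

$$ \delta_i(s_l) = \max_{y_1, \cdots, y_{i-1}} \sum_{t=1}^{i} f^\prime(y_{t-1}, y_t, x_t), \quad y_i = s_l, \; y_0 = start $$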

  1. Initialization:

    $$ \begin{align} \delta_1(s_j) &= f^\prime(y_0=start, y_1=s_j, x_1) \\ \psi_1(s_j) &= 0 \end{align} $$

  2. Recursion:

    $$ \begin{align} \delta_i(s_l) &= \max_{s_j} \left[ \delta_{i-1}(s_j) + f^\prime(y_{i-1}=s_j, y_i=s_l, x_i) \right] \\ \psi_i(s_l) &= \arg\max_{s_j} \left[ \delta_{i-1}(s_j) + f^\prime(y_{i-1}=s_j, y_i=s_l, x_i) \right], \quad i = 2, \cdots, n \end{align} $$

  3. Termination:

    $$ \begin{align} \delta_{n+1}(stop) &= \max_{s_j} \left[ \delta_{n}(s_j) + f^\prime(y_{n}=s_j, y_{n+1}=stop, x_{n+1}) \right] \\ \psi_{n+1}(stop) &= \arg\max_{s_j} \left[ \delta_{n}(s_j) + f^\prime(y_{n}=s_j, y_{n+1}=stop, x_{n+1}) \right] \\ p^\ast &= \delta_{n+1}(stop) \\ y_n^\ast &= \psi_{n+1}(stop) \\ y_i^\ast &= \psi_{i+1}(y_{i+1}^\ast), \quad i = n-1, n-2, \cdots, 1 \end{align} $$
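Below is a minimal Python sketch of the three steps, reusing the illustrative `f_prime` from above; passing `None` as the observation for the stop transition is an assumption of this sketch, not something the recursion prescribes:

```python
def viterbi_decode(x, states, f_prime, start="start", stop="stop"):
    """Max-sum Viterbi decoding for a linear-chain CRF.

    x       : observations x_1 .. x_n
    states  : candidate labels s_j
    f_prime : callable (y_prev, y_curr, x_i) -> sum_k w_k f_k(y_prev, y_curr, x_i)
    Returns (best unnormalized score, best label sequence).
    """
    n = len(x)
    delta = [{} for _ in range(n)]  # delta[i][s]: best score of a prefix ending in s at position i
    psi = [{} for _ in range(n)]    # psi[i][s]:   best predecessor of s at position i

    # 1. Initialization: leave the virtual start state.
    for s in states:
        delta[0][s] = f_prime(start, s, x[0])
        psi[0][s] = None

    # 2. Recursion over positions i = 2 .. n.
    for i in range(1, n):
        for s in states:
            scores = {p: delta[i - 1][p] + f_prime(p, s, x[i]) for p in states}
            best_prev = max(scores, key=scores.get)
            delta[i][s] = scores[best_prev]
            psi[i][s] = best_prev

    # 3. Termination: enter the virtual stop state (None stands in for x_{n+1}).
    finals = {s: delta[n - 1][s] + f_prime(s, stop, None) for s in states}
    y_last = max(finals, key=finals.get)

    # Backtrace: y_i* = psi_{i+1}(y_{i+1}*).
    path = [y_last]
    for i in range(n - 1, 0, -1):
        path.append(psi[i][path[-1]])
    path.reverse()
    return finals[y_last], path

# Example with the toy features above: returns (3.5, ['B', 'I']).
print(viterbi_decode(["New", "York"], ["B", "I", "O"], f_prime))
```

The returned score is the unnormalized $ w \cdot F(y^\ast, x) $; dividing its exponential by $ Z_w(x) $ would rescale every candidate path by the same constant, which is why the partition function never needs to be computed for decoding.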

