中文分词词性和序列标注之CRF-4-查找最大概率序列

4 linear chain CRF Viterbi

超找给定\( x \)序列最大概率输出序列\( y^\ast \)。

HMM/MEMM viterbi类似的搜索方法,但在CRF中并不用计算规范化概率。
\[ \begin{align}
y^\ast &= \arg\max_y p(y|x) \\
&= \arg\max_y \frac{ \exp( w \cdot F(y,x) ) }{Z_w(x)} \\
&= \arg\max_y \exp( w \cdot F(y,x) ) \\
&= \arg\max_y w \cdot F(y,x) \\
&= \arg\max_y \sum_k w_k f_k(y, x) \\
&= \arg\max_y \sum_k w_k \sum_i f_k(y_{i-1}, y_i, x_i) \\
&= \arg\max_y \sum_i \sum_k w_k f_k(y_{i-1}, y_i, x_i) \\
&= \arg\max_y \sum_i f^\prime(y_{i-1}, y_i, x_i)
\end{align} \]

\[ f^\prime(y_{i-1}, y_i, x_i)=\sum_k w_k f_k(y_{i-1}, y_i, x_i) \]

也就是说只需计算连续最大特征函数求和路径。

可看到线行链中只有前后两个\( y_{i-1}, y_i\)是互相依赖的,因此这里和HMM类似,只是把求积换成求和。

  1. 初始化:
    \[ \begin{align}
    \delta_1(s_j) &= f^\prime(y_0=start, y_1=s_j, x_1) \\
    \psi_1(s_j) &= 0
    \end{align} \]

  2. 递归:
    \[ \begin{align}
    \delta_i(s_l) &= \max_{s_j} \delta_{i-1}(s_j) + f^\prime(y_{i-1}=s_j, y_i=s_l, x_i) \\
    \psi_i(s_l) &= \arg\max_{s_j} \delta_{i-1}(s_j) + f^\prime(y_{i-1}=s_j, y_i=s_l, x_i)
    \end{align} \]

  3. 结束:

\[ \begin{align}
\delta_{n+1}(s_l) &= \max_{s_j} \delta_{n}(s_j) + f^\prime(y_{n}=s_j, y_{n+1}=stop, x_{n+1}) \\
\psi_{n+1}(s_l) &= \arg\max_{s_j} \delta_{n}(s_j) + f^\prime(y_{n}=s_j, y_{n+1}=stop, x_{n+1}) \\
p^\ast &= \max_{s_j} \delta_{n+1}(s_j) \\
y_n^\ast &= \max_{s_j} \delta_{n+1}(s_j) \\
y_i^\ast &= \psi_{i+1}(y_{i+1}^\ast) \quad, i = n-1, n-2, \cdots, 1
\end{align} \]

参考资料:
[1] 李航; 统计机器学习
[2] Ethem Alpaydın: Introduction to Machine Learning Third Edition 15 Hidden Markov Models
[3] Lafferty, J., McCallum, A., Pereira, F. (2001). “Conditional random fields: Probabilistic models for segmenting and labeling sequence data”. Proc. 18th International Conf. on Machine Learning. Morgan Kaufmann. pp. 282–289.
[4] An Introduction to Conditional Random Fields Charles Sutton1 and Andrew McCallum2

Leave a Reply

Your email address will not be published.

Time limit is exhausted. Please reload the CAPTCHA.

Proudly powered by WordPress   Premium Style Theme by www.gopiplus.com
渝公网安渝公网安备 50010702500270号 渝ICP备09056628号-7