End-to-end audiovisual speech recognition based on attention fusion of SDBN and BLSTM
Blog Article
An end-to-end audiovisual speech recognition algorithm is proposed. In the algorithm, a sparse DBN (SDBN) is constructed by introducing a mixed l<sub>1/2</sub> norm and l<sub>1</sub> norm into a Deep Belief Network with a bottleneck structure. The SDBN extracts sparse bottleneck features, which reduces the dimension of the data features, and a BLSTM then models these features as a time series. Next, an attention mechanism automatically aligns and fuses the visual lip information with the auditory information.
Finally, the fused audiovisual information is classified by a BLSTM with a Softmax layer attached. Experiments show that the algorithm can effectively identify visual and auditory information, and that it achieves a good recognition rate and robustness compared with similar algorithms.
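
The alignment-and-fusion step can be illustrated with a minimal sketch. The abstract does not say how the attention scores are computed, so the scaled dot-product scoring, the concatenation of audio and attended visual features, the function name `attention_fuse`, and all array shapes below are assumptions made for illustration only, not the authors' implementation. In the proposed pipeline the inputs would be the BLSTM outputs for each modality; here random arrays stand in for them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_fuse(audio, video):
    """Align video frames to audio frames and fuse the two streams.

    audio: (T_audio, d) array of audio features (hypothetical shape).
    video: (T_video, d) array of visual lip features (hypothetical shape).
    Returns the fused features (T_audio, 2 * d) and the attention
    weights (T_audio, T_video).
    """
    d = audio.shape[1]
    # Assumed scoring function: scaled dot product between every
    # audio frame (query) and every video frame (key).
    scores = audio @ video.T / np.sqrt(d)
    # Each audio frame gets a distribution over video frames, which
    # acts as a soft alignment between the two frame rates.
    weights = softmax(scores, axis=1)
    # Weighted sum of video frames: one visual context per audio frame.
    context = weights @ video
    # Assumed fusion: concatenate audio features with visual context.
    fused = np.concatenate([audio, context], axis=1)
    return fused, weights


# Stand-in data: 100 audio frames and 25 video frames, 32 features each.
rng = np.random.default_rng(0)
audio = rng.standard_normal((100, 32))
video = rng.standard_normal((25, 32))
fused, weights = attention_fuse(audio, video)
print(fused.shape, weights.shape)
```

Because every row of the weight matrix sums to one, the two streams need not share a frame rate: each audio frame draws a soft mixture of video frames, and the fused sequence keeps the audio time axis for the downstream classifier.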