Abstract:
This paper presents the mathematical model and software implementation of an automatic Russian speech recognition system that processes and analyzes audiovisual signals from a microphone and a video camera. It describes probabilistic modeling of audiovisual speech with coupled hidden Markov models, information fusion methods that assign weight coefficients to the audio and video speech modalities, and the parametric representation of the signals. Quantitative results for multimodal recognition of continuous Russian speech indicate high accuracy and reliability of the automatic system.
Citation:
A. A. Karpov, “An automatic multimodal speech recognition system with audio and video information”, Avtomat. i Telemekh., 2014, no. 12, 125–138; Autom. Remote Control, 75:12 (2014), 2190–2200
\Bibitem{Kar14}
\by A.~A.~Karpov
\paper An automatic multimodal speech recognition system with audio and video information
\jour Avtomat. i Telemekh.
\yr 2014
\issue 12
\pages 125--138
\mathnet{http://mi.mathnet.ru/at14166}
\transl
\jour Autom. Remote Control
\yr 2014
\vol 75
\issue 12
\pages 2190--2200
\crossref{https://doi.org/10.1134/S000511791412008X}
\isi{https://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=Publons&SrcAuth=Publons_CEL&DestLinkType=FullRecord&DestApp=WOS_CPL&KeyUT=000346402900008}
\scopus{https://www.scopus.com/record/display.url?origin=inward&eid=2-s2.0-84919360128}
Linking options:
https://www.mathnet.ru/eng/at14166
https://www.mathnet.ru/eng/at/y2014/i12/p125
This publication is cited in the following 15 articles:
Astha Gupta, Rakesh Kumar, Yogesh Kumar, 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 2022, 1492
Fang Y., Yu L., Fei Sh., “Contactless Interactive Control Technology Based on Switching Filtering Algorithm”, Trans. Inst. Meas. Control, 43:2 (2021), 484–494
Malakar M., Keskar R.B., “Progress of Machine Learning Based Automatic Phoneme Recognition and Its Prospect”, Speech Commun., 135 (2021), 37–53
Denis Ivanko, Dmitry Ryumin, Irina Kipyatkova, Alexandr Axyonov, Alexey Karpov, Smart Innovation, Systems and Technologies, 154, Proceedings of 14th International Conference on Electromechanics and Robotics “Zavalishin's Readings”, 2020, 477
M. P. Farkhadov, N. V. Petukhova, S. V. Vaskovskii, M. E. Farkhadova, “Povyshenie effektivnosti rechevogo interfeisa s primeneniem kognitivnykh i lingvisticheskikh znanii” [Improving the efficiency of a speech interface using cognitive and linguistic knowledge], UBS, 81 (2019), 90–112
S. Pekarskikh, E. Kostyuchenko, L. Balatskaya, “Evaluation of speech quality through recognition and classification of phonemes”, Symmetry-Basel, 11:12 (2019), 1447
Evgeny Kostuchenko, Dariya Novokhrestova, Marina Tirskaya, Alexander Shelupanov, Mikhail Nemirovich-Danchenko, Evgeny Choynzonov, Lidiya Balatskaya, Lecture Notes in Computer Science, 11658, Speech and Computer, 2019, 237
Evgeny Kostuchenko, Dariya Novokhrestova, Svetlana Pekarskikh, Alexander Shelupanov, Mikhail Nemirovich-Danchenko, Evgeny Choynzonov, Lidiya Balatskaya, Lecture Notes in Computer Science, 11658, Speech and Computer, 2019, 359
D. Ivanko, A. Karpov, D. Fedotov, I. Kipyatkova, D. Ryumin, D. Ivanko, W. Minker, M. Zelezny, “Multimodal speech recognition: increasing accuracy using high speed video data”, J. Multimodal User Interfaces, 12:4, SI (2018), 319–328
N. Radha, A. Shahina, P. Prabha, P. B. T. Sri, N. A. Khan, “An analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal speech recognition”, Pattern Recognit. Lett., 115, SI (2018), 39–49
A. A. Karpov, R. M. Yusupov, “Multimodal Interfaces of Human–Computer Interaction”, Her. Russ. Acad. Sci., 88:1 (2018), 67
I. S. Kipyatkova, A. A. Karpov, “A study of neural network Russian language models for automatic continuous speech recognition systems”, Autom. Remote Control, 78:5 (2017), 858–867
Denis Ivanko, Alexey Karpov, Dmitry Ryumin, Irina Kipyatkova, Anton Saveliev, Victor Budkov, Dmitriy Ivanko, Miloš Železný, Lecture Notes in Computer Science, 10458, Speech and Computer, 2017, 757
Alexey Karpov, Alexander Ronzhin, Irina Kipyatkova, Andrey Ronzhin, Vasilisa Verkhodanova, Anton Saveliev, Milos Zelezny, Lecture Notes in Computer Science, 9732, Human-Computer Interaction. Interaction Platforms and Techniques, 2016, 170
A. Karpov, A. Ronzhin, I. Kipyatkova, “Automatic analysis of speech and acoustic events for ambient assisted living”, Universal Access in Human-Computer Interaction: Access To Interaction, Pt II, Lecture Notes in Computer Science, 9176, eds. M. Antona, C. Stephanidis, Springer-Verlag Berlin, 2015, 455–463