Researchers from Chinese internet firm Baidu say they have made a breakthrough in voice recognition, delivering a system that outperforms others in standard benchmark tests.
The US-based team, led by Andrew Ng, drew on ‘deep learning’, a branch of machine learning, to develop their system, which they call Deep Speech. While the science behind Deep Speech is complex, the system is based on two key elements: the training of a large recurrent neural network (RNN) using multiple GPUs and the processing of thousands of hours of labelled data.
When trained on large quantities of labelled speech data, the RNN can learn to produce readable character-level transcriptions with improved accuracy. The dataset used to train Deep Speech consisted of over 7,000 hours of conversational and read speech. The research team performed two sets of experiments to evaluate the new system: one assessing conversational speech and one assessing noisy speech.
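To give a rough sense of what "character-level transcription" can mean in practice, the sketch below turns per-frame character scores into text. It is an illustration only: the article does not say how Deep Speech decodes its output, and the toy alphabet, the blank symbol and the `greedy_decode` function are all hypothetical. The approach assumed here picks the best-scoring character for each audio frame, merges consecutive repeats and drops a special "blank" symbol.

```python
from typing import Sequence

# Hypothetical toy alphabet; "_" stands for a "no character" blank symbol.
BLANK = "_"
ALPHABET = [BLANK, " ", "a", "c", "t"]


def greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: Sequence[str] = ALPHABET,
    blank: str = BLANK,
) -> str:
    """Collapse per-frame character scores into a transcription.

    Assumed scheme (not confirmed by the article): take the top-scoring
    symbol in each frame, merge consecutive repeats, then drop blanks.
    """
    out = []
    prev = None
    for row in frame_scores:
        # Index of the highest score in this frame.
        best_index = max(range(len(row)), key=row.__getitem__)
        ch = alphabet[best_index]
        # Keep a character only when it differs from the previous frame
        # and is not the blank symbol.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


# Made-up scores for six audio frames: c, c, blank, a, a, t.
frames = [
    [0.10, 0.00, 0.10, 0.70, 0.10],
    [0.20, 0.00, 0.10, 0.60, 0.10],
    [0.80, 0.00, 0.10, 0.05, 0.05],
    [0.10, 0.00, 0.80, 0.05, 0.05],
    [0.10, 0.00, 0.70, 0.10, 0.10],
    [0.10, 0.00, 0.10, 0.10, 0.70],
]

print(greedy_decode(frames))
```

In this scheme the blank symbol is what allows a genuinely doubled letter to survive: frames reading "a, blank, a" decode to "aa", while "a, a" decode to a single "a".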
While Deep Speech outperformed four other commercial systems, including those of Google and Apple, on both tests, it was in its handling of noisy speech that it really excelled.