Our previous share was on the pessimistic end of the spectrum, and with good reason. However, the major companies in voice tech are making headway in tackling errors. Specifically, Word Error Rate (WER) continues to fall. WER is the primary measure of accuracy for speech recognition systems, and accurate recognition is of course a prerequisite to understanding and appropriate processing. So, even though a 20% relative reduction in WER might *just* mean going from, say, a 5% WER to a 4% WER, at the scale of today's systems these kinds of improvements are significant.
On test data that we produced by simultaneously playing recorded speech and media sounds through loudspeakers and re-recording the combined acoustic signal, our system shows a 20% relative reduction in terms of word error rate versus a system trained only on the clean, annotated data.
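
To make the arithmetic behind a "relative reduction in word error rate" concrete, here is a minimal Python sketch. It assumes the common definition of WER (word-level edit distance divided by the number of reference words) and simple whitespace tokenization. The function names and the sample sentences are ours, made up for illustration. They are not taken from the quoted research, and the 5% and 4% figures are just the hypothetical numbers used above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization; real evaluations usually normalize
    case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction of the baseline error rate that was eliminated."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("turn on the kitchen lights",
                          "turn on the kitchen light"))
    # Hypothetical 5% -> 4% WER, as in the example above.
    print(round(relative_reduction(0.05, 0.04), 3))
```

The point of separating the two functions is that the headline number is a ratio of error rates, not a change in percentage points: a one-point drop from 5% to 4% is a 20% relative reduction, while the same one-point drop from 20% to 19% would only be a 5% relative reduction.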