It's been an error-prone year for the BBC's subtitling department. The latest NSFW subtitle gaffe came during the recent royal wedding, when automated subtitles read “beautiful breasts” instead of “beautiful dress”. With a 99% accuracy rate, human-driven transcription services appear better equipped than AI-powered solutions to help the corporation.

The BBC's run of errors started in February this year, when its live subtitling service produced a subtitle reading “Nigel Owens is a gay” when it should have read “Nigel Owens is saying penalty and yellow card”. The line referred to the Welsh international referee awarding Scotland a penalty during a Scotland v England rugby match. After the royal wedding, automated subtitles also mistook ‘six’ for ‘sex’ during Strictly Come Dancing, the BBC's popular Saturday night entertainment show.

The BBC maintains that the errors do not lie with its automated speech recognition (ASR) subtitling service, which it says “produces accuracy levels in excess of 98%”. Yet a survey from the World Economic Forum shows that although AI has a word accuracy rate of 95%, it cannot yet fully replace human translation or transcription services.

In recent years, AI transcription has undoubtedly made major strides, with AI-driven transcription services producing error rates of just 5.9%, or roughly 94% accuracy. However, language experts believe that the human ear is better adapted to recognizing a broader vocabulary, different accents, and interlocked speech.

One of the biggest reasons for inaccuracies in ASR-driven subtitles is the narrow set of simple, short, command-based vocabularies used in interactions with bots, which rely on a dictionary-based vocabulary. As a result, these systems are generally unable to recognize slang terms, colloquialisms, and interlocked speech.

“The inaccuracies seen in the BBC's ASR-driven subtitles are understandable,” explained Peter Trebek, CEO of transcription service provider GoTranscript. “Although AI-driven tools are able to complement features such as ASR-driven subtitles, they are not yet able to recognize interlocked speech as found in sporting and social events, and neither are they yet able to recognize colloquial language or varied accents. This is why for the immediate future, the human ear will still serve as the standard bearer in terms of transcription and translation services.”

AI-powered transcription certainly has its perks when it comes to speed, which is likely the main reason it is used for live event transcription. However, human-powered transcription is still the preferred choice for those looking to avoid awkward mistakes.