Microsoft India has announced the availability of its largest publicly available Indian language speech dataset for research, covering three languages: Telugu, Tamil and Gujarati. The dataset, which includes audio recordings and their corresponding transcripts, is intended to help researchers and academics build Indian language speech recognition for any application that uses speech, the American firm said.

The Indian Language Speech Corpus is provided through the Microsoft Research Open Data initiative, a collection of free datasets from Microsoft Research intended to advance state-of-the-art research in areas such as natural language processing, computer vision and domain-specific sciences, the company said.

According to the Redmond-based firm, many vernacular languages around the world currently lack adequate digital text, speech and linguistic resources, which are essential for building large machine learning models. Moreover, the differences in enunciation, accent, diction and slang across India's regions are very subtle. As a result of these complexities, the development of accurate digital tools for Indian languages has been slow.

The company said it is working to address this lack of data and to catalyze the development of machine learning models that can help build systems for low-resource languages. The aim is to support the ecosystem of researchers, academics and tech companies working on Indian language models, and to meet the needs of Indian users more quickly.

“Microsoft Indian Language Speech Corpus is an extension of our on-going efforts to reduce language barriers and empower Indians to harness the full potential of the Internet. Using our technology expertise, we want to accelerate innovation in voice-based computing for India by supporting researchers and academia,” said Sundar Srinivasan, General Manager, Artificial Intelligence & Research, Microsoft India.

The company said its Indian Language Speech Corpus was tested at Interspeech 2018, the world's largest and most comprehensive conference on the science and technology of spoken language processing. In the Low Resource Speech Recognition Challenge, participants used data from the Microsoft Indian Language Speech Corpus to build Automatic Speech Recognition (ASR) systems. They were able to create high-quality speech recognition models from this data, which, according to the company, validates the efficacy of the corpus.
