Google launches ‘Cloud Text-to-Speech’ service powered by its Britain-based AI subsidiary DeepMind

Google has launched a voice synthesiser called "Cloud Text-to-Speech" which is powered by its Britain-based Artificial Intelligence (AI) subsidiary DeepMind.

Google has launched a voice synthesiser service called “Cloud Text-to-Speech”, which is powered by its Britain-based Artificial Intelligence (AI) subsidiary DeepMind. The service, which is being used by Cisco and Dolphin ONE, is now available for developers to add to their own applications. In a blog post, Dan Aharon, Product Manager, Cloud AI at Google, said, “Many Google products (e.g., the Google Assistant, Search, Maps) come with built-in high-quality text-to-speech synthesis that produces natural sounding speech. Developers have been telling us they’d like to add text-to-speech to their own applications, so today we’re bringing this technology to Google Cloud Platform with Cloud Text-to-Speech.”

A text-to-speech service is a form of speech synthesis that converts text into spoken voice output. Google’s text-to-speech technology is already an integral part of the voices in services like Google Assistant, Search and Maps. Text-to-speech can be used in a variety of ways: to power voice response systems for call centers (IVRs) and enable real-time natural language conversations; to enable IoT devices (e.g., TVs, cars, robots) to talk back; or to convert text-based media (e.g., news articles, books) into a spoken format (e.g., podcast or audiobook).

“Cloud Text-to-Speech lets you choose from 32 different voices from 12 languages and variants. Cloud Text-to-Speech correctly pronounces complex text such as names, dates, times and addresses for authentic sounding speech right out of the gate. Cloud Text-to-Speech also allows you to customize pitch, speaking rate, and volume gain, and supports a variety of formats, including MP3 and WAV,” said Aharon.
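For developers, the options Aharon lists map onto a small request payload. The following is a minimal sketch, not an official client: it only builds and validates the JSON body and sends nothing over the network. The field names (`input`, `voice`, `audioConfig`, `speakingRate`, `pitch`, `volumeGainDb`) and the numeric ranges follow the public REST `text:synthesize` method as commonly documented, and should be treated as assumptions to check against the current reference. The helper name `build_synthesis_request` is hypothetical.

```python
import json

# Ranges as commonly documented for the REST API; treat them as assumptions
# and confirm against the current reference before relying on them.
SPEAKING_RATE_RANGE = (0.25, 4.0)      # 1.0 is the normal speed
PITCH_RANGE = (-20.0, 20.0)            # semitones relative to the default
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)   # decibels relative to the default

# "WAV" output is assumed to correspond to the LINEAR16 encoding.
SUPPORTED_ENCODINGS = {"MP3", "LINEAR16"}


def _check_range(name, value, bounds):
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


def build_synthesis_request(text, language_code="en-US", voice_name=None,
                            audio_encoding="MP3", speaking_rate=1.0,
                            pitch=0.0, volume_gain_db=0.0):
    """Build the JSON body for a synthesis request (hypothetical helper)."""
    if audio_encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {audio_encoding}")
    _check_range("speaking_rate", speaking_rate, SPEAKING_RATE_RANGE)
    _check_range("pitch", pitch, PITCH_RANGE)
    _check_range("volume_gain_db", volume_gain_db, VOLUME_GAIN_DB_RANGE)

    voice = {"languageCode": language_code}
    if voice_name is not None:
        voice["name"] = voice_name  # one of the voices offered by the service

    return {
        "input": {"text": text},
        "voice": voice,
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
            "pitch": pitch,
            "volumeGainDb": volume_gain_db,
        },
    }


if __name__ == "__main__":
    body = build_synthesis_request(
        "Your appointment is on 3 March at 4:30 pm.",
        speaking_rate=0.9,
        pitch=-2.0,
    )
    print(json.dumps(body, indent=2))
```

Validating the ranges locally is a design choice for the sketch: an out-of-range pitch or speaking rate fails before any request is made.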

Cloud Text-to-Speech also includes a selection of high-fidelity voices built using WaveNet, a generative model for raw audio created by DeepMind. According to the company, DeepMind introduced the first version of WaveNet in late 2016: a neural network trained with a large volume of speech samples that is able to create raw audio waveforms from scratch. During training, the network extracts the underlying structure of the speech, for example which tones follow one another and what shape a realistic speech waveform should have. When given text input, the trained WaveNet model generates the corresponding speech waveforms, one sample at a time, achieving higher accuracy than alternative approaches.
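The "one sample at a time" idea can be illustrated with a toy loop. The sketch below is not WaveNet and uses no neural network: a hypothetical stand-in function, `toy_next_sample_weights`, plays the role of the trained model by favouring values close to the recent samples. What it does show is the autoregressive pattern, in which each new sample is drawn from a distribution that depends on the samples generated so far and is then fed back in as context. The 256 levels match the 8-bit resolution mentioned for the original model; the context length is an arbitrary assumption.

```python
import random

LEVELS = 256   # 8-bit quantisation: each sample is one of 256 values
CONTEXT = 4    # hypothetical context window; real models look much further back


def toy_next_sample_weights(history):
    """Stand-in for a trained network (NOT WaveNet).

    Returns one non-negative weight per level, favouring values near the
    average of the recent samples so that the output changes smoothly.
    """
    centre = sum(history) / len(history) if history else LEVELS / 2
    return [1.0 / (1.0 + abs(level - centre)) for level in range(LEVELS)]


def generate(num_samples, seed=0):
    """Generate samples autoregressively, one at a time."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        # Condition on the most recent samples only.
        weights = toy_next_sample_weights(samples[-CONTEXT:])
        # Draw the next sample, then append it so it becomes context
        # for the following step.
        next_sample = rng.choices(range(LEVELS), weights=weights, k=1)[0]
        samples.append(next_sample)
    return samples


if __name__ == "__main__":
    print(generate(10))
```

Because every step waits for the previous one, a loop like this is inherently sequential, which is why generation speed matters so much for this family of models.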

The company is now using an updated version of WaveNet that runs on Google’s Cloud TPU infrastructure. The new, improved WaveNet model generates raw waveforms 1,000 times faster than the original model, and can generate one second of speech in just 50 milliseconds. “In fact, the model is not just quicker, but also higher-fidelity, capable of creating waveforms with 24,000 samples a second. We’ve also increased the resolution of each sample from 8 bits to 16 bits, producing higher quality audio for a more human sound,” said Aharon.
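A few lines of arithmetic put these figures in perspective. The sketch below uses only the numbers quoted above; the "implied original time" assumes the 1,000-times speed-up and the 50-millisecond figure refer to the same benchmark, which the article does not state explicitly. The function name `wavenet_figures` is hypothetical.

```python
def wavenet_figures(sample_rate_hz=24_000, generation_time_ms=50,
                    speedup_over_original=1_000, old_bits=8, new_bits=16):
    """Derive headline numbers from the figures quoted in the article."""
    # One second (1,000 ms) of speech takes generation_time_ms to produce.
    real_time_factor = 1_000 / generation_time_ms

    # Samples produced per second of compute time.
    samples_per_compute_second = sample_rate_hz * real_time_factor

    # ASSUMPTION: the speed-up applies to the same one-second benchmark.
    implied_original_time_s = generation_time_ms * speedup_over_original / 1_000

    return {
        "real_time_factor": real_time_factor,
        "samples_per_compute_second": samples_per_compute_second,
        "implied_original_time_s": implied_original_time_s,
        "old_levels": 2 ** old_bits,          # distinct amplitude values
        "new_levels": 2 ** new_bits,
        "raw_bytes_per_second": sample_rate_hz * new_bits // 8,
    }


if __name__ == "__main__":
    for name, value in wavenet_figures().items():
        print(f"{name}: {value}")
```

On these numbers the updated model runs about 20 times faster than real time, and moving from 8 to 16 bits raises the number of distinct amplitude values per sample from 256 to 65,536.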

Google said that with these adjustments, the new WaveNet model produces more natural sounding speech. “As WaveNet voices also require less recorded audio input to produce high quality models, we expect to continue to improve both the variety as well as quality of the WaveNet voices available to Cloud customers in the coming months,” said Aharon.

Google’s Cloud Text-to-Speech customers include Cisco and Dolphin ONE. “As the leading provider of collaboration solutions, Cisco has a long history of bringing the latest technology advances into the enterprise. Google’s Cloud Text-to-Speech has enabled us to achieve the natural sound quality that our customers desire,” said Tim Tuttle, CTO of Cognitive Collaboration, Cisco.

“Dolphin ONE’s Calll.io telephony platform offers connectivity from a multitude of devices, at practically any location. We’ve integrated Cloud Text-to-Speech into our products and allow our users to create natural call center experiences. By using Google Cloud’s machine learning tools, we’re instantly delivering cutting-edge technology to our users,” said Jason Berryman, Dolphin ONE.

Sanjay Singh
Sanjay Singh covers startups, consumer electronics and telecom for TechObserver.in