
Google Cloud Speech to Text
Transform voice to text accurately across 125+ languages: real-time, customizable, and secure.
Google Cloud Speech to Text Information
What is Google Cloud Speech to Text?
Speech-to-Text is a Google Cloud service that converts audio into text transcriptions and lets you integrate speech recognition into applications. It supports more than 125 languages and variants and offers pretrained and customizable models for use cases such as voice control, phone calls, and video transcription. It is built on advanced speech AI, including Chirp, Google Cloud’s foundation model for speech, trained on millions of hours of audio and billions of text sentences. Speech-to-Text offers three main recognition methods: synchronous, which returns the transcript once the audio has been processed; asynchronous, which runs in the background while you check periodically for results; and streaming, which returns results in real time as audio arrives. It also offers batch transcription and enterprise-grade encryption, with customer-managed encryption keys for all resources.
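As a rough illustration of how the three recognition methods divide up the work, the sketch below picks a method from two inputs: whether the audio is live, and how long it is. The names `Method`, `choose_method`, and `SYNC_MAX_SECONDS` are hypothetical helpers, not part of any Google Cloud library, and the 60-second cutoff for synchronous requests is an assumption that should be checked against the current quota documentation.

```python
from enum import Enum
from typing import Optional


class Method(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"
    STREAMING = "streaming"


# Assumed cutoff for synchronous requests, in seconds. This value is an
# illustration only; confirm the real limit in the service's quota docs.
SYNC_MAX_SECONDS = 60.0


def choose_method(duration_seconds: Optional[float], live: bool) -> Method:
    """Pick a recognition method for a piece of audio.

    live: True when audio is still being captured and results are
          needed in real time.
    duration_seconds: length of a recorded file, or None if unknown.
    """
    if live:
        # Real-time results while audio is arriving.
        return Method.STREAMING
    if duration_seconds is not None and duration_seconds <= SYNC_MAX_SECONDS:
        # Short recordings: wait for the transcript in a single call.
        return Method.SYNCHRONOUS
    # Long or unknown-length recordings: run in the background and poll.
    return Method.ASYNCHRONOUS


if __name__ == "__main__":
    print(choose_method(30, live=False).value)
    print(choose_method(45 * 60, live=False).value)
    print(choose_method(None, live=True).value)
```

In a real application, each branch would call the corresponding client-library method; the selection logic itself stays this simple.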
Google Cloud Speech to Text Core Features
- Advanced speech AI using Chirp, Google Cloud’s foundation model for speech
- Support for over 125 languages and variants
- Transcription of short, long, or streaming audio
- Pretrained or customizable models for transcription
- Out-of-the-box regulatory and security compliance
- AI-powered speech recognition and transcription
- Model adaptation to improve accuracy of frequently used words
Google Cloud Speech to Text Pricing
Speech-to-Text V1 API
$0.024 per minute of audio, billed monthly
- Data residency for multi-region only
- Models include short, long, phone call, and video
- Does not include audit logging
- New customers get $300 in free credits, plus 60 minutes of free audio transcription and analysis per month that are not charged against those credits.
Speech-to-Text V2 API
$0.016 per minute of audio, billed monthly
- Data residency for multi-region and single-region
- Models include short, long, telephony, video, and Chirp
- Includes audit logging and support for customer-managed encryption keys
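To compare the two listed rates for a given monthly volume, a minimal cost sketch follows. It reads the listed prices as flat per-minute rates and applies the 60 free minutes only where the listing mentions them (V1). Both are assumptions: the function and constant names are hypothetical, and actual billing may involve volume tiers or other terms not shown in this listing.

```python
# Rates taken from the listing above, read as flat per-minute prices
# (an assumption; real billing may use volume tiers).
V1_RATE_PER_MIN = 0.024
V2_RATE_PER_MIN = 0.016

# The listing mentions 60 free minutes per month under the V1 plan.
V1_FREE_MINUTES = 60


def estimate_monthly_cost(minutes: float, rate_per_min: float,
                          free_minutes: float = 0) -> float:
    """Estimate one month's cost in USD for `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    # Only minutes beyond the free allowance are billed.
    billable = max(0.0, minutes - free_minutes)
    return round(billable * rate_per_min, 2)


if __name__ == "__main__":
    monthly_minutes = 1000
    v1 = estimate_monthly_cost(monthly_minutes, V1_RATE_PER_MIN, V1_FREE_MINUTES)
    v2 = estimate_monthly_cost(monthly_minutes, V2_RATE_PER_MIN)
    print(f"V1: ${v1:.2f}")  # (1000 - 60) * 0.024 -> V1: $22.56
    print(f"V2: ${v2:.2f}")  # 1000 * 0.016 -> V2: $16.00
```

Under these assumptions, V2 is cheaper at 1,000 minutes per month even without a free allowance, since the lower rate outweighs the 60 free V1 minutes once usage passes 180 minutes.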