
Voice and sound generation

Speech Generation

Name	What the agent will do	How to configure it when building an agent
Speech Generator	Convert written text into natural-sounding audio with configurable voices, languages, and speaking styles. Generate single-voice narrations or multi-speaker dialogues for podcasts, training content, and audio communications.	No configuration required; simply enable this capability

Key capabilities:

Single-voice audio: Generate speech from up to 10,000 characters of text, with configurable voice gender, language (21 languages supported), and use case (conversational, narrative, educational, etc.)
Multi-speaker dialogue: Create natural conversations between multiple speakers, supporting up to 5,000 dialogue lines for podcast-style content
High-quality output: All audio is generated in MP3 format at 44.1 kHz and 128 kbps for professional-grade results
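
The limits above suggest two practical checks before handing text to the agent: keeping each single-voice request within 10,000 characters, and estimating how large the resulting MP3 might be. The sketch below is illustrative only. The function names `split_for_speech` and `estimated_mp3_bytes` are hypothetical helpers, not part of the product, and the size estimate assumes a constant 128 kbps bitrate and ignores file headers and metadata.

```python
import re

MAX_CHARS = 10_000       # single-voice character limit stated above
MP3_BITRATE_KBPS = 128   # output bitrate stated above


def split_for_speech(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: breaks at sentence boundaries where possible and
    hard-splits any single sentence that is longer than the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit cannot fit in any chunk as-is.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimated_mp3_bytes(duration_seconds: float,
                        bitrate_kbps: int = MP3_BITRATE_KBPS) -> int:
    """Approximate size of a constant-bitrate MP3 (assumption: CBR, no metadata)."""
    # kilobits per second -> bits per second -> bytes
    return int(duration_seconds * bitrate_kbps * 1000 / 8)
```

Under these assumptions, one minute of audio at 128 kbps works out to roughly 960 KB, so a ten-minute narration would be on the order of 10 MB. Splitting at sentence boundaries keeps each chunk readable as a standalone narration segment.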

Common use cases:

Transform research from @deep-dive into podcast episodes
Create audio versions of written content for accessibility
Generate training materials and onboarding content in audio format
Produce multilingual voice content for global teams

Updated 4 months ago
