Voice and sound generation
Speech Generation
| Name | What the agent will do | How to configure it when building an agent |
|---|---|---|
| Speech Generator | Convert written text into natural-sounding audio with configurable voices, languages, and speaking styles. Generate single-voice narrations or multi-speaker dialogues for podcasts, training content, and audio communications. | No configuration required; simply enable this capability |
Key capabilities:
- Single voice audio: Generate up to 10,000 characters of speech with customizable gender, language (9 languages supported), and use case (conversational, narrative, educational, etc.)
- Multi-speaker dialogue: Create natural conversations between multiple speakers, supporting up to 5,000 dialogue lines for podcast-style content
- High-quality output: All audio is generated in MP3 format at 44.1 kHz and 128 kbps for professional-grade results
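The limits above suggest two small preparation steps when working with long scripts: splitting text so each piece stays under the 10,000-character single-voice limit, and estimating how large the resulting MP3 will be at 128 kbps. The following is a minimal sketch, not part of the product's API. The function names `split_for_speech` and `estimated_mp3_bytes` are hypothetical, and it assumes that the character limit counts characters the way Python's `len` does and that the output is constant-bitrate.

```python
import re

MAX_CHARS = 10_000  # single-voice character limit stated above
BITRATE_KBPS = 128  # MP3 bitrate stated above


def split_for_speech(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: prefers sentence boundaries and only cuts
    mid-sentence when a single sentence exceeds the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimated_mp3_bytes(seconds: float, bitrate_kbps: int = BITRATE_KBPS) -> int:
    """Approximate constant-bitrate MP3 size: bits per second * duration / 8."""
    return int(bitrate_kbps * 1000 * seconds / 8)
```

Under these assumptions, one minute of audio at 128 kbps comes to roughly 960,000 bytes (just under 1 MB), which can help when planning storage for longer narrations. Each chunk returned by `split_for_speech` could then be submitted as a separate single-voice request.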
Common use cases:
- Transform research from @deep-dive into podcast episodes
- Create audio versions of written content for accessibility
- Generate training materials and onboarding content in audio format
- Produce multilingual voice content for global teams
