Voice and sound generation

Speech Generation

Name: Speech Generator
What the agent will do: Convert written text into natural-sounding audio with configurable voices, languages, and speaking styles. Generate single-voice narrations or multi-speaker dialogues for podcasts, training content, and audio communications.
How to configure it when building an agent: No configuration required; simply enable this capability.

Key capabilities:

  • Single-voice audio: Generate speech from up to 10,000 characters of text, with customizable voice gender, language (9 languages supported), and use case (conversational, narrative, educational, etc.)
  • Multi-speaker dialogue: Create natural conversations between multiple speakers, supporting up to 5,000 dialogue lines for podcast-style content
  • High-quality output: All audio is generated as MP3 at a 44.1 kHz sample rate and a 128 kbps bitrate for professional-grade results
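
The limits above lend themselves to a little pre-processing before text is handed to the agent. The following is a minimal sketch, not part of the product: `split_for_speech` and `estimated_mp3_bytes` are hypothetical helper names chosen for illustration. The first splits long text into pieces that stay within the 10,000-character single-voice limit, preferring sentence boundaries. The second gives a rough file-size estimate, assuming the 128 kbps figure is a constant bitrate and ignoring file headers and metadata.

```python
import re

MAX_CHARS = 10_000  # single-voice character limit stated above


def split_for_speech(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Hypothetical helper: chunks break at sentence boundaries where possible;
    a single sentence longer than `limit` is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that cannot fit in one chunk on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def estimated_mp3_bytes(seconds: float, kbps: int = 128) -> int:
    """Approximate audio size, assuming constant bitrate (1 kbps = 1000 bits/s)."""
    return int(seconds * kbps * 1000 / 8)


if __name__ == "__main__":
    script = "This is one sentence of narration. " * 1000
    parts = split_for_speech(script)
    print(len(parts), "chunks, longest is", max(len(p) for p in parts), "characters")
    print("One minute of audio is roughly", estimated_mp3_bytes(60), "bytes")
```

Each chunk can then be submitted as a separate single-voice request. Under the constant-bitrate assumption, one minute of output comes to about 960,000 bytes, or just under 1 MB.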

Common use cases:

  • Transform research from @deep-dive into podcast episodes
  • Create audio versions of written content for accessibility
  • Generate training materials and onboarding content in audio format
  • Produce multilingual voice content for global teams