Enhanced Speech Capabilities: Upgraded Audio Quality and 21 Languages Now Supported
📌 Context
We've upgraded our ElevenLabs integration to provide you with significantly better speech-to-text (transcription) and text-to-speech (audio generation) capabilities. This upgrade brings more natural-sounding voices, improved transcription accuracy, and expanded language support.
🔄 Impact on Dust
We've made several improvements to how agents handle audio:
- Transcription engine: Upgraded to the latest model, with better accuracy and additional metadata (like speaker identification and timestamps)
- Voice generation: Switched to a more expressive audio engine that supports advanced audio formatting
- Language coverage: Expanded from 9 to 21 supported languages
- Voice selection: Improved the logic for matching voices to languages and contexts, so you get more appropriate and natural-sounding voices by default
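As a purely hypothetical illustration (the shape below is illustrative, not the actual API schema), a transcript enriched with speaker identification and timestamps might look like this, and the metadata makes it easy to group what was said by who said it:

```python
# Hypothetical transcript structure with speaker labels and timestamps.
# Field names ("segments", "speaker", "start", "end") are assumptions
# for illustration only, not the real integration's schema.
transcript = {
    "text": "Hello, and welcome to the meeting.",
    "language": "en",
    "segments": [
        {"speaker": "speaker_0", "start": 0.0, "end": 1.2,
         "text": "Hello,"},
        {"speaker": "speaker_0", "start": 1.2, "end": 3.4,
         "text": "and welcome to the meeting."},
    ],
}

# Speaker identification lets downstream logic group text by speaker.
by_speaker = {}
for seg in transcript["segments"]:
    by_speaker.setdefault(seg["speaker"], []).append(seg["text"])

print(by_speaker)
# → {'speaker_0': ['Hello,', 'and welcome to the meeting.']}
```

Agents receive this kind of metadata automatically; no parsing code is required on your side.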
👤 Impact for you
You'll notice immediate improvements:
- Better transcription quality: More accurate speech-to-text across all supported languages, with richer context
- More natural voices: Audio generated by your agents will sound more expressive and human-like
- Broader language access: You can now use speech features in 21 languages instead of 9
- Better default voices: The system will automatically select more appropriate voices based on your language and use case, reducing instances of unexpected or mismatched voices
These improvements apply automatically to all agents using speech capabilities.
✅ Actions required
No action required on your part. All speech-enabled agents will automatically benefit from these improvements. The upgrade is transparent and backward-compatible with your existing agent configurations.
If you'd like to explore the newly supported languages or hear the improved voice quality, simply use speech features in your agents as you normally would.
