Enhanced Speech Capabilities: Upgraded Audio Quality and 21 Languages Now Supported
📌 Context
We've upgraded our ElevenLabs integration to provide you with significantly better speech-to-text (transcription) and text-to-speech (audio generation) capabilities. This upgrade brings more natural-sounding voices, improved transcription accuracy, and expanded language support.
🔄 Impact on Dust
We've made several improvements to how agents handle audio:
- Transcription engine: Upgraded to the latest model, with better accuracy and additional metadata (like speaker identification and timestamps)
- Voice generation: Switched to a more expressive audio engine that supports advanced audio formatting
- Language coverage: Expanded from 9 to 21 supported languages
- Voice selection: Improved the logic for matching voices to languages and contexts, so you get more appropriate and natural-sounding voices by default
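As a purely hypothetical illustration (the shape below is illustrative, not the actual API schema), a transcript enriched with speaker identification and timestamps might look like this, and the metadata makes it easy to group what was said by who said it:

```python
# Hypothetical transcript structure with speaker labels and timestamps.
# Field names ("segments", "speaker", "start", "end") are assumptions
# for illustration only, not the real integration's schema.
transcript = {
    "text": "Hello, and welcome to the meeting.",
    "language": "en",
    "segments": [
        {"speaker": "speaker_0", "start": 0.0, "end": 1.2,
         "text": "Hello,"},
        {"speaker": "speaker_0", "start": 1.2, "end": 3.4,
         "text": "and welcome to the meeting."},
    ],
}

# Speaker identification lets downstream logic group text by speaker.
by_speaker = {}
for seg in transcript["segments"]:
    by_speaker.setdefault(seg["speaker"], []).append(seg["text"])

print(by_speaker)
# → {'speaker_0': ['Hello,', 'and welcome to the meeting.']}
```

Agents receive this kind of metadata automatically; no parsing code is required on your side.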
👤 Impact for you
You'll notice immediate improvements:
- Better transcription quality: More accurate speech-to-text across all supported languages, with richer context
- More natural voices: Audio generated by your agents will sound more expressive and human-like
- Broader language access: You can now use speech features in 21 languages instead of 9
- Better default voices: The system will automatically select more appropriate voices based on your language and use case, reducing instances of unexpected or mismatched voices
These improvements apply automatically to all agents using speech capabilities.
✅ Actions required
No action required on your part. All speech-enabled agents will automatically benefit from these improvements. The upgrade is transparent and backward-compatible with your existing agent configurations.
If you'd like to explore the newly supported languages or hear the improved voice quality, simply use speech features in your agents as you normally would.
