# ElevenLabs TTS Models

## Overview

EVE uses **ElevenLabs Turbo v2.5** by default for text-to-speech. This model is specifically optimized for real-time conversational AI.
## ⚠️ Important: V3 Alpha Not Recommended for EVE

According to the [ElevenLabs documentation](https://elevenlabs.io/docs/models#eleven-v3-alpha):

> **"Eleven v3 is not made for real-time applications like Agents Platform."**

While V3 offers the highest quality, it has drawbacks for a real-time assistant like EVE:

- ❌ **Not optimized for real-time** conversation
- ❌ **Higher latency** - slower response times
- ❌ **Often needs multiple generations** - you may have to generate several takes and pick the best one

✅ **Best for**: audiobooks, character dialogue, and other pre-recorded content
## Current Default Model

**Default**: `eleven_turbo_v2_5`

EVE uses this model by default because it provides:

- ✅ Fast generation speed
- ✅ High-quality, natural-sounding voices
- ✅ Low latency for real-time conversation (~100-300 ms)
- ✅ Cost-effective pricing
- ✅ Multilingual support
- ✅ **Recommended by ElevenLabs for conversational AI**
## Available Models

ElevenLabs offers several models you can use:

### Turbo Models (Recommended)

**`eleven_turbo_v2_5`** (Current Default)

- Latest turbo model
- Excellent quality with fast generation
- Best for conversational AI
- Low latency

**`eleven_turbo_v2`**

- Previous turbo version, superseded by v2.5
- English only
- Still high quality
### Multilingual Models

**`eleven_multilingual_v2`**

- Supports 29 languages
- High quality across languages
- Slower than turbo but more versatile

**`eleven_multilingual_v1`**

- Original multilingual model (legacy, superseded by v2)
- Stable and reliable
- Good for non-English content
### Monolingual Models

**`eleven_monolingual_v1`**

- English only
- High quality
- Original ElevenLabs model
- More expensive than turbo
### Flash Models

**`eleven_flash_v2_5`**

- Ultra-fast generation
- Lowest latency
- Good quality
- Best for latency-critical real-time applications

**`eleven_flash_v2`**

- English-only flash model
- Very fast
- Low cost
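If you want the compiler to catch mistyped model IDs, the IDs listed above can be captured in a TypeScript union. This is a minimal sketch: `TtsModelId`, `TTS_MODELS`, and `isTtsModelId` are hypothetical names, and EVE's settings store may simply type `ttsModel` as a plain `string`. The `legacy` flag follows the Model Changelog, which lists the v1 models as legacy.

```typescript
// Hypothetical typing for the model IDs listed in this document.
type TtsModelId =
  | "eleven_turbo_v2_5"
  | "eleven_turbo_v2"
  | "eleven_multilingual_v2"
  | "eleven_multilingual_v1"
  | "eleven_monolingual_v1"
  | "eleven_flash_v2_5"
  | "eleven_flash_v2";

type ModelFamily = "turbo" | "multilingual" | "monolingual" | "flash";

interface ModelInfo {
  family: ModelFamily;
  legacy: boolean; // v1 models are marked as legacy in the Model Changelog
}

const TTS_MODELS: Record<TtsModelId, ModelInfo> = {
  eleven_turbo_v2_5: { family: "turbo", legacy: false },
  eleven_turbo_v2: { family: "turbo", legacy: false },
  eleven_multilingual_v2: { family: "multilingual", legacy: false },
  eleven_multilingual_v1: { family: "multilingual", legacy: true },
  eleven_monolingual_v1: { family: "monolingual", legacy: true },
  eleven_flash_v2_5: { family: "flash", legacy: false },
  eleven_flash_v2: { family: "flash", legacy: false },
};

// Type guard for IDs read from untyped sources (saved settings, user input).
// hasOwnProperty avoids matching inherited keys such as "toString".
function isTtsModelId(id: string): id is TtsModelId {
  return Object.prototype.hasOwnProperty.call(TTS_MODELS, id);
}

console.log(isTtsModelId("eleven_flash_v2_5")); // → true
console.log(isTtsModelId("eleven_turbo_v3")); // → false
```

Typing the store's `ttsModel` field as `TtsModelId` would turn a misspelled ID into a compile-time error instead of a runtime "model not found" response.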
## Changing the Model

The model is configurable in the settings store:

```typescript
// In settingsStore.ts
ttsModel: 'eleven_turbo_v2_5' // Default
```

To change it:

```typescript
setTtsModel('eleven_flash_v2_5') // For lower latency
setTtsModel('eleven_multilingual_v2') // For better multilingual support
```
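To show the setter end to end, here is a framework-free sketch of the `ttsModel` slice. Only `ttsModel`, `setTtsModel`, and the default value come from the snippets above; `createTtsSettings`, `getTtsModel`, `subscribe`, and the empty-ID check are hypothetical, and EVE's real `settingsStore.ts` likely uses a state-management library instead.

```typescript
// Hypothetical, framework-free stand-in for the ttsModel slice of settingsStore.ts.
type Listener = (model: string) => void;

const DEFAULT_TTS_MODEL = "eleven_turbo_v2_5";

function createTtsSettings(initial: string = DEFAULT_TTS_MODEL) {
  let ttsModel = initial;
  const listeners = new Set<Listener>();

  return {
    getTtsModel: () => ttsModel,
    setTtsModel(model: string) {
      const trimmed = model.trim();
      // Reject empty IDs instead of sending them to ElevenLabs.
      if (trimmed === "") throw new Error("Model ID must not be empty");
      ttsModel = trimmed;
      // Notify subscribers (e.g. UI components) of the change.
      listeners.forEach((fn) => fn(ttsModel));
    },
    subscribe(fn: Listener) {
      listeners.add(fn);
      return () => listeners.delete(fn); // call to unsubscribe
    },
  };
}

const settings = createTtsSettings();
settings.setTtsModel("eleven_flash_v2_5");
console.log(settings.getTtsModel()); // → eleven_flash_v2_5
```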
## Model Characteristics

### Speed Comparison

Approximate generation latency (actual times depend on network conditions and text length):

1. **Flash** - Fastest (< 300 ms)
2. **Turbo** - Very fast (< 500 ms)
3. **Multilingual** - Fast (< 1 s)
4. **Monolingual** - Standard (1-2 s)
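Because these figures vary by environment, it can help to measure them yourself. The sketch below is a hypothetical `timeRequest` helper that times any async request; in practice you would pass a closure that calls ElevenLabs with a given `model_id`, but here a fake, delayed promise stands in so the example runs offline.

```typescript
// Hypothetical helper: measures how long an async TTS request takes to resolve.
async function timeRequest<T>(
  label: string,
  request: () => Promise<T>,
): Promise<{ label: string; ms: number; result: T }> {
  const start = performance.now();
  const result = await request();
  const ms = performance.now() - start;
  return { label, ms, result };
}

// Stand-in for a real API call, resolving after delayMs.
const fakeSynthesis = (delayMs: number) => () =>
  new Promise<string>((resolve) => setTimeout(() => resolve("audio"), delayMs));

async function main() {
  const timings = [
    await timeRequest("eleven_flash_v2_5", fakeSynthesis(50)),
    await timeRequest("eleven_turbo_v2_5", fakeSynthesis(120)),
  ];
  for (const t of timings) console.log(`${t.label}: ${t.ms.toFixed(0)} ms`);
}

main();
```

This measures time until the whole request resolves. For streamed audio, the time to the first chunk is the figure that matters most for perceived responsiveness.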
### Quality Comparison

1. **Monolingual** - Highest quality
2. **Turbo v2.5** - Excellent quality
3. **Multilingual v2** - Great quality
4. **Flash** - Good quality
### Cost Comparison

1. **Flash** - Most economical
2. **Turbo** - Cost-effective
3. **Multilingual** - Standard pricing
4. **Monolingual** - Premium pricing
## Recommended Use Cases

### Real-Time Conversation (Default)

```
Model: eleven_turbo_v2_5
Speed: 1.0x
Stability: 50%
Clarity: 75%
```

Best balance for the EVE assistant.
### Ultra-Low Latency

```
Model: eleven_flash_v2_5
Speed: 1.0x
Stability: 60%
Clarity: 80%
```

For instant responses.
### Maximum Quality

```
Model: eleven_monolingual_v1
Speed: 1.0x
Stability: 70%
Clarity: 85%
```

For professional content.
### Multilingual

```
Model: eleven_multilingual_v2
Speed: 1.0x
Stability: 55%
Clarity: 75%
```

For non-English languages.
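The presets above express Stability and Clarity as percentages, while the API call under Technical Details takes fractions between 0 and 1. The sketch below converts them, assuming Clarity corresponds to the API's `similarity_boost` field (the default preset's 50%/75% matches the `stability: 0.5, similarity_boost: 0.75` in that call). `PRESETS` and `toRequestFields` are hypothetical names, and Speed is left out because the documented API call does not set it.

```typescript
interface Preset {
  model: string;
  stabilityPct: number; // "Stability" in the presets above
  clarityPct: number; // "Clarity", assumed to map to similarity_boost
}

type PresetName = "realTime" | "ultraLowLatency" | "maximumQuality" | "multilingual";

// The four presets from "Recommended Use Cases" (Speed 1.0x omitted).
const PRESETS: Record<PresetName, Preset> = {
  realTime: { model: "eleven_turbo_v2_5", stabilityPct: 50, clarityPct: 75 },
  ultraLowLatency: { model: "eleven_flash_v2_5", stabilityPct: 60, clarityPct: 80 },
  maximumQuality: { model: "eleven_monolingual_v1", stabilityPct: 70, clarityPct: 85 },
  multilingual: { model: "eleven_multilingual_v2", stabilityPct: 55, clarityPct: 75 },
};

// Converts a preset into the snake_case request fields used by the API call.
function toRequestFields(preset: Preset) {
  // Clamp to 0-100, then scale to the 0-1 range the API expects.
  const frac = (pct: number) => Math.min(Math.max(pct, 0), 100) / 100;
  return {
    model_id: preset.model,
    voice_settings: {
      stability: frac(preset.stabilityPct),
      similarity_boost: frac(preset.clarityPct),
    },
  };
}

console.log(toRequestFields(PRESETS.realTime));
```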
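## Technical Details

### API Call

The call below assumes the `elevenlabs` npm package (the SDK version that uses the snake_case field names shown here) and an API key in the `ELEVENLABS_API_KEY` environment variable:

```typescript
import { ElevenLabsClient } from "elevenlabs";

const client = new ElevenLabsClient({ apiKey: process.env.ELEVENLABS_API_KEY });
const voiceId = "YOUR_VOICE_ID"; // placeholder: the ElevenLabs voice to use

// Resolves to the generated audio data
const audio = await client.textToSpeech.convert(voiceId, {
  text: "Hello, how can I help you?",
  model_id: "eleven_turbo_v2_5",
  voice_settings: {
    stability: 0.5, // "Stability: 50%" in the default preset
    similarity_boost: 0.75, // "Clarity: 75%" in the default preset
    style: 0.0,
    use_speaker_boost: true,
  },
});
```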
### Model Selection Flow

1. The user sends a message.
2. EVE responds.
3. The user clicks the 🔊 speaker icon.
4. TTSControls reads `ttsModel` from settings.
5. TTSControls passes it to the TTS Manager.
6. The TTS Manager calls ElevenLabs with that model ID.
7. The audio is generated and played.
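Steps 4 to 6 can be sketched as a pure function that turns stored settings into a request body, plus a manager that forwards it. `TtsSettings`, `buildTtsRequest`, and `TtsManager` are hypothetical stand-ins for EVE's TTSControls and TTS Manager, and the injected `synthesize` callback takes the place of the real `client.textToSpeech.convert` call so the example runs without an API key.

```typescript
interface TtsSettings {
  ttsModel: string;
  stability: number; // 0-1
  similarityBoost: number; // 0-1
}

interface TtsRequest {
  text: string;
  model_id: string;
  voice_settings: { stability: number; similarity_boost: number };
}

type Synthesize = (voiceId: string, req: TtsRequest) => Promise<Uint8Array>;

// Step 4: read ttsModel (and voice settings) from the settings store.
function buildTtsRequest(text: string, settings: TtsSettings): TtsRequest {
  return {
    text,
    model_id: settings.ttsModel,
    voice_settings: {
      stability: settings.stability,
      similarity_boost: settings.similarityBoost,
    },
  };
}

// Steps 5-6: the manager receives the settings and calls the provider.
class TtsManager {
  private synthesize: Synthesize;

  constructor(synthesize: Synthesize) {
    this.synthesize = synthesize;
  }

  speak(voiceId: string, text: string, settings: TtsSettings): Promise<Uint8Array> {
    return this.synthesize(voiceId, buildTtsRequest(text, settings));
  }
}

// Usage with a fake provider that only logs which model would be requested.
const manager = new TtsManager(async (_voiceId, req) => {
  console.log(`would call ElevenLabs with ${req.model_id}`);
  return new Uint8Array();
});
manager.speak("voice-123", "Hello!", {
  ttsModel: "eleven_turbo_v2_5",
  stability: 0.5,
  similarityBoost: 0.75,
});
```

Injecting the provider keeps the manager testable and makes it easy to swap ElevenLabs for another engine.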
### Fallback Behavior

If the ElevenLabs request fails or the model is unavailable, EVE:

- Falls back to the browser's Web Speech API
- Logs a warning in the console
- Continues with free, built-in browser TTS
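The fallback amounts to a try/catch around the ElevenLabs call. In this sketch both engines are injected as functions (`primary` and `fallback` are hypothetical parameters). In a browser, `fallback` would wrap `window.speechSynthesis.speak(new SpeechSynthesisUtterance(text))`; since the Web Speech API is not available in Node, the example uses stand-ins.

```typescript
type Speaker = (text: string) => Promise<void>;

// Tries ElevenLabs first; on any error, logs a warning and uses the browser engine.
async function speakWithFallback(
  text: string,
  primary: Speaker,
  fallback: Speaker,
): Promise<"elevenlabs" | "browser"> {
  try {
    await primary(text);
    return "elevenlabs";
  } catch (err) {
    console.warn("ElevenLabs TTS failed, falling back to Web Speech API:", err);
    await fallback(text);
    return "browser";
  }
}

// Stand-ins: an ElevenLabs call that fails and a browser voice that just logs.
const failingElevenLabs: Speaker = async () => {
  throw new Error("Model 'eleven_turbo_v3' not found");
};
const browserVoice: Speaker = async (text) => {
  console.log(`(browser TTS) ${text}`);
};

speakWithFallback("Hello!", failingElevenLabs, browserVoice).then((engine) =>
  console.log(`spoke with: ${engine}`),
);
```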
## Future Enhancements

### Planned Features

- **Model selector in UI** - Dropdown to choose model in Settings
- **Auto-detect best model** - Based on language and use case
- **Model presets** - Quick selection for different scenarios
- **Cost tracking** - Show estimated cost per request
- **Quality metrics** - User feedback on voice quality
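As a starting point for the planned auto-detect feature, the Recommended Use Cases could be encoded as a small decision function. This is purely illustrative: `pickModel`, `UseCase`, and `PickOptions` are hypothetical, and checking latency before language is an arbitrary design choice rather than anything ElevenLabs or EVE prescribes.

```typescript
type UseCase = "conversation" | "ultraLowLatency" | "maximumQuality";

interface PickOptions {
  language: string; // BCP 47 tag, e.g. "en-US" or "fr"
  useCase: UseCase;
}

// Hypothetical mapping that mirrors the "Recommended Use Cases" presets.
function pickModel({ language, useCase }: PickOptions): string {
  const isEnglish = language.toLowerCase().split("-")[0] === "en";

  if (useCase === "ultraLowLatency") return "eleven_flash_v2_5";
  // The Multilingual preset is recommended for non-English languages.
  if (!isEnglish) return "eleven_multilingual_v2";
  // eleven_monolingual_v1 is English only, so it is chosen only for English text.
  if (useCase === "maximumQuality") return "eleven_monolingual_v1";
  return "eleven_turbo_v2_5"; // default for real-time conversation
}

console.log(pickModel({ language: "fr-FR", useCase: "conversation" })); // → eleven_multilingual_v2
```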
### Potential Models

As ElevenLabs releases new models, EVE can be updated to use them. The IDs below are hypothetical; check the ElevenLabs models page before relying on any of them:

- `eleven_turbo_v3` - Next-generation turbo
- `eleven_flash_v3` - Even faster flash model
- `eleven_multilingual_v3` - Improved multilingual
- Specialized models for specific use cases
## Troubleshooting

### Audio Not Playing

- Check that the ElevenLabs API key is valid
- Verify that the model ID is correct
- Check the console for error messages
- Try switching to `eleven_turbo_v2` if v2.5 fails
### Poor Quality

- Try `eleven_monolingual_v1` for better quality (English only)
- Adjust the stability and clarity settings
- Check the voice selection
- Ensure the text is well formatted
### Slow Generation

- Switch to `eleven_flash_v2_5` for speed
- Reduce the text length
- Check your network connection
- Verify that your API quota has not been exceeded
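To reduce text length per request, one option is to split a long reply into sentence-sized chunks and synthesize them one after another, so playback can start after the first chunk arrives. This is a sketch under assumptions: `chunkText` is hypothetical, the sentence split is a simple regex rather than a real sentence segmenter (it will split "3.14", for example), and the 250-character default is an arbitrary example, not an ElevenLabs limit.

```typescript
// Splits text into chunks of at most maxChars, preferring sentence boundaries.
function chunkText(text: string, maxChars = 250): string[] {
  // Naive sentence split: keep terminal punctuation attached to its sentence.
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";

  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") continue;
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars) {
      current = candidate; // still fits: keep growing the current chunk
      continue;
    }
    if (current) chunks.push(current);
    // A single over-long sentence is hard-split at maxChars.
    let rest = sentence;
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars).trimStart();
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}

// Three chunks: "Hi there.", "How can I help you today?", "Ask me anything!"
console.log(chunkText("Hi there. How can I help you today? Ask me anything!", 30));
```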
### Model Not Found Error

```
Error: Model 'eleven_turbo_v3' not found
```

- The model ID may be incorrect
- The model might not be available on your plan
- Fall back to `eleven_turbo_v2_5`
- Check the ElevenLabs documentation
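The last two steps can be automated: check the configured ID against the models your account can use and fall back to the default when it is missing. `resolveModelId` is a hypothetical helper, and the `available` list is assumed to come from somewhere like ElevenLabs' `GET /v1/models` endpoint, which this example does not call.

```typescript
const DEFAULT_MODEL = "eleven_turbo_v2_5";

// Returns the requested model if available, otherwise the default (with a warning).
function resolveModelId(requested: string, available: readonly string[]): string {
  if (available.includes(requested)) return requested;
  console.warn(`Model '${requested}' not found; falling back to '${DEFAULT_MODEL}'`);
  if (!available.includes(DEFAULT_MODEL)) {
    throw new Error(`Default model '${DEFAULT_MODEL}' is not available either`);
  }
  return DEFAULT_MODEL;
}

const available = ["eleven_turbo_v2_5", "eleven_flash_v2_5", "eleven_multilingual_v2"];
console.log(resolveModelId("eleven_turbo_v3", available)); // → eleven_turbo_v2_5 (after a warning)
```

Throwing when even the default is missing lets the caller trigger the browser fallback described under Fallback Behavior.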
## Model Changelog

### v2.5 Models (Current)

- Released: 2024
- Improvements: better quality, faster generation
- Models: `eleven_turbo_v2_5`, `eleven_flash_v2_5`

### v2 Models

- Released: 2023-2024
- Improvements: multilingual support, reduced latency
- Models: `eleven_turbo_v2`, `eleven_flash_v2`, `eleven_multilingual_v2`

### v1 Models (Legacy)

- Released: 2022-2023
- Original high-quality models
- Models: `eleven_monolingual_v1`, `eleven_multilingual_v1`
## References

- [ElevenLabs Text-to-Speech API Reference](https://elevenlabs.io/docs/api-reference/text-to-speech)
- [ElevenLabs Models Overview](https://elevenlabs.io/docs/models)
- [ElevenLabs Pricing](https://elevenlabs.io/pricing)

---

**Current Default**: `eleven_turbo_v2_5`

**Status**: ✅ Configured

**Version**: v0.2.0-rc

**Date**: October 5, 2025