Initial commit
258
docs/integrations/elevenlabs/ELEVENLABS_MODELS.md
Normal file
@@ -0,0 +1,258 @@
# ElevenLabs TTS Models

## Overview

EVE uses **ElevenLabs Turbo v2.5** (`eleven_turbo_v2_5`) by default for text-to-speech. This model is optimized for real-time conversational AI.

## ⚠️ Important: V3 Alpha Not Recommended for EVE

According to the [ElevenLabs documentation](https://elevenlabs.io/docs/models#eleven-v3-alpha):

> **"Eleven v3 is not made for real-time applications like Agents Platform."**

While V3 offers the highest quality, it is a poor fit for EVE:

- ❌ **Not optimized for real-time** conversation
- ❌ **Higher latency** - Slower response times
- ❌ **Requires multiple generations** - You may need to generate several versions and pick the best
- ✅ **Best for**: Audiobooks, character discussions, pre-recorded content

## Current Default Model

**Default**: `eleven_turbo_v2_5`

This model is optimized for EVE and provides:

- ✅ Fast generation speed
- ✅ High-quality natural voices
- ✅ Low latency for real-time conversation (~100-300ms)
- ✅ Cost-effective
- ✅ Multilingual support
- ✅ **Recommended by ElevenLabs for conversational AI**

## Available Models

ElevenLabs offers several model families. Pass any of the IDs below as the model ID:

### Turbo Models (Recommended)

**`eleven_turbo_v2_5`** (Current Default)
- Latest turbo model
- Excellent quality with fast generation
- Best for conversational AI
- Low latency

**`eleven_turbo_v2`**
- Previous turbo version
- Still high quality
- Superseded by v2.5

### Multilingual Models

**`eleven_multilingual_v2`**
- Supports 29+ languages
- High quality across languages
- Slower than turbo but more versatile

**`eleven_multilingual_v1`**
- Original multilingual model
- Stable and reliable
- Good for non-English content

### Monolingual Models

**`eleven_monolingual_v1`**
- English only
- High quality
- Original ElevenLabs model
- More expensive than turbo

### Flash Models

**`eleven_flash_v2_5`**
- Ultra-fast generation
- Lowest latency
- Good quality
- Best for real-time applications

**`eleven_flash_v2`**
- Previous flash version
- Very fast
- Lower cost

## Changing the Model

The model is configurable through the `ttsModel` field in the settings store:

```typescript
// In settingsStore.ts
ttsModel: 'eleven_turbo_v2_5' // Default
```

To change it, call `setTtsModel` with the ID of the model you want:

```typescript
setTtsModel('eleven_flash_v2_5') // For lower latency
setTtsModel('eleven_multilingual_v2') // For better multilingual support
```
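
The snippets above only show the field and the setter call. As a rough sketch of how a setter could guard against mistyped model IDs, the example below uses a plain object as the store. The names `KNOWN_TTS_MODELS`, `isKnownTtsModel`, and `getTtsModel` are hypothetical, and the real `settingsStore.ts` may be structured quite differently.

```typescript
// Hypothetical sketch, not the actual settingsStore.ts.
// Model IDs are the ones listed under "Available Models".
const KNOWN_TTS_MODELS = [
  "eleven_turbo_v2_5",
  "eleven_turbo_v2",
  "eleven_flash_v2_5",
  "eleven_flash_v2",
  "eleven_multilingual_v2",
  "eleven_multilingual_v1",
  "eleven_monolingual_v1",
] as const;

type TtsModel = (typeof KNOWN_TTS_MODELS)[number];

const DEFAULT_TTS_MODEL: TtsModel = "eleven_turbo_v2_5";

// Type guard: true only for IDs in the list above.
function isKnownTtsModel(id: string): id is TtsModel {
  return (KNOWN_TTS_MODELS as readonly string[]).indexOf(id) !== -1;
}

const settings: { ttsModel: TtsModel } = { ttsModel: DEFAULT_TTS_MODEL };

function getTtsModel(): TtsModel {
  return settings.ttsModel;
}

// Accepts arbitrary input (for example a persisted value) and keeps the
// previous model when the ID is unknown. Returns whether the change applied.
function setTtsModel(id: string): boolean {
  if (!isKnownTtsModel(id)) {
    return false;
  }
  settings.ttsModel = id;
  return true;
}
```

With this shape, `setTtsModel('eleven_turbo_v3')` returns `false` and leaves the current model untouched instead of storing an ID the API would reject.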

## Model Characteristics

Each list below ranks the model families from best to worst on one criterion. Latency figures are approximate.

### Speed Comparison
1. **Flash** - Fastest (< 300ms)
2. **Turbo** - Very Fast (< 500ms)
3. **Multilingual** - Fast (< 1s)
4. **Monolingual** - Standard (1-2s)

### Quality Comparison
1. **Monolingual** - Highest quality
2. **Turbo v2.5** - Excellent quality
3. **Multilingual v2** - Great quality
4. **Flash** - Good quality

### Cost Comparison
1. **Flash** - Most economical
2. **Turbo** - Cost-effective
3. **Multilingual** - Standard pricing
4. **Monolingual** - Premium pricing
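
The three rankings above can be combined mechanically. The sketch below encodes them as data and picks the family with the lowest summed rank, breaking ties in favor of the family whose worst rank is smallest. This scoring rule and the names `RANKS`, `rankOf`, and `bestFamily` are illustrative assumptions, not something EVE implements.

```typescript
type ModelFamily = "flash" | "turbo" | "multilingual" | "monolingual";
type Criterion = "speed" | "quality" | "cost";

const FAMILIES: ModelFamily[] = ["flash", "turbo", "multilingual", "monolingual"];

// Best-to-worst order per criterion, copied from the lists above.
// The quality list names Turbo v2.5 and Multilingual v2; here each entry
// stands for its whole family (a simplification).
const RANKS: Record<Criterion, ModelFamily[]> = {
  speed: ["flash", "turbo", "multilingual", "monolingual"],
  quality: ["monolingual", "turbo", "multilingual", "flash"],
  cost: ["flash", "turbo", "multilingual", "monolingual"],
};

// 1 is best, 4 is worst.
function rankOf(family: ModelFamily, criterion: Criterion): number {
  return RANKS[criterion].indexOf(family) + 1;
}

// Lowest summed rank wins; ties go to the family with the smaller worst rank.
function bestFamily(criteria: Criterion[]): ModelFamily {
  let best: ModelFamily = "flash";
  let bestSum = Infinity;
  let bestWorst = Infinity;
  for (const family of FAMILIES) {
    const ranks = criteria.map((c) => rankOf(family, c));
    const sum = ranks.reduce((a, b) => a + b, 0);
    const worst = Math.max(0, ...ranks);
    if (sum < bestSum || (sum === bestSum && worst < bestWorst)) {
      best = family;
      bestSum = sum;
      bestWorst = worst;
    }
  }
  return best;
}
```

Under this rule, weighing all three criteria selects Turbo (it ties with Flash on summed rank but never ranks below second), which agrees with the document's choice of default.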

## Recommended Use Cases

### Real-Time Conversation (Default)
```
Model: eleven_turbo_v2_5
Speed: 1.0x
Stability: 50%
Clarity: 75%
```
Best balance for the EVE assistant.

### Ultra-Low Latency
```
Model: eleven_flash_v2_5
Speed: 1.0x
Stability: 60%
Clarity: 80%
```
For the fastest possible responses.

### Maximum Quality
```
Model: eleven_monolingual_v1
Speed: 1.0x
Stability: 70%
Clarity: 85%
```
For professional content.

### Multilingual
```
Model: eleven_multilingual_v2
Speed: 1.0x
Stability: 55%
Clarity: 75%
```
For non-English languages.
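
The presets above are given as percentages, while the API call in this document uses fractions (`stability: 0.5`, `similarity_boost: 0.75`). The sketch below shows one way to convert between the two. It assumes that "Clarity" corresponds to `similarity_boost`, which matches the default preset but is not stated explicitly here. `PRESETS` and `toRequest` are hypothetical names, and playback speed is left out because the document does not say how it is applied.

```typescript
type PresetName = "realtime" | "lowLatency" | "maxQuality" | "multilingual";

interface Preset {
  model: string;
  speed: number;        // playback multiplier; not sent in this sketch
  stabilityPct: number; // "Stability" as shown above
  clarityPct: number;   // "Clarity" as shown above
}

// Values copied from the four use cases above.
const PRESETS: Record<PresetName, Preset> = {
  realtime: { model: "eleven_turbo_v2_5", speed: 1.0, stabilityPct: 50, clarityPct: 75 },
  lowLatency: { model: "eleven_flash_v2_5", speed: 1.0, stabilityPct: 60, clarityPct: 80 },
  maxQuality: { model: "eleven_monolingual_v1", speed: 1.0, stabilityPct: 70, clarityPct: 85 },
  multilingual: { model: "eleven_multilingual_v2", speed: 1.0, stabilityPct: 55, clarityPct: 75 },
};

// Converts percentages to the 0..1 fractions used in the request body.
// Assumption: "Clarity" maps to `similarity_boost`.
function toRequest(preset: Preset) {
  return {
    model_id: preset.model,
    voice_settings: {
      stability: preset.stabilityPct / 100,
      similarity_boost: preset.clarityPct / 100,
    },
  };
}
```

For example, `toRequest(PRESETS.realtime)` yields the same `model_id`, `stability`, and `similarity_boost` values as the API call shown under Technical Details.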

## Technical Details

### API Call
```typescript
// `client` is an initialized ElevenLabs SDK client;
// `voiceId` is the ID of the selected voice.
await client.textToSpeech.convert(voiceId, {
  text: "Hello, how can I help you?",
  model_id: "eleven_turbo_v2_5",
  voice_settings: {
    stability: 0.5,
    similarity_boost: 0.75,
    style: 0.0,
    use_speaker_boost: true
  }
})
```

### Model Selection Flow
1. User sends a message
2. EVE responds
3. User clicks the 🔊 speaker icon
4. TTSControls reads `ttsModel` from settings
5. TTSControls passes the model ID to the TTS Manager
6. TTS Manager calls ElevenLabs with that model ID
7. Audio is generated and played
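
Steps 4 to 6 of the flow above can be sketched as follows. This is an illustration only: `createTtsManager`, `handleSpeakerClick`, and the `Synthesize` callback are hypothetical stand-ins for EVE's TTSControls and TTS Manager, and the ElevenLabs call is injected as a function so the sketch does not depend on a particular SDK. Playback (step 7) is omitted.

```typescript
interface TtsRequest {
  voiceId: string;
  text: string;
  model_id: string;
}

// Stand-in for the actual ElevenLabs request (step 6).
type Synthesize = (request: TtsRequest) => Promise<ArrayBuffer>;

interface TtsManager {
  speak(text: string, voiceId: string, modelId: string): Promise<ArrayBuffer>;
}

// Hypothetical TTS Manager: forwards the model ID it is given.
function createTtsManager(synthesize: Synthesize): TtsManager {
  return {
    speak(text, voiceId, modelId) {
      return synthesize({ voiceId, text, model_id: modelId });
    },
  };
}

// Hypothetical click handler: reads `ttsModel` from settings (step 4)
// and passes it to the manager (step 5).
function handleSpeakerClick(
  settings: { ttsModel: string },
  manager: TtsManager,
  messageText: string,
  voiceId: string,
): Promise<ArrayBuffer> {
  return manager.speak(messageText, voiceId, settings.ttsModel);
}
```

Because the model ID is read at click time, a change made through `setTtsModel` takes effect on the next message that is played.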

### Fallback Behavior
If the ElevenLabs request fails or the model is unavailable, EVE:
- Falls back to the browser Web Speech API
- Logs a warning to the console
- Continues with free browser TTS
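
A minimal sketch of this fallback, assuming both engines are wrapped as functions that take the text and return a promise. `speakWithFallback` and the `Speak` type are hypothetical names, not EVE's actual implementation.

```typescript
type Speak = (text: string) => Promise<void>;

// Tries the primary engine first; on any failure, logs a warning and
// uses the fallback engine. Reports which engine produced the audio.
async function speakWithFallback(
  text: string,
  primary: Speak,
  fallback: Speak,
  warn: (...args: unknown[]) => void = console.warn,
): Promise<"primary" | "fallback"> {
  try {
    await primary(text);
    return "primary";
  } catch (err) {
    warn("ElevenLabs TTS failed, falling back to browser speech:", err);
    await fallback(text);
    return "fallback";
  }
}

// In a browser, the fallback could wrap the Web Speech API, for example:
//   const browserSpeak: Speak = async (text) => {
//     window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
//   };
```

Errors thrown by the fallback itself are not caught here, so the caller can still surface a failure when neither engine works.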

## Future Enhancements

### Planned Features
- **Model selector in UI** - Dropdown to choose the model in Settings
- **Auto-detect best model** - Based on language and use case
- **Model presets** - Quick selection for different scenarios
- **Cost tracking** - Show estimated cost per request
- **Quality metrics** - User feedback on voice quality

### Potential Models
As ElevenLabs releases new models, EVE can be updated to support them. The IDs below are illustrative placeholders, not released models:
- `eleven_turbo_v3` - Next generation turbo
- `eleven_flash_v3` - Even faster flash model
- `eleven_multilingual_v3` - Improved multilingual
- Specialized models for specific use cases

## Troubleshooting

### Audio Not Playing
- Check that the ElevenLabs API key is valid
- Verify that the model ID is correct
- Check the console for error messages
- Try switching to `eleven_turbo_v2` if v2.5 fails

### Poor Quality
- Try `eleven_monolingual_v1` for better quality
- Adjust the stability and clarity settings
- Check the voice selection
- Ensure the text is well-formatted

### Slow Generation
- Switch to `eleven_flash_v2_5` for speed
- Reduce the text length
- Check the network connection
- Verify that the API quota has not been exceeded
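
One way to reduce text length per request is to split a long reply into sentence-sized chunks and synthesize them one at a time. The helper below is a sketch of that idea: `splitForTts` is a hypothetical name, the sentence detection is a naive regex on `.`, `!`, and `?`, and a single sentence longer than the limit is kept whole rather than cut mid-sentence.

```typescript
// Groups sentences into chunks of at most `maxChars` characters.
function splitForTts(text: string, maxChars: number): string[] {
  // Naive split: a run of non-terminators followed by optional . ! ?
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence === "") continue;
    const candidate = current === "" ? sentence : current + " " + sentence;
    if (current === "" || candidate.length <= maxChars) {
      // Fits, or it is the first sentence of a chunk (kept even if oversized).
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Playing the first chunk while later ones are still being generated can make a long reply feel faster, at the cost of more requests.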

### Model Not Found Error
```
Error: Model 'eleven_turbo_v3' not found
```
- The model ID may be incorrect
- The model might not be available on your plan
- Fall back to `eleven_turbo_v2_5`
- Check the ElevenLabs documentation for current model IDs
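
The "fall back to `eleven_turbo_v2_5`" step can be automated with a single retry. The sketch below assumes the failure surfaces as an `Error` whose message resembles the example above. How the real SDK reports this condition may differ, so `isModelNotFound` is a guess, and `convertWithModelFallback` and `Convert` are hypothetical names.

```typescript
const FALLBACK_MODEL = "eleven_turbo_v2_5";

// Stand-in for the actual ElevenLabs request, parameterized by model ID.
type Convert = (modelId: string) => Promise<ArrayBuffer>;

// Assumption: the error message looks like "Model '<id>' not found".
function isModelNotFound(err: unknown): boolean {
  return err instanceof Error && /model .* not found/i.test(err.message);
}

// Retries once with the default model when the requested one is not found.
// Any other error, or a failure of the default model itself, is rethrown.
async function convertWithModelFallback(
  modelId: string,
  convert: Convert,
): Promise<{ audio: ArrayBuffer; modelUsed: string }> {
  try {
    return { audio: await convert(modelId), modelUsed: modelId };
  } catch (err) {
    if (modelId === FALLBACK_MODEL || !isModelNotFound(err)) {
      throw err;
    }
    return { audio: await convert(FALLBACK_MODEL), modelUsed: FALLBACK_MODEL };
  }
}
```

Returning `modelUsed` lets the caller log or display that a different model was substituted.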

## Model Changelog

### v2.5 Models (Current)
- Released: 2024
- Improvements: Better quality, faster generation
- Models: `eleven_turbo_v2_5`, `eleven_flash_v2_5`

### v2 Models
- Released: 2023
- Improvements: Multilingual support, reduced latency
- Models: `eleven_turbo_v2`, `eleven_flash_v2`, `eleven_multilingual_v2`

### v1 Models (Legacy)
- Released: 2022-2023
- Original high-quality models
- Models: `eleven_monolingual_v1`, `eleven_multilingual_v1`

## References

- [ElevenLabs Text-to-Speech API Reference](https://elevenlabs.io/docs/api-reference/text-to-speech)
- [Model Comparison Guide](https://elevenlabs.io/docs/models)
- [Pricing Information](https://elevenlabs.io/pricing)

---

**Current Default**: `eleven_turbo_v2_5`
**Status**: ✅ Configured
**Version**: v0.2.0-rc
**Date**: October 5, 2025