eve-alpha/docs/tts/TTS_QUALITY_CONTROLS.md
Aodhan Collins 66749a5ce7 Initial commit
2025-10-06 00:33:04 +01:00

# TTS Quality Controls
## Overview
EVE includes voice quality controls that let you customize the speed, stability, and clarity of text-to-speech output.
## Controls Available
### 1. Speed Control (All Voices)
**Range**: 0.25x - 4.0x
**Default**: 1.0x (Normal)
**Applies to**: Both Browser TTS and ElevenLabs
- **0.25x - 0.75x**: Slower speech, good for learning or understanding complex content
- **1.0x**: Natural speaking pace
- **1.25x - 2.0x**: Faster speech, efficient for experienced listeners
- **Above 2.0x, up to 4.0x**: Very fast, for quickly scanning content
### 2. Stability Control (ElevenLabs Only)
**Range**: 0% - 100%
**Default**: 50%
**Applies to**: ElevenLabs voices only
**What it does**:
- Controls consistency vs expressiveness of the voice
- Higher values = more consistent, predictable delivery
- Lower values = more varied, emotional, expressive
**When to adjust**:
- **High (70-100%)**: Audiobooks, technical content, professional narration
- **Medium (40-60%)**: General conversation, balanced approach
- **Low (0-30%)**: Character voices, dramatic readings, creative content
### 3. Clarity Control (ElevenLabs Only)
**Range**: 0% - 100%
**Default**: 75%
**Applies to**: ElevenLabs voices only
**What it does**:
- Controls the ElevenLabs similarity boost, which the UI presents as voice clarity
- Higher values = closer to original voice, enhanced clarity
- Lower values = more variation, creative interpretation
**When to adjust**:
- **High (70-100%)**: Maximum clarity, important information, professional use
- **Medium (50-70%)**: Natural balance
- **Low (0-40%)**: More creative interpretation, character variation
## User Interface
### Location
Settings > Voice Settings > Voice Quality Settings
### Design
- **Speed**: Full-width slider with 0.25 step increments
  - Shows current value in label (e.g., "Speed: 1.50x")
  - Visual markers at 0.25x, 1.0x, and 4.0x
- **Stability**: Full-width slider with 5% step increments
  - Shows percentage in label (e.g., "Stability: 50%")
  - Disabled (grayed out) when using browser voices
  - Helpful description below slider
- **Clarity**: Full-width slider with 5% step increments
  - Shows percentage in label (e.g., "Clarity: 75%")
  - Disabled (grayed out) when using browser voices
  - Helpful description below slider
### Smart UI Features
- ElevenLabs-only controls show "(ElevenLabs only)" in label
- Controls are disabled when browser voice is selected
- Real-time value display as you drag sliders
- Settings persist across sessions
- All controls visible even when disabled for easy reference
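The label and enabled-state rules above can be sketched as a few small helpers. This is a minimal illustration, not the actual component code: the function names are hypothetical, and the only detail taken from this document is that ElevenLabs voice IDs start with `elevenlabs:`.

```typescript
// Prefix taken from the Troubleshooting section; the helpers below are hypothetical.
const ELEVENLABS_PREFIX = "elevenlabs:";

// True when the selected voice id refers to an ElevenLabs voice.
function isElevenLabsVoice(voiceId: string | null): boolean {
  return voiceId !== null && voiceId.indexOf(ELEVENLABS_PREFIX) === 0;
}

// Which sliders are interactive for the selected voice.
// Speed is always enabled; stability and clarity follow the provider.
function sliderEnabledState(voiceId: string | null) {
  const premium = isElevenLabsVoice(voiceId);
  return { speed: true, stability: premium, clarity: premium };
}

// Builds a "Speed: 1.50x" style label.
function speedLabel(speed: number): string {
  return `Speed: ${speed.toFixed(2)}x`;
}

// Builds a "Stability: 50%" style label from a stored 0.0 to 1.0 value.
function percentLabel(controlName: string, fraction: number): string {
  return `${controlName}: ${Math.round(fraction * 100)}%`;
}
```

Keeping the enabled state as a pure function of the voice ID means the sliders can stay rendered at all times and only toggle their `disabled` attribute.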
## Technical Implementation
### Settings Store
```typescript
ttsSpeed: number // 0.25 to 4.0
ttsStability: number // 0.0 to 1.0
ttsSimilarityBoost: number // 0.0 to 1.0
```
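As a rough sketch of how these three fields might be validated before use, the snippet below clamps each value to its documented range and converts the percentages shown in the UI to the 0.0 to 1.0 fractions held in the store. The interface name, `sanitizeTtsQuality`, and `percentToFraction` are assumptions made for illustration; the defaults come from the control descriptions above.

```typescript
// Hypothetical shape mirroring the three store fields; the interface name is an assumption.
interface TtsQualitySettings {
  ttsSpeed: number;           // 0.25 to 4.0
  ttsStability: number;       // 0.0 to 1.0
  ttsSimilarityBoost: number; // 0.0 to 1.0
}

// Defaults from the control descriptions: 1.0x, 50%, 75%.
const DEFAULT_TTS_QUALITY: TtsQualitySettings = {
  ttsSpeed: 1.0,
  ttsStability: 0.5,
  ttsSimilarityBoost: 0.75,
};

// Clamp a value into [min, max]; use the fallback when it is not a finite number.
function clampOr(value: number, min: number, max: number, fallback: number): number {
  if (typeof value !== "number" || !isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, value));
}

// Fill in missing fields with defaults and clamp everything to the documented ranges.
function sanitizeTtsQuality(partial: Partial<TtsQualitySettings>): TtsQualitySettings {
  const d = DEFAULT_TTS_QUALITY;
  return {
    ttsSpeed: clampOr(partial.ttsSpeed ?? d.ttsSpeed, 0.25, 4.0, d.ttsSpeed),
    ttsStability: clampOr(partial.ttsStability ?? d.ttsStability, 0, 1, d.ttsStability),
    ttsSimilarityBoost: clampOr(
      partial.ttsSimilarityBoost ?? d.ttsSimilarityBoost, 0, 1, d.ttsSimilarityBoost),
  };
}

// The sliders show percentages; the store holds fractions.
function percentToFraction(percent: number): number {
  return clampOr(percent, 0, 100, 0) / 100;
}
```

For example, `sanitizeTtsQuality({ ttsSpeed: 10 })` would yield a speed of 4.0 with the default stability and similarity boost.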
### Usage in TTS
```typescript
await ttsManager.speak(text, {
  voiceId: selectedVoice,
  volume: 1.0,
  rate: ttsSpeed,                      // Browser TTS rate
  stability: ttsStability,             // ElevenLabs stability
  similarityBoost: ttsSimilarityBoost  // ElevenLabs clarity
})
```
### Provider-Specific Application
**Browser TTS**:
- Uses `rate` parameter from speed control
- Ignores stability and similarity boost (not applicable)
**ElevenLabs TTS**:
- Applies all three parameters
- Speed can be adjusted in post-processing if needed
- Stability and similarity boost sent directly to API
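The split between the two providers could look roughly like the following. This is a sketch under stated assumptions rather than EVE's actual `ttsManager` internals: the type names, `toProviderParams`, and the stripping of the `elevenlabs:` prefix are all hypothetical. The `voice_settings`, `stability`, and `similarity_boost` field names follow the ElevenLabs text-to-speech request body.

```typescript
// Options as passed to ttsManager.speak in the usage example.
interface SpeakQualityOptions {
  voiceId: string;
  volume: number;
  rate: number;
  stability: number;
  similarityBoost: number;
}

// Hypothetical per-provider parameter shapes.
interface BrowserParams {
  provider: "browser";
  rate: number;
  volume: number;
}
interface ElevenLabsParams {
  provider: "elevenlabs";
  voiceId: string;
  rate: number; // kept so speed can be applied in post-processing
  voice_settings: { stability: number; similarity_boost: number };
}

const PROVIDER_PREFIX = "elevenlabs:";

// Route the shared options to the parameters each provider understands.
function toProviderParams(opts: SpeakQualityOptions): BrowserParams | ElevenLabsParams {
  if (opts.voiceId.indexOf(PROVIDER_PREFIX) === 0) {
    return {
      provider: "elevenlabs",
      // Assumption: the prefix is an internal tag and is stripped before the API call.
      voiceId: opts.voiceId.slice(PROVIDER_PREFIX.length),
      rate: opts.rate,
      voice_settings: {
        stability: opts.stability,
        similarity_boost: opts.similarityBoost,
      },
    };
  }
  // Browser TTS only uses rate and volume; stability and similarity boost are dropped.
  return { provider: "browser", rate: opts.rate, volume: opts.volume };
}
```

Returning a tagged union lets callers switch on `provider` and have the compiler check that ElevenLabs-only fields are never read for a browser voice.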
## Examples
### For Audiobooks
```
Speed: 1.0x - 1.25x (comfortable listening)
Stability: 80% (consistent narration)
Clarity: 85% (clear pronunciation)
```
### For Casual Chat
```
Speed: 1.0x (natural pace)
Stability: 50% (balanced)
Clarity: 75% (good clarity)
```
### For Quick Scanning
```
Speed: 2.0x - 3.0x (fast playback)
Stability: 60% (maintain clarity at speed)
Clarity: 90% (maximum clarity for comprehension)
```
### For Character Voices
```
Speed: 0.75x - 1.0x (theatrical pacing)
Stability: 20% (high expressiveness)
Clarity: 50% (allow variation)
```
## Benefits
- **Personalization**: Adjust voice to your preferences
- **Accessibility**: Slower speeds for comprehension
- **Efficiency**: Faster speeds for quick consumption
- **Quality Control**: Fine-tune ElevenLabs voice output
- **Flexibility**: Different settings for different use cases
- **Universal**: Speed works on all voices, premium controls for ElevenLabs
## Persistence
All settings:
- ✅ Are saved to localStorage
- ✅ Persist across app restarts
- ✅ Are applied automatically to all future TTS playback
- ✅ Can be changed at any time
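
A minimal sketch of this save and restore behavior is shown below. It assumes a plain JSON blob under a single key; the key name, the `KeyValueStore` interface, and the helper functions are hypothetical, and the real settings store may persist differently. The store is passed in as a parameter so the same code works with `window.localStorage` in the browser and with an in-memory stand-in elsewhere.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface StoredTtsSettings {
  ttsSpeed: number;
  ttsStability: number;
  ttsSimilarityBoost: number;
}

const TTS_SETTINGS_KEY = "eve-tts-quality"; // hypothetical storage key

function saveTtsSettings(store: KeyValueStore, settings: StoredTtsSettings): void {
  store.setItem(TTS_SETTINGS_KEY, JSON.stringify(settings));
}

// Returns stored values, falling back to defaults for missing or malformed data.
function loadTtsSettings(store: KeyValueStore, defaults: StoredTtsSettings): StoredTtsSettings {
  const raw = store.getItem(TTS_SETTINGS_KEY);
  if (raw === null) return defaults;
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return defaults;
    const pick = (key: keyof StoredTtsSettings): number =>
      typeof parsed[key] === "number" ? parsed[key] : defaults[key];
    return {
      ttsSpeed: pick("ttsSpeed"),
      ttsStability: pick("ttsStability"),
      ttsSimilarityBoost: pick("ttsSimilarityBoost"),
    };
  } catch (_err) {
    return defaults; // corrupted JSON: behave as if nothing was saved
  }
}

// In-memory stand-in for localStorage, used for demonstration and tests.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = {};
  return {
    getItem: (key) => (Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null),
    setItem: (key, value) => { data[key] = value; },
  };
}
```

In the browser, `saveTtsSettings(window.localStorage, settings)` would be the equivalent call, since `localStorage` provides both methods.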
## Future Enhancements
### Potential Additions
- **Pitch control** for browser TTS
- **Volume control** per-voice
- **Per-voice presets** (save favorite settings for each voice)
- **Quick presets** (Audiobook, Podcast, Speed Reader, etc.)
- **Real-time adjustment** while audio is playing
- **A/B comparison** to test settings side-by-side
### Advanced Features
- **Voice EQ** for fine-tuning frequency response
- **Emotion control** for ElevenLabs (happy, sad, excited, etc.)
- **Speaking style** selection (narration, conversation, etc.)
- **Prosody controls** (emphasis, pauses, intonation)
## Troubleshooting
### Sliders Not Responsive
- Check that voice is enabled
- Verify a voice is selected
- Try refreshing the settings panel
### ElevenLabs Controls Disabled
- Make sure an ElevenLabs voice is selected (starts with "elevenlabs:")
- Browser voices won't enable these controls (by design)
- Check that ElevenLabs API key is configured
### Settings Not Saving
- Check browser localStorage permissions
- Try clearing cache and reloading
- Verify settings store is persisting
### Speed Not Applying
- Browser TTS: Rate should change immediately
- ElevenLabs: Speed adjustment may vary by voice
- Try values between 0.5x and 2.0x for best results
## Testing
### To Test Speed Control
1. Enable TTS
2. Adjust speed slider
3. Click speaker icon on a message
4. Voice should speak at selected speed
### To Test ElevenLabs Controls
1. Select an ElevenLabs voice
2. Adjust stability slider
3. Adjust clarity slider
4. Click speaker icon
5. Notice difference in voice quality
### To Test Persistence
1. Adjust all sliders
2. Close settings
3. Restart app
4. Open settings
5. Values should be preserved
## Recommended Settings
**Default (Balanced)**:
- Speed: 1.0x
- Stability: 50%
- Clarity: 75%
**Professional**:
- Speed: 1.0x
- Stability: 80%
- Clarity: 85%
**Expressive**:
- Speed: 1.0x
- Stability: 30%
- Clarity: 60%
**Fast Listener**:
- Speed: 1.75x
- Stability: 65%
- Clarity: 90%
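
If these recommendations were ever exposed as selectable presets, they could be captured in a small lookup table. The sketch below is purely illustrative: the preset keys, `QualityPreset`, and `presetToStoreValues` are hypothetical names, and only the numeric values come from the list above.

```typescript
// Hypothetical preset shape, using the same units as the sliders.
interface QualityPreset {
  speed: number;            // playback multiplier, e.g. 1.75 for 1.75x
  stabilityPercent: number; // 0 to 100
  clarityPercent: number;   // 0 to 100
}

// Values copied from the recommended settings; the key names are assumptions.
const RECOMMENDED_PRESETS: { [name: string]: QualityPreset } = {
  balanced:     { speed: 1.0,  stabilityPercent: 50, clarityPercent: 75 },
  professional: { speed: 1.0,  stabilityPercent: 80, clarityPercent: 85 },
  expressive:   { speed: 1.0,  stabilityPercent: 30, clarityPercent: 60 },
  fastListener: { speed: 1.75, stabilityPercent: 65, clarityPercent: 90 },
};

// Convert a preset into the fractional values held in the settings store.
function presetToStoreValues(preset: QualityPreset) {
  return {
    ttsSpeed: preset.speed,
    ttsStability: preset.stabilityPercent / 100,
    ttsSimilarityBoost: preset.clarityPercent / 100,
  };
}
```

Storing presets in slider units keeps the table readable next to the documentation, with the conversion to store values done in one place.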
---
**Status**: ✅ Complete
**Version**: v0.2.0-rc
**Date**: October 5, 2025