# TTS Quality Controls

## Overview

EVE now includes voice quality controls that let you customize the speed, stability, and clarity of text-to-speech (TTS) output.

## Controls Available

### 1. Speed Control (All Voices)

**Range**: 0.25x - 4.0x

**Default**: 1.0x (Normal)

**Applies to**: Both Browser TTS and ElevenLabs

- **0.25x - 0.75x**: Slower speech, good for learning or following complex content
- **1.0x**: Natural speaking pace
- **1.25x - 2.0x**: Faster speech, efficient for experienced listeners
- **2.25x - 4.0x**: Very fast, for quickly scanning content
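
The range and default above can be enforced before a value reaches either provider. A minimal sketch, assuming speeds are clamped to 0.25x-4.0x and snapped to the 0.25 increments the slider uses; `normalizeSpeed` and the constants are hypothetical names, not EVE's actual code:

```typescript
// Hypothetical constants mirroring the documented speed range and slider step.
const SPEED_MIN = 0.25
const SPEED_MAX = 4.0
const SPEED_STEP = 0.25
const SPEED_DEFAULT = 1.0

// Clamp a requested speed into [0.25, 4.0] and snap it to the nearest 0.25 step.
function normalizeSpeed(requested: number): number {
  // Non-numeric input (NaN, Infinity) falls back to the documented default.
  if (!Number.isFinite(requested)) return SPEED_DEFAULT
  const clamped = Math.min(SPEED_MAX, Math.max(SPEED_MIN, requested))
  return Math.round(clamped / SPEED_STEP) * SPEED_STEP
}
```

For example, `normalizeSpeed(1.3)` yields `1.25` and `normalizeSpeed(10)` yields `4`. Snapping keeps stored values on the same grid the slider can display.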

### 2. Stability Control (ElevenLabs Only)

**Range**: 0% - 100%

**Default**: 50%

**Applies to**: ElevenLabs voices only

**What it does**:

- Controls the trade-off between consistency and expressiveness
- Higher values = more consistent, predictable delivery
- Lower values = more varied, emotional, expressive delivery

**When to adjust**:

- **High (70-100%)**: Audiobooks, technical content, professional narration
- **Medium (40-60%)**: General conversation, balanced approach
- **Low (0-30%)**: Character voices, dramatic readings, creative content

### 3. Clarity Control (ElevenLabs Only)

**Range**: 0% - 100%

**Default**: 75%

**Applies to**: ElevenLabs voices only

**What it does**:

- Sets the ElevenLabs similarity boost (voice clarity enhancement)
- Higher values = closer to the original voice, enhanced clarity
- Lower values = more variation, creative interpretation

**When to adjust**:

- **High (70-100%)**: Maximum clarity, important information, professional use
- **Medium (50-70%)**: Natural balance
- **Low (0-40%)**: More creative interpretation, character variation
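
The sliders show percentages, while the settings store (see Technical Implementation) keeps stability and similarity boost as fractions from 0.0 to 1.0. A minimal sketch of that conversion, assuming the result is shaped like the `stability` and `similarity_boost` fields of the ElevenLabs voice settings object; `percentToFraction` and `toVoiceSettings` are hypothetical helpers:

```typescript
// The two ElevenLabs-only settings, named after ElevenLabs' voice_settings fields.
interface ElevenLabsVoiceSettings {
  stability: number        // 0.0 (most expressive) to 1.0 (most consistent)
  similarity_boost: number // driven by the "Clarity" slider
}

// Hypothetical helper: slider percentage -> fraction, clamped to [0, 1].
function percentToFraction(percent: number): number {
  const clamped = Math.min(100, Math.max(0, percent))
  return clamped / 100
}

// Hypothetical helper: build voice settings from the two slider values.
function toVoiceSettings(stabilityPct: number, clarityPct: number): ElevenLabsVoiceSettings {
  return {
    stability: percentToFraction(stabilityPct),
    similarity_boost: percentToFraction(clarityPct),
  }
}
```

With the defaults, `toVoiceSettings(50, 75)` gives `{ stability: 0.5, similarity_boost: 0.75 }`.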

## User Interface

### Location

Settings > Voice Settings > Voice Quality Settings

### Design

- **Speed**: Full-width slider with 0.25 step increments
  - Shows current value in label (e.g., "Speed: 1.50x")
  - Visual markers at 0.25x, 1.0x, and 4.0x

- **Stability**: Full-width slider with 5% step increments
  - Shows percentage in label (e.g., "Stability: 50%")
  - Disabled (grayed out) when using browser voices
  - Helpful description below slider

- **Clarity**: Full-width slider with 5% step increments
  - Shows percentage in label (e.g., "Clarity: 75%")
  - Disabled (grayed out) when using browser voices
  - Helpful description below slider
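
The label formats and step sizes described above can be reproduced with a few pure helpers. This is a sketch under the assumption that speed is stored as a multiplier and the percentage sliders as fractions (as in the settings store); the function names are hypothetical:

```typescript
// "Speed: 1.50x": always two decimal places, matching the label example.
function speedLabel(speed: number): string {
  return `Speed: ${speed.toFixed(2)}x`
}

// "Stability: 50%" or "Clarity: 75%" from a stored fraction (0.0 to 1.0).
function percentLabel(name: string, fraction: number): string {
  return `${name}: ${Math.round(fraction * 100)}%`
}

// Snap a raw slider position (0-100) to the nearest 5% step.
function snapToFivePercent(percent: number): number {
  return Math.round(percent / 5) * 5
}
```

Formatting from the stored fraction keeps the label and the persisted value from drifting apart.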

### Smart UI Features

- ElevenLabs-only controls show "(ElevenLabs only)" in their labels
- ElevenLabs-only controls are disabled when a browser voice is selected
- Values update in real time as you drag the sliders
- Settings persist across sessions
- All controls stay visible, even when disabled, for easy reference
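
The enable/disable rule can be derived from the selected voice ID alone. A minimal sketch, assuming ElevenLabs voice IDs carry the `elevenlabs:` prefix mentioned under Troubleshooting and every other ID is a browser voice; `controlStateFor` and `QualityControlState` are hypothetical names:

```typescript
interface QualityControlState {
  speedEnabled: boolean     // speed applies to every voice
  stabilityEnabled: boolean // ElevenLabs only
  clarityEnabled: boolean   // ElevenLabs only
}

// Assumption: ElevenLabs voice IDs are prefixed with "elevenlabs:".
function isElevenLabsVoice(voiceId: string): boolean {
  return voiceId.startsWith("elevenlabs:")
}

function controlStateFor(voiceId: string): QualityControlState {
  const elevenLabs = isElevenLabsVoice(voiceId)
  // All controls stay rendered; only their enabled flags change.
  return {
    speedEnabled: true,
    stabilityEnabled: elevenLabs,
    clarityEnabled: elevenLabs,
  }
}
```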
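
## Technical Implementation

### Settings Store

```typescript
// Quality fields in the settings store (the interface name is illustrative)
interface TtsQualitySettings {
  ttsSpeed: number           // 0.25 to 4.0
  ttsStability: number       // 0.0 to 1.0 (shown as 0-100% in the UI)
  ttsSimilarityBoost: number // 0.0 to 1.0 (shown as "Clarity" in the UI)
}
```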

### Usage in TTS

```typescript
await ttsManager.speak(text, {
  voiceId: selectedVoice,
  volume: 1.0,
  rate: ttsSpeed,                     // Browser TTS rate
  stability: ttsStability,            // ElevenLabs stability
  similarityBoost: ttsSimilarityBoost // ElevenLabs clarity
})
```

### Provider-Specific Application

**Browser TTS**:

- Uses the `rate` parameter from the speed control
- Ignores stability and similarity boost (not applicable)

**ElevenLabs TTS**:

- Applies all three parameters
- Speed may be applied in post-processing if needed
- Stability and similarity boost are sent directly to the API
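
The split above can be expressed as a function that turns the three stored values into provider-specific options. This is a sketch, not EVE's `ttsManager` code: the option shapes, `buildProviderOptions`, and `playbackRate` are hypothetical, the `elevenlabs:` prefix check follows the Troubleshooting notes, and the ElevenLabs fields reuse the `stability` / `similarity_boost` names from its voice settings. In a browser, the `rate` value would typically be assigned to `SpeechSynthesisUtterance.rate` from the Web Speech API.

```typescript
interface StoredQuality {
  ttsSpeed: number
  ttsStability: number
  ttsSimilarityBoost: number
}

type ProviderOptions =
  | { provider: "browser"; rate: number }
  | {
      provider: "elevenlabs"
      // Speed is kept separate so it can be applied in post-processing.
      playbackRate: number
      voiceSettings: { stability: number; similarity_boost: number }
    }

function buildProviderOptions(voiceId: string, s: StoredQuality): ProviderOptions {
  if (!voiceId.startsWith("elevenlabs:")) {
    // Browser TTS: only speed applies; stability and similarity boost are ignored.
    return { provider: "browser", rate: s.ttsSpeed }
  }
  // ElevenLabs: stability and similarity boost go straight to the API.
  return {
    provider: "elevenlabs",
    playbackRate: s.ttsSpeed,
    voiceSettings: { stability: s.ttsStability, similarity_boost: s.ttsSimilarityBoost },
  }
}
```

Returning a tagged union makes each provider's unused settings impossible to pass by accident.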

## Examples

### For Audiobooks

```
Speed: 1.0x - 1.25x (comfortable listening)
Stability: 80% (consistent narration)
Clarity: 85% (clear pronunciation)
```

### For Casual Chat

```
Speed: 1.0x (natural pace)
Stability: 50% (balanced)
Clarity: 75% (good clarity)
```

### For Quick Scanning

```
Speed: 2.0x - 3.0x (fast playback)
Stability: 60% (maintain clarity at speed)
Clarity: 90% (maximum clarity for comprehension)
```

### For Character Voices

```
Speed: 0.75x - 1.0x (theatrical pacing)
Stability: 20% (high expressiveness)
Clarity: 50% (allow variation)
```

## Benefits

✅ **Personalization** - Adjust voice to your preferences

✅ **Accessibility** - Slower speeds for comprehension

✅ **Efficiency** - Faster speeds for quick consumption

✅ **Quality Control** - Fine-tune ElevenLabs voice output

✅ **Flexibility** - Different settings for different use cases

✅ **Universal** - Speed works on all voices, premium controls for ElevenLabs

## Persistence

All settings are:

- ✅ Saved to localStorage
- ✅ Persisted across app restarts
- ✅ Applied automatically to all future TTS playback
- ✅ Changeable at any time
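
A minimal sketch of the save/load round trip, assuming all three values are stored as one JSON entry under a hypothetical key and merged over the documented defaults on load. `KeyValueStore` covers only the `getItem`/`setItem` subset of the Web Storage API, so the browser's `localStorage` can be passed in directly and a Map-backed stand-in works in tests:

```typescript
// Subset of the Web Storage API this sketch needs.
interface KeyValueStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
}

interface TtsQualitySettings {
  ttsSpeed: number
  ttsStability: number
  ttsSimilarityBoost: number
}

const STORAGE_KEY = "eve.ttsQuality" // hypothetical key name
const DEFAULTS: TtsQualitySettings = { ttsSpeed: 1.0, ttsStability: 0.5, ttsSimilarityBoost: 0.75 }

function saveSettings(store: KeyValueStore, settings: TtsQualitySettings): void {
  store.setItem(STORAGE_KEY, JSON.stringify(settings))
}

function loadSettings(store: KeyValueStore): TtsQualitySettings {
  const raw = store.getItem(STORAGE_KEY)
  if (raw === null) return { ...DEFAULTS } // first run: documented defaults
  try {
    // Merge so fields missing from an older save fall back to defaults.
    const parsed = JSON.parse(raw) as Partial<TtsQualitySettings>
    return { ...DEFAULTS, ...parsed }
  } catch {
    return { ...DEFAULTS } // corrupt entry: ignore it rather than fail playback
  }
}
```

Merging over defaults means a newly added setting does not break users whose saved entry predates it.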

## Future Enhancements

### Potential Additions

- **Pitch control** for browser TTS
- **Volume control** per-voice
- **Per-voice presets** (save favorite settings for each voice)
- **Quick presets** (Audiobook, Podcast, Speed Reader, etc.)
- **Real-time adjustment** while audio is playing
- **A/B comparison** to test settings side-by-side

### Advanced Features

- **Voice EQ** for fine-tuning frequency response
- **Emotion control** for ElevenLabs (happy, sad, excited, etc.)
- **Speaking style** selection (narration, conversation, etc.)
- **Prosody controls** (emphasis, pauses, intonation)

## Troubleshooting

### Sliders Not Responsive

- Check that TTS is enabled
- Verify that a voice is selected
- Try refreshing the settings panel

### ElevenLabs Controls Disabled

- Make sure an ElevenLabs voice is selected (its voice ID starts with `elevenlabs:`)
- Browser voices do not enable these controls (by design)
- Check that an ElevenLabs API key is configured

### Settings Not Saving

- Check browser localStorage permissions
- Try clearing the cache and reloading
- Verify that the settings store is persisting
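
Browsers can block localStorage or run out of quota, in which case access or writes throw. One way to check the first item above is to probe the store with a throwaway key. A minimal sketch; `canPersist` and the probe key are hypothetical, and the store is passed as a getter because reading `window.localStorage` can itself throw a `SecurityError`:

```typescript
// Minimal subset of the Web Storage API used by the probe.
interface ProbeableStore {
  getItem(key: string): string | null
  setItem(key: string, value: string): void
  removeItem(key: string): void
}

// Returns true if a value can be written, read back, and removed.
function canPersist(getStore: () => ProbeableStore | undefined): boolean {
  const probeKey = "__eve_tts_probe__" // hypothetical throwaway key
  try {
    // The lookup happens inside try because accessing storage may throw.
    const store = getStore()
    if (!store) return false
    store.setItem(probeKey, "1")
    const readBack = store.getItem(probeKey) === "1"
    store.removeItem(probeKey)
    return readBack
  } catch {
    // Blocked storage or a full quota ends up here.
    return false
  }
}
```

In a browser this would be called as `canPersist(() => window.localStorage)`.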

### Speed Not Applying

- Browser TTS: Rate changes should apply immediately
- ElevenLabs: The effect of speed adjustment may vary by voice
- Try values between 0.5x and 2.0x for best results

## Testing

### To Test Speed Control

1. Enable TTS
2. Adjust the speed slider
3. Click the speaker icon on a message
4. The voice should speak at the selected speed

### To Test ElevenLabs Controls

1. Select an ElevenLabs voice
2. Adjust the stability slider
3. Adjust the clarity slider
4. Click the speaker icon
5. Listen for a difference in voice quality

### To Test Persistence

1. Adjust all sliders
2. Close settings
3. Restart the app
4. Open settings
5. Values should be preserved

## Recommended Settings

**Default (Balanced)**:

- Speed: 1.0x
- Stability: 50%
- Clarity: 75%

**Professional**:

- Speed: 1.0x
- Stability: 80%
- Clarity: 85%

**Expressive**:

- Speed: 1.0x
- Stability: 30%
- Clarity: 60%

**Fast Listener**:

- Speed: 1.75x
- Stability: 65%
- Clarity: 90%
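
These combinations can also be kept as data. Quick presets appear only under Future Enhancements, so this is a hypothetical sketch: `PRESETS` and `presetToStore` are illustrative names, and the conversion assumes the fraction-based store fields described under Technical Implementation:

```typescript
type PresetName = "default" | "professional" | "expressive" | "fastListener"

interface QualityPreset {
  speed: number        // multiplier
  stabilityPct: number // 0-100, as shown in the UI
  clarityPct: number   // 0-100, as shown in the UI
}

// Values copied from the recommended settings.
const PRESETS: Record<PresetName, QualityPreset> = {
  default: { speed: 1.0, stabilityPct: 50, clarityPct: 75 },
  professional: { speed: 1.0, stabilityPct: 80, clarityPct: 85 },
  expressive: { speed: 1.0, stabilityPct: 30, clarityPct: 60 },
  fastListener: { speed: 1.75, stabilityPct: 65, clarityPct: 90 },
}

// Convert a preset to the units used by ttsSpeed, ttsStability, and ttsSimilarityBoost.
function presetToStore(name: PresetName): { ttsSpeed: number; ttsStability: number; ttsSimilarityBoost: number } {
  const p = PRESETS[name]
  return {
    ttsSpeed: p.speed,
    ttsStability: p.stabilityPct / 100,
    ttsSimilarityBoost: p.clarityPct / 100,
  }
}
```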

---

**Status**: ✅ Complete

**Version**: v0.2.0-rc

**Date**: October 5, 2025