Aodhan Collins 66749a5ce7 Initial commit
2025-10-06 00:33:04 +01:00

TTS Quality Controls

Overview

EVE includes voice quality controls that let you customize the speed, stability, and clarity of text-to-speech output.

Controls Available

1. Speed Control (All Voices)

Range: 0.25x - 4.0x
Default: 1.0x (Normal)
Applies to: Both Browser TTS and ElevenLabs

  • 0.25x - 0.75x: Slower speech, good for learning or understanding complex content
  • 1.0x: Natural speaking pace
  • 1.25x - 2.0x: Faster speech, efficient for experienced listeners
  • 2.0x - 4.0x: Very fast, for quickly scanning content
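
The range and step size above suggest a small normalization step before a speed value is stored. The following is a minimal sketch, not EVE's actual code: `normalizeSpeed` and its constants are hypothetical names, and the 0.25 step is taken from the slider description in this document.

```typescript
// Hypothetical helper, not part of EVE's documented API.
// Range and default come from this document; the step matches the slider.
const MIN_SPEED = 0.25;
const MAX_SPEED = 4.0;
const SPEED_STEP = 0.25;
const DEFAULT_SPEED = 1.0;

function normalizeSpeed(value: number): number {
  // Fall back to the documented default for NaN or Infinity.
  if (!Number.isFinite(value)) return DEFAULT_SPEED;
  // Clamp into the supported range first.
  const clamped = Math.min(MAX_SPEED, Math.max(MIN_SPEED, value));
  // Then snap to the nearest 0.25 step.
  return Math.round(clamped / SPEED_STEP) * SPEED_STEP;
}

console.log(normalizeSpeed(1.6)); // 1.5
console.log(normalizeSpeed(10)); // 4
```

Clamping before snapping keeps out-of-range input from landing on a step outside 0.25x to 4.0x.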

2. Stability Control (ElevenLabs Only)

Range: 0% - 100%
Default: 50%
Applies to: ElevenLabs voices only

What it does:

  • Controls the trade-off between consistency and expressiveness of the voice
  • Higher values = more consistent, predictable delivery
  • Lower values = more varied, emotional, expressive

When to adjust:

  • High (70-100%): Audiobooks, technical content, professional narration
  • Medium (40-60%): General conversation, balanced approach
  • Low (0-30%): Character voices, dramatic readings, creative content

3. Clarity Control (ElevenLabs Only)

Range: 0% - 100%
Default: 75%
Applies to: ElevenLabs voices only

What it does:

  • Controls the similarity boost, which enhances voice clarity
  • Higher values = closer to original voice, enhanced clarity
  • Lower values = more variation, creative interpretation

When to adjust:

  • High (70-100%): Maximum clarity, important information, professional use
  • Medium (50-70%): Natural balance
  • Low (0-40%): More creative interpretation, character variation

User Interface

Location

Settings > Voice Settings > Voice Quality Settings

Design

  • Speed: Full-width slider with 0.25 step increments

    • Shows current value in label (e.g., "Speed: 1.50x")
    • Visual markers at 0.25x, 1.0x, and 4.0x
  • Stability: Full-width slider with 5% step increments

    • Shows percentage in label (e.g., "Stability: 50%")
    • Disabled (grayed out) when using browser voices
    • Helpful description below slider
  • Clarity: Full-width slider with 5% step increments

    • Shows percentage in label (e.g., "Clarity: 75%")
    • Disabled (grayed out) when using browser voices
    • Helpful description below slider
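
The label formats quoted above ("Speed: 1.50x", "Stability: 50%") can be produced with two small formatters. This is an illustrative sketch only: the function names are hypothetical, and it assumes stability and clarity are held as 0.0 to 1.0 fractions, as the settings store in this document indicates.

```typescript
// Hypothetical formatters; the names are illustrative, not EVE's own.

// Speed is shown with two decimals, e.g. 1.5 -> "Speed: 1.50x".
function formatSpeedLabel(speed: number): string {
  return `Speed: ${speed.toFixed(2)}x`;
}

// Stability and clarity are assumed to be stored as 0.0-1.0 fractions
// and shown as whole percentages, e.g. 0.5 -> "Stability: 50%".
function formatPercentLabel(
  name: "Stability" | "Clarity",
  fraction: number
): string {
  return `${name}: ${Math.round(fraction * 100)}%`;
}

console.log(formatSpeedLabel(1.5)); // "Speed: 1.50x"
console.log(formatPercentLabel("Stability", 0.5)); // "Stability: 50%"
console.log(formatPercentLabel("Clarity", 0.75)); // "Clarity: 75%"
```

Rounding before display avoids floating-point noise such as 55.00000000000001 appearing in a label.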

Smart UI Features

  • ElevenLabs-only controls show "(ElevenLabs only)" in label
  • Controls are disabled when browser voice is selected
  • Real-time value display as you drag sliders
  • Settings persist across sessions
  • All controls visible even when disabled for easy reference
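
The enable/disable rule above can be expressed as a pure function of the selected voice. The sketch below assumes that ElevenLabs voice IDs start with "elevenlabs:", as the Troubleshooting section of this document states; `isElevenLabsVoice`, `controlAvailability`, and the sample voice IDs are hypothetical.

```typescript
// Prefix taken from the Troubleshooting section of this document.
const ELEVENLABS_PREFIX = "elevenlabs:";

// Hypothetical helper: true when the selected voice is an ElevenLabs voice.
function isElevenLabsVoice(voiceId: string | null | undefined): boolean {
  return typeof voiceId === "string" && voiceId.startsWith(ELEVENLABS_PREFIX);
}

interface ControlAvailability {
  speed: boolean;
  stability: boolean;
  clarity: boolean;
}

// Speed applies to every voice; stability and clarity only to ElevenLabs.
function controlAvailability(
  voiceId: string | null | undefined
): ControlAvailability {
  const premium = isElevenLabsVoice(voiceId);
  return { speed: true, stability: premium, clarity: premium };
}

// "elevenlabs:example-voice" is a made-up ID for illustration.
console.log(controlAvailability("elevenlabs:example-voice"));
console.log(controlAvailability("Google UK English Female"));
```

A slider would then be rendered with `disabled` set to the negation of the matching flag, so it stays visible but grayed out.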

Technical Implementation

Settings Store

ttsSpeed: number // 0.25 to 4.0
ttsStability: number // 0.0 to 1.0
ttsSimilarityBoost: number // 0.0 to 1.0
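
Because the sliders work in percentages while the store holds fractions, a conversion step sits between the two. The following sketch shows one way to type the three fields and convert slider values; the interface name, `DEFAULT_TTS_QUALITY`, and both helper functions are assumptions for illustration, with defaults taken from the control descriptions in this document.

```typescript
// Field names come from the settings store in this document;
// the interface name and helpers are hypothetical.
interface TtsQualitySettings {
  ttsSpeed: number; // 0.25 to 4.0
  ttsStability: number; // 0.0 to 1.0
  ttsSimilarityBoost: number; // 0.0 to 1.0
}

// Defaults as documented: 1.0x speed, 50% stability, 75% clarity.
const DEFAULT_TTS_QUALITY: TtsQualitySettings = {
  ttsSpeed: 1.0,
  ttsStability: 0.5,
  ttsSimilarityBoost: 0.75,
};

// Convert a slider percentage (0-100) to a stored fraction (0.0-1.0).
// Assumes a finite number; out-of-range input is clamped.
function percentToFraction(percent: number): number {
  return Math.min(100, Math.max(0, percent)) / 100;
}

// Return a new settings object with updated ElevenLabs values.
function withSliderValues(
  current: TtsQualitySettings,
  stabilityPercent: number,
  clarityPercent: number
): TtsQualitySettings {
  return {
    ...current,
    ttsStability: percentToFraction(stabilityPercent),
    ttsSimilarityBoost: percentToFraction(clarityPercent),
  };
}

console.log(withSliderValues(DEFAULT_TTS_QUALITY, 80, 85));
```

Returning a new object instead of mutating `current` fits most reactive store libraries, though the store EVE actually uses is not specified here.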

Usage in TTS

await ttsManager.speak(text, {
  voiceId: selectedVoice,
  volume: 1.0,
  rate: ttsSpeed,         // Browser TTS rate
  stability: ttsStability, // ElevenLabs stability
  similarityBoost: ttsSimilarityBoost // ElevenLabs clarity
})

Provider-Specific Application

Browser TTS:

  • Uses rate parameter from speed control
  • Ignores stability and similarity boost (not applicable)

ElevenLabs TTS:

  • Applies all three parameters
  • Speed can be adjusted in post-processing if needed
  • Stability and similarity boost sent directly to API
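
The split described above can be sketched as a function that maps the `speak()` options to provider-specific parameters. This is an assumption-laden illustration, not EVE's implementation: `toProviderParams` and `ProviderParams` are hypothetical, and the "elevenlabs:" prefix check follows the Troubleshooting section. The `voice_settings` field names (`stability`, `similarity_boost`) follow the ElevenLabs text-to-speech request body.

```typescript
// Option names mirror the ttsManager.speak() call in this document.
interface QualityOptions {
  rate: number;
  stability: number;
  similarityBoost: number;
}

// Hypothetical result type: what each provider would receive.
type ProviderParams =
  | { provider: "browser"; rate: number }
  | {
      provider: "elevenlabs";
      rate: number; // kept so the caller can apply speed after synthesis
      voice_settings: { stability: number; similarity_boost: number };
    };

function toProviderParams(
  voiceId: string,
  opts: QualityOptions
): ProviderParams {
  if (voiceId.startsWith("elevenlabs:")) {
    return {
      provider: "elevenlabs",
      rate: opts.rate,
      voice_settings: {
        stability: opts.stability,
        similarity_boost: opts.similarityBoost,
      },
    };
  }
  // Browser TTS uses only the rate; the other options are ignored.
  return { provider: "browser", rate: opts.rate };
}

const opts = { rate: 1.25, stability: 0.5, similarityBoost: 0.75 };
console.log(toProviderParams("elevenlabs:example-voice", opts));
console.log(toProviderParams("Google UK English Female", opts));
```

For the browser case, `rate` would typically be assigned to `SpeechSynthesisUtterance.rate`. For ElevenLabs, one possible post-processing route is setting `playbackRate` on the audio element that plays the returned audio, though this document does not say which approach EVE takes.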

Examples

For Audiobooks

Speed: 1.0x - 1.25x (comfortable listening)
Stability: 80% (consistent narration)
Clarity: 85% (clear pronunciation)

For Casual Chat

Speed: 1.0x (natural pace)
Stability: 50% (balanced)
Clarity: 75% (good clarity)

For Quick Scanning

Speed: 2.0x - 3.0x (fast playback)
Stability: 60% (maintain clarity at speed)
Clarity: 90% (maximum clarity for comprehension)

For Character Voices

Speed: 0.75x - 1.0x (theatrical pacing)
Stability: 20% (high expressiveness)
Clarity: 50% (allow variation)

Benefits

  • Personalization: Adjust voice to your preferences
  • Accessibility: Slower speeds for comprehension
  • Efficiency: Faster speeds for quick consumption
  • Quality Control: Fine-tune ElevenLabs voice output
  • Flexibility: Different settings for different use cases
  • Universal: Speed works on all voices, premium controls for ElevenLabs

Persistence

All settings are:

  • Saved to localStorage
  • Persist across app restarts
  • Applied automatically to all future TTS playback
  • Can be changed at any time
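
The save-and-restore behavior above can be sketched with the `getItem`/`setItem` subset of the Web Storage API. The storage key, the function names, and the `MemoryStore` stand-in are all hypothetical; EVE's actual store may persist through a library instead. In a browser, `window.localStorage` satisfies the `KeyValueStore` interface used here.

```typescript
interface TtsQualitySettings {
  ttsSpeed: number;
  ttsStability: number;
  ttsSimilarityBoost: number;
}

// Documented defaults: 1.0x speed, 50% stability, 75% clarity.
const DEFAULTS: TtsQualitySettings = {
  ttsSpeed: 1.0,
  ttsStability: 0.5,
  ttsSimilarityBoost: 0.75,
};

// The subset of the Web Storage API this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "eve-tts-quality"; // hypothetical key name

function saveQuality(store: KeyValueStore, s: TtsQualitySettings): void {
  store.setItem(STORAGE_KEY, JSON.stringify(s));
}

// Use the stored number when it is valid, otherwise the default.
function pick(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

function loadQuality(store: KeyValueStore): TtsQualitySettings {
  try {
    const raw = store.getItem(STORAGE_KEY);
    if (raw === null) return { ...DEFAULTS };
    const parsed = JSON.parse(raw) as Partial<TtsQualitySettings>;
    return {
      ttsSpeed: pick(parsed.ttsSpeed, DEFAULTS.ttsSpeed),
      ttsStability: pick(parsed.ttsStability, DEFAULTS.ttsStability),
      ttsSimilarityBoost: pick(
        parsed.ttsSimilarityBoost,
        DEFAULTS.ttsSimilarityBoost
      ),
    };
  } catch {
    // Corrupt JSON or blocked storage: fall back to defaults.
    return { ...DEFAULTS };
  }
}

// In-memory stand-in for localStorage, for use outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const store = new MemoryStore();
saveQuality(store, { ttsSpeed: 1.75, ttsStability: 0.65, ttsSimilarityBoost: 0.9 });
console.log(loadQuality(store));
```

Falling back to defaults on unreadable data means a corrupted entry never blocks playback.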

Future Enhancements

Potential Additions

  • Pitch control for browser TTS
  • Volume control per-voice
  • Per-voice presets (save favorite settings for each voice)
  • Quick presets (Audiobook, Podcast, Speed Reader, etc.)
  • Real-time adjustment while audio is playing
  • A/B comparison to test settings side-by-side

Advanced Features

  • Voice EQ for fine-tuning frequency response
  • Emotion control for ElevenLabs (happy, sad, excited, etc.)
  • Speaking style selection (narration, conversation, etc.)
  • Prosody controls (emphasis, pauses, intonation)

Troubleshooting

Sliders Not Responsive

  • Check that TTS is enabled
  • Verify a voice is selected
  • Try refreshing the settings panel

ElevenLabs Controls Disabled

  • Make sure an ElevenLabs voice is selected (starts with "elevenlabs:")
  • Browser voices won't enable these controls (by design)
  • Check that ElevenLabs API key is configured

Settings Not Saving

  • Check browser localStorage permissions
  • Try clearing cache and reloading
  • Verify that the settings store is persisting

Speed Not Applying

  • Browser TTS: Rate should change immediately
  • ElevenLabs: Speed adjustment may vary by voice
  • Try values between 0.5x - 2.0x for best results

Testing

To Test Speed Control

  1. Enable TTS
  2. Adjust speed slider
  3. Click speaker icon on a message
  4. The voice should speak at the selected speed

To Test ElevenLabs Controls

  1. Select an ElevenLabs voice
  2. Adjust stability slider
  3. Adjust clarity slider
  4. Click speaker icon
  5. Note the difference in voice quality

To Test Persistence

  1. Adjust all sliders
  2. Close settings
  3. Restart app
  4. Open settings
  5. Values should be preserved

Recommended Settings

Default (Balanced):

  • Speed: 1.0x
  • Stability: 50%
  • Clarity: 75%

Professional:

  • Speed: 1.0x
  • Stability: 80%
  • Clarity: 85%

Expressive:

  • Speed: 1.0x
  • Stability: 30%
  • Clarity: 60%

Fast Listener:

  • Speed: 1.75x
  • Stability: 65%
  • Clarity: 90%
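
The four setting combinations above could be captured as data if quick presets are ever added. This is a hypothetical sketch: presets appear in this document only as a potential enhancement, so `QUALITY_PRESETS`, `PresetName`, and `applyPreset` are illustrative names, with percentages converted to the 0.0 to 1.0 fractions used by the settings store.

```typescript
interface TtsQualitySettings {
  ttsSpeed: number;
  ttsStability: number;
  ttsSimilarityBoost: number;
}

// Hypothetical preset keys for the four combinations listed above.
type PresetName = "default" | "professional" | "expressive" | "fastListener";

// Values copied from the list above, with percentages as fractions.
const QUALITY_PRESETS: Record<PresetName, TtsQualitySettings> = {
  default: { ttsSpeed: 1.0, ttsStability: 0.5, ttsSimilarityBoost: 0.75 },
  professional: { ttsSpeed: 1.0, ttsStability: 0.8, ttsSimilarityBoost: 0.85 },
  expressive: { ttsSpeed: 1.0, ttsStability: 0.3, ttsSimilarityBoost: 0.6 },
  fastListener: { ttsSpeed: 1.75, ttsStability: 0.65, ttsSimilarityBoost: 0.9 },
};

// Return a copy so callers cannot mutate the shared preset table.
function applyPreset(name: PresetName): TtsQualitySettings {
  return { ...QUALITY_PRESETS[name] };
}

console.log(applyPreset("fastListener"));
```

Typing the table as `Record<PresetName, TtsQualitySettings>` makes the compiler flag a missing or misspelled preset.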

Status: Complete
Version: v0.2.0-rc
Date: October 5, 2025