eve-alpha/docs/planning/PHASE2_PLAN.md
Aodhan Collins 66749a5ce7 Initial commit
2025-10-06 00:33:04 +01:00

Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)

Status: 🚀 In Progress
Start Date: October 5, 2025
Target Completion: TBD

Overview

Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.

Implementation Priority Order

Priority 1: Conversation Management (Week 1)

Impact: High | Complexity: Low | Foundation for: Export features, history search

Features - Conversation Management

  • Build on the existing chatStore, whose structure already supports conversations
  • Save conversations to local storage/file system
  • Load previous conversations
  • Export conversations (JSON, Markdown, TXT)
  • Conversation metadata (title, tags, date)
  • Conversation list/browser UI

Technical Approach - Conversation Management

// New store: conversationStore.ts
interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number
  updated: number
  tags: string[]
  model: string
}
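
A minimal sketch of how the store could create, update, and round-trip a conversation through JSON for save/load. The `ChatMessage` shape, the helper names, and the choice of epoch milliseconds for `created`/`updated` are assumptions for illustration; the plan does not define them.

```typescript
// Hypothetical message shape; the plan does not define ChatMessage.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
  timestamp: number;
}

interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number; // assumed: Unix epoch milliseconds
  updated: number;
  tags: string[];
  model: string;
}

// Build an empty conversation. The id and clock are passed in so the helper stays pure.
function createConversation(id: string, title: string, model: string, now: number): Conversation {
  return { id, title, messages: [], created: now, updated: now, tags: [], model };
}

// Return a copy with the message appended and `updated` bumped.
function appendMessage(conv: Conversation, message: ChatMessage, now: number): Conversation {
  return { ...conv, messages: [...conv.messages, message], updated: now };
}

// Serialize for local storage or for a file written by a backend command.
function serializeConversation(conv: Conversation): string {
  return JSON.stringify(conv, null, 2);
}

// Parse and minimally validate; throws if required fields are missing.
function parseConversation(json: string): Conversation {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("conversation must be a JSON object");
  }
  const c = data as Partial<Conversation>;
  if (typeof c.id !== "string" || typeof c.title !== "string" || !Array.isArray(c.messages)) {
    throw new Error("missing required conversation fields");
  }
  return {
    id: c.id,
    title: c.title,
    messages: c.messages,
    created: typeof c.created === "number" ? c.created : 0,
    updated: typeof c.updated === "number" ? c.updated : 0,
    tags: Array.isArray(c.tags) ? c.tags : [],
    model: typeof c.model === "string" ? c.model : "",
  };
}
```

Keeping these helpers free of storage calls means the same functions can back either local storage or file system persistence.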

Files to Create/Modify - Conversation Management

  • src/stores/conversationStore.ts - New conversation management store
  • src/components/ConversationList.tsx - Browse saved conversations
  • src/components/ConversationExport.tsx - Export functionality
  • src-tauri/src/main.rs - Add file system commands for save/load

Priority 2: Advanced Message Formatting (Week 1-2)

Impact: High | Complexity: Medium | Dependencies: None

Features - Advanced Message Formatting

  • Code syntax highlighting
  • Markdown rendering with proper styling
  • LaTeX/Math equation support
  • Mermaid diagram rendering
  • Copy code blocks to clipboard
  • Collapsible code sections

Technical Approach - Advanced Message Formatting

Dependencies to Add:

{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}

Files to Create/Modify - Advanced Message Formatting

  • src/components/MessageContent.tsx - Enhanced message renderer
  • src/components/CodeBlock.tsx - Code block with syntax highlighting
  • src/components/MermaidDiagram.tsx - Mermaid diagram renderer
  • src/lib/markdown.ts - Markdown processing utilities
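
One possible utility for src/lib/markdown.ts is a splitter that separates prose from fenced code, so the renderer can route `mermaid` blocks to the diagram component and everything else to the code block component. This is a simplified sketch that only recognizes triple-backtick fences; `splitMessage` and `Segment` are hypothetical names, and the real renderer may rely on react-markdown's own parsing instead.

```typescript
type Segment =
  | { kind: "text"; text: string }
  | { kind: "code"; lang: string; code: string };

// Built at runtime so this example contains no literal fence marker.
const FENCE = "`".repeat(3);

// Split a message into prose and fenced code segments, in document order.
function splitMessage(markdown: string): Segment[] {
  const segments: Segment[] = [];
  let textLines: string[] = [];
  let codeLines: string[] = [];
  let lang: string | null = null; // non-null while inside a fence

  const flushText = (): void => {
    const text = textLines.join("\n").trim();
    if (text.length > 0) segments.push({ kind: "text", text });
    textLines = [];
  };

  for (const line of markdown.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.indexOf(FENCE) === 0) {
      const info = trimmed.slice(FENCE.length).trim();
      if (lang === null) {
        // Opening fence: the info string names the language (may be empty).
        flushText();
        lang = info;
        continue;
      }
      if (info === "") {
        // Closing fence.
        segments.push({ kind: "code", lang, code: codeLines.join("\n") });
        codeLines = [];
        lang = null;
        continue;
      }
    }
    if (lang !== null) codeLines.push(line);
    else textLines.push(line);
  }

  // An unterminated fence (common while a reply is streaming) is kept as code.
  if (lang !== null) segments.push({ kind: "code", lang, code: codeLines.join("\n") });
  flushText();
  return segments;
}
```

A segment with `lang === "mermaid"` would go to MermaidDiagram.tsx; other code segments would go to CodeBlock.tsx with the copy and collapse controls.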

Priority 3: Text-to-Speech Integration (Week 2-3)

Impact: High | Complexity: Medium | Dependencies: ElevenLabs API

Features - Text-to-Speech

  • ElevenLabs API integration
  • Voice selection UI
  • Per-message TTS toggle
  • Speech controls (play/pause/stop)
  • Voice settings (speed, stability, clarity)
  • Audio queue management
  • Local fallback (Web Speech API)

Technical Approach - Text-to-Speech

Dependencies to Add:

{
  "elevenlabs": "^0.8.0"
}

New Rust Dependencies (Cargo.toml):

rodio = "0.17"  # Audio playback

Files to Create/Modify - Text-to-Speech

  • src/lib/elevenlabs.ts - ElevenLabs API client
  • src/lib/tts.ts - TTS abstraction layer with fallback
  • src/components/TTSControls.tsx - Voice playback controls
  • src/components/VoiceSettings.tsx - Voice configuration UI
  • src-tauri/src/audio.rs - Audio playback module (Rust)
  • src-tauri/src/main.rs - Add audio commands

Implementation Steps

  1. Create ElevenLabs API client with voice listing
  2. Add voice selection to settings
  3. Implement audio playback queue
  4. Add per-message TTS buttons
  5. Create global audio controls
  6. Implement Web Speech API fallback
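
Steps 3 and 6 can be sketched without touching any audio API: provider selection with fallback, plus a small immutable queue keyed by message id. All names here (`TtsProvider`, `pickProvider`, `AudioQueue`) are hypothetical, and the sketch deliberately calls neither the ElevenLabs client nor the Web Speech API.

```typescript
// Hypothetical provider contract for the TTS abstraction layer.
interface TtsProvider {
  readonly name: string;
  isAvailable(): boolean;
}

// Return the first available provider, in priority order, or null if none is usable.
function pickProvider(providers: readonly TtsProvider[]): TtsProvider | null {
  for (const p of providers) {
    if (p.isAvailable()) return p;
  }
  return null;
}

// Playback queue: one message playing, the rest waiting in order.
interface AudioQueue {
  readonly current: string | null;
  readonly pending: readonly string[];
}

const emptyQueue: AudioQueue = { current: null, pending: [] };

// Add a message id; starts immediately if idle, ignores duplicates.
function enqueue(q: AudioQueue, messageId: string): AudioQueue {
  if (q.current === messageId || q.pending.indexOf(messageId) !== -1) return q;
  if (q.current === null) return { current: messageId, pending: q.pending };
  return { current: q.current, pending: [...q.pending, messageId] };
}

// Call when the current clip finishes: promote the next pending message.
function advance(q: AudioQueue): AudioQueue {
  if (q.pending.length === 0) return emptyQueue;
  return { current: q.pending[0], pending: q.pending.slice(1) };
}

// Stop playback and drop everything that is waiting.
function stopAll(): AudioQueue {
  return emptyQueue;
}
```

Listing ElevenLabs first and the Web Speech fallback second gives the fallback behaviour from step 6 for free: if the premium provider reports itself unavailable (for example, no API key), the local one is picked.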

Priority 4: Speech-to-Text Integration (Week 3-4)

Impact: High | Complexity: Medium-High | Dependencies: Web Speech API or Whisper

Features - Speech-to-Text

  • Push-to-talk button
  • Continuous listening mode
  • Voice activity detection (VAD)
  • Visual feedback (waveform/mic indicator)
  • Keyboard shortcut for voice input
  • Language selection
  • Fallback to Web Speech API

Technical Approach - Speech-to-Text

Option A: Web Speech API (Browser)
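  • Zero cost, no API key required
  • Limited accuracy and engine-dependent: some engines send audio to a remote service (so offline use is not guaranteed), and some webviews may not implement speech recognition at all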
  • Good for MVP
Option B: OpenAI Whisper API
  • High accuracy
  • Costs per API call
  • Better for production

Recommendation: Start with Web Speech API, add Whisper as optional upgrade
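
The recommendation implies a runtime decision in src/lib/stt.ts. A minimal sketch, assuming feature detection via the standard `SpeechRecognition` constructor or its `webkitSpeechRecognition` prefixed form; `selectSttBackend`, `SttOptions`, and the `preferWhisper` flag are hypothetical names.

```typescript
// The object we probe; in the webview this would be `window`.
interface SpeechScope {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
}

type SttBackend = "web-speech" | "whisper" | "none";

// Hypothetical user settings for speech input.
interface SttOptions {
  preferWhisper: boolean;
  whisperApiKey?: string;
}

// True if the scope exposes a speech recognition constructor.
function hasWebSpeech(scope: SpeechScope): boolean {
  return (
    typeof scope.SpeechRecognition === "function" ||
    typeof scope.webkitSpeechRecognition === "function"
  );
}

// Web Speech API by default; Whisper when preferred, or as the only usable option.
function selectSttBackend(scope: SpeechScope, opts: SttOptions): SttBackend {
  const whisperReady = typeof opts.whisperApiKey === "string" && opts.whisperApiKey.length > 0;
  if (opts.preferWhisper && whisperReady) return "whisper";
  if (hasWebSpeech(scope)) return "web-speech";
  return whisperReady ? "whisper" : "none";
}
```

Returning `"none"` lets VoiceInput.tsx disable the microphone button instead of failing when neither backend is usable.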

Files to Create/Modify - Speech-to-Text

  • src/lib/stt.ts - STT abstraction layer
  • src/lib/whisper.ts - OpenAI Whisper client (optional)
  • src/components/VoiceInput.tsx - Microphone button and controls
  • src/components/WaveformVisualizer.tsx - Audio visualization
  • src/hooks/useVoiceRecording.ts - Voice recording hook

Priority 5: File Attachment Support (Week 4)

Impact: Medium | Complexity: Medium | Dependencies: None

Features - File Attachments

  • File upload UI (drag & drop + button)
  • Image preview and analysis
  • PDF text extraction
  • File size limits
  • Multiple file support
  • File metadata display

Technical Approach - File Attachments

Dependencies to Add:

{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}

Rust Dependencies (if needed for file processing):

pdf-extract = "0.7"
image = "0.24"

Files to Create/Modify - File Attachments

  • src/components/FileUpload.tsx - Drag & drop file upload
  • src/components/FilePreview.tsx - Preview attached files
  • src/lib/fileProcessor.ts - Extract text from various formats
  • src-tauri/src/file_handler.rs - File processing in Rust
  • Update chatStore.ts - Add attachments to messages
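
File size limits and multiple file support suggest a validation pass before any file is processed. The sketch below is one way to express it; the policy values, field names, and error strings are illustrative assumptions. The declared MIME type is supplied by the caller here and is not trustworthy on its own, so a content-based check would still be needed before processing.

```typescript
interface AttachmentInfo {
  name: string;
  sizeBytes: number;
  mimeType: string; // as declared by the caller; not verified here
}

interface AttachmentPolicy {
  maxBytes: number;
  maxFiles: number;
  allowedMimePrefixes: string[];
}

// Illustrative limits only; the plan does not specify actual values.
const examplePolicy: AttachmentPolicy = {
  maxBytes: 10 * 1024 * 1024,
  maxFiles: 5,
  allowedMimePrefixes: ["image/", "text/", "application/pdf"],
};

// Return a list of human-readable problems; an empty list means all files pass.
function validateAttachments(files: AttachmentInfo[], policy: AttachmentPolicy): string[] {
  const errors: string[] = [];
  if (files.length > policy.maxFiles) {
    errors.push(`too many files: ${files.length} > ${policy.maxFiles}`);
  }
  for (const f of files) {
    if (f.sizeBytes > policy.maxBytes) {
      errors.push(`${f.name}: exceeds size limit`);
    }
    const allowed = policy.allowedMimePrefixes.some((p) => f.mimeType.indexOf(p) === 0);
    if (!allowed) {
      errors.push(`${f.name}: type ${f.mimeType} not allowed`);
    }
  }
  return errors;
}
```

Returning every problem at once, rather than throwing on the first, lets FileUpload.tsx show all rejected files in one message.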

Priority 6: System Integration (Week 5)

Impact: Medium | Complexity: Medium-High | Dependencies: Tauri capabilities

Features - System Integration

  • Global keyboard shortcuts
  • System tray icon
  • Quick launch hotkey
  • Desktop notifications
  • Minimize to tray
  • Auto-start option

Technical Approach - System Integration

Tauri Features to Enable (tauri.conf.json):

{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}

Files to Create/Modify - System Integration

  • src-tauri/src/tray.rs - System tray implementation
  • src-tauri/src/shortcuts.rs - Global shortcut handler
  • src/components/NotificationSettings.tsx - Notification preferences
  • Update src-tauri/tauri.conf.json - Enable system tray

Additional Improvements

Code Quality

  • Add unit tests for new features
  • Integration tests for API clients
  • E2E tests with Playwright
  • Error boundary components
  • Comprehensive error handling

Performance

  • Lazy load heavy components
  • Virtual scrolling for long conversations
  • Optimize re-renders with React.memo
  • Audio streaming optimization
  • File upload progress indicators
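
Virtual scrolling comes down to computing which rows are near the viewport and rendering only those. A minimal sketch, assuming fixed-height rows (real chat messages vary in height, so a production version would need measured heights or a library); `visibleRange` and its parameters are hypothetical names.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number; // one past the last row index to render (exclusive)
}

// Compute the rows to render for a scroll position, padded by `overscan` rows on each side.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan: number = 3,
): VisibleRange {
  if (rowHeight <= 0 || totalRows <= 0) return { start: 0, end: 0 };
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(totalRows, last + overscan),
  };
}
```

For example, with 100 px rows, a 600 px viewport, and an overscan of 2, scrolling to 2500 px yields rows 23 through 32, so only ten rows are mounted out of however many the conversation holds.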

UX Polish

  • Loading skeletons
  • Toast notifications
  • Keyboard navigation improvements
  • Accessibility audit
  • Responsive design refinements

Dependencies Summary

New npm Packages

npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter

New Rust Crates

# Add to src-tauri/Cargo.toml
rodio = "0.17"           # Audio playback
pdf-extract = "0.7"      # PDF processing (optional)
image = "0.24"           # Image processing (optional)

Testing Strategy

Manual Testing Checklist

  • All conversation operations (save/load/export)
  • Markdown rendering with various content types
  • TTS with different voices and settings
  • STT in push-to-talk and continuous modes
  • File uploads (images, PDFs, code files)
  • Keyboard shortcuts on all platforms
  • System tray interactions

Automated Tests

  • Unit tests for utility functions
  • Integration tests for API clients
  • Component tests with React Testing Library
  • E2E tests for critical user flows

Risk Mitigation

Known Risks

  1. API Costs: ElevenLabs and Whisper can be expensive
    • Mitigation: Use free Web Speech API as default, make premium APIs optional
  2. Audio Latency: TTS/STT pipeline may feel slow
    • Mitigation: Stream audio where possible, show clear loading states
  3. Cross-platform Issues: Audio/shortcuts may behave differently
    • Mitigation: Test on Linux/macOS/Windows early and often
  4. File Security: Handling user files safely
    • Mitigation: Strict file type validation, size limits, sandboxing

Success Criteria

Phase 2 is complete when:

  • Users can save, load, and export conversations
  • Messages render with proper code highlighting and formatting
  • TTS works with at least one voice provider
  • STT works with Web Speech API
  • Users can attach and discuss files
  • Basic keyboard shortcuts are functional
  • System tray integration works on Linux
  • All features are documented
  • No critical bugs or performance issues

Timeline Estimate

Optimistic: 4 weeks
Realistic: 5-6 weeks
Conservative: 8 weeks

Depends on:

  • Time available per week
  • API complexity/issues
  • Cross-platform testing needs
  • Feature scope adjustments

Next Steps

  1. Install dependencies for conversation management and markdown rendering
  2. Implement conversation store and basic save/load
  3. Create ConversationList component for browsing history
  4. Enhance message rendering with react-markdown and syntax highlighting
  5. Integrate ElevenLabs TTS with settings UI
  6. Add voice input with Web Speech API
  7. Implement file attachments with preview
  8. Add system tray and keyboard shortcuts

Last Updated: October 5, 2025
Status: Ready to begin implementation