# Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)

**Status**: 🚀 In Progress

**Start Date**: October 5, 2025

**Target Completion**: TBD

## Overview

Phase 2 builds on the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.

## Implementation Priority Order

### Priority 1: Conversation Management (Week 1)

**Impact**: High | **Complexity**: Low | **Foundation for**: Export features, history search

#### Features - Conversation Management

- [x] Store structure already supports this (chatStore)
- [ ] Save conversations to local storage/file system
- [ ] Load previous conversations
- [ ] Export conversations (JSON, Markdown, TXT)
- [ ] Conversation metadata (title, tags, date)
- [ ] Conversation list/browser UI

#### Technical Approach - Conversation Management

```typescript
// New store: conversationStore.ts
// ChatMessage is the existing message type used by chatStore.
interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number // creation timestamp
  updated: number // last-modified timestamp
  tags: string[]
  model: string
}
```

#### Files to Create/Modify - Conversation Management

- `src/stores/conversationStore.ts` - New conversation management store
- `src/components/ConversationList.tsx` - Browse saved conversations
- `src/components/ConversationExport.tsx` - Export functionality
- `src-tauri/src/main.rs` - Add file system commands for save/load

---

### Priority 2: Advanced Message Formatting (Week 1-2)

**Impact**: High | **Complexity**: Medium | **Dependencies**: None

#### Features - Advanced Message Formatting

- [ ] Code syntax highlighting
- [ ] Markdown rendering with proper styling
- [ ] LaTeX/Math equation support
- [ ] Mermaid diagram rendering
- [ ] Copy code blocks to clipboard
- [ ] Collapsible code sections

#### Technical Approach - Advanced Message Formatting

**Dependencies to Add**:

```json
{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}
```

#### Files to Create/Modify - Advanced Message Formatting

- `src/components/MessageContent.tsx` - Enhanced message renderer
- `src/components/CodeBlock.tsx` - Code block with syntax highlighting
- `src/components/MermaidDiagram.tsx` - Mermaid diagram renderer
- `src/lib/markdown.ts` - Markdown processing utilities

---

### Priority 3: Text-to-Speech Integration (Week 2-3)

**Impact**: High | **Complexity**: Medium | **Dependencies**: ElevenLabs API

#### Features - Text-to-Speech

- [ ] ElevenLabs API integration
- [ ] Voice selection UI
- [ ] Per-message TTS toggle
- [ ] Speech controls (play/pause/stop)
- [ ] Voice settings (speed, stability, clarity)
- [ ] Audio queue management
- [ ] Local fallback (Web Speech API)

#### Technical Approach - Text-to-Speech

**Dependencies to Add**:

```json
{
  "elevenlabs": "^0.8.0"
}
```

**New Rust Dependencies** (Cargo.toml):

```toml
rodio = "0.17"  # Audio playback
```

#### Files to Create/Modify - Text-to-Speech

- `src/lib/elevenlabs.ts` - ElevenLabs API client
- `src/lib/tts.ts` - TTS abstraction layer with fallback
- `src/components/TTSControls.tsx` - Voice playback controls
- `src/components/VoiceSettings.tsx` - Voice configuration UI
- `src-tauri/src/audio.rs` - Audio playback module (Rust)
- `src-tauri/src/main.rs` - Add audio commands

#### Implementation Steps

1. Create ElevenLabs API client with voice listing
2. Add voice selection to settings
3. Implement audio playback queue
4. Add per-message TTS buttons
5. Create global audio controls
6. Implement Web Speech API fallback

---

### Priority 4: Speech-to-Text Integration (Week 3-4)

**Impact**: High | **Complexity**: Medium-High | **Dependencies**: Web Speech API or Whisper

#### Features - Speech-to-Text

- [ ] Push-to-talk button
- [ ] Continuous listening mode
- [ ] Voice activity detection (VAD)
- [ ] Visual feedback (waveform/mic indicator)
- [ ] Keyboard shortcut for voice input
- [ ] Language selection
- [ ] Fallback to Web Speech API

#### Technical Approach - Speech-to-Text

##### Option A: Web Speech API (Browser)

- Zero cost
- Offline use is not guaranteed: some engines (Chrome, for example) send audio to a server for recognition, and support inside the platform webviews used by Tauri should be verified early
- Limited accuracy, browser-dependent
- Good for MVP

##### Option B: OpenAI Whisper API

- High accuracy
- Costs per API call
- Better for production

**Recommendation**: Start with the Web Speech API and add Whisper as an optional upgrade.

#### Files to Create/Modify - Speech-to-Text

- `src/lib/stt.ts` - STT abstraction layer
- `src/lib/whisper.ts` - OpenAI Whisper client (optional)
- `src/components/VoiceInput.tsx` - Microphone button and controls
- `src/components/WaveformVisualizer.tsx` - Audio visualization
- `src/hooks/useVoiceRecording.ts` - Voice recording hook

---

### Priority 5: File Attachment Support (Week 4)

**Impact**: Medium | **Complexity**: Medium | **Dependencies**: None

#### Features - File Attachments

- [ ] File upload UI (drag & drop + button)
- [ ] Image preview and analysis
- [ ] PDF text extraction
- [ ] File size limits
- [ ] Multiple file support
- [ ] File metadata display

#### Technical Approach - File Attachments

**Dependencies to Add**:

```json
{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}
```

**Rust Dependencies** (if needed for file processing):

```toml
pdf-extract = "0.7"
image = "0.24"
```

#### Files to Create/Modify - File Attachments

- `src/components/FileUpload.tsx` - Drag & drop file upload
- `src/components/FilePreview.tsx` - Preview attached files
- `src/lib/fileProcessor.ts` - Extract text from various formats
- `src-tauri/src/file_handler.rs` - File processing in Rust
- Update `chatStore.ts` - Add attachments to messages

---

### Priority 6: System Integration (Week 5)

**Impact**: Medium | **Complexity**: Medium-High | **Dependencies**: Tauri capabilities

#### Features - System Integration

- [ ] Global keyboard shortcuts
- [ ] System tray icon
- [ ] Quick launch hotkey
- [ ] Desktop notifications
- [ ] Minimize to tray
- [ ] Auto-start option

#### Technical Approach - System Integration

**Tauri Features to Enable** (tauri.conf.json):

```json
{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}
```

#### Files to Create/Modify - System Integration

- `src-tauri/src/tray.rs` - System tray implementation
- `src-tauri/src/shortcuts.rs` - Global shortcut handler
- `src/components/NotificationSettings.tsx` - Notification preferences
- Update `src-tauri/tauri.conf.json` - Enable system tray

---

## Additional Improvements

### Code Quality

- [ ] Add unit tests for new features
- [ ] Integration tests for API clients
- [ ] E2E tests with Playwright
- [ ] Error boundary components
- [ ] Comprehensive error handling

### Performance

- [ ] Lazy load heavy components
- [ ] Virtual scrolling for long conversations
- [ ] Optimize re-renders with React.memo
- [ ] Audio streaming optimization
- [ ] File upload progress indicators

### UX Polish

- [ ] Loading skeletons
- [ ] Toast notifications
- [ ] Keyboard navigation improvements
- [ ] Accessibility audit
- [ ] Responsive design refinements

---

## Dependencies Summary

### New npm Packages

```bash
npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter
```

### New Rust Crates

```toml
# Add to src-tauri/Cargo.toml
rodio = "0.17"        # Audio playback
pdf-extract = "0.7"   # PDF processing (optional)
image = "0.24"        # Image processing (optional)
```

---

## Testing Strategy

### Manual Testing Checklist

- [ ] All conversation operations (save/load/export)
- [ ] Markdown rendering with various content types
- [ ] TTS with different voices and settings
- [ ] STT in push-to-talk and continuous modes
- [ ] File uploads (images, PDFs, code files)
- [ ] Keyboard shortcuts on all platforms
- [ ] System tray interactions

### Automated Tests

- [ ] Unit tests for utility functions
- [ ] Integration tests for API clients
- [ ] Component tests with React Testing Library
- [ ] E2E tests for critical user flows

---

## Risk Mitigation

### Known Risks

1. **API Costs**: ElevenLabs and Whisper can be expensive
   - **Mitigation**: Use the free Web Speech API as the default and make premium APIs optional
2. **Audio Latency**: The TTS/STT pipeline may feel slow
   - **Mitigation**: Stream audio where possible and show clear loading states
3. **Cross-platform Issues**: Audio and shortcuts may behave differently across platforms
   - **Mitigation**: Test on Linux/macOS/Windows early and often
4. **File Security**: User files must be handled safely
   - **Mitigation**: Strict file type validation, size limits, sandboxing

---

## Success Criteria

Phase 2 is complete when:

- ✅ Users can save, load, and export conversations
- ✅ Messages render with proper code highlighting and formatting
- ✅ TTS works with at least one voice provider
- ✅ STT works with Web Speech API
- ✅ Users can attach and discuss files
- ✅ Basic keyboard shortcuts are functional
- ✅ System tray integration works on Linux
- ✅ All features are documented
- ✅ No critical bugs or performance issues

---

## Timeline Estimate

- **Optimistic**: 4 weeks
- **Realistic**: 5-6 weeks
- **Conservative**: 8 weeks

Depends on:

- Time available per week
- API complexity/issues
- Cross-platform testing needs
- Feature scope adjustments

---

## Next Steps

1. **Install dependencies** for conversation management and markdown rendering
2. **Implement conversation store** and basic save/load
3. **Create ConversationList component** for browsing history
4. **Enhance message rendering** with react-markdown and syntax highlighting
5. **Integrate ElevenLabs TTS** with settings UI
6. **Add voice input** with Web Speech API
7. **Implement file attachments** with preview
8. **Add system tray** and keyboard shortcuts

---

**Last Updated**: October 5, 2025

**Status**: Ready to begin implementation
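
The export feature listed under Priority 1 (JSON, Markdown, TXT) could be written as a pure function over the `Conversation` interface. The following is a minimal sketch, not the project's implementation: the `ChatMessage` shape (`role`, `content`), the millisecond timestamps, and the names `ExportFormat` and `exportConversation` are assumptions made for illustration.

```typescript
// Assumed message shape; the real ChatMessage type lives in chatStore.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Mirrors the Conversation interface from the Priority 1 section.
interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number; // assumed: Unix epoch milliseconds
  updated: number; // assumed: Unix epoch milliseconds
  tags: string[];
  model: string;
}

type ExportFormat = "json" | "markdown" | "txt";

// Hypothetical helper: serialize a conversation into one of the planned formats.
function exportConversation(conv: Conversation, format: ExportFormat): string {
  switch (format) {
    case "json":
      // Full-fidelity export that can be loaded back later.
      return JSON.stringify(conv, null, 2);
    case "markdown": {
      const header = [
        `# ${conv.title}`,
        "",
        `- **Model**: ${conv.model}`,
        `- **Created**: ${new Date(conv.created).toISOString()}`,
        `- **Tags**: ${conv.tags.join(", ") || "none"}`,
      ].join("\n");
      // One second-level heading per message, separated by blank lines.
      const body = conv.messages
        .map((m) => `## ${m.role}\n\n${m.content}`)
        .join("\n\n");
      return `${header}\n\n${body}\n`;
    }
    case "txt":
      // Plain transcript: "role: content", blank line between messages.
      return conv.messages.map((m) => `${m.role}: ${m.content}`).join("\n\n") + "\n";
  }
}
```

Keeping the serializer free of file system access means `ConversationExport.tsx` can call it directly and hand the resulting string to whichever Tauri save command is added in `main.rs`.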
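
The "TTS abstraction layer with fallback" planned for `src/lib/tts.ts` (Priority 3) can be pictured as an ordered list of providers that are tried in turn. This is a sketch under stated assumptions: the `TTSProvider` interface and the `speakWithFallback` function are hypothetical names, and no real ElevenLabs or Web Speech calls are shown.

```typescript
// Hypothetical provider contract; names are illustrative only.
interface TTSProvider {
  name: string;
  speak(text: string): Promise<void>;
}

// Try each provider in order and return the name of the first one that succeeds.
// Failures are collected so the final error explains what went wrong everywhere.
async function speakWithFallback(
  text: string,
  providers: TTSProvider[],
): Promise<string> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      await provider.speak(text);
      return provider.name;
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All TTS providers failed (${errors.join("; ")})`);
}
```

In the app, the first provider would presumably wrap the ElevenLabs client and the second the browser's speech synthesis, so an API outage or exhausted quota degrades to the local voice instead of silence.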
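
The File Security mitigation (strict file type validation and size limits) can be illustrated by checking a file's leading bytes rather than trusting its extension. The sketch below is an assumption-laden example, not the planned `fileProcessor.ts`: the 10 MB limit, the `SIGNATURES` table, and `validateAttachment` are all hypothetical, and only PNG, JPEG, and PDF signatures are covered.

```typescript
// Hypothetical limit; the plan does not specify a value.
const MAX_FILE_BYTES = 10 * 1024 * 1024;

// Leading "magic bytes" for the image and PDF formats named in the plan.
const SIGNATURES: Record<string, number[]> = {
  "image/png": [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a],
  "image/jpeg": [0xff, 0xd8, 0xff],
  "application/pdf": [0x25, 0x50, 0x44, 0x46, 0x2d], // "%PDF-"
};

type ValidationResult =
  | { ok: true; mime: string }
  | { ok: false; reason: string };

// Reject empty, oversized, or unrecognized files before any further processing.
function validateAttachment(
  bytes: Uint8Array,
  maxBytes: number = MAX_FILE_BYTES,
): ValidationResult {
  if (bytes.length === 0) return { ok: false, reason: "file is empty" };
  if (bytes.length > maxBytes) {
    return { ok: false, reason: `file exceeds ${maxBytes} bytes` };
  }
  for (const [mime, signature] of Object.entries(SIGNATURES)) {
    // A file shorter than the signature never matches (bytes[i] is undefined).
    if (signature.every((byte, i) => bytes[i] === byte)) {
      return { ok: true, mime };
    }
  }
  return { ok: false, reason: "unsupported or unrecognized file type" };
}
```

Plain-text and code files have no signature, so they would need a separate rule (for example an extension allow-list) alongside this check.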