Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)
Status: 🚀 In Progress
Start Date: October 5, 2025
Target Completion: TBD
Overview
Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.
Implementation Priority Order
Priority 1: Conversation Management (Week 1)
Impact: High | Complexity: Low | Foundation for: Export features, history search
Features - Conversation Management
- Builds on the existing chatStore, whose structure already supports conversation management
- Save conversations to local storage/file system
- Load previous conversations
- Export conversations (JSON, Markdown, TXT)
- Conversation metadata (title, tags, date)
- Conversation list/browser UI
Technical Approach - Conversation Management
// New store: conversationStore.ts
interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number
  updated: number
  tags: string[]
  model: string
}
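A minimal sketch of the pure helpers the new store could wrap, repeated here with the interface so the block is self-contained. The `ChatMessage` shape and the helper names (`createConversation`, `appendMessage`, `serializeConversation`, `parseConversation`) are assumptions for illustration; the real message type lives in chatStore.

```typescript
// Assumed message shape; the real ChatMessage type is defined in chatStore.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
  timestamp: number;
}

interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number;
  updated: number;
  tags: string[];
  model: string;
}

// Create an empty conversation. `id` and `now` are passed in so the helper stays pure.
function createConversation(id: string, model: string, now: number, title = "Untitled"): Conversation {
  return { id, title, messages: [], created: now, updated: now, tags: [], model };
}

// Return a new conversation with the message appended and `updated` bumped.
function appendMessage(conv: Conversation, message: ChatMessage, now: number): Conversation {
  return { ...conv, messages: [...conv.messages, message], updated: now };
}

// Serialize for local storage or a file on disk.
function serializeConversation(conv: Conversation): string {
  return JSON.stringify(conv, null, 2);
}

// Parse and minimally validate stored JSON; throws on malformed input.
function parseConversation(json: string): Conversation {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("Conversation must be an object");
  }
  const c = data as Partial<Conversation>;
  if (typeof c.id !== "string" || typeof c.title !== "string" || !Array.isArray(c.messages)) {
    throw new Error("Conversation is missing required fields");
  }
  return {
    id: c.id,
    title: c.title,
    messages: c.messages,
    created: typeof c.created === "number" ? c.created : 0,
    updated: typeof c.updated === "number" ? c.updated : 0,
    tags: Array.isArray(c.tags) ? c.tags : [],
    model: typeof c.model === "string" ? c.model : "",
  };
}
```

Keeping these helpers free of storage calls means the same functions can back both the local storage path and the Tauri file system commands.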
Files to Create/Modify - Conversation Management
- src/stores/conversationStore.ts - New conversation management store
- src/components/ConversationList.tsx - Browse saved conversations
- src/components/ConversationExport.tsx - Export functionality
- src-tauri/src/main.rs - Add file system commands for save/load
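The three export formats listed above could share one entry point. This is a sketch only: `exportConversation` and `ExportFormat` are hypothetical names, the message shape is simplified, and `created` is assumed to be a Unix timestamp in milliseconds.

```typescript
// Simplified, assumed shapes for illustration.
interface ChatMessage {
  role: string;
  content: string;
}

interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number; // assumed: milliseconds since the Unix epoch
  updated: number;
  tags: string[];
  model: string;
}

type ExportFormat = "json" | "markdown" | "txt";

// Render a conversation as a string in the requested format.
function exportConversation(conv: Conversation, format: ExportFormat): string {
  switch (format) {
    case "json":
      return JSON.stringify(conv, null, 2);
    case "markdown": {
      const header = [
        `# ${conv.title}`,
        "",
        `- Model: ${conv.model}`,
        `- Created: ${new Date(conv.created).toISOString()}`,
        `- Tags: ${conv.tags.join(", ") || "none"}`,
        "",
      ];
      // One second-level heading per message, followed by its content.
      const body = conv.messages.map((m) => `## ${m.role}\n\n${m.content}\n`);
      return [...header, ...body].join("\n");
    }
    case "txt":
      return conv.messages.map((m) => `${m.role}: ${m.content}`).join("\n\n");
  }
}
```

The returned string can then be handed to whichever save mechanism the export component uses.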
Priority 2: Advanced Message Formatting (Week 1-2)
Impact: High | Complexity: Medium | Dependencies: None
Features - Advanced Message Formatting
- Code syntax highlighting
- Markdown rendering with proper styling
- LaTeX/Math equation support
- Mermaid diagram rendering
- Copy code blocks to clipboard
- Collapsible code sections
Technical Approach - Advanced Message Formatting
Dependencies to Add:
{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}
Files to Create/Modify - Advanced Message Formatting
- src/components/MessageContent.tsx - Enhanced message renderer
- src/components/CodeBlock.tsx - Code block with syntax highlighting
- src/components/MermaidDiagram.tsx - Mermaid diagram renderer
- src/lib/markdown.ts - Markdown processing utilities
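One utility that could live in src/lib/markdown.ts is a splitter that separates fenced code from prose, so the renderer can route each piece to the right component. The sketch below is a hand-rolled illustration, not how react-markdown parses; `splitMessage` and `Segment` are hypothetical names, and only simple, non-nested fences are handled.

```typescript
type Segment =
  | { kind: "text"; content: string }
  | { kind: "code"; language: string; content: string };

// Three backticks, built at runtime so this example can itself sit inside a fence.
const FENCE = "`".repeat(3);

// Split a message into prose and fenced-code segments, in order of appearance.
function splitMessage(markdown: string): Segment[] {
  const segments: Segment[] = [];
  let buffer: string[] = [];
  let language: string | null = null; // null while outside a code fence

  const flush = (kind: "text" | "code"): void => {
    const content = buffer.join("\n");
    buffer = [];
    if (kind === "code") {
      segments.push({ kind, language: language ?? "", content });
    } else if (content.trim() !== "") {
      segments.push({ kind, content });
    }
  };

  for (const line of markdown.split("\n")) {
    if (line.trim().startsWith(FENCE)) {
      if (language === null) {
        flush("text"); // close the prose that preceded the fence
        language = line.trim().slice(FENCE.length).trim();
      } else {
        flush("code"); // closing fence: emit the collected code
        language = null;
      }
    } else {
      buffer.push(line);
    }
  }
  // An unterminated fence is kept as code rather than dropped.
  flush(language === null ? "text" : "code");
  return segments;
}
```

With segments in hand, a renderer could send `language === "mermaid"` to the Mermaid diagram component and every other code segment to the code block component with its copy button.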
Priority 3: Text-to-Speech Integration (Week 2-3)
Impact: High | Complexity: Medium | Dependencies: ElevenLabs API
Features - Text-to-Speech
- ElevenLabs API integration
- Voice selection UI
- Per-message TTS toggle
- Speech controls (play/pause/stop)
- Voice settings (speed, stability, clarity)
- Audio queue management
- Local fallback (Web Speech API)
Technical Approach - Text-to-Speech
Dependencies to Add:
{
  "elevenlabs": "^0.8.0"
}
New Rust Dependencies (Cargo.toml):
rodio = "0.17" # Audio playback
Files to Create/Modify - Text-to-Speech
- src/lib/elevenlabs.ts - ElevenLabs API client
- src/lib/tts.ts - TTS abstraction layer with fallback
- src/components/TTSControls.tsx - Voice playback controls
- src/components/VoiceSettings.tsx - Voice configuration UI
- src-tauri/src/audio.rs - Audio playback module (Rust)
- src-tauri/src/main.rs - Add audio commands
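The fallback logic in the TTS abstraction layer can be kept separate from the actual API calls. The sketch below only decides which providers to try and in what order; the type names, the `resolveTtsProviders` function, and the preference fields are all assumptions for illustration.

```typescript
type TtsProviderId = "elevenlabs" | "web-speech";

interface TtsPreferences {
  preferred: TtsProviderId;
  elevenLabsApiKey?: string; // hypothetical settings field
}

interface TtsEnvironment {
  online: boolean;
  hasSpeechSynthesis: boolean; // e.g. the result of `"speechSynthesis" in window`
}

// Return the usable providers in the order they should be tried.
function resolveTtsProviders(prefs: TtsPreferences, env: TtsEnvironment): TtsProviderId[] {
  const usable: Record<TtsProviderId, boolean> = {
    // The hosted API needs both a key and a network connection.
    elevenlabs: env.online && Boolean(prefs.elevenLabsApiKey),
    "web-speech": env.hasSpeechSynthesis,
  };
  const order: TtsProviderId[] =
    prefs.preferred === "elevenlabs"
      ? ["elevenlabs", "web-speech"]
      : ["web-speech", "elevenlabs"];
  return order.filter((id) => usable[id]);
}
```

A caller would attempt the first entry and move to the next on failure; an empty list means TTS controls should be disabled.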
Implementation Steps
- Create ElevenLabs API client with voice listing
- Add voice selection to settings
- Implement audio playback queue
- Add per-message TTS buttons
- Create global audio controls
- Implement Web Speech API fallback
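The playback queue from the steps above can be modeled as a small state machine that is independent of how audio is actually played. This is a sketch under assumptions: `AudioQueue`, `QueuedClip`, and the three states are illustrative names, and the caller is expected to invoke `advance()` when the current clip finishes.

```typescript
interface QueuedClip {
  messageId: string;
  text: string;
}

type PlaybackState = "idle" | "playing" | "paused";

class AudioQueue {
  private pending: QueuedClip[] = [];
  private active: QueuedClip | null = null;
  private state: PlaybackState = "idle";

  get current(): QueuedClip | null {
    return this.active;
  }

  get status(): PlaybackState {
    return this.state;
  }

  get length(): number {
    return this.pending.length;
  }

  // Add a clip; start it immediately if nothing is playing or paused.
  enqueue(clip: QueuedClip): void {
    this.pending.push(clip);
    if (this.state === "idle") this.advance();
  }

  // Move to the next clip. Call this when the current clip ends or is skipped.
  advance(): QueuedClip | null {
    this.active = this.pending.shift() ?? null;
    this.state = this.active ? "playing" : "idle";
    return this.active;
  }

  pause(): void {
    if (this.state === "playing") this.state = "paused";
  }

  resume(): void {
    if (this.state === "paused") this.state = "playing";
  }

  // Drop everything, including the active clip.
  stop(): void {
    this.pending = [];
    this.active = null;
    this.state = "idle";
  }
}
```

Both the per-message TTS buttons and the global controls can drive the same instance, which keeps play, pause, and stop behavior consistent.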
Priority 4: Speech-to-Text Integration (Week 3-4)
Impact: High | Complexity: Medium-High | Dependencies: Web Speech API or Whisper
Features - Speech-to-Text
- Push-to-talk button
- Continuous listening mode
- Voice activity detection (VAD)
- Visual feedback (waveform/mic indicator)
- Keyboard shortcut for voice input
- Language selection
- Fallback to Web Speech API
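Voice activity detection can start as a simple energy threshold over audio frames. The sketch below is one basic technique, not a production-grade detector: `rms`, `EnergyVad`, the 0.02 threshold, and the hangover count are all illustrative assumptions, and samples are assumed to be floats in the range -1 to 1.

```typescript
// Root-mean-square level of one audio frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

class EnergyVad {
  private readonly threshold: number;
  private readonly hangoverFrames: number;
  private silentFrames = 0;
  private speaking = false;

  // threshold: RMS level treated as speech; hangoverFrames: quiet frames tolerated
  // before speech is considered finished. Both defaults are placeholders to tune.
  constructor(threshold = 0.02, hangoverFrames = 10) {
    this.threshold = threshold;
    this.hangoverFrames = hangoverFrames;
  }

  // Feed one frame; returns true while speech is considered active.
  process(frame: Float32Array): boolean {
    if (rms(frame) >= this.threshold) {
      this.speaking = true;
      this.silentFrames = 0;
    } else if (this.speaking) {
      this.silentFrames += 1;
      if (this.silentFrames > this.hangoverFrames) {
        this.speaking = false;
        this.silentFrames = 0;
      }
    }
    return this.speaking;
  }
}
```

The hangover keeps short pauses between words from ending a recording early, which matters most in continuous listening mode.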
Technical Approach - Speech-to-Text
Option A: Web Speech API (Browser)
- Zero cost, no API key required; offline support varies by engine (Chrome sends audio to a server for recognition)
- Limited accuracy, browser-dependent
- Good for MVP
Option B: OpenAI Whisper API
- High accuracy
- Costs per API call
- Better for production
Recommendation: Start with Web Speech API, add Whisper as optional upgrade
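The recommendation can be expressed as a small selection function in the STT abstraction layer. This is a sketch: `selectSttEngine`, `SttOptions`, and its fields are hypothetical, while `SpeechRecognition` and `webkitSpeechRecognition` are the global names browsers expose for the Web Speech API.

```typescript
type SttEngine = "web-speech" | "whisper" | "none";

interface SttOptions {
  whisperApiKey?: string; // hypothetical settings field
  preferWhisper?: boolean;
}

// Feature-detect speech recognition on a global scope object such as `window`.
function hasWebSpeechRecognition(scope: object): boolean {
  return "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope;
}

// Default to the Web Speech API; use Whisper when the user opts in or nothing else is available.
function selectSttEngine(scope: object, options: SttOptions = {}): SttEngine {
  const whisperReady = Boolean(options.whisperApiKey);
  if (options.preferWhisper && whisperReady) return "whisper";
  if (hasWebSpeechRecognition(scope)) return "web-speech";
  return whisperReady ? "whisper" : "none";
}
```

In the app this would be called as `selectSttEngine(window, settings)`. Because webview engines differ in what they expose, running the detection at startup on each platform is safer than assuming support.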
Files to Create/Modify - Speech-to-Text
- src/lib/stt.ts - STT abstraction layer
- src/lib/whisper.ts - OpenAI Whisper client (optional)
- src/components/VoiceInput.tsx - Microphone button and controls
- src/components/WaveformVisualizer.tsx - Audio visualization
- src/hooks/useVoiceRecording.ts - Voice recording hook
Priority 5: File Attachment Support (Week 4)
Impact: Medium | Complexity: Medium | Dependencies: None
Features - File Attachments
- File upload UI (drag & drop + button)
- Image preview and analysis
- PDF text extraction
- File size limits
- Multiple file support
- File metadata display
Technical Approach - File Attachments
Dependencies to Add:
{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}
Rust Dependencies (if needed for file processing):
pdf-extract = "0.7"
image = "0.24"
Files to Create/Modify - File Attachments
- src/components/FileUpload.tsx - Drag & drop file upload
- src/components/FilePreview.tsx - Preview attached files
- src/lib/fileProcessor.ts - Extract text from various formats
- src-tauri/src/file_handler.rs - File processing in Rust
- Update chatStore.ts - Add attachments to messages
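Size limits and type checks can run before any file content reaches the processor. The sketch below is hand-rolled to show the idea of checking leading bytes instead of trusting the file extension; the plan's listed packages would likely replace it in practice. `validateAttachment`, `detectType`, and the 10 MiB limit are assumptions for illustration.

```typescript
type DetectedType = "png" | "jpeg" | "pdf" | "unknown";

// Hypothetical limit; the plan does not fix a value.
const MAX_FILE_BYTES = 10 * 1024 * 1024;

// Leading "magic" bytes for a few formats the plan mentions.
const SIGNATURES: { type: DetectedType; bytes: number[] }[] = [
  { type: "png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { type: "jpeg", bytes: [0xff, 0xd8, 0xff] },
  { type: "pdf", bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
];

// Identify a file from its first bytes rather than its name.
function detectType(head: Uint8Array): DetectedType {
  for (const sig of SIGNATURES) {
    if (head.length >= sig.bytes.length && sig.bytes.every((b, i) => head[i] === b)) {
      return sig.type;
    }
  }
  return "unknown";
}

interface ValidationResult {
  ok: boolean;
  type: DetectedType;
  reason?: string;
}

// Check size first, then content type. `head` is the start of the file.
function validateAttachment(head: Uint8Array, sizeBytes: number, maxBytes = MAX_FILE_BYTES): ValidationResult {
  const type = detectType(head);
  if (sizeBytes <= 0) return { ok: false, type, reason: "File is empty" };
  if (sizeBytes > maxBytes) return { ok: false, type, reason: "File exceeds the size limit" };
  if (type === "unknown") return { ok: false, type, reason: "Unsupported file type" };
  return { ok: true, type };
}
```

Plain-text and code files have no signature, so a real implementation would need a separate allow-list for those.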
Priority 6: System Integration (Week 5)
Impact: Medium | Complexity: Medium-High | Dependencies: Tauri capabilities
Features - System Integration
- Global keyboard shortcuts
- System tray icon
- Quick launch hotkey
- Desktop notifications
- Minimize to tray
- Auto-start option
Technical Approach - System Integration
Tauri Features to Enable (tauri.conf.json):
{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}
Files to Create/Modify - System Integration
- src-tauri/src/tray.rs - System tray implementation
- src-tauri/src/shortcuts.rs - Global shortcut handler
- src/components/NotificationSettings.tsx - Notification preferences
- Update src-tauri/tauri.conf.json - Enable system tray
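Once users can configure several global shortcuts, it helps to catch two actions bound to the same key combination before registering them. The sketch below assumes accelerator strings of the form `Modifier+Key` (for example `CommandOrControl+Shift+Space`); the alias table, function names, and action names are illustrative and not taken from Tauri.

```typescript
// Illustrative aliases mapped to one canonical spelling.
const ALIASES: Record<string, string> = {
  cmdorctrl: "commandorcontrol",
  ctrl: "control",
  cmd: "command",
  option: "alt",
};

const MODIFIERS = ["commandorcontrol", "command", "control", "alt", "shift", "super"];

// Lowercase, resolve aliases, de-duplicate and sort modifiers so equal combos compare equal.
function normalizeShortcut(shortcut: string): string {
  const parts = shortcut
    .split("+")
    .map((p) => p.trim().toLowerCase())
    .filter((p) => p !== "")
    .map((p) => ALIASES[p] ?? p);
  const mods = parts
    .filter((p) => MODIFIERS.indexOf(p) >= 0)
    .filter((p, i, all) => all.indexOf(p) === i)
    .sort();
  const keys = parts.filter((p) => MODIFIERS.indexOf(p) < 0);
  return [...mods, ...keys].join("+");
}

// Group action names that resolve to the same normalized shortcut.
function findConflicts(bindings: Record<string, string>): string[][] {
  const byShortcut: Record<string, string[]> = {};
  for (const action of Object.keys(bindings)) {
    const key = normalizeShortcut(bindings[action]);
    (byShortcut[key] = byShortcut[key] || []).push(action);
  }
  return Object.keys(byShortcut)
    .map((k) => byShortcut[k])
    .filter((actions) => actions.length > 1);
}
```

Running `findConflicts` when settings are saved lets the UI flag a clash instead of one registration silently overriding another.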
Additional Improvements
Code Quality
- Add unit tests for new features
- Integration tests for API clients
- E2E tests with Playwright
- Error boundary components
- Comprehensive error handling
Performance
- Lazy load heavy components
- Virtual scrolling for long conversations
- Optimize re-renders with React.memo
- Audio streaming optimization
- File upload progress indicators
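The core of virtual scrolling is computing which rows fall inside the viewport. The sketch below assumes a fixed row height, which is a simplification since chat messages vary in height; `visibleRange` and its parameters are hypothetical names.

```typescript
interface VisibleRange {
  start: number; // index of the first row to render
  end: number; // index one past the last row to render
}

// Compute the slice of rows to render, padded by `overscan` rows on each side.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan = 3,
): VisibleRange {
  if (rowCount <= 0 || rowHeight <= 0) return { start: 0, end: 0 };
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  // Clamp both ends so out-of-range scroll positions never produce invalid indices.
  const start = Math.min(rowCount, Math.max(0, first - overscan));
  const end = Math.max(start, Math.min(rowCount, last + overscan));
  return { start, end };
}
```

For example, with 50 px rows in a 500 px viewport scrolled to 1025 px and an overscan of 2, only rows 18 through 32 are rendered out of 1000. Variable-height messages would need measured offsets in place of the division by `rowHeight`.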
UX Polish
- Loading skeletons
- Toast notifications
- Keyboard navigation improvements
- Accessibility audit
- Responsive design refinements
Dependencies Summary
New npm Packages
npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter
New Rust Crates
# Add to src-tauri/Cargo.toml
rodio = "0.17" # Audio playback
pdf-extract = "0.7" # PDF processing (optional)
image = "0.24" # Image processing (optional)
Testing Strategy
Manual Testing Checklist
- All conversation operations (save/load/export)
- Markdown rendering with various content types
- TTS with different voices and settings
- STT in push-to-talk and continuous modes
- File uploads (images, PDFs, code files)
- Keyboard shortcuts on all platforms
- System tray interactions
Automated Tests
- Unit tests for utility functions
- Integration tests for API clients
- Component tests with React Testing Library
- E2E tests for critical user flows
Risk Mitigation
Known Risks
- API Costs: ElevenLabs and Whisper can be expensive
  - Mitigation: Use free Web Speech API as default, make premium APIs optional
- Audio Latency: TTS/STT pipeline may feel slow
  - Mitigation: Stream audio where possible, show clear loading states
- Cross-platform Issues: Audio/shortcuts may behave differently
  - Mitigation: Test on Linux/macOS/Windows early and often
- File Security: Handling user files safely
  - Mitigation: Strict file type validation, size limits, sandboxing
Success Criteria
Phase 2 is complete when:
- ✅ Users can save, load, and export conversations
- ✅ Messages render with proper code highlighting and formatting
- ✅ TTS works with at least one voice provider
- ✅ STT works with Web Speech API
- ✅ Users can attach and discuss files
- ✅ Basic keyboard shortcuts are functional
- ✅ System tray integration works on Linux
- ✅ All features are documented
- ✅ No critical bugs or performance issues
Timeline Estimate
Optimistic: 4 weeks
Realistic: 5-6 weeks
Conservative: 8 weeks
Depends on:
- Time available per week
- API complexity/issues
- Cross-platform testing needs
- Feature scope adjustments
Next Steps
- Install dependencies for conversation management and markdown rendering
- Implement conversation store and basic save/load
- Create ConversationList component for browsing history
- Enhance message rendering with react-markdown and syntax highlighting
- Integrate ElevenLabs TTS with settings UI
- Add voice input with Web Speech API
- Implement file attachments with preview
- Add system tray and keyboard shortcuts
Last Updated: October 5, 2025
Status: Ready to begin implementation