eve-alpha/docs/planning/PHASE2_PLAN.md
Aodhan Collins 66749a5ce7 Initial commit
2025-10-06 00:33:04 +01:00


# Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)
**Status**: 🚀 In Progress
**Start Date**: October 5, 2025
**Target Completion**: TBD
## Overview
Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.
## Implementation Priority Order
### Priority 1: Conversation Management (Week 1)
**Impact**: High | **Complexity**: Low | **Foundation for**: Export features, history search
#### Features - Conversation Management
- [x] Store structure already supports this (chatStore)
- [ ] Save conversations to local storage/file system
- [ ] Load previous conversations
- [ ] Export conversations (JSON, Markdown, TXT)
- [ ] Conversation metadata (title, tags, date)
- [ ] Conversation list/browser UI
#### Technical Approach - Conversation Management
```typescript
// New store: conversationStore.ts
interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number
  updated: number
  tags: string[]
  model: string
}
```
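
A minimal sketch of how save/load could round-trip this structure through JSON. The `ChatMessage` shape, the `SavedConversation` envelope, and its `version` field are assumptions made for illustration; the plan does not define them.

```typescript
// Assumed shape: the plan references ChatMessage but does not define it.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number;
  updated: number;
  tags: string[];
  model: string;
}

// Hypothetical on-disk envelope; the version field leaves room for migrations.
interface SavedConversation {
  version: 1;
  conversation: Conversation;
}

// Serialize a conversation to pretty-printed JSON for local storage or a file.
function serializeConversation(c: Conversation): string {
  const saved: SavedConversation = { version: 1, conversation: c };
  return JSON.stringify(saved, null, 2);
}

// Parse saved JSON and reject anything that lacks the required fields.
function parseConversation(raw: string): Conversation {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Saved conversation is not an object");
  }
  const saved = data as Partial<SavedConversation>;
  if (saved.version !== 1) {
    throw new Error("Unsupported version: " + String(saved.version));
  }
  const c = saved.conversation;
  if (
    !c ||
    typeof c.id !== "string" ||
    typeof c.title !== "string" ||
    !Array.isArray(c.messages) ||
    !Array.isArray(c.tags)
  ) {
    throw new Error("Saved conversation is missing required fields");
  }
  return c;
}
```

Validating on load keeps a corrupted or hand-edited file from reaching the store as a half-formed object.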
#### Files to Create/Modify - Conversation Management
- `src/stores/conversationStore.ts` - New conversation management store
- `src/components/ConversationList.tsx` - Browse saved conversations
- `src/components/ConversationExport.tsx` - Export functionality
- `src-tauri/src/main.rs` - Add file system commands for save/load
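
The export step could be sketched as a pure function that `ConversationExport.tsx` calls before handing the string to a file-save command. The `ChatMessage` shape, the `ExportFormat` type, and the exact Markdown/TXT layouts are assumptions chosen for illustration, not a format the plan specifies.

```typescript
// Assumed shape: the plan references ChatMessage but does not define it.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number;
  updated: number;
  tags: string[];
  model: string;
}

// The three formats listed in the feature checklist.
type ExportFormat = "json" | "markdown" | "txt";

// Render a conversation as a string in the requested format.
function exportConversation(c: Conversation, format: ExportFormat): string {
  if (format === "json") {
    return JSON.stringify(c, null, 2);
  }
  if (format === "markdown") {
    // Title as a heading, messages separated by horizontal rules.
    const body = c.messages
      .map((m) => "**" + m.role + "**:\n\n" + m.content)
      .join("\n\n---\n\n");
    return "# " + c.title + "\n\n" + body + "\n";
  }
  // Plain text: one "[role] content" paragraph per message.
  return c.messages.map((m) => "[" + m.role + "] " + m.content).join("\n\n") + "\n";
}
```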
---
### Priority 2: Advanced Message Formatting (Week 1-2)
**Impact**: High | **Complexity**: Medium | **Dependencies**: None
#### Features - Advanced Message Formatting
- [ ] Code syntax highlighting
- [ ] Markdown rendering with proper styling
- [ ] LaTeX/Math equation support
- [ ] Mermaid diagram rendering
- [ ] Copy code blocks to clipboard
- [ ] Collapsible code sections
#### Technical Approach - Advanced Message Formatting
**Dependencies to Add**:
```json
{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}
```
#### Files to Create/Modify - Advanced Message Formatting
- `src/components/MessageContent.tsx` - Enhanced message renderer
- `src/components/CodeBlock.tsx` - Code block with syntax highlighting
- `src/components/MermaidDiagram.tsx` - Mermaid diagram renderer
- `src/lib/markdown.ts` - Markdown processing utilities
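
One way `src/lib/markdown.ts` might separate fenced code from prose, so that code segments can go to `CodeBlock.tsx` (or `MermaidDiagram.tsx` when the language tag is `mermaid`) and gain copy and collapse controls. This is a simplified sketch with hypothetical names (`Segment`, `splitMessage`); it handles only triple-backtick fences and is not how react-markdown parses input. In practice, overriding the `code` renderer through react-markdown's `components` prop is likely the more robust route.

```typescript
// A message is modelled as alternating prose and fenced-code segments.
type Segment =
  | { kind: "text"; body: string }
  | { kind: "code"; lang: string; body: string };

// Split message content on triple-backtick fences.
// Simplification: nested or indented fences are not handled.
function splitMessage(content: string): Segment[] {
  const segments: Segment[] = [];
  const fence = /```([\w-]*)\n([\s\S]*?)```/g;
  let last = 0;
  let match: RegExpExecArray | null = fence.exec(content);
  while (match !== null) {
    // Prose between the previous fence and this one.
    if (match.index > last) {
      segments.push({ kind: "text", body: content.slice(last, match.index) });
    }
    segments.push({
      kind: "code",
      lang: match[1] || "plaintext", // untagged fences fall back to plaintext
      body: match[2] || "",
    });
    last = match.index + match[0].length;
    match = fence.exec(content);
  }
  // Trailing prose after the final fence.
  if (last < content.length) {
    segments.push({ kind: "text", body: content.slice(last) });
  }
  return segments;
}
```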
---
### Priority 3: Text-to-Speech Integration (Week 2-3)
**Impact**: High | **Complexity**: Medium | **Dependencies**: ElevenLabs API
#### Features - Text-to-Speech
- [ ] ElevenLabs API integration
- [ ] Voice selection UI
- [ ] Per-message TTS toggle
- [ ] Speech controls (play/pause/stop)
- [ ] Voice settings (speed, stability, clarity)
- [ ] Audio queue management
- [ ] Local fallback (Web Speech API)
#### Technical Approach - Text-to-Speech
**Dependencies to Add**:
```json
{
  "elevenlabs": "^0.8.0"
}
```
**New Rust Dependencies** (Cargo.toml):
```toml
rodio = "0.17" # Audio playback
```
#### Files to Create/Modify - Text-to-Speech
- `src/lib/elevenlabs.ts` - ElevenLabs API client
- `src/lib/tts.ts` - TTS abstraction layer with fallback
- `src/components/TTSControls.tsx` - Voice playback controls
- `src/components/VoiceSettings.tsx` - Voice configuration UI
- `src-tauri/src/audio.rs` - Audio playback module (Rust)
- `src-tauri/src/main.rs` - Add audio commands
#### Implementation Steps
1. Create ElevenLabs API client with voice listing
2. Add voice selection to settings
3. Implement audio playback queue
4. Add per-message TTS buttons
5. Create global audio controls
6. Implement Web Speech API fallback
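
Step 3 above (the playback queue) could be sketched as a small state machine that is independent of any audio backend. All names here (`AudioQueue`, `QueueItem`, `onStart`) are hypothetical. The `onStart` callback is where the app would call ElevenLabs or fall back to the Web Speech API; the queue only decides what plays next, and the playback layer is assumed to call `onClipEnded` when a clip finishes.

```typescript
type PlaybackState = "idle" | "playing" | "paused";

interface QueueItem {
  messageId: string;
  text: string;
}

interface QueueSnapshot {
  state: PlaybackState;
  current: QueueItem | null;
  pending: number;
}

class AudioQueue {
  private items: QueueItem[] = [];
  private current: QueueItem | null = null;
  private state: PlaybackState = "idle";

  // onStart is invoked each time a new item begins playing.
  constructor(private onStart: (item: QueueItem) => void) {}

  // Add an item; start it immediately if nothing is playing.
  enqueue(item: QueueItem): void {
    this.items.push(item);
    if (this.state === "idle") this.advance();
  }

  // The playback layer calls this when the current clip finishes.
  onClipEnded(): void {
    this.advance();
  }

  pause(): void {
    if (this.state === "playing") this.state = "paused";
  }

  resume(): void {
    if (this.state === "paused") this.state = "playing";
  }

  // Drop everything, including the current item.
  stop(): void {
    this.items = [];
    this.current = null;
    this.state = "idle";
  }

  snapshot(): QueueSnapshot {
    return { state: this.state, current: this.current, pending: this.items.length };
  }

  // Move to the next queued item, or go idle when the queue is empty.
  private advance(): void {
    const next = this.items.shift();
    if (next) {
      this.current = next;
      this.state = "playing";
      this.onStart(next);
    } else {
      this.current = null;
      this.state = "idle";
    }
  }
}
```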
---
### Priority 4: Speech-to-Text Integration (Week 3-4)
**Impact**: High | **Complexity**: Medium-High | **Dependencies**: Web Speech API or Whisper
#### Features - Speech-to-Text
- [ ] Push-to-talk button
- [ ] Continuous listening mode
- [ ] Voice activity detection (VAD)
- [ ] Visual feedback (waveform/mic indicator)
- [ ] Keyboard shortcut for voice input
- [ ] Language selection
- [ ] Fallback to Web Speech API
#### Technical Approach - Speech-to-Text
##### Option A: Web Speech API (Browser)
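- Zero cost, no API key required
- Not reliably offline: some engines (Chrome, for example) send audio to a remote service for recognition
- Limited accuracy, browser-dependent; support varies by platform webview, so verify on each target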
- Good for MVP
##### Option B: OpenAI Whisper API
- High accuracy
- Costs per API call
- Better for production
**Recommendation**: Start with Web Speech API, add Whisper as optional upgrade
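
The recommendation could translate into a provider choice inside `src/lib/stt.ts` along these lines. This is a sketch under assumptions: `SttEnvironment`, `preferWhisper`, and `hasWhisperKey` are hypothetical settings, and the detection helper only checks whether the `SpeechRecognition` or `webkitSpeechRecognition` constructor is exposed, not whether recognition actually works on that platform.

```typescript
type SttProvider = "web-speech" | "whisper" | "none";

interface SttEnvironment {
  // True when the webview exposes a speech recognition constructor.
  hasWebSpeech: boolean;
  // True when the user has configured an OpenAI API key (hypothetical setting).
  hasWhisperKey: boolean;
  // True when the user opted into Whisper (hypothetical setting).
  preferWhisper: boolean;
}

// Web Speech API is the default; Whisper is used when the user opts in,
// or as a last resort when the webview has no speech recognition.
function chooseSttProvider(env: SttEnvironment): SttProvider {
  if (env.preferWhisper && env.hasWhisperKey) return "whisper";
  if (env.hasWebSpeech) return "web-speech";
  if (env.hasWhisperKey) return "whisper";
  return "none";
}

// Feature detection against a window-like object (pass `window` in the app).
function detectWebSpeech(globalObj: object): boolean {
  return "SpeechRecognition" in globalObj || "webkitSpeechRecognition" in globalObj;
}
```

Returning `"none"` explicitly lets `VoiceInput.tsx` hide or disable the microphone button instead of failing at record time.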
#### Files to Create/Modify - Speech-to-Text
- `src/lib/stt.ts` - STT abstraction layer
- `src/lib/whisper.ts` - OpenAI Whisper client (optional)
- `src/components/VoiceInput.tsx` - Microphone button and controls
- `src/components/WaveformVisualizer.tsx` - Audio visualization
- `src/hooks/useVoiceRecording.ts` - Voice recording hook
---
### Priority 5: File Attachment Support (Week 4)
**Impact**: Medium | **Complexity**: Medium | **Dependencies**: None
#### Features - File Attachments
- [ ] File upload UI (drag & drop + button)
- [ ] Image preview and analysis
- [ ] PDF text extraction
- [ ] File size limits
- [ ] Multiple file support
- [ ] File metadata display
#### Technical Approach - File Attachments
**Dependencies to Add**:
```json
{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}
```
**Rust Dependencies** (if needed for file processing):
```toml
pdf-extract = "0.7"
image = "0.24"
```
#### Files to Create/Modify - File Attachments
- `src/components/FileUpload.tsx` - Drag & drop file upload
- `src/components/FilePreview.tsx` - Preview attached files
- `src/lib/fileProcessor.ts` - Extract text from various formats
- `src-tauri/src/file_handler.rs` - File processing in Rust
- Update `chatStore.ts` - Add attachments to messages
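
The size-limit and multiple-file items could be enforced by a check in `src/lib/fileProcessor.ts` before any file is read. The limits in `EXAMPLE_POLICY` are illustrative values only (the plan does not specify any), and all type and function names are hypothetical. The reported MIME type can be wrong or spoofed, so content sniffing (for example with the `file-type` package listed above) would still be needed afterwards.

```typescript
interface FileInfo {
  name: string;
  sizeBytes: number;
  mimeType: string; // as reported by the file picker; not trustworthy on its own
}

interface AttachmentPolicy {
  maxFileBytes: number;
  maxFiles: number;
  allowedMimePrefixes: string[];
}

// Illustrative limits only; the plan does not specify values.
const EXAMPLE_POLICY: AttachmentPolicy = {
  maxFileBytes: 10 * 1024 * 1024,
  maxFiles: 5,
  allowedMimePrefixes: ["image/", "text/", "application/pdf"],
};

interface Rejection {
  name: string;
  reason: string;
}

// Return one rejection per file that violates the policy; empty means all accepted.
function validateAttachments(
  files: FileInfo[],
  policy: AttachmentPolicy = EXAMPLE_POLICY
): Rejection[] {
  const rejections: Rejection[] = [];
  files.forEach((file, index) => {
    const typeAllowed = policy.allowedMimePrefixes.some(
      (prefix) => file.mimeType.indexOf(prefix) === 0
    );
    if (index >= policy.maxFiles) {
      rejections.push({ name: file.name, reason: "too many files" });
    } else if (file.sizeBytes > policy.maxFileBytes) {
      rejections.push({ name: file.name, reason: "file too large" });
    } else if (!typeAllowed) {
      rejections.push({ name: file.name, reason: "unsupported type" });
    }
  });
  return rejections;
}
```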
---
### Priority 6: System Integration (Week 5)
**Impact**: Medium | **Complexity**: Medium-High | **Dependencies**: Tauri capabilities
#### Features - System Integration
- [ ] Global keyboard shortcuts
- [ ] System tray icon
- [ ] Quick launch hotkey
- [ ] Desktop notifications
- [ ] Minimize to tray
- [ ] Auto-start option
#### Technical Approach - System Integration
**Tauri Features to Enable** (tauri.conf.json):
```json
{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}
```
#### Files to Create/Modify - System Integration
- `src-tauri/src/tray.rs` - System tray implementation
- `src-tauri/src/shortcuts.rs` - Global shortcut handler
- `src/components/NotificationSettings.tsx` - Notification preferences
- Update `src-tauri/tauri.conf.json` - Enable system tray
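
Before global shortcuts are registered, user-configurable bindings could be checked for collisions on the frontend. The sketch below assumes accelerator strings of the form `Modifier+Modifier+Key` and treats `CmdOrCtrl` as an alias of `CommandOrControl`; the binding shape and function names are hypothetical, and the actual registration in `shortcuts.rs` is out of scope here.

```typescript
interface ShortcutBinding {
  action: string;
  accelerator: string;
}

// Normalize so that modifier order, case, and the CmdOrCtrl alias do not matter.
function normalizeAccelerator(accelerator: string): string {
  const parts = accelerator
    .split("+")
    .map((p) => p.trim().toLowerCase())
    .filter((p) => p.length > 0);
  const key = parts.pop() ?? ""; // the last part is the non-modifier key
  const modifiers = parts
    .map((m) => (m === "cmdorctrl" ? "commandorcontrol" : m))
    .sort();
  return modifiers.concat(key).join("+");
}

// Return groups of action names that share the same normalized accelerator.
function findConflicts(bindings: ShortcutBinding[]): string[][] {
  const groups: { key: string; actions: string[] }[] = [];
  bindings.forEach((b) => {
    const key = normalizeAccelerator(b.accelerator);
    const existing = groups.filter((g) => g.key === key)[0];
    if (existing) {
      existing.actions.push(b.action);
    } else {
      groups.push({ key: key, actions: [b.action] });
    }
  });
  return groups.filter((g) => g.actions.length > 1).map((g) => g.actions);
}
```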
---
## Additional Improvements
### Code Quality
- [ ] Add unit tests for new features
- [ ] Integration tests for API clients
- [ ] E2E tests with Playwright
- [ ] Error boundary components
- [ ] Comprehensive error handling
### Performance
- [ ] Lazy load heavy components
- [ ] Virtual scrolling for long conversations
- [ ] Optimize re-renders with React.memo
- [ ] Audio streaming optimization
- [ ] File upload progress indicators
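
The core of virtual scrolling is working out which rows need to be mounted for the current scroll position. The sketch below assumes a fixed row height, which is a simplification: chat messages vary in height, so a real implementation would measure rows or use an existing virtualization library. The function name and the `overscan` default are illustrative.

```typescript
interface VisibleRange {
  start: number; // first row index to render (inclusive)
  end: number; // last row index to render (exclusive)
}

// Compute the slice of rows to render, plus a few extra rows on each side
// (overscan) so fast scrolling does not reveal blank space.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 3
): VisibleRange {
  if (rowCount <= 0 || rowHeight <= 0) return { start: 0, end: 0 };
  const first = Math.floor(scrollTop / rowHeight);
  const lastExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, lastExclusive + overscan),
  };
}
```

With a 600 px viewport and 100 px rows, only about a dozen messages are mounted at a time regardless of conversation length.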
### UX Polish
- [ ] Loading skeletons
- [ ] Toast notifications
- [ ] Keyboard navigation improvements
- [ ] Accessibility audit
- [ ] Responsive design refinements
---
## Dependencies Summary
### New npm Packages
```bash
npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter
```
### New Rust Crates
```toml
# Add to src-tauri/Cargo.toml
rodio = "0.17" # Audio playback
pdf-extract = "0.7" # PDF processing (optional)
image = "0.24" # Image processing (optional)
```
---
## Testing Strategy
### Manual Testing Checklist
- [ ] All conversation operations (save/load/export)
- [ ] Markdown rendering with various content types
- [ ] TTS with different voices and settings
- [ ] STT in push-to-talk and continuous modes
- [ ] File uploads (images, PDFs, code files)
- [ ] Keyboard shortcuts on all platforms
- [ ] System tray interactions
### Automated Tests
- [ ] Unit tests for utility functions
- [ ] Integration tests for API clients
- [ ] Component tests with React Testing Library
- [ ] E2E tests for critical user flows
---
## Risk Mitigation
### Known Risks
1. **API Costs**: ElevenLabs and Whisper can be expensive
   - **Mitigation**: Use free Web Speech API as default, make premium APIs optional
2. **Audio Latency**: TTS/STT pipeline may feel slow
   - **Mitigation**: Stream audio where possible, show clear loading states
3. **Cross-platform Issues**: Audio/shortcuts may behave differently
   - **Mitigation**: Test on Linux/macOS/Windows early and often
4. **File Security**: Handling user files safely
   - **Mitigation**: Strict file type validation, size limits, sandboxing
---
## Success Criteria
Phase 2 is complete when:
- ✅ Users can save, load, and export conversations
- ✅ Messages render with proper code highlighting and formatting
- ✅ TTS works with at least one voice provider
- ✅ STT works with Web Speech API
- ✅ Users can attach and discuss files
- ✅ Basic keyboard shortcuts are functional
- ✅ System tray integration works on Linux
- ✅ All features are documented
- ✅ No critical bugs or performance issues
---
## Timeline Estimate
**Optimistic**: 4 weeks
**Realistic**: 5-6 weeks
**Conservative**: 8 weeks
Depends on:
- Time available per week
- API complexity/issues
- Cross-platform testing needs
- Feature scope adjustments
---
## Next Steps
1. **Install dependencies** for conversation management and markdown rendering
2. **Implement conversation store** and basic save/load
3. **Create ConversationList component** for browsing history
4. **Enhance message rendering** with react-markdown and syntax highlighting
5. **Integrate ElevenLabs TTS** with settings UI
6. **Add voice input** with Web Speech API
7. **Implement file attachments** with preview
8. **Add system tray** and keyboard shortcuts
---
**Last Updated**: October 5, 2025
**Status**: Ready to begin implementation