# Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)

**Status**: 🚀 In Progress
**Start Date**: October 5, 2025
**Target Completion**: TBD

## Overview

Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.

## Implementation Priority Order

### Priority 1: Conversation Management (Week 1)

**Impact**: High | **Complexity**: Low | **Foundation for**: Export features, history search

#### Features - Conversation Management

- [x] Store structure already supports this (chatStore)
- [ ] Save conversations to local storage/file system
- [ ] Load previous conversations
- [ ] Export conversations (JSON, Markdown, TXT)
- [ ] Conversation metadata (title, tags, date)
- [ ] Conversation list/browser UI

#### Technical Approach - Conversation Management

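```typescript
// New store: src/stores/conversationStore.ts
// Assumes chatStore exports its ChatMessage type
import type { ChatMessage } from './chatStore'

export interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number // creation timestamp (epoch milliseconds assumed)
  updated: number // last-modified timestamp (epoch milliseconds assumed)
  tags: string[]
  model: string // model used for this conversation
}
```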
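
A minimal sketch of how the JSON, Markdown, and TXT exports could be produced from a `Conversation`. The interface is repeated so the block stands alone, and the `ChatMessage` shape (`role` plus `content`) and the `exportConversation` name are assumptions rather than the existing chatStore API.

```typescript
// Hypothetical export helper for src/components/ConversationExport.tsx.
// ChatMessage's shape is assumed; adjust it to the real chatStore type.
interface ChatMessage {
  role: 'user' | 'assistant' | 'system'
  content: string
}

interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number
  updated: number
  tags: string[]
  model: string
}

type ExportFormat = 'json' | 'markdown' | 'txt'

export function exportConversation(conv: Conversation, format: ExportFormat): string {
  // JSON is lossless, so it doubles as the re-importable format.
  if (format === 'json') return JSON.stringify(conv, null, 2)

  // Plain transcript: one "role: content" block per message.
  if (format === 'txt') {
    return conv.messages.map(m => `${m.role}: ${m.content}`).join('\n\n') + '\n'
  }

  // Markdown: metadata header, then one section per message.
  const lines = [
    `# ${conv.title}`,
    '',
    `- Model: ${conv.model}`,
    `- Created: ${new Date(conv.created).toISOString()}`,
    `- Tags: ${conv.tags.length > 0 ? conv.tags.join(', ') : 'none'}`,
  ]
  for (const m of conv.messages) {
    lines.push('', `## ${m.role}`, '', m.content)
  }
  return lines.join('\n') + '\n'
}
```

Keeping export as a pure function of the stored object lets the same code back both a download button and a Tauri save command.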

#### Files to Create/Modify - Conversation Management

- `src/stores/conversationStore.ts` - New conversation management store
- `src/components/ConversationList.tsx` - Browse saved conversations
- `src/components/ConversationExport.tsx` - Export functionality
- `src-tauri/src/main.rs` - Add file system commands for save/load
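
On the load side, files coming back from the file system are untrusted input, so it helps to validate them before they reach the store. A sketch under the same assumed `ChatMessage` shape; `parseConversation` and `sortByRecent` are hypothetical names.

```typescript
// Hypothetical load-side helpers for conversationStore.ts.
// ChatMessage's shape (role + content) is an assumption; mirror the real chatStore type.
interface ChatMessage {
  role: string
  content: string
}

interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number
  updated: number
  tags: string[]
  model: string
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

function isChatMessage(v: unknown): v is ChatMessage {
  return isRecord(v) && typeof v.role === 'string' && typeof v.content === 'string'
}

// Parse one saved conversation; throws on malformed JSON or a missing/mistyped field.
export function parseConversation(raw: string): Conversation {
  const data: unknown = JSON.parse(raw)
  if (
    isRecord(data) &&
    typeof data.id === 'string' &&
    typeof data.title === 'string' &&
    Array.isArray(data.messages) && data.messages.every(isChatMessage) &&
    typeof data.created === 'number' &&
    typeof data.updated === 'number' &&
    Array.isArray(data.tags) && data.tags.every(t => typeof t === 'string') &&
    typeof data.model === 'string'
  ) {
    return data as unknown as Conversation
  }
  throw new Error('Invalid conversation file')
}

// Most recently updated first, for the conversation list/browser UI.
export function sortByRecent(list: Conversation[]): Conversation[] {
  return [...list].sort((a, b) => b.updated - a.updated)
}
```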

---

### Priority 2: Advanced Message Formatting (Week 1-2)

**Impact**: High | **Complexity**: Medium | **Dependencies**: None

#### Features - Advanced Message Formatting

- [ ] Code syntax highlighting
- [ ] Markdown rendering with proper styling
- [ ] LaTeX/Math equation support
- [ ] Mermaid diagram rendering
- [ ] Copy code blocks to clipboard
- [ ] Collapsible code sections

#### Technical Approach - Advanced Message Formatting

**Dependencies to Add**:

```json
{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}
```

#### Files to Create/Modify - Advanced Message Formatting

- `src/components/MessageContent.tsx` - Enhanced message renderer
- `src/components/CodeBlock.tsx` - Code block with syntax highlighting
- `src/components/MermaidDiagram.tsx` - Mermaid diagram renderer
- `src/lib/markdown.ts` - Markdown processing utilities
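
One way `MessageContent.tsx` could decide whether a fenced block goes to `CodeBlock.tsx` or `MermaidDiagram.tsx` is a small routing helper. It assumes the markdown pipeline puts a `language-<lang>` class on `<code>`, which is what react-markdown emits for fenced blocks; `routeCodeBlock` and its return shape are hypothetical.

```typescript
// Hypothetical routing helper for src/lib/markdown.ts.
export type CodeRenderTarget =
  | { kind: 'mermaid'; source: string }
  | { kind: 'highlight'; language: string; source: string }
  | { kind: 'plain'; source: string }

export function routeCodeBlock(className: string | undefined, children: string): CodeRenderTarget {
  // Matches e.g. "language-ts", "language-c++", "language-objective-c".
  const match = /(?:^|\s)language-([\w+-]+)/.exec(className ?? '')
  // Fenced code content typically ends with a newline; trim it for display and copying.
  const source = children.replace(/\n$/, '')
  if (!match) return { kind: 'plain', source }
  const language = match[1].toLowerCase()
  if (language === 'mermaid') return { kind: 'mermaid', source }
  return { kind: 'highlight', language, source }
}
```

In a react-markdown `components.code` override, a `mermaid` result would render `MermaidDiagram`, `highlight` would pass `language` to react-syntax-highlighter, and `plain` could stay unstyled.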

---

### Priority 3: Text-to-Speech Integration (Week 2-3)

**Impact**: High | **Complexity**: Medium | **Dependencies**: ElevenLabs API

#### Features - Text-to-Speech

- [ ] ElevenLabs API integration
- [ ] Voice selection UI
- [ ] Per-message TTS toggle
- [ ] Speech controls (play/pause/stop)
- [ ] Voice settings (speed, stability, clarity)
- [ ] Audio queue management
- [ ] Local fallback (Web Speech API)
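
The local fallback can be expressed as a TTS abstraction that tries providers in order and moves on when one is unavailable or fails. A minimal sketch assuming a hypothetical `TtsProvider` interface; a Web Speech provider would wrap `speechSynthesis.speak()` with a `SpeechSynthesisUtterance` and resolve on its `end` event.

```typescript
// Hypothetical TTS abstraction for src/lib/tts.ts.
export interface TtsProvider {
  name: string
  isAvailable(): boolean            // e.g. API key configured, or speechSynthesis present
  speak(text: string): Promise<void>
}

// Try each provider in order; returns the name of the provider that actually spoke.
export async function speakWithFallback(
  text: string,
  providers: TtsProvider[],
  onFallback?: (failed: string, error: unknown) => void,
): Promise<string> {
  for (const provider of providers) {
    if (!provider.isAvailable()) continue
    try {
      await provider.speak(text)
      return provider.name
    } catch (error) {
      // Report the failure (quota, network, ...) and fall through to the next provider.
      onFallback?.(provider.name, error)
    }
  }
  throw new Error('No TTS provider could speak the text')
}
```

Ordering the list as `[elevenLabs, webSpeech]` gives the premium voice by default and the free local voice when the API is unreachable or out of quota.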

#### Technical Approach - Text-to-Speech

**Dependencies to Add**:

```json
{
  "elevenlabs": "^0.8.0"
}
```

**New Rust Dependencies** (Cargo.toml):

```toml
rodio = "0.17" # Audio playback
```

#### Files to Create/Modify - Text-to-Speech

- `src/lib/elevenlabs.ts` - ElevenLabs API client
- `src/lib/tts.ts` - TTS abstraction layer with fallback
- `src/components/TTSControls.tsx` - Voice playback controls
- `src/components/VoiceSettings.tsx` - Voice configuration UI
- `src-tauri/src/audio.rs` - Audio playback module (Rust)
- `src-tauri/src/main.rs` - Add audio commands
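
A fetch-based sketch of the requests `src/lib/elevenlabs.ts` might build, which sidesteps differences between SDK versions. The endpoint paths, the `xi-api-key` header, and the `model_id`/`voice_settings` body fields reflect ElevenLabs' REST API as we understand it and should be checked against the current API reference; the default model ID and mapping the plan's "clarity" setting to `similarity_boost` are assumptions.

```typescript
// Hypothetical request builders for src/lib/elevenlabs.ts (plain fetch, no SDK).
const ELEVENLABS_BASE = 'https://api.elevenlabs.io/v1'

export interface VoiceSettings {
  stability: number       // 0..1
  similarityBoost: number // 0..1; assumed to back the "clarity" setting in this plan
}

export interface RequestSpec {
  url: string
  init: { method: string; headers: Record<string, string>; body?: string }
}

// GET the account's available voices (for the voice selection UI).
export function listVoicesRequest(apiKey: string): RequestSpec {
  return { url: `${ELEVENLABS_BASE}/voices`, init: { method: 'GET', headers: { 'xi-api-key': apiKey } } }
}

// POST text for synthesis; the response body is audio (MP3 requested via Accept).
export function textToSpeechRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  settings: VoiceSettings,
  modelId = 'eleven_multilingual_v2', // assumed default; make it configurable
): RequestSpec {
  const clamp = (x: number) => Math.min(1, Math.max(0, x))
  return {
    url: `${ELEVENLABS_BASE}/text-to-speech/${encodeURIComponent(voiceId)}`,
    init: {
      method: 'POST',
      headers: { 'xi-api-key': apiKey, 'Content-Type': 'application/json', Accept: 'audio/mpeg' },
      body: JSON.stringify({
        text,
        model_id: modelId,
        voice_settings: {
          stability: clamp(settings.stability),
          similarity_boost: clamp(settings.similarityBoost),
        },
      }),
    },
  }
}

// Usage (needs network access and a real key):
//   const { url, init } = textToSpeechRequest(key, voiceId, 'Hello', { stability: 0.5, similarityBoost: 0.75 })
//   const audio = await (await fetch(url, init)).arrayBuffer() // bytes for the playback queue
```

Separating request construction from `fetch` keeps the client testable without network access or an API key.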

#### Implementation Steps

1. Create ElevenLabs API client with voice listing
2. Add voice selection to settings
3. Implement audio playback queue
4. Add per-message TTS buttons
5. Create global audio controls
6. Implement Web Speech API fallback
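
Step 3's playback queue can stay independent of the audio backend if starting playback is injected. A minimal sketch with a hypothetical `StartPlayback` callback, which could wrap an `HTMLAudioElement`, a Tauri command backed by rodio, or `speechSynthesis`: items play one at a time, and `stop()` interrupts the current item and discards the rest.

```typescript
// Hypothetical playback queue for the TTS pipeline.
export interface Playback {
  done: Promise<void> // settles when the item finishes or is stopped
  stop(): void        // interrupt this item
}
export type StartPlayback<T> = (item: T) => Playback

export class PlaybackQueue<T> {
  private items: T[] = []
  private current: Playback | null = null
  private draining = false

  constructor(private readonly start: StartPlayback<T>) {}

  enqueue(item: T): void {
    this.items.push(item)
    if (!this.draining) void this.drain()
  }

  // Global "stop": interrupt the current item and drop everything still queued.
  stop(): void {
    this.items = []
    this.current?.stop()
  }

  get pending(): number {
    return this.items.length
  }

  private async drain(): Promise<void> {
    this.draining = true
    while (this.items.length > 0) {
      const item = this.items.shift() as T
      try {
        this.current = this.start(item)
        await this.current.done
      } catch {
        // A failed item must not block the rest of the queue.
      } finally {
        this.current = null
      }
    }
    this.draining = false
  }
}
```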

---

### Priority 4: Speech-to-Text Integration (Week 3-4)

**Impact**: High | **Complexity**: Medium-High | **Dependencies**: Web Speech API or Whisper

#### Features - Speech-to-Text

- [ ] Push-to-talk button
- [ ] Continuous listening mode
- [ ] Voice activity detection (VAD)
- [ ] Visual feedback (waveform/mic indicator)
- [ ] Keyboard shortcut for voice input
- [ ] Language selection
- [ ] Fallback to Web Speech API
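
For VAD, a simple starting point is an energy (RMS) threshold with a short hangover so brief pauses don't end an utterance. This is a swapped-in heuristic, not a trained VAD model: the default threshold and hangover are placeholders to tune per microphone, and frames would come from something like Web Audio's `AnalyserNode.getFloatTimeDomainData()`. `EnergyVad` is a hypothetical name.

```typescript
// Hypothetical energy-based voice activity detector.
export function rms(frame: Float32Array): number {
  let sum = 0
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i]
  return frame.length > 0 ? Math.sqrt(sum / frame.length) : 0
}

export class EnergyVad {
  private silentFrames = 0
  private speaking = false

  constructor(
    private readonly threshold = 0.02,   // RMS level treated as speech (placeholder)
    private readonly hangoverFrames = 15, // quiet frames tolerated before speech ends (placeholder)
  ) {}

  // Feed one frame of samples in [-1, 1]; returns true while speech is considered active.
  process(frame: Float32Array): boolean {
    if (rms(frame) >= this.threshold) {
      this.speaking = true
      this.silentFrames = 0
    } else if (this.speaking && ++this.silentFrames > this.hangoverFrames) {
      this.speaking = false
      this.silentFrames = 0
    }
    return this.speaking
  }
}
```

The same `rms` value can drive the mic-level indicator, so the visual feedback and the VAD agree on what counts as speech.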

#### Technical Approach - Speech-to-Text

##### Option A: Web Speech API (Browser)

- Zero cost (no per-request API fees)
- Offline use is not guaranteed: Chrome's implementation, for example, sends audio to a server-side recognizer
- Limited accuracy, browser-dependent; availability must be verified in each Tauri platform webview (WebView2, WKWebView, WebKitGTK)
- Good for MVP

##### Option B: OpenAI Whisper API

- High accuracy
- Billed per minute of transcribed audio
- Better for production

**Recommendation**: Start with Web Speech API, add Whisper as optional upgrade
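
With the Web Speech API, `SpeechRecognitionEvent.results` accumulates results during a continuous session, each flagged `isFinal` and holding ranked alternatives. A sketch of how `stt.ts` might split them into committed and in-progress text for display; the structural types are declared locally because `SpeechRecognition` typings are not available in every TypeScript setup, and `splitTranscript` is a hypothetical name.

```typescript
// Minimal structural types matching the shape of SpeechRecognitionEvent.results.
interface AlternativeLike { transcript: string }
interface ResultLike { isFinal: boolean; length: number; [index: number]: AlternativeLike }
interface ResultListLike { length: number; [index: number]: ResultLike }

// Split accumulated results into final and interim text, using each result's top alternative.
export function splitTranscript(results: ResultListLike): { final: string; interim: string } {
  const finalParts: string[] = []
  const interimParts: string[] = []
  for (let i = 0; i < results.length; i++) {
    const result = results[i]
    if (result.length === 0) continue
    ;(result.isFinal ? finalParts : interimParts).push(result[0].transcript)
  }
  // Engines differ on leading spaces between results, so normalize whitespace.
  const tidy = (parts: string[]) => parts.join(' ').replace(/\s+/g, ' ').trim()
  return { final: tidy(finalParts), interim: tidy(interimParts) }
}

// Wiring sketch (browser/webview only; the constructor may be prefixed or missing):
//   const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition
//   const rec = new Ctor(); rec.continuous = true; rec.interimResults = true; rec.lang = 'en-US'
//   rec.onresult = (e: { results: ResultListLike }) => render(splitTranscript(e.results))
```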

#### Files to Create/Modify - Speech-to-Text

- `src/lib/stt.ts` - STT abstraction layer
- `src/lib/whisper.ts` - OpenAI Whisper client (optional)
- `src/components/VoiceInput.tsx` - Microphone button and controls
- `src/components/WaveformVisualizer.tsx` - Audio visualization
- `src/hooks/useVoiceRecording.ts` - Voice recording hook

---

### Priority 5: File Attachment Support (Week 4)

**Impact**: Medium | **Complexity**: Medium | **Dependencies**: None

#### Features - File Attachments

- [ ] File upload UI (drag & drop + button)
- [ ] Image preview and analysis
- [ ] PDF text extraction
- [ ] File size limits
- [ ] Multiple file support
- [ ] File metadata display

#### Technical Approach - File Attachments

**Dependencies to Add**:

```json
{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}
```

**Rust Dependencies** (if needed for file processing):

```toml
pdf-extract = "0.7"
image = "0.24"
```

#### Files to Create/Modify - File Attachments

- `src/components/FileUpload.tsx` - Drag & drop file upload
- `src/components/FilePreview.tsx` - Preview attached files
- `src/lib/fileProcessor.ts` - Extract text from various formats
- `src-tauri/src/file_handler.rs` - File processing in Rust
- Update `chatStore.ts` - Add attachments to messages
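
File size limits and strict type validation could start with a check that enforces the size first and then sniffs well-known magic numbers (PNG, JPEG, GIF, `%PDF-`) instead of trusting the extension. A sketch in which the 10 MiB limit, the text-extension allowlist, and the NUL-byte heuristic for text files are all assumptions; the `file-type` package from the dependency list could replace the hand-written signature table.

```typescript
// Hypothetical attachment validator for src/lib/fileProcessor.ts.
export type AttachmentKind = 'png' | 'jpeg' | 'gif' | 'pdf' | 'text'

export const MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024 // 10 MiB (assumed limit)

const SIGNATURES: Array<{ kind: AttachmentKind; bytes: number[] }> = [
  { kind: 'png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { kind: 'jpeg', bytes: [0xff, 0xd8, 0xff] },
  { kind: 'gif', bytes: [0x47, 0x49, 0x46, 0x38] },       // "GIF8"
  { kind: 'pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
]

// Text and code files have no magic number, so they are accepted by extension (assumed list).
const TEXT_EXTENSIONS = new Set(['txt', 'md', 'json', 'ts', 'tsx', 'js', 'py', 'rs', 'toml'])

export type ValidationResult = { ok: true; kind: AttachmentKind } | { ok: false; reason: string }

export function validateAttachment(
  fileName: string,
  data: Uint8Array,
  maxBytes = MAX_ATTACHMENT_BYTES,
): ValidationResult {
  if (data.length === 0) return { ok: false, reason: 'empty file' }
  if (data.length > maxBytes) return { ok: false, reason: `file exceeds ${maxBytes} bytes` }
  // Content sniffing wins over the extension.
  for (const sig of SIGNATURES) {
    if (sig.bytes.every((b, i) => data[i] === b)) return { ok: true, kind: sig.kind }
  }
  // Heuristic: a NUL byte in the first 8 KiB suggests binary content.
  const ext = fileName.split('.').pop()?.toLowerCase() ?? ''
  if (TEXT_EXTENSIONS.has(ext) && !data.subarray(0, 8192).includes(0)) {
    return { ok: true, kind: 'text' }
  }
  return { ok: false, reason: 'unsupported or unrecognized file type' }
}
```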

---

### Priority 6: System Integration (Week 5)

**Impact**: Medium | **Complexity**: Medium-High | **Dependencies**: Tauri capabilities

#### Features - System Integration

- [ ] Global keyboard shortcuts
- [ ] System tray icon
- [ ] Quick launch hotkey
- [ ] Desktop notifications
- [ ] Minimize to tray
- [ ] Auto-start option

#### Technical Approach - System Integration

**Tauri Features to Enable** (tauri.conf.json, Tauri v1 schema):

```json
{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}
```

Under Tauri v1, the tray also requires the `system-tray` feature on the `tauri` crate in `Cargo.toml`; Tauri v2 uses a different layout (tray settings live under `app.trayIcon`).

#### Files to Create/Modify - System Integration

- `src-tauri/src/tray.rs` - System tray implementation
- `src-tauri/src/shortcuts.rs` - Global shortcut handler
- `src/components/NotificationSettings.tsx` - Notification preferences
- Update `src-tauri/tauri.conf.json` - Enable system tray
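
Letting users pick their own global shortcut means converting a recorded key press into an accelerator string. A hypothetical helper producing strings such as `CommandOrControl+Shift+Space`, the style of accelerator Tauri's global-shortcut API takes. Only letters, digits, F1-F12, and Space are mapped, and Ctrl and Cmd are deliberately collapsed into `CommandOrControl`, so a macOS Control-only shortcut is not distinguished.

```typescript
// Hypothetical helper for a shortcut settings UI.
interface KeyPress {
  code: string // KeyboardEvent.code, e.g. "KeyK", "Digit1", "Space", "F5"
  ctrlKey: boolean
  metaKey: boolean
  altKey: boolean
  shiftKey: boolean
}

function keyName(code: string): string | null {
  if (/^Key[A-Z]$/.test(code)) return code.slice(3)   // "KeyK" -> "K"
  if (/^Digit[0-9]$/.test(code)) return code.slice(5) // "Digit1" -> "1"
  if (/^F([1-9]|1[0-2])$/.test(code)) return code     // "F1".."F12"
  if (code === 'Space') return code
  return null // unmapped key
}

export function toAccelerator(e: KeyPress): string | null {
  const key = keyName(e.code)
  if (key === null) return null
  const parts: string[] = []
  if (e.ctrlKey || e.metaKey) parts.push('CommandOrControl') // simplification noted above
  if (e.altKey) parts.push('Alt')
  if (e.shiftKey) parts.push('Shift')
  // Require a modifier so a global shortcut cannot swallow ordinary typing.
  if (parts.length === 0) return null
  return [...parts, key].join('+')
}
```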

---

## Additional Improvements

### Code Quality

- [ ] Add unit tests for new features
- [ ] Integration tests for API clients
- [ ] E2E tests with Playwright
- [ ] Error boundary components
- [ ] Comprehensive error handling

### Performance

- [ ] Lazy load heavy components
- [ ] Virtual scrolling for long conversations
- [ ] Optimize re-renders with React.memo
- [ ] Audio streaming optimization
- [ ] File upload progress indicators
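
Virtual scrolling comes down to rendering only the rows that intersect the viewport, plus a small overscan. A sketch of that window calculation under a fixed-row-height assumption. Chat messages vary in height, so a real implementation would measure rows, as libraries such as `react-window` or `@tanstack/react-virtual` do. The function name is hypothetical.

```typescript
// Hypothetical windowing math for a virtualized message list (fixed row height assumed).
export interface VisibleRange {
  start: number     // first index to render (inclusive)
  end: number       // last index to render (exclusive)
  offsetTop: number // px offset at which the first rendered row is positioned
}

export function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  itemCount: number,
  overscan = 3,
): VisibleRange {
  if (itemCount === 0 || rowHeight <= 0) return { start: 0, end: 0, offsetTop: 0 }
  // First row touching the viewport, clamped so large scroll offsets stay in range.
  const first = Math.min(itemCount - 1, Math.floor(Math.max(0, scrollTop) / rowHeight))
  const visibleCount = Math.ceil(viewportHeight / rowHeight)
  const start = Math.max(0, first - overscan)
  const end = Math.min(itemCount, first + visibleCount + overscan)
  return { start, end, offsetTop: start * rowHeight }
}
```

For example, with 50 px rows, a 500 px viewport, and `scrollTop` 1000, rows 20-29 are visible and the default overscan of 3 renders indices 17-32, positioned 850 px from the top.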

### UX Polish

- [ ] Loading skeletons
- [ ] Toast notifications
- [ ] Keyboard navigation improvements
- [ ] Accessibility audit
- [ ] Responsive design refinements

---

## Dependencies Summary

### New npm Packages

```bash
npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter
```

### New Rust Crates

```toml
# Add to src-tauri/Cargo.toml
rodio = "0.17" # Audio playback
pdf-extract = "0.7" # PDF processing (optional)
image = "0.24" # Image processing (optional)
```

---

## Testing Strategy

### Manual Testing Checklist

- [ ] All conversation operations (save/load/export)
- [ ] Markdown rendering with various content types
- [ ] TTS with different voices and settings
- [ ] STT in push-to-talk and continuous modes
- [ ] File uploads (images, PDFs, code files)
- [ ] Keyboard shortcuts on all platforms
- [ ] System tray interactions

### Automated Tests

- [ ] Unit tests for utility functions
- [ ] Integration tests for API clients
- [ ] Component tests with React Testing Library
- [ ] E2E tests for critical user flows

---

## Risk Mitigation

### Known Risks

1. **API Costs**: ElevenLabs and Whisper can be expensive
   - **Mitigation**: Use free Web Speech API as default, make premium APIs optional

2. **Audio Latency**: TTS/STT pipeline may feel slow
   - **Mitigation**: Stream audio where possible, show clear loading states

3. **Cross-platform Issues**: Audio/shortcuts may behave differently
   - **Mitigation**: Test on Linux/macOS/Windows early and often

4. **File Security**: Handling user files safely
   - **Mitigation**: Strict file type validation, size limits, sandboxing

---

## Success Criteria

Phase 2 is complete when:

- ✅ Users can save, load, and export conversations
- ✅ Messages render with proper code highlighting and formatting
- ✅ TTS works with at least one voice provider
- ✅ STT works with Web Speech API
- ✅ Users can attach and discuss files
- ✅ Basic keyboard shortcuts are functional
- ✅ System tray integration works on Linux
- ✅ All features are documented
- ✅ No critical bugs or performance issues

---

## Timeline Estimate

**Optimistic**: 4 weeks
**Realistic**: 5-6 weeks
**Conservative**: 8 weeks

Depends on:

- Time available per week
- API complexity/issues
- Cross-platform testing needs
- Feature scope adjustments

---

## Next Steps

1. **Install dependencies** for conversation management and markdown rendering
2. **Implement conversation store** and basic save/load
3. **Create ConversationList component** for browsing history
4. **Enhance message rendering** with react-markdown and syntax highlighting
5. **Integrate ElevenLabs TTS** with settings UI
6. **Add voice input** with Web Speech API
7. **Implement file attachments** with preview
8. **Add system tray** and keyboard shortcuts

---

**Last Updated**: October 5, 2025
**Status**: Ready to begin implementation