# Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)

**Status**: 🚀 In Progress
**Start Date**: October 5, 2025
**Target Completion**: TBD

## Overview

Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.

## Implementation Priority Order

### Priority 1: Conversation Management (Week 1)

**Impact**: High | **Complexity**: Low | **Foundation for**: Export features, history search

#### Features - Conversation Management

- [x] Store structure already supports this (chatStore)
- [ ] Save conversations to local storage/file system
- [ ] Load previous conversations
- [ ] Export conversations (JSON, Markdown, TXT)
- [ ] Conversation metadata (title, tags, date)
- [ ] Conversation list/browser UI

#### Technical Approach - Conversation Management

```typescript
// New store: conversationStore.ts
interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number
  updated: number
  tags: string[]
  model: string
}
```

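The export feature listed above could build directly on this interface. The following is a minimal sketch, not the planned implementation: the `ChatMessage` shape, the `exportConversation` helper, and the reading of `created`/`updated` as Unix epoch milliseconds are all assumptions made for illustration.

```typescript
// Assumed message shape; the real ChatMessage lives in chatStore and may differ.
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

interface Conversation {
  id: string;
  title: string;
  messages: ChatMessage[];
  created: number; // assumed: Unix epoch milliseconds
  updated: number; // assumed: Unix epoch milliseconds
  tags: string[];
  model: string;
}

type ExportFormat = "json" | "markdown" | "txt";

// Hypothetical helper: serialize a conversation into one of the planned formats.
function exportConversation(conv: Conversation, format: ExportFormat): string {
  switch (format) {
    case "json":
      // Round-trippable: the same text can later be parsed to load the conversation.
      return JSON.stringify(conv, null, 2);
    case "markdown": {
      const header =
        `# ${conv.title}\n\n` +
        `- Model: ${conv.model}\n` +
        `- Tags: ${conv.tags.join(", ") || "none"}\n` +
        `- Created: ${new Date(conv.created).toISOString()}`;
      const body = conv.messages.map((m) => `**${m.role}**\n\n${m.content}`);
      return [header, ...body].join("\n\n") + "\n";
    }
    case "txt":
      return conv.messages.map((m) => `${m.role}: ${m.content}`).join("\n") + "\n";
  }
}
```

Keeping the serializers as pure functions would let `ConversationExport.tsx` stay a thin UI layer, with the actual file write delegated to a Tauri command.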
#### Files to Create/Modify - Conversation Management

- `src/stores/conversationStore.ts` - New conversation management store
- `src/components/ConversationList.tsx` - Browse saved conversations
- `src/components/ConversationExport.tsx` - Export functionality
- `src-tauri/src/main.rs` - Add file system commands for save/load

---

### Priority 2: Advanced Message Formatting (Week 1-2)

**Impact**: High | **Complexity**: Medium | **Dependencies**: None

#### Features - Advanced Message Formatting

- [ ] Code syntax highlighting
- [ ] Markdown rendering with proper styling
- [ ] LaTeX/Math equation support
- [ ] Mermaid diagram rendering
- [ ] Copy code blocks to clipboard
- [ ] Collapsible code sections

#### Technical Approach - Advanced Message Formatting

**Dependencies to Add**:

```json
{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}
```

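In practice `react-markdown` would do the parsing and hand code nodes to a custom renderer through its `components` prop. To illustrate what a `CodeBlock` component needs to receive (a language tag and the raw code, for highlighting and copy-to-clipboard), here is a dependency-free sketch. The `splitFencedBlocks` helper and the `Segment` type are hypothetical names, and the regular expression only handles simple, non-nested fences.

```typescript
type Segment =
  | { kind: "text"; text: string }
  | { kind: "code"; lang: string; code: string };

// Three backticks, written as escapes so this example can itself sit inside a fence.
const FENCE = "\x60\x60\x60";

// Hypothetical helper: split message content into plain-text and fenced-code segments.
function splitFencedBlocks(content: string): Segment[] {
  const pattern = new RegExp(FENCE + "([\\w+-]*)\\n([\\s\\S]*?)" + FENCE, "g");
  const segments: Segment[] = [];
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    if (match.index > last) {
      segments.push({ kind: "text", text: content.slice(last, match.index) });
    }
    // Fall back to "text" when the fence carries no language tag.
    segments.push({ kind: "code", lang: match[1] || "text", code: match[2] });
    last = match.index + match[0].length;
  }
  if (last < content.length) {
    segments.push({ kind: "text", text: content.slice(last) });
  }
  return segments;
}
```

Each `code` segment carries exactly the string a copy button would place on the clipboard, which is the main reason to keep the raw code separate from its highlighted rendering.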
#### Files to Create/Modify - Advanced Message Formatting

- `src/components/MessageContent.tsx` - Enhanced message renderer
- `src/components/CodeBlock.tsx` - Code block with syntax highlighting
- `src/components/MermaidDiagram.tsx` - Mermaid diagram renderer
- `src/lib/markdown.ts` - Markdown processing utilities

---

### Priority 3: Text-to-Speech Integration (Week 2-3)

**Impact**: High | **Complexity**: Medium | **Dependencies**: ElevenLabs API

#### Features - Text-to-Speech

- [ ] ElevenLabs API integration
- [ ] Voice selection UI
- [ ] Per-message TTS toggle
- [ ] Speech controls (play/pause/stop)
- [ ] Voice settings (speed, stability, clarity)
- [ ] Audio queue management
- [ ] Local fallback (Web Speech API)

#### Technical Approach - Text-to-Speech

**Dependencies to Add**:

```json
{
  "elevenlabs": "^0.8.0"
}
```

**New Rust Dependencies** (Cargo.toml):

```toml
rodio = "0.17" # Audio playback
```

#### Files to Create/Modify - Text-to-Speech

- `src/lib/elevenlabs.ts` - ElevenLabs API client
- `src/lib/tts.ts` - TTS abstraction layer with fallback
- `src/components/TTSControls.tsx` - Voice playback controls
- `src/components/VoiceSettings.tsx` - Voice configuration UI
- `src-tauri/src/audio.rs` - Audio playback module (Rust)
- `src-tauri/src/main.rs` - Add audio commands

#### Implementation Steps

1. Create ElevenLabs API client with voice listing
2. Add voice selection to settings
3. Implement audio playback queue
4. Add per-message TTS buttons
5. Create global audio controls
6. Implement Web Speech API fallback

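Steps 3 and 6 above (playback queue and fallback) can be prototyped before any audio code exists. The sketch below is one possible shape for `src/lib/tts.ts`, under stated assumptions: `TtsProvider`, `pickProvider`, and `PlaybackQueue` are hypothetical names, and no ElevenLabs or Web Speech calls are shown.

```typescript
// Hypothetical provider contract; real providers would also expose a speak() method.
interface TtsProvider {
  name: string;
  isAvailable(): boolean;
}

// Return the first usable provider, so callers can list ElevenLabs first
// and the local Web Speech fallback last.
function pickProvider(providers: TtsProvider[]): TtsProvider | undefined {
  for (const provider of providers) {
    if (provider.isAvailable()) return provider;
  }
  return undefined;
}

// FIFO queue of message IDs waiting to be spoken.
class PlaybackQueue {
  private pending: string[] = [];
  private current: string | null = null;

  // Add a message; start it immediately when nothing is playing.
  enqueue(messageId: string): void {
    this.pending.push(messageId);
    if (this.current === null) this.advance();
  }

  // Call when the current clip finishes; returns the next message ID or null.
  advance(): string | null {
    this.current = this.pending.shift() ?? null;
    return this.current;
  }

  // Drop everything, as a global "stop" control would.
  stop(): void {
    this.pending = [];
    this.current = null;
  }

  nowPlaying(): string | null {
    return this.current;
  }

  size(): number {
    return this.pending.length;
  }
}
```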

---

### Priority 4: Speech-to-Text Integration (Week 3-4)

**Impact**: High | **Complexity**: Medium-High | **Dependencies**: Web Speech API or Whisper

#### Features - Speech-to-Text

- [ ] Push-to-talk button
- [ ] Continuous listening mode
- [ ] Voice activity detection (VAD)
- [ ] Visual feedback (waveform/mic indicator)
- [ ] Keyboard shortcut for voice input
- [ ] Language selection
- [ ] Fallback to Web Speech API

#### Technical Approach - Speech-to-Text

##### Option A: Web Speech API (Browser)

- Zero cost; offline use is not guaranteed, since some engines send audio to a remote service
- Limited accuracy, and support varies by browser or webview engine
- Good for MVP

##### Option B: OpenAI Whisper API

- High accuracy
- Costs per API call
- Better for production

**Recommendation**: Start with Web Speech API, add Whisper as optional upgrade

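Whichever option is chosen, the continuous listening mode needs some form of voice activity detection. A simple energy-threshold approach is sketched below; it is a stand-in for illustration, not a production-grade VAD. The `rms` and `detectSpeech` helpers, the `VadOptions` fields, and the assumption that audio arrives as `Float32Array` frames with samples in [-1, 1] are all hypothetical.

```typescript
// Root-mean-square energy of one audio frame (samples assumed in the range [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

interface VadOptions {
  threshold: number; // RMS level at or above which a frame counts as speech
  hangoverFrames: number; // quiet frames tolerated before speech is considered over
}

// For each frame, report whether speech is considered active.
function detectSpeech(frames: Float32Array[], opts: VadOptions): boolean[] {
  const active: boolean[] = [];
  // Number of consecutive quiet frames since the last loud one; start inactive.
  let quiet = opts.hangoverFrames + 1;
  for (const frame of frames) {
    if (rms(frame) >= opts.threshold) {
      quiet = 0;
    } else {
      quiet++;
    }
    active.push(quiet <= opts.hangoverFrames);
  }
  return active;
}
```

The hangover keeps short pauses between words from ending a recording early; both the threshold and the hangover length would need tuning against real microphone input.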
#### Files to Create/Modify - Speech-to-Text

- `src/lib/stt.ts` - STT abstraction layer
- `src/lib/whisper.ts` - OpenAI Whisper client (optional)
- `src/components/VoiceInput.tsx` - Microphone button and controls
- `src/components/WaveformVisualizer.tsx` - Audio visualization
- `src/hooks/useVoiceRecording.ts` - Voice recording hook

---

### Priority 5: File Attachment Support (Week 4)

**Impact**: Medium | **Complexity**: Medium | **Dependencies**: None

#### Features - File Attachments

- [ ] File upload UI (drag & drop + button)
- [ ] Image preview and analysis
- [ ] PDF text extraction
- [ ] File size limits
- [ ] Multiple file support
- [ ] File metadata display

#### Technical Approach - File Attachments

**Dependencies to Add**:

```json
{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}
```

**Rust Dependencies** (if needed for file processing):

```toml
pdf-extract = "0.7"
image = "0.24"
```

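The file size limits and type checks from the feature list can be expressed as a small pure function that runs before any file content is read. This is a sketch under assumptions: `AttachmentMeta`, `AttachmentPolicy`, and `validateAttachment` are hypothetical names, and the policy values in the usage note are placeholders. Extension checks alone are weak, so a real implementation would likely also inspect file contents, for example with the `file-type` package listed above.

```typescript
interface AttachmentMeta {
  name: string;
  sizeBytes: number;
}

interface AttachmentPolicy {
  maxBytes: number;
  allowedExtensions: string[]; // lowercase, without the leading dot
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Lowercased extension without the dot; empty for dotfiles and extensionless names.
function extensionOf(name: string): string {
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
}

// Hypothetical pre-upload check: reject disallowed types first, then oversized files.
function validateAttachment(file: AttachmentMeta, policy: AttachmentPolicy): ValidationResult {
  const ext = extensionOf(file.name);
  if (policy.allowedExtensions.indexOf(ext) === -1) {
    return { ok: false, reason: `File type ".${ext}" is not allowed` };
  }
  if (file.sizeBytes > policy.maxBytes) {
    return { ok: false, reason: `File exceeds the ${policy.maxBytes} byte limit` };
  }
  return { ok: true };
}
```

For example, with a policy of `{ maxBytes: 10 * 1024 * 1024, allowedExtensions: ["png", "jpg", "pdf", "txt"] }`, a 1 KiB `report.PDF` passes while `run.exe` is rejected with a reason the upload UI can display.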
#### Files to Create/Modify - File Attachments

- `src/components/FileUpload.tsx` - Drag & drop file upload
- `src/components/FilePreview.tsx` - Preview attached files
- `src/lib/fileProcessor.ts` - Extract text from various formats
- `src-tauri/src/file_handler.rs` - File processing in Rust
- Update `chatStore.ts` - Add attachments to messages

---

### Priority 6: System Integration (Week 5)

**Impact**: Medium | **Complexity**: Medium-High | **Dependencies**: Tauri capabilities

#### Features - System Integration

- [ ] Global keyboard shortcuts
- [ ] System tray icon
- [ ] Quick launch hotkey
- [ ] Desktop notifications
- [ ] Minimize to tray
- [ ] Auto-start option

#### Technical Approach - System Integration

**Tauri Features to Enable** (tauri.conf.json):

```json
{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}
```

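Global shortcuts tend to collide once several features register them, so it may help to validate bindings on the frontend before handing them to Tauri. The sketch below assumes accelerator strings in the `CommandOrControl+Shift+Space` style; the helper names and the action names in the usage note are hypothetical, and no Tauri API is called.

```typescript
type Platform = "linux" | "macos" | "windows";

// Expand the cross-platform modifier for display in a settings UI.
function displayShortcut(accelerator: string, platform: Platform): string {
  return accelerator
    .split("+")
    .map((part) => {
      const key = part.trim();
      if (key === "CommandOrControl") {
        return platform === "macos" ? "Cmd" : "Ctrl";
      }
      return key;
    })
    .join("+");
}

// Order-insensitive, case-insensitive form used only for comparing bindings.
function normalizeAccelerator(accelerator: string): string {
  return accelerator
    .split("+")
    .map((part) => part.trim().toLowerCase())
    .sort()
    .join("+");
}

// Report every action whose accelerator duplicates an earlier action's.
function findConflicts(bindings: Record<string, string>): string[] {
  const seen: Record<string, string> = {};
  const conflicts: string[] = [];
  for (const action of Object.keys(bindings)) {
    const key = normalizeAccelerator(bindings[action]);
    if (seen[key] !== undefined) {
      conflicts.push(`${action} conflicts with ${seen[key]}`);
    } else {
      seen[key] = action;
    }
  }
  return conflicts;
}
```

Given bindings such as `{ quickLaunch: "CommandOrControl+Shift+Space", voiceInput: "Shift+CommandOrControl+Space" }`, `findConflicts` reports the second action, since both resolve to the same key combination.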
#### Files to Create/Modify - System Integration

- `src-tauri/src/tray.rs` - System tray implementation
- `src-tauri/src/shortcuts.rs` - Global shortcut handler
- `src/components/NotificationSettings.tsx` - Notification preferences
- Update `src-tauri/tauri.conf.json` - Enable system tray

---

## Additional Improvements

### Code Quality

- [ ] Add unit tests for new features
- [ ] Integration tests for API clients
- [ ] E2E tests with Playwright
- [ ] Error boundary components
- [ ] Comprehensive error handling

### Performance

- [ ] Lazy load heavy components
- [ ] Virtual scrolling for long conversations
- [ ] Optimize re-renders with React.memo
- [ ] Audio streaming optimization
- [ ] File upload progress indicators

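The core of virtual scrolling is a small calculation: given the scroll offset, decide which rows need to be mounted. The sketch below assumes fixed-height rows, which is a simplification since chat messages vary in height and a real implementation would measure them or use a virtualization library. The `visibleRange` helper and its parameter names are hypothetical.

```typescript
interface VisibleRange {
  start: number; // index of the first row to render (inclusive)
  end: number; // index one past the last row to render (exclusive)
}

// Compute which rows intersect the viewport, padded by `overscan` rows on each side.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  rowCount: number,
  overscan: number = 2
): VisibleRange {
  if (rowCount === 0 || rowHeight <= 0) {
    return { start: 0, end: 0 };
  }
  const first = Math.floor(scrollTop / rowHeight);
  const lastExclusive = Math.ceil((scrollTop + viewportHeight) / rowHeight);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(rowCount, lastExclusive + overscan),
  };
}
```

With 1000 rows of 100 px in a 600 px viewport, scrolling to 5000 px yields rows 48 through 57, so only ten message components are mounted instead of a thousand.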
### UX Polish

- [ ] Loading skeletons
- [ ] Toast notifications
- [ ] Keyboard navigation improvements
- [ ] Accessibility audit
- [ ] Responsive design refinements

---

## Dependencies Summary

### New npm Packages

```bash
npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter
```

### New Rust Crates

```toml
# Add to src-tauri/Cargo.toml
rodio = "0.17" # Audio playback
pdf-extract = "0.7" # PDF processing (optional)
image = "0.24" # Image processing (optional)
```

---

## Testing Strategy

### Manual Testing Checklist

- [ ] All conversation operations (save/load/export)
- [ ] Markdown rendering with various content types
- [ ] TTS with different voices and settings
- [ ] STT in push-to-talk and continuous modes
- [ ] File uploads (images, PDFs, code files)
- [ ] Keyboard shortcuts on all platforms
- [ ] System tray interactions

### Automated Tests

- [ ] Unit tests for utility functions
- [ ] Integration tests for API clients
- [ ] Component tests with React Testing Library
- [ ] E2E tests for critical user flows

---

## Risk Mitigation

### Known Risks

1. **API Costs**: ElevenLabs and Whisper can be expensive
   - **Mitigation**: Use free Web Speech API as default, make premium APIs optional

2. **Audio Latency**: TTS/STT pipeline may feel slow
   - **Mitigation**: Stream audio where possible, show clear loading states

3. **Cross-platform Issues**: Audio/shortcuts may behave differently
   - **Mitigation**: Test on Linux/macOS/Windows early and often

4. **File Security**: Handling user files safely
   - **Mitigation**: Strict file type validation, size limits, sandboxing

---

## Success Criteria

Phase 2 is complete when:

- ✅ Users can save, load, and export conversations
- ✅ Messages render with proper code highlighting and formatting
- ✅ TTS works with at least one voice provider
- ✅ STT works with Web Speech API
- ✅ Users can attach and discuss files
- ✅ Basic keyboard shortcuts are functional
- ✅ System tray integration works on Linux
- ✅ All features are documented
- ✅ No critical bugs or performance issues

---

## Timeline Estimate

**Optimistic**: 4 weeks
**Realistic**: 5-6 weeks
**Conservative**: 8 weeks

Depends on:

- Time available per week
- API complexity/issues
- Cross-platform testing needs
- Feature scope adjustments

---

## Next Steps

1. **Install dependencies** for conversation management and markdown rendering
2. **Implement conversation store** and basic save/load
3. **Create ConversationList component** for browsing history
4. **Enhance message rendering** with react-markdown and syntax highlighting
5. **Integrate ElevenLabs TTS** with settings UI
6. **Add voice input** with Web Speech API
7. **Implement file attachments** with preview
8. **Add system tray** and keyboard shortcuts

---

**Last Updated**: October 5, 2025
**Status**: Ready to begin implementation