Initial commit

2025-10-06 00:33:04 +01:00
commit 66749a5ce7
71 changed files with 22041 additions and 0 deletions
--- a/docs/planning/PHASE2_PLAN.md
+++ b/docs/planning/PHASE2_PLAN.md
@@ -0,0 +1,395 @@
+# Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)
+
+**Status**: 🚀 In Progress  
+**Start Date**: October 5, 2025  
+**Target Completion**: TBD  
+
+## Overview
+
+Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.
+
+## Implementation Priority Order
+
+### Priority 1: Conversation Management (Week 1)
+
+**Impact**: High | **Complexity**: Low | **Foundation for**: Export features, history search
+
+#### Features - Conversation Management
+
+- [x] Store structure already supports this (chatStore)
+- [ ] Save conversations to local storage/file system
+- [ ] Load previous conversations
+- [ ] Export conversations (JSON, Markdown, TXT)
+- [ ] Conversation metadata (title, tags, date)
+- [ ] Conversation list/browser UI
+
+#### Technical Approach - Conversation Management
+
+```typescript
+// New store: conversationStore.ts
+interface Conversation {
+  id: string
+  title: string
+  messages: ChatMessage[]
+  created: number
+  updated: number
+  tags: string[]
+  model: string
+}
+```
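+
+A minimal sketch of the Markdown export path, assuming `ChatMessage` carries a `role` and a `content` field (the real shape lives in `chatStore` and may differ) and that `created` is an epoch timestamp in milliseconds. The `conversationToMarkdown` helper and the `ExportableConversation` type are hypothetical names used for illustration only:
+
+```typescript
+// Assumed message shape; adjust to match the actual chatStore type.
+interface ChatMessage {
+  role: 'user' | 'assistant' | 'system'
+  content: string
+}
+
+// Only the Conversation fields the exporter needs.
+type ExportableConversation = {
+  title: string
+  messages: ChatMessage[]
+  created: number // assumed: epoch milliseconds
+  tags: string[]
+  model: string
+}
+
+// Render a conversation as a Markdown document: a metadata header
+// followed by one section per message.
+export function conversationToMarkdown(c: ExportableConversation): string {
+  const header = [
+    `# ${c.title}`,
+    '',
+    `- Created: ${new Date(c.created).toISOString()}`,
+    `- Model: ${c.model}`,
+    `- Tags: ${c.tags.length > 0 ? c.tags.join(', ') : 'none'}`,
+  ]
+  const body = c.messages.flatMap((m) => ['', `## ${m.role}`, '', m.content])
+  return [...header, ...body].join('\n') + '\n'
+}
+```
+
+The JSON export can likely be a plain `JSON.stringify` of the same object, so only the Markdown and TXT formats need a dedicated renderer.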
+
+#### Files to Create/Modify - Conversation Management
+
+- `src/stores/conversationStore.ts` - New conversation management store
+- `src/components/ConversationList.tsx` - Browse saved conversations
+- `src/components/ConversationExport.tsx` - Export functionality
+- `src-tauri/src/main.rs` - Add file system commands for save/load
+
+---
+
+### Priority 2: Advanced Message Formatting (Week 1-2)
+
+**Impact**: High | **Complexity**: Medium | **Dependencies**: None
+
+#### Features - Advanced Message Formatting
+
+- [ ] Code syntax highlighting
+- [ ] Markdown rendering with proper styling
+- [ ] LaTeX/Math equation support
+- [ ] Mermaid diagram rendering
+- [ ] Copy code blocks to clipboard
+- [ ] Collapsible code sections
+
+#### Technical Approach - Advanced Message Formatting
+
+**Dependencies to Add**:
+
+```json
+{
+  "react-markdown": "^9.0.1",
+  "react-syntax-highlighter": "^15.5.0",
+  "rehype-katex": "^7.0.0",
+  "remark-math": "^6.0.0",
+  "remark-gfm": "^4.0.0",
+  "mermaid": "^10.6.1"
+}
+```
+
+#### Files to Create/Modify - Advanced Message Formatting
+
+- `src/components/MessageContent.tsx` - Enhanced message renderer
+- `src/components/CodeBlock.tsx` - Code block with syntax highlighting
+- `src/components/MermaidDiagram.tsx` - Mermaid diagram renderer
+- `src/lib/markdown.ts` - Markdown processing utilities
+
+---
+
+### Priority 3: Text-to-Speech Integration (Week 2-3)
+
+**Impact**: High | **Complexity**: Medium | **Dependencies**: ElevenLabs API
+
+#### Features - Text-to-Speech
+
+- [ ] ElevenLabs API integration
+- [ ] Voice selection UI
+- [ ] Per-message TTS toggle
+- [ ] Speech controls (play/pause/stop)
+- [ ] Voice settings (speed, stability, clarity)
+- [ ] Audio queue management
+- [ ] Local fallback (Web Speech API)
+
+#### Technical Approach - Text-to-Speech
+
+**Dependencies to Add**:
+
+```json
+{
+  "elevenlabs": "^0.8.0"
+}
+```
+
+**New Rust Dependencies** (Cargo.toml):
+
+```toml
+rodio = "0.17"  # Audio playback
+```
+
+#### Files to Create/Modify - Text-to-Speech
+
+- `src/lib/elevenlabs.ts` - ElevenLabs API client
+- `src/lib/tts.ts` - TTS abstraction layer with fallback
+- `src/components/TTSControls.tsx` - Voice playback controls
+- `src/components/VoiceSettings.tsx` - Voice configuration UI
+- `src-tauri/src/audio.rs` - Audio playback module (Rust)
+- `src-tauri/src/main.rs` - Add audio commands
+
+#### Implementation Steps
+
+1. Create ElevenLabs API client with voice listing
+2. Add voice selection to settings
+3. Implement audio playback queue
+4. Add per-message TTS buttons
+5. Create global audio controls
+6. Implement Web Speech API fallback
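+
+The fallback in step 6 could be structured as an ordered list of providers that are tried in turn. This is a sketch only: `TtsProvider`, `webSpeechProvider`, and `speakWithFallback` are hypothetical names, and the ElevenLabs provider is omitted because its client API is not specified in this plan. The local provider relies on the standard `speechSynthesis` browser API:
+
+```typescript
+// Hypothetical provider contract for src/lib/tts.ts.
+export interface TtsProvider {
+  name: string
+  speak(text: string): Promise<void>
+}
+
+// Local fallback built on the standard Web Speech synthesis API.
+export const webSpeechProvider: TtsProvider = {
+  name: 'web-speech',
+  speak(text: string): Promise<void> {
+    return new Promise<void>((resolve, reject) => {
+      if (typeof speechSynthesis === 'undefined') {
+        reject(new Error('speechSynthesis is not available'))
+        return
+      }
+      const utterance = new SpeechSynthesisUtterance(text)
+      utterance.onend = () => resolve()
+      utterance.onerror = (e) => reject(new Error(`speech error: ${e.error}`))
+      speechSynthesis.speak(utterance)
+    })
+  },
+}
+
+// Try each provider in order and return the name of the one that succeeded.
+export async function speakWithFallback(
+  providers: TtsProvider[],
+  text: string,
+): Promise<string> {
+  const failures: string[] = []
+  for (const provider of providers) {
+    try {
+      await provider.speak(text)
+      return provider.name
+    } catch (err) {
+      failures.push(`${provider.name}: ${String(err)}`)
+    }
+  }
+  throw new Error(`all TTS providers failed (${failures.join('; ')})`)
+}
+```
+
+Placing the premium provider first and `webSpeechProvider` last keeps the free path available whenever the API call fails or no key is configured.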
+
+---
+
+### Priority 4: Speech-to-Text Integration (Week 3-4)
+
+**Impact**: High | **Complexity**: Medium-High | **Dependencies**: Web Speech API or Whisper
+
+#### Features - Speech-to-Text
+
+- [ ] Push-to-talk button
+- [ ] Continuous listening mode
+- [ ] Voice activity detection (VAD)
+- [ ] Visual feedback (waveform/mic indicator)
+- [ ] Keyboard shortcut for voice input
+- [ ] Language selection
+- [ ] Web Speech API as the default backend, with Whisper as an optional upgrade
+
+#### Technical Approach - Speech-to-Text
+
+##### Option A: Web Speech API (Browser)
+
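+- Zero cost, no API key required
+- Limited accuracy; support varies by browser and webview, and some engines (Chrome, for example) send audio to a remote service, so offline use is not guaranteed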
+- Good for MVP
+
+##### Option B: OpenAI Whisper API
+
+- High accuracy
+- Costs per API call
+- Better for production
+
+**Recommendation**: Start with Web Speech API, add Whisper as optional upgrade
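+
+One way to express this recommendation in `src/lib/stt.ts` is a small backend selector. The sketch below is an assumption, not the planned implementation: `SttBackend`, `SpeechGlobals`, and `pickSttBackend` are hypothetical names. It checks both the standard and the `webkit`-prefixed constructor because some engines expose only the prefixed one:
+
+```typescript
+export type SttBackend = 'web-speech' | 'whisper' | 'none'
+
+// The two global constructors probed for Web Speech recognition support.
+export type SpeechGlobals = {
+  SpeechRecognition?: unknown
+  webkitSpeechRecognition?: unknown
+}
+
+// Prefer the free Web Speech API; use Whisper only when it is the sole
+// option and an API key has been configured.
+export function pickSttBackend(
+  globals: SpeechGlobals,
+  whisperApiKey?: string,
+): SttBackend {
+  if (globals.SpeechRecognition || globals.webkitSpeechRecognition) {
+    return 'web-speech'
+  }
+  return whisperApiKey ? 'whisper' : 'none'
+}
+```
+
+Taking the globals as a parameter instead of reading `window` directly keeps the selector testable outside a webview.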
+
+#### Files to Create/Modify - Speech-to-Text
+
+- `src/lib/stt.ts` - STT abstraction layer
+- `src/lib/whisper.ts` - OpenAI Whisper client (optional)
+- `src/components/VoiceInput.tsx` - Microphone button and controls
+- `src/components/WaveformVisualizer.tsx` - Audio visualization
+- `src/hooks/useVoiceRecording.ts` - Voice recording hook
+
+---
+
+### Priority 5: File Attachment Support (Week 4)
+
+**Impact**: Medium | **Complexity**: Medium | **Dependencies**: None
+
+#### Features - File Attachments
+
+- [ ] File upload UI (drag & drop + button)
+- [ ] Image preview and analysis
+- [ ] PDF text extraction
+- [ ] File size limits
+- [ ] Multiple file support
+- [ ] File metadata display
+
+#### Technical Approach - File Attachments
+
+**Dependencies to Add**:
+
+```json
+{
+  "pdf-parse": "^1.1.1",
+  "image-type": "^5.2.0",
+  "file-type": "^16.5.3",
+  "mime-types": "^2.1.34"
+}
+```
+
+**Rust Dependencies** (if needed for file processing):
+
+```toml
+pdf-extract = "0.7"
+image = "0.24"
+```
+
+#### Files to Create/Modify - File Attachments
+
+- `src/components/FileUpload.tsx` - Drag & drop file upload
+- `src/components/FilePreview.tsx` - Preview attached files
+- `src/lib/fileProcessor.ts` - Extract text from various formats
+- `src-tauri/src/file_handler.rs` - File processing in Rust
+- Update `chatStore.ts` - Add attachments to messages
+
+---
+
+### Priority 6: System Integration (Week 5)
+
+**Impact**: Medium | **Complexity**: Medium-High | **Dependencies**: Tauri capabilities
+
+#### Features - System Integration
+
+- [ ] Global keyboard shortcuts
+- [ ] System tray icon
+- [ ] Quick launch hotkey
+- [ ] Desktop notifications
+- [ ] Minimize to tray
+- [ ] Auto-start option
+
+#### Technical Approach - System Integration
+
+**Tauri Features to Enable** (tauri.conf.json):
+
+```json
+{
+  "tauri": {
+    "systemTray": {
+      "iconPath": "icons/tray-icon.png"
+    },
+    "bundle": {
+      "windows": {
+        "webviewInstallMode": {
+          "type": "downloadBootstrapper"
+        }
+      }
+    }
+  }
+}
+```
+
+#### Files to Create/Modify - System Integration
+
+- `src-tauri/src/tray.rs` - System tray implementation
+- `src-tauri/src/shortcuts.rs` - Global shortcut handler
+- `src/components/NotificationSettings.tsx` - Notification preferences
+- Update `src-tauri/tauri.conf.json` - Enable system tray
+
+---
+
+## Additional Improvements
+
+### Code Quality
+
+- [ ] Add unit tests for new features
+- [ ] Integration tests for API clients
+- [ ] E2E tests with Playwright
+- [ ] Error boundary components
+- [ ] Comprehensive error handling
+
+### Performance
+
+- [ ] Lazy load heavy components
+- [ ] Virtual scrolling for long conversations
+- [ ] Optimize re-renders with React.memo
+- [ ] Audio streaming optimization
+- [ ] File upload progress indicators
+
+### UX Polish
+
+- [ ] Loading skeletons
+- [ ] Toast notifications
+- [ ] Keyboard navigation improvements
+- [ ] Accessibility audit
+- [ ] Responsive design refinements
+
+---
+
+## Dependencies Summary
+
+### New npm Packages
+
+```bash
+npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
+npm install -D @types/react-syntax-highlighter
+```
+
+### New Rust Crates
+
+```toml
+# Add to src-tauri/Cargo.toml
+rodio = "0.17"           # Audio playback
+pdf-extract = "0.7"      # PDF processing (optional)
+image = "0.24"           # Image processing (optional)
+```
+
+---
+
+## Testing Strategy
+
+### Manual Testing Checklist
+
+- [ ] All conversation operations (save/load/export)
+- [ ] Markdown rendering with various content types
+- [ ] TTS with different voices and settings
+- [ ] STT in push-to-talk and continuous modes
+- [ ] File uploads (images, PDFs, code files)
+- [ ] Keyboard shortcuts on all platforms
+- [ ] System tray interactions
+
+### Automated Tests
+
+- [ ] Unit tests for utility functions
+- [ ] Integration tests for API clients
+- [ ] Component tests with React Testing Library
+- [ ] E2E tests for critical user flows
+
+---
+
+## Risk Mitigation
+
+### Known Risks
+
+1. **API Costs**: ElevenLabs and Whisper can be expensive
+   - **Mitigation**: Use free Web Speech API as default, make premium APIs optional
+
+2. **Audio Latency**: TTS/STT pipeline may feel slow
+   - **Mitigation**: Stream audio where possible, show clear loading states
+
+3. **Cross-platform Issues**: Audio/shortcuts may behave differently
+   - **Mitigation**: Test on Linux/macOS/Windows early and often
+
+4. **File Security**: Handling user files safely
+   - **Mitigation**: Strict file type validation, size limits, sandboxing
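+
+The file type and size checks from the last mitigation can be sketched as a pure function. The limit and the allow-list below are placeholder assumptions, since the plan does not fix them, and `validateAttachment` is a hypothetical helper. A browser `File` object satisfies `AttachmentInfo` structurally:
+
+```typescript
+// Assumed policy values; the plan does not specify them.
+const MAX_BYTES = 10 * 1024 * 1024
+const ALLOWED_TYPES = new Set([
+  'image/png',
+  'image/jpeg',
+  'application/pdf',
+  'text/plain',
+])
+
+export interface AttachmentInfo {
+  name: string
+  size: number // bytes
+  type: string // declared MIME type
+}
+
+export type ValidationResult = { ok: true } | { ok: false; reason: string }
+
+// Reject empty, oversized, or non-allow-listed files before any processing.
+export function validateAttachment(file: AttachmentInfo): ValidationResult {
+  if (file.size <= 0) {
+    return { ok: false, reason: 'file is empty' }
+  }
+  if (file.size > MAX_BYTES) {
+    return { ok: false, reason: 'file exceeds size limit' }
+  }
+  if (!ALLOWED_TYPES.has(file.type)) {
+    return { ok: false, reason: `unsupported type: ${file.type || 'unknown'}` }
+  }
+  return { ok: true }
+}
+```
+
+The declared MIME type is not proof of content, so a second check on the file's actual bytes (for example with the `file-type` package listed above) would still be needed before the file is parsed.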
+
+---
+
+## Success Criteria
+
+Phase 2 is complete when:
+
+- ✅ Users can save, load, and export conversations
+- ✅ Messages render with proper code highlighting and formatting
+- ✅ TTS works with at least one voice provider
+- ✅ STT works with Web Speech API
+- ✅ Users can attach and discuss files
+- ✅ Basic keyboard shortcuts are functional
+- ✅ System tray integration works on Linux
+- ✅ All features are documented
+- ✅ No critical bugs or performance issues
+
+---
+
+## Timeline Estimate
+
+**Optimistic**: 4 weeks  
+**Realistic**: 5-6 weeks  
+**Conservative**: 8 weeks
+
+Depends on:
+
+- Time available per week
+- API complexity/issues
+- Cross-platform testing needs
+- Feature scope adjustments
+
+---
+
+## Next Steps
+
+1. **Install dependencies** for conversation management and markdown rendering
+2. **Implement conversation store** and basic save/load
+3. **Create ConversationList component** for browsing history
+4. **Enhance message rendering** with react-markdown and syntax highlighting
+5. **Integrate ElevenLabs TTS** with settings UI
+6. **Add voice input** with Web Speech API
+7. **Implement file attachments** with preview
+8. **Add system tray** and keyboard shortcuts
+
+---
+
+**Last Updated**: October 5, 2025  
+**Status**: Ready to begin implementation