eve-alpha/docs/planning/PHASE2_PLAN.md

# Phase 2 Implementation Plan - Enhanced Capabilities (v0.2.0)

- **Status**: 🚀 In Progress
- **Start Date**: October 5, 2025
- **Target Completion**: TBD

## Overview

Phase 2 builds upon the stable v0.1.0 foundation to add enhanced interaction capabilities, improved UX, and productivity features.

## Implementation Priority Order

### Priority 1: Conversation Management (Week 1)

**Impact**: High | **Complexity**: Low | **Foundation for**: Export features, history search

#### Features - Conversation Management

- [x] Store structure already supports this (chatStore)
- [ ] Save conversations to local storage/file system
- [ ] Load previous conversations
- [ ] Export conversations (JSON, Markdown, TXT)
- [ ] Conversation metadata (title, tags, date)
- [ ] Conversation list/browser UI

#### Technical Approach - Conversation Management

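```typescript
// New store: conversationStore.ts
// ChatMessage is assumed to be the message type already used by chatStore.
interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number // creation timestamp
  updated: number // last-modified timestamp
  tags: string[]
  model: string // model used for this conversation
}
```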

#### Files to Create/Modify - Conversation Management

- `src/stores/conversationStore.ts` - New conversation management store
- `src/components/ConversationList.tsx` - Browse saved conversations
- `src/components/ConversationExport.tsx` - Export functionality
- `src-tauri/src/main.rs` - Add file system commands for save/load
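
As a rough illustration of the export feature, the three target formats (JSON, Markdown, TXT) could be produced by a single pure function. This is only a sketch: the `ChatMessage` shape, the `ExportFormat` type, the `exportConversation` name, and the treatment of `created` as epoch milliseconds are all assumptions, not part of the existing codebase.

```typescript
// Assumed minimal message shape; the real ChatMessage in chatStore may differ.
interface ChatMessage {
  role: 'user' | 'assistant' | 'system'
  content: string
}

interface Conversation {
  id: string
  title: string
  messages: ChatMessage[]
  created: number // assumed here: Unix epoch milliseconds
  updated: number
  tags: string[]
  model: string
}

// Hypothetical format selector matching the three planned export targets.
type ExportFormat = 'json' | 'markdown' | 'txt'

function exportConversation(conv: Conversation, format: ExportFormat): string {
  switch (format) {
    case 'json':
      // Lossless export; this is also the natural on-disk save format.
      return JSON.stringify(conv, null, 2)
    case 'markdown': {
      const header = [
        `# ${conv.title}`,
        '',
        `- Model: ${conv.model}`,
        `- Created: ${new Date(conv.created).toISOString()}`,
        `- Tags: ${conv.tags.join(', ')}`,
      ].join('\n')
      const body = conv.messages
        .map(m => `## ${m.role}\n\n${m.content}`)
        .join('\n\n')
      return `${header}\n\n${body}\n`
    }
    case 'txt':
      // Plain transcript: one "role: content" paragraph per message.
      return conv.messages.map(m => `${m.role}: ${m.content}`).join('\n\n') + '\n'
  }
}
```

Keeping the serializer free of file-system calls means the same function can feed a download button, the clipboard, or a Tauri save command.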

---

### Priority 2: Advanced Message Formatting (Week 1-2)

**Impact**: High | **Complexity**: Medium | **Dependencies**: None

#### Features - Advanced Message Formatting

- [ ] Code syntax highlighting
- [ ] Markdown rendering with proper styling
- [ ] LaTeX/Math equation support
- [ ] Mermaid diagram rendering
- [ ] Copy code blocks to clipboard
- [ ] Collapsible code sections

#### Technical Approach - Advanced Message Formatting

**Dependencies to Add**:

```json
{
  "react-markdown": "^9.0.1",
  "react-syntax-highlighter": "^15.5.0",
  "rehype-katex": "^7.0.0",
  "remark-math": "^6.0.0",
  "remark-gfm": "^4.0.0",
  "mermaid": "^10.6.1"
}
```

#### Files to Create/Modify - Advanced Message Formatting

- `src/components/MessageContent.tsx` - Enhanced message renderer
- `src/components/CodeBlock.tsx` - Code block with syntax highlighting
- `src/components/MermaidDiagram.tsx` - Mermaid diagram renderer
- `src/lib/markdown.ts` - Markdown processing utilities
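
One possible helper for `src/lib/markdown.ts` is a splitter that separates fenced code blocks from surrounding text, so that code can be routed to a copyable `CodeBlock` and `mermaid` fences to `MermaidDiagram`. The sketch below is illustrative only: `Segment` and `splitFencedBlocks` are hypothetical names, only backtick fences of exactly three backticks are handled, and in practice `react-markdown` component overrides may make a hand-written splitter unnecessary.

```typescript
interface Segment {
  kind: 'text' | 'code'
  lang: string // fence label such as "ts" or "mermaid"; empty for text
  content: string
}

function splitFencedBlocks(markdown: string): Segment[] {
  const segments: Segment[] = []
  let buffer: string[] = []
  let inCode = false
  let lang = ''

  // Emit the buffered lines as one segment; whitespace-only text is dropped.
  const flush = (kind: 'text' | 'code'): void => {
    const content = buffer.join('\n')
    buffer = []
    if (kind === 'code' || content.trim() !== '') {
      segments.push({ kind, lang: kind === 'code' ? lang : '', content })
    }
  }

  for (const line of markdown.split('\n')) {
    // A fence line is three backticks plus an optional language label.
    const match = /^`{3}\s*([\w-]*)\s*$/.exec(line)
    const info = match ? (match[1] ?? '') : null
    if (info !== null && !inCode) {
      flush('text')
      inCode = true
      lang = info
    } else if (info === '' && inCode) {
      flush('code')
      inCode = false
      lang = ''
    } else {
      buffer.push(line)
    }
  }
  // An unterminated fence (common while a reply is still streaming)
  // is kept as code rather than discarded.
  flush(inCode ? 'code' : 'text')
  return segments
}
```

The renderer could then map each segment to a component, for example `code` segments with `lang === 'mermaid'` to the diagram renderer and all others to the syntax highlighter.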

---

### Priority 3: Text-to-Speech Integration (Week 2-3)

**Impact**: High | **Complexity**: Medium | **Dependencies**: ElevenLabs API

#### Features - Text-to-Speech

- [ ] ElevenLabs API integration
- [ ] Voice selection UI
- [ ] Per-message TTS toggle
- [ ] Speech controls (play/pause/stop)
- [ ] Voice settings (speed, stability, clarity)
- [ ] Audio queue management
- [ ] Local fallback (Web Speech API)

#### Technical Approach - Text-to-Speech

**Dependencies to Add**:

```json
{
  "elevenlabs": "^0.8.0"
}
```

**New Rust Dependencies** (Cargo.toml):

```toml
rodio = "0.17"  # Audio playback
```

#### Files to Create/Modify - Text-to-Speech

- `src/lib/elevenlabs.ts` - ElevenLabs API client
- `src/lib/tts.ts` - TTS abstraction layer with fallback
- `src/components/TTSControls.tsx` - Voice playback controls
- `src/components/VoiceSettings.tsx` - Voice configuration UI
- `src-tauri/src/audio.rs` - Audio playback module (Rust)
- `src-tauri/src/main.rs` - Add audio commands

#### Implementation Steps

1. Create ElevenLabs API client with voice listing
2. Add voice selection to settings
3. Implement audio playback queue
4. Add per-message TTS buttons
5. Create global audio controls
6. Implement Web Speech API fallback
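
Steps 3 and 6 can be prototyped without any audio code by separating provider selection from queue bookkeeping. The following is a minimal sketch under stated assumptions: `TtsProvider`, `pickProvider`, `QueuedUtterance`, and `SpeechQueue` are hypothetical names, and actual playback (ElevenLabs audio or `speechSynthesis`) is left to whoever calls `next()`.

```typescript
// A provider only reports whether it can be used right now, e.g. an
// ElevenLabs API key is configured, or speechSynthesis exists in the webview.
interface TtsProvider {
  name: string
  isAvailable(): boolean
}

// Return the first usable provider in preference order, or undefined.
function pickProvider(preferred: TtsProvider[]): TtsProvider | undefined {
  for (const provider of preferred) {
    if (provider.isAvailable()) return provider
  }
  return undefined
}

interface QueuedUtterance {
  messageId: string
  text: string
}

class SpeechQueue {
  private pending: QueuedUtterance[] = []
  private current: QueuedUtterance | undefined = undefined

  enqueue(item: QueuedUtterance): void {
    this.pending.push(item)
  }

  // Advance to the next utterance; call on start and whenever playback ends.
  next(): QueuedUtterance | undefined {
    this.current = this.pending.shift()
    return this.current
  }

  nowPlaying(): QueuedUtterance | undefined {
    return this.current
  }

  // Per-message toggle switched off: drop everything belonging to one message.
  cancelMessage(messageId: string): void {
    this.pending = this.pending.filter(item => item.messageId !== messageId)
    if (this.current && this.current.messageId === messageId) {
      this.current = undefined
    }
  }

  // Global stop: forget both the current and all pending utterances.
  stop(): void {
    this.pending = []
    this.current = undefined
  }

  size(): number {
    return this.pending.length
  }
}
```

Listing the ElevenLabs provider first and the Web Speech provider second gives the fallback behavior described in step 6 while keeping the premium API optional.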

---

### Priority 4: Speech-to-Text Integration (Week 3-4)

**Impact**: High | **Complexity**: Medium-High | **Dependencies**: Web Speech API or Whisper

#### Features - Speech-to-Text

- [ ] Push-to-talk button
- [ ] Continuous listening mode
- [ ] Voice activity detection (VAD)
- [ ] Visual feedback (waveform/mic indicator)
- [ ] Keyboard shortcut for voice input
- [ ] Language selection
- [ ] Web Speech API as the default engine, with Whisper as an optional upgrade

#### Technical Approach - Speech-to-Text

##### Option A: Web Speech API (Browser)

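- Zero cost, no API key required
- Offline use is not guaranteed: some engines (Chrome's, for example) send audio to a remote service
- Limited accuracy; availability depends on the system webview Tauri uses on each platform and should be verified early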
- Good for MVP

##### Option B: OpenAI Whisper API

- High accuracy
- Costs per API call
- Better for production

**Recommendation**: Start with Web Speech API, add Whisper as optional upgrade
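
The recommendation could translate into a small selection function inside the STT abstraction layer. This is a sketch, not a settled design: `SttEngine`, `SttEnvironment`, and `chooseSttEngine` are hypothetical names, and the caller is assumed to have already probed the webview (for instance by checking for `SpeechRecognition` or `webkitSpeechRecognition` on `window`).

```typescript
type SttEngine = 'web-speech' | 'whisper'

interface SttEnvironment {
  webSpeechAvailable: boolean // result of probing the webview at startup
  whisperApiKey: string | null // null when the user has not configured one
  userPrefersWhisper: boolean // opt-in setting for the paid upgrade
}

// Web Speech is the default; Whisper is used only when the user opts in
// with a key, or as a last resort when the webview has no speech support.
function chooseSttEngine(env: SttEnvironment): SttEngine | null {
  const whisperUsable =
    env.whisperApiKey !== null && env.whisperApiKey.trim() !== ''
  if (env.userPrefersWhisper && whisperUsable) return 'whisper'
  if (env.webSpeechAvailable) return 'web-speech'
  return whisperUsable ? 'whisper' : null
}
```

Returning `null` lets the UI disable the microphone button with an explanation instead of failing when recording starts.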

#### Files to Create/Modify - Speech-to-Text

- `src/lib/stt.ts` - STT abstraction layer
- `src/lib/whisper.ts` - OpenAI Whisper client (optional)
- `src/components/VoiceInput.tsx` - Microphone button and controls
- `src/components/WaveformVisualizer.tsx` - Audio visualization
- `src/hooks/useVoiceRecording.ts` - Voice recording hook
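
For the voice activity detection item, a simple energy threshold with a short hangover is often enough for a first version. The sketch below assumes audio arrives as `Float32Array` frames with samples in the range -1 to 1 (as the Web Audio API delivers them); `rms`, `EnergyVad`, and the default threshold and hangover values are illustrative assumptions that would need tuning against real microphones.

```typescript
// Root-mean-square level of one audio frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0
  const sumOfSquares = frame.reduce((acc, sample) => acc + sample * sample, 0)
  return Math.sqrt(sumOfSquares / frame.length)
}

class EnergyVad {
  private speaking = false
  private silentFrames = 0

  constructor(
    private readonly threshold: number = 0.02, // assumed starting point
    private readonly hangoverFrames: number = 5, // quiet frames tolerated
  ) {}

  // Feed one frame; returns true while speech is considered active.
  process(frame: Float32Array): boolean {
    if (rms(frame) >= this.threshold) {
      this.speaking = true
      this.silentFrames = 0
    } else if (this.speaking) {
      this.silentFrames += 1
      // Only end the utterance after a run of quiet frames, so short
      // pauses between words do not cut the recording off.
      if (this.silentFrames > this.hangoverFrames) {
        this.speaking = false
        this.silentFrames = 0
      }
    }
    return this.speaking
  }
}
```

In continuous listening mode, the transition from `true` to `false` would be the natural point to submit the captured audio for transcription.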

---

### Priority 5: File Attachment Support (Week 4)

**Impact**: Medium | **Complexity**: Medium | **Dependencies**: None

#### Features - File Attachments

- [ ] File upload UI (drag & drop + button)
- [ ] Image preview and analysis
- [ ] PDF text extraction
- [ ] File size limits
- [ ] Multiple file support
- [ ] File metadata display

#### Technical Approach - File Attachments

**Dependencies to Add**:

```json
{
  "pdf-parse": "^1.1.1",
  "image-type": "^5.2.0",
  "file-type": "^16.5.3",
  "mime-types": "^2.1.34"
}
```

**Rust Dependencies** (if needed for file processing):

```toml
pdf-extract = "0.7"
image = "0.24"
```

#### Files to Create/Modify - File Attachments

- `src/components/FileUpload.tsx` - Drag & drop file upload
- `src/components/FilePreview.tsx` - Preview attached files
- `src/lib/fileProcessor.ts` - Extract text from various formats
- `src-tauri/src/file_handler.rs` - File processing in Rust
- Update `chatStore.ts` - Add attachments to messages
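
The file size limit and type checks can be enforced before any parsing library sees the data. The sketch below checks a size cap and then sniffs well-known magic bytes for PDF, PNG, and JPEG instead of trusting the file extension. `sniffType`, `validateAttachment`, and the 10 MiB cap are assumptions chosen for illustration; the planned `file-type` package covers far more formats.

```typescript
type DetectedType = 'application/pdf' | 'image/png' | 'image/jpeg' | 'unknown'

// Identify a file by its leading signature bytes.
function sniffType(bytes: Uint8Array): DetectedType {
  const startsWith = (signature: number[]): boolean =>
    signature.every((byte, i) => bytes[i] === byte)
  if (startsWith([0x25, 0x50, 0x44, 0x46])) return 'application/pdf' // "%PDF"
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png'
  if (startsWith([0xff, 0xd8, 0xff])) return 'image/jpeg'
  return 'unknown'
}

const DEFAULT_MAX_BYTES = 10 * 1024 * 1024 // hypothetical 10 MiB cap

interface ValidationResult {
  ok: boolean
  type: DetectedType
  reason: string // empty when ok
}

function validateAttachment(
  bytes: Uint8Array,
  maxBytes: number = DEFAULT_MAX_BYTES,
): ValidationResult {
  const type = sniffType(bytes)
  if (bytes.length === 0) return { ok: false, type, reason: 'file is empty' }
  if (bytes.length > maxBytes) {
    return { ok: false, type, reason: `file exceeds ${maxBytes} bytes` }
  }
  if (type === 'unknown') {
    return { ok: false, type, reason: 'unsupported or unrecognized file type' }
  }
  return { ok: true, type, reason: '' }
}
```

Plain-text and code attachments have no signature, so a real implementation would need an additional allowlist path for them, for example by extension plus a check that the content decodes as UTF-8.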

---

### Priority 6: System Integration (Week 5)

**Impact**: Medium | **Complexity**: Medium-High | **Dependencies**: Tauri capabilities

#### Features - System Integration

- [ ] Global keyboard shortcuts
- [ ] System tray icon
- [ ] Quick launch hotkey
- [ ] Desktop notifications
- [ ] Minimize to tray
- [ ] Auto-start option

#### Technical Approach - System Integration

**Tauri Features to Enable** (tauri.conf.json):

```json
{
  "tauri": {
    "systemTray": {
      "iconPath": "icons/tray-icon.png"
    },
    "bundle": {
      "windows": {
        "webviewInstallMode": {
          "type": "downloadBootstrapper"
        }
      }
    }
  }
}
```

#### Files to Create/Modify - System Integration

- `src-tauri/src/tray.rs` - System tray implementation
- `src-tauri/src/shortcuts.rs` - Global shortcut handler
- `src/components/NotificationSettings.tsx` - Notification preferences
- Update `src-tauri/tauri.conf.json` - Enable system tray
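
On the frontend side, a small registry can catch conflicting shortcut bindings before they are handed to Tauri. The sketch below is an assumption-heavy illustration: `normalizeAccelerator` and `ShortcutRegistry` are hypothetical names, and the modifier spellings are loosely modeled on accelerator strings such as `CmdOrCtrl+Shift+Space`, so they should be checked against the Tauri documentation before use.

```typescript
// Canonical modifier spellings, in the order they are emitted.
const MODIFIER_ORDER: string[] = ['CmdOrCtrl', 'Ctrl', 'Alt', 'Shift', 'Super']

// Turn "shift + ctrl+k" into "Ctrl+Shift+K" so equivalent bindings compare equal.
function normalizeAccelerator(raw: string): string {
  const parts = raw.split('+').map(p => p.trim()).filter(p => p.length > 0)
  const modifiers: string[] = []
  let key = ''
  for (const part of parts) {
    const modifier = MODIFIER_ORDER.find(m => m.toLowerCase() === part.toLowerCase())
    if (modifier !== undefined) {
      if (modifiers.indexOf(modifier) === -1) modifiers.push(modifier)
    } else {
      key = part.length === 1 ? part.toUpperCase() : part
    }
  }
  if (key === '') throw new Error(`accelerator "${raw}" has no key`)
  modifiers.sort((a, b) => MODIFIER_ORDER.indexOf(a) - MODIFIER_ORDER.indexOf(b))
  return modifiers.concat(key).join('+')
}

class ShortcutRegistry {
  private bindings = new Map<string, string>() // accelerator -> action id

  // Returns false (and changes nothing) when the accelerator is already taken.
  register(accelerator: string, actionId: string): boolean {
    const normalized = normalizeAccelerator(accelerator)
    if (this.bindings.has(normalized)) return false
    this.bindings.set(normalized, actionId)
    return true
  }

  actionFor(accelerator: string): string | undefined {
    return this.bindings.get(normalizeAccelerator(accelerator))
  }
}
```

A settings screen could call `register` when the user rebinds a key and show a conflict message when it returns `false`.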

---

## Additional Improvements

### Code Quality

- [ ] Add unit tests for new features
- [ ] Integration tests for API clients
- [ ] E2E tests with Playwright
- [ ] Error boundary components
- [ ] Comprehensive error handling

### Performance

- [ ] Lazy load heavy components
- [ ] Virtual scrolling for long conversations
- [ ] Optimize re-renders with React.memo
- [ ] Audio streaming optimization
- [ ] File upload progress indicators
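
To make the virtual scrolling item concrete, the core calculation is which slice of the message list intersects the viewport. The sketch below assumes a fixed row height, which real chat messages will not have, so it illustrates the idea rather than a drop-in solution; `visibleRange` and the `overscan` default are hypothetical, and an existing virtualization library may be the better choice.

```typescript
interface VisibleRange {
  start: number // first index to render (inclusive)
  end: number // last index to render (exclusive)
}

function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  itemCount: number,
  overscan: number = 3, // extra rows rendered above and below to hide pop-in
): VisibleRange {
  if (itemCount <= 0 || rowHeight <= 0) return { start: 0, end: 0 }
  const firstVisible = Math.floor(Math.max(0, scrollTop) / rowHeight)
  const lastVisible = Math.ceil((Math.max(0, scrollTop) + viewportHeight) / rowHeight)
  const end = Math.min(itemCount, lastVisible + overscan)
  // Clamp start so an over-scrolled position never yields start > end.
  const start = Math.min(Math.max(0, firstVisible - overscan), end)
  return { start, end }
}
```

Only `messages.slice(start, end)` is rendered, with spacer elements of height `start * rowHeight` above and `(itemCount - end) * rowHeight` below to keep the scrollbar proportions correct.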

### UX Polish

- [ ] Loading skeletons
- [ ] Toast notifications
- [ ] Keyboard navigation improvements
- [ ] Accessibility audit
- [ ] Responsive design refinements

---

## Dependencies Summary

### New npm Packages

```bash
npm install react-markdown react-syntax-highlighter rehype-katex remark-math remark-gfm mermaid elevenlabs pdf-parse image-type file-type mime-types
npm install -D @types/react-syntax-highlighter
```

### New Rust Crates

```toml
# Add to src-tauri/Cargo.toml
rodio = "0.17"           # Audio playback
pdf-extract = "0.7"      # PDF processing (optional)
image = "0.24"           # Image processing (optional)
```

---

## Testing Strategy

### Manual Testing Checklist

- [ ] All conversation operations (save/load/export)
- [ ] Markdown rendering with various content types
- [ ] TTS with different voices and settings
- [ ] STT in push-to-talk and continuous modes
- [ ] File uploads (images, PDFs, code files)
- [ ] Keyboard shortcuts on all platforms
- [ ] System tray interactions

### Automated Tests

- [ ] Unit tests for utility functions
- [ ] Integration tests for API clients
- [ ] Component tests with React Testing Library
- [ ] E2E tests for critical user flows

---

## Risk Mitigation

### Known Risks

1. **API Costs**: ElevenLabs and Whisper are paid, usage-metered APIs, so costs scale with how much speech is synthesized or transcribed
   - **Mitigation**: Use free Web Speech API as default, make premium APIs optional

2. **Audio Latency**: TTS/STT pipeline may feel slow
   - **Mitigation**: Stream audio where possible, show clear loading states

3. **Cross-platform Issues**: Audio playback, global shortcuts, and the system tray may behave differently on each operating system
   - **Mitigation**: Test on Linux/macOS/Windows early and often

4. **File Security**: Handling user files safely
   - **Mitigation**: Strict file type validation, size limits, sandboxing

---

## Success Criteria

Phase 2 is complete when:

- ✅ Users can save, load, and export conversations
- ✅ Messages render with proper code highlighting and formatting
- ✅ TTS works with at least one voice provider
- ✅ STT works with Web Speech API
- ✅ Users can attach and discuss files
- ✅ Basic keyboard shortcuts are functional
- ✅ System tray integration works on Linux
- ✅ All features are documented
- ✅ No critical bugs or performance issues

---

## Timeline Estimate

- **Optimistic**: 4 weeks
- **Realistic**: 5-6 weeks
- **Conservative**: 8 weeks

Depends on:

- Time available per week
- API complexity/issues
- Cross-platform testing needs
- Feature scope adjustments

---

## Next Steps

1. **Install dependencies** for conversation management and markdown rendering
2. **Implement conversation store** and basic save/load
3. **Create ConversationList component** for browsing history
4. **Enhance message rendering** with react-markdown and syntax highlighting
5. **Integrate ElevenLabs TTS** with settings UI
6. **Add voice input** with Web Speech API
7. **Implement file attachments** with preview
8. **Add system tray** and keyboard shortcuts

---

**Last Updated**: October 5, 2025
**Status**: Ready to begin implementation