Bugfixes and updated audio playback.

2025-10-06 23:25:21 +01:00
parent f2881710ea
commit 0a7b164b29
15 changed files with 1875 additions and 107 deletions
--- a/docs/planning/PHASE3_PLAN.md
+++ b/docs/planning/PHASE3_PLAN.md
@@ -0,0 +1,574 @@
+# Phase 3 - Knowledge Base & Memory (v0.3.0)
+
+**Target Version**: v0.3.0  
+**Estimated Duration**: 24-30 hours  
+**Priority**: High  
+**Status**: 📋 Planning
+
+---
+
+## 🎯 Phase 3 Goals
+
+Transform EVE from a conversational assistant into an **intelligent knowledge companion** with:
+1. **Long-term memory** - Remember past conversations and user preferences
+2. **Document library** - Manage and reference documents
+3. **Vision capabilities** - Generate and analyze images
+4. **Web access** - Real-time information retrieval
+
+---
+
+## 📊 Feature Breakdown
+
+### 1. Long-Term Memory System
+**Priority**: Critical  
+**Estimated Time**: 8-10 hours
+
+#### Objectives
+- Store and retrieve conversational context across sessions
+- Semantic search through all conversations
+- Auto-extract and store key information
+- Build personal knowledge graph
+
+#### Technical Approach
+
+**A. Vector Database Integration**
+- **Options**:
+  1. ChromaDB (lightweight, local-first)
+  2. LanceDB (Rust-based, fast)
+  3. SQLite + vector extension
+- **Recommendation**: ChromaDB for ease of use
+- **Storage**: Embed messages, extract entities, store relationships
+
+**B. Embedding Pipeline**
+```
+User Message → OpenAI Embeddings API → Vector Store
+                    ↓
+            Semantic Search ← Query
+                    ↓
+            Retrieved Context → Enhanced Prompt
+```
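+
+The retrieval half of this pipeline can be sketched without committing to a vector store. The example below is a minimal in-memory illustration, assuming embeddings have already been computed elsewhere. `StoredMemory`, `cosineSimilarity`, `searchMemories`, and `buildEnhancedPrompt` are hypothetical names rather than part of the planned API, and a real vector database would replace the linear scan.
+
+```typescript
+// Hypothetical record shape: a trimmed-down stand-in for a stored memory.
+interface StoredMemory {
+  id: string
+  content: string
+  embedding: number[]
+}
+
+// Cosine similarity between two vectors of equal length.
+function cosineSimilarity(a: number[], b: number[]): number {
+  if (a.length !== b.length) throw new Error('dimension mismatch')
+  let dot = 0
+  let normA = 0
+  let normB = 0
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i]
+    normA += a[i] * a[i]
+    normB += b[i] * b[i]
+  }
+  if (normA === 0 || normB === 0) return 0
+  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
+}
+
+// Semantic search: rank every memory against the query vector, keep the top k.
+function searchMemories(query: number[], memories: StoredMemory[], k: number): StoredMemory[] {
+  return memories
+    .map((m) => ({ m, score: cosineSimilarity(query, m.embedding) }))
+    .sort((x, y) => y.score - x.score)
+    .slice(0, k)
+    .map((x) => x.m)
+}
+
+// Context injection: prepend the retrieved memories to the user message.
+function buildEnhancedPrompt(userMessage: string, retrieved: StoredMemory[]): string {
+  if (retrieved.length === 0) return userMessage
+  const context = retrieved.map((m) => `- ${m.content}`).join('\n')
+  return `Relevant memories:\n${context}\n\nUser: ${userMessage}`
+}
+```
+
+The prompt layout here is only one possible format. The point is that search and injection stay independent of whichever embedding provider or store is chosen.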
+
+**C. Implementation Plan**
+1. Set up vector database (ChromaDB)
+2. Create embedding service (`src/lib/embeddings.ts`)
+3. Background job to embed existing messages
+4. Add semantic search to conversation store
+5. UI for memory search and management
+6. Context injection for relevant memories
+
+**D. Files to Create**
+- `src/lib/embeddings.ts` - Embedding service
+- `src/lib/vectordb.ts` - Vector database client
+- `src/stores/memoryStore.ts` - Memory state management
+- `src/components/MemorySearch.tsx` - Search UI
+- `src/components/MemoryPanel.tsx` - Memory management UI
+
+**E. Features**
+- [ ] Vector database setup
+- [ ] Automatic message embedding
+- [ ] Semantic search interface
+- [ ] Memory extraction (entities, facts)
+- [ ] Knowledge graph visualization
+- [ ] Context injection in prompts
+- [ ] Memory management UI
+
+---
+
+### 2. Document Library
+**Priority**: High  
+**Estimated Time**: 6-8 hours
+
+#### Objectives
+- Upload and store reference documents
+- Full-text search across documents
+- Automatic document summarization
+- Link documents to conversations
+
+#### Technical Approach
+
+**A. Document Storage**
+- **Backend**: Tauri file system access
+- **Location**: `{app_data_dir}/documents/`
+- **Indexing**: SQLite FTS5 for full-text search
+- **Metadata**: Title, author, date, tags, summary
+
+**B. Document Processing Pipeline**
+```
+Upload → Parse (PDF/DOCX/MD) → Extract Text → Embed Chunks
+           ↓                        ↓              ↓
+       Metadata              Full-Text Index   Vector Store
+```
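+
+The "Embed Chunks" step needs extracted text split into pieces before embedding. The sketch below shows one simple way to do that, using fixed-size character windows with overlap. `chunkText` and its parameters are hypothetical, and the plan does not prescribe a chunking strategy; splitting on sentences or tokens would be an equally valid choice.
+
+```typescript
+// Split text into windows of chunkSize characters, where consecutive
+// windows share `overlap` characters so context is not cut mid-thought.
+function chunkText(text: string, chunkSize: number, overlap: number): string[] {
+  if (chunkSize <= 0) throw new Error('chunkSize must be positive')
+  if (overlap < 0 || overlap >= chunkSize) {
+    throw new Error('overlap must be in [0, chunkSize)')
+  }
+  const chunks: string[] = []
+  const step = chunkSize - overlap
+  for (let start = 0; start < text.length; start += step) {
+    chunks.push(text.slice(start, start + chunkSize))
+    // Stop once the current window has reached the end of the text.
+    if (start + chunkSize >= text.length) break
+  }
+  return chunks
+}
+```
+
+Each returned chunk would then be embedded and written to the vector store, while the unsplit text goes to the full-text index.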
+
+**C. Implementation Plan**
+1. Rust commands for file management
+2. Document parser library integration
+3. SQLite database for metadata and FTS
+4. Chunking and embedding for semantic search
+5. Document viewer component
+6. Library management UI
+
+**D. Files to Create**
+- `src-tauri/src/documents.rs` - Document management (Rust)
+- `src/lib/documentParser.ts` - Document parsing
+- `src/stores/documentStore.ts` - Document state
+- `src/components/DocumentLibrary.tsx` - Library UI
+- `src/components/DocumentViewer.tsx` - Document viewer
+
+**E. Features**
+- [ ] Upload documents (PDF, DOCX, TXT, MD)
+- [ ] Full-text search
+- [ ] Document categorization
+- [ ] Automatic summarization
+- [ ] Reference in conversations
+- [ ] Document viewer
+- [ ] Export/backup library
+
+**F. Dependencies**
+```jsonc
+{
+  "pdf-parse": "^1.1.1",           // PDF parsing
+  "mammoth": "^1.6.0",              // DOCX parsing
+  "better-sqlite3": "^9.0.0"        // SQLite
+}
+```
+
+---
+
+### 3. Vision & Image Generation
+**Priority**: High  
+**Estimated Time**: 4-6 hours
+
+#### Objectives
+- Generate images from text prompts
+- Analyze uploaded images
+- Edit and manipulate existing images
+- Screenshot annotation tools
+
+#### Technical Approach
+
+**A. Image Generation**
+- **Provider**: DALL-E 3 (via OpenAI API)
+- **Alternative**: Stable Diffusion (local)
+- **Storage**: `{app_data_dir}/generated_images/`
+
+**B. Image Analysis**
+- **Provider**: GPT-4 Vision (OpenAI)
+- **Features**: 
+  - Describe images
+  - Extract text (OCR)
+  - Answer questions about images
+  - Compare multiple images
+
+**C. Implementation Plan**
+1. OpenAI Vision API integration
+2. DALL-E 3 API integration
+3. Image storage and management
+4. Image generation UI
+5. Image analysis in chat
+6. Gallery component
+
+**D. Files to Create**
+- `src/lib/vision.ts` - Vision API client
+- `src/lib/imageGeneration.ts` - DALL-E client
+- `src/components/ImageGenerator.tsx` - Generation UI
+- `src/components/ImageGallery.tsx` - Gallery view
+- `src/stores/imageStore.ts` - Image state
+
+**E. Features**
+- [ ] Text-to-image generation
+- [ ] Image analysis and description
+- [ ] OCR text extraction
+- [ ] Image-based conversations
+- [ ] Generation history
+- [ ] Image editing tools (basic)
+- [ ] Screenshot capture and analysis
+
+**F. Dependencies**
+```jsonc
+{
+  "openai": "^4.0.0"  // Already installed
+}
+```
+
+---
+
+### 4. Web Access & Real-Time Information
+**Priority**: Medium  
+**Estimated Time**: 6-8 hours
+
+#### Objectives
+- Search the web for current information
+- Extract and summarize web content
+- Integrate news and articles
+- Fact-checking capabilities
+
+#### Technical Approach
+
+**A. Web Search**
+- **Options**:
+  1. Brave Search API (privacy-focused, free tier)
+  2. SerpAPI (Google results, paid)
+  3. Custom scraper (legal concerns)
+- **Recommendation**: Brave Search API
+
+**B. Content Extraction**
+- **Library**: Mozilla Readability or Cheerio
+- **Process**: Fetch → Parse → Clean → Summarize
+- **Caching**: Store extracted content locally
+
+**C. Implementation Plan**
+1. Web search API integration
+2. Content extraction service
+3. URL preview component
+4. Web search command in chat
+5. Article summarization
+6. Citation tracking
+
+**D. Files to Create**
+- `src/lib/webSearch.ts` - Search API client
+- `src/lib/webScraper.ts` - Content extraction
+- `src/components/WebSearchPanel.tsx` - Search UI
+- `src/components/ArticlePreview.tsx` - Preview component
+- `src/stores/webStore.ts` - Web content state
+
+**E. Features**
+- [ ] Web search from chat
+- [ ] URL content extraction
+- [ ] Article summarization
+- [ ] News aggregation
+- [ ] Fact verification
+- [ ] Source citations
+- [ ] Link preview cards
+
+**F. Commands**
+```text
+// In-chat commands
+/search [query]       // Web search
+/summarize [url]      // Summarize article
+/news [topic]         // Get latest news
+/fact-check [claim]   // Verify information
+```
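+
+Recognizing these commands in chat input could look like the sketch below. `parseChatCommand`, `ChatCommand`, and `ParsedCommand` are hypothetical names. The sketch assumes a command must start the message and that everything after the first run of whitespace is its argument.
+
+```typescript
+type ChatCommand = 'search' | 'summarize' | 'news' | 'fact-check'
+
+interface ParsedCommand {
+  command: ChatCommand
+  argument: string
+}
+
+const KNOWN_COMMANDS: ChatCommand[] = ['search', 'summarize', 'news', 'fact-check']
+
+// Returns the parsed command, or null when the input is an ordinary message
+// or names a command that is not in the list above.
+function parseChatCommand(input: string): ParsedCommand | null {
+  const match = /^\/([a-z-]+)(?:\s+([\s\S]*))?$/.exec(input.trim())
+  if (!match) return null
+  const name = match[1] as ChatCommand
+  if (KNOWN_COMMANDS.indexOf(name) === -1) return null
+  return { command: name, argument: (match[2] || '').trim() }
+}
+```
+
+Returning `null` for unknown commands lets the chat fall back to sending the text as a normal message instead of failing.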
+
+**G. Dependencies**
+```jsonc
+{
+  "cheerio": "^1.0.0-rc.12",       // HTML parsing
+  "@mozilla/readability": "^0.5.0", // Content extraction
+  "node-fetch": "^3.3.2"            // HTTP requests
+}
+```
+
+---
+
+## 🗂️ Database Schema
+
+### Memory Database (Vector Store)
+```typescript
+interface Memory {
+  id: string
+  conversationId: string
+  messageId: string
+  content: string
+  embedding: number[]           // 1536-dim vector
+  entities: string[]            // Extracted entities
+  timestamp: number
+  importance: number            // 0-1 relevance score
+  metadata: {
+    speaker: 'user' | 'assistant'
+    tags: string[]
+    references: string[]        // Related memory IDs
+  }
+}
+```
+
+### Document Database (SQLite)
+```sql
+CREATE TABLE documents (
+  id TEXT PRIMARY KEY,
+  title TEXT NOT NULL,
+  filename TEXT NOT NULL,
+  filepath TEXT NOT NULL,
+  content TEXT,                 -- Full text for FTS
+  summary TEXT,
+  file_type TEXT,               -- pdf, docx, txt, md
+  file_size INTEGER,
+  upload_date INTEGER,
+  tags TEXT,                    -- JSON array
+  metadata TEXT                 -- JSON object
+);
+
+CREATE VIRTUAL TABLE documents_fts USING fts5(
+  id UNINDEXED,                 -- documents.id, stored so matches can be joined back
+  content,
+  title,
+  tags
+);
+```
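+
+User-typed search text should not be passed to an FTS5 `MATCH` clause as-is, because quotes and words such as `AND`, `OR`, and `NOT` have special meaning in FTS5 query syntax. The sketch below quotes each term so it is treated literally. `toFts5Query` is a hypothetical helper, and the choice to require all terms (implicit AND) is an assumption.
+
+```typescript
+// Turn free-form user input into an FTS5 query where every term is a
+// quoted string. Embedded double quotes are escaped by doubling them.
+function toFts5Query(userInput: string): string {
+  return userInput
+    .split(/\s+/)
+    .filter((term) => term.length > 0)
+    .map((term) => `"${term.replace(/"/g, '""')}"`)
+    .join(' ')
+}
+```
+
+The resulting string would be bound as a parameter, for example in `SELECT rowid FROM documents_fts WHERE documents_fts MATCH ?`, rather than concatenated into the SQL text.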
+
+### Image Database (SQLite)
+```sql
+CREATE TABLE images (
+  id TEXT PRIMARY KEY,
+  filename TEXT NOT NULL,
+  filepath TEXT NOT NULL,
+  prompt TEXT,                  -- For generated images
+  description TEXT,             -- AI-generated description
+  analysis TEXT,                -- Detailed analysis
+  width INTEGER,
+  height INTEGER,
+  file_size INTEGER,
+  created_date INTEGER,
+  source TEXT,                  -- 'generated', 'uploaded', 'screenshot'
+  metadata TEXT                 -- JSON object
+);
+```
+
+---
+
+## 🎨 UI Components
+
+### New Screens
+1. **Memory Dashboard** (`/memory`)
+   - Knowledge graph visualization
+   - Memory timeline
+   - Entity browser
+   - Search interface
+
+2. **Document Library** (`/documents`)
+   - Grid/list view
+   - Upload area
+   - Search and filter
+   - Document viewer
+
+3. **Image Gallery** (`/images`)
+   - Masonry layout
+   - Generation form
+   - Image details panel
+   - Edit tools
+
+4. **Web Research** (`/web`)
+   - Search interface
+   - Article list
+   - Preview panel
+   - Saved articles
+
+### Enhanced Components
+1. **Chat Interface**
+   - Memory context indicator
+   - Document reference links
+   - Image inline display
+   - Web search results
+
+2. **Settings**
+   - Memory settings (retention, privacy)
+   - API keys (OpenAI, Brave)
+   - Storage management
+   - Feature toggles
+
+---
+
+## 🔧 Technical Architecture
+
+### State Management
+```typescript
+// New Stores
+memoryStore       // Memory & knowledge graph
+documentStore     // Document library
+imageStore        // Image gallery
+webStore          // Web search & articles
+
+// Enhanced Stores
+chatStore         // Add memory injection
+settingsStore     // Add new API keys
+```
+
+### Backend (Rust)
+```text
+// New modules
+src-tauri/src/
+  ├── memory/
+  │   ├── embeddings.rs
+  │   └── vectordb.rs
+  ├── documents/
+  │   ├── parser.rs
+  │   ├── storage.rs
+  │   └── search.rs
+  └── images/
+      ├── generator.rs
+      └── storage.rs
+```
+
+### API Integration
+```text
+// New API clients
+OpenAI Embeddings API   // Text embeddings
+OpenAI Vision API       // Image analysis
+DALL-E 3 API            // Image generation
+Brave Search API        // Web search
+```
+
+---
+
+## 📦 Dependencies
+
+### Frontend
+```jsonc
+{
+  "chromadb": "^1.7.0",                    // Vector database
+  "better-sqlite3": "^9.0.0",              // SQLite
+  "cheerio": "^1.0.0-rc.12",               // Web scraping
+  "@mozilla/readability": "^0.5.0",        // Content extraction
+  "d3": "^7.8.5",                          // Knowledge graph viz
+  "react-force-graph": "^1.43.0",          // Graph component
+  "pdfjs-dist": "^3.11.174",               // PDF preview
+  "react-image-gallery": "^1.3.0"          // Image gallery
+}
+```
+
+### Backend (Rust)
+```toml
+[dependencies]
+chromadb = "0.1"              # Vector DB client
+rusqlite = "0.30"             # SQLite
+pdf-extract = "0.7"           # PDF parsing
+lopdf = "0.31"                # PDF manipulation
+image = "0.24"                # Image processing
+```
+
+---
+
+## 🚀 Implementation Timeline
+
+### Week 1: Foundation (8-10 hours)
+- **Days 1-2**: Vector database setup
+- **Day 3**: Embedding pipeline
+- **Day 4**: Memory store and basic UI
+- **Day 5**: Testing and refinement
+
+### Week 2: Documents & Vision (10-12 hours)
+- **Days 1-2**: Document storage and parsing
+- **Day 3**: Full-text search implementation
+- **Day 4**: Vision API integration
+- **Day 5**: Image generation UI
+
+### Week 3: Web & Polish (6-8 hours)
+- **Days 1-2**: Web search integration
+- **Day 3**: Content extraction
+- **Day 4**: UI polish and testing
+- **Day 5**: Documentation
+
+**Total Estimated Time**: 24-30 hours
+
+---
+
+## 🎯 Success Metrics
+
+### Functionality
+- [ ] Can remember facts from past conversations
+- [ ] Can search semantically through history
+- [ ] Can reference uploaded documents
+- [ ] Can generate images from prompts
+- [ ] Can analyze uploaded images
+- [ ] Can search the web for information
+- [ ] Can summarize web articles
+
+### Performance
+- [ ] Memory search: <500ms
+- [ ] Document search: <200ms
+- [ ] Image generation: <10s (API-dependent)
+- [ ] Web search: <2s
+- [ ] No UI lag with large knowledge base
+
+### User Experience
+- [ ] Intuitive memory management
+- [ ] Easy document upload and search
+- [ ] Seamless image generation workflow
+- [ ] Useful web search integration
+- [ ] Clear indication of memory usage
+
+---
+
+## 🔒 Privacy & Security
+
+### Data Storage
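+- All data stored locally by default; content leaves the device only for API-backed features the user has enabled (embeddings, vision, image generation, web search)
+- Sensitive information encrypted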
+- User control over data retention
+- Clear data deletion options
+
+### API Keys
+- Secure storage in Tauri config
+- Never logged or exposed
+- Optional API usage (user can disable features)
+
+### Memory System
+- User can view all stored memories
+- One-click memory deletion
+- Configurable retention periods
+- Export capabilities for transparency
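+
+Configurable retention could be enforced with a small filter run at startup or on a schedule. The sketch below is illustrative only. `applyRetention` and `RetainableMemory` are hypothetical names, and it assumes `timestamp` holds Unix epoch milliseconds, a unit the schema above does not specify.
+
+```typescript
+// Hypothetical shape: only the fields needed for retention are shown.
+interface RetainableMemory {
+  id: string
+  timestamp: number // assumed to be Unix epoch milliseconds
+}
+
+const MS_PER_DAY = 24 * 60 * 60 * 1000
+
+// Returns the memories still inside the retention window.
+// A retentionDays of null means "keep forever".
+function applyRetention<T extends RetainableMemory>(
+  memories: T[],
+  retentionDays: number | null,
+  now: number
+): T[] {
+  if (retentionDays === null) return memories.slice()
+  const cutoff = now - retentionDays * MS_PER_DAY
+  return memories.filter((m) => m.timestamp >= cutoff)
+}
+```
+
+Passing `now` as a parameter keeps the function deterministic, which makes the retention rule straightforward to unit test.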
+
+---
+
+## 🧪 Testing Strategy
+
+### Unit Tests
+- Vector database operations
+- Document parsing
+- Search functionality
+- Embedding generation
+
+### Integration Tests
+- End-to-end memory storage/retrieval
+- Document upload workflow
+- Image generation pipeline
+- Web search flow
+
+### Manual Testing
+- Memory accuracy
+- Search relevance
+- UI responsiveness
+- Cross-platform compatibility
+
+---
+
+## 📝 Documentation
+
+### User Documentation
+- Memory system guide
+- Document library tutorial
+- Image generation how-to
+- Web search commands reference
+
+### Developer Documentation
+- Vector database architecture
+- Embedding pipeline details
+- API integration guides
+- Database schemas
+
+---
+
+## 🎉 Phase 3 Vision
+
+By the end of Phase 3, EVE will:
+- **Remember everything** - Long-term conversational memory
+- **Reference knowledge** - Built-in document library
+- **See and create** - Vision and image generation
+- **Stay current** - Real-time web information
+
+This transforms EVE from a **conversational assistant** into a **knowledge companion** that grows smarter over time and has access to both personal knowledge and real-time information.
+
+---
+
+## 🔜 Post-Phase 3
+
+After Phase 3 completion, we'll move to:
+- **Phase 4**: Developer tools, plugins, customization
+- **v1.0**: Production release with all core features
+- **Beyond**: Mobile apps, team features, advanced AI
+
+---
+
+**Status**: Ready to Start  
+**Prerequisites**: Phase 2 Complete ✅  
+**Next Step**: Begin Long-Term Memory implementation
+
+**Created**: October 6, 2025, 11:20pm UTC+01:00