eve-alpha/docs/planning/PHASE3_PLAN.md
2025-10-06 23:25:21 +01:00


# Phase 3 - Knowledge Base & Memory (v0.3.0)
**Target Version**: v0.3.0\
**Estimated Duration**: 24-30 hours\
**Priority**: High\
**Status**: 📋 Planning

---
## 🎯 Phase 3 Goals
Transform EVE from a conversational assistant into an **intelligent knowledge companion** with:
1. **Long-term memory** - Remember past conversations and user preferences
2. **Document library** - Manage and reference documents
3. **Vision capabilities** - Generate and analyze images
4. **Web access** - Real-time information retrieval
---
## 📊 Feature Breakdown
### 1. Long-Term Memory System
**Priority**: Critical\
**Estimated Time**: 8-10 hours
#### Objectives
- Store and retrieve conversational context across sessions
- Semantic search through all conversations
- Auto-extract and store key information
- Build personal knowledge graph
#### Technical Approach
**A. Vector Database Integration**
- **Options**:
  1. ChromaDB (lightweight, local-first)
  2. LanceDB (Rust-based, fast)
  3. SQLite + vector extension
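- **Recommendation**: ChromaDB for ease of use (note that its JavaScript client talks to a Chroma server, so a local server process would need to run alongside the app)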
- **Storage**: Embed messages, extract entities, store relationships
**B. Embedding Pipeline**
```
User Message → OpenAI Embeddings API → Vector Store
                                            ↓
                                     Semantic Search ← Query
                                            ↓
                                     Retrieved Context → Enhanced Prompt
```
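The "Semantic Search" step in the pipeline above is a nearest-neighbour lookup over stored embeddings. ChromaDB and similar stores do this internally, usually with an approximate index; the sketch below shows the underlying idea as a brute-force cosine-similarity scan over an in-memory array. `StoredVector`, `cosineSimilarity` and `searchSimilar` are illustrative names, not part of the planned `src/lib/vectordb.ts` API.

```typescript
// Illustrative record shape: one embedded message.
interface StoredVector {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k search: score every stored vector, keep the k best.
function searchSimilar(
  query: number[],
  store: StoredVector[],
  k: number,
): Array<{ item: StoredVector; score: number }> {
  return store
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, searching with `[0.9, 0.1, 0]` over stored vectors `[1, 0, 0]` and `[0, 1, 0]` with `k = 1` returns the first. A linear scan costs O(n·d) per query, which may be acceptable for a single user's history but is what a vector database's index exists to avoid as the knowledge base grows.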
**C. Implementation Plan**
1. Set up vector database (ChromaDB)
2. Create embedding service (`src/lib/embeddings.ts`)
3. Background job to embed existing messages
4. Add semantic search to conversation store
5. UI for memory search and management
6. Context injection for relevant memories
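Step 6 can start as simply as prepending the highest-scoring memories to the system prompt until a size budget runs out. A minimal sketch, assuming memories arrive sorted by descending score and using a character count as a crude stand-in for a token budget; `RetrievedMemory`, `buildMemoryContext` and the header wording are hypothetical.

```typescript
// Hypothetical shape of a memory returned by semantic search.
interface RetrievedMemory {
  content: string;
  score: number; // similarity score, higher means more relevant
}

// Builds a prompt section from memories sorted by descending score.
// maxChars is a rough proxy for a token budget (assumption, not a tokenizer).
function buildMemoryContext(
  memories: RetrievedMemory[],
  maxChars: number,
  minScore = 0,
): string {
  const header = "Relevant facts from earlier conversations:\n";
  let body = "";
  for (const m of memories) {
    if (m.score < minScore) continue; // skip weak matches
    const line = `- ${m.content}\n`;
    if (header.length + body.length + line.length > maxChars) break; // budget exhausted
    body += line;
  }
  // Empty string when nothing qualifies, so the caller can omit the section.
  return body === "" ? "" : header + body;
}
```

Returning an empty string when nothing qualifies lets the chat store skip the memory section entirely instead of sending an empty header.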
**D. Files to Create**
- `src/lib/embeddings.ts` - Embedding service
- `src/lib/vectordb.ts` - Vector database client
- `src/stores/memoryStore.ts` - Memory state management
- `src/components/MemorySearch.tsx` - Search UI
- `src/components/MemoryPanel.tsx` - Memory management UI
**E. Features**
- [x] Vector database setup
- [x] Automatic message embedding
- [x] Semantic search interface
- [x] Memory extraction (entities, facts)
- [x] Knowledge graph visualization
- [x] Context injection in prompts
- [x] Memory management UI
---
### 2. Document Library
**Priority**: High\
**Estimated Time**: 6-8 hours
#### Objectives
- Upload and store reference documents
- Full-text search across documents
- Automatic document summarization
- Link documents to conversations
#### Technical Approach
**A. Document Storage**
- **Backend**: Tauri file system access
- **Location**: `{app_data_dir}/documents/`
- **Indexing**: SQLite FTS5 for full-text search
- **Metadata**: Title, author, date, tags, summary
**B. Document Processing Pipeline**
```
Upload → Parse (PDF/DOCX/MD) → Extract Text → Embed Chunks
                  ↓                  ↓              ↓
              Metadata        Full-Text Index  Vector Store
```
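The "Embed Chunks" stage needs documents split into pieces small enough to embed. One common approach is fixed-size chunks with some overlap, so that a sentence cut at a boundary still appears whole in a neighbouring chunk. A minimal sketch; the character-based default sizes are illustrative assumptions, and real pipelines often count tokens and prefer paragraph boundaries.

```typescript
// Splits text into chunks of at most `size` characters, where consecutive
// chunks share `overlap` characters. Defaults are illustrative, not tuned.
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new Error("overlap must be in [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far each chunk start advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

`chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`: each chunk starts 3 characters after the previous one, so neighbours share one character.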
**C. Implementation Plan**
1. Rust commands for file management
2. Document parser library integration
3. SQLite database for metadata and FTS
4. Chunking and embedding for semantic search
5. Document viewer component
6. Library management UI
**D. Files to Create**
- `src-tauri/src/documents.rs` - Document management (Rust)
- `src/lib/documentParser.ts` - Document parsing
- `src/stores/documentStore.ts` - Document state
- `src/components/DocumentLibrary.tsx` - Library UI
- `src/components/DocumentViewer.tsx` - Document viewer
**E. Features**
- [x] Upload documents (PDF, DOCX, TXT, MD)
- [x] Full-text search
- [x] Document categorization
- [x] Automatic summarization
- [x] Reference in conversations
- [x] Document viewer
- [x] Export/backup library
**F. Dependencies**
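```jsonc
{
  "pdf-parse": "^1.1.1",      // PDF parsing
  "mammoth": "^1.6.0",        // DOCX parsing
  "better-sqlite3": "^9.0.0"  // SQLite
}
```
**Note**: `pdf-parse` and `better-sqlite3` are Node.js packages (`better-sqlite3` loads a native addon), while the Tauri frontend runs in a system webview without Node.js. These steps would likely need to move behind Rust commands, for example using the `rusqlite` and `pdf-extract` crates listed under the backend dependencies.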
---
### 3. Vision & Image Generation
**Priority**: High
**Estimated Time**: 4-6 hours
#### Objectives
- Generate images from text prompts
- Analyze uploaded images
- Edit and manipulate existing images
- Screenshot annotation tools
#### Technical Approach
**A. Image Generation**
- **Provider**: DALL-E 3 (via OpenAI API)
- **Alternative**: Stable Diffusion (local)
- **Storage**: `{app_data_dir}/generated_images/`
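OpenAI's Images API generates images via a `POST` to `/v1/images/generations` with a JSON body containing `model`, `prompt`, `n` and `size`. For `dall-e-3`, `n` must be 1 and `size` must be `1024x1024`, `1792x1024` or `1024x1792`. A minimal sketch that validates input and builds the body without sending it; `buildImageRequest` is a hypothetical helper, and the constraints should be re-checked against OpenAI's current API reference.

```typescript
// Sizes accepted by dall-e-3 (verify against current OpenAI docs).
const DALLE3_SIZES = ["1024x1024", "1792x1024", "1024x1792"] as const;
type Dalle3Size = (typeof DALLE3_SIZES)[number];

interface ImageGenerationRequest {
  model: "dall-e-3";
  prompt: string;
  n: 1; // dall-e-3 generates a single image per request
  size: Dalle3Size;
}

// Hypothetical helper: returns the JSON body for
// POST https://api.openai.com/v1/images/generations
function buildImageRequest(prompt: string, size: string = "1024x1024"): ImageGenerationRequest {
  const trimmed = prompt.trim();
  if (trimmed === "") throw new Error("prompt must not be empty");
  if ((DALLE3_SIZES as readonly string[]).indexOf(size) === -1) {
    throw new Error(`unsupported size for dall-e-3: ${size}`);
  }
  return { model: "dall-e-3", prompt: trimmed, n: 1, size: size as Dalle3Size };
}
```

Validating locally gives the generation form an immediate error instead of a failed (and possibly billed) API round trip.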
**B. Image Analysis**
- **Provider**: GPT-4 Vision (OpenAI)
- **Features**:
  - Describe images
  - Extract text (OCR)
  - Answer questions about images
  - Compare multiple images
**C. Implementation Plan**
1. OpenAI Vision API integration
2. DALL-E 3 API integration
3. Image storage and management
4. Image generation UI
5. Image analysis in chat
6. Gallery component
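For step 1, image analysis goes through the Chat Completions API: a user message's `content` may be an array mixing `text` parts and `image_url` parts, and the URL can be a base64 `data:` URL, which suits images stored under `{app_data_dir}`. A minimal sketch that builds such a message for one or more local images (covering the "compare multiple images" case); `LocalImage` and `buildVisionMessage` are hypothetical, and the message would go to whichever vision-capable model is current.

```typescript
// One content part of a Chat Completions user message.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

// Hypothetical input: image bytes already base64-encoded, plus their MIME type.
interface LocalImage {
  base64: string;
  mimeType: string; // e.g. "image/png" or "image/jpeg"
}

// Builds one user message: the question first, then each image as a data: URL.
function buildVisionMessage(question: string, images: LocalImage[]): UserMessage {
  if (images.length === 0) throw new Error("at least one image is required");
  const parts: ContentPart[] = [{ type: "text", text: question }];
  for (const img of images) {
    parts.push({
      type: "image_url",
      image_url: { url: `data:${img.mimeType};base64,${img.base64}` },
    });
  }
  return { role: "user", content: parts };
}
```

Reading and base64-encoding the file would happen on the Rust side or through Tauri's file APIs; the helper only assembles the payload.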
**D. Files to Create**
- `src/lib/vision.ts` - Vision API client
- `src/lib/imageGeneration.ts` - DALL-E client
- `src/components/ImageGenerator.tsx` - Generation UI
- `src/components/ImageGallery.tsx` - Gallery view
- `src/stores/imageStore.ts` - Image state
**E. Features**
- [x] Text-to-image generation
- [x] Image analysis and description
- [x] OCR text extraction
- [x] Image-based conversations
- [x] Generation history
- [x] Image editing tools (basic)
- [x] Screenshot capture and analysis
**F. Dependencies**
```jsonc
{
"openai": "^4.0.0" // Already installed
}
```
---
### 4. Web Access & Real-Time Information
**Priority**: Medium\
**Estimated Time**: 6-8 hours
#### Objectives
- Search the web for current information
- Extract and summarize web content
- Integrate news and articles
- Fact-checking capabilities
#### Technical Approach
**A. Web Search**
- **Options**:
  1. Brave Search API (privacy-focused, free tier)
  2. SerpAPI (Google results, paid)
  3. Custom scraper (legal concerns)
- **Recommendation**: Brave Search API
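A Brave Search API call is a plain HTTPS `GET`: the query goes in the `q` parameter and the API key in an `X-Subscription-Token` header. A minimal sketch that assembles the URL and headers without performing the request; the endpoint, header name and `count` parameter follow Brave's public documentation at the time of writing and should be verified, and `buildBraveSearchRequest` is hypothetical.

```typescript
interface HttpRequestSpec {
  url: string;
  headers: Record<string, string>;
}

// Endpoint per Brave's docs at the time of writing (verify before use).
const BRAVE_SEARCH_ENDPOINT = "https://api.search.brave.com/res/v1/web/search";

// Hypothetical helper: returns what a fetch() call would need, without sending it.
function buildBraveSearchRequest(query: string, apiKey: string, count = 10): HttpRequestSpec {
  const trimmed = query.trim();
  if (trimmed === "") throw new Error("query must not be empty");
  const qs = `q=${encodeURIComponent(trimmed)}&count=${count}`;
  return {
    url: `${BRAVE_SEARCH_ENDPOINT}?${qs}`,
    headers: {
      Accept: "application/json",
      "X-Subscription-Token": apiKey, // never log this value
    },
  };
}
```

The spec maps directly onto `fetch(spec.url, { headers: spec.headers })`, or onto a Rust-side request if the key should never reach the webview.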
**B. Content Extraction**
- **Library**: Mozilla Readability or Cheerio
- **Process**: Fetch → Parse → Clean → Summarize
- **Caching**: Store extracted content locally
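For the caching step, a URL-keyed cache with an expiry avoids re-fetching the same article repeatedly. A minimal in-memory sketch; the `CachedArticle` shape, the TTL and the injectable clock are assumptions, and persisting entries to `{app_data_dir}` would be a separate step.

```typescript
// Hypothetical shape of extracted content.
interface CachedArticle {
  url: string;
  title: string;
  text: string; // cleaned article text after extraction
}

// In-memory cache where each entry expires ttlMs after it was stored.
class ArticleCache {
  private entries = new Map<string, { article: CachedArticle; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {}

  get(url: string): CachedArticle | undefined {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(url); // expired: drop it so the caller re-fetches
      return undefined;
    }
    return entry.article;
  }

  set(article: CachedArticle): void {
    this.entries.set(article.url, { article, expiresAt: this.now() + this.ttlMs });
  }
}
```

URLs are used as keys verbatim here; normalising them (trailing slashes, tracking parameters) would improve hit rates.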
**C. Implementation Plan**
1. Web search API integration
2. Content extraction service
3. URL preview component
4. Web search command in chat
5. Article summarization
6. Citation tracking
**D. Files to Create**
- `src/lib/webSearch.ts` - Search API client
- `src/lib/webScraper.ts` - Content extraction
- `src/components/WebSearchPanel.tsx` - Search UI
- `src/components/ArticlePreview.tsx` - Preview component
- `src/stores/webStore.ts` - Web content state
**E. Features**
- [x] Web search from chat
- [x] URL content extraction
- [x] Article summarization
- [x] News aggregation
- [x] Fact verification
- [x] Source citations
- [x] Link preview cards
**F. Commands**
```text
// In-chat commands
/search [query] // Web search
/summarize [url] // Summarize article
/news [topic] // Get latest news
/fact-check [claim] // Verify information
```
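The in-chat commands listed above need a small parser that separates the command name from its argument before dispatching. A minimal sketch; the command names mirror the list, while `ChatCommand` and `parseCommand` are hypothetical.

```typescript
const COMMANDS = ["search", "summarize", "news", "fact-check"] as const;
type CommandName = (typeof COMMANDS)[number];

interface ChatCommand {
  name: CommandName;
  arg: string; // query, URL, topic or claim depending on the command
}

// Returns a parsed command, or null for ordinary chat text,
// unknown commands, and commands without an argument.
function parseCommand(input: string): ChatCommand | null {
  const match = /^\/([a-z-]+)\s+([\s\S]+)$/.exec(input.trim());
  if (!match) return null;
  const name = match[1];
  if ((COMMANDS as readonly string[]).indexOf(name) === -1) return null;
  return { name: name as CommandName, arg: match[2].trim() };
}
```

Returning `null` for unknown commands means a message like `/shrug` is simply sent as chat text; showing a "unknown command" hint instead would be a UI choice.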
**G. Dependencies**
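```jsonc
{
  "cheerio": "^1.0.0-rc.12",         // HTML parsing
  "@mozilla/readability": "^0.5.0",  // Content extraction
  "node-fetch": "^3.3.2"             // HTTP requests
}
```
**Note**: `node-fetch` brings the Fetch API to Node.js; the Tauri webview already provides `fetch`, but requests from the webview to arbitrary third-party sites are subject to CORS, so fetching pages for extraction may need to go through Tauri's HTTP API or a Rust command instead.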
---
## 🗂️ Database Schema
### Memory Database (Vector Store)
```typescript
interface Memory {
id: string
conversationId: string
messageId: string
content: string
embedding: number[] // 1536-dim vector
entities: string[] // Extracted entities
timestamp: number
importance: number // 0-1 relevance score
metadata: {
speaker: 'user' | 'assistant'
tags: string[]
references: string[] // Related memory IDs
}
}
```
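The schema carries two invariants that a TypeScript interface does not enforce at runtime: the embedding length (1536, fixed by the embedding model) and `importance` lying between 0 and 1. A minimal sketch of a write-time check that would keep malformed records out of the vector store; `EMBEDDING_DIM` and `validateMemory` are hypothetical, and the interface is repeated only so the block compiles on its own.

```typescript
const EMBEDDING_DIM = 1536; // must match the embedding model's output size

interface Memory {
  id: string;
  conversationId: string;
  messageId: string;
  content: string;
  embedding: number[];
  entities: string[];
  timestamp: number;
  importance: number; // 0-1 relevance score
  metadata: {
    speaker: "user" | "assistant";
    tags: string[];
    references: string[];
  };
}

// Throws if a record violates the schema's invariants; returns it unchanged otherwise.
function validateMemory(m: Memory): Memory {
  if (m.embedding.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM}-dim embedding, got ${m.embedding.length}`);
  }
  if (!m.embedding.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains NaN or Infinity");
  }
  if (m.importance < 0 || m.importance > 1) {
    throw new Error(`importance must be in [0, 1], got ${m.importance}`);
  }
  return m;
}
```

If the embedding model changes, every stored vector has to be re-embedded; a single `EMBEDDING_DIM` constant makes that mismatch fail loudly instead of silently corrupting search.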
### Document Database (SQLite)
```sql
CREATE TABLE documents (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
filename TEXT NOT NULL,
filepath TEXT NOT NULL,
content TEXT, -- Full text for FTS
summary TEXT,
file_type TEXT, -- pdf, docx, txt, md
file_size INTEGER,
upload_date INTEGER,
tags TEXT, -- JSON array
metadata TEXT -- JSON object
);
CREATE VIRTUAL TABLE documents_fts USING fts5(
document_id UNINDEXED,
content,
title,
tags
);
```
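User input should not be passed to `MATCH` verbatim: FTS5 gives characters such as `"`, `*`, `:` and `^`, and barewords like `AND`, `OR` and `NOT`, special meaning, so a stray quote in the search box becomes a syntax error. Per the FTS5 query syntax, a term wrapped in double quotes (with embedded quotes doubled) is treated as a plain string, and whitespace-separated strings are implicitly ANDed. A minimal sketch; `toFts5Query` and `SEARCH_DOCUMENTS_SQL` are hypothetical, and the statement would be run by whichever SQLite binding the project settles on.

```typescript
// Converts free-form input into an FTS5 MATCH expression where every
// whitespace-separated term is a quoted string; terms are implicitly ANDed.
function toFts5Query(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`) // double embedded quotes
    .join(" ");
}

// Example statement; `rank` is FTS5's built-in relevance column (bm25 by default).
const SEARCH_DOCUMENTS_SQL =
  "SELECT rowid, title FROM documents_fts WHERE documents_fts MATCH ? ORDER BY rank LIMIT 20";
```

Callers should skip the search when the helper returns an empty string. If prefix matching is wanted, FTS5 accepts a `*` after a quoted string, so the helper could append it deliberately for the last term.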
### Image Database (SQLite)
```sql
CREATE TABLE images (
id TEXT PRIMARY KEY,
filename TEXT NOT NULL,
filepath TEXT NOT NULL,
prompt TEXT, -- For generated images
description TEXT, -- AI-generated description
analysis TEXT, -- Detailed analysis
width INTEGER,
height INTEGER,
file_size INTEGER,
created_date INTEGER,
source TEXT, -- 'generated', 'uploaded', 'screenshot'
metadata TEXT -- JSON object
);
```
---
## 🎨 UI Components
### New Screens
1. **Memory Dashboard** (`/memory`)
   - Knowledge graph visualization
   - Memory timeline
   - Entity browser
   - Search interface
2. **Document Library** (`/documents`)
   - Grid/list view
   - Upload area
   - Search and filter
   - Document viewer
3. **Image Gallery** (`/images`)
   - Masonry layout
   - Generation form
   - Image details panel
   - Edit tools
4. **Web Research** (`/web`)
   - Search interface
   - Article list
   - Preview panel
   - Saved articles
### Enhanced Components
1. **Chat Interface**
   - Memory context indicator
   - Document reference links
   - Image inline display
   - Web search results
2. **Settings**
   - Memory settings (retention, privacy)
   - API keys (OpenAI, Brave)
   - Storage management
   - Feature toggles
---
## 🔧 Technical Architecture
### State Management
```text
// New Stores
memoryStore // Memory & knowledge graph
documentStore // Document library
imageStore // Image gallery
webStore // Web search & articles
// Enhanced Stores
chatStore // Add memory injection
settingsStore // Add new API keys
```
### Backend (Rust)
```text
// New modules
src-tauri/src/
├── memory/
│   ├── embeddings.rs
│   └── vectordb.rs
├── documents/
│   ├── parser.rs
│   ├── storage.rs
│   └── search.rs
└── images/
    ├── generator.rs
    └── storage.rs
```
### API Integration
```text
// New API clients
OpenAI Embeddings API // Text embeddings
OpenAI Vision API // Image analysis
DALL-E 3 API // Image generation
Brave Search API // Web search
```
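Of these, the Embeddings API is the one every memory write and search depends on. It takes `model` and `input` (a string or an array of strings) and returns one vector per input under `data[i].embedding`, each tagged with the `index` of the input it belongs to. A minimal sketch of the request body and response parsing; `text-embedding-3-small` is an assumed model choice that produces 1536-dimensional vectors by default, matching the `Memory` schema, and both helper names are hypothetical.

```typescript
const EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings";
const EMBEDDING_MODEL = "text-embedding-3-small"; // assumption: 1536 dims by default

interface EmbeddingsResponse {
  data: Array<{ index: number; embedding: number[] }>;
}

// Body for POST EMBEDDINGS_URL; batching several texts saves round trips.
function buildEmbeddingsBody(texts: string[]): { model: string; input: string[] } {
  if (texts.length === 0) throw new Error("nothing to embed");
  return { model: EMBEDDING_MODEL, input: texts };
}

// Returns vectors in input order, placing each item by its `index` field.
function parseEmbeddings(res: EmbeddingsResponse, expected: number): number[][] {
  const out: number[][] = new Array(expected);
  for (const item of res.data) out[item.index] = item.embedding;
  for (let i = 0; i < expected; i++) {
    if (!out[i]) throw new Error(`missing embedding for input ${i}`);
  }
  return out;
}
```

The body would be sent to `EMBEDDINGS_URL` with an `Authorization: Bearer <key>` header; ordering by `index` rather than array position keeps batch results aligned with their messages.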
---
## 📦 Dependencies
### Frontend
```jsonc
{
  "chromadb": "^1.7.0",               // Vector database
  "better-sqlite3": "^9.0.0",         // SQLite
  "cheerio": "^1.0.0-rc.12",          // Web scraping
  "@mozilla/readability": "^0.5.0",   // Content extraction
  "d3": "^7.8.5",                     // Knowledge graph viz
  "react-force-graph": "^1.43.0",     // Graph component
  "pdfjs-dist": "^3.11.174",          // PDF preview
  "react-image-gallery": "^1.3.0"     // Image gallery
}
```
### Backend (Rust)
```toml
[dependencies]
chromadb = "0.1" # Vector DB client
rusqlite = "0.30" # SQLite
pdf-extract = "0.7" # PDF parsing
lopdf = "0.31" # PDF manipulation
image = "0.24" # Image processing
```
---
## 🚀 Implementation Timeline
### Week 1: Foundation (8-10 hours)
- **Days 1-2**: Vector database setup
- **Day 3**: Embedding pipeline
- **Day 4**: Memory store and basic UI
- **Day 5**: Testing and refinement
### Week 2: Documents & Vision (10-12 hours)
- **Days 1-2**: Document storage and parsing
- **Day 3**: Full-text search implementation
- **Day 4**: Vision API integration
- **Day 5**: Image generation UI
### Week 3: Web & Polish (6-8 hours)
- **Days 1-2**: Web search integration
- **Day 3**: Content extraction
- **Day 4**: UI polish and testing
- **Day 5**: Documentation
**Total Estimated Time**: 24-30 hours

---
## 🎯 Success Metrics
### Functionality
- [ ] Can remember facts from past conversations
- [ ] Can search semantically through history
- [ ] Can reference uploaded documents
- [ ] Can generate images from prompts
- [ ] Can analyze uploaded images
- [ ] Can search the web for information
- [ ] Can summarize web articles
### Performance
- [ ] Memory search: <500ms
- [ ] Document search: <200ms
- [ ] Image generation: <10s (API-dependent)
- [ ] Web search: <2s
- [ ] No UI lag with large knowledge base
### User Experience
- [ ] Intuitive memory management
- [ ] Easy document upload and search
- [ ] Seamless image generation workflow
- [ ] Useful web search integration
- [ ] Clear indication of memory usage
---
## 🔒 Privacy & Security
### Data Storage
- All data stored locally by default
- Encrypted sensitive information
- User control over data retention
- Clear data deletion options
### API Keys
- Secure storage in Tauri config
- Never logged or exposed
- Optional API usage (user can disable features)
### Memory System
- User can view all stored memories
- One-click memory deletion
- Configurable retention periods
- Export capabilities for transparency
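Configurable retention periods reduce to a filter over memory timestamps. A minimal sketch, assuming `timestamp` is in milliseconds since the epoch (the schema does not say) and that a retention of `null` means "keep forever"; `partitionByRetention` is hypothetical. Returning the expired records instead of silently dropping them leaves room for the export step before deletion.

```typescript
interface TimestampedMemory {
  id: string;
  timestamp: number; // ms since epoch (assumption)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Splits memories into those inside the retention window and those past it.
// retentionDays === null disables expiry entirely.
function partitionByRetention<T extends TimestampedMemory>(
  memories: T[],
  retentionDays: number | null,
  now: number = Date.now(),
): { keep: T[]; expired: T[] } {
  if (retentionDays === null) return { keep: memories.slice(), expired: [] };
  const cutoff = now - retentionDays * MS_PER_DAY;
  const keep: T[] = [];
  const expired: T[] = [];
  for (const m of memories) (m.timestamp >= cutoff ? keep : expired).push(m);
  return { keep, expired };
}
```

Passing `now` explicitly makes the policy testable and lets a "preview what would be deleted" view reuse the same function.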
---
## 🧪 Testing Strategy
### Unit Tests
- Vector database operations
- Document parsing
- Search functionality
- Embedding generation
### Integration Tests
- End-to-end memory storage/retrieval
- Document upload workflow
- Image generation pipeline
- Web search flow
### Manual Testing
- Memory accuracy
- Search relevance
- UI responsiveness
- Cross-platform compatibility
---
## 📝 Documentation
### User Documentation
- Memory system guide
- Document library tutorial
- Image generation how-to
- Web search commands reference
### Developer Documentation
- Vector database architecture
- Embedding pipeline details
- API integration guides
- Database schemas
---
## 🎉 Phase 3 Vision
By the end of Phase 3, EVE will:
- **Remember everything** - Long-term conversational memory
- **Reference knowledge** - Built-in document library
- **See and create** - Vision and image generation
- **Stay current** - Real-time web information
This transforms EVE from a **conversational assistant** into a **knowledge companion** that grows smarter over time and has access to both personal knowledge and real-time information.

---
## 🔜 Post-Phase 3
After Phase 3 completion, we'll move to:
- **Phase 4**: Developer tools, plugins, customization
- **v1.0**: Production release with all core features
- **Beyond**: Mobile apps, team features, advanced AI
---
**Status**: Ready to Start\
**Prerequisites**: Phase 2 Complete\
**Next Step**: Begin Long-Term Memory implementation\
**Created**: October 6, 2025, 11:20pm UTC+01:00