eve-alpha/docs/planning/PHASE3_PLAN.md
2025-10-06 23:25:21 +01:00


Phase 3 - Knowledge Base & Memory (v0.3.0)

Target Version: v0.3.0
Estimated Duration: 24-30 hours
Priority: High
Status: 📋 Planning


🎯 Phase 3 Goals

Transform EVE from a conversational assistant into an intelligent knowledge companion with:

  1. Long-term memory - Remember past conversations and user preferences
  2. Document library - Manage and reference documents
  3. Vision capabilities - Generate and analyze images
  4. Web access - Real-time information retrieval

📊 Feature Breakdown

1. Long-Term Memory System

Priority: Critical
Estimated Time: 8-10 hours

Objectives

  • Store and retrieve conversational context across sessions
  • Semantic search through all conversations
  • Auto-extract and store key information
  • Build personal knowledge graph

Technical Approach

A. Vector Database Integration

  • Options:
    1. ChromaDB (lightweight, local-first)
    2. LanceDB (Rust-based, fast)
    3. SQLite + vector extension
  • Recommendation: ChromaDB for ease of use
  • Storage: Embed messages, extract entities, store relationships

B. Embedding Pipeline

User Message → OpenAI Embeddings API → Vector Store
                    ↓
            Semantic Search ← Query
                    ↓
            Retrieved Context → Enhanced Prompt
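The semantic search step in the pipeline above comes down to ranking stored vectors by their similarity to the query vector. Below is a minimal in-memory sketch that uses cosine similarity as the metric. `InMemoryVectorStore` and its methods are hypothetical names chosen for illustration; they are not the ChromaDB client API. In practice the vectors would come from the embeddings API.

```typescript
// Hypothetical in-memory stand-in for the vector store; NOT the ChromaDB API.
interface StoredVector {
  id: string;
  embedding: number[];
  content: string;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorStore {
  private items: StoredVector[] = [];

  add(item: StoredVector): void {
    this.items.push(item);
  }

  // Brute-force scan: score every item, return the topK best matches first.
  query(queryEmbedding: number[], topK: number): Array<StoredVector & { score: number }> {
    return this.items
      .map((item) => ({ ...item, score: cosineSimilarity(queryEmbedding, item.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

// Toy 3-dimensional vectors; real embeddings would have e.g. 1536 dimensions.
const store = new InMemoryVectorStore();
store.add({ id: "m1", embedding: [1, 0, 0], content: "User prefers dark mode" });
store.add({ id: "m2", embedding: [0, 1, 0], content: "User lives in London" });
store.add({ id: "m3", embedding: [0.9, 0.1, 0], content: "User dislikes bright themes" });
const hits = store.query([1, 0, 0], 2);
console.log(hits.map((h) => h.id).join(", ")); // m1, m3
```

A linear scan like this should be adequate for a few thousand memories. ChromaDB or LanceDB would replace it with an indexed store as the history grows.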

C. Implementation Plan

  1. Set up vector database (ChromaDB)
  2. Create embedding service (src/lib/embeddings.ts)
  3. Background job to embed existing messages
  4. Add semantic search to conversation store
  5. UI for memory search and management
  6. Context injection for relevant memories
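Step 6 could be as simple as prepending the highest-ranked memories to the system prompt while keeping within a size budget. This is a minimal sketch under three assumptions:

- Memories arrive already ranked, best first.
- A character budget on the memory lines stands in, crudely, for a token budget.
- A fixed relevance cutoff filters out weak matches.

`buildMemoryContext`, the cutoff value and the prompt wording are all hypothetical.

```typescript
// Hypothetical helper: turn ranked memories into a system-prompt preamble.
interface RankedMemory {
  content: string;
  score: number; // similarity score, higher is more relevant
}

function buildMemoryContext(
  memories: RankedMemory[],
  maxChars: number, // budget for the memory lines, excluding the header
  minScore = 0.75, // assumed relevance cutoff; tune empirically
): string {
  const lines: string[] = [];
  let used = 0;
  for (const m of memories) {
    if (m.score < minScore) continue; // skip weakly related memories
    const line = `- ${m.content}`;
    if (used + line.length + 1 > maxChars) break; // +1 for the newline
    lines.push(line);
    used += line.length + 1;
  }
  if (lines.length === 0) return ""; // nothing relevant: inject nothing
  return `Relevant facts from earlier conversations:\n${lines.join("\n")}`;
}

const context = buildMemoryContext(
  [
    { content: "User prefers dark mode", score: 0.92 },
    { content: "User's cat is named Miso", score: 0.81 },
    { content: "User once asked about tides", score: 0.4 },
  ],
  200,
);
console.log(context);
```

Returning an empty string when nothing passes the cutoff keeps unrelated memories out of the prompt. That outcome is also a natural signal for the "memory context indicator" in the chat UI.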

D. Files to Create

  • src/lib/embeddings.ts - Embedding service
  • src/lib/vectordb.ts - Vector database client
  • src/stores/memoryStore.ts - Memory state management
  • src/components/MemorySearch.tsx - Search UI
  • src/components/MemoryPanel.tsx - Memory management UI

E. Features

  • Vector database setup
  • Automatic message embedding
  • Semantic search interface
  • Memory extraction (entities, facts)
  • Knowledge graph visualization
  • Context injection in prompts
  • Memory management UI

2. Document Library

Priority: High
Estimated Time: 6-8 hours

Objectives

  • Upload and store reference documents
  • Full-text search across documents
  • Automatic document summarization
  • Link documents to conversations

Technical Approach

A. Document Storage

  • Backend: Tauri file system access
  • Location: {app_data_dir}/documents/
  • Indexing: SQLite FTS5 for full-text search
  • Metadata: Title, author, date, tags, summary

B. Document Processing Pipeline

Upload → Parse (PDF/DOCX/MD) → Extract Text → Embed Chunks
           ↓                        ↓              ↓
       Metadata              Full-Text Index   Vector Store
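The "Embed Chunks" stage needs each document split into pieces small enough to embed, with some overlap so that sentences falling on a boundary are not lost. This is a minimal sketch of fixed-size chunking with overlap. It counts characters as a rough proxy for tokens. `chunkText` and its default sizes are hypothetical choices, not values set by this plan.

```typescript
// Hypothetical fixed-size chunker with overlap between consecutive chunks.
interface Chunk {
  index: number;
  start: number; // character offset into the source text
  text: string;
}

function chunkText(text: string, chunkSize = 1000, overlap = 200): Chunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start, text: text.slice(start, start + chunkSize) });
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

const chunks = chunkText("abcdefghij", 4, 1);
console.log(chunks.map((c) => c.text).join(" | ")); // abcd | defg | ghij
```

Storing `start` with each embedded chunk lets a semantic search hit link back to the exact passage in the document viewer.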

C. Implementation Plan

  1. Rust commands for file management
  2. Document parser library integration
  3. SQLite database for metadata and FTS
  4. Chunking and embedding for semantic search
  5. Document viewer component
  6. Library management UI

D. Files to Create

  • src-tauri/src/documents.rs - Document management (Rust)
  • src/lib/documentParser.ts - Document parsing
  • src/stores/documentStore.ts - Document state
  • src/components/DocumentLibrary.tsx - Library UI
  • src/components/DocumentViewer.tsx - Document viewer

E. Features

  • Upload documents (PDF, DOCX, TXT, MD)
  • Full-text search
  • Document categorization
  • Automatic summarization
  • Reference in conversations
  • Document viewer
  • Export/backup library

F. Dependencies

{
  "pdf-parse": "^1.1.1",           // PDF parsing
  "mammoth": "^1.6.0",              // DOCX parsing
  "better-sqlite3": "^9.0.0"        // SQLite
}

3. Vision & Image Generation

Priority: High
Estimated Time: 4-6 hours

Objectives

  • Generate images from text prompts
  • Analyze uploaded images
  • Edit and manipulate existing images
  • Screenshot annotation tools

Technical Approach

A. Image Generation

  • Provider: DALL-E 3 (via OpenAI API)
  • Alternative: Stable Diffusion (local)
  • Storage: {app_data_dir}/generated_images/
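For the DALL-E 3 path, the OpenAI Images API expects a JSON body posted to `https://api.openai.com/v1/images/generations`. The sketch below only builds and validates that body; it sends nothing. It assumes the options documented for `dall-e-3` at the time of writing: one image per request, three sizes, and `standard`/`hd` quality. The 4000-character prompt limit is an assumption to re-check. `buildDalle3Request` is a hypothetical helper. The real call could go through the already-installed `openai` package.

```typescript
// Hypothetical request builder for OpenAI image generation with dall-e-3.
// Parameter values reflect documented dall-e-3 options; verify against current docs.
type Dalle3Size = "1024x1024" | "1792x1024" | "1024x1792";

interface Dalle3Request {
  model: "dall-e-3";
  prompt: string;
  n: 1; // dall-e-3 generates one image per request
  size: Dalle3Size;
  quality: "standard" | "hd";
  response_format: "url" | "b64_json";
}

const MAX_PROMPT_CHARS = 4000; // assumed dall-e-3 prompt limit

function buildDalle3Request(
  prompt: string,
  size: Dalle3Size = "1024x1024",
  quality: "standard" | "hd" = "standard",
): Dalle3Request {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) throw new Error("prompt must not be empty");
  if (trimmed.length > MAX_PROMPT_CHARS) {
    throw new Error(`prompt exceeds ${MAX_PROMPT_CHARS} characters`);
  }
  // b64_json returns the image bytes directly, so they can be written to
  // generated_images/ without a second download from a temporary URL.
  return { model: "dall-e-3", prompt: trimmed, n: 1, size, quality, response_format: "b64_json" };
}

const body = buildDalle3Request("  a watercolor fox reading a book ", "1792x1024");
console.log(JSON.stringify(body));
```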

B. Image Analysis

  • Provider: GPT-4 Vision (OpenAI)
  • Features:
    • Describe images
    • Extract text (OCR)
    • Answer questions about images
    • Compare multiple images
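With the Chat Completions API, an image is sent as a content part of type `image_url` next to the text. Local files can be inlined as base64 data URLs. This minimal sketch builds such a message for an uploaded or screenshot image, assuming the file has already been read and base64-encoded. `buildVisionMessage` is hypothetical, and the choice of vision-capable model is left to the caller. Passing several images in one message is one way to cover the "compare multiple images" case.

```typescript
// Hypothetical builder for a Chat Completions user message containing images.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: "low" | "high" | "auto" } };

interface UserMessage {
  role: "user";
  content: ContentPart[];
}

interface LocalImage {
  mimeType: "image/png" | "image/jpeg" | "image/webp" | "image/gif";
  base64: string; // file contents, already base64-encoded
}

function buildVisionMessage(question: string, images: LocalImage[]): UserMessage {
  if (images.length === 0) throw new Error("at least one image is required");
  const parts: ContentPart[] = [{ type: "text", text: question }];
  for (const img of images) {
    // Inline each image as a data URL so no public hosting is needed.
    parts.push({
      type: "image_url",
      image_url: { url: `data:${img.mimeType};base64,${img.base64}`, detail: "auto" },
    });
  }
  return { role: "user", content: parts };
}

const msg = buildVisionMessage("What text appears in this screenshot?", [
  { mimeType: "image/png", base64: "iVBORw0KGgo=" }, // placeholder: PNG signature bytes only
]);
console.log(JSON.stringify(msg, null, 2));
```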

C. Implementation Plan

  1. OpenAI Vision API integration
  2. DALL-E 3 API integration
  3. Image storage and management
  4. Image generation UI
  5. Image analysis in chat
  6. Gallery component

D. Files to Create

  • src/lib/vision.ts - Vision API client
  • src/lib/imageGeneration.ts - DALL-E client
  • src/components/ImageGenerator.tsx - Generation UI
  • src/components/ImageGallery.tsx - Gallery view
  • src/stores/imageStore.ts - Image state

E. Features

  • Text-to-image generation
  • Image analysis and description
  • OCR text extraction
  • Image-based conversations
  • Generation history
  • Image editing tools (basic)
  • Screenshot capture and analysis

F. Dependencies

{
  "openai": "^4.0.0"  // Already installed
}

4. Web Access & Real-Time Information

Priority: Medium
Estimated Time: 6-8 hours

Objectives

  • Search the web for current information
  • Extract and summarize web content
  • Integrate news and articles
  • Fact-checking capabilities

Technical Approach

A. Web Search

  • Options:
    1. Brave Search API (privacy-focused, free tier)
    2. SerpAPI (Google results, paid)
    3. Custom scraper (legal concerns)
  • Recommendation: Brave Search API
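The Brave Search API is queried with an HTTP GET, and the API key travels in a request header. The sketch below only builds the request and normalizes a response into citation records. It relies on these assumptions taken from Brave's public docs, all of which should be confirmed before use:

- the endpoint `https://api.search.brave.com/res/v1/web/search`;
- the `X-Subscription-Token` header;
- the `q` and `count` parameters, with `count` capped at 20;
- the `web.results` response shape.

`buildBraveSearchRequest` and `parseBraveResults` are hypothetical names.

```typescript
// Hypothetical helpers for the Brave Web Search API (shapes assumed from public docs).
const BRAVE_WEB_SEARCH = "https://api.search.brave.com/res/v1/web/search";

interface SearchRequest {
  url: string;
  headers: Record<string, string>;
}

function buildBraveSearchRequest(query: string, apiKey: string, count = 10): SearchRequest {
  const q = query.trim();
  if (!q) throw new Error("query must not be empty");
  const clamped = Math.min(Math.max(count, 1), 20); // assumed max of 20 results
  const params = new URLSearchParams({ q, count: String(clamped) });
  return {
    url: `${BRAVE_WEB_SEARCH}?${params.toString()}`,
    headers: { Accept: "application/json", "X-Subscription-Token": apiKey },
  };
}

interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Normalize the assumed response shape ({ web: { results: [...] } }) into citations,
// dropping entries without a URL since they cannot be cited.
function parseBraveResults(json: unknown): SearchResult[] {
  const results =
    (json as { web?: { results?: Array<Record<string, unknown>> } } | null)?.web?.results ?? [];
  return results
    .filter((r) => typeof r.url === "string")
    .map((r) => ({
      title: typeof r.title === "string" ? r.title : (r.url as string),
      url: r.url as string,
      snippet: typeof r.description === "string" ? r.description : "",
    }));
}

const req = buildBraveSearchRequest("tauri 2 release notes", "YOUR_API_KEY");
console.log(req.url);
```

In the app, the request could be sent with `fetch(req.url, { headers: req.headers })`, and the parsed JSON then passed to `parseBraveResults` to feed citation tracking.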

B. Content Extraction

  • Library: Mozilla Readability or Cheerio
  • Process: Fetch → Parse → Clean → Summarize
  • Caching: Store extracted content locally
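This is a minimal sketch of the caching step. It assumes a time-to-live policy keyed by URL, with an injectable clock so the expiry logic can be tested. `ContentCache` is hypothetical and is kept in memory here. To survive restarts, it would need to be persisted, for example to the SQLite database.

```typescript
// Hypothetical TTL cache for extracted article text, keyed by normalized URL.
interface CachedArticle {
  url: string;
  title: string;
  text: string;
  fetchedAt: number; // epoch milliseconds
}

class ContentCache {
  private entries = new Map<string, CachedArticle>();

  constructor(
    private ttlMs: number,
    private now: () => number = Date.now, // injectable clock for tests
  ) {}

  // Drop the fragment so "page#a" and "page#b" share one cache entry.
  private key(url: string): string {
    const u = new URL(url);
    u.hash = "";
    return u.toString();
  }

  get(url: string): CachedArticle | undefined {
    const k = this.key(url);
    const hit = this.entries.get(k);
    if (hit && this.now() - hit.fetchedAt > this.ttlMs) {
      this.entries.delete(k); // expired: caller should fetch again
      return undefined;
    }
    return hit;
  }

  set(url: string, title: string, text: string): void {
    this.entries.set(this.key(url), { url, title, text, fetchedAt: this.now() });
  }
}

let clock = 0;
const cache = new ContentCache(60_000, () => clock);
cache.set("https://example.com/post#intro", "Post", "Body text");
console.log(cache.get("https://example.com/post")?.title); // Post
clock = 61_000;
console.log(cache.get("https://example.com/post")); // undefined
```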

C. Implementation Plan

  1. Web search API integration
  2. Content extraction service
  3. URL preview component
  4. Web search command in chat
  5. Article summarization
  6. Citation tracking

D. Files to Create

  • src/lib/webSearch.ts - Search API client
  • src/lib/webScraper.ts - Content extraction
  • src/components/WebSearchPanel.tsx - Search UI
  • src/components/ArticlePreview.tsx - Preview component
  • src/stores/webStore.ts - Web content state

E. Features

  • Web search from chat
  • URL content extraction
  • Article summarization
  • News aggregation
  • Fact verification
  • Source citations
  • Link preview cards

F. Commands

// In-chat commands
/search [query]       // Web search
/summarize [url]      // Summarize article
/news [topic]         // Get latest news
/fact-check [claim]   // Verify information
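Recognizing these commands before a message reaches the model could be handled by a small parser. This minimal sketch assumes the following:

- Only the four commands above are recognized.
- The topic for `/news` is optional.
- `/summarize` takes an http(s) URL.
- Any other input is treated as ordinary chat.

`parseChatCommand` and its return shape are hypothetical.

```typescript
// Hypothetical parser for the in-chat commands listed above.
type CommandName = "search" | "summarize" | "news" | "fact-check";

type ParsedInput =
  | { kind: "command"; name: CommandName; argument: string }
  | { kind: "chat"; text: string }
  | { kind: "error"; message: string };

const COMMANDS: ReadonlySet<string> = new Set(["search", "summarize", "news", "fact-check"]);

function parseChatCommand(input: string): ParsedInput {
  const text = input.trim();
  // A command is a leading "/" plus a known lowercase name, then optional argument text.
  const match = /^\/([a-z-]+)(?:\s+([\s\S]*))?$/.exec(text);
  if (!match || !COMMANDS.has(match[1])) return { kind: "chat", text };
  const name = match[1] as CommandName;
  const argument = (match[2] ?? "").trim();
  if (!argument && name !== "news") {
    return { kind: "error", message: `/${name} needs an argument` };
  }
  if (name === "summarize" && !/^https?:\/\//i.test(argument)) {
    return { kind: "error", message: "/summarize expects an http(s) URL" };
  }
  return { kind: "command", name, argument };
}

console.log(parseChatCommand("/search tauri plugins"));
```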

G. Dependencies

{
  "cheerio": "^1.0.0-rc.12",       // HTML parsing
  "@mozilla/readability": "^0.5.0", // Content extraction
  "node-fetch": "^3.3.2"            // HTTP requests
}

🗂️ Database Schema

Memory Database (Vector Store)

interface Memory {
  id: string
  conversationId: string
  messageId: string
  content: string
  embedding: number[]           // 1536-dim vector
  entities: string[]            // Extracted entities
  timestamp: number
  importance: number            // 0-1 relevance score
  metadata: {
    speaker: 'user' | 'assistant'
    tags: string[]
    references: string[]        // Related memory IDs
  }
}
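Memories are read back from an external store, so a runtime check against the `Memory` shape above guards against malformed records. This is a minimal sketch that assumes 1536-dimensional embeddings, as annotated above. The `isMemory` guard and the `EMBEDDING_DIM` constant are hypothetical. The interface is repeated so that the snippet compiles on its own.

```typescript
// Hypothetical runtime validator for the Memory shape defined above.
const EMBEDDING_DIM = 1536; // must match the embedding model actually used

interface Memory {
  id: string;
  conversationId: string;
  messageId: string;
  content: string;
  embedding: number[];
  entities: string[];
  timestamp: number;
  importance: number;
  metadata: { speaker: "user" | "assistant"; tags: string[]; references: string[] };
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

function isMemory(v: unknown): v is Memory {
  if (typeof v !== "object" || v === null) return false;
  const m = v as Record<string, unknown>;
  const emb = m.embedding;
  const importance = m.importance;
  const meta = (typeof m.metadata === "object" && m.metadata !== null
    ? m.metadata
    : null) as Record<string, unknown> | null;
  return (
    typeof m.id === "string" &&
    typeof m.conversationId === "string" &&
    typeof m.messageId === "string" &&
    typeof m.content === "string" &&
    Array.isArray(emb) &&
    emb.length === EMBEDDING_DIM &&
    emb.every((x) => typeof x === "number" && Number.isFinite(x)) &&
    isStringArray(m.entities) &&
    typeof m.timestamp === "number" &&
    typeof importance === "number" &&
    importance >= 0 &&
    importance <= 1 && // importance is a 0-1 relevance score
    meta !== null &&
    (meta.speaker === "user" || meta.speaker === "assistant") &&
    isStringArray(meta.tags) &&
    isStringArray(meta.references)
  );
}

const raw: unknown = JSON.parse('{"id":"1"}');
console.log(isMemory(raw)); // false
```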

Document Database (SQLite)

CREATE TABLE documents (
  id TEXT PRIMARY KEY,
  title TEXT NOT NULL,
  filename TEXT NOT NULL,
  filepath TEXT NOT NULL,
  content TEXT,                 -- Full text for FTS
  summary TEXT,
  file_type TEXT,               -- pdf, docx, txt, md
  file_size INTEGER,
  upload_date INTEGER,
  tags TEXT,                    -- JSON array
  metadata TEXT                 -- JSON object
);

CREATE VIRTUAL TABLE documents_fts USING fts5(
  document_id UNINDEXED,        -- documents.id, to map hits back
  content,
  title,
  tags
);
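Raw user input passed straight to an FTS5 `MATCH` can trigger syntax errors, because characters such as `-`, `:`, `*` and parentheses, and bare words like `NOT` or `NEAR`, are query operators. This minimal sketch quotes each term as an FTS5 string literal, doubling any embedded double quotes as FTS5 requires. Space-separated literals are then combined with FTS5's implicit AND. `toFts5Query` is a hypothetical helper. The result would be bound as a parameter, for example `SELECT rowid, title FROM documents_fts WHERE documents_fts MATCH ? ORDER BY rank`.

```typescript
// Hypothetical helper: turn free-form user input into a safe FTS5 MATCH string.
// Each term becomes an FTS5 string literal ("..." with embedded quotes doubled),
// so punctuation and operator keywords are matched as text, not parsed as syntax.
function toFts5Query(userInput: string): string | null {
  const terms = userInput
    .split(/\s+/)
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  if (terms.length === 0) return null; // caller should skip the query entirely
  return terms.map((t) => `"${t.replace(/"/g, '""')}"`).join(" ");
}

console.log(toFts5Query('tauri "plugin" NOT-a-flag'));
// "tauri" """plugin""" "NOT-a-flag"
```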

Image Database (SQLite)

CREATE TABLE images (
  id TEXT PRIMARY KEY,
  filename TEXT NOT NULL,
  filepath TEXT NOT NULL,
  prompt TEXT,                  -- For generated images
  description TEXT,             -- AI-generated description
  analysis TEXT,                -- Detailed analysis
  width INTEGER,
  height INTEGER,
  file_size INTEGER,
  created_date INTEGER,
  source TEXT,                  -- 'generated', 'uploaded', 'screenshot'
  metadata TEXT                 -- JSON object
);

🎨 UI Components

New Screens

  1. Memory Dashboard (/memory)

    • Knowledge graph visualization
    • Memory timeline
    • Entity browser
    • Search interface
  2. Document Library (/documents)

    • Grid/list view
    • Upload area
    • Search and filter
    • Document viewer
  3. Image Gallery (/images)

    • Masonry layout
    • Generation form
    • Image details panel
    • Edit tools
  4. Web Research (/web)

    • Search interface
    • Article list
    • Preview panel
    • Saved articles

Enhanced Components

  1. Chat Interface

    • Memory context indicator
    • Document reference links
    • Image inline display
    • Web search results
  2. Settings

    • Memory settings (retention, privacy)
    • API keys (OpenAI, Brave)
    • Storage management
    • Feature toggles

🔧 Technical Architecture

State Management

// New Stores
memoryStore       // Memory & knowledge graph
documentStore     // Document library
imageStore        // Image gallery
webStore          // Web search & articles

// Enhanced Stores
chatStore         // Add memory injection
settingsStore     // Add new API keys

Backend (Rust)

// New modules
src-tauri/src/
  ├── memory/
  │   ├── embeddings.rs
  │   └── vectordb.rs
  ├── documents/
  │   ├── parser.rs
  │   ├── storage.rs
  │   └── search.rs
  └── images/
      ├── generator.rs
      └── storage.rs

API Integration

// New API clients
OpenAI Embeddings API   // Text embeddings
OpenAI Vision API       // Image analysis
DALL-E 3 API            // Image generation
Brave Search API        // Web search

📦 Dependencies

Frontend

{
  "chromadb": "^1.7.0",                    // Vector database
  "better-sqlite3": "^9.0.0",              // SQLite
  "cheerio": "^1.0.0-rc.12",               // Web scraping
  "@mozilla/readability": "^0.5.0",        // Content extraction
  "d3": "^7.8.5",                          // Knowledge graph viz
  "react-force-graph": "^1.43.0",          // Graph component
  "pdfjs-dist": "^3.11.174",               // PDF preview
  "react-image-gallery": "^1.3.0"          // Image gallery
}

Backend (Rust)

[dependencies]
chromadb = "0.1"              # Vector DB client
rusqlite = "0.30"             # SQLite
pdf-extract = "0.7"           # PDF parsing
lopdf = "0.31"                # PDF manipulation
image = "0.24"                # Image processing

🚀 Implementation Timeline

Week 1: Foundation (8-10 hours)

  • Days 1-2: Vector database setup
  • Day 3: Embedding pipeline
  • Day 4: Memory store and basic UI
  • Day 5: Testing and refinement

Week 2: Documents & Vision (10-12 hours)

  • Days 1-2: Document storage and parsing
  • Day 3: Full-text search implementation
  • Day 4: Vision API integration
  • Day 5: Image generation UI

Week 3: Web & Polish (6-8 hours)

  • Days 1-2: Web search integration
  • Day 3: Content extraction
  • Day 4: UI polish and testing
  • Day 5: Documentation

Total Estimated Time: 24-30 hours


🎯 Success Metrics

Functionality

  • Can remember facts from past conversations
  • Can search semantically through history
  • Can reference uploaded documents
  • Can generate images from prompts
  • Can analyze uploaded images
  • Can search the web for information
  • Can summarize web articles

Performance

  • Memory search: <500ms
  • Document search: <200ms
  • Image generation: <10s (API-dependent)
  • Web search: <2s
  • No UI lag with large knowledge base

User Experience

  • Intuitive memory management
  • Easy document upload and search
  • Seamless image generation workflow
  • Useful web search integration
  • Clear indication of memory usage

🔒 Privacy & Security

Data Storage

  • All data stored locally by default
  • Encrypted sensitive information
  • User control over data retention
  • Clear data deletion options

API Keys

  • Secure storage in Tauri config
  • Never logged or exposed
  • Optional API usage (user can disable features)

Memory System

  • User can view all stored memories
  • One-click memory deletion
  • Configurable retention periods
  • Export capabilities for transparency
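Configurable retention could be enforced by a periodic pass that selects expired memories. The user would confirm before anything is deleted. This minimal sketch makes three assumptions:

- Memory timestamps are epoch milliseconds.
- The retention period is given in days.
- Memories at or above an importance threshold can optionally be exempted.

`selectExpiredMemories` and `RetentionPolicy` are hypothetical names, and the exemption rule is an illustrative option rather than decided policy.

```typescript
// Hypothetical retention pass over stored memories.
interface MemoryRecord {
  id: string;
  timestamp: number; // assumed epoch milliseconds
  importance: number; // 0-1, as in the Memory schema
}

interface RetentionPolicy {
  retentionDays: number | null; // null = keep forever
  keepImportanceAtLeast?: number; // optional exemption, e.g. 0.9
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns IDs rather than deleting, so the UI can show what will be removed.
function selectExpiredMemories(
  memories: MemoryRecord[],
  policy: RetentionPolicy,
  now: number = Date.now(),
): string[] {
  if (policy.retentionDays === null) return [];
  const cutoff = now - policy.retentionDays * DAY_MS;
  return memories
    .filter((m) => m.timestamp < cutoff)
    .filter(
      (m) => policy.keepImportanceAtLeast === undefined || m.importance < policy.keepImportanceAtLeast,
    )
    .map((m) => m.id);
}

const nowMs = 100 * DAY_MS;
const expired = selectExpiredMemories(
  [
    { id: "old", timestamp: 10 * DAY_MS, importance: 0.2 },
    { id: "old-but-important", timestamp: 10 * DAY_MS, importance: 0.95 },
    { id: "recent", timestamp: 95 * DAY_MS, importance: 0.1 },
  ],
  { retentionDays: 30, keepImportanceAtLeast: 0.9 },
  nowMs,
);
console.log(expired.join(", ")); // old
```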

🧪 Testing Strategy

Unit Tests

  • Vector database operations
  • Document parsing
  • Search functionality
  • Embedding generation

Integration Tests

  • End-to-end memory storage/retrieval
  • Document upload workflow
  • Image generation pipeline
  • Web search flow

Manual Testing

  • Memory accuracy
  • Search relevance
  • UI responsiveness
  • Cross-platform compatibility

📝 Documentation

User Documentation

  • Memory system guide
  • Document library tutorial
  • Image generation how-to
  • Web search commands reference

Developer Documentation

  • Vector database architecture
  • Embedding pipeline details
  • API integration guides
  • Database schemas

🎉 Phase 3 Vision

By the end of Phase 3, EVE will:

  • Remember everything - Long-term conversational memory
  • Reference knowledge - Built-in document library
  • See and create - Vision and image generation
  • Stay current - Real-time web information

This transforms EVE from a conversational assistant into a knowledge companion that grows smarter over time and has access to both personal knowledge and real-time information.


🔜 Post-Phase 3

After Phase 3 completion, we'll move to:

  • Phase 4: Developer tools, plugins, customization
  • v1.0: Production release with all core features
  • Beyond: Mobile apps, team features, advanced AI

Status: Ready to Start
Prerequisites: Phase 2 Complete
Next Step: Begin Long-Term Memory implementation

Created: October 6, 2025, 11:20pm UTC+01:00