diff --git a/ROADMAP.md b/ROADMAP.md index 40e69f8..178619a 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -33,6 +33,7 @@ Stand: 14.04.2026 | **Session-Management (Phase 6)** | ✅ | abaf4eb | | **Claude-DB Integration (Phase 8)** | ✅ | e6bd0de | | **Context-Management (Phase 9)** | ✅ | eb91e54 | +| **Sprach-Interface (Phase 10)** | ✅ | 14.04.2026 | --- @@ -274,9 +275,15 @@ Compacting ist **notwendig** (Token-Limit, Kosten, Latenz), aber dabei geht krit ### Noch offen (niedrigere Priorität) -- [ ] **Auto-Extraction vor Compacting** — Hook automatisch auslösen +- [x] **Auto-Extraction vor Compacting** — Hook automatisch auslösen ✅ (14.04.2026) + - `performCompacting()` ruft jetzt `extract_context_before_compacting()` auf + - Entscheidungen, TODOs, Key-Insights werden vor Compacting archiviert - [ ] **Validation** — Prüfen ob Claude den Context nutzt -- [ ] **Wissens-Hints** — On-demand aus claude-db laden +- [x] **Wissens-Hints** — On-demand aus claude-db laden ✅ (14.04.2026) + - `get_tool_hints()` lädt relevante Einträge bei Tool-Start + - Intelligentes Keyword-Mapping (npm, git, docker, dolibarr, etc.) 
+ - `activeKnowledgeHints` Store im Frontend + - Anzeige im KnowledgePanel ### Verifikation ```bash @@ -287,40 +294,56 @@ Compacting ist **notwendig** (Token-Limit, Kosten, Latenz), aber dabei geht krit --- -## Phase 10: Sprach-Interface (Optional) +## Phase 10: Sprach-Interface ✅ ERLEDIGT + +> **Implementiert:** 14.04.2026 ### Technologie-Stack | Komponente | Technologie | Ort | |------------|-------------|-----| -| Speech-to-Text | Whisper.cpp | Lokal | -| Voice Activity Detection | Silero VAD | Lokal | +| Speech-to-Text | OpenAI Whisper API | Cloud | +| Voice Activity Detection | Custom (Audio Level) | Browser | | Text-to-Speech | OpenAI TTS API | Cloud | | Audio-Capture | Web Audio API | Browser | -### Aufgaben +### Implementiert -- [ ] **Whisper Integration** - - [ ] whisper.cpp als Tauri-Sidecar oder WASM - - [ ] Streaming-Transkription - - [ ] Deutsch-Modell (small oder medium) +- [x] **Whisper Integration** + - [x] OpenAI Whisper API für STT + - [x] Deutsch als Default-Sprache + - [x] Audio-Upload als multipart/form-data -- [ ] **VAD (Voice Activity Detection)** - - [ ] Erkennt wann User aufhort zu sprechen - - [ ] Pause > 1.5s → Nachricht senden +- [x] **VAD (Voice Activity Detection)** + - [x] Audio-Level-basierte Stille-Erkennung + - [x] Pause > 1.5s → Aufnahme automatisch stoppen + - [x] Konfigurierbare Schwellwerte -- [ ] **TTS (Text-to-Speech)** - - [ ] OpenAI TTS API Integration - - [ ] Streaming-Wiedergabe - - [ ] Interrupt bei User-Sprache +- [x] **TTS (Text-to-Speech)** + - [x] OpenAI TTS API Integration + - [x] 6 Stimmen verfügbar (Alloy, Echo, Fable, Onyx, Nova, Shimmer) + - [x] Audio als Base64 zurückgeben -- [ ] **UI** - - [ ] Mikrofon-Button in Chat - - [ ] Pegel-Anzeige - - [ ] Transkript live anzeigen +- [x] **UI** + - [x] Mikrofon-Button neben Send-Button + - [x] Echtzeit-Pegel-Anzeige (animiert) + - [x] Aufnahme-Status (pulsierend) + - [x] Live-Transkript-Anzeige -### Aufwand -Gross — eigenes Teilprojekt, 2-3 Wochen +### Dateien + +- 
`src-tauri/src/voice.rs` — Backend für STT/TTS +- `src/lib/components/ChatPanel.svelte` — UI + Audio-Capture + +### Konfiguration + +Benötigt `OPENAI_API_KEY` Umgebungsvariable für Whisper + TTS. + +### Zukünftige Verbesserungen + +- [ ] Lokales Whisper.cpp als Alternative (offline-fähig) +- [ ] Streaming-TTS für längere Texte +- [ ] Push-to-Talk Modus --- @@ -1184,8 +1207,8 @@ END; ## Technische Schulden -- [ ] Dead Code in `memory.rs` (MemorySystem struct ungenutzt) -- [ ] Warnings bei `cargo check` beheben +- [x] Dead Code in `memory.rs` (MemorySystem struct entfernt) ✅ (14.04.2026) +- [x] Warnings bei `cargo check` beheben ✅ (14.04.2026) - [ ] TypeScript strict mode aktivieren - [ ] E2E Tests mit Playwright - [ ] CI/CD Pipeline (Forgejo Runner) diff --git a/scripts/claude-bridge.js b/scripts/claude-bridge.js index 72b5a95..e93cf51 100644 --- a/scripts/claude-bridge.js +++ b/scripts/claude-bridge.js @@ -19,9 +19,71 @@ let activeAbort = null; let currentAgentId = null; let currentModel = process.env.CLAUDE_MODEL || 'opus'; +// Agent-Modus (solo | handlanger | experten | auto) +let agentMode = 'solo'; + // Sticky Context (Schicht 1) — wird bei JEDEM API-Call injiziert let stickyContext = ''; +// ============ Orchestrator Prompts ============ + +const ORCHESTRATOR_PROMPTS = { + handlanger: ` +Du bist der HAUPT-AGENT und arbeitest im HANDLANGER-MODUS. + +WICHTIG: Du denkst und planst, aber Sub-Agents führen aus! + +Wenn du eine Aufgabe bekommst: +1. ANALYSIERE was nötig ist +2. DELEGIERE an passende Sub-Agents mit EXAKTEN Anweisungen +3. Sub-Agents führen GENAU aus, was du sagst — sie denken NICHT selbst +4. Du erhältst ZUSAMMENFASSUNGEN zurück (keine Rohdaten) +5. Du entscheidest den nächsten Schritt + +Beispiel-Delegationen: +- "Lies Datei X, gib mir Zeilen 10-50 zurück" +- "Suche nach 'handleError' in src/, liste die Dateien" +- "Führe 'npm test' aus, berichte nur ob passed/failed" + +Halte deinen Context klein — lass Sub-Agents die Details bearbeiten! 
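For reference, the way `sendMessage()` further down in claude-bridge.js combines an orchestrator prompt like the one above with sticky context and the user message can be sketched as a pure function. This is a TypeScript rendering of the diff's `fullPrompt` logic; the function name `buildFullPrompt` is ours, not part of the bridge:

```typescript
// Sketch of the prompt assembly in sendMessage(): orchestrator instructions
// (empty in solo mode) are prepended to the user message, then the sticky
// context (Schicht 1) is injected in front of everything.
function buildFullPrompt(
  message: string,
  orchestratorPrompt: string, // '' in solo mode
  stickyContext: string,      // may be ''
): string {
  let fullPrompt = message;
  if (orchestratorPrompt) {
    // Orchestrator-Modus: instructions first, separated by a divider
    fullPrompt = `${orchestratorPrompt}\n\n---\n\n${message}`;
  }
  if (stickyContext) {
    // Sticky context always comes first
    fullPrompt = `${stickyContext}\n\n---\n\n${fullPrompt}`;
  }
  return fullPrompt;
}
```

In solo mode with no sticky context the message passes through unchanged, which matches the old behaviour the diff replaces.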
+`, + + experten: ` +Du bist der HAUPT-AGENT und arbeitest im EXPERTEN-MODUS. + +WICHTIG: Du koordinierst autonome Experten-Agents! + +Deine Experten: +- **Research**: Durchsucht Code, findet Informationen, PLANT SELBST wie er sucht +- **Implement**: Schreibt Code, ENTSCHEIDET SELBST wie er es macht (Best Practices) +- **Test**: Schreibt und führt Tests aus, WÄHLT SELBST passende Testfälle +- **Review**: Prüft Code, FINDET SELBST Probleme + +Wenn du eine Aufgabe bekommst: +1. TEILE sie in Experten-Bereiche auf +2. DELEGIERE an den passenden Experten mit dem WAS, nicht dem WIE +3. Der Experte arbeitet AUTONOM und liefert eine Zusammenfassung +4. Du INTEGRIERST die Ergebnisse + +Beispiel-Delegationen: +- Research: "Finde heraus wie Authentication in diesem Projekt implementiert ist" +- Implement: "Füge OAuth2-Support hinzu" (ohne exakte Code-Vorgabe) +- Test: "Teste die neue Auth-Funktionalität" +- Review: "Prüfe die OAuth-Implementierung auf Sicherheitsprobleme" +`, + + auto: ` +Du analysierst Aufgaben und wählst den optimalen Arbeitsmodus. 
+ +Entscheide basierend auf: +- SOLO: Einfache, schnelle Aufgaben (Typo fix, Code erklären, einzelne Datei ändern) +- HANDLANGER: Koordinations-intensive Aufgaben (viele Dateien lesen, Bug in großer Codebase) +- EXPERTEN: Komplexe Features (neues System implementieren, großes Refactoring) + +Teile deine Wahl am Anfang mit: "[Modus: X] Begründung" +`, +}; + // Subagent-Tracking // Map: toolUseId → { agentId, parentId, type, task, depth } const activeSubagents = new Map(); @@ -159,12 +221,23 @@ async function sendMessage(message, requestId, model = null, contextOverride = n resumeSessionId: resumeSessionId || null, }); - sendResponse(requestId, { agentId: currentAgentId, status: 'gestartet', model: useModel, resuming: isResuming }); + sendResponse(requestId, { agentId: currentAgentId, status: 'gestartet', model: useModel, resuming: isResuming, mode: agentMode }); - // Nachricht mit Context kombinieren - const fullPrompt = useContext - ? `${useContext}\n\n---\n\n${message}` - : message; + // Orchestrator-Prompt für nicht-Solo Modi + let orchestratorPrompt = ''; + if (agentMode !== 'solo' && ORCHESTRATOR_PROMPTS[agentMode]) { + orchestratorPrompt = ORCHESTRATOR_PROMPTS[agentMode]; + sendMonitorEvent('agent', `Orchestrator-Modus: ${agentMode}`, { mode: agentMode }); + } + + // Nachricht mit Context und Orchestrator kombinieren + let fullPrompt = message; + if (orchestratorPrompt) { + fullPrompt = `${orchestratorPrompt}\n\n---\n\n${message}`; + } + if (useContext) { + fullPrompt = `${useContext}\n\n---\n\n${fullPrompt}`; + } const startTime = Date.now(); let fullText = ''; @@ -413,9 +486,27 @@ function handleCommand(msg) { }); break; + case 'set-mode': + // Agent-Modus setzen (solo, handlanger, experten, auto) + const validModes = ['solo', 'handlanger', 'experten', 'auto']; + if (!msg.mode || !validModes.includes(msg.mode)) { + sendError(msg.id, `Ungültiger Modus: ${msg.mode}. 
Verfügbar: ${validModes.join(', ')}`); + return; + } + agentMode = msg.mode; + sendResponse(msg.id, { mode: agentMode, status: 'Modus geändert' }); + sendEvent('mode-changed', { mode: agentMode }); + sendMonitorEvent('agent', `Agent-Modus geändert: ${agentMode}`, { mode: agentMode }); + break; + + case 'get-mode': + sendResponse(msg.id, { mode: agentMode }); + break; + case 'status': sendResponse(msg.id, { model: currentModel, + mode: agentMode, isProcessing: !!currentAgentId, availableModels: AVAILABLE_MODELS, }); diff --git a/src-tauri/Cargo.toml b/src-tauri/Cargo.toml index 28d6b36..45f23b0 100644 --- a/src-tauri/Cargo.toml +++ b/src-tauri/Cargo.toml @@ -22,6 +22,8 @@ chrono = { version = "0.4", features = ["serde"] } uuid = { version = "1", features = ["v4", "serde"] } rusqlite = { version = "0.31", features = ["bundled"] } mysql_async = "0.34" +reqwest = { version = "0.12", features = ["json", "multipart"] } +base64 = "0.22" [profile.release] panic = "abort" diff --git a/src-tauri/src/audit.rs b/src-tauri/src/audit.rs index e23ea06..2d848da 100644 --- a/src-tauri/src/audit.rs +++ b/src-tauri/src/audit.rs @@ -45,12 +45,14 @@ pub struct AuditEntry { pub session_id: Option, } -/// Audit-Log Manager +/// Audit-Log Manager (für zukünftige In-Memory-Nutzung) +#[allow(dead_code)] #[derive(Debug, Default)] pub struct AuditLog { entries: Vec, } +#[allow(dead_code)] impl AuditLog { pub fn new() -> Self { Self { entries: vec![] } diff --git a/src-tauri/src/claude.rs b/src-tauri/src/claude.rs index a20f866..d431eaf 100644 --- a/src-tauri/src/claude.rs +++ b/src-tauri/src/claude.rs @@ -7,7 +7,6 @@ use std::process::{Command, Stdio}; use std::sync::{Arc, Mutex}; use tauri::{AppHandle, Emitter, Manager}; -use crate::context; use crate::db; /// Status eines Agents @@ -49,6 +48,7 @@ struct BridgeMessage { payload: Option, id: Option, result: Option, + #[allow(dead_code)] error: Option, } @@ -262,11 +262,6 @@ fn send_to_bridge(app: &AppHandle, command: &str, message: &str) -> 
Result) -> Result { - send_to_bridge_full(app, command, message, context, None) -} - /// Befehl an Bridge senden mit Context und Resume-Session-ID fn send_to_bridge_full( app: &AppHandle, @@ -288,6 +283,11 @@ fn send_to_bridge_full( "id": request_id, "model": message }), + "set-mode" => serde_json::json!({ + "command": command, + "id": request_id, + "mode": message + }), "message" => { let mut payload = serde_json::json!({ "command": command, @@ -513,6 +513,53 @@ pub async fn get_current_model(app: AppHandle) -> Result { Ok("opus".to_string()) } +/// Agent-Modus setzen (solo, handlanger, experten, auto) +#[tauri::command] +pub async fn set_agent_mode(app: AppHandle, mode: String) -> Result { + let valid_modes = ["solo", "handlanger", "experten", "auto"]; + if !valid_modes.contains(&mode.as_str()) { + return Err(format!("Ungültiger Modus: {}. Verfügbar: {}", mode, valid_modes.join(", "))); + } + + println!("🔄 Agent-Modus wechseln zu: {}", mode); + + // Modus in Settings speichern + if let Some(db_state) = app.try_state::>>() { + let db = db_state.lock().unwrap(); + let _ = db.set_setting("agent_mode", &mode); + } + + // Bridge starten falls nicht aktiv + let needs_start = { + let state = app.state::>>(); + let state_guard = state.lock().unwrap(); + state_guard.bridge_stdin.is_none() + }; + + if needs_start { + start_bridge(&app)?; + tokio::time::sleep(tokio::time::Duration::from_millis(500)).await; + } + + // Modus an Bridge senden + send_to_bridge(&app, "set-mode", &mode)?; + + Ok(mode) +} + +/// Aktuellen Agent-Modus aus Settings laden +#[tauri::command] +pub async fn get_agent_mode(app: AppHandle) -> Result { + if let Some(db_state) = app.try_state::>>() { + let db = db_state.lock().unwrap(); + if let Ok(Some(mode)) = db.get_setting("agent_mode") { + return Ok(mode); + } + } + // Default: solo + Ok("solo".to_string()) +} + /// Modell-Info Struct #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] pub struct ModelInfo { diff --git 
a/src-tauri/src/context.rs b/src-tauri/src/context.rs index e7cec20..8ddc184 100644 --- a/src-tauri/src/context.rs +++ b/src-tauri/src/context.rs @@ -2,7 +2,6 @@ // Drei-Schichten-Gedächtnis für kritischen Kontext use serde::{Deserialize, Serialize}; -use std::sync::{Arc, Mutex}; use tauri::{AppHandle, Manager}; use crate::db::{Database, DbState}; @@ -74,7 +73,8 @@ pub struct ExtractedContext { pub mentioned_tools: Vec, } -/// Wissens-Hint (Schicht 3, on-demand) +/// Wissens-Hint (Schicht 3, on-demand) — für zukünftige Wissens-Hints +#[allow(dead_code)] #[derive(Debug, Clone, Serialize, Deserialize)] pub struct KnowledgeHint { pub title: String, @@ -122,6 +122,7 @@ impl StickyContext { } /// Geschätzte Token-Anzahl + #[allow(dead_code)] pub fn estimate_tokens(&self) -> usize { // Grobe Schätzung: ~4 Zeichen pro Token self.render().len() / 4 @@ -167,6 +168,7 @@ impl ProjectContext { } /// Geschätzte Token-Anzahl + #[allow(dead_code)] pub fn estimate_tokens(&self) -> usize { self.render().len() / 4 } diff --git a/src-tauri/src/db.rs b/src-tauri/src/db.rs index 99f7495..76e92c2 100644 --- a/src-tauri/src/db.rs +++ b/src-tauri/src/db.rs @@ -50,6 +50,7 @@ pub struct MonitorEvent { pub agent_id: Option, pub session_id: Option, pub duration_ms: Option, + #[allow(dead_code)] pub error: Option, } @@ -340,6 +341,7 @@ impl Database { // ============ Memory ============ /// Speichert einen Memory-Eintrag + #[allow(dead_code)] pub fn save_memory_entry(&self, entry: &MemoryEntry) -> SqlResult<()> { self.conn.execute( "INSERT OR REPLACE INTO memory (id, category, key, value, sticky, auto_load, last_used, use_count) @@ -386,6 +388,7 @@ impl Database { } /// Löscht einen Memory-Eintrag + #[allow(dead_code)] pub fn delete_memory_entry(&self, id: &str) -> SqlResult<()> { self.conn.execute("DELETE FROM memory WHERE id = ?1", params![id])?; Ok(()) diff --git a/src-tauri/src/guard.rs b/src-tauri/src/guard.rs index ddfbba4..a8775c5 100644 --- a/src-tauri/src/guard.rs +++ 
b/src-tauri/src/guard.rs @@ -41,7 +41,8 @@ pub enum PermissionAction { Deny, } -/// Anfrage zur Freigabe +/// Anfrage zur Freigabe (für zukünftiges Permission-Popup) +#[allow(dead_code)] #[derive(Debug, Clone, Serialize, Deserialize)] pub struct PermissionRequest { pub id: String, @@ -53,7 +54,8 @@ pub struct PermissionRequest { pub suggested_pattern: Option, } -/// Antwort auf Freigabe-Anfrage +/// Antwort auf Freigabe-Anfrage (für zukünftiges Permission-Popup) +#[allow(dead_code)] #[derive(Debug, Clone, Serialize, Deserialize)] pub struct PermissionResponse { pub request_id: String, @@ -297,6 +299,7 @@ impl GuardRails { } /// Session-Permissions löschen + #[allow(dead_code)] pub fn clear_session(&mut self) { self.session_permissions.clear(); } diff --git a/src-tauri/src/knowledge.rs b/src-tauri/src/knowledge.rs index b4a0abb..626a87a 100644 --- a/src-tauri/src/knowledge.rs +++ b/src-tauri/src/knowledge.rs @@ -255,6 +255,147 @@ pub async fn get_recent_knowledge( Ok(entries) } +/// Wissens-Hints für ein Tool/Kommando laden +/// Sucht relevante Einträge basierend auf Tool-Name und Kommando +#[tauri::command] +pub async fn get_tool_hints( + tool: String, + command: Option, + context: Option, +) -> Result, String> { + let pool = create_pool(); + let mut conn = pool.get_conn().await.map_err(|e| e.to_string())?; + + // Suchbegriffe aus Tool + Command + Context zusammenbauen + let mut search_terms = vec![tool.clone()]; + + // Tool-spezifische Kategorien mappen + let category = match tool.as_str() { + "Bash" => { + if let Some(ref cmd) = command { + // Relevante Begriffe aus Bash-Kommando extrahieren + if cmd.contains("npm") || cmd.contains("node") { search_terms.push("npm".to_string()); } + if cmd.contains("git") { search_terms.push("git".to_string()); } + if cmd.contains("docker") { search_terms.push("docker".to_string()); } + if cmd.contains("cargo") { search_terms.push("cargo".to_string()); search_terms.push("rust".to_string()); } + if cmd.contains("dolibarr") { 
search_terms.push("dolibarr".to_string()); } + if cmd.contains("mysql") { search_terms.push("mysql".to_string()); search_terms.push("sql".to_string()); } + } + None // Keine spezifische Kategorie + } + "Read" | "Write" | "Edit" => { + if let Some(ref cmd) = command { + // Aus Dateipfad relevante Begriffe extrahieren + if cmd.contains("dolibarr") { search_terms.push("dolibarr".to_string()); } + if cmd.contains(".php") { search_terms.push("php".to_string()); } + if cmd.contains(".rs") { search_terms.push("rust".to_string()); } + if cmd.contains(".ts") || cmd.contains(".svelte") { search_terms.push("svelte".to_string()); } + } + None + } + _ => None, + }; + + // Optional: Context-Begriffe hinzufügen + if let Some(ref ctx) = context { + // Wichtige Begriffe aus Context extrahieren (max 3) + for word in ctx.split_whitespace().take(10) { + if word.len() > 4 && !search_terms.contains(&word.to_lowercase()) { + search_terms.push(word.to_lowercase()); + } + } + } + + // Suchquery bauen + let query_string = search_terms.join(" "); + + // Suche mit Volltext und optionalem Kategorie-Filter + let entries: Vec = if let Some(cat) = category { + conn.exec_map( + r#"SELECT id, category, title, content, tags, priority, status, + related_ids, source, created_at, updated_at + FROM knowledge + WHERE status = 'active' + AND category = ? + AND MATCH(title, content, tags) AGAINST(? IN NATURAL LANGUAGE MODE) + ORDER BY priority ASC, updated_at DESC + LIMIT 3"#, + (&cat, &query_string), + |(id, category, title, content, tags, priority, status, related_ids, source, created_at, updated_at): + (i64, String, String, String, Option, i32, String, Option, Option, String, String)| { + KnowledgeEntry { + id, category, title, content, tags, priority, status, + related_ids, source, created_at, updated_at, + } + } + ).await.map_err(|e| e.to_string())? 
+    } else {
+        conn.exec_map(
+            r#"SELECT id, category, title, content, tags, priority, status,
+                      related_ids, source, created_at, updated_at
+               FROM knowledge
+               WHERE status = 'active'
+                 AND MATCH(title, content, tags) AGAINST(? IN NATURAL LANGUAGE MODE)
+               ORDER BY priority ASC, updated_at DESC
+               LIMIT 3"#,
+            (&query_string,),
+            |(id, category, title, content, tags, priority, status, related_ids, source, created_at, updated_at):
+             (i64, String, String, String, Option<String>, i32, String, Option<String>, Option<String>, String, String)| {
+                KnowledgeEntry {
+                    id, category, title, content, tags, priority, status,
+                    related_ids, source, created_at, updated_at,
+                }
+            }
+        ).await.map_err(|e| e.to_string())?
+    };
+
+    drop(conn);
+    pool.disconnect().await.map_err(|e| e.to_string())?;
+
+    if !entries.is_empty() {
+        println!("💡 {} Wissens-Hints geladen für Tool '{}': {:?}",
+            entries.len(),
+            tool,
+            entries.iter().map(|e| &e.title).collect::<Vec<_>>()
+        );
+    }
+
+    Ok(entries)
+}
+
+/// Wissens-Hints als formatierter Kontext-Block
+#[tauri::command]
+pub async fn format_tool_hints(
+    tool: String,
+    command: Option<String>,
+    context: Option<String>,
+) -> Result<String, String> {
+    let entries = get_tool_hints(tool.clone(), command, context).await?;
+
+    if entries.is_empty() {
+        return Ok(String::new());
+    }
+
+    let mut hints = Vec::new();
+    hints.push("".to_string());
+    hints.push(format!("Relevante Informationen für {}:", tool));
+
+    for entry in entries {
+        hints.push(format!("\n**{}** ({})", entry.title, entry.category));
+        // Content auf ~300 Zeichen kürzen (an Zeichen-, nicht Byte-Grenzen — sonst Panic mitten in UTF-8)
+        let content = if entry.content.chars().count() > 300 {
+            let truncated: String = entry.content.chars().take(300).collect();
+            format!("{}...", truncated)
+        } else {
+            entry.content
+        };
+        hints.push(content);
+    }
+
+    hints.push("".to_string());
+
+    Ok(hints.join("\n"))
+}
+
 /// Verbindung zur Wissensbasis testen
 #[tauri::command]
 pub async fn test_knowledge_connection() -> Result {
diff --git a/src-tauri/src/lib.rs b/src-tauri/src/lib.rs
index 4469da9..6525b4c 100644
--- a/src-tauri/src/lib.rs
+++ b/src-tauri/src/lib.rs
@@ -16,6 +16,7 @@ mod
guard; mod knowledge; mod memory; mod session; +mod voice; /// Initialisiert die App #[cfg_attr(mobile, tauri::mobile_entry_point)] @@ -32,6 +33,8 @@ pub fn run() { claude::set_model, claude::get_available_models, claude::get_current_model, + claude::set_agent_mode, + claude::get_agent_mode, claude::init_sticky_context, // Gedächtnis-System memory::load_memory, @@ -82,6 +85,8 @@ pub fn run() { knowledge::get_knowledge_categories, knowledge::get_recent_knowledge, knowledge::test_knowledge_connection, + knowledge::get_tool_hints, + knowledge::format_tool_hints, // Context-Management context::get_sticky_context, context::set_sticky_context, @@ -91,6 +96,11 @@ pub fn run() { context::log_context_failure, context::get_full_context, context::list_sticky_context, + // Voice-Interface + voice::transcribe_audio, + voice::text_to_speech, + voice::check_voice_availability, + voice::get_tts_voices, ]) .setup(|app| { let handle = app.handle().clone(); diff --git a/src-tauri/src/memory.rs b/src-tauri/src/memory.rs index 8671c74..b0b026b 100644 --- a/src-tauri/src/memory.rs +++ b/src-tauri/src/memory.rs @@ -32,67 +32,7 @@ pub struct MemoryEntry { pub use_count: u32, } -/// Das Gedächtnis-System -#[derive(Debug, Default)] -pub struct MemorySystem { - entries: HashMap, - loaded_from_db: bool, -} - -impl MemorySystem { - pub fn new() -> Self { - Self { - entries: HashMap::new(), - loaded_from_db: false, - } - } - - /// Fügt einen Eintrag hinzu - pub fn add(&mut self, entry: MemoryEntry) { - self.entries.insert(entry.id.clone(), entry); - } - - /// Holt einen Eintrag - pub fn get(&self, id: &str) -> Option<&MemoryEntry> { - self.entries.get(id) - } - - /// Holt alle Einträge einer Kategorie - pub fn get_by_category(&self, category: ContextCategory) -> Vec<&MemoryEntry> { - self.entries - .values() - .filter(|e| e.category == category) - .collect() - } - - /// Holt alle Sticky-Einträge (für Kontext-Injection) - pub fn get_sticky_context(&self) -> Vec<&MemoryEntry> { - 
self.entries.values().filter(|e| e.sticky).collect() - } - - /// Holt alle Auto-Load-Einträge - pub fn get_auto_load(&self) -> Vec<&MemoryEntry> { - self.entries.values().filter(|e| e.auto_load).collect() - } - - /// Statistiken - pub fn stats(&self) -> MemoryStats { - MemoryStats { - total: self.entries.len(), - sticky: self.entries.values().filter(|e| e.sticky).count(), - by_category: self.count_by_category(), - } - } - - fn count_by_category(&self) -> HashMap { - let mut counts = HashMap::new(); - for entry in self.entries.values() { - let cat = format!("{:?}", entry.category); - *counts.entry(cat).or_insert(0) += 1; - } - counts - } -} +// MemorySystem Struct entfernt - Dead Code, Funktionalität läuft über Tauri-Commands #[derive(Debug, Serialize, Deserialize)] pub struct MemoryStats { diff --git a/src-tauri/src/voice.rs b/src-tauri/src/voice.rs new file mode 100644 index 0000000..16ea46d --- /dev/null +++ b/src-tauri/src/voice.rs @@ -0,0 +1,188 @@ +// Claude Desktop — Voice Interface +// Speech-to-Text mit Whisper API, Text-to-Speech mit OpenAI TTS + +use base64::{Engine as _, engine::general_purpose::STANDARD as BASE64}; +use serde::{Deserialize, Serialize}; +use std::io::Write; + +/// Whisper API Konfiguration +const OPENAI_API_URL: &str = "https://api.openai.com/v1/audio/transcriptions"; +const TTS_API_URL: &str = "https://api.openai.com/v1/audio/speech"; + +/// Transkriptions-Ergebnis +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct TranscriptionResult { + pub text: String, + pub language: Option, + pub duration: Option, +} + +/// TTS-Stimmen +#[derive(Debug, Clone, Serialize, Deserialize)] +pub enum TtsVoice { + Alloy, + Echo, + Fable, + Onyx, + Nova, + Shimmer, +} + +impl TtsVoice { + fn as_str(&self) -> &str { + match self { + TtsVoice::Alloy => "alloy", + TtsVoice::Echo => "echo", + TtsVoice::Fable => "fable", + TtsVoice::Onyx => "onyx", + TtsVoice::Nova => "nova", + TtsVoice::Shimmer => "shimmer", + } + } +} + +/// Holt den OpenAI API Key 
aus Umgebungsvariable oder Settings
+fn get_openai_key() -> Result<String, String> {
+    // Erst Umgebungsvariable prüfen
+    if let Ok(key) = std::env::var("OPENAI_API_KEY") {
+        if !key.is_empty() {
+            return Ok(key);
+        }
+    }
+
+    // Alternativ: Aus Settings laden (TODO)
+    Err("OpenAI API Key nicht gefunden. Setze OPENAI_API_KEY Umgebungsvariable.".to_string())
+}
+
+/// Transkribiert Audio mit OpenAI Whisper API
+#[tauri::command]
+pub async fn transcribe_audio(
+    audio_base64: String,
+    format: String,
+) -> Result<String, String> {
+    let api_key = get_openai_key()?;
+
+    // Base64 dekodieren
+    let audio_bytes = BASE64.decode(&audio_base64)
+        .map_err(|e| format!("Base64-Dekodierung fehlgeschlagen: {}", e))?;
+
+    // Temporäre Datei erstellen (Whisper API braucht Datei-Upload)
+    let temp_dir = std::env::temp_dir();
+    let temp_file = temp_dir.join(format!("whisper_audio_{}.{}", uuid::Uuid::new_v4(), format));
+
+    let mut file = std::fs::File::create(&temp_file)
+        .map_err(|e| format!("Temp-Datei erstellen fehlgeschlagen: {}", e))?;
+    file.write_all(&audio_bytes)
+        .map_err(|e| format!("Audio schreiben fehlgeschlagen: {}", e))?;
+    drop(file);
+
+    // Multipart-Request an Whisper API
+    let client = reqwest::Client::new();
+
+    let file_part = reqwest::multipart::Part::file(&temp_file)
+        .await
+        .map_err(|e| format!("Datei lesen fehlgeschlagen: {}", e))?
+ .file_name(format!("audio.{}", format)) + .mime_str(&format!("audio/{}", format)) + .map_err(|e| format!("MIME-Type fehlgeschlagen: {}", e))?; + + let form = reqwest::multipart::Form::new() + .part("file", file_part) + .text("model", "whisper-1") + .text("language", "de") // Deutsch priorisieren + .text("response_format", "json"); + + let response = client + .post(OPENAI_API_URL) + .bearer_auth(&api_key) + .multipart(form) + .send() + .await + .map_err(|e| format!("API-Request fehlgeschlagen: {}", e))?; + + // Temp-Datei löschen + let _ = std::fs::remove_file(&temp_file); + + if !response.status().is_success() { + let error_text = response.text().await.unwrap_or_default(); + return Err(format!("Whisper API Fehler: {}", error_text)); + } + + // Response parsen + #[derive(Deserialize)] + struct WhisperResponse { + text: String, + } + + let result: WhisperResponse = response.json().await + .map_err(|e| format!("Response parsen fehlgeschlagen: {}", e))?; + + println!("🎤 Transkription: \"{}\"", result.text); + + Ok(result.text) +} + +/// Text-to-Speech mit OpenAI TTS API +#[tauri::command] +pub async fn text_to_speech( + text: String, + voice: Option, +) -> Result { + let api_key = get_openai_key()?; + + let voice_name = voice.unwrap_or_else(|| "nova".to_string()); + + let client = reqwest::Client::new(); + + let body = serde_json::json!({ + "model": "tts-1", + "input": text, + "voice": voice_name, + "response_format": "mp3" + }); + + let response = client + .post(TTS_API_URL) + .bearer_auth(&api_key) + .json(&body) + .send() + .await + .map_err(|e| format!("TTS API-Request fehlgeschlagen: {}", e))?; + + if !response.status().is_success() { + let error_text = response.text().await.unwrap_or_default(); + return Err(format!("TTS API Fehler: {}", error_text)); + } + + // Audio-Bytes als Base64 zurückgeben + let audio_bytes = response.bytes().await + .map_err(|e| format!("Audio lesen fehlgeschlagen: {}", e))?; + + let audio_base64 = BASE64.encode(&audio_bytes); + + 
println!("🔊 TTS generiert: {} Zeichen → {} Bytes Audio", text.len(), audio_bytes.len()); + + Ok(audio_base64) +} + +/// Prüft ob Voice-Features verfügbar sind (API Key vorhanden) +#[tauri::command] +pub async fn check_voice_availability() -> Result { + match get_openai_key() { + Ok(_) => Ok(true), + Err(_) => Ok(false), + } +} + +/// Verfügbare TTS-Stimmen +#[tauri::command] +pub async fn get_tts_voices() -> Result, String> { + Ok(vec![ + serde_json::json!({ "id": "alloy", "name": "Alloy", "description": "Neutral, ausgewogen" }), + serde_json::json!({ "id": "echo", "name": "Echo", "description": "Männlich, warm" }), + serde_json::json!({ "id": "fable", "name": "Fable", "description": "Expressiv, britisch" }), + serde_json::json!({ "id": "onyx", "name": "Onyx", "description": "Tief, autoritär" }), + serde_json::json!({ "id": "nova", "name": "Nova", "description": "Weiblich, freundlich" }), + serde_json::json!({ "id": "shimmer", "name": "Shimmer", "description": "Weiblich, sanft" }), + ]) +} diff --git a/src/lib/components/ChatPanel.svelte b/src/lib/components/ChatPanel.svelte index 3dfe076..7366b0e 100644 --- a/src/lib/components/ChatPanel.svelte +++ b/src/lib/components/ChatPanel.svelte @@ -98,6 +98,22 @@ const TOKEN_WARNING_THRESHOLD = 40000; // ~40k Token = Warnung zeigen const KEEP_LAST_MESSAGES = 30; + // Voice-Interface State + let isRecording = $state(false); + let audioLevel = $state(0); + let liveTranscript = $state(''); + let mediaRecorder: MediaRecorder | null = null; + let audioContext: AudioContext | null = null; + let analyser: AnalyserNode | null = null; + let audioChunks: Blob[] = []; + let levelAnimationFrame: number | null = null; + + // VAD (Voice Activity Detection) — automatisches Stoppen nach Sprechpause + const VAD_SILENCE_THRESHOLD = 15; // Pegel unter dem als Stille gilt + const VAD_SILENCE_DURATION = 1500; // ms Stille vor Auto-Stopp + let silenceStartTime: number | null = null; + let vadEnabled = $state(true); // VAD ein/aus + async 
function scrollToBottom() { await tick(); if (messagesContainer) { @@ -163,13 +179,31 @@ showCompactingDialog = false; try { + // Zuerst: Kritischen Kontext extrahieren und archivieren + const currentMessages = get(messages); + const messagesJson = JSON.stringify(currentMessages.map(m => ({ + role: m.role, + content: m.content + }))); + + try { + const extracted = await invoke('extract_context_before_compacting', { + sessionId, + messagesJson + }); + console.log('📦 Kontext extrahiert vor Compacting:', extracted); + } catch (extractErr) { + console.warn('Context-Extraction fehlgeschlagen (nicht kritisch):', extractErr); + } + + // Dann: Compacting durchführen const compacted: number = await invoke('compact_session', { sessionId, keepLast: KEEP_LAST_MESSAGES }); if (compacted > 0) { - addMessage('system', `📦 Compacting: ${compacted} ältere Nachrichten wurden zusammengefasst. Die letzten ${KEEP_LAST_MESSAGES} bleiben erhalten.`); + addMessage('system', `📦 Compacting: ${compacted} ältere Nachrichten wurden zusammengefasst. Die letzten ${KEEP_LAST_MESSAGES} bleiben erhalten. 
           Kritischer Kontext wurde archiviert.`);
       }
     } catch (err) {
       console.error('Compacting fehlgeschlagen:', err);
@@ -182,8 +216,149 @@
     // Don't show the warning again for this session
   }
 
+  // ============ Voice Interface ============
+
+  async function startRecording() {
+    try {
+      const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
+
+      // Audio analysis for the level meter
+      audioContext = new AudioContext();
+      analyser = audioContext.createAnalyser();
+      const source = audioContext.createMediaStreamSource(stream);
+      source.connect(analyser);
+      analyser.fftSize = 256;
+
+      // Start the level-meter animation
+      updateAudioLevel();
+
+      // MediaRecorder for the recording
+      mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
+      audioChunks = [];
+
+      mediaRecorder.ondataavailable = (event) => {
+        if (event.data.size > 0) {
+          audioChunks.push(event.data);
+        }
+      };
+
+      mediaRecorder.onstop = async () => {
+        // Recording finished, send the audio to Whisper
+        const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
+        await transcribeAudio(audioBlob);
+
+        // Stop the stream
+        stream.getTracks().forEach(track => track.stop());
+      };
+
+      mediaRecorder.start(100); // emit chunks every 100 ms
+      isRecording = true;
+      liveTranscript = '';
+      silenceStartTime = null; // reset the VAD timer
+      console.log('🎤 Aufnahme gestartet' + (vadEnabled ? ' (VAD aktiv)' : ''));
+    } catch (err) {
+      console.error('Mikrofon-Zugriff fehlgeschlagen:', err);
+      addMessage('system', `⚠️ Mikrofon-Zugriff fehlgeschlagen: ${err}`);
+    }
+  }
+
+  function stopRecording() {
+    if (mediaRecorder && mediaRecorder.state !== 'inactive') {
+      mediaRecorder.stop();
+    }
+
+    // Stop the level-meter animation
+    if (levelAnimationFrame) {
+      cancelAnimationFrame(levelAnimationFrame);
+      levelAnimationFrame = null;
+    }
+
+    // Close the audio context
+    if (audioContext) {
+      audioContext.close();
+      audioContext = null;
+    }
+
+    isRecording = false;
+    audioLevel = 0;
+    console.log('🎤 Aufnahme gestoppt');
+  }
+
+  function updateAudioLevel() {
+    if (!analyser || !isRecording) return;
+
+    const dataArray = new Uint8Array(analyser.frequencyBinCount);
+    analyser.getByteFrequencyData(dataArray);
+
+    // Compute the average level, normalized to 0-100
+    const average = dataArray.reduce((a, b) => a + b, 0) / dataArray.length;
+    audioLevel = Math.min(100, average * 1.5);
+
+    // VAD: detect silence and stop automatically after a pause
+    if (vadEnabled && audioChunks.length > 0) {
+      if (audioLevel < VAD_SILENCE_THRESHOLD) {
+        // Silence starts or continues
+        if (silenceStartTime === null) {
+          silenceStartTime = Date.now();
+        } else if (Date.now() - silenceStartTime > VAD_SILENCE_DURATION) {
+          // Silent long enough, stop the recording automatically
+          console.log('🔇 VAD: Stille erkannt, stoppe Aufnahme');
+          stopRecording();
+          return;
+        }
+      } else {
+        // Speech detected, reset the silence timer
+        silenceStartTime = null;
+      }
+    }
+
+    levelAnimationFrame = requestAnimationFrame(updateAudioLevel);
+  }
+
+  async function transcribeAudio(audioBlob: Blob) {
+    liveTranscript = 'Transkribiere...';
+
+    try {
+      // Encode the audio as Base64 for the Tauri command, in 32 KiB chunks:
+      // a single String.fromCharCode(...) spread over the whole buffer can
+      // exceed the engine's argument limit on longer recordings
+      const bytes = new Uint8Array(await audioBlob.arrayBuffer());
+      let binary = '';
+      for (let i = 0; i < bytes.length; i += 0x8000) {
+        binary += String.fromCharCode(...bytes.subarray(i, i + 0x8000));
+      }
+      const base64 = btoa(binary);
+
+      // Send to the backend for Whisper transcription
+      const transcript: string = await invoke('transcribe_audio', {
+        audioBase64: base64,
+        format: 'webm'
+      });
+
+      if (transcript && transcript.trim()) {
+        // Insert the transcript into the input field
+        $currentInput = ($currentInput + ' ' + transcript).trim();
+        liveTranscript = '';
+        console.log('📝 Transkript:', transcript);
+      } else {
+        liveTranscript = '';
+      }
+    } catch (err) {
+      console.error('Transkription fehlgeschlagen:', err);
+      liveTranscript = `Fehler: ${err}`;
+      // Hide after 3 s
+      setTimeout(() => { liveTranscript = ''; }, 3000);
+    }
+  }
+
+  function toggleRecording() {
+    if (isRecording) {
+      stopRecording();
+    } else {
+      startRecording();
+    }
+  }
+
   onDestroy(() => {
     unsubscribe();
+
+    // Stop the voice recording if it is still active
+    if (isRecording) {
+      stopRecording();
+    }
   });
 
   // Global keyboard shortcuts
@@ -512,25 +687,47 @@
+        {#if liveTranscript}
+          <div class="live-transcript">
+            <span class="transcript-icon">🎤</span>
+            <span class="transcript-text">{liveTranscript}</span>
+          </div>
+        {/if}
-        <button class="send-button" on:click={sendMessage}>Send</button>
+        <div class="input-buttons">
+          <button class="send-button" on:click={sendMessage}>Send</button>
+          <button
+            class="mic-button"
+            class:recording={isRecording}
+            on:click={toggleRecording}
+          >
+            {#if isRecording}
+              <div class="audio-level" style="height: {audioLevel}%"></div>
+            {/if}
+            <span class="mic-icon" class:recording={isRecording}>🎤</span>
+          </button>
+        </div>
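The auto-stop behaviour wired into `updateAudioLevel()` is easiest to reason about in isolation. A minimal TypeScript sketch of the same silence-detection state machine follows; the threshold and pause values here are assumed examples, since the actual `VAD_SILENCE_THRESHOLD` and `VAD_SILENCE_DURATION` constants are defined elsewhere in the component:

```typescript
// Standalone sketch of the VAD decision: returns true once the level has
// stayed below `threshold` for at least `durationMs` milliseconds.
// Default values are illustrative assumptions, not the component's constants.
function createSilenceDetector(threshold = 10, durationMs = 1500) {
  let silenceStart: number | null = null;
  return (level: number, nowMs: number): boolean => {
    if (level >= threshold) {
      silenceStart = null; // speech detected, reset the timer
      return false;
    }
    if (silenceStart === null) {
      silenceStart = nowMs; // silence just started
      return false;
    }
    return nowMs - silenceStart > durationMs;
  };
}
```

Keeping the decision pure (level and timestamp in, boolean out) makes it testable without a microphone or `requestAnimationFrame`.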
@@ -1048,6 +1245,7 @@
     padding: var(--spacing-sm) var(--spacing-md);
     background: var(--bg-secondary);
     border-top: 1px solid var(--bg-tertiary);
+    position: relative;
   }
 
   .chat-input textarea {
@@ -1084,6 +1282,100 @@
     transform: none;
   }
 
+  /* Voice Interface */
+  .input-buttons {
+    display: flex;
+    flex-direction: column;
+    gap: var(--spacing-xs);
+  }
+
+  .mic-button {
+    width: 48px;
+    height: 48px;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    background: var(--bg-secondary);
+    border: 1px solid var(--border);
+    border-radius: var(--radius-md);
+    font-size: 1.25rem;
+    cursor: pointer;
+    transition: all 0.2s ease;
+    position: relative;
+    overflow: hidden;
+  }
+
+  .mic-button:hover:not(:disabled) {
+    background: var(--bg-tertiary);
+    border-color: var(--accent);
+  }
+
+  .mic-button.recording {
+    background: rgba(239, 68, 68, 0.15);
+    border-color: #ef4444;
+    animation: pulse-recording 1.5s ease-in-out infinite;
+  }
+
+  @keyframes pulse-recording {
+    0%, 100% { box-shadow: 0 0 0 0 rgba(239, 68, 68, 0.4); }
+    50% { box-shadow: 0 0 0 8px rgba(239, 68, 68, 0); }
+  }
+
+  .mic-icon {
+    z-index: 1;
+  }
+
+  .mic-icon.recording {
+    color: #ef4444;
+  }
+
+  .audio-level {
+    position: absolute;
+    bottom: 0;
+    left: 0;
+    right: 0;
+    background: linear-gradient(to top, rgba(239, 68, 68, 0.4), rgba(239, 68, 68, 0.1));
+    transition: height 0.05s ease-out;
+    pointer-events: none;
+  }
+
+  .mic-button:disabled {
+    opacity: 0.5;
+    cursor: not-allowed;
+  }
+
+  .live-transcript {
+    position: absolute;
+    top: -32px;
+    left: 0;
+    right: 0;
+    display: flex;
+    align-items: center;
+    gap: var(--spacing-xs);
+    padding: var(--spacing-xs) var(--spacing-sm);
+    background: rgba(239, 68, 68, 0.1);
+    border: 1px solid rgba(239, 68, 68, 0.3);
+    border-radius: var(--radius-sm);
+    font-size: 0.75rem;
+    color: #ef4444;
+  }
+
+  .transcript-icon {
+    animation: pulse 1s ease-in-out infinite;
+  }
+
+  @keyframes pulse {
+    0%, 100% { opacity: 1; }
+    50% { opacity: 0.5; }
+  }
+
+  .transcript-text {
+    flex: 1;
+    overflow: hidden;
+    text-overflow: ellipsis;
+    white-space: nowrap;
+  }
+
   /* Modal Styles */
   .modal-backdrop {
     position: fixed;
diff --git a/src/lib/components/KnowledgePanel.svelte b/src/lib/components/KnowledgePanel.svelte
index a4ca60d..57e504e 100644
--- a/src/lib/components/KnowledgePanel.svelte
+++ b/src/lib/components/KnowledgePanel.svelte
@@ -1,6 +1,7 @@
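The level-meter normalization used by the recorder (mean of the byte-valued frequency bins from the `AnalyserNode`, scaled by 1.5 and clamped to the 0-100 meter range) is easy to sanity-check in isolation. A sketch mirroring that arithmetic, with the helper name being illustrative:

```typescript
// Same normalization as in updateAudioLevel(): average the byte-valued
// frequency bins (each 0-255), scale by 1.5 and clamp to 0-100.
function levelFromBins(bins: Uint8Array): number {
  const average = bins.reduce((a, b) => a + b, 0) / bins.length;
  return Math.min(100, average * 1.5);
}
```

The 1.5 scale factor means the meter saturates once the average bin value reaches about 67, which keeps normal speech visibly animated rather than hovering near the bottom of the range.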