Model Name | Architecture | Provider | Purpose | Input | Output | Usage Context |
---|---|---|---|---|---|---|
Intent recognition | GPT-3.5 (fine-tuned) | OpenAI | Parse natural language into structured intent + steps | User prompt (text/voice transcription) | Intent type, parameters, plugin mapping | Core intent recognition and multi-step task decomposition |
Speech-to-text | Whisper (streaming) | OpenAI | Convert real-time speech to text | Audio stream (user voice input) | Clean text transcript | Voice-to-intent pipeline for user commands |
Wake word detection | Custom-trained CNN (via OpenWW) | In-house (OpenWW) | Detect the wake word for voice activation | Streaming audio (PCM) | Detection flag (boolean) + triggering audio slice | Triggers the voice pipeline in real time |
Summarizer | GPT-4 (prompt-based) | OpenAI | Summarize articles, emails, notes | Long-form text (articles, notes, email threads, etc.) | Short summary, metadata highlights | Plugin feature for text-heavy content |
Plugin reranker | Embedding reranker + LLM | In-house | Rerank candidate plugins or actions | List of plugin/action candidates + context | Ranked list of plugins/actions | Improves suggestion relevance after intent resolution |
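
The plugin reranker row describes an embedding stage followed by an LLM. The following is a minimal sketch of the embedding half only. It assumes each candidate arrives with a precomputed embedding and is scored by cosine similarity against an embedding of the request context. `Candidate`, `rerank`, `MAX_CANDIDATES`, and the plugin names are hypothetical, and the LLM refinement step is not shown. The cap of 10 candidates comes from the operational table.

```python
import math
from dataclasses import dataclass

# Hypothetical cap, mirroring "Up to 10 candidates per prompt" in the operational table.
MAX_CANDIDATES = 10


@dataclass
class Candidate:
    name: str               # hypothetical plugin/action identifier
    embedding: list[float]  # assumed precomputed embedding of the plugin/action description


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(context_embedding: list[float], candidates: list[Candidate], k: int = 3) -> list[tuple[str, float]]:
    """Score at most MAX_CANDIDATES candidates against the context and return the top-k (name, score) pairs."""
    pool = candidates[:MAX_CANDIDATES]
    scored = [(c.name, cosine(context_embedding, c.embedding)) for c in pool]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    ctx = [1.0, 0.0]
    cands = [
        Candidate("calendar.create_event", [0.9, 0.1]),
        Candidate("email.send", [0.0, 1.0]),
        Candidate("notes.append", [0.7, 0.7]),
    ]
    # prints "calendar.create_event: 0.994", then "notes.append: 0.707"
    for name, score in rerank(ctx, cands, k=2):
        print(f"{name}: {score:.3f}")
```

Returning `(name, score)` pairs rather than bare names keeps the scores available for the top-k scoring log listed under Explainability.
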

Model Name | Avg. Inference Time | Retraining Frequency | Input Size per Request | Explainability | Real-Time Compatible |
---|---|---|---|---|---|
Intent recognition | ~600 ms | Weekly | ~500 tokens + metadata | Limited (LLM) | Yes |
Speech-to-text | ~200 ms (streamed) | Not required | Audio stream | Not applicable (STT) | Yes |
Wake word detection | ~20 ms | Not required | Audio stream (PCM) | Not applicable (classifier) | Yes |
Summarizer | ~800 ms | Not required (prompt-based) | ~1,000–2,000 tokens | Human-readable natural-language output (LLM) | Yes |
Plugin reranker | ~150 ms | Weekly | Up to 10 candidates per prompt | Top-k scoring log | Yes |
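
The latency column can be read as a budget. The following is a minimal sketch that sums the average inference times for a voice command and for a typed request routed to the summarizer. The stage orderings are assumptions inferred from the Usage Context column: the wake word triggers the voice pipeline, transcripts feed intent recognition, and reranking follows intent resolution. `sequential_latency_ms`, `VOICE_COMMAND`, and `TEXT_SUMMARY` are hypothetical names. Treating stages as strictly sequential is a simplification. Whisper runs in streaming mode, so part of its ~200 ms may overlap with the user still speaking.

```python
# Average inference times (ms) copied from the "Avg. Inference Time" column.
STAGE_LATENCY_MS = {
    "wake_word_detection": 20,
    "speech_to_text": 200,
    "intent_recognition": 600,
    "plugin_reranker": 150,
    "summarizer": 800,
}


def sequential_latency_ms(stages: list[str], latencies: dict[str, int] = STAGE_LATENCY_MS) -> int:
    """Sum the average latency of stages that run one after another.

    Ignores network round trips, queueing, and any overlap between stages.
    """
    return sum(latencies[stage] for stage in stages)


# Hypothetical stage orderings; the tables do not specify them explicitly.
VOICE_COMMAND = ["wake_word_detection", "speech_to_text", "intent_recognition", "plugin_reranker"]
TEXT_SUMMARY = ["intent_recognition", "plugin_reranker", "summarizer"]

if __name__ == "__main__":
    print(sequential_latency_ms(VOICE_COMMAND))  # 970
    print(sequential_latency_ms(TEXT_SUMMARY))   # 1550
```

Under these assumptions, a voice command accrues roughly 970 ms of model inference before a plugin executes, and a summarization request roughly 1,550 ms. On the voice path, intent recognition (~600 ms) is the largest single contributor.
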