Thinking Machines (founded by former OpenAI CTO Mira Murati) just shared a research preview of "interaction models," a new way to build AI conversation. Instead of relying on external software to fake real-time chat, the model architecture itself is designed for interactions that are immediate, interruptible, and able to overlap.

Most AI chat experiences are turn-based: you finish, then the model thinks, then it responds. That works for text, but it falls apart for voice and video, where humans constantly correct themselves, add context, and speak over each other.

Their model, TML-Interaction-Small, uses multi-stream and micro-turn handling at ~200 ms granularity. In demos, it can respond to speech while also reacting quickly to visual cues. The key claim: it can "feel" time passing, so the exchange is coordinated like a human call, not just a back-and-forth transcript.

Architecturally, it uses a dual-model setup: a near-real-time interaction model for dialogue, plus a background model for heavier reasoning, tool calls, retrieval, and web-style tasks. Results from the background model are then stitched back into the conversation.

Benchmarks: 0.40 s response latency on FD-bench V1, and 77.8 on interaction quality (vs. GPT-realtime-2.0 minimal: 46.8; Gemini-3.1-flash-live minimal: 54.3).

Next: limited tests, then broader rollout.

#AIResearch #RealtimeAI #SpeechAI #Multimodal #HumanComputerInteraction #MiraMurati
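For readers who want a feel for the dual-model idea described above, here is a toy simulation. It is only a sketch under loud assumptions: the class names, the "ends with a question mark" delegation rule, and the 600 ms background delay are all hypothetical and are not taken from Thinking Machines. Only the 200 ms micro-turn size comes from the post. A fast loop makes one decision per micro-turn, hands heavier work to a simulated background job, and stitches the result back in once it is ready.

```python
from dataclasses import dataclass, field

MICRO_TURN_MS = 200  # micro-turn granularity quoted in the post


@dataclass
class BackgroundJob:
    """Hypothetical stand-in for the slower reasoning / tool-calling model."""
    query: str
    ready_at_ms: int  # simulated completion time (an assumption)

    def result(self) -> str:
        return f"[background answer to: {self.query}]"


@dataclass
class InteractionLoop:
    """Hypothetical fast loop that acts once per micro-turn."""
    pending: list = field(default_factory=list)
    transcript: list = field(default_factory=list)

    def step(self, now_ms, user_chunk):
        # 1. Stitch any finished background results into the conversation.
        done = [job for job in self.pending if job.ready_at_ms <= now_ms]
        self.pending = [job for job in self.pending if job.ready_at_ms > now_ms]
        for job in done:
            self.transcript.append((now_ms, "model", job.result()))

        # 2. React immediately to whatever the user said in this micro-turn.
        if user_chunk is None:
            return
        self.transcript.append((now_ms, "user", user_chunk))
        if user_chunk.endswith("?"):
            # Toy rule: questions go to the background model,
            # but the fast loop still acknowledges right away.
            self.pending.append(BackgroundJob(user_chunk, now_ms + 600))
            self.transcript.append((now_ms, "model", "one moment"))
        else:
            self.transcript.append((now_ms, "model", "mm-hm"))


def run(chunks):
    """Feed one user chunk (or None for silence) per 200 ms micro-turn."""
    loop = InteractionLoop()
    for i, chunk in enumerate(chunks):
        loop.step(i * MICRO_TURN_MS, chunk)
    return loop.transcript


if __name__ == "__main__":
    for entry in run(["what's the weather?", "and also", None, None, None]):
        print(entry)
```

The point of the sketch is that the user keeps talking at 200 ms while the question asked at 0 ms is still being worked on, and the background answer lands in the transcript at 600 ms without blocking the exchange.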