How it Works

FlowAI is a voice‑first, multi‑agent operating system for work. It turns live conversation into structured, reliable execution through a layered, real‑time architecture designed for low latency, safety, and scale.
Design Principles
Voice as OS: duplex audio, barge‑in, and streaming everywhere.
Multi‑agent orchestration: plan → act → reflect loops with deterministic control.
Enterprise‑grade guardrails: privacy, auditability, least‑privilege by default.
Real‑time first: sub‑second perceived latency with progressive streaming.
High‑Level Layers
Client & Edge
Web, Mobile, Phone (PSTN/SIP) via WebRTC or secure WebSocket.
Edge modules: VAD (voice activity detection), noise suppression, wake‑word, optional on‑device ASR/TTS for ultra‑low latency.
End‑to‑end encryption; session keys rotate per call.
Real‑Time I/O Gateway
Bi‑directional streams for audio, tokens, and tool events.
Auth (OAuth2/OIDC), rate limiting, backpressure, session continuity, and resumption after network blips.
Speech Intelligence Stack
Streaming ASR with endpointing + partial hypothesis updates.
Semantic parsing (intent, entities, slots), diarization for multi‑speaker calls.
Streaming TTS that begins producing audio as soon as the first tokens are ready; barge‑in aware, stopping playback the moment the user interrupts.
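Barge‑in can be sketched as a playback loop that polls voice activity detection between audio chunks and stops the instant the user starts talking. This is a minimal illustration, not the production pipeline: `TTSPlayer`, the chunk list, and the VAD callback are all hypothetical stand‑ins.

```python
from dataclasses import dataclass, field

@dataclass
class TTSPlayer:
    """Barge-in-aware TTS playback loop (illustrative only).

    `chunks` stands in for a streaming TTS source; `vad_user_speaking`
    is a callback fed by voice activity detection on inbound audio.
    """
    played: list = field(default_factory=list)
    interrupted: bool = False

    def play(self, chunks, vad_user_speaking):
        for chunk in chunks:
            if vad_user_speaking():   # barge-in: user started talking
                self.interrupted = True
                break                 # stop speaking immediately
            self.played.append(chunk)
        return self.played

# Simulate: VAD fires while the third chunk is about to play.
events = iter([False, False, True])
player = TTSPlayer()
player.play(["Hel", "lo, ", "how ", "can I help?"], lambda: next(events))
```

In a real stack the VAD signal would arrive asynchronously on the inbound audio stream rather than being polled per chunk, but the control decision is the same.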
Conversation OS
Turn manager with interrupt handling, context windows, and “working memory” summarization.
Safety filters (toxicity/PII), on‑the‑fly redaction, policy checks before any external action.
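On‑the‑fly redaction can be as simple as replacing detected PII spans with typed placeholders before a turn is logged or handed to an external tool. The regex patterns below are toy assumptions; a production filter would combine trained PII detectors with patterns like these.

```python
import re

# Hypothetical patterns; production systems would use trained PII models.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace detected PII spans with typed placeholders so downstream
    logs and tool calls never see the raw values."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

out = redact("Reach me at jane@example.com or +1 415 555 0100.")
```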
Flow Orchestrator (Core)
The heart of FlowAI: a graph‑based runtime (we call it FlowGraph) that composes agents, tools, and data sources.
Planner agent creates a plan (DAG of steps) from the user’s goal.
Router assigns each step to the best agent/model via a Model Router (latency/cost/capability aware).
Tool‑Calling Engine enforces JSON‑schema I/O, retries, idempotency, and compensating actions (Sagas) for multi‑step transactions.
Reflector evaluates outcomes, fixes errors, and iterates until success or policy stop.
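The plan → act → reflect loop can be sketched as: execute each planned step, let the reflector retry failures up to a policy limit, and record a policy stop when retries are exhausted. This is a deliberately flat stand‑in for a FlowGraph DAG; the step names, `MAX_ITERS`, and the fake executor are all illustrative.

```python
# Illustrative plan -> act -> reflect loop over a tiny step list
# (a flat stand-in for a FlowGraph DAG). MAX_ITERS plays the role
# of the policy stop.
MAX_ITERS = 3

def run_plan(steps, execute):
    """Run each step; on failure, retry up to the policy limit,
    then record a policy stop for that step."""
    results = {}
    for step in steps:
        for attempt in range(MAX_ITERS):
            ok, output = execute(step, attempt)
            if ok:
                results[step] = output
                break
        else:
            results[step] = "policy_stop"
    return results

# Fake executor: "email" fails once (transient error), then succeeds.
def execute(step, attempt):
    if step == "email" and attempt == 0:
        return False, None
    return True, f"{step}:done"

results = run_plan(["fetch", "draft", "email"], execute)
```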
Agent Layer
Coordinator (generalist) + Specialists (e.g., Research, Scheduling, CRM, Finance, DevOps).
Each agent = policy + skills + prompt pack + test suite + telemetry.
Skill Adapters expose capabilities (search, email, calendar, DB, RPA, browser, code exec) behind stable schemas.
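A skill adapter's job is to present a capability behind a stable, validated interface. The sketch below checks inputs against a declared field list before invoking the capability; real adapters would validate against full JSON schemas, and every name here (`SkillAdapter`, `email.send`, the fields) is an assumption for illustration.

```python
# Minimal skill adapter: validate inputs against a declared schema
# before invoking the underlying capability. Names are illustrative.
class SkillAdapter:
    def __init__(self, name, schema, fn):
        self.name, self.schema, self.fn = name, schema, fn

    def call(self, args: dict):
        missing = [k for k in self.schema if k not in args]
        if missing:
            raise ValueError(f"{self.name}: missing fields {missing}")
        return self.fn(**args)

send_email = SkillAdapter(
    "email.send",
    schema=["to", "subject", "body"],
    fn=lambda to, subject, body: {"status": "queued", "to": to},
)
result = send_email.call({"to": "a@b.co", "subject": "hi", "body": "..."})
```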
Knowledge & Memory Layer
Tenant‑scoped vector stores (org, team, personal) with versioned documents.
Connectors: Drive/Box/SharePoint/Notion/Confluence/Slack/Email/DBs.
Retrieval policies (freshness, source weighting), hybrid search (BM25 + embeddings), tool‑augmented RAG.
Memory types: episodic (sessions), semantic (facts), procedural (how to do X in your org).
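Hybrid search typically blends a lexical (BM25‑style) score with embedding similarity via a weighted sum. The toy ranking below assumes precomputed lexical scores and two‑dimensional embeddings purely for illustration; the weighting scheme (`alpha`) is one common choice, not necessarily the one used here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_rank(doc_vecs, lexical_scores, query_vec, alpha=0.5):
    """Score = alpha * lexical (BM25-style) + (1 - alpha) * embedding sim."""
    scored = [
        (doc, alpha * lexical_scores[doc] + (1 - alpha) * cosine(vec, query_vec))
        for doc, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Doc "b" matches the query lexically much better; "a" matches semantically.
doc_vecs = {"a": [1.0, 0.0], "b": [0.6, 0.8]}
ranked = hybrid_rank(doc_vecs, {"a": 0.2, "b": 0.9}, query_vec=[1.0, 0.0])
```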
Action & Integration Layer
100+ connectors (SaaS, CRM/ERP, messaging, ticketing, cloud, payments).
Secrets in a KMS‑backed vault, per‑connector least privilege, consented OAuth flows.
Human‑in‑the‑loop: approvals for sensitive actions (e.g., “draft vs send”).
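The "draft vs send" gate amounts to: sensitive actions produce a pending draft instead of executing, until a human approves. A minimal sketch, with an assumed sensitivity list and made‑up action names:

```python
# Human-in-the-loop gate: sensitive actions yield a draft awaiting
# approval instead of executing directly. Action names are illustrative.
SENSITIVE = {"email.send", "payment.issue"}

def dispatch(action, payload, approved=False):
    if action in SENSITIVE and not approved:
        return {"state": "pending_approval", "draft": payload}
    return {"state": "executed", "payload": payload}

draft = dispatch("email.send", {"to": "cfo@corp.com"})
done = dispatch("email.send", {"to": "cfo@corp.com"}, approved=True)
```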
Workflow Engine
Event bus + task queues for parallelization and exactly‑once semantics.
Schedulers (cron/timers), retries with exponential backoff, DLQs.
Transactional outbox pattern to keep external systems and FlowGraph in sync.
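Retries with exponential backoff and a dead‑letter queue can be sketched as follows. Delays are computed rather than slept so the example runs instantly; the flaky handler simulates a transient failure that succeeds on the third attempt. All names are illustrative.

```python
# Retry with exponential backoff; exhausted tasks go to a dead-letter
# queue (DLQ). Delays are computed, not slept, to keep the sketch fast.
def process_with_retries(task, handler, max_attempts=4, base_delay=0.5):
    dlq, delays = [], []
    for attempt in range(max_attempts):
        try:
            return handler(task), delays, dlq
        except Exception:
            delays.append(base_delay * 2 ** attempt)  # 0.5, 1.0, 2.0, ...
    dlq.append(task)  # exhausted: park for manual inspection
    return None, delays, dlq

calls = {"n": 0}
def flaky(task):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return f"{task}:ok"

result, delays, dlq = process_with_retries("sync-crm", flaky)
```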
Governance, Security, Observability
RBAC/ABAC, SCIM for user provisioning, tenant isolation at network and data layers.
PII detection/redaction, data minimization, DSR APIs (export/delete).
Full audit trails (who/what/when), prompt & tool logs with tamper‑evident hashing.
Tracing (OpenTelemetry), SLOs, anomaly alerts, cost & carbon meters.
Scalability & Resilience
Microservices on Kubernetes, multi‑region active‑active, autoscaling.
Circuit breakers, bulkheads, graceful degradation (fallback models/tools).
Semantic cache for responses/plans; cold‑start warmers for TTS/ASR.
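A circuit breaker is the key degradation primitive: after a run of consecutive failures it "opens" and short‑circuits straight to a fallback (e.g., a smaller model) instead of hammering the failing dependency. A minimal sketch, with an assumed failure threshold:

```python
# Minimal circuit breaker: after `threshold` consecutive failures the
# circuit opens and calls short-circuit to the fallback.
class CircuitBreaker:
    def __init__(self, threshold=3):
        self.threshold, self.failures, self.open = threshold, 0, False

    def call(self, fn, fallback):
        if self.open:
            return fallback()          # short-circuit: dependency is down
        try:
            result = fn()
            self.failures = 0          # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True       # stop hammering the failing dependency
            return fallback()

breaker = CircuitBreaker(threshold=2)
def failing(): raise RuntimeError("model down")
def fallback(): return "fallback-model"

answers = [breaker.call(failing, fallback) for _ in range(3)]
```

A production breaker would also half‑open after a cooldown to probe recovery; that is omitted here for brevity.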
Latency Budget (Real‑Time Targets)
ASR partials: ~150–250 ms to first transcript.
NLU + planning: ~50–150 ms (cached policies/models).
Tool call roundtrip (in‑VPC): ~80–200 ms; SaaS calls vary.
TTS first audio chunk: ~150–250 ms.
Perceived latency goal: first voice response < 700 ms, continuous streaming thereafter.
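As a back‑of‑envelope check, summing the midpoints of the stage budgets above shows how the pipeline fits under the 700 ms goal:

```python
# Midpoints of the stage budgets quoted above, in milliseconds.
budget_ms = {
    "asr_first_partial": 200,   # ~150-250 ms
    "nlu_and_planning": 100,    # ~50-150 ms
    "tool_call_in_vpc": 140,    # ~80-200 ms
    "tts_first_chunk": 200,     # ~150-250 ms
}
total = sum(budget_ms.values())   # 640 ms
within_goal = total < 700
```

Note the margin is thin: one slow SaaS call blows the budget, which is why streaming partial responses (rather than waiting for tool completion) matters.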

Developer Surface
Streams: POST /v1/sessions.connect (WebSocket/WebRTC) for audio + events.
Agents: POST /v1/agents.invoke with tool schemas; function/JSON schema enforced.
Workflows: POST /v1/flows (define a FlowGraph), POST /v1/flows/run.
Knowledge: POST /v1/knowledge/sync, GET /v1/knowledge/search.
Governance: GET /v1/audit, POST /v1/policies, POST /v1/approvals.
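A request to the agents endpoint might be assembled as below. The field names (`agent`, `input`, `tools`) are assumptions for illustration, not the documented schema, and no network call is made here; only the endpoint paths come from the list above.

```python
import json

# Hypothetical request body for POST /v1/agents.invoke; field names are
# assumptions, not the documented schema. No network call is made.
def invoke_agent_payload(agent, goal, tools):
    return json.dumps({
        "agent": agent,
        "input": {"goal": goal},
        # Tool schemas are passed so the engine can enforce JSON-schema I/O.
        "tools": tools,
    })

payload = invoke_agent_payload(
    "scheduling",
    "find 30 min with the design team this week",
    [{"name": "calendar.search", "parameters": {"type": "object"}}],
)
decoded = json.loads(payload)
```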