How it Works

FlowAI is a voice‑first, multi‑agent operating system for work. It turns live conversation into structured, reliable execution through a layered, real‑time architecture designed for low latency, safety, and scale.

Design Principles

  • Voice as OS: duplex audio, barge‑in, and streaming everywhere.

  • Multi‑agent orchestration: plan → act → reflect loops with deterministic control.

  • Enterprise‑grade guardrails: privacy, auditability, least‑privilege by default.

  • Real‑time first: sub‑second perceived latency with progressive streaming.

High‑Level Layers

  1. Client & Edge

    • Web, Mobile, Phone (PSTN/SIP) via WebRTC or secure WebSocket.

    • Edge modules: VAD (voice activity detection), noise suppression, wake‑word, optional on‑device ASR/TTS for ultra‑low latency.

    • End‑to‑end encryption; session keys rotate per call.
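
As a concrete (if deliberately simplified) illustration of the edge modules, the sketch below shows an energy‑threshold VAD; the frame size and threshold are illustrative assumptions, and production VADs are typically model‑based.

```ts
// Minimal energy-threshold VAD sketch (illustrative; production VADs are model-based).
// Assumes 20 ms frames of 16 kHz mono PCM with samples normalized to [-1, 1].
function isSpeech(frame: Float32Array, threshold = 0.02): boolean {
  let energy = 0;
  for (const sample of frame) energy += sample * sample;
  const rms = Math.sqrt(energy / frame.length); // root-mean-square energy
  return rms > threshold; // above threshold => treat frame as speech
}
```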

  2. Real‑Time I/O Gateway

    • Bi‑directional streams for audio, tokens, and tool events.

    • Auth (OAuth2/OIDC), rate limiting, backpressure, session continuity, and resumption after network blips.
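
A minimal sketch of connecting to the gateway over WebSocket follows; the host and event payload shapes are assumptions for illustration, while the /v1/sessions.connect path comes from the Developer Surface section below.

```ts
// Sketch: opening a duplex session stream (browser, or Node >= 22 where
// WebSocket is a global). Host and message shapes are illustrative assumptions.
const ws = new WebSocket("wss://api.flowai.example/v1/sessions.connect?token=...");

ws.onopen = () => {
  // Audio goes up as binary frames; control events travel as JSON text frames.
  ws.send(JSON.stringify({ type: "session.start", sampleRateHz: 16000 }));
};

ws.onmessage = (ev) => {
  if (typeof ev.data !== "string") return; // binary frame = TTS audio chunk
  const event = JSON.parse(ev.data);
  if (event.type === "asr.partial") console.log("partial transcript:", event.text);
};
```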

  3. Speech Intelligence Stack

    • Streaming ASR with endpointing + partial hypothesis updates.

    • Semantic parsing (intent, entities, slots), diarization for multi‑speaker calls.

    • Streaming TTS that begins producing audio as soon as the first tokens arrive; barge‑in aware, so it stops speaking the moment the user interrupts.
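
The usual client pattern for partial hypotheses is that partials overwrite the current draft while endpointed finals append, as in this sketch (the event names are assumptions):

```ts
// Sketch: consuming streaming ASR events. The event names are assumptions;
// the partials-overwrite / finals-append pattern is the standard one.
type AsrEvent =
  | { type: "asr.partial"; text: string }  // revisable hypothesis
  | { type: "asr.final"; text: string };   // endpointed, stable segment

let committed = "";
let pending = "";

function onAsrEvent(e: AsrEvent): void {
  if (e.type === "asr.partial") {
    pending = e.text;                      // replace the draft, don't append
  } else {
    committed += (committed ? " " : "") + e.text;
    pending = "";
  }
  console.log(`${committed} ${pending}`.trim());
}
```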

  4. Conversation OS

    • Turn manager with interrupt handling, context windows, and “working memory” summarization.

    • Safety filters (toxicity/PII), on‑the‑fly redaction, policy checks before any external action.
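
As a toy illustration of on‑the‑fly redaction, the sketch below masks obvious PII with regular expressions; the patterns are illustrative assumptions, and production systems use trained detectors.

```ts
// Sketch: redacting obvious PII before text leaves the conversation layer.
// Patterns are illustrative; real deployments use trained PII detectors.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, "[EMAIL]"],
];

function redact(text: string): string {
  return PII_PATTERNS.reduce((t, [re, label]) => t.replace(re, label), text);
}

console.log(redact("Reach me at jane@acme.com, SSN 123-45-6789."));
// -> "Reach me at [EMAIL], SSN [SSN]."
```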

  5. Flow Orchestrator (Core)

    • The heart of FlowAI: a graph‑based runtime (we call it FlowGraph) that composes agents, tools, and data sources.

    • Planner agent creates a plan (DAG of steps) from the user’s goal.

    • A Model Router assigns each step to the best‑fit agent/model, weighing latency, cost, and capability.

    • Tool‑Calling Engine enforces JSON‑schema I/O, retries, idempotency, and compensating actions (Sagas) for multi‑step transactions.

    • Reflector evaluates outcomes, fixes errors, and iterates until success or policy stop.
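
To make the plan‑as‑DAG idea concrete, here is a sketch of what a FlowGraph plan could look like, plus the topological ordering that determines execution; the step shape and field names are assumptions, not the actual schema.

```ts
// Sketch: a FlowGraph-style plan as a DAG of steps. Field names are assumptions.
interface Step {
  id: string;
  agent: string;        // resolved to a concrete model by the Model Router
  tool?: string;        // optional tool call for this step
  dependsOn: string[];  // DAG edges
}

const plan: Step[] = [
  { id: "find-slot",  agent: "scheduling",  tool: "calendar.freebusy",  dependsOn: [] },
  { id: "draft-mail", agent: "coordinator", tool: "email.draft",        dependsOn: ["find-slot"] },
  { id: "approve",    agent: "coordinator", tool: "approvals.request",  dependsOn: ["draft-mail"] },
];

// Any topological order is a valid execution order; steps whose
// dependencies are all satisfied can run in parallel.
function topoSort(steps: Step[]): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < steps.length) {
    const ready = steps.filter(s => !done.has(s.id) && s.dependsOn.every(d => done.has(d)));
    if (ready.length === 0) throw new Error("cycle detected: not a DAG");
    for (const s of ready) { done.add(s.id); order.push(s.id); }
  }
  return order;
}

console.log(topoSort(plan)); // ["find-slot", "draft-mail", "approve"]
```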

  6. Agent Layer

    • Coordinator (generalist) + Specialists (e.g., Research, Scheduling, CRM, Finance, DevOps).

    • Each agent = policy + skills + prompt pack + test suite + telemetry.

    • Skill Adapters expose capabilities (search, email, calendar, DB, RPA, browser, code exec) behind stable schemas.
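
A Skill Adapter's stable schema might look like the following sketch; the field names mirror common function‑calling conventions and are assumptions rather than FlowAI's actual format.

```ts
// Sketch: one capability exposed behind a stable JSON Schema.
// Field names follow common function-calling conventions (an assumption here).
const calendarSkill = {
  name: "calendar.create_event",
  description: "Create a calendar event for the current user.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string" },
      startIso: { type: "string", format: "date-time" },
      durationMin: { type: "integer", minimum: 5 },
    },
    required: ["title", "startIso"],
  },
} as const;
```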

  7. Knowledge & Memory Layer

    • Tenant‑scoped vector stores (org, team, personal) with versioned documents.

    • Connectors: Drive/Box/SharePoint/Notion/Confluence/Slack/Email/DBs.

    • Retrieval policies (freshness, source weighting), hybrid search (BM25 + embeddings), tool‑augmented RAG.

    • Memory types: episodic (sessions), semantic (facts), procedural (how to do X in your org).
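
One common way to combine BM25 and embedding results is Reciprocal Rank Fusion; the text above doesn't specify the fusion method, so treat this sketch as one plausible choice.

```ts
// Sketch: fusing lexical (BM25) and dense (embedding) rankings with
// Reciprocal Rank Fusion. RRF is an assumption; the doc doesn't name a method.
function rrf(rankings: string[][], k = 60): string[] {
  const score = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      score.set(docId, (score.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

const bm25Hits  = ["doc3", "doc1", "doc7"];
const denseHits = ["doc1", "doc9", "doc3"];
console.log(rrf([bm25Hits, denseHits])); // docs on both lists rise to the top
```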

  8. Action & Integration Layer

    • 100+ connectors (SaaS, CRM/ERP, messaging, ticketing, cloud, payments).

    • Secrets in a KMS‑backed vault, per‑connector least privilege, consented OAuth flows.

    • Human‑in‑the‑loop: approvals for sensitive actions (e.g., “draft vs send”).
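
A sketch of the approval gate, with an illustrative policy table (the action names and sensitivity levels are assumptions):

```ts
// Sketch: gating sensitive actions behind human approval ("draft vs send").
// The policy table and action names are illustrative assumptions.
type Sensitivity = "auto" | "needs_approval";

const policy: Record<string, Sensitivity> = {
  "email.draft":       "auto",
  "email.send":        "needs_approval",
  "payments.transfer": "needs_approval",
};

async function execute(
  action: string,
  run: () => Promise<void>,
  requestApproval: (action: string) => Promise<boolean>,
): Promise<void> {
  if (policy[action] === "needs_approval" && !(await requestApproval(action))) {
    console.log(`${action}: rejected by approver, not executed`);
    return;
  }
  await run();
}
```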

  9. Workflow Engine

    • Event bus + task queues for parallelization and exactly‑once semantics.

    • Schedulers (cron/timers), retries with exponential backoff, DLQs.

    • Transactional outbox pattern to keep external systems and FlowGraph in sync.
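
The retry policy in this layer is typically capped exponential backoff with jitter, as in this sketch; the base delay and cap are illustrative numbers.

```ts
// Sketch: retry with capped exponential backoff and full jitter.
// Base delay (200 ms) and cap (30 s) are illustrative assumptions.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // exhausted -> lands in the DLQ
      const capMs = Math.min(30_000, 200 * 2 ** attempt);
      const delayMs = Math.random() * capMs;     // full jitter avoids thundering herds
      await new Promise(res => setTimeout(res, delayMs));
    }
  }
}
```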

  10. Governance, Security, Observability

    • RBAC/ABAC, SCIM for user provisioning, tenant isolation at network and data layers.

    • PII detection/redaction, data minimization, DSR APIs (export/delete).

    • Full audit trails (who/what/when), prompt & tool logs with tamper‑evident hashing.

    • Tracing (OpenTelemetry), SLOs, anomaly alerts, cost & carbon meters.
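
Tamper‑evident hashing usually means chaining each record to its predecessor's hash, as in this sketch (the record fields are assumptions; Node's built‑in crypto module does the hashing):

```ts
// Sketch: tamper-evident audit log via a hash chain. Record fields are
// illustrative; editing any past record invalidates every later hash.
import { createHash } from "node:crypto";

interface AuditRecord { who: string; what: string; when: string; prevHash: string; }

function chainHash(rec: AuditRecord): string {
  return createHash("sha256").update(JSON.stringify(rec)).digest("hex");
}

let prevHash = "0".repeat(64); // genesis value
for (const [who, what] of [["alice", "flows.run"], ["bob", "policies.update"]]) {
  const rec: AuditRecord = { who, what, when: new Date().toISOString(), prevHash };
  prevHash = chainHash(rec);
  console.log(prevHash, rec);
}
```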

  11. Scalability & Resilience

    • Microservices on Kubernetes, multi‑region active‑active, autoscaling.

    • Circuit breakers, bulkheads, graceful degradation (fallback models/tools).

    • Semantic cache for responses/plans; cold‑start warmers for TTS/ASR.
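
A circuit breaker in its simplest form, sketched below with illustrative thresholds (real deployments add half‑open probing before fully closing again):

```ts
// Sketch: minimal circuit breaker that degrades to a fallback model/tool.
// Thresholds are illustrative; production breakers add half-open probing.
class CircuitBreaker {
  private failures = 0;
  private openUntil = 0;

  constructor(private maxFailures = 3, private cooldownMs = 10_000) {}

  async call<T>(primary: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
    if (Date.now() < this.openUntil) return fallback(); // open: degrade gracefully
    try {
      const result = await primary();
      this.failures = 0;                                // success closes the circuit
      return result;
    } catch {
      if (++this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs;  // trip open for the cooldown
      }
      return fallback();
    }
  }
}
```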

Latency Budget (Real‑Time Targets)

  • ASR partials: ~150–250 ms to first transcript.

  • NLU + planning: ~50–150 ms (cached policies/models).

  • Tool call roundtrip (in‑VPC): ~80–200 ms; SaaS calls vary.

  • TTS first audio chunk: ~150–250 ms.

  • Perceived latency goal: first voice response < 700 ms, continuous streaming thereafter.
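
The budget adds up: even at the upper bounds, ~250 ms (first ASR partial) + ~150 ms (NLU + planning) + ~250 ms (first TTS chunk) comes to ~650 ms, which stays under the 700 ms goal only if tool calls overlap with speech synthesis rather than running serially before the first audio chunk.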

Developer Surface

  • Streams: POST /v1/sessions.connect (WebSocket/WebRTC) for audio + events.

  • Agents: POST /v1/agents.invoke with tool schemas; function/JSON schema enforced.

  • Workflows: POST /v1/flows (define FlowGraph), POST /v1/flows/run.

  • Knowledge: POST /v1/knowledge/sync, GET /v1/knowledge/search.

  • Governance: GET /v1/audit, POST /v1/policies, POST /v1/approvals.
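
A sketch of calling the agent endpoint from TypeScript; the host, auth header, and body fields are assumptions for illustration, and only the /v1/agents.invoke path comes from the list above.

```ts
// Sketch: invoking an agent over the HTTP surface (run in an ES module for
// top-level await). Host, headers, and body fields are illustrative assumptions.
const res = await fetch("https://api.flowai.example/v1/agents.invoke", {
  method: "POST",
  headers: {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    agent: "scheduling",
    input: "Find 30 minutes with the design team this week.",
  }),
});
console.log(await res.json());
```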
