How it Works

FlowAI is a voice‑first, multi‑agent operating system for work. It turns live conversation into structured, reliable execution through a layered, real‑time architecture designed for low latency, safety, and scale.

Design Principles

  • Voice as OS: duplex audio, barge‑in, and streaming everywhere.

  • Multi‑agent orchestration: plan → act → reflect loops with deterministic control.

  • Enterprise‑grade guardrails: privacy, auditability, least‑privilege by default.

  • Real‑time first: sub‑second perceived latency with progressive streaming.

High‑Level Layers

  1. Client & Edge

    • Web, Mobile, Phone (PSTN/SIP) via WebRTC or secure WebSocket.

    • Edge modules: VAD (voice activity detection), noise suppression, wake‑word, optional on‑device ASR/TTS for ultra‑low latency.

    • End‑to‑end encryption; session keys rotate per call.
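
As a concrete (if deliberately simplified) illustration of the edge modules, the sketch below shows an energy‑threshold VAD; the frame size and threshold are illustrative assumptions, and production VADs are typically model‑based.

```ts
// Minimal energy-threshold VAD sketch (illustrative; production VADs are model-based).
// Assumes 20 ms frames of 16 kHz mono PCM with samples normalized to [-1, 1].
function isSpeech(frame: Float32Array, threshold = 0.02): boolean {
  let energy = 0;
  for (const sample of frame) energy += sample * sample;
  const rms = Math.sqrt(energy / frame.length); // root-mean-square energy
  return rms > threshold; // above threshold => treat frame as speech
}
```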

  2. Real‑Time I/O Gateway

    • Bi‑directional streams for audio, tokens, and tool events.

    • Auth (OAuth2/OIDC), rate limiting, backpressure, session continuity, and resumption after network blips.
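
A minimal sketch of connecting to the gateway over WebSocket follows; the host and event payload shapes are assumptions for illustration, while the /v1/sessions.connect path comes from the Developer Surface section below.

```ts
// Sketch: opening a duplex session stream (browser, or Node >= 22 where
// WebSocket is a global). Host and message shapes are illustrative assumptions.
const ws = new WebSocket("wss://api.flowai.example/v1/sessions.connect?token=...");

ws.onopen = () => {
  // Audio goes up as binary frames; control events travel as JSON text frames.
  ws.send(JSON.stringify({ type: "session.start", sampleRateHz: 16000 }));
};

ws.onmessage = (ev) => {
  if (typeof ev.data !== "string") return; // binary frame = TTS audio chunk
  const event = JSON.parse(ev.data);
  if (event.type === "asr.partial") console.log("partial transcript:", event.text);
};
```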

  3. Speech Intelligence Stack

    • Streaming ASR with endpointing + partial hypothesis updates.

    • Semantic parsing (intent, entities, slots), diarization for multi‑speaker calls.

    • Streaming TTS that begins producing audio as soon as the first tokens arrive; barge‑in aware, so it stops speaking the moment the user interrupts.
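
The usual client pattern for partial hypotheses is that partials overwrite the current draft while endpointed finals append, as in this sketch (the event names are assumptions):

```ts
// Sketch: consuming streaming ASR events. The event names are assumptions;
// the partials-overwrite / finals-append pattern is the standard one.
type AsrEvent =
  | { type: "asr.partial"; text: string }  // revisable hypothesis
  | { type: "asr.final"; text: string };   // endpointed, stable segment

let committed = "";
let pending = "";

function onAsrEvent(e: AsrEvent): void {
  if (e.type === "asr.partial") {
    pending = e.text;                      // replace the draft, don't append
  } else {
    committed += (committed ? " " : "") + e.text;
    pending = "";
  }
  console.log(`${committed} ${pending}`.trim());
}
```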

  4. Conversation OS

    • Turn manager with interrupt handling, context windows, and “working memory” summarization.

    • Safety filters (toxicity/PII), on‑the‑fly redaction, policy checks before any external action.
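
As a toy illustration of on‑the‑fly redaction, the sketch below masks obvious PII with regular expressions; the patterns are illustrative assumptions, and production systems use trained detectors.

```ts
// Sketch: redacting obvious PII before text leaves the conversation layer.
// Patterns are illustrative; real deployments use trained PII detectors.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"],
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, "[EMAIL]"],
];

function redact(text: string): string {
  return PII_PATTERNS.reduce((t, [re, label]) => t.replace(re, label), text);
}

console.log(redact("Reach me at jane@acme.com, SSN 123-45-6789."));
// -> "Reach me at [EMAIL], SSN [SSN]."
```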

  5. Flow Orchestrator (Core)

    • The heart of FlowAI: a graph‑based runtime (we call it FlowGraph) that composes agents, tools, and data sources.

    • Planner agent creates a plan (DAG of steps) from the user’s goal.

    • A Model Router assigns each step to the best‑fit agent/model, weighing latency, cost, and capability.

    • Tool‑Calling Engine enforces JSON‑schema I/O, retries, idempotency, and compensating actions (Sagas) for multi‑step transactions.

    • Reflector evaluates outcomes, fixes errors, and iterates until success or policy stop.
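
To make the plan‑as‑DAG idea concrete, here is a sketch of what a FlowGraph plan could look like, plus the topological ordering that determines execution; the step shape and field names are assumptions, not the actual schema.

```ts
// Sketch: a FlowGraph-style plan as a DAG of steps. Field names are assumptions.
interface Step {
  id: string;
  agent: string;        // resolved to a concrete model by the Model Router
  tool?: string;        // optional tool call for this step
  dependsOn: string[];  // DAG edges
}

const plan: Step[] = [
  { id: "find-slot",  agent: "scheduling",  tool: "calendar.freebusy",  dependsOn: [] },
  { id: "draft-mail", agent: "coordinator", tool: "email.draft",        dependsOn: ["find-slot"] },
  { id: "approve",    agent: "coordinator", tool: "approvals.request",  dependsOn: ["draft-mail"] },
];

// Any topological order is a valid execution order; steps whose
// dependencies are all satisfied can run in parallel.
function topoSort(steps: Step[]): string[] {
  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < steps.length) {
    const ready = steps.filter(s => !done.has(s.id) && s.dependsOn.every(d => done.has(d)));
    if (ready.length === 0) throw new Error("cycle detected: not a DAG");
    for (const s of ready) { done.add(s.id); order.push(s.id); }
  }
  return order;
}

console.log(topoSort(plan)); // ["find-slot", "draft-mail", "approve"]
```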

  6. Agent Layer

    • Coordinator (generalist) + Specialists (e.g., Research, Scheduling, CRM, Finance, DevOps).

    • Each agent = policy + skills + prompt pack + test suite + telemetry.

    • Skill Adapters expose capabilities (search, email, calendar, DB, RPA, browser, code exec) behind stable schemas.
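
A Skill Adapter's stable schema might look like the following sketch; the field names mirror common function‑calling conventions and are assumptions rather than FlowAI's actual format.

```ts
// Sketch: one capability exposed behind a stable JSON Schema.
// Field names follow common function-calling conventions (an assumption here).
const calendarSkill = {
  name: "calendar.create_event",
  description: "Create a calendar event for the current user.",
  parameters: {
    type: "object",
    properties: {
      title: { type: "string" },
      startIso: { type: "string", format: "date-time" },
      durationMin: { type: "integer", minimum: 5 },
    },
    required: ["title", "startIso"],
  },
} as const;
```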

  7. Knowledge & Memory Layer

    • Tenant‑scoped vector stores (org, team, personal) with versioned documents.

    • Connectors: Drive/Box/SharePoint/Notion/Confluence/Slack/Email/DBs.

    • Retrieval policies (freshness, source weighting), hybrid search (BM25 + embeddings), tool‑augmented RAG.

    • Memory types: episodic (sessions), semantic (facts), procedural (how to do X in your org).
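
One common way to combine BM25 and embedding results is Reciprocal Rank Fusion; the text above doesn't specify the fusion method, so treat this sketch as one plausible choice.

```ts
// Sketch: fusing lexical (BM25) and dense (embedding) rankings with
// Reciprocal Rank Fusion. RRF is an assumption; the doc doesn't name a method.
function rrf(rankings: string[][], k = 60): string[] {
  const score = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      score.set(docId, (score.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...score.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

const bm25Hits  = ["doc3", "doc1", "doc7"];
const denseHits = ["doc1", "doc9", "doc3"];
console.log(rrf([bm25Hits, denseHits])); // docs on both lists rise to the top
```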

  8. Action & Integration Layer

    • 100+ connectors (SaaS, CRM/ERP, messaging, ticketing, cloud, payments).

    • Secrets in a KMS‑backed vault, per‑connector least privilege, consented OAuth flows.

    • Human‑in‑the‑loop: approvals for sensitive actions (e.g., “draft vs send”).
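
A sketch of the approval gate, with an illustrative policy table (the action names and sensitivity levels are assumptions):

```ts
// Sketch: gating sensitive actions behind human approval ("draft vs send").
// The policy table and action names are illustrative assumptions.
type Sensitivity = "auto" | "needs_approval";

const policy: Record<string, Sensitivity> = {
  "email.draft":       "auto",
  "email.send":        "needs_approval",
  "payments.transfer": "needs_approval",
};

async function execute(
  action: string,
  run: () => Promise<void>,
  requestApproval: (action: string) => Promise<boolean>,
): Promise<void> {
  if (policy[action] === "needs_approval" && !(await requestApproval(action))) {
    console.log(`${action}: rejected by approver, not executed`);
    return;
  }
  await run();
}
```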

  9. Workflow Engine

    • Event bus + task queues for parallelization and exactly‑once semantics.

    • Schedulers (cron/timers), retries with exponential backoff, DLQs.

    • Transactional outbox pattern to keep external systems and FlowGraph in sync.
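
The retry policy in this layer is typically capped exponential backoff with jitter, as in this sketch; the base delay and cap are illustrative numbers.

```ts
// Sketch: retry with capped exponential backoff and full jitter.
// Base delay (200 ms) and cap (30 s) are illustrative assumptions.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err; // exhausted -> lands in the DLQ
      const capMs = Math.min(30_000, 200 * 2 ** attempt);
      const delayMs = Math.random() * capMs;     // full jitter avoids thundering herds
      await new Promise(res => setTimeout(res, delayMs));
    }
  }
}
```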

  10. Governance, Security, Observability

    • RBAC/ABAC, SCIM for user provisioning, tenant isolation at network and data layers.

    • PII detection/redaction, data minimization, DSR APIs (export/delete).

    • Full audit trails (who/what/when), prompt & tool logs with tamper‑evident hashing.

    • Tracing (OpenTelemetry), SLOs, anomaly alerts, cost & carbon meters.
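
Tamper‑evident hashing usually means chaining each record to its predecessor's hash, as in this sketch (the record fields are assumptions; Node's built‑in crypto module does the hashing):

```ts
// Sketch: tamper-evident audit log via a hash chain. Record fields are
// illustrative; editing any past record invalidates every later hash.
import { createHash } from "node:crypto";

interface AuditRecord { who: string; what: string; when: string; prevHash: string; }

function chainHash(rec: AuditRecord): string {
  return createHash("sha256").update(JSON.stringify(rec)).digest("hex");
}

let prevHash = "0".repeat(64); // genesis value
for (const [who, what] of [["alice", "flows.run"], ["bob", "policies.update"]]) {
  const rec: AuditRecord = { who, what, when: new Date().toISOString(), prevHash };
  prevHash = chainHash(rec);
  console.log(prevHash, rec);
}
```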

  11. Scalability & Resilience

    • Microservices on Kubernetes, multi‑region active‑active, autoscaling.

    • Circuit breakers, bulkheads, graceful degradation (fallback models/tools).

    • Semantic cache for responses/plans; cold‑start warmers for TTS/ASR.
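
A circuit breaker in its simplest form, sketched below with illustrative thresholds (real deployments add half‑open probing before fully closing again):

```ts
// Sketch: minimal circuit breaker that degrades to a fallback model/tool.
// Thresholds are illustrative; production breakers add half-open probing.
class CircuitBreaker {
  private failures = 0;
  private openUntil = 0;

  constructor(private maxFailures = 3, private cooldownMs = 10_000) {}

  async call<T>(primary: () => Promise<T>, fallback: () => Promise<T>): Promise<T> {
    if (Date.now() < this.openUntil) return fallback(); // open: degrade gracefully
    try {
      const result = await primary();
      this.failures = 0;                                // success closes the circuit
      return result;
    } catch {
      if (++this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs;  // trip open for the cooldown
      }
      return fallback();
    }
  }
}
```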

Latency Budget (Real‑Time Targets)

  • ASR partials: ~150–250 ms to first transcript.

  • NLU + planning: ~50–150 ms (cached policies/models).

  • Tool call roundtrip (in‑VPC): ~80–200 ms; SaaS calls vary.

  • TTS first audio chunk: ~150–250 ms.

  • Perceived latency goal: first voice response < 700 ms, continuous streaming thereafter.
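
The budget adds up: even at the upper bounds, ~250 ms (first ASR partial) + ~150 ms (NLU + planning) + ~250 ms (first TTS chunk) comes to ~650 ms, which stays under the 700 ms goal only if tool calls overlap with speech synthesis rather than running serially before the first audio chunk.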

Developer Surface

  • Streams: POST /v1/sessions.connect (WebSocket/WebRTC) for audio + events.

  • Agents: POST /v1/agents.invoke with tool schemas; function/JSON schema enforced.

  • Workflows: POST /v1/flows (define FlowGraph), POST /v1/flows/run.

  • Knowledge: POST /v1/knowledge/sync, GET /v1/knowledge/search.

  • Governance: GET /v1/audit, POST /v1/policies, POST /v1/approvals.
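
A sketch of calling the agent endpoint from TypeScript; the host, auth header, and body fields are assumptions for illustration, and only the /v1/agents.invoke path comes from the list above.

```ts
// Sketch: invoking an agent over the HTTP surface (run in an ES module for
// top-level await). Host, headers, and body fields are illustrative assumptions.
const res = await fetch("https://api.flowai.example/v1/agents.invoke", {
  method: "POST",
  headers: {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    agent: "scheduling",
    input: "Find 30 minutes with the design team this week.",
  }),
});
console.log(await res.json());
```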
