3. Communication & Collaboration

Sutra gives you multiple ways to interact with your agents — from quick 1-on-1 chats to structured multi-agent discussions where your AI team collaborates in real time. Every conversation is transparent, cost-tracked, and enriched with context from the agent's memory system.

Chat: Direct Interaction

The Chat interface (/chat) is the primary way a human user interacts 1-on-1 with a specific agent.

Placeholder: Chat Interface

Advanced Chat Features

  • Real-Time Streaming: Responses are streamed via WebSockets for low latency.
  • Tool Execution Visibility: When an agent decides to use a tool (e.g., google_search), a specialized UI block appears in the chat. You can see exactly what query the agent ran and the raw JSON result it received. This makes debugging agent reasoning entirely transparent.
  • Token Analytics: Each message displays a micro-badge showing exactly how many prompt and completion tokens were consumed, helping you monitor costs.
  • Context Injection: Agents automatically pull relevant Core and Recall memory into context as you chat, so they remember what you discussed last week.
Tip: Use the token analytics display to spot agents that are consuming more tokens than expected. If an agent's responses are consistently long, consider tightening its system prompt with explicit length constraints (e.g., "Keep responses under 300 words unless the user asks for detail").
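The streaming and token-analytics features above can be sketched as a small event-tallying loop. This is a minimal illustration only: the event shapes (`"delta"`, `"usage"`) and field names are assumptions for the example, not Sutra's actual WebSocket protocol.

```python
# Accumulate streamed response text and token counts from chat events.
# Event shapes here are illustrative assumptions, not Sutra's real schema.

def tally_usage(events):
    """Collect response text and per-message token counts from a stream."""
    text_parts = []
    prompt_tokens = completion_tokens = 0
    for event in events:
        if event["type"] == "delta":      # a streamed chunk of response text
            text_parts.append(event["text"])
        elif event["type"] == "usage":    # final token accounting for the message
            prompt_tokens += event["prompt_tokens"]
            completion_tokens += event["completion_tokens"]
    return "".join(text_parts), prompt_tokens, completion_tokens

stream = [
    {"type": "delta", "text": "Hello"},
    {"type": "delta", "text": ", world"},
    {"type": "usage", "prompt_tokens": 42, "completion_tokens": 7},
]
text, prompt, completion = tally_usage(stream)
print(text, prompt, completion)  # Hello, world 42 7
```

The per-message counts are what feed the micro-badge display, and summing them across a conversation gives the cost picture the tip above recommends watching.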

Conversation Windowing

Long conversations can quickly exceed an LLM's context window, degrading quality and increasing costs. Sutra handles this automatically with conversation windowing.

  • How it works: Sutra keeps the most recent 20 messages (configurable) in the agent's context window. When a conversation exceeds this limit, older messages are summarized by a lightweight LLM into a concise overview, and only the summary plus the recent messages are sent to the agent.
  • Summary caching: Summaries are cached for 1 hour (configurable) in Redis, so repeated queries in the same conversation don't trigger redundant summarization calls.
  • Result: Agents maintain coherent context even in conversations spanning hundreds of messages, without ballooning token costs. The trade-off is that very old messages lose some granularity — but the most recent context is always preserved in full.
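The windowing behavior described above can be sketched in a few lines: keep the most recent N messages verbatim and replace everything older with one summary message. `summarize` stands in for the lightweight LLM call; all names are illustrative, not Sutra's internals.

```python
# Sketch of conversation windowing: recent messages stay verbatim,
# older ones collapse into a single summary message.

WINDOW_SIZE = 20  # most recent messages kept in full (configurable)

def window_messages(messages, summarize, window_size=WINDOW_SIZE):
    """Return the message list actually sent to the agent."""
    if len(messages) <= window_size:
        return messages
    older, recent = messages[:-window_size], messages[-window_size:]
    summary = summarize(older)  # in production, a lightweight LLM call
    summary_msg = {"role": "system", "content": f"Earlier conversation: {summary}"}
    return [summary_msg] + recent

# Stub summarizer for demonstration purposes.
msgs = [{"role": "user", "content": f"msg {i}"} for i in range(50)]
out = window_messages(msgs, summarize=lambda ms: f"{len(ms)} earlier messages")
print(len(out))  # 21: one summary message plus the 20 most recent
```

Note how the trade-off mentioned above shows up directly in the code: the 30 older messages survive only as whatever the summarizer keeps, while the 20 recent ones are passed through untouched.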

Prompt Caching

For repeated or similar queries, Sutra uses Redis-backed prompt caching to avoid re-invoking the LLM. When the same model receives the same system prompt and recent messages, the cached response is returned instantly — reducing both latency and cost. The cache uses a 30-minute TTL (configurable) and only applies to non-tool-using, deterministic prompts to ensure freshness.
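A prompt cache of this shape can be sketched as a TTL lookup keyed on a hash of (model, system prompt, recent messages). A plain dict stands in for Redis here so the example is self-contained; in production the same get/set-with-expiry pattern maps onto Redis commands. Function and variable names are assumptions for illustration.

```python
# Sketch of a TTL-based prompt cache. A dict stands in for Redis;
# the key is a stable hash of everything that determines the response.
import hashlib
import json
import time

TTL_SECONDS = 30 * 60  # 30-minute TTL, configurable

_cache = {}  # key -> (expires_at, response)

def cache_key(model, system_prompt, messages):
    payload = json.dumps([model, system_prompt, messages], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def cached_complete(model, system_prompt, messages, call_llm):
    key = cache_key(model, system_prompt, messages)
    hit = _cache.get(key)
    if hit and hit[0] > time.time():
        return hit[1]                       # cache hit: skip the LLM entirely
    response = call_llm(model, system_prompt, messages)
    _cache[key] = (time.time() + TTL_SECONDS, response)
    return response
```

Hashing the serialized inputs (with `sort_keys=True` for a stable encoding) is what makes "same model, same system prompt, same recent messages" resolve to the same cache entry; any change to the conversation produces a new key, which is why freshness concerns are limited to genuinely identical prompts.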


Discussions: Multi-Agent Collaboration

Some problems are too complex for a single agent. Discussions (/discussions) allow multiple agents to collaborate in a shared virtual room.

Placeholder: Discussion Room

Discussion Formats

When creating a discussion, you define its format and participants:

  1. Brainstorming: Agents are encouraged to freely generate ideas. Useful for creative tasks or architecture planning.
  2. Debate: Assign agents opposing personas (e.g., a "Security Advocate" vs. a "Feature Velocity Developer"). They argue the merits of an approach, stress-testing each option before a conclusion is drawn.
  3. Standup: A structured format where agents sequentially report on their current tasks, progress, and blockers.
  4. Review: Agents take turns providing structured feedback on a topic. Useful for code reviews, design reviews, or any scenario where every specialist needs to weigh in — for example, a security agent, a performance agent, and a UX agent each reviewing the same feature proposal.
  5. Retrospective: Agents reflect on what went well and what could improve. Ideal for post-project analysis or sprint retrospectives where you want multiple perspectives on process improvements.
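To make the format-and-participants choice concrete, here is a hypothetical payload for creating a debate-format discussion. The field names (`format`, `participants`, `persona`, `max_turns`) are assumptions for illustration; consult the API reference for the real schema.

```python
# Hypothetical creation payload for a debate-format discussion.
# Field names are illustrative, not Sutra's actual API schema.
create_discussion = {
    "title": "Should we adopt end-to-end encryption for attachments?",
    "format": "debate",  # one of: brainstorming, debate, standup, review, retrospective
    "participants": [
        {"agent": "sec-advocate", "persona": "Security Advocate"},
        {"agent": "velocity-dev", "persona": "Feature Velocity Developer"},
    ],
    "max_turns": 12,  # cap on back-and-forth before the discussion concludes
}
```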

Human Intervention

Discussions aren't closed off. A human operator can observe the debate in real time, inject messages to steer the conversation or correct misunderstandings, and formally conclude the discussion by asking an agent to summarize the findings.

Discussions support auto-summarization — when a discussion concludes, Sutra automatically generates a summary with extracted action items. Summaries are stored as part of the discussion record and can be referenced by agents in future discussions.


Email Integration

Agents don't have to be confined to the Sutra dashboard. Through the Email module (/email), agents can communicate directly with external stakeholders.

Placeholder: Email Configuration

Outbound Email

Equip an agent with the send_email tool and it can autonomously send reports, alerts, or summaries. For example, a scheduled agent can analyze your logs and email a daily summary to the engineering team.

All outbound emails pass through the approval system by default. You can mark the send_email tool as auto-approved for specific agents if you trust their output — for instance, a reporting agent that sends the same formatted digest every morning. High-stakes emails (client-facing, external) should always require human review.
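The default-deny rule above can be sketched as a simple allowlist check: every send_email call requires human approval unless the specific (agent, tool) pair has been marked auto-approved. The data structures and names are illustrative, not Sutra's actual approval schema.

```python
# Sketch of default-deny approval for outbound email: only explicitly
# trusted (agent, tool) pairs bypass human review. Names are illustrative.

AUTO_APPROVED = {("daily-report-agent", "send_email")}  # trusted pairs only

def requires_approval(agent_id, tool_name):
    """All outbound email needs human review unless explicitly auto-approved."""
    if tool_name != "send_email":
        return False  # other tools are governed by their own policies
    return (agent_id, tool_name) not in AUTO_APPROVED

print(requires_approval("daily-report-agent", "send_email"))  # False
print(requires_approval("support-agent", "send_email"))       # True
```

Keeping the allowlist keyed on the agent as well as the tool is what lets a routine reporting agent run unattended while a client-facing agent still queues its emails for review.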

Inbound Email (Beta)

With Gmail integration, agents can read incoming emails, classify them, and draft or send replies. This lets you set up an IT Support agent that reads mail sent to support@yourcompany.com, queries the internal knowledge base, and replies to the user, all without human involvement.

Inbound email processing works best when paired with the Knowledge Base. Upload your FAQ documents, product guides, and troubleshooting manuals, and the agent will search them before composing a reply — giving customers accurate, sourced answers instead of generic responses.

Tip: Start with a "draft-only" configuration where the agent saves replies to Gmail Drafts instead of sending them directly. This lets you review the quality of responses for a few weeks before enabling fully autonomous replies.
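The draft-only pattern from the tip can be sketched as a pipeline with a single mode flag deciding whether the composed reply lands in Drafts or goes out immediately. `classify`, `search_kb`, `compose`, and the `gmail` client are stand-ins for the real components; the function and parameter names are assumptions for illustration.

```python
# Sketch of an inbound email pipeline: classify, search the knowledge
# base, compose a reply, then draft or send depending on mode.
# All collaborators are stand-ins, not Sutra's actual interfaces.

def handle_inbound(email, classify, search_kb, compose, gmail, mode="draft"):
    category = classify(email["body"])            # e.g. "password-reset"
    sources = search_kb(category, email["body"])  # relevant KB passages
    reply = compose(email, sources)               # reply grounded in sources
    if mode == "draft":
        gmail.create_draft(to=email["from"], body=reply)  # human reviews later
    else:
        gmail.send(to=email["from"], body=reply)          # fully autonomous
    return reply
```

Because the mode flag is the only difference between the two behaviors, graduating from draft-only review to autonomous replies is a one-line configuration change rather than a pipeline rewrite.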