7. System & Configuration
Getting your configuration right means your agents have the models, tools, and resources they need — while keeping costs predictable and your data secure. This section covers API keys, smart model routing, rate limits, integrations, and extensibility via MCP servers.
Settings & API Keys
The core Settings page (/settings) handles all low-level platform configurations.

LLM Providers
- Cloud Providers: Securely input your API keys for OpenAI, Anthropic, Google Gemini, Groq, OpenRouter, etc. Keys are encrypted via Fernet symmetric encryption before storage.
- Ollama (Local Models): If running Sutra on a secure internal network, you can connect to a local Ollama instance. The settings page displays the connection status and allows you to view/manage pulled models (e.g., llama3, qwen).
You can add multiple providers simultaneously. This is recommended — it gives Sutra's Smart Router more options for fallback and cost optimization. Most teams start with one cloud provider (like OpenAI or Anthropic) plus a local Ollama instance for development and testing.
System Configuration
Tune low-level backend behaviors without restarting the server:
- Resilience — Circuit Breaker: Sutra wraps LLM provider calls in a three-state circuit breaker (CLOSED → OPEN → HALF_OPEN). If a provider fails 5 times within 60 seconds (configurable), the circuit opens and all requests to that provider are immediately rejected for a 30-second cooldown — preventing cascading failures. After cooldown, one test request is allowed through; if it succeeds, the circuit closes and normal operation resumes.
- Resilience — Retry with Backoff: Failed LLM calls are automatically retried up to 2 times with exponential backoff (base delay 1s, max 10s). This handles transient network errors and rate limits without manual intervention.
- Memory & Cache: Configure prompt cache TTL (default 30 minutes) and conversation window size (default 20 messages). The prompt cache avoids re-invoking LLMs for identical queries, while conversation windowing summarizes older messages to keep context windows manageable.
- Security: Configure CORS origins, CSP headers, and session timeout durations.
- Logging Level: Toggle between DEBUG (verbose, for development), INFO (standard), and ERROR (minimal, for production) to control how much detail appears in your system logs.
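The circuit breaker described above can be sketched in a few lines of Python. This is an illustrative stand-alone implementation of the CLOSED → OPEN → HALF_OPEN pattern, not Sutra's actual code; the thresholds mirror the defaults listed above.

```python
import time


class CircuitBreaker:
    """Minimal three-state circuit breaker: CLOSED -> OPEN -> HALF_OPEN."""

    def __init__(self, failure_threshold=5, window_s=60, cooldown_s=30):
        self.failure_threshold = failure_threshold
        self.window_s = window_s        # failures are counted within this window
        self.cooldown_s = cooldown_s    # how long the circuit stays open
        self.failures = []              # timestamps of recent failures
        self.state = "CLOSED"
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        now = time.monotonic()
        if self.state == "OPEN":
            if now - self.opened_at < self.cooldown_s:
                raise RuntimeError("circuit open: request rejected")
            self.state = "HALF_OPEN"    # cooldown over: allow one test request
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure(now)
            raise
        # A successful call (including the half-open test request) closes the circuit.
        self.state = "CLOSED"
        self.failures.clear()
        return result

    def _record_failure(self, now):
        # Keep only failures inside the sliding window, then add this one.
        self.failures = [t for t in self.failures if now - t < self.window_s]
        self.failures.append(now)
        if self.state == "HALF_OPEN" or len(self.failures) >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = now
```

Wrapping every provider call in `breaker.call(...)` is what turns a flaky upstream API into a fast, bounded failure instead of a pile of hung requests.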
LLM Purposes (Smart Routing)
Purposes (/purposes) represent a major leap in orchestrating multiple LLMs reliably.

Why Use Purposes?
If you hardcode an agent to use gpt-4o and OpenAI's API experiences an outage, your agent fails along with it.
Instead, you create a Purpose called "High Reasoning" and define a fallback waterfall:
1. Priority 1: claude-3-5-sonnet
2. Priority 2: gpt-4o
3. Priority 3: llama3-70b (via Groq)
If Priority 1 hits a rate limit or a 500 error, Sutra's Smart Router automatically retries the prompt using Priority 2, keeping your autonomous workflows running even when an individual provider goes down.
You can create as many Purposes as your organization needs. Common configurations include:
- Heavy Reasoning: Claude Sonnet → GPT-4o → Llama 3 70B. For complex analysis, code generation, and multi-step problem solving.
- Fast Drafting: GPT-4o Mini → Claude Haiku → Llama 3 8B. For quick, high-volume tasks like summarization, classification, and simple Q&A.
- Creative Writing: Claude Sonnet → GPT-4o. For marketing copy, blog posts, and content that requires a natural, engaging tone.
- Code Generation: Claude Sonnet → GPT-4o → DeepSeek Coder. For writing, reviewing, and refactoring code.
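The waterfall behavior can be sketched as a simple priority loop. The `PURPOSES` dict, `route` function, and `call_model` callback below are illustrative stand-ins, not Sutra's internal API.

```python
# Hypothetical sketch of a Purpose fallback waterfall.
PURPOSES = {
    "high_reasoning": [          # models tried in priority order
        "claude-3-5-sonnet",
        "gpt-4o",
        "llama3-70b",
    ],
}


def route(purpose, prompt, call_model):
    """Try each model in the purpose's waterfall until one succeeds."""
    errors = {}
    for model in PURPOSES[purpose]:
        try:
            return call_model(model, prompt)
        except Exception as exc:   # rate limit, 5xx, timeout, ...
            errors[model] = exc    # record and fall through to the next priority
    raise RuntimeError(f"all providers failed: {errors}")
```

The key design point is that the caller names a purpose, never a model, so swapping or reordering providers is a configuration change rather than a code change.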
Rate Limits
Manage usage and prevent runaway costs using Rate Limits (/rate-limits).

Controlling Consumption
- Global Limits: Set maximum daily budgets across the entire platform.
- Provider Limits: Cap usage for expensive providers. For example, limit OpenAI to 500,000 tokens per day.
- Throttling: As a limit approaches, Sutra gracefully throttles non-urgent background batch jobs while keeping real-time chat responsive.
Rate limits work hand-in-hand with the Smart Router. When a provider approaches its limit, Sutra doesn't just block requests — it reroutes them to the next available provider in the Purpose waterfall. This means your real-time agents stay responsive even during heavy batch processing.
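The bookkeeping behind a per-provider daily budget is straightforward; the `RateLimiter` class below is an illustrative sketch (not Sutra's internals), using the 500,000-token OpenAI cap from the example above.

```python
import datetime


class RateLimiter:
    """Track per-provider token usage against a daily budget."""

    def __init__(self, daily_limits):
        self.daily_limits = daily_limits   # provider -> max tokens per day
        self.usage = {}                    # (provider, date) -> tokens used

    def allow(self, provider, tokens):
        """Record `tokens` if within budget; return False so the caller can reroute."""
        key = (provider, datetime.date.today())
        used = self.usage.get(key, 0)
        if used + tokens > self.daily_limits.get(provider, float("inf")):
            return False                   # over budget: reroute to next provider
        self.usage[key] = used + tokens
        return True
```

A `False` return is where the hand-off to the Smart Router happens: instead of failing the request, the caller tries the next provider in the Purpose waterfall.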
Integrations
Sutra can live where your team lives (/integrations).

Messaging Platforms
- Slack: Install the Sutra Slack App. You can @mention specific agents in your Slack channels, and they will reply in-thread.
- Telegram & WhatsApp: Generate bot tokens to create dedicated mobile chat interfaces for your agents, perfect for on-the-go queries.
Developer Tools
- GitHub: Authenticate the GitHub App to allow agents to read repository contents, review PRs, and submit code changes directly.
MCP Servers (Model Context Protocol)
Sutra supports the Model Context Protocol, allowing you to infinitely extend your agents' capabilities (/mcp-servers).

Adding External Tools
If you have an internal microservice or an external tool provider that supports MCP:
1. Add the MCP Server URL to Sutra.
2. Sutra automatically fetches the list of available tools from the server.
3. Whitelist specific tools on your agents, instantly granting them the ability to query your internal microservices or proprietary APIs.
MCP (Model Context Protocol) is an open standard, which means you're not locked into Sutra-specific tooling. Any MCP-compatible server works — whether it's a community-maintained integration, a tool you built in-house, or a commercial provider. This makes Sutra infinitely extensible without waiting for native integration support.
Tool Extensions (Plugin System)
Beyond MCP, Sutra has a built-in extension system for adding custom tools directly in Python. Extensions are auto-discovered from the backend/app/tools/extensions/ directory at startup.
Creating an Extension
Each extension is a single Python file that defines two things:
- EXTENSION_MANIFEST — A dictionary with metadata:
  - id: Unique identifier (e.g., "alpaca_trading")
  - name: Human-readable name
  - description: What the extension does
  - credential_fields: List of secrets the extension needs (e.g., API keys)
  - config_fields: List of non-secret configuration options
  - tool_ids: List of tool IDs this extension provides
- create_tools(agent_id: str) — A factory function that returns a list of LangChain BaseTool instances.
Once you place a valid extension file in the extensions/ directory, Sutra automatically discovers it on startup. The new tools appear in the agent configuration UI, and you can enable them per agent just like built-in tools.
Extensions can use the get_extension_creds() helper to securely retrieve credentials stored in the Integrations UI — no hardcoded API keys needed.
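Putting the pieces together, a minimal extension file might look like the sketch below. The filename, manifest values, and the commented `get_extension_creds` call are hypothetical examples; a plain function stands in for a LangChain BaseTool so the sketch has no external dependencies.

```python
# backend/app/tools/extensions/weather.py  (hypothetical example extension)

EXTENSION_MANIFEST = {
    "id": "weather",
    "name": "Weather Lookup",
    "description": "Fetches current weather for a city.",
    "credential_fields": ["weather_api_key"],   # stored via the Integrations UI
    "config_fields": ["default_units"],
    "tool_ids": ["get_weather"],
}


def create_tools(agent_id: str):
    """Factory returning the tools this extension provides.

    In a real Sutra extension these would be LangChain BaseTool
    instances; a plain callable stands in here for illustration.
    """
    def get_weather(city: str) -> str:
        # Real usage would retrieve the stored secret, e.g.:
        # creds = get_extension_creds(EXTENSION_MANIFEST["id"])
        return f"(stub) weather for {city}"

    return [get_weather]
```

Dropping a file like this into extensions/ is all that is required; on the next startup the tool appears in the agent configuration UI alongside the built-in tools.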