9. Self-Improvement

Most AI systems stay exactly as good (or bad) as the day you configured them. Sutra is different. The Evolve Engine continuously analyzes your platform's health, identifies issues, monitors the competitive landscape, and generates actionable improvement suggestions — so your autonomous organization keeps getting better without constant manual investigation.

The Evolve Engine

The Evolve page (/evolve) provides complete visibility into Sutra's self-analysis and improvement suggestions.

Placeholder: Evolve Dashboard

1. Daily Platform Analysis

Every day at 6 AM (configurable), the Evolve Engine runs an automated analysis of your entire platform:

  • Platform Stats: Total agent count, active agents, total invocations, and overall system health.
  • Error Patterns: Per-agent error rates, common failure modes, and recurring issues.
  • Agent Performance: Invocation counts, average latency, and error rates for each agent.
  • System-Level Errors: Route failures, startup issues, and infrastructure problems.

The engine feeds all of this data into an LLM-driven analysis that generates specific, actionable improvement suggestions. Each suggestion includes a title, description, priority level, evidence from the analysis, and a recommended action type.
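As a rough sketch, a suggestion record could carry the fields listed above. The class and field names below are illustrative assumptions for this documentation, not Sutra's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical shape of an Evolve suggestion record; the field names
# are illustrative assumptions, not Sutra's internal schema.
@dataclass
class Suggestion:
    title: str
    description: str
    priority: str                       # "low" | "medium" | "high" | "critical"
    evidence: list[str] = field(default_factory=list)
    action_type: str = "forge_request"  # or "task", "goal"

s = Suggestion(
    title="Reduce billing-agent error rate",
    description="billing-agent failed 12% of invocations in the last 24h.",
    priority="high",
    evidence=["error rate: 12%", "common failure: timeout on invoice lookup"],
)
print(s.priority)  # high
```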

2. Competitor Monitoring

Every Monday at 9 AM (configurable), the Evolve Engine monitors competitor open-source repositories — including crewAI, AutoGen, LangGraph, and Dify — for new releases and feature announcements.

  • Release Tracking: Detects new versions and release notes from competitor GitHub repositories.
  • Gap Analysis: Generates suggestions when competitors ship features that Sutra doesn't yet offer, helping you stay competitive.
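At its core, release tracking is a diff between tags already seen and tags just fetched. A minimal sketch, assuming the fresh tags arrive from the GitHub releases API (the repository names and version tags below are made-up examples):

```python
# Minimal sketch of release tracking: compare the tags we have already
# seen per repository against freshly fetched tags, and report only the
# unseen ones. In practice the fetched tags would come from the GitHub
# releases API; here they are hard-coded examples.
def detect_new_releases(seen: dict[str, set[str]],
                        fetched: dict[str, list[str]]) -> dict[str, list[str]]:
    new = {}
    for repo, tags in fetched.items():
        unseen = [t for t in tags if t not in seen.get(repo, set())]
        if unseen:
            new[repo] = unseen
    return new

seen = {"crewAI": {"v0.1.0"}, "AutoGen": {"v0.2.0"}}
fetched = {"crewAI": ["v0.1.0", "v0.2.0"], "AutoGen": ["v0.2.0"]}
print(detect_new_releases(seen, fetched))  # {'crewAI': ['v0.2.0']}
```

Each newly detected release then feeds the gap analysis, which decides whether its release notes describe a feature worth a suggestion.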

3. Suggestion Lifecycle

Each suggestion follows a structured lifecycle with human oversight:

  • Proposed: The engine generates a suggestion based on its analysis.
  • Pending Approval: The suggestion is submitted for human review via the approval queue.
  • Approved: A human operator confirms the suggestion is worth pursuing.
  • In Progress: The suggestion is being implemented (as a Forge request, task, or goal).
  • Completed: The improvement has been implemented and verified.

Suggestions are categorized by type — platform health, error pattern, performance, competitor gap, or feature idea — and prioritized as low, medium, high, or critical. The Evolve dashboard lets you filter, review, approve, or dismiss suggestions, giving you full control over which improvements get implemented.
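The lifecycle above amounts to a small state machine. The transition map below is an illustrative assumption about the allowed moves (including dismissal at the review stage), not Sutra's internal implementation:

```python
# Sketch of the suggestion lifecycle as a state machine. State names
# follow the documented lifecycle; the transition map itself is an
# illustrative assumption, not Sutra's internal implementation.
TRANSITIONS = {
    "proposed": {"pending_approval"},
    "pending_approval": {"approved", "dismissed"},
    "approved": {"in_progress"},
    "in_progress": {"completed"},
}

def advance(state: str, target: str) -> str:
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target

state = "proposed"
for nxt in ("pending_approval", "approved", "in_progress", "completed"):
    state = advance(state, nxt)
print(state)  # completed
```

Keeping human review as the only path out of "pending_approval" is what guarantees no suggestion is implemented without sign-off.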

Getting Started: The Evolve Engine starts generating useful suggestions as soon as your agents have some execution history. Let your agents run for a few days to build up enough data, then check the Evolve dashboard for the first round of insights. The most immediately actionable suggestions are usually in the "error pattern" and "platform health" categories.

By leveraging the Evolve Engine, your autonomous organization continuously identifies its own weaknesses and proposes fixes — much like a dedicated operations team running daily health checks.

Manual Optimization

The Evolve Engine handles automated analysis, but you can also improve agents manually at any time. The Conversations log in the Monitoring dashboard is your best friend here — review agent responses, spot patterns in mistakes, and update system prompts accordingly.

Common manual optimizations include:

  • Adding edge-case rules: "When the user asks about pricing, always check the latest pricing document before responding — never rely on training data."
  • Tightening output format: "Always respond with a structured JSON object containing 'summary', 'action_items', and 'next_steps' fields."
  • Improving tool usage: "Before answering any technical question, search the knowledge base first. Only use your own knowledge if the search returns no relevant results."

Manual changes and Evolve Engine analysis work together. When you update an agent's configuration, the Evolve Engine's next daily analysis will use the updated state as its baseline, tracking whether the change improved performance metrics.
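A before/after comparison like that can be sketched as a simple error-rate check. The numbers below are made up for illustration; in practice the daily analysis would supply them:

```python
# Illustrative check of whether a manual configuration change improved
# an agent: compare its error rate before and after the update. The
# counts are made-up examples, not real Sutra metrics.
def error_rate(errors: int, invocations: int) -> float:
    return errors / invocations if invocations else 0.0

before = error_rate(errors=24, invocations=200)  # 0.12
after = error_rate(errors=6, invocations=150)    # 0.04
improved = after < before
print(improved)  # True
```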