Hermes Agent: Nous Research's Self-Learning AI Runtime
The 95,000-Star Weekend That Reset the Agent Conversation
For most of 2025 and the opening weeks of 2026, the open-source agent conversation belonged to OpenClaw. Its gateway-first design, sprawling community skill marketplace, and willingness to plug into 24+ messaging platforms made it the default choice for anyone running a self-hosted AI across chat apps. Then, on February 25, 2026, a quieter, older kind of research lab shipped a repository that didn't feel like a reaction at all — it felt like a rewrite of the premise.
Nous Research, the collective behind the Hermes model family and the Psyche decentralized training stack, dropped NousResearch/hermes-agent with a tagline that sounded closer to marketing copy than architecture: the agent that grows with you. Seven weeks later, the repository had crossed 95,600 GitHub stars — a trajectory that, by the count of at least one independent tracker, matched the historical growth curves of LangChain and AutoGen combined. By April 20, that number sat at 104,791 stars, 14,957 forks, and a release cadence of roughly every two weeks.
The velocity is worth pausing on. v0.8.0 shipped on April 8 with 209 merged PRs and 82 closed issues. v0.9.0 followed five days later — mobile via Termux, iMessage, WeChat, a Fast Mode for OpenAI and Anthropic, and the deepest security hardening the project had ever shipped. v0.10.0, the "Tool Gateway" release, landed on April 16. That's the shipping rhythm of a well-funded research lab, not a weekend side project — and it's forcing a re-read of what an AI agent is actually supposed to do in 2026.
Who Built Hermes Agent, and Why It Looks the Way It Does
To understand why Hermes Agent feels structurally different from its competitors, you have to understand Nous Research. The lab emerged informally in 2022 across Discord and Twitter as an internet-native collective, and was formalized in 2023 by Jeff Quesnelle, Karan Malhotra, Teknium, and Shivani Mitra. From the beginning, the bet was open-source-first, decentralization-focused, and philosophically allergic to the idea that frontier AI should concentrate inside a small number of closed labs.
That identity threads all the way through Hermes Agent. The Hermes series of fine-tunes — Hermes 1, 2, 3, 4, and the new Hermes 4.3 36B trained on the Psyche decentralized network — were always positioned around three things: reduced refusals, high steerability, and aggressively reliable tool use. The agent framework inherits the same philosophy. It's not trying to be a polished consumer product. It's trying to be a runtime that treats the user as a professional, exposes every lever, and doesn't decide for you which model, which server, or which messaging platform you're allowed to run against.
That shows up most clearly in Nous's continued investment in research-grade infrastructure underneath what is, on the surface, a personal-assistant framework. Hermes Agent ships with Atropos RL environment integration, batch trajectory generation, and trajectory compression — the kind of tooling that only makes sense if you're planning to use the agent to generate training data for the next generation of tool-calling models. Most agent frameworks are applications. Hermes is an application and a data pipeline, built by a lab that trains models.
The Closed Learning Loop: What It Is, How It Works
The single architectural idea that separates Hermes Agent from every other open-source agent framework on the market is the closed learning loop. Most frameworks are built around a three-step cycle: receive task, plan and execute, return result. The state resets on the next task. You re-explain your codebase, your preferences, your stack, your constraints. Over a hundred sessions, you type the same context a hundred times.
Hermes Agent adds a fourth and fifth step to that loop. After the response is returned, the agent checks whether the session is worth persisting — and if a task involved five or more tool calls, it autonomously writes a skill file that documents how it was solved. That skill file is indexed into memory and becomes available to every future session.
The loop is five sequential stages in practice:
Step one is the trigger. A message arrives from the CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, iMessage, WeChat, or a scheduled cron job — they all enter the same synchronous execution engine. Step two is retrieval. The agent queries its persistent memory through SQLite FTS5 full-text search, which returns relevant past skills and notes at roughly 10ms latency across 10,000+ documents. Step three is reasoning and action: the model plans, invokes tools, executes, and streams back output. Step four is the part that doesn't exist anywhere else in the category — the agent receives an internal system-level nudge to evaluate the session and decide whether anything is worth writing down. Step five is persistence: the new skill, the memory update, and any user-model drift are committed back to disk.
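Assuming purely illustrative names throughout (none of these are Hermes Agent's actual internal APIs), the five stages can be sketched as:

```python
from dataclasses import dataclass, field

# The documented trigger: sessions with five or more tool calls are
# candidates for autonomous skill creation.
SKILL_THRESHOLD = 5


@dataclass
class SessionResult:
    reply: str
    tool_calls: list = field(default_factory=list)


class MemoryStore:
    """Toy stand-in for the SQLite-backed persistent memory."""
    def __init__(self):
        self.skills = []

    def search(self, query, limit=5):
        # Real retrieval is FTS5 full-text search; substring match here.
        return [s for s in self.skills if query.lower() in s.lower()][:limit]

    def save_skill(self, skill_md):
        self.skills.append(skill_md)


def learning_loop(message, run_model, memory):
    # Stage 1: trigger (message already normalized by the gateway).
    # Stage 2: retrieval of relevant past skills and notes.
    context = memory.search(message)
    # Stage 3: reasoning and action -- plan, call tools, respond.
    result = run_model(message, context)
    # Stage 4: self-evaluation -- is this session worth writing down?
    if len(result.tool_calls) >= SKILL_THRESHOLD:
        memory.save_skill(f"# Skill: {message}\nSteps: {result.tool_calls}")
    # Stage 5: persistence (a real store would commit to disk here).
    return result.reply
```

The `run_model` callable stands in for the whole reasoning engine; the point of the sketch is that stages four and five wrap an otherwise conventional request/response cycle.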
The headline benchmark needs careful framing: the gain is speed, not raw output quality. Independent benchmarks from TokenMix.ai show that agents carrying 20+ self-created skills complete similar future research tasks roughly 40% faster than fresh agent instances on the same job. The honest caveat everyone in the reviewer space has been careful to put next to that number: the improvement is domain-specific. A skill learned from "summarize a GitHub PR" doesn't transfer to "plan a database migration." Cross-domain generalization remains an unsolved problem, and Hermes doesn't pretend otherwise. What it does promise — and the benchmark evidence supports — is that repeat use inside a narrow domain compounds in a way that pure prompt-engineered agents simply cannot match.
Four Layers of Memory: Session, Persistent, Skills, Honcho
The learning loop is the process. The memory system is the substrate it runs on, and it's split across four distinct layers, each solving a different problem.
The first layer is session memory — ordinary context-window management inside the current conversation. Nothing novel here, but worth noting that Hermes includes /compress, /usage, and /insights slash commands so you can monitor and manage context explicitly rather than waiting for it to silently overflow.
The second layer is persistent memory, stored in a local SQLite database with FTS5 full-text search. This is where completed task outcomes, agent-curated notes, and explicit user-saved memories live. Retrieval benchmarks at roughly 10ms across 10,000+ documents, and the architecture scales comfortably to around 100K documents before you'd want to swap in a dedicated vector DB like Qdrant, Weaviate, or Chroma. Everything sits in ~/.hermes/ on your own machine — no cloud round-trips, no telemetry, no third-party memory provider.
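The FTS5 mechanics are standard SQLite, available in any stock Python install. A minimal sketch of this style of local full-text memory, with an assumed two-column schema rather than the real ~/.hermes/ layout:

```python
import sqlite3

# Illustrative FTS5-backed memory; schema and contents are assumptions.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE memory USING fts5(title, body)")
db.executemany(
    "INSERT INTO memory VALUES (?, ?)",
    [
        ("pr-summary", "How to summarize a GitHub pull request with the gh CLI"),
        ("scrape-docs", "Fetch and convert documentation pages to Markdown"),
    ],
)

# MATCH does tokenized full-text search; bm25() ranks by relevance.
rows = db.execute(
    "SELECT title FROM memory WHERE memory MATCH ? ORDER BY bm25(memory)",
    ("github pull",),
).fetchall()
print(rows)  # [('pr-summary',)]
```

At the 10K-document scale the article quotes, this query shape is exactly where single-digit-millisecond retrieval comes from: FTS5 maintains an inverted index, so lookup cost grows with result size, not corpus size.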
The third layer is the skill document store, which is the output of the learning loop. Skills are Markdown files following the agentskills.io open standard — portable, human-readable, and shareable across Hermes deployments. Crucially, only skill names and brief descriptions load into the system prompt by default; full skill bodies load on demand. That design is the reason the skill library can grow from 40 to 200 without the context cost moving in any meaningful way. As of v0.10.0, 96 bundled skills plus 22 optional ones ship with every install, across 26+ categories spanning MLOps, GitHub workflows, research, scraping, code execution, diagramming, and note-taking.
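The lazy-loading design can be sketched in a few lines. The field names and file layout below are assumptions, not the real agentskills.io format; the point is the split between a cheap always-loaded index and on-demand bodies:

```python
# Hypothetical in-memory skill store. Only names and one-line
# descriptions enter the system prompt; full bodies load on demand.
SKILLS = {
    "github-pr-review": {
        "description": "Fetch, diff, and summarize a GitHub pull request.",
        "body": "## Steps\n1. gh pr view ...\n2. gh pr diff ...\n(full procedure here)",
    },
    "web-scrape": {
        "description": "Scrape a page and convert it to clean Markdown.",
        "body": "## Steps\n1. fetch the page ...\n(full procedure here)",
    },
}


def skill_index():
    """Cheap index injected into every system prompt."""
    return "\n".join(
        f"- {name}: {meta['description']}" for name, meta in SKILLS.items()
    )


def load_skill(name):
    """Full body, loaded only when the agent decides to use the skill."""
    return SKILLS[name]["body"]
```

Because `skill_index()` grows by one short line per skill while bodies stay on disk, going from 40 to 200 skills barely moves the per-session token cost.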
The fourth layer is Honcho, an optional user-modeling layer built via integration with Plastic Labs' Honcho dialectic system. Honcho passively accumulates your preferences, communication style, tech stack, abbreviations, frequent collaborators, and domain knowledge across sessions. It's the layer that gives Hermes its "grows with you" feel after several hundred interactions — and it's also the layer most users end up enabling only after they've decided the agent is a long-term fixture rather than an experiment. For task-specific automation deployments, the other three layers are usually enough.
One trade-off is worth naming. The memory system is automatic but opaque. You cannot easily export "everything Hermes knows about me" as a single human-readable file, which is what some privacy-minded users expect. OpenClaw's transparent file-based memory model is friendlier on that axis. Hermes chose convenience and retrieval performance over full inspectability, and that's a real design decision you should factor into your deployment, especially if you're operating under GDPR, HIPAA, or CMMC compliance constraints.
The 118-Skill Catalog and the agentskills.io Standard
The skills system is the interface between the learning loop and day-to-day utility. A skill, in Hermes terms, is a Markdown document that describes how to accomplish a specific procedure — which tools to invoke, in what order, with what parameters, and what pitfalls to avoid. The format is deliberately boring: it's a human-readable file, not a binary artifact or a framework-specific DSL. That boringness is the point. Skills are portable across Hermes installs, shareable through the community Skills Hub at agentskills.io, and diff-able in version control.
Two kinds of skills live alongside each other. The bundled catalog — 96 skills shipped in v0.10.0, plus 22 optional ones — covers the ground most operators will recognize: GitHub repository workflows, web scraping, data wrangling, diagram generation, note-taking integration, MLOps pipelines, code execution sandboxes, and the recently-added red-teaming category. These are curated and security-reviewed by Nous Research, which is the core reason Hermes has posted zero agent-specific CVEs as of April 2026. Auto-created skills are the other half — generated by the learning loop itself when the agent notices a complex task is worth preserving. The two streams interleave, and the agent searches across both when it hits a new problem.
The agentskills.io standard is the wider bet here. Because skills are plain Markdown, they're not locked to Hermes. The same file can run inside any framework that implements the open standard, and Nous has been deliberate about contributing the spec back to the community rather than treating it as a moat. As of mid-April, the community Hub was carrying 643 skills — small compared to OpenClaw's 13,000+ community skill repository, but curated in a way that OpenClaw's sprawling marketplace demonstrably isn't.
One skill-quality gotcha is worth surfacing: auto-generated skills from simple tasks (five to ten tool calls) tend to be tight and reusable. Skills generated from very complex multi-phase tasks (fifty-plus tool calls) sometimes over-generalize or bake in too much session-specific context. Manual review of auto-generated skills during the first month of use is worth the time.
One Gateway, Six Channels: Where Hermes Lives
The gateway is the component that turns Hermes from a terminal toy into something you can actually use during a workday. It's a single persistent background service that keeps the agent running and reachable from every messaging platform you've paired with it, routing all incoming messages through a unified session layer so conversation state survives across channel switches.
The shipping platforms in v0.10.0 are Telegram, Discord, Slack, WhatsApp, Signal, CLI, and Email (IMAP/SMTP). The earlier v0.9.0 release added iMessage (via the BlueBubbles bridge), WeChat, and WeCom, pushing the supported-platform count to 16. Home Assistant is in the mix too, which means the gateway can also talk to smart-home events rather than just human users. Voice memos get transcribed automatically, and cross-platform continuation works: you can start a conversation in Telegram, pick it up in your terminal, and finish it in Slack without the agent losing the thread.
Setup is the part that historically sinks users on competing frameworks, and Nous has paid attention. hermes gateway setup walks through OAuth tokens and webhook configuration per platform. hermes gateway install registers the service with systemd so it survives reboots. Approval buttons in Slack and Telegram mean sensitive command execution can require a tap on your phone before anything touches your server. MCP OAuth 2.1 with PKCE is in place, so when the gateway proxies through the Model Context Protocol, credentials aren't flying around in the clear.
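The approval-button flow amounts to gating sensitive commands on a human callback. A sketch under assumed names (the real sensitivity rules and messaging plumbing are Hermes internals not shown here):

```python
# Hypothetical approval gate: sensitive commands block until a human
# approves via Slack/Telegram. The prefix list is illustrative only.
SENSITIVE = ("rm ", "sudo ", "curl ", "ssh ")


def execute(command, run, request_approval):
    """Run `command`, asking for approval first if it looks sensitive."""
    if command.startswith(SENSITIVE):
        if not request_approval(f"Allow: {command!r}?"):
            return "denied"
    return run(command)


# Usage with a stubbed runner and an approver who taps "deny":
out = execute("rm -rf build/", run=lambda c: "ok",
              request_approval=lambda q: False)
print(out)  # denied
```

The `request_approval` callback is where the phone tap lives; everything before it is just classification.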
There's a local web dashboard too, introduced in v0.9.0, for operators who want a graphical view of their agent's state without exposing anything externally. Combined with the Fast Mode priority queue added for OpenAI and Anthropic endpoints, this is the release where the operational ergonomics finally caught up with the architectural ambitions.
Six Terminal Backends: Where Hermes Runs
If the gateway is about where you talk to Hermes, the terminal backend is about where the work actually executes. Hermes ships six execution environments, and the choice between them is one of the more interesting operator decisions the framework forces.
Local terminal is the default — commands run directly on the machine where Hermes is installed. Docker wraps the execution environment in a container with read-only root, dropped capabilities, and PID limits, which is the setup most security-conscious operators default to. SSH remote hands execution off to any reachable server, which is how you get a laptop-native CLI that actually runs against a production box. Daytona and Modal are the serverless options — your agent's working environment hibernates when idle and wakes on demand, which is the scenario where people quote the "runs for nearly nothing between sessions" figure. Singularity covers HPC workloads for research teams running on academic GPU clusters.
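The backend abstraction is a classic interface-plus-implementations shape. A sketch with assumed class names (not Hermes's real backend API), showing why the same CLI command can target any of the six environments:

```python
from abc import ABC, abstractmethod
import subprocess


class Backend(ABC):
    """One interface, many execution environments."""
    @abstractmethod
    def run(self, command: str) -> str: ...


class LocalBackend(Backend):
    def run(self, command):
        # Direct execution on the machine where the agent is installed.
        return subprocess.run(
            command, shell=True, capture_output=True, text=True
        ).stdout


class SSHBackend(Backend):
    def __init__(self, host):
        self.host = host

    def run(self, command):
        # Hands the command to a remote box; shown but not executed here.
        return subprocess.run(
            ["ssh", self.host, command], capture_output=True, text=True
        ).stdout


def dispatch(backend: Backend, command: str) -> str:
    return backend.run(command)


print(dispatch(LocalBackend(), "echo hello").strip())  # hello
```

Docker, Daytona, Modal, and Singularity would slot in as further `Backend` subclasses; the caller never changes.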
The decoupling matters. Hermes deliberately separates where you talk to it from where it runs. You can sit in bed, send a Telegram message, and have it land on a Modal serverless worker that hibernates ninety-nine percent of the time. The $5/month DigitalOcean droplet baseline is real for personal use, but the architecture doesn't assume you'll stay there — it scales up to GPU clusters with the same CLI commands and the same skill library.
Model-Agnostic by Design: Every LLM, No Lock-In
The second-oldest argument in agent-framework design is model lock-in. Hermes sidesteps it by being explicitly, aggressively model-agnostic. hermes model switches providers without code changes, configuration rewrites, or skill re-authoring.
A broad set of backends ships out of the box: Nous Portal (Hermes 4 70B at roughly $0.13/$0.40 per MTok, Hermes 4 405B at $1.00/$3.00), OpenRouter with its 200+ model menu, Kimi/Moonshot, MiniMax, Xiaomi MiMo, z.ai/GLM, Google AI Studio (added as a native provider in v0.8.0), xAI (added in v0.9.0), direct OpenAI, and any custom OpenAI-compatible endpoint. Local deployments via vLLM or Ollama are fully supported, which is the path regulated environments take when air-gapped operation is non-negotiable. Hermes 4.3 36B GGUF variants are the most common choice for that setup — the model card puts it at 93.8% on MATH-500, 87.7% on MMLU, 86.4% on BBH, 71.9% on AIME 24, and 65.5% on GPQA Diamond, which is strong enough for agent workloads at a parameter count that runs on a single prosumer GPU.
The "live model switching across all platforms" feature added in v0.8.0 deserves a mention. You can swap models mid-conversation — drop from Claude Opus 4.7 to Hermes 4 70B to save cost once the hard-reasoning phase of a task is done, and the agent continues on the same session without forgetting what it was doing.
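The mechanics of that switch are simple once every provider speaks an OpenAI-compatible dialect: the conversation history stays put and only the endpoint config changes. A sketch with illustrative URLs and model IDs (not a definitive provider list):

```python
# Hypothetical provider registry; base URLs and model IDs are examples.
PROVIDERS = {
    "nous-portal": {"base_url": "https://api.example-portal.ai/v1",
                    "model": "hermes-4-70b"},
    "openrouter": {"base_url": "https://openrouter.ai/api/v1",
                   "model": "anthropic/claude-opus"},
    "local-vllm": {"base_url": "http://localhost:8000/v1",
                   "model": "hermes-4.3-36b"},
}


class Session:
    """Conversation history survives a provider switch."""
    def __init__(self, provider):
        self.history = []
        self.switch(provider)

    def switch(self, provider):
        self.cfg = PROVIDERS[provider]  # only the endpoint changes

    def send(self, text, call_api):
        self.history.append({"role": "user", "content": text})
        reply = call_api(self.cfg, self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply


s = Session("nous-portal")
s.send("start the hard part", call_api=lambda cfg, h: f"[{cfg['model']}] ok")
s.switch("local-vllm")  # cheaper model; full history carries over
```

The `call_api` stub stands in for an actual chat-completions request; because the new model receives the accumulated `history`, nothing is forgotten mid-task.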
The Tool Gateway: What Shipped in v0.10.0
The v0.10.0 release on April 16 was branded the "Tool Gateway" release, and it's the first version where Nous Portal starts feeling like a full-stack inference and tooling platform rather than just a model endpoint. Paid Portal subscribers now get managed access to a curated tool set through their existing Portal credentials, with no additional API keys to juggle. The initial tool set covers Firecrawl-backed web search, FAL and FLUX 2 Pro image generation, OpenAI TTS, and Browser Use for browser automation.
The design is per-tool opt-in via a new use_gateway config field, and the runtime prefers the gateway over direct API keys when both are configured. That means an operator can set up their own Firecrawl, ElevenLabs, and Browser Use keys, subscribe to Nous Portal, and have the framework automatically fall back to direct keys if the Portal subscription lapses. It's a small piece of plumbing with a big implication: paid Nous Portal subscribers get a managed tool experience that competitors are still making you wire up by hand.
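The resolution order can be sketched in a few lines. Field names beyond `use_gateway` are assumptions about the config shape, not documented Hermes behavior:

```python
# Hypothetical per-tool credential resolution: prefer the Nous Portal
# gateway when enabled and the subscription is active, otherwise fall
# back to a directly configured API key.
def resolve_credentials(tool_cfg, portal_active):
    if tool_cfg.get("use_gateway") and portal_active:
        return {"via": "portal"}
    if "api_key" in tool_cfg:
        return {"via": "direct", "key": tool_cfg["api_key"]}
    raise RuntimeError("no credentials configured for tool")


cfg = {"use_gateway": True, "api_key": "direct-key-placeholder"}
print(resolve_credentials(cfg, portal_active=True))   # {'via': 'portal'}
print(resolve_credentials(cfg, portal_active=False))  # falls back to direct
```

The interesting property is the graceful degradation: a lapsed Portal subscription silently reroutes to the operator's own keys instead of breaking the tool.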
Alongside the Tool Gateway, v0.10.0 also shipped background task auto-notifications (start a long build, get pinged in-agent when it finishes, without polling), free MiMo v2 Pro access on Portal, self-optimized GPT/Codex guidance, smarter inactivity timeouts, approval buttons for sensitive commands, and 209 merged PRs with 82 resolved issues. Those last figures are the part worth internalizing: this is a project absorbing the kind of contribution volume you usually see on projects five years older.
Security, Privacy, and the Zero-CVE Posture
The security story is where Hermes and OpenClaw most sharply diverge, and it's not a gentle divergence. As of April 2026, Hermes Agent has zero publicly disclosed agent-specific CVEs. In the same window, OpenClaw disclosed nine CVEs across four days in March 2026, including one rated CVSS 9.9. Both are structural outcomes of their respective designs, not accidents.
OpenClaw's 13,000+ community skill marketplace is its breadth advantage, but accepting skill submissions at that volume with minimal review is also why the blast radius of a single malicious skill has been measurable. Hermes's 118 curated skills represent a deliberate restraint. Every built-in skill goes through Nous Research's security review before it ships. Auto-generated skills live in the user's local directory and never propagate to other installs unless explicitly exported.
The infrastructure-level hardening shipped steadily through v0.8 and v0.9: path-traversal fixes, shell-injection mitigations, SSRF protections, RCE-adjacent code paths closed, unified proxy support across SOCKS/Discord/etc., hermes backup and hermes import for safe state migration. Container isolation uses read-only root filesystems, dropped Linux capabilities, and PID limits by default. DM pairing — binding a messaging-app user ID to a specific Hermes instance — prevents the "stranger on Telegram triggers my agent" class of attack.
None of that means the framework is ready for customer-facing production deployment without your own review. It's two months old, the API is not stable between v0.x releases, and Nous Research publishes its own security advisory feed for a reason. For personal, small-team, and internal-tool deployments, the evidence is that the curated model holds up. For public-facing critical workloads, pin versions, monitor the advisory feed, and run your own audit.
Beyond Task Automation: The MLOps and RL Pipeline
Most reviews of Hermes stop at the personal-assistant story. That's a mistake, because the MLOps layer underneath is what actually reveals Nous Research's hand. The project ships three pieces of research-grade infrastructure that would be a full framework on their own at most other labs.
Batch trajectory generation runs thousands of tool-calling trajectories in parallel with automatic checkpointing. Workers, batch sizes, and toolset distributions are all configurable. The output is ShareGPT-formatted conversation logs ready for fine-tuning. Atropos integration connects Hermes directly to Nous Research's reinforcement learning framework — eleven tool-call parsers cover training for essentially any model architecture you'd want to target. Trajectory compression shrinks training-data samples into usable token budgets, which is the part that makes RL on long agent trajectories practical rather than prohibitively expensive.
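A conversion from an agent trajectory to ShareGPT-style turns is small enough to sketch. The `"from"`/`"value"` field names follow the commonly used ShareGPT layout; the input trajectory shape here is an assumption, not Hermes's actual log format:

```python
# Hypothetical trajectory-to-ShareGPT converter for fine-tuning data.
def to_sharegpt(trajectory):
    role_map = {"user": "human", "assistant": "gpt", "tool": "tool"}
    return {
        "conversations": [
            {"from": role_map[step["role"]], "value": step["content"]}
            for step in trajectory
        ]
    }


sample = to_sharegpt([
    {"role": "user", "content": "summarize PR #12"},
    {"role": "assistant", "content": "<tool_call>gh pr view 12</tool_call>"},
    {"role": "tool", "content": "PR adds FTS5 index"},
    {"role": "assistant", "content": "The PR adds a full-text index."},
])
print(sample["conversations"][0])
# {'from': 'human', 'value': 'summarize PR #12'}
```

Compression would then operate on the `value` strings — truncating tool output, deduplicating repeated observations — to fit long trajectories into a training token budget.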
Pull this thread and you see the strategy. Hermes Agent isn't just a tool. It's also the data collection layer for training the next generation of tool-calling models. Every run, every successful tool sequence, every generated skill is a candidate trajectory for fine-tuning smaller, cheaper, purpose-built models. For a lab whose business is models, not applications, that dual purpose is the whole point.
Hermes Agent vs. OpenClaw: Two Theories of the Agent
No honest write-up skips the comparison. OpenClaw and Hermes Agent are the two most-watched open-source agent frameworks of 2026, and they represent fundamentally different bets about what an agent is supposed to be.
OpenClaw is gateway-first. Its center of gravity is the messaging layer — a ReAct-style brain sits behind a gateway that routes traffic from 24+ platforms, with Markdown memory and plug-in skills authored by users. It's the choice if you value breadth of integration above all else: wide team deployments, public-facing support bots, consumer-grade simplicity. Its star count sits around 345,000 as of early April 2026, reflecting that consumer reach.
Hermes Agent is learning-first. Its center of gravity is the agent loop — persistent memory, autonomous skill creation, user modeling, and research-grade trajectory infrastructure. It's the choice if you're optimizing for depth over breadth: solo developers using a single agent daily, research-heavy workflows, teams that will live with the same agent for six or more months. Its 95.6K stars (104K+ now) reflect a narrower but more intense developer-researcher audience.
A consistent community verdict has emerged across Reddit and Discord chatter: OpenClaw wins on ecosystem breadth and initial consumer simplicity, but users report spending more time on Docker, YAML, and 24/7 uptime infrastructure than on actual agent workflows. Hermes wins on setup smoothness, memory depth, and security posture, but covers fewer messaging platforms and has a steeper learning curve if you want to get into skill authoring or profile isolation. For some operators the correct answer is simply: run both. OpenClaw as the universal chat-app gateway, Hermes as the research and automation brain.
The one-line distinction, if you need one: OpenClaw is where you deploy agents to your users. Hermes is where you deploy an agent to yourself.
What Hermes Agent Actually Costs
The framework itself is free. MIT license, no enterprise tier, no usage caps, no Nous Research markup on anything. What you pay for is the LLM backend and optional infrastructure.
On budget models (Hermes 4 70B at $0.13/$0.40 per MTok, GPT-5.4 Mini, Claude Haiku 4.5, or MiMo v2 Pro, which is free on Portal), independent measurement from TokenMix.ai puts average per-task cost at roughly $0.30 for complex agent work. Per-task cost lands in the $0.05–$3.00 range depending on the model and the complexity of the workflow. The fixed overhead is high — tool definitions alone consume around 50% of input tokens, which is structural to how agent frameworks work and not unique to Hermes.
Typical monthly cost scenarios settle into a predictable pattern. A personal assistant doing 30 calls/day on budget models runs $15–30/month. A daily research automation pushing 100 calls/day costs $80–150. A team support agent handling 500 calls/day lands at $200–400. Heavy autonomous workflows at 2,000 calls/day clock in at $800–1,500. A $5 DigitalOcean droplet covers always-on hosting for personal use. Team deployments with scheduled automations want 2 vCPU and 4GB RAM.
The cost-optimization path everyone converges on is multi-model routing: send routine classification, summarization, and FAQ matching to cheap models like GPT-5.4 Nano ($0.07/MTok), and escalate only complex reasoning to Claude Opus 4.7 or GPT-5.4 Standard. Operators report 40–60% bill reductions with no measurable quality loss on routine operations once this is dialed in.
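A routing layer of this kind can be trivially small. The classifier below is a deliberate stand-in (real routers use a cheap model or embeddings to classify), and the Claude Opus price is an assumed figure for illustration:

```python
# Hypothetical cost-tiered router: cheap model for routine operations,
# premium model only for complex reasoning.
ROUTES = {
    "cheap": {"model": "gpt-5.4-nano", "input_per_mtok": 0.07},
    "premium": {"model": "claude-opus-4.7", "input_per_mtok": 15.00},  # assumed
}

# Toy classifier: real deployments classify with embeddings or a nano model.
ROUTINE = ("classify", "summarize", "faq")


def route(task: str) -> str:
    tier = "cheap" if task.lower().startswith(ROUTINE) else "premium"
    return ROUTES[tier]["model"]


print(route("summarize this ticket"))      # gpt-5.4-nano
print(route("plan a database migration"))  # claude-opus-4.7
```

If 80% of traffic is routine, even this crude split moves most tokens to the cheap tier, which is where the reported 40–60% bill reductions come from.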
Getting Started in Five Commands
Installation is a single curl-to-bash, which works on Linux, macOS, and WSL2. Native Windows is not supported — run it under WSL2 or skip the framework.
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

The installer handles Python 3.11 via uv, Node.js, all dependencies, and the hermes CLI binary. No sudo, no prerequisites except git. After that, five commands cover the entire operator surface: hermes to start the interactive CLI, hermes setup to run the full wizard, hermes model to choose your LLM provider, hermes gateway setup to wire up messaging platforms, and hermes update to pull the latest version.
If you're migrating from OpenClaw, hermes claw migrate pulls across your SOUL.md persona, MEMORY.md and USER.md entries, user-created skills (into ~/.hermes/skills/openclaw-imports/), command allowlist, messaging settings, allowlisted API keys, TTS assets, and workspace instructions. --dry-run previews the move before it commits. This migration path is the quiet reason a lot of the OpenClaw audience has been willing to kick the tires — it's a weekend move, not a rebuild.
Self-learning is disabled by default, which is the single most common first-run surprise. If you want the learning loop to actually run, hermes config set memory.persistent true and hermes config set skills.autogen true are the two settings that matter. Without them, Hermes behaves like a standard single-session agent and the "grows with you" promise doesn't materialize.
Known Limitations and Gotchas
An honest read on where the framework is still rough. Self-learning off by default is the first. Code generation is the second — Hermes is explicitly conversational-agent-first, and for production code Cursor, Windsurf, or Claude Code outperform it. Using Hermes to generate application code is technically possible but not the intended path, and skill quality for code-heavy workflows reflects that.
API stability between v0.x releases is not guaranteed, and at two months old, the framework is shipping breaking changes every two weeks on the main release branch. Production users should pin exact versions and read every changelog. Platform coverage is narrower than OpenClaw's — six first-class messaging platforms plus Matrix and experimental iMessage/WeChat vs. OpenClaw's 24+. If your user base is on LINE, Teams, or anything outside the core six, verify support before you commit.
Memory opacity is the soft limitation. There's no single-file export for "everything Hermes knows about me," which creates friction for GDPR compliance workflows and for users who want to audit what's been retained. The skill-quality issue mentioned earlier — auto-generated skills from 50+ tool-call sessions over-generalizing — is real and requires manual skill pruning during the first month.
Documentation is improving but still catching up to the shipping pace. Sections are marked incomplete, and the community is young enough that Stack Overflow answers effectively don't exist yet. The Discord and GitHub Discussions are where help actually lives.
When Hermes Is the Right Choice — and When It Isn't
Pick Hermes Agent if you're a solo developer wanting a daily personal AI assistant, a researcher running an agent in the same domain for six-plus months, a privacy-sensitive enterprise deploying on-premise with local LLM inference, a team looking to generate training data or run RL experiments, or an operator who wants a single agent that compounds rather than a fleet that resets. The self-improvement compounds slowly, but it compounds, and that compounding is the actual product.
Pick OpenClaw — or custom LangGraph, or Cursor/Windsurf/Claude Code, or SAP Joule/Microsoft Copilot Studio — if you need 20+ chat platform integrations out of the box, a mature customer-facing agent with predictable enterprise support, a code-generation-first workflow, a sub-500ms latency budget where agent framework overhead is disqualifying, or a production deployment against customer data where two-month-old v0.x software is a non-starter.
The heuristic that actually survives contact with real teams: if you'll use the agent for fewer than three months or need breadth over depth, Hermes is the wrong pick. If you'll live with it for six-plus months and value depth over breadth, Hermes is the framework that will still be delivering marginal improvement at month nine while every other option stopped compounding at week one.
The Release Cadence and What Comes Next
The two-week shipping rhythm is not slowing down. v0.11.0 is already in the 180+ commit range on the main branch, with the Tool Gateway design expanding toward more tool categories, pluggable context engines going mainstream, and continued hardening of the plugin system introduced in v0.8. The Psyche decentralized training network is an active investment — Hermes 4.3 being the first model trained on it suggests Nous is treating decentralized training as production infrastructure, not a one-off experiment, and subsequent model releases will almost certainly lean on it.
If the Nous pattern from Hermes 3 holds, a Hermes 5 release is a reasonable expectation later in 2026, with an expanded post-training corpus and likely larger parameter counts as the Psyche network matures. The Nous Tool Gateway is the piece to watch most closely — if it continues expanding to cover retrieval, specialized code execution, and domain-specific data sources, Hermes Agent on Nous Portal becomes meaningfully more capable for paying subscribers without any external service configuration at all.
The EU AI Act timeline matters too. The August 2, 2026 starting date means Hermes deployments targeting European users will need audit logging, transparency documentation, and a skill-vetting process on top of the default install. Microsoft's Agent Governance Toolkit — released under MIT on April 2 — covers many of the required compliance patterns for open-source frameworks and maps cleanly onto the OWASP Agentic Top 10. For regulated deployments, that stack plus Hermes is a reasonable starting point while the framework itself matures past v1.0.
The Bigger Picture: Agents That Compound
The Stanford HAI AI Index 2026 data point that keeps circulating is worth repeating: agents moved from question-answering to task-completion in 2025, but still fail roughly one in three attempts on structured benchmarks. On OSWorld, agent accuracy rose from around 12% to 66.3% in a year, which is remarkable progress but still six percentage points below human performance. In an environment where raw model intelligence is closing in on a ceiling, the value increasingly accrues to the layer above the model: memory, workflow recovery, tool orchestration, and repeatability.
Hermes Agent is a bet on that layer. The closed learning loop, the four-layer memory, the skill system, the trajectory pipeline — these are all expressions of a thesis that says the next interesting AI product won't be a smarter model, but a model that stops forgetting. That thesis may or may not be correct. But the fact that a seven-week-old open-source project is growing faster than LangChain and AutoGen combined suggests a meaningful slice of the developer community is ready to hear it.
For anyone still rebuilding context on every turn — re-explaining the codebase, re-specifying the stack, re-describing the preferences — Hermes Agent is the first open-source framework where that complaint has a convincing architectural answer. Whether that answer is the one you want depends entirely on whether depth-over-breadth is the trade-off you'd actually make. For the audience that would: this is the runtime worth installing, running for a month, and re-evaluating once you've watched 20 auto-generated skills accumulate. The compounding is real. The question is whether you stay long enough to feel it.