When Hermes finishes a task, it doesn't simply move on. Hermes creates a Skill object (structured code, test cases, and example pairs) and stores it for every similar task that follows. Skills compound over time. The agent in month three handles edge cases that the month-one agent missed, not because the model was retrained, but because the Skill library grew.
What does Hermes build when it completes a task?
Hermes creates a Skill object from each completed task. A Skill object is a structured document containing four elements:
- Task category — how Hermes classifies this type of task (e.g., "candidate-application", "invoice-chase", "client-status-update")
- Code approach — the method Hermes used to complete the task, stored as executable code
- Test cases — input/output pairs derived from the task, used to validate future skill applications
- Example pairs — a set of specific inputs and the correct outputs for each
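The four elements above can be pictured as a small data structure. This is a minimal sketch for illustration only: the class names, field names, and the toy `summarize_application` function are assumptions, not the actual Hermes schema.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExamplePair:
    """One input and the output considered correct for it."""
    input_text: str
    expected_output: str


@dataclass
class Skill:
    """Hypothetical shape of a Skill object with the four elements listed above."""
    task_category: str                     # e.g. "candidate-application"
    code_approach: Callable[[str], str]    # executable method used for the task
    test_cases: list[ExamplePair] = field(default_factory=list)
    example_pairs: list[ExamplePair] = field(default_factory=list)


def summarize_application(text: str) -> str:
    # Toy approach: treat the first line of the application as the candidate name.
    return text.strip().splitlines()[0]


skill = Skill(
    task_category="candidate-application",
    code_approach=summarize_application,
    test_cases=[ExamplePair("Jane Doe\nBackend engineer", "Jane Doe")],
    example_pairs=[ExamplePair("Sam Lee\nData analyst", "Sam Lee")],
)
```

Keeping test cases separate from example pairs mirrors the list above: test cases validate the code approach, while example pairs show what correct output looks like.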
Skill creation is automatic — no configuration required after context definition. The trigger is task completion, not a scheduled training run. A Skill built from a candidate application processed on a Monday is available to apply to the next candidate application that arrives on Tuesday.
The Skill object structure means Hermes does not need to process a task from scratch each time. When a new candidate application arrives, Hermes checks whether an existing Skill matches the task category. If it does, Hermes applies the code approach, runs it against the test cases, and uses the example pairs to calibrate the output. The result is an output that reflects everything Hermes has learned from every similar task — not just the most recent one.
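The lookup described above might look roughly like the sketch below. The `library` dictionary, the `handle_task` function, and the fallback of returning `None` when no Skill matches are all illustrative assumptions; the source does not document how Hermes implements this step.

```python
from typing import Optional

# Hypothetical in-memory Skill library, keyed by task category.
library = {
    "invoice-chase": {
        "approach": lambda text: f"Reminder: {text.strip()} is overdue.",
        "tests": [("Invoice 12", "Reminder: Invoice 12 is overdue.")],
    }
}


def passes_tests(skill: dict) -> bool:
    # Run the stored code approach against every stored test case.
    return all(skill["approach"](inp) == expected for inp, expected in skill["tests"])


def handle_task(category: str, payload: str) -> Optional[str]:
    skill = library.get(category)
    if skill is None:
        return None  # no matching Skill: the task is processed from scratch
    if not passes_tests(skill):
        return None  # the stored approach no longer validates
    return skill["approach"](payload)


print(handle_task("invoice-chase", " Invoice 40 "))
print(handle_task("client-status-update", "Weekly report"))
```

The first call finds a matching Skill and reuses its approach. The second finds none, which in this sketch stands in for processing the task without prior examples.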
This is the mechanism behind Nous Research's description of Hermes as "an intelligent personal assistant that gets more capable the longer it runs."[¹] Capability improvement is not a change to the underlying model; it is an expanding library of specific approaches for specific task types. Each completed task is a vote for what the right approach looks like. Each correction is a vote for what it does not look like. Over time, the Skill for a well-run workflow reflects hundreds of examples of good output and dozens of examples of what to avoid.
How do skills improve over time?
Skills improve through accumulation. Each time Hermes encounters a task that matches an existing Skill category, Hermes applies the Skill, evaluates whether the output matches the expected format, and adds the new input-output pair to the Skill's example set. A Skill that has processed 50 candidate applications is more accurate on format variants than one that has processed 5.
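The apply, evaluate, and record loop described above can be sketched as follows. The format check, the toy `summarize` approach, and the dictionary layout are assumptions made for illustration, not Hermes internals.

```python
def looks_like_summary(output: str) -> bool:
    # Hypothetical format check: a summary must start with "Candidate:".
    return output.startswith("Candidate:")


def summarize(task_input: str) -> str:
    # Toy code approach: use the first non-empty line as the candidate name.
    lines = [line.strip() for line in task_input.splitlines() if line.strip()]
    return f"Candidate: {lines[0]}" if lines else ""


skill = {
    "task_category": "candidate-application",
    "approach": summarize,
    "matches_format": looks_like_summary,
    "examples": [],
}


def apply_and_record(skill: dict, task_input: str) -> str:
    """Apply the Skill, then keep the pair only if the output has the expected format."""
    output = skill["approach"](task_input)
    if skill["matches_format"](output):
        skill["examples"].append((task_input, output))
    return output


apply_and_record(skill, "Jane Doe\nBackend engineer")
apply_and_record(skill, "   ")  # blank input fails the format check and is not recorded
print(len(skill["examples"]))
```

Only outputs that pass the format check join the example set, so the Skill accumulates examples of good output rather than every output it has ever produced.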
The Hermes learning mechanism operates at the inference layer, not by retraining the underlying model. Each Skill is a stored object holding code, tests, and examples for one task category. The model itself does not change. What changes is the library of approaches Hermes has for your specific workflows.
The practical effect is a performance curve. In the first two weeks, Hermes handles the most common task variants correctly. In months two and three, the same instance handles edge cases that previously required escalation — because every completed task has added examples to the relevant Skill. A Hermes instance running a recruiting agency's candidate triage workflow in month three understands variant email formats, partial resumes, and forwarded applications in ways the month-one instance could not, because every one of those has added to the candidate-application Skill.
| Timeline | Skill state | Handled without escalation | Still escalated |
|---|---|---|---|
| Days 1–7 | Skills initialized from context definition | Most common task variants in defined categories | Any input that differs significantly from context examples |
| Weeks 2–4 | Skills accumulating from live completions | Common variants plus first-seen deviations | Unusual input formats, partial information, novel request types |
| Months 2–3 | Skills maturing across high-volume categories | Most edge cases from real use | Genuinely novel situations with no prior examples |
| Month 4+ | Steady state for established workflows | All established patterns reliably | Only completely new workflow types |
The rate of progression through this table depends on task volume. A workflow processing 50 tasks per week reaches month-two performance faster than one processing 5 per week. Volume amplifies the compounding — but the direction of improvement is the same regardless of volume.
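A rough calculation shows how much volume matters. This article's section on skill quality puts usable coverage at 2–3 weeks for a workflow processing 50 tasks per week, which implies roughly 100 to 150 examples. The sketch below assumes, purely for illustration, that coverage depends only on the number of accumulated examples and scales linearly; the source does not state this.

```python
import math


def weeks_to_reach(target_examples: int, tasks_per_week: int) -> int:
    """Whole weeks needed to accumulate the target number of examples."""
    return math.ceil(target_examples / tasks_per_week)


# Targets of 100 and 150 examples are inferred from "2-3 weeks at 50 tasks per week".
for target in (100, 150):
    high_volume = weeks_to_reach(target, 50)
    low_volume = weeks_to_reach(target, 5)
    print(f"{target} examples: {high_volume} weeks at 50/week, {low_volume} weeks at 5/week")
```

Under these assumptions, the low-volume workflow needs 20 to 30 weeks to accumulate what the high-volume workflow gathers in 2 to 3.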
Where are skills stored and can they be shared?
Skills are stored at agentskills.io, an open standard for agent skill exchange.[²] The agentskills.io registry stores Skill objects as structured files — code, tests, and examples — compatible with other agent systems including Cursor, GitHub Copilot, and Claude Code.
A Skill built by one Hermes instance can be exported to agentskills.io and imported by another. A business running two regional Hermes instances shares Skills between them — a candidate-application Skill built from the UK office's email patterns is available to the Germany office. Skills don't have to be built twice for the same task category.
The open standard also means Skills aren't locked to Hermes. A Skill built from a recruiting workflow can be made available to other teams using different agent systems that support the agentskills.io format. For a full guide to deploying Hermes and setting up context definition — the step that determines skill quality — see the Hermes setup guide.
For businesses running multiple Hermes instances — different offices, different teams, or different subsidiaries — the skill sharing mechanism has a direct operational implication. Skills do not have to be built independently in each instance. An instance that has processed 300 candidate applications has built a candidate-application Skill that reflects 300 examples of what correct outputs look like. That Skill is exportable to any other instance. An instance that starts with an imported Skill inherits the coverage of 300 prior examples rather than starting from the context definition alone.
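Sharing between instances amounts to serializing a Skill in one place and loading it in another. The sketch below uses plain JSON with invented field names. It is not the agentskills.io file format, which the source does not specify, and `export_skill` and `import_skill` are hypothetical helpers rather than Hermes commands.

```python
import json

REQUIRED_FIELDS = {"task_category", "code_approach", "test_cases", "example_pairs"}


def export_skill(skill: dict) -> str:
    """Serialize a Skill to a JSON string (illustrative format only)."""
    return json.dumps(skill, indent=2, sort_keys=True)


def import_skill(serialized: str, library: dict) -> dict:
    """Load a serialized Skill into another instance's library."""
    skill = json.loads(serialized)
    missing = REQUIRED_FIELDS - set(skill)
    if missing:
        raise ValueError(f"skill is missing fields: {sorted(missing)}")
    library[skill["task_category"]] = skill
    return skill


# The code approach is stored as source text so that it survives serialization.
uk_skill = {
    "task_category": "candidate-application",
    "code_approach": "def run(text):\n    return text.strip().splitlines()[0]\n",
    "test_cases": [{"input": "Jane Doe\nBackend engineer", "expected_output": "Jane Doe"}],
    "example_pairs": [{"input": "Sam Lee\nData analyst", "expected_output": "Sam Lee"}],
}

germany_library = {}
import_skill(export_skill(uk_skill), germany_library)
print(sorted(germany_library))
```

The importing instance receives the same test cases and example pairs the exporting instance accumulated, which is what lets it skip the cold start.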
The practical benefit is most visible when a new instance is added to an established deployment. Without skill sharing, the new instance starts cold: context definition, review-only testing, and gradual skill accumulation from week one. With a Skill imported from a similar workflow at a comparable business, the new instance starts with a richer baseline and typically reaches steady state faster.
What does this mean for a business running Hermes?
The mechanism is consistent with how Anthropic's 2024 "Building Effective Agents" guide describes the value of structured memory in agent systems: agents that store and retrieve structured prior work, rather than relying on context alone, produce more consistent outputs over time and require less human correction per task.[³] For a business, the skill compounding effect has two practical implications.
First, Hermes gets better at your specific workflows, not at general tasks. Skills encode what your tasks actually look like — your clients, your output formats, your escalation patterns. A Hermes instance that has processed your recruitment workflows for three months understands your specific candidate types and reply conventions in a way a freshly deployed instance does not. The improvement is specific to your context, not a generic upgrade.
This specificity is not a limitation — it is the design. A general-purpose skill library built from thousands of businesses' email patterns would include conventions that do not match your business and would need to be overridden constantly. A skill library built from your own task completions reflects the specific tone, format, and exception handling that your clients and team expect. The longer the instance has run, the more those expectations are encoded in the Skill library rather than inferred from the context definition alone.
Second, error rates decrease over time. A task category that required correction 30% of the time in month one will require fewer corrections by month three — as long as the context was defined accurately at the start. Poorly defined context produces Skills that encode incorrect approaches. Getting context definition right in week one is the most important lever for skill quality in month three. For a broader understanding of how AI agents work, see what is an AI agent. For Hermes's full capabilities, see what is Hermes.
What affects skill quality
Three factors determine how good Hermes's skills become over time.
Context definition accuracy. Skills start encoding from the first completed task. If the example inputs in the context definition are generic rather than drawn from real task data, the Skills built from the first 50 tasks inherit that genericity. The correction rate in the first month is the most visible signal of context definition quality: a high correction rate in week two suggests the examples did not represent actual input variation.
Task volume. A skill improves by accumulating examples. A workflow processing 50 tasks per week produces usable skill coverage in 2–3 weeks. A workflow processing 5 tasks per week takes longer. Volume cannot substitute for good context definition — but it amplifies the improvement once context is correctly set.
Correction quality. Every time a human corrects a Hermes output, the correction becomes an example pair in the skill. Corrections that show exactly what was wrong and what the correct output looks like produce better skills than approvals without feedback. A correction that edits the output without explaining why ("here's the right version") is less useful than one that annotates the change ("this used first-person — should be third because the message goes to the client").
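The difference between a bare correction and an annotated one is easier to see as data. In this sketch the `Correction` record and the `to_example_pair` helper are hypothetical; they only illustrate why an annotation carries more information into the Skill than the corrected text alone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Correction:
    task_input: str
    original_output: str
    corrected_output: str
    annotation: Optional[str] = None  # the reviewer's explanation, if one was given


def to_example_pair(correction: Correction) -> dict:
    """Turn a reviewer correction into an example pair for the Skill."""
    return {
        "input": correction.task_input,
        "expected_output": correction.corrected_output,
        "avoid": correction.original_output,
        "reason": correction.annotation,  # None when the reviewer only edited the text
    }


bare = Correction(
    task_input="Status request from client",
    original_output="I have completed the audit.",
    corrected_output="The team has completed the audit.",
)
annotated = Correction(
    task_input="Status request from client",
    original_output="I have completed the audit.",
    corrected_output="The team has completed the audit.",
    annotation="this used first-person; should be third because the message goes to the client",
)

print(to_example_pair(bare)["reason"])
print(to_example_pair(annotated)["reason"])
```

Both corrections yield the same expected output, but only the annotated one records a rule that can generalize to inputs the reviewer never saw.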
Skill drift and maintenance
Skills keep growing after they are stored, but they are not self-maintaining. A skill built from context that no longer reflects the workflow will produce increasingly incorrect outputs over time. The skill itself has not degraded; the context it was built from has drifted from current expectations.
This happens when client communication standards change, when a new team member handles outputs differently than the previous one, or when the workflow itself evolves. The signal is a rising correction rate on a task category that was previously stable. A task category that required correction 5% of the time in month two and now requires 20% correction in month six has experienced context drift, not model degradation.
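A simple way to watch for this signal is to compare a category's current correction rate with its stable baseline. The sketch below is an assumption-laden illustration: the function names, the task counts, and the doubling threshold are invented, and Hermes is not documented to expose such a check.

```python
def correction_rate(corrected: int, total: int) -> float:
    """Share of completed tasks in a category that needed human correction."""
    return corrected / total if total else 0.0


def has_drifted(baseline_rate: float, current_rate: float, ratio_threshold: float = 2.0) -> bool:
    # Flag drift when the current rate is higher than the baseline and at least
    # ratio_threshold times as large. The default of 2.0 is an arbitrary choice.
    return current_rate > baseline_rate and current_rate >= baseline_rate * ratio_threshold


# Mirrors the example above: 5% in month two, 20% in month six (task counts are made up).
month_two = correction_rate(corrected=10, total=200)
month_six = correction_rate(corrected=40, total=200)

print(month_two, month_six, has_drifted(month_two, month_six))
```

A check like this separates a category that was always noisy from one that was stable and then changed, which is the pattern that points to context drift.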
The fix is updating the context definition for that task category: add new examples that reflect current standards, remove examples that reflect outdated patterns, and update the output format annotation. After the update, skill quality typically recovers within two to three weeks of live operation.
| Drift signal | Likely cause | Fix |
|---|---|---|
| Rising correction rate on stable category | Output standards changed; context not updated | Update example pairs and output format annotation |
| Correct outputs on main path, failures on edge cases | Edge cases not represented in context examples | Add 3–5 examples of edge case inputs and their correct outputs |
| Consistent error on one specific field or format | Field format changed in connected system | Update field mapping reference in context definition |
| High escalation rate despite rising task volume | Context still reflects early examples only | Add recent task examples to expand Skill coverage |
Frequently asked questions
How does Hermes learn from completed tasks? Hermes creates a Skill object when a task is completed. The Skill object contains the task category, the code approach used, test cases derived from the task, and example input-output pairs. On the next similar task, Hermes applies the Skill and adds the new example to it. Skills improve as more examples accumulate. Corrections made by human reviewers are also incorporated as example pairs — so the correction process is not separate from the learning process; it is part of it.
Does skill quality degrade if Hermes is not used for a period? Skill objects are stored persistently at agentskills.io and do not degrade from inactivity. A Hermes instance that is paused for a month and restarted resumes with the same Skill library it had at pause. The underlying model is not retrained and does not forget. The only situation where skill quality degrades is when the business's workflows or output standards change and the context definition is not updated to reflect those changes.
Does Hermes retrain the underlying model? No. The Hermes learning mechanism operates at the inference layer. The underlying language model does not change. Skills are stored as structured Skill objects at agentskills.io — code, tests, and examples — that Hermes applies when handling similar tasks. The model stays fixed; the Skill library grows.
Can skills built by one Hermes instance be used by another? Yes. Skills are stored at agentskills.io, an open standard for agent skill exchange. A Skill built by one Hermes instance can be exported and imported by another. Skills are also compatible with other agent systems that support the agentskills.io format, including Cursor and Claude Code.
How long does it take for Hermes to improve noticeably? The most common task variants are typically handled correctly within the first two to four weeks as Skills accumulate from real task completions. Edge case handling improves through months two and three. The rate of improvement depends on task volume — more completed tasks produce more Skill examples faster — and on correction quality, since corrected outputs contribute the most informative examples to the Skill library.
Notes
1. Nous Research, Hermes documentation. https://hermes-agent.nousresearch.com/docs/
2. agentskills.io, open standard for agent skills. https://agentskills.io
3. Anthropic. "Building Effective Agents." Anthropic Research, December 2024. https://www.anthropic.com/research/building-effective-agents