Skill Performance
"Performance" for skills isn't latency in the traditional sense. It comes down to three questions: does the skill produce reliable behavior, does it consume a reasonable amount of context, and does it integrate without slowing the conversation down?
This page covers each of these considerations.
Context cost
When a skill is invoked, its content loads into Claude's context. Long skills consume more tokens.
The trade-off
- Detailed skill: comprehensive but expensive
- Brief skill: cheaper but may miss cases
For frequently invoked skills, context cost compounds: every invocation pays it again.
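To see how the cost compounds, here is a back-of-the-envelope sketch. It assumes that every invocation loads SKILL.md in full and that a reference doc is loaded only on the fraction of invocations that need it. The token sizes and hit rate are made-up illustration values, not measurements.

```python
def session_cost(skill_tokens: int, invocations: int,
                 reference_tokens: int = 0, reference_hit_rate: float = 0.0) -> float:
    """Total tokens spent loading a skill across a session.

    Assumes each invocation loads SKILL.md in full, plus a reference doc
    only on the fraction of invocations that actually need it.
    """
    per_invocation = skill_tokens + reference_tokens * reference_hit_rate
    return per_invocation * invocations


# Hypothetical sizes and rates, for illustration only.
detailed = session_cost(skill_tokens=6000, invocations=20)
brief = session_cost(skill_tokens=1500, invocations=20,
                     reference_tokens=4500, reference_hit_rate=0.25)

print(f"detailed: {detailed:,.0f} tokens")
print(f"brief + references: {brief:,.0f} tokens")
```

With these assumed numbers, the brief skill plus an occasionally loaded reference costs less than half as much over 20 invocations.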
Patterns
- **Brief SKILL.md + reference docs**: load only what's needed
- **Conditional sections**: "if doing X, see references/x.md"
- **External scripts**: complex logic in scripts, not instructions
A 200-line SKILL.md is fine. A 2000-line one suggests refactoring.
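The 200/2000-line rule of thumb is easy to check mechanically. A minimal sketch, assuming each skill lives in its own directory containing a SKILL.md. The middle "watch" band and the function name `audit_skill_sizes` are hypothetical additions, not part of any skill tooling.

```python
from pathlib import Path

# Thresholds taken from the rule of thumb above; the "watch" band between
# them is a hypothetical addition.
COMFORTABLE_LINES = 200
REFACTOR_LINES = 2000


def audit_skill_sizes(skills_root: str) -> list[tuple[str, int, str]]:
    """Return (path, line_count, verdict) for every SKILL.md under skills_root."""
    report = []
    for path in sorted(Path(skills_root).rglob("SKILL.md")):
        with path.open(encoding="utf-8") as f:
            lines = sum(1 for _ in f)
        if lines <= COMFORTABLE_LINES:
            verdict = "ok"
        elif lines < REFACTOR_LINES:
            verdict = "watch"
        else:
            verdict = "refactor"
        report.append((str(path), lines, verdict))
    return report
```

Calling `audit_skill_sizes("path/to/skills")` returns one row per skill, so the check can run in CI or before a release.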
Invocation reliability
Does Claude invoke the skill when expected?
Description specificity
Vague descriptions lead to missed invocations. Specific descriptions match the intended requests more reliably.
Trigger keywords
Including the words users actually say in the description helps matching.
Skip conditions
An explicit "don't invoke when X" clause helps prevent invocations in the wrong situations.
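These properties can be checked before a skill ships. The sketch below is a hypothetical lint, not a model of how Claude decides to invoke a skill: it only verifies that the words users actually say appear in the description and that some explicit skip condition is present. The `SKIP_MARKERS` phrases and the `lint_description` name are assumptions.

```python
import re

# Phrases treated as an explicit skip condition. This list is an assumption
# for illustration; adjust it to the wording your team uses.
SKIP_MARKERS = ("don't use", "do not use", "don't invoke", "do not invoke", "skip when")


def lint_description(description: str, trigger_words: list[str]) -> list[str]:
    """Return human-readable warnings for a skill description.

    Checks that the expected trigger keywords appear and that an explicit
    skip condition exists. It does not predict actual invocation behavior.
    """
    text = description.lower()
    warnings = []
    for word in trigger_words:
        # Whole-word match so "test" doesn't match inside "latest".
        if not re.search(rf"\b{re.escape(word.lower())}\b", text):
            warnings.append(f"missing trigger keyword: {word!r}")
    if not any(marker in text for marker in SKIP_MARKERS):
        warnings.append("no explicit skip condition (e.g. \"don't use when ...\")")
    return warnings
```

For example, a description like "Helps with code." fails every check, while one that mentions pull requests and diffs and ends with "Don't use when the user only wants the code explained." passes.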
Workflow efficiency
Skills that compose into workflows shouldn't slow workflows down.
Avoid redundancy
If Skill A and Skill B both do similar setup, the second invocation redoes work the first already did.
Solution: smaller, more focused skills that don't overlap.
Avoid heavy initialization
Skills that produce a long preamble before doing anything useful waste tokens.
Quick wins
If a skill can resolve a request quickly (because the user's question is simple), let it. Don't force trivial cases through the full procedure.
Tool call efficiency
Skills using tools (Bash, Read, etc.) generate tool calls that consume context.
Batch operations
If a skill needs to read 5 files, batch the reads into one message instead of issuing them sequentially.
Parallel where independent
Run multiple independent operations in parallel: one message, multiple tool calls.
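The scheduling idea behind batching can be sketched in a few lines. This illustrates the principle, not how Claude issues tool calls: given hypothetical operations and the operations each one depends on, everything that is ready goes into the same round (one message with several tool calls), and dependent work waits for a later round. `plan_batches` is a hypothetical name.

```python
def plan_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group operations into rounds; each round can go in one message.

    deps maps each operation to the operations whose results it needs.
    Operations with no unmet dependencies share a round (parallel calls);
    a dependent operation waits for a later round.
    """
    remaining = {op: set(needs) for op, needs in deps.items()}
    # Make sure every dependency also appears as an operation.
    for needs in deps.values():
        for op in needs:
            remaining.setdefault(op, set())

    rounds = []
    done: set[str] = set()
    while remaining:
        ready = sorted(op for op, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError(f"cyclic dependencies among: {sorted(remaining)}")
        rounds.append(ready)
        done.update(ready)
        for op in ready:
            del remaining[op]
    return rounds
```

Independent reads land in a single round instead of one round each; an edit that needs two of those reads moves to the next round.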
Avoid over-investigation
A skill that explores extensively before doing the actual work consumes context heavily. Match investigation depth to need.
Specific patterns
Lazy reference loading
```markdown
For details on X, see references/x.md (read only if needed)
```
Don't preload all references. Load conditionally.
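Inside a skill, Claude follows the instruction and reads the reference only when the task calls for it. The same pattern, expressed as a hypothetical loader in Python (the keyword-to-file mapping, the file names, and `build_context` are assumptions for illustration):

```python
from pathlib import Path

# Hypothetical mapping from a task keyword to the reference doc that covers it.
REFERENCES = {
    "migration": "references/migrations.md",
    "auth": "references/auth.md",
}


def build_context(skill_dir: str, task: str) -> str:
    """Return SKILL.md plus only the reference docs the task actually needs."""
    root = Path(skill_dir)
    parts = [(root / "SKILL.md").read_text(encoding="utf-8")]
    task_lower = task.lower()
    for keyword, rel_path in REFERENCES.items():
        if keyword in task_lower:  # load on demand, never up front
            parts.append((root / rel_path).read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

A task that never mentions migrations or auth gets SKILL.md alone, which is the whole point of keeping depth in references.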
Cached skill output
For some skills, the output can be cached:
```
First invocation: full computation
Later invocations in same session: use cached result
```
This is achievable through conversation memory rather than persistent caching: the earlier result is already in context.
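In a conversation, the "cache" is simply the earlier result still sitting in context. For a skill that delegates work to a script, a comparable pattern is an in-memory cache that lives only as long as the session. A minimal sketch with a hypothetical `SessionCache` class:

```python
from typing import Callable


class SessionCache:
    """Results cached for the life of one session object, then discarded.

    Mirrors "remember it within the conversation" rather than writing a
    persistent cache to disk.
    """

    def __init__(self) -> None:
        self._results: dict[str, str] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        if key not in self._results:  # first invocation: full computation
            self._results[key] = compute()
        return self._results[key]  # later invocations: reuse the result
```

Nothing is written to disk, so a new session starts cold and stale results don't carry over between sessions.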
Concise instructions
Compare:
```markdown
The objective of this skill is to comprehensively review the code that has been
written, applying a thoughtful and rigorous analysis to identify issues that
may exist in the implementation, with particular attention to...
```
vs:
```markdown
Review code for: bugs, style, security, performance.
```
The second is a fraction of the length and communicates the same intent.
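Word count is only a rough proxy for token count (a real tokenizer will give different numbers), but it is enough to compare two drafts of the same instruction. A sketch using the two snippets above; `word_reduction` is a hypothetical helper:

```python
VERBOSE = (
    "The objective of this skill is to comprehensively review the code that has been "
    "written, applying a thoughtful and rigorous analysis to identify issues that "
    "may exist in the implementation, with particular attention to..."
)
CONCISE = "Review code for: bugs, style, security, performance."


def word_reduction(before: str, after: str) -> float:
    """Fraction of words removed going from `before` to `after` (a token proxy)."""
    before_words = len(before.split())
    after_words = len(after.split())
    return 1 - after_words / before_words


print(f"{len(VERBOSE.split())} words -> {len(CONCISE.split())} words, "
      f"{word_reduction(VERBOSE, CONCISE):.0%} fewer")
```

The verbose version is cut off mid-sentence, so the full text would widen the gap further.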
Examples vs. abstract description
Examples are often more compact than the equivalent description. "Like this:" + 5 lines beats "the convention is..." + 30 lines.
When skills are slow
Symptoms:
- Conversations using the skill take longer
- Token usage spikes when the skill is invoked
- Multiple skill invocations cause cumulative slowdown
Diagnosis:
- Read SKILL.md as Claude would; identify wasteful parts
- Check tool call patterns
- Look for over-investigation
Skill versioning and performance
Older versions of a skill may be more compact than newer ones (which accumulate features). Sometimes a refactor reduces context cost while maintaining capability.
Track skill size over time. Bloat is a signal.
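Tracking size over time can be as simple as recording one number per version and flagging big jumps. A sketch, assuming you already have (version, size) pairs in release order; the 25% growth threshold and the `flag_bloat` name are arbitrary choices for illustration:

```python
def flag_bloat(history: list[tuple[str, int]], max_growth: float = 0.25) -> list[str]:
    """Flag versions whose size grew more than max_growth over the previous one.

    history is a list of (version, size) pairs in release order; size can be
    lines or tokens, as long as it is measured the same way every time.
    """
    flagged = []
    for (_, prev_size), (version, size) in zip(history, history[1:]):
        if prev_size > 0 and (size - prev_size) / prev_size > max_growth:
            flagged.append(version)
    return flagged
```

A flagged version is a prompt to ask what was added and whether it could move to a reference doc.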
Multi-skill performance
When multiple skills load in one conversation, context fills.
For long workflows, consider:
- Run process skills (brainstorming, planning) first; they may be finished before implementation skills are needed
- Don't keep planning skills active during implementation
- Use compact summaries between phases
Subagents for parallelism
For independent work, subagents allow parallel processing:
```
Spawn 3 subagents to investigate 3 different files
Each completes its own work
Main agent synthesizes results
```
Benefits: faster wall-clock time, an isolated context per subagent, and compact results returned to the main agent.
When the tasks really are independent, this can be much faster than working through them sequentially.
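Subagents are spawned by Claude rather than by your own code, so the following is only an analogy for the fan-out/fan-in shape, using Python's `concurrent.futures`. The `investigate` function is a hypothetical stand-in for one subagent: it keeps its own local state and hands back only a one-line summary, which the caller then combines.

```python
from concurrent.futures import ThreadPoolExecutor


def investigate(path: str) -> str:
    """Hypothetical stand-in for one subagent's work.

    Each call has its own local state (its "context") and returns only a
    short summary, not everything it looked at along the way.
    """
    notes = [f"opened {path}", "traced call sites", "checked error handling"]
    return f"{path}: {len(notes)} checks done"


def fan_out(paths: list[str]) -> str:
    # Independent investigations run concurrently; map preserves input order.
    with ThreadPoolExecutor(max_workers=len(paths) or 1) as pool:
        summaries = list(pool.map(investigate, paths))
    # The caller (the "main agent") sees only the compact summaries.
    return "\n".join(summaries)
```

The design point carried over from subagents is that the detailed notes stay inside each worker; only the summary crosses back.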
Common failure patterns
- **Skill bloat over time.** Each iteration adds; nothing removed.
- **Examples that are essays.** Long examples for trivial cases.
- **Verbose instructions.** Lots of words; little signal.
- **Heavy tools per invocation.** Each invocation generates many tool calls.
- **No measurement.** Don't know which skills are expensive.
A reasonable approach
For skill design:
1. Start brief; expand only when needed
2. References for depth; SKILL.md for essentials
3. Examples that are compact and concrete
4. Periodic refactor: what can be removed?
5. Measure: which skills cost most context?
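For step 5, the number that matters is total context consumed, not file size alone, because cost compounds with frequency. A sketch, assuming you can estimate tokens per load and count invocations for each skill; `rank_by_context_cost` is a hypothetical helper:

```python
def rank_by_context_cost(skills: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Rank skills by total tokens loaded: size per load * number of loads.

    skills maps a skill name to (tokens_per_load, invocations_observed).
    """
    totals = {name: size * count for name, (size, count) in skills.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For instance, a 400-token skill invoked 120 times (48,000 tokens) costs three times as much as an 8,000-token skill invoked twice (16,000 tokens), so the small, frequent skill is the better refactoring target.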
Further Reading
- [CustomSkillsArchitecture](CustomSkillsArchitecture) — Skill basics
- [SkillIntegration](SkillIntegration) — Where skills compose
- [TokenMetrics](TokenMetrics) — Adjacent measurement
- [ToolOutputOptimization](ToolOutputOptimization) — Tool-output side
- [AgenticAi Hub](AgenticAiHub) — Cluster index