Skill Performance

"Performance" for skills isn't latency in the traditional sense. It's: does the skill produce reliable behavior; does it consume reasonable context; does it integrate without slowing the conversation?

This page covers the performance considerations.

Context cost

When a skill is invoked, its content loads into Claude's context. Long skills consume more tokens.

The trade-off

Detailed skill: comprehensive but expensive
Brief skill: cheaper but may miss cases

For frequently-invoked skills, context cost compounds.

Patterns

Brief SKILL.md + reference docs: load only what's needed
Conditional sections: "if doing X, see references/x.md"
External scripts: complex logic in scripts, not instructions

A 200-line SKILL.md is fine. A 2000-line one suggests refactoring.

Invocation reliability

Does Claude invoke the skill when expected?

Description specificity

Vague descriptions miss invocations. Specific descriptions match better.

Trigger keywords

Including the words users actually say in the description helps matching.

Skip conditions

Explicit "don't invoke when X" prevents wrong invocations.

Workflow efficiency

Skills that compose into workflows shouldn't slow workflows down.

Avoid redundancy

Skill A and Skill B both do similar setup. The second invocation re-does work the first did.

Solution: smaller, more focused skills that don't overlap.

Avoid heavy initialization

Skills that produce long preamble before doing anything useful waste tokens.

Quick wins

If a skill can resolve quickly (the user's question is simple), let it. Don't always go through full procedure for trivial cases.

Tool call efficiency

Skills using tools (Bash, Read, etc.) generate tool calls that consume context.

Batch operations

If a skill needs to read 5 files, batch the reads in one message instead of sequential.

Parallel where independent

Multiple independent operations in parallel (one message, multiple tool calls).

Avoid over-investigation

A skill that explores extensively before doing the actual work consumes context heavily. Match investigation depth to need.

Specific patterns

Lazy reference loading

For details on X, see references/x.md (read only if needed)

Don't preload all references. Load conditionally.

Cached skill output

For some skills, the output can be cached:

First invocation: full computation
Later invocations in same session: use cached result

Achievable through conversation memory rather than persistent caching.

Concise instructions

Compare:

The objective of this skill is to comprehensively review the code that has been
written, applying a thoughtful and rigorous analysis to identify issues that
may exist in the implementation, with particular attention to...

vs:

Review code for: bugs, style, security, performance.

The second uses 90% fewer tokens; communicates the same thing.

Examples vs. abstract description

Examples are often more compact than the equivalent description. "Like this:" + 5 lines beats "the convention is..." + 30 lines.

When skills are slow

Symptoms:

Conversations using skill take longer
Token usage spikes when skill invoked
Multiple skill invocations cause cumulative slowdown

Diagnosis:

Read SKILL.md as Claude would; identify wasteful parts
Check tool call patterns
Look for over-investigation

Skill versioning and performance

Older versions of a skill may be more compact than newer ones (which accumulate features). Sometimes a refactor reduces context cost while maintaining capability.

Track skill size over time. Bloat is a signal.

Multi-skill performance

When multiple skills load in one conversation, context fills.

For long workflows, consider:

Process skills first (brainstorming, planning) — may be done before implementation skills
Don't keep planning skills active during implementation
Use compact summaries between phases

Subagents for parallelism

For independent work, subagents allow parallel processing:

Spawn 3 subagents to investigate 3 different files
Each completes its own work
Main agent synthesizes results

Faster wall-clock; isolated context per subagent; results returned compactly.

For independent work, this is dramatically faster than sequential.

Common failure patterns

Skill bloat over time. Each iteration adds; nothing removed.
Examples that are essays. Long examples for trivial cases.
Verbose instructions. Lots of words; little signal.
Heavy tools per invocation. Each call generates many tool calls.
No measurement. Don't know which skills are expensive.

A reasonable approach

For skill design:

Start brief; expand only when needed
References for depth; SKILL.md for essentials
Examples that are compact and concrete
Periodic refactor: what can be removed?
Measure: which skills cost most context?