Skill Performance

"Performance" for skills isn't latency in the traditional sense. It's: does the skill produce reliable behavior; does it consume reasonable context; does it integrate without slowing the conversation?

This page covers the performance considerations.

Context cost

When a skill is invoked, its content loads into Claude's context. Long skills consume more tokens.

The trade-off

- Detailed skill: comprehensive but expensive

- Brief skill: cheaper but may miss cases

For frequently-invoked skills, context cost compounds.

Patterns

- **Brief SKILL.md + reference docs**: load only what's needed

- **Conditional sections**: "if doing X, see references/x.md"

- **External scripts**: complex logic in scripts, not instructions

A 200-line SKILL.md is fine. A 2000-line one suggests refactoring.

Invocation reliability

Does Claude invoke the skill when expected?

Description specificity

Vague descriptions miss invocations. Specific descriptions match better.

Trigger keywords

Including the words users actually say in the description helps matching.

Skip conditions

Explicit "don't invoke when X" prevents wrong invocations.

Workflow efficiency

Skills that compose into workflows shouldn't slow workflows down.

Avoid redundancy

Skill A and Skill B both do similar setup. The second invocation re-does work the first did.

Solution: smaller, more focused skills that don't overlap.

Avoid heavy initialization

Skills that produce long preamble before doing anything useful waste tokens.

Quick wins

If a skill can resolve quickly (the user's question is simple), let it. Don't always go through full procedure for trivial cases.

Tool call efficiency

Skills using tools (Bash, Read, etc.) generate tool calls that consume context.

Batch operations

If a skill needs to read 5 files, batch the reads in one message instead of sequential.

Parallel where independent

Multiple independent operations in parallel (one message, multiple tool calls).

Avoid over-investigation

A skill that explores extensively before doing the actual work consumes context heavily. Match investigation depth to need.

Specific patterns

Lazy reference loading

```markdown

For details on X, see references/x.md (read only if needed)

```

Don't preload all references. Load conditionally.

Cached skill output

For some skills, the output can be cached:

```

First invocation: full computation

Later invocations in same session: use cached result

```

Achievable through conversation memory rather than persistent caching.

Concise instructions

Compare:

```markdown

The objective of this skill is to comprehensively review the code that has been

written, applying a thoughtful and rigorous analysis to identify issues that

may exist in the implementation, with particular attention to...

```

vs:

```markdown

Review code for: bugs, style, security, performance.

```

The second uses 90% fewer tokens; communicates the same thing.

Examples vs. abstract description

Examples are often more compact than the equivalent description. "Like this:" + 5 lines beats "the convention is..." + 30 lines.

When skills are slow

Symptoms:

- Conversations using skill take longer

- Token usage spikes when skill invoked

- Multiple skill invocations cause cumulative slowdown

Diagnosis:

- Read SKILL.md as Claude would; identify wasteful parts

- Check tool call patterns

- Look for over-investigation

Skill versioning and performance

Older versions of a skill may be more compact than newer ones (which accumulate features). Sometimes a refactor reduces context cost while maintaining capability.

Track skill size over time. Bloat is a signal.

Multi-skill performance

When multiple skills load in one conversation, context fills.

For long workflows, consider:

- Process skills first (brainstorming, planning) — may be done before implementation skills

- Don't keep planning skills active during implementation

- Use compact summaries between phases

Subagents for parallelism

For independent work, subagents allow parallel processing:

```

Spawn 3 subagents to investigate 3 different files

Each completes its own work

Main agent synthesizes results

```

Faster wall-clock; isolated context per subagent; results returned compactly.

For independent work, this is dramatically faster than sequential.

Common failure patterns

- **Skill bloat over time.** Each iteration adds; nothing removed.

- **Examples that are essays.** Long examples for trivial cases.

- **Verbose instructions.** Lots of words; little signal.

- **Heavy tools per invocation.** Each call generates many tool calls.

- **No measurement.** Don't know which skills are expensive.

A reasonable approach

For skill design:

1. Start brief; expand only when needed

2. References for depth; SKILL.md for essentials

3. Examples that are compact and concrete

4. Periodic refactor: what can be removed?

5. Measure: which skills cost most context?

Further Reading

- [CustomSkillsArchitecture](CustomSkillsArchitecture) — Skill basics

- [SkillIntegration](SkillIntegration) — Where skills compose

- [TokenMetrics](TokenMetrics) — Adjacent measurement

- [ToolOutputOptimization](ToolOutputOptimization) — Tool-output side

- [AgenticAi Hub](AgenticAiHub) — Cluster index