Skill Debugging

A skill exists; Claude isn't using it correctly. The skill doesn't get invoked when it should; gets invoked when it shouldn't; or produces wrong output.

This page covers diagnosis and fixes.

Common symptoms and causes

Skill not invoked when expected

The user asks for something the skill should handle; Claude doesn't invoke it.

Causes:

Description too vague: doesn't match the user's request
Missing keywords: skill description doesn't mention what user said
Conflicting skills: another skill matched first
Skill not enabled: in user's config but not loaded

Fix: refine the description. Add explicit triggers ("when user asks for X").

Skill invoked at wrong times

Claude invokes the skill for unrelated requests.

Causes:

Description too broad: matches too many things
Generic keywords: triggers on common words

Fix: narrow the description. Add explicit skip conditions ("not for Python files").

Wrong output despite invocation

Claude invokes the skill; produces unexpected results.

Causes:

Ambiguous instructions: multiple interpretations
Conflicting guidance: skill says X but user instructions say Y
Missing context: skill assumes information not present
Examples mismatched: examples don't match the task

Fix: clarify instructions. Add explicit examples. Specify pre-conditions.

The diagnostic process

Reproduce

Get a clear test case. The user request that should invoke the skill; the actual behavior.

If reproduction is intermittent, the issue may be in description matching — sometimes Claude invokes; sometimes doesn't.

Inspect the skill

Read SKILL.md. Pretend you're Claude:

When would I invoke this?
What would I do if invoked?
What's ambiguous?

Test in isolation

Invoke the skill explicitly; see what happens. If invocation produces wrong output, the issue is in instructions. If invocation never happens via natural language, the issue is in description.

Iterate

Skills rarely work perfectly first time. Adjust; test; adjust.

Specific debugging patterns

Description matching

Description: "helps with code style" User: "make this more readable"

Did the description match? If no: what would have matched? Add those terms.

Trigger conditions

Add explicit triggers:

TRIGGER when:
- User asks to refactor for readability
- User says "clean up this code"
- File contains code-smell patterns

Skip conditions

Add explicit skips:

SKIP when:
- File is configuration (yaml, json without code)
- User explicitly says "don't refactor"

Examples that disambiguate

If users invoke the skill in unexpected ways, add examples:

## Examples

User: "make this prettier" → invoke this skill
User: "format this JSON" → DON'T invoke; use formatter instead

Cross-references

If the skill's behavior depends on other skills:

This skill assumes brainstorming has happened. If not, invoke brainstorming first.

Versioning

When a skill changes, you may want to track versions. Skill files don't have built-in versioning, but you can:

Add a version field to frontmatter (informal)
Date the SKILL.md
Use git history

For widely-distributed skills, semantic versioning helps users know about breaking changes.

Testing skills

Manual testing

Try the skill in real conversations. Does it work for the cases you care about?

Test cases

Document expected behavior:

## Tests

Case 1: User asks "do X"
  Expected: skill invokes; produces Y

Case 2: User asks "do unrelated thing"
  Expected: skill does NOT invoke

Run through them periodically.

Edge cases

Specifically test:

Ambiguous requests
Requests that should NOT invoke (negative tests)
Compositions with other skills
Long conversations where context is full

Common pitfalls

Over-specifying

A skill with 2000 lines of instructions has problems. Too much for Claude to apply consistently. Refactor: shorter primary instructions; references to detail.

Under-specifying

A skill with 5 lines of "use best practices" doesn't constrain behavior. Specific instructions produce specific outputs.

Skill instructions that conflict with user instructions

User CLAUDE.md says one thing; skill says another. Per system prompt, user instructions win — but the skill author may not have known.

Document explicit precedence in skill if relevant.

Skills that update assumed state

Skill says "do A then verify" but doesn't actually verify. User assumes verification happened. False sense of completion.

Make verification explicit and traceable.

Test-only skills shipping to production

Skills used during development that shouldn't run in normal use. Either don't ship them, or guard with explicit triggers.

A debugging checklist

When a skill misbehaves:

✓ Is the description specific to when I want it invoked?
✓ Does it have explicit triggers and skip conditions?
✓ Are instructions concrete or abstract?
✓ Are there examples?
✓ Does it assume context that may not exist?
✓ Does it conflict with other skills or user instructions?
✓ Have I tested it in real conversations?

Most issues fall into one of these.

Skill Debugging

Common symptoms and causes

Skill not invoked when expected

Skill invoked at wrong times

Wrong output despite invocation

The diagnostic process

Reproduce

Inspect the skill

Test in isolation

Iterate

Specific debugging patterns

Description matching

Trigger conditions

Skip conditions

Examples that disambiguate

Cross-references

Versioning

Testing skills

Manual testing

Test cases

Edge cases

Common pitfalls

Over-specifying

Under-specifying

Skill instructions that conflict with user instructions

Skills that update assumed state

Test-only skills shipping to production

A debugging checklist

Further Reading