Skill Debugging

A skill exists; Claude isn't using it correctly. The skill doesn't get invoked when it should; gets invoked when it shouldn't; or produces wrong output.

This page covers diagnosis and fixes.

Common symptoms and causes

Skill not invoked when expected

The user asks for something the skill should handle; Claude doesn't invoke it.

Causes:

- **Description too vague**: doesn't match the user's request

- **Missing keywords**: skill description doesn't mention what user said

- **Conflicting skills**: another skill matched first

- **Skill not enabled**: in user's config but not loaded

Fix: refine the description. Add explicit triggers ("when user asks for X").

Skill invoked at wrong times

Claude invokes the skill for unrelated requests.

Causes:

- **Description too broad**: matches too many things

- **Generic keywords**: triggers on common words

Fix: narrow the description. Add explicit skip conditions ("not for Python files").

Wrong output despite invocation

Claude invokes the skill; produces unexpected results.

Causes:

- **Ambiguous instructions**: multiple interpretations

- **Conflicting guidance**: skill says X but user instructions say Y

- **Missing context**: skill assumes information not present

- **Examples mismatched**: examples don't match the task

Fix: clarify instructions. Add explicit examples. Specify pre-conditions.

The diagnostic process

Reproduce

Get a clear test case. The user request that should invoke the skill; the actual behavior.

If reproduction is intermittent, the issue may be in description matching — sometimes Claude invokes; sometimes doesn't.

Inspect the skill

Read SKILL.md. Pretend you're Claude:

- When would I invoke this?

- What would I do if invoked?

- What's ambiguous?

Test in isolation

Invoke the skill explicitly; see what happens. If invocation produces wrong output, the issue is in instructions. If invocation never happens via natural language, the issue is in description.

Iterate

Skills rarely work perfectly first time. Adjust; test; adjust.

Specific debugging patterns

Description matching

Description: "helps with code style"

User: "make this more readable"

Did the description match? If no: what would have matched? Add those terms.

Trigger conditions

Add explicit triggers:

```markdown

TRIGGER when:

- User asks to refactor for readability

- User says "clean up this code"

- File contains code-smell patterns

```

Skip conditions

Add explicit skips:

```markdown

SKIP when:

- File is configuration (yaml, json without code)

- User explicitly says "don't refactor"

```

Examples that disambiguate

If users invoke the skill in unexpected ways, add examples:

```markdown

Examples

User: "make this prettier" → invoke this skill

User: "format this JSON" → DON'T invoke; use formatter instead

```

Cross-references

If the skill's behavior depends on other skills:

```markdown

This skill assumes brainstorming has happened. If not, invoke brainstorming first.

```

Versioning

When a skill changes, you may want to track versions. Skill files don't have built-in versioning, but you can:

- Add a version field to frontmatter (informal)

- Date the SKILL.md

- Use git history

For widely-distributed skills, semantic versioning helps users know about breaking changes.

Testing skills

Manual testing

Try the skill in real conversations. Does it work for the cases you care about?

Test cases

Document expected behavior:

```markdown

Tests

Case 1: User asks "do X"

Expected: skill invokes; produces Y

Case 2: User asks "do unrelated thing"

Expected: skill does NOT invoke

```

Run through them periodically.

Edge cases

Specifically test:

- Ambiguous requests

- Requests that should NOT invoke (negative tests)

- Compositions with other skills

- Long conversations where context is full

Common pitfalls

Over-specifying

A skill with 2000 lines of instructions has problems. Too much for Claude to apply consistently. Refactor: shorter primary instructions; references to detail.

Under-specifying

A skill with 5 lines of "use best practices" doesn't constrain behavior. Specific instructions produce specific outputs.

Skill instructions that conflict with user instructions

User CLAUDE.md says one thing; skill says another. Per system prompt, user instructions win — but the skill author may not have known.

Document explicit precedence in skill if relevant.

Skills that update assumed state

Skill says "do A then verify" but doesn't actually verify. User assumes verification happened. False sense of completion.

Make verification explicit and traceable.

Test-only skills shipping to production

Skills used during development that shouldn't run in normal use. Either don't ship them, or guard with explicit triggers.

A debugging checklist

When a skill misbehaves:

1. ✓ Is the description specific to when I want it invoked?

2. ✓ Does it have explicit triggers and skip conditions?

3. ✓ Are instructions concrete or abstract?

4. ✓ Are there examples?

5. ✓ Does it assume context that may not exist?

6. ✓ Does it conflict with other skills or user instructions?

7. ✓ Have I tested it in real conversations?

Most issues fall into one of these.

Further Reading

- [CustomSkillsArchitecture](CustomSkillsArchitecture) — Skill basics

- [SkillComposition](SkillComposition) — Where issues compound

- [SkillDocumentation](SkillDocumentation) — Adjacent practice

- [AgenticAi Hub](AgenticAiHub) — Cluster index