
Build a Recursive Self-Improvement Skill with Claude Code

Marketing · Advanced · 15-20 minutes · 5 milestones


What you'll build

A `/improve` skill that generates marketing copy, scores it against your personal quality checklist, diagnoses what's weak, rewrites, and loops until every criterion passes — then stress-tests against a skeptical buyer.


The Problem

You prompt Claude for a landing page headline. It comes back sounding fine. You ship it. A week later you're staring at the conversion rate thinking "I should've caught that." The headline was clever but had no curiosity gap. The subhead was smooth but didn't hit a pain point. You knew what good looked like — you just never forced Claude to check its own work against your standards before handing it to you.

What You're Building

A Claude Code skill that never ships a first draft. When you type /improve, Claude generates your copy, scores it against a checklist you define (curiosity gap: 8/10 minimum, emotional trigger: 7/10, etc.), tells you exactly what failed, rewrites, re-scores, and keeps looping until every criterion passes. Then it throws a skeptical buyer at the output. If it survives, it ships. The pattern: generate → evaluate → diagnose → improve → repeat.


Milestone 1: Define Your Scoring Checklist

You have standards — you just haven't written them down. Let's turn your gut instinct about "good copy" into a scoring checklist Claude can use.

Prompt:

I write landing page headlines regularly and I want to build a scoring system for them. Help me define 5-6 criteria that separate great headlines from mediocre ones. For each criterion, give it a name, a description of what a 10/10 looks like, and a minimum passing score (7-9 out of 10). Think about: does it stop the scroll? Does it create a curiosity gap? Does it hit a real pain point? Does it feel specific vs generic? Save this as ~/improve-criteria.md in a clean table format.

What Claude Code does: Claude takes your vague sense of quality and turns it into measurable scoring criteria. Instead of "make it good," you now have specific thresholds: "curiosity gap: 8/10 minimum — the reader must need to know what comes next." This checklist becomes the engine that powers every iteration.

Try it: Open ~/improve-criteria.md. You should see 5-6 criteria in a table with names, descriptions, and minimum scores. Read each one — does it match how you actually judge headlines? Edit any that feel off. This is your quality bar, not Claude's.
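For reference, a criteria file in this shape might look like the following. The specific criteria names, descriptions, and thresholds here are illustrative examples, not something the tutorial prescribes — yours should reflect your own judgment:

```markdown
| Criterion      | What a 10/10 looks like                                  | Minimum |
|----------------|----------------------------------------------------------|---------|
| Curiosity gap  | The reader *needs* to know what comes next               | 8/10    |
| Pain point     | Names a specific frustration the reader recognizes       | 7/10    |
| Specificity    | Concrete nouns and numbers, zero generic claims          | 7/10    |
| Scroll-stopper | The first five words break the expected pattern          | 8/10    |
```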


Milestone 2: Generate and Self-Score

Now let's make Claude write a headline and immediately grade its own work — before you even look at it.

Prompt:

Read my scoring criteria from ~/improve-criteria.md. Now write a landing page headline and subhead for an online UX writing course. After writing it, score your own output against every criterion in the checklist. For each criterion, give a score out of 10 and a one-sentence explanation of why. Be brutally honest — don't inflate scores. Show the total score and which criteria passed vs failed.

What Claude Code does: Claude generates the headline, then evaluates its own output against your checklist. This is the key shift — instead of you being the quality filter, Claude catches its own weak spots first. You'll typically see 2-3 criteria fail on the first attempt. That's the point. Now you can see exactly where the gap is.

Try it: Read the scores. Do they feel accurate? If Claude scored "curiosity gap: 9/10" but you'd give it a 5, tell it: "Your curiosity gap score is inflated — I'd score that a 5. Recalibrate." Training Claude to score honestly is part of the process.


Milestone 3: The Recursive Loop

This is where it gets powerful. Instead of you reading the diagnosis and asking for a rewrite, Claude does the full loop automatically: diagnose → rewrite → re-score → repeat. We'll use a different product to prove the pattern works for anything.

Prompt:

Read ~/improve-criteria.md again. Generate a landing page headline and subhead for an AI-powered email marketing tool. Then run this loop: 1) Score it against every criterion. 2) For any criterion below the minimum, write a specific diagnosis — what exactly is weak and why. 3) Rewrite the headline addressing every diagnosis. 4) Re-score the new version. 5) If any criterion still fails, loop back to step 2. Keep going until every criterion passes. Show me each iteration so I can see the improvement.

What Claude Code does: Claude runs the generate → evaluate → diagnose → improve → repeat cycle in a single response. You'll see iteration 1 (scores, failures, diagnosis), iteration 2 (rewrite, new scores), sometimes iteration 3 — typically 2-4 rounds total. Each round gets more targeted because the diagnosis tells Claude exactly what to fix — not "make it better" but "the curiosity gap is weak because the reader already knows what an email tool does. Rewrite to create an information gap."

Try it: Compare iteration 1 to the final iteration. The improvement should be obvious. Check the scores — every criterion should now be at or above the minimum. If you disagree with a score, that's your cue to sharpen the criteria in ~/improve-criteria.md.
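The loop Claude runs in a single response can be sketched in code. This is a minimal Python sketch of the control flow only — in the tutorial Claude itself plays every role, so `generate`, `score`, `diagnose`, and `rewrite` are stand-in callables, not real functions from any library:

```python
def improve(generate, score, diagnose, rewrite, criteria, max_rounds=5):
    """Generate -> evaluate -> diagnose -> improve, until all criteria pass.

    criteria maps each criterion name to its minimum passing score.
    score() returns a dict of {criterion name: score out of 10}.
    """
    draft = generate()
    for round_no in range(1, max_rounds + 1):
        scores = score(draft, criteria)
        # Any criterion below its minimum is a failure to diagnose.
        failures = {name: scores[name]
                    for name, minimum in criteria.items()
                    if scores[name] < minimum}
        if not failures:
            return draft, round_no          # every criterion passed
        diagnosis = diagnose(draft, failures)
        draft = rewrite(draft, diagnosis)   # targeted fix, not "make it better"
    return draft, max_rounds                # safety valve: cap the rounds
```

The `max_rounds` cap mirrors good practice for any self-correcting loop: without a termination bound, a criterion Claude can never satisfy would spin forever.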


Milestone 4: Add Adversarial Pressure

The copy passed your checklist. But would it survive a real reader? Let's add a persona that tries to break it — different product again, same pattern.

Prompt:

Read ~/improve-criteria.md. Generate a landing page headline and subhead for a stock portfolio tracking app. Run the scoring loop until all criteria pass. Then add this final test: adopt the persona of a skeptical, distracted scroller who's seen 100 ads today. Attack the headline — would you actually stop scrolling? Would you click? What would make you keep scrolling? If the persona finds a weakness, diagnose it, rewrite the headline, and re-run both the criteria check AND the adversarial test. Only output "SHIPS" when it passes both.

What Claude Code does: After the criteria loop, Claude applies adversarial pressure — a simulated hostile reader that stress-tests the output. This catches things a checklist misses: the headline scores 9/10 on curiosity gap but uses jargon a real person would skip. The adversarial persona adds a second layer of quality control that mimics how your copy actually performs in the wild.

Try it: Look for the "SHIPS" verdict. Read the adversarial attack — does the critique feel real? If it's too soft, tell Claude: "Your adversarial persona is too nice. Be the most skeptical, distracted, ad-fatigued person on the internet." The harsher the critic, the stronger the output.
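Structurally, the adversarial test is a second gate on top of the criteria loop. A minimal sketch of that two-gate shape, again with stand-in callables (the names are illustrative, and Claude performs both checks in practice):

```python
def ship(draft, criteria_pass, adversary_pass, fix, max_rounds=5):
    """Output 'SHIPS' only when the draft clears BOTH quality gates."""
    for _ in range(max_rounds):
        if criteria_pass(draft) and adversary_pass(draft):
            return draft, "SHIPS"
        draft = fix(draft)  # diagnose the weakness, rewrite, try again
    return draft, "NEEDS MORE WORK"
```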


Milestone 5: Package as a Reusable /improve Skill

You've run the loop manually 3 times. Now let's save it as a skill you'll never have to set up again.

Prompt:

Create a global Claude Code skill at ~/.claude/commands/improve.md. When I invoke /improve, it should: 1) Ask me what I'm writing (headline, email, ad copy, etc.) and what it's for (one sentence). 2) Read my scoring criteria from ~/improve-criteria.md — embed this exact path in the skill so it always knows where to find them. 3) Generate a first draft. 4) Run the recursive scoring loop — score, diagnose failures, rewrite, re-score, repeat until all criteria pass. 5) Run the adversarial test with a skeptical persona relevant to the task (distracted scroller for ads, busy inbox for emails, cynical CMO for positioning). 6) Only output the final version after it passes both the criteria loop and the adversarial test. 7) Show a brief iteration log — how many rounds it took and which criteria improved the most.

What Claude Code does: Claude creates a global skill file at ~/.claude/commands/improve.md. Global commands work in every project — type /improve anywhere and the full recursive loop runs. The skill reads your criteria file fresh every time, so when you update your quality bar, the skill automatically gets stricter. The iteration log lets you see how hard Claude worked: 1 round means your criteria are too easy; 4+ rounds means the system is earning its keep.

Try it: Run ls ~/.claude/commands/improve.md to confirm the file exists. Open a new Claude Code session in any folder and type /improve. You should see it ask what you're writing, then run the full loop. Try it on something you care about — the email you've been drafting, the ad copy sitting in your doc. Compare the final output to what you would've shipped from a single prompt.
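Claude will decide the exact wording, but the generated skill file is just a markdown prompt. A sketch of the shape you can expect (illustrative only — your file will differ):

```markdown
# /improve — recursive self-improvement loop

1. Ask what I'm writing (headline, email, ad copy, etc.) and what it's for.
2. Read my scoring criteria from ~/improve-criteria.md.
3. Generate a first draft.
4. Loop: score against every criterion, diagnose any failures,
   rewrite to address each diagnosis, re-score. Repeat until all pass.
5. Run an adversarial test with a skeptical persona matched to the task.
6. Output the final version only after both gates pass.
7. End with a brief iteration log: rounds taken, criteria most improved.
```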


What You Built

Remember shipping that first draft and regretting it? You just built a system that won't let that happen. Every piece of copy now goes through your personal quality checklist, gets diagnosed and rewritten until it passes, then survives a hostile reader — all before you see it.

  • ~/improve-criteria.md — Your personal quality bar, written as scorable criteria
  • ~/.claude/commands/improve.md — A reusable skill that runs the full recursive loop on any task

The pattern works for anything: emails, ad copy, positioning statements, SEO content, video hooks. Change the criteria, change the adversarial persona, same loop.


Take It Further

  • Build task-specific criteria files (~/criteria/email.md, ~/criteria/landing-page.md, ~/criteria/ad-copy.md) and update the skill to pick the right one based on the task
  • Add a competitor check — before the adversarial test, have Claude search for the top 3 competitors and make sure your copy says something they don't
  • Log your wins — add a step that appends the final output + scores to ~/improve-log.md so you can track patterns in what your criteria catch over time
