Skills, not scripts: how to build AI agents for Customer Success

Most companies starting an AI project ask the wrong opening question. They ask: "What model should we use?" or "What's our tech stack?" or "How do we wire this into our backend?"

The teams that ship AI agents that actually work ask a different question first: "How would I onboard a new hire who knew the technology but didn't know our business?"

That mental shift — from building an AI agent to teaching one — is the difference between an agent your team trusts in production and a chatbot they stopped using after the first awkward output.

The technology part is easy. There are several major model providers, well-documented APIs, and dozens of frameworks. The hard part is everything that has nothing to do with code: what does your business call a "customer"? What's the difference between high risk and medium risk? When does the agent escalate to a human? What does a good retention email look like? No model knows these things. You have to teach them.

This post is about how that teaching actually works in 2026 — what's changed in the last 18 months, why "skills" is the new abstraction, and how to think about AI agent building if you're starting one for Customer Success or any other domain.

The shift: from prompt engineering to context engineering

Two years ago, "prompt engineering" was the dominant discipline. The job was to write a clever system prompt that got the model to do what you wanted. Whole books were written about prompt patterns, role assignments, chain-of-thought tricks.

That era is over for serious agent building. Modern frontier models follow instructions well. The bottleneck has moved.

The new discipline is context engineering: deciding what information the agent should have access to, in what form, at what time, in what order. The instructions are easy. The knowledge architecture is the work.

Concretely, the questions look like this:

What does the agent need to know to handle this question?
Should that knowledge live in the system prompt (every conversation), in a retrievable document (when relevant), or in a tool response (when fetched)?
How does the agent know which document to consult for which question?
What examples should it have seen so it knows what good output looks like?
What anti-examples should it have seen so it doesn't repeat common mistakes?

None of this is prompt engineering. It's curriculum design.

Skills as the new abstraction

Anthropic introduced a pattern in 2025 that it calls Skills. The mechanic is simple: instead of stuffing every piece of domain knowledge into one giant system prompt, you write a set of folders, one per skill, each containing a markdown file called SKILL.md that holds condensed wisdom about how to handle a specific kind of task.

The model decides at runtime which skills are relevant to the current request, reads them, and applies them. You don't load all of them every time. You load the ones that match.

The reason this works better than monolithic prompts: it scales the way human knowledge actually scales. Your business knowledge isn't one document. It's a collection of "how do we do X?" documents. New hires don't read everything on day one — they read what they need when they need it. Skills follow the same logic.

Even if you're not using Anthropic's specific Skills feature, the architectural pattern travels. The principle is: domain knowledge belongs in markdown files, not in code or in monolithic prompts. You should be able to update what the agent knows about retention emails without redeploying anything. A non-engineer on your CS team should be able to improve a skill by editing a doc.
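The load-on-demand mechanic described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: it assumes each skill is a folder holding a SKILL.md file whose first lines carry `name` and `description` fields between `---` markers, and the names `build_index`, `index_prompt`, and `load_skill` are hypothetical. Only the short index goes into every conversation; a full skill body is read when the model asks for it.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillMeta:
    name: str
    description: str
    path: Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_index(skills_dir: Path) -> list[SkillMeta]:
    """Collect only name and description; skill bodies stay on disk."""
    index = []
    for skill_file in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        index.append(
            SkillMeta(
                name=meta.get("name", skill_file.parent.name),
                description=meta.get("description", ""),
                path=skill_file,
            )
        )
    return index


def index_prompt(index: list[SkillMeta]) -> str:
    """The short listing the model sees in every conversation."""
    return "\n".join(f"- {s.name}: {s.description}" for s in index)


def load_skill(index: list[SkillMeta], name: str) -> str:
    """Read the full skill text only when the model asks for it."""
    for skill in index:
        if skill.name == name:
            return skill.path.read_text(encoding="utf-8")
    raise KeyError(name)
```

Because `load_skill` reads from disk on every call, editing the body of a SKILL.md changes what the agent knows on the next request, without a redeploy.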

What goes into a good skill

The temptation when writing your first skill is to dump everything. The agent needs to know about retention, so write a 4,000-word document on retention. This is the equivalent of handing a new hire a binder on their first day.

Skills work better when they're tight, specific, and structured. The pattern that consistently produces good ones:

Domain definitions. Every term that has a specific meaning in your business needs an explicit definition. "Active customer" means something to your team. The model doesn't know what. Write it down.

Decision rules. "If X, then Y" statements that the agent can apply. A risk score categorization (80+ very high, 60–79 high, etc.) is a decision rule. Without it, the agent invents categories per session.

Worked examples. Show, don't tell. A retention email skill should include 2–3 actual examples of good retention emails, with annotations explaining why they work. The model learns more from one good example than from a paragraph of guidelines.

Anti-examples. "Here's what a bad version looks like, and here's why it's bad." Anti-examples are more valuable than positive examples for catching common failure modes. They tell the agent what to avoid, which is half the job.

Boundaries. Explicit statements of what the agent should not do, when it should escalate, when it should ask for clarification. Without boundaries, agents over-confidently answer questions they shouldn't.

A small, well-structured skill outperforms a long, exhaustive one. The same way a one-page checklist outperforms a 50-page manual for new hires.
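One way to keep skills tight and structured is a small check that runs whenever a skill is edited. The sketch below assumes the five parts listed above appear as markdown headings inside the skill; the heading names, the `lint_skill` function, and the 800-word limit are all illustrative choices, not part of any published format.

```python
# Hypothetical section names, one per element of the pattern above.
REQUIRED_SECTIONS = (
    "Definitions",
    "Decision rules",
    "Worked examples",
    "Anti-examples",
    "Boundaries",
)
MAX_WORDS = 800  # arbitrary limit, chosen only for illustration


def lint_skill(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return a list of problems; an empty list means the skill passes."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in text.splitlines()
        if line.startswith("#")
    }
    problems = [
        f"missing section: {section}"
        for section in REQUIRED_SECTIONS
        if section.lower() not in headings
    ]
    word_count = len(text.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (limit {max_words})")
    return problems
```

A check like this does not judge whether the content is good. It only catches the two failure modes named above: missing structure and the everything-in-one-binder skill.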

Treating AI like a new hire

The most useful mental model for building agents is this: imagine you're hiring someone smart, fast-learning, with no prior knowledge of your business or industry, who works 24/7 and can be cloned infinitely. How would you onboard them?

Most teams already have an answer to this question. They don't realize it applies to AI.

You'd write onboarding documentation. You'd give them shadowing sessions (examples). You'd review their first outputs and give feedback. You'd start them on simple tasks and graduate them to harder ones. You'd specialize them based on what they're best at. You'd let them know what they can decide alone and when to escalate.

That's the entire playbook for building AI agents in 2026. The technology has reached the point where the agent can learn from documentation, examples, and feedback. The bottleneck is whether you've actually written the documentation, curated the examples, and set up the feedback loop.

Three implications:

Documentation is the deliverable. A team building an agent should be writing as much as they're coding. The skills, the domain definitions, the example libraries — these are the artifacts that determine whether the agent is good. Code is the substrate. Documentation is the product.

Specialization beats generalization. A new hire who tries to do everything is mediocre at everything. A new hire focused on a specific role gets good fast. Same with agents. Don't build one agent that does ten things. Build ten skills the agent loads contextually.

Feedback loops matter more than initial setup. Your first version of any skill will be wrong in places. The agent will misunderstand things. The right response isn't "back to the drawing board" — it's "edit the skill." Skills evolve with usage, the same way employee playbooks evolve with experience.

Examples beat instructions

Pick any skill you're writing. Cut every "you should be helpful" or "make sure to be polite" sentence. Replace them with a worked example.

This is the single highest-leverage edit you can make to any AI agent system. Models pattern-match faster than they reason from rules. Three concrete examples of how to handle a churn risk customer with a usage drop teach the agent more than three paragraphs of instructions on the same topic.

The format that works:

Setup: Here's a customer with this risk profile, this MRR, these recent signals.
The right response: Here's the email we'd send. Note the specific reference to the usage drop, the offer of a 15-minute training, the absence of a discount.
Why this works: We don't open with a discount because the problem isn't price — it's adoption. We mention the specific feature because it shows we're paying attention.
A wrong version: Here's a generic "checking in" email. Why this fails: nothing in it is specific to this customer.

Repeat for the 5 to 10 most common patterns in your domain. The agent now has a vocabulary of good outputs and bad outputs — which is what it needs to generate good ones.
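If the example library grows past a handful of entries, it can help to hold each example in a fixed shape so none of the four parts gets skipped. The following is a sketch under that assumption; the `WorkedExample` class and its field names are hypothetical, and the sample text is placeholder content that paraphrases the format above rather than real customer data.

```python
from dataclasses import dataclass


@dataclass
class WorkedExample:
    setup: str
    right_response: str
    why_it_works: str
    wrong_version: str
    why_it_fails: str

    def render(self) -> str:
        """Render the example as labeled plain text for a skill file."""
        parts = [
            ("Setup", self.setup),
            ("The right response", self.right_response),
            ("Why this works", self.why_it_works),
            ("A wrong version", self.wrong_version),
            ("Why this fails", self.why_it_fails),
        ]
        return "\n".join(f"{label}: {text}" for label, text in parts)


def render_library(examples: list[WorkedExample]) -> str:
    """Join several examples, separated by blank lines."""
    return "\n\n".join(example.render() for example in examples)


# Placeholder example, for illustration only.
usage_drop = WorkedExample(
    setup="High-risk customer with a recent usage drop.",
    right_response="Email that names the usage drop and offers a 15-minute training.",
    why_it_works="The problem is adoption, not price, so there is no discount.",
    wrong_version="Generic 'checking in' email.",
    why_it_fails="Nothing in it is specific to this customer.",
)
```

Calling `render_library([usage_drop])` produces text that can be pasted into the worked-examples part of a skill.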

Tools are part of the curriculum

Often overlooked: the tools you give the agent define the work it can do. A new hire's job description is partly defined by what's on their laptop. Same here.

If your agent has a tool that lets it apply Stripe discounts but no tool that lets it pause subscriptions, the agent will reach for discounts even when a pause would be better. Not because it's lazy — because the pause isn't part of its world.

This means tool design is agent design. Decisions like:

Which actions does the agent take autonomously vs. propose for approval?
What information does each tool return — raw data, formatted output, or a summary?
How does each tool's description teach the agent when to use it?

...are choices about what the agent is. A team building an agent should spend at least as much time on tool descriptions as on the system prompt. The descriptions are the agent's job description.
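The three design decisions above can be made explicit in how tools are registered. The sketch below is an assumption-laden illustration: `Tool`, `ToolBox`, and the two sample tools are hypothetical, and the handlers are stand-ins that call no real billing or CRM API. It shows a description written to tell the agent when to use the tool, and an approval flag that routes an action to a queue instead of running it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str  # written for the agent: when to use it, when not to
    handler: Callable[..., str]
    requires_approval: bool = True  # default to the cautious choice


@dataclass
class ToolBox:
    tools: dict[str, Tool] = field(default_factory=dict)
    approval_queue: list[tuple[str, dict]] = field(default_factory=list)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def describe(self) -> str:
        """The text the agent reads to learn what it can do."""
        return "\n".join(f"{t.name}: {t.description}" for t in self.tools.values())

    def call(self, name: str, **kwargs) -> str:
        """Run autonomous tools now; queue the rest for a human."""
        tool = self.tools[name]
        if tool.requires_approval:
            self.approval_queue.append((name, kwargs))
            return f"queued for approval: {name}"
        return tool.handler(**kwargs)


# Stand-in handlers; a real system would call its own integrations here.
def fake_lookup_usage(customer_id: str) -> str:
    return f"usage summary for {customer_id}"


def fake_pause_subscription(customer_id: str) -> str:
    return f"paused {customer_id}"


box = ToolBox()
box.register(Tool(
    name="lookup_usage",
    description="Use before drafting outreach to check recent usage.",
    handler=fake_lookup_usage,
    requires_approval=False,
))
box.register(Tool(
    name="pause_subscription",
    description="Use when the customer needs a break, not a lower price.",
    handler=fake_pause_subscription,
))
```

With this setup, `box.call("lookup_usage", customer_id="c1")` runs immediately, while `box.call("pause_subscription", customer_id="c1")` only adds an entry to `box.approval_queue`.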

Iteration: skills evolve, code stays stable

The biggest mental model shift for engineering teams: when the agent gets something wrong, you usually edit a markdown file, not the codebase.

An agent confused about how to handle a specific customer scenario? Add an example to the relevant skill. An agent over-using discount offers? Add a section to the skill explaining when discounts aren't the right answer. An agent escalating things that should be handled autonomously? Tighten the boundary section.

This is uncomfortable for engineering teams used to the loop "find bug, fix code, deploy." With agents, the loop is "find behavior, edit skill, observe." No deploy. Often no engineer involved at all — a domain expert can edit the skill themselves.

The teams that ship reliable agents treat skill iteration the same way they treat documentation iteration: continuous, version-controlled but lightweight, with ownership distributed across the team that uses the agent.

What this looks like for Customer Success

Concretely, when we built Revenue Plumber as an AI agent for Customer Success, the work split looked something like this:

~30% engineering — tools, integrations (Stripe, HubSpot, Gmail), the action approval queue, the dashboard.
~40% domain knowledge — skills for churn risk interpretation, retention email playbooks, intervention selection, customer segmentation, common escalation patterns.
~20% example curation — building libraries of good and bad retention outreach, good and bad responses to common scenarios, anti-patterns to avoid.
~10% iteration loops — tracking where the agent gets it wrong, adding to the skills, refining boundaries.

If we'd treated this as a software project (mostly engineering, with a "prompt" dropped in at the end), we'd have shipped a chatbot. The 70% of work that wasn't code is what made the agent useful.

This is also why "we'll build our own internal AI agent for CS" is harder than most engineering teams expect. The engineering is the easy part. The skills, the examples, the iteration discipline — that's where the months go. Most internal projects underestimate this and ship something that works in a demo and fails in production.

If you're starting one

A practical sequence for teams building AI agents in 2026:

Step 1. Write the onboarding doc you'd give a new hire who needs to do this job. Plain markdown. No engineering yet. If you can't write it, you don't understand the job well enough to automate it.

Step 2. Break that doc into 5–10 skills, each focused on one specific task. Resist the urge to write one big skill.

Step 3. For each skill, add 2–3 worked examples and at least one anti-example. This is the unglamorous work that makes the agent useful.

Step 4. Decide what tools the agent has access to and what requires human approval. Write the tool descriptions like you're writing a job description.

Step 5. Wire it together. This is where engineering happens. Surprisingly little of the total effort goes here.

Step 6. Run the agent on real cases. When it gets something wrong, edit the relevant skill. Track the failure modes. Build the iteration loop.

Step 6 is the actual job. Steps 1–5 are setup. Most teams spend 90% of their effort on setup and 10% on iteration. The reverse split is what produces agents that get better over time instead of plateauing at "demoable but not trusted."
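Tracking failure modes in Step 6 does not need much machinery. As a minimal sketch, assuming each observed mistake is logged with the skill it belongs to and a short label for what went wrong (the `Failure` record and `edit_queue` function are hypothetical names), a tally is enough to show which skill to edit next:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    skill: str      # which skill should have prevented this
    mode: str       # short label for what went wrong
    note: str = ""  # free-text detail from the reviewer


def edit_queue(failures: list[Failure], top: int = 3):
    """Return the most frequent (skill, mode) pairs with their counts."""
    counts = Counter((f.skill, f.mode) for f in failures)
    return counts.most_common(top)


# Placeholder log entries, for illustration only.
log = [
    Failure("retention-email", "opened with a discount"),
    Failure("retention-email", "opened with a discount"),
    Failure("escalation", "escalated a routine question"),
]
```

Here `edit_queue(log)` ranks the discount problem in the retention email skill first, which points at the section of the skill to tighten.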

Bottom line

The mental model that produces good AI agents in 2026 isn't a software engineering one. It's a teaching one. You're onboarding a smart new hire who needs to learn your business. The skills you write are their training materials. The examples you curate are their shadowing sessions. The iteration loop is their feedback cycle.

The technology has reached the point where this works — if you do the teaching part well. The teams that ship reliable AI agents are the teams that take the teaching seriously. The teams that treat it as a coding problem ship demos and abandon them six months later.

Your domain expertise is the moat. Code is everywhere. Documented, structured, example-rich knowledge of how your specific business actually works — that's the rare thing. Agents that have access to it are useful. Agents that don't are chatbots.

See an AI agent that's been taught the CS job

Revenue Plumber is the AI agent for Customer Success built on this approach. 18 months of skills, examples, and iteration — so your team doesn't have to do that part. Plug in Stripe, watch it work the long tail your CSMs don't have time for.
