The Pressure to Automate Customer Support Is Real
If you lead a support team right now, you're fielding pressure from at least four directions at once.
Your CFO wants cost savings. They've seen the vendor claims about 60% cost reduction. Your CEO read about Klarna and wants to know why you're not doing that. Your best agents are quietly updating their CVs because they've read about Klarna too. And the vendors are circling, promising everything will be fine if you just sign the contract.
Here's what makes it harder: the people with the strongest opinions usually have something to sell. The optimists want your budget. The pessimists want your clicks. Neither has to clean up the mess if they're wrong.
One thing nobody tells you: every vendor demo you've ever seen used cherry-picked tickets. The "where is my order" query with a single tracking number. The password reset with no complications. The FAQ question that matches their training data perfectly.
When you're evaluating vendors, ask them this: "Show me the 10% of tickets your AI handles worst. Show me escalation rates from your live deployments, not pilots. Show me what happens when the customer is already angry from a previous failed attempt."
Watch how they respond. The honest ones will have answers. The others will pivot back to the demo.
AI can genuinely transform support operations. It can also fail publicly and expensively. The difference isn't the technology itself. The difference is implementation speed. The companies that failed went too fast. The companies that succeeded treated automation as a destination, not a switch.
This playbook is about the slower, boring path that actually works.
Let's look at why healthy skepticism makes sense here.
The fastest way to sabotage an AI implementation is to lead with "this will replace agents." It sounds obvious. But that's exactly how most AI vendors pitch their products. Headcount reduction. Cost savings. Efficiency. Fewer humans, more machines.
This fails. Here's how (and notice that these are three different failure modes, not one):
Klarna went too fast
You've probably heard about Klarna. In early 2024 their CEO was everywhere, talking about how their AI was doing the work of 700 agents. They froze hiring and let headcount fall to around 3,000. Wall Street loved it.
Here's the CEO explaining what went wrong to Bloomberg: "Cost unfortunately seems to have been a too predominant evaluation factor... what you end up having is lower quality."
They're still using AI. It's not that the technology failed. They just optimized for the wrong thing. In my experience, this is the most common failure pattern: treating cost savings as the goal rather than a byproduct of good automation.
McDonald's picked the wrong channel
Some channels are harder than others. Text-based chat is easier than voice. Structured requests are easier than open-ended ones. McDonald's learned this the hard way.
Their AI drive-thru pilot went viral for all the wrong reasons: bacon on ice cream, $222 worth of nuggets nobody ordered, customers fighting with robots that couldn't understand "just a water, please." They shut it down at over 100 locations in mid-2024.
Drive-thru ordering has background noise, accents, interruptions, and ambiguous requests ("make it a meal," but which meal?). The technology wasn't ready for that environment. It might work fine for text-based support, where the input is clean and structured.
Air Canada got sued for what their bot said
Their chatbot told a customer he could claim a bereavement discount up to 90 days after flying. That policy didn't exist. When the customer asked for his money back, Air Canada said no. A tribunal said yes, and ordered the airline to pay.
Here's the part that should worry you: Air Canada tried to argue the chatbot was "a separate legal entity." The tribunal didn't buy it. If your AI says something, you said it. No audit trail, no human review, no defense.
Each company skipped the training phase. They went from "let's try AI" to "AI is handling tickets" without the middle step where humans review AI outputs, calibrate accuracy, and build confidence.
And here's something worth saying directly: "escalate to human" is not a safety net. If 40% of your AI conversations end up escalating, you haven't automated anything. You've added a step. You've made the customer explain themselves twice. You've handed your agents half-finished tickets with missing context.
What's a healthy escalation rate? It depends on your ticket mix, but in my experience, well-implemented Tier 1 automation should see under 15%. Above 25%, something's wrong with your AI's scope, training, or confidence calibration. Above 40%, you're actually making things worse.
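To make those thresholds operational, here's a minimal sketch of a health check you could drop into a weekly report. The cutoffs come straight from the numbers above; the function itself is illustrative, not part of any vendor's tooling:

```python
def escalation_health(escalated: int, ai_handled: int) -> str:
    """Classify an AI escalation rate against the rough thresholds above.

    escalated:  conversations the AI handed off to a human
    ai_handled: total conversations the AI attempted
    """
    if ai_handled == 0:
        return "no data yet"
    rate = escalated / ai_handled
    if rate < 0.15:
        return f"{rate:.0%}: healthy for well-implemented Tier 1 automation"
    if rate <= 0.25:
        return f"{rate:.0%}: acceptable, but watch the trend"
    if rate <= 0.40:
        return f"{rate:.0%}: check the AI's scope, training, or confidence calibration"
    return f"{rate:.0%}: you've added a step, not removed one"

print(escalation_health(escalated=120, ai_handled=1000))  # 12%: healthy ...
```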
When your agents see these stories, they don't think "that won't happen here." They think "I'm next." And given what they're reading, their concern is rational.
If you've been hesitant about AI, you're not being a luddite. You're being prudent. AI works. The real question is whether you can implement it without falling into the cost trap, the technical failure, or the liability gap.
You can. But it takes longer than the vendors suggest.
One Framework That Cuts Through the Confusion
Here's the simplest way to think about support automation:
Up to 80% of tickets can be automated. But 80% is where you arrive, not where you start.
The research supports the 80% figure. Salesmate's 2025 analysis found 65-70% of routine customer service tasks can be handled by AI. BigSur AI reported that AI can reduce inquiry volumes by up to 70%. These are documented outcomes from real implementations, not projections.
But only 1-2% of support cases are fully automated today. The gap isn't technology. It's how companies implement it.
Most support operations have barely scratched the surface. The technology exists. The implementation patience usually doesn't.
The Destination (Up to 80%)
Here's the thing: this is the part most companies skip straight to. They treat the destination as a switch to flip rather than a place you arrive.
And arriving isn't the end. You need to audit continuously. Quarterly at minimum. The 80% is a moving target.
The Journey
The journey looks like this: AI handles volume while your team reviews outcomes. Not some outcomes - all of them initially. Your agents become trainers and calibrators, providing feedback that improves the system over time.
This takes weeks to months. That timeline isn't inefficiency. It's how you build a system your team trusts enough to stop reviewing.
(Side note: "weeks to months" drives executives crazy. They want a number. Resist the urge to give one. Every team's timeline depends on their ticket complexity, review capacity, and how quickly their AI learns. Promising "6 weeks" and hitting 10 destroys trust faster than saying "we'll know more after the first month.")
The Grey Zone
The 80/20 framing suggests a clean split. Reality is messier.
Some tickets are 60% automatable - the AI can gather information and draft a response, but a human needs to approve it. Some tickets start automatable but shift mid-conversation when the customer reveals complexity. "Where's my order?" is automatable until the customer adds "...and I need it for my daughter's birthday tomorrow and if it's late I'm disputing the charge."
The Constant (20% Human-Only)
Some tickets should never be automated: escalations where emotions are running high, VIP accounts where relationships drive revenue, edge cases that don't fit patterns, anything involving legal liability or compliance risk.
This 20% is where your best agents shine. Protecting it means: clear routing rules that keep these tickets away from AI, dedicated specialists who handle only complex work, and recognition systems that reward successful human interventions.
How to Automate Customer Support: A Week-by-Week Roadmap
Here's the implementation sequence. Timelines assume a team of 10-20 agents handling 2,000+ tickets monthly. Scale proportionally - smaller teams move faster, larger teams need more coordination time.
Phase 1: Audit (Week 1-2)
Before automating anything, understand what you're working with.
Pull your last 1,000 tickets. Categorise each by automation potential using these criteria:
Automatable now:
- Single intent (customer wants one thing)
- Predictable resolution (same answer works 90%+ of the time)
- No judgment required (no exceptions, no "it depends")
- Low emotional charge (informational, not frustrated)
Automatable with training:
- Patterns exist but have variations
- Resolution requires selecting from 2-3 options based on context
- May need data lookup (order status, account details)
Human-assisted:
- Multiple possible resolutions requiring judgment
- Customer history or relationship context matters
- AI can draft, but human should review before sending
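If you want the audit to be repeatable across reviewers, the criteria above can be encoded as a simple rubric. Here's a minimal sketch in Python; the flag names and the mapping are my simplification of the three buckets, not a standard taxonomy:

```python
from dataclasses import dataclass

@dataclass
class TicketAudit:
    single_intent: bool         # customer wants exactly one thing
    predictable: bool           # the same answer works 90%+ of the time
    needs_judgment: bool        # exceptions, "it depends", policy calls
    emotionally_charged: bool   # frustrated customer, not an informational ask
    needs_lookup: bool = False  # order status, account details, etc.

def categorise(t: TicketAudit) -> str:
    """Map the audit flags onto the three buckets described above."""
    if t.needs_judgment or t.emotionally_charged:
        return "human-assisted"
    if t.single_intent and t.predictable and not t.needs_lookup:
        return "automatable now"
    return "automatable with training"

# "Where's my order?" - single intent, predictable, but needs an order lookup
print(categorise(TicketAudit(single_intent=True, predictable=True,
                             needs_judgment=False, emotionally_charged=False,
                             needs_lookup=True)))  # -> automatable with training
```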
Most teams find 60-70% fall into the first two categories. If you're seeing less than 50%, your ticket mix may not be ready for significant automation, or your categorisation is too conservative. (Or you might just have a genuinely complex product. Some businesses have mostly edge cases. That's not a failure; it just means your automation ceiling is lower.)
While auditing, map agent strengths to human-only categories. Who de-escalates best? Who handles VIPs? Who knows the product deeply enough for edge cases? This becomes their future focus.
Phase 2: Calibration (Week 3+)
This is the phase most companies rush through or skip entirely. Don't.
Start AI on your Tier 1 tickets - the "automatable now" category from your audit.
Critical: humans review 100% of AI responses before sending. Not after. Before.
Yes, this is slower than full automation. That's the point. You're calibrating.
What you're calibrating:
Confidence thresholds. Vendors give you confidence scores, but they're based on their training data, not yours. A "95% confidence" response might be wrong 30% of the time for your specific edge cases. Track confidence vs. actual accuracy for your tickets. After 500+ reviews, you'll know what confidence threshold actually means "safe to send without review."
Category accuracy. Some ticket types will perform better than others. "Where's my order" might hit 95% accuracy. "Why was I charged twice" might hit 70%. You need this data to know where to expand.
Failure patterns. When the AI fails, how does it fail? Wrong answer? Right answer with wrong tone? Hallucinated policy? Each failure type has a different fix.
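One way to do the confidence-vs-accuracy tracking: bucket every reviewed response by the vendor's confidence score and compare against what your reviewers actually did. A minimal sketch, assuming each review is a dict with the fields shown (adapt to however your review log is stored):

```python
from collections import defaultdict

def calibration_table(reviews: list[dict]) -> dict[str, tuple[int, float]]:
    """Bucket reviewed AI responses by vendor confidence score and report
    how often each bucket was approved with no edits (= actually accurate).

    Each review dict needs:
      {"confidence": 0.0-1.0, "action": "approved" | "edited" | "rejected"}
    """
    buckets: dict[int, list[bool]] = defaultdict(list)
    for r in reviews:
        decile = min(int(r["confidence"] * 10), 9)  # 0.95 -> the 90-99% bucket
        buckets[decile * 10].append(r["action"] == "approved")
    return {
        f"{lo}-{lo + 9}%": (len(hits), sum(hits) / len(hits))
        for lo, hits in sorted(buckets.items())
    }

# After 500+ reviews, find the lowest bucket whose approval rate clears your
# bar (say 97%): that's what "safe to send without review" means for YOUR
# tickets, regardless of what the vendor's score claims.
```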
What to log (your audit trail):
- Ticket ID and timestamp
- AI-generated response (full text)
- Confidence score
- Human reviewer's action (approved / edited / rejected)
- If edited: what changed and why
- If rejected: what the human sent instead
This log protects you legally (see: Air Canada) and operationally. When something goes wrong, you need to trace exactly what happened.
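As a concrete shape for that log, here's a minimal sketch of a single record. The field names are illustrative (your helpdesk's export will differ); the point is that every item in the list above maps to a field:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal, Optional

@dataclass
class AIReviewRecord:
    """One row of the audit trail described above."""
    ticket_id: str
    ai_response: str                  # full text, not a summary
    confidence: float                 # vendor's score, 0-1
    action: Literal["approved", "edited", "rejected"]
    edit_diff: Optional[str] = None   # if edited: what changed and why
    replacement: Optional[str] = None # if rejected: what the human sent instead
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

record = AIReviewRecord(
    ticket_id="T-48213",
    ai_response="Your refund was processed on 12 March...",
    confidence=0.91,
    action="edited",
    edit_diff="Removed a 90-day claim window the policy doesn't mention",
)
```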
What will happen that nobody warns you about:
Agents will test to break. They'll send the weirdest tickets they can find just to watch the AI fail. This looks like sabotage. It's not. It's healthy. I've seen this happen on every implementation - these agents become your best trainers because they're finding gaps before customers do. Encourage it.
Training on top performers backfires. Your best agents break rules with good judgment. They offer refunds outside policy because they sense a churn risk. If you train AI on their responses, it learns the rule-breaking without the judgment. You get an AI that gives refunds to everyone. Mix training data: 60% solid-average performers, 40% top performers.
One caveat on that ratio: it depends on how much your top performers deviate from standard process. If they're mostly following the script with occasional judgment calls, you can weight them higher. If they're creative improvisers, weight them lower.
Metrics to track:
- Accuracy rate (responses needing no edit)
- Edit rate (responses needing minor changes)
- Rejection rate (responses replaced entirely)
- Agent satisfaction with the AI (survey weekly)
Agent satisfaction matters as much as accuracy. If agents hate the AI, they'll work around it. If they trust it, they'll help improve it.
Reduce review rate gradually as accuracy improves: 100% → 75% → 50% → 25% → spot-check. Let accuracy data drive the pace, not a calendar.
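To keep that step-down data-driven, here's a minimal sketch of the ladder logic. The 500-review sample size and the 95%/85% accuracy cutoffs are illustrative assumptions, not benchmarks; calibrate them against your own risk tolerance:

```python
# Review rates from Phase 2, ending in a 5% spot-check.
REVIEW_LADDER = [1.00, 0.75, 0.50, 0.25, 0.05]

def next_review_rate(current: float, accuracy: float, sample_size: int) -> float:
    """Move the human-review rate one rung at a time, and only when the
    accuracy data (share of responses needing no edit) has earned it."""
    i = REVIEW_LADDER.index(current)
    if sample_size < 500:
        return current                   # not enough evidence to move yet
    if accuracy >= 0.95 and i < len(REVIEW_LADDER) - 1:
        return REVIEW_LADDER[i + 1]      # earned the next rung down
    if accuracy < 0.85 and i > 0:
        return REVIEW_LADDER[i - 1]      # regression: review more again
    return current

print(next_review_rate(current=0.75, accuracy=0.96, sample_size=820))  # 0.5
```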
Phase 3: Progressive Automation (Month 2+)
Expand to Tier 2 tickets. Same pattern: high review rates initially, decreasing as confidence builds.
This is also when you introduce outcome-based QA. Start measuring whether problems actually get solved, not just whether responses follow format.
Celebrate human wins publicly. When an agent saves a churning account or navigates a complex escalation, make sure the team knows. The signal you're sending: human judgment is more valuable than ever.
AI will make a serious mistake during this phase. I've seen it happen with every implementation: a wrong refund amount, a policy hallucination, an accidentally offensive phrasing. It's not a question of if, but when. When it happens:
- Pause automation on that ticket type immediately
- Pull all similar tickets from the last 24-48 hours
- Have humans review for damage
- Contact affected customers proactively if needed
- Retrain on the failure case
- Resume with 100% review until accuracy recovers
Having this playbook ready prevents panic-driven decisions like "shut down all AI forever."
Phase 4: The 80% Destination (Month 3+)
Full automation of repetitive tickets. Human team focused on high-value work. Continuous improvement driven by QA insights across 100% of conversations.
This is the destination. Reaching it takes time. But you'll have something companies that rushed don't have: a system that works and a team that believes in it.
What This Means for Your Team
Here's the conversation most support leaders avoid having with their teams:
AI is going to change your agents' jobs. Not eliminate them - change them.
Stop trying to reassure them their jobs won't be affected. Their concern is reasonable. They've seen the headlines. They watched Klarna. When you say "AI won't affect your job," it sounds like what managers say right before layoffs. It destroys trust.
Say this instead:
"A year from now, you won't be processing password resets and tracking updates. That work will be automated. Instead, you'll be doing work that actually needs you: complex problem-solving, de-escalation, relationships, training the AI to be better. Some of you will move into new roles that don't exist yet. The transition won't be instant, and we'll figure it out together."
For some agents, this is good news. The repetitive tickets (the copy-paste responses, the robotic scripts) aren't why people got into customer service. Removing that work means more time for work that's actually satisfying. Other agents genuinely prefer the predictability of routine work. Both reactions are valid. Either way, the change requires new skills. And not everyone will want to make the shift.
The new roles look like this:
AI Trainers: Agents who review AI outputs, provide feedback, and improve accuracy. Deep product knowledge becomes more valuable than ever. This is typically a lateral move in title but can lead to senior/specialist roles. In our experience, it's the agents who notice details (not the loudest or most confident) who excel here. The quiet observers catch what others miss.
Escalation Specialists: Experts in de-escalation, complex problem-solving, and retention. The human-only 20% needs dedicated specialists who do only this work. This is often a promotion path: higher stakes, higher skill requirements, higher compensation.
QA Analysts: People who interpret patterns from 100% conversation review. They turn data into insights that improve the whole operation. This is a different skill set: more analytical, less conversational. Not every strong agent will want this path.
Being honest about who won't transition:
Some agents are excellent at repetitive, high-volume work. They're fast, accurate, and consistent. They don't want complex emotional conversations. They don't want to analyze data.
These agents have fewer options in an AI-augmented team. The honest conversation is: "The work you're best at is the work most likely to be automated. Here's what the new roles look like. Do any of these interest you? If not, let's talk about what makes sense."
This is hard. But it's more respectful than pretending the change isn't coming.
How to identify future AI trainers:
Look for agents who:
- Catch edge cases others miss
- Ask "why does the system work this way?" not just "what do I do?"
- Are frustrated by inconsistency ("Why did we handle this differently last week?")
- Document their workarounds
- Are curious about how things work, not just how to use them
These are often not your top performers by volume metrics. They might be slower because they're thinking more carefully. That careful thinking is exactly what AI training needs.
Where to Go From Here
If you've read this far, you're serious about getting AI right.
Three options:
Read the Support QA Playbook
The full rubric template with the 60/40 weighting, scoring guidelines for outcome-based metrics, and the implementation checklist we use with pilot companies.
Join the Pilot Programme
We're implementing this framework with a small group of companies using Hay's platform. Pilot members get 50% lifetime pricing, direct access to our founding team, and influence over the product roadmap.
We're looking for: support teams of 5+ agents, 2,000+ monthly tickets, existing helpdesk (Zendesk, Freshdesk, Intercom, or similar), and a leader who's willing to move deliberately rather than chase quick wins.
Book a Workflow Audit
If you're not sure where you fall on the automation-readiness spectrum, we'll review your current operation: ticket mix, channel complexity, team structure, and risk profile. Then we'll map out what a realistic path to 80% looks like for your specific situation.
30 minutes. We'll tell you honestly whether Hay is a fit or not.