01
Ground the Problem
Audit tickets. Map failure modes.
TechnoWizards handled millions of customer conversations across healthcare, banking, and logistics. The AI was capable. The system around it wasn't.
TechnoWizards is a multi-channel customer engagement platform serving enterprise teams in healthcare, banking, government, logistics, and retail — industries where a wrong AI answer isn't just annoying. It's a lawsuit.
They had a powerful chatbot. They had happy customers on simple queries. But complex cases — refunds, billing disputes, clinical authorizations — were silently breaking the company.
I led product design for the Helpdesk experience — the internal workspace where human agents and AI collaborate to resolve complex support cases. This was not a redesign. This was a 0-to-1 product built inside an 8-month window.
The bot was smart. The system around it was broken.
Before
After
THE OUTCOME, BEFORE YOU READ THE STORY.
Resolution time: 2–5 min. Was 15–20 min.
Policy accuracy: 95%+. Was 65–85%.
Automation trust: 60%. Was 30–40%.
You just bought a 400-pound solid oak dining table. It arrives with one of the carved legs split down the middle. You open WhatsApp and type: "My table arrived broken. I need a refund."
This isn't a bot problem. It's a system problem.
Conversations span WhatsApp, Email, Helpdesk, and Phone. Each handoff loses context. Agents re-triage from scratch. The customer feels like they're talking to an organization with severe amnesia.
Teams see drop-off numbers but can't diagnose why. Was it the AI response? The link? The rephrase loop? There's no step-level failure analysis — just a black box.
The bot filters out easy work and concentrates complexity on the people least equipped to handle it efficiently. Multi-step cases all escalate to senior staff.
A refund for damaged goods. Three agents. Three answers. Same scenario. Zero consistency. No audit trail.
Supporting diverse industries requires extensive custom flows. A non-technical support ops manager shouldn't need an engineer every time a policy changes.
Teams react to failures after they surface. There's no tooling to detect emerging patterns or failure modes early.
Reviewed anonymized tickets and escalation logs. Traced how conversations flowed: chatbot to helpdesk to resolution (or not). Mapped where context was gained, lost, distorted.
Analyzed public support interactions from SaaS and services companies. Focus: cases that started in chatbots and ended with human agents. Looking for the exact moment the system failed.
Interviews with support leads and senior agents. Questions focused on how they use bots and helpdesks, and how they interpret policy under pressure. Key finding: all 6 felt the tools helped them log — not think or decide.
Deep study of Intercom, Zendesk, Help Scout, and emerging agentic platforms. Focus: where AI ends and humans begin, whether AI is a surface feature or embedded in the core workflow.
35–40%
Agents re-asked for information the bot had already collected. Why? The transcript was a wall of text. Faster to ask again than read 40 lines. The interface actively encouraged terrible customer service.
25–30%
Showed clear policy ambiguity. Bot said one thing. Human did another. Decision documented only as "customer was upset, issued refund." Free text is the graveyard of data analysis.
6/6
"The hard, messy cases still land on my senior team, and the tools don't help us think or decide — just log and respond." Every single one.
Why weren't teams automating more than 30–40% of tickets if LLMs can technically answer 80% of questions?
It's not stubbornness. It's not job preservation. It's risk calculus. Support teams capped automation at 40% because they could not rationally trust legacy bots to handle high-stakes decisions without guardrails.
The design problem wasn't "make AI smarter." It was "make AI trustworthy enough to be given more."
Bot mishandles a password reset
Customer annoyed.
Bot mishandles a clinical authorization
Company sued.
The temptation was to bolt a smarter AI onto the existing infrastructure.
But the infrastructure was the problem.
"Put a Ferrari engine in a golf cart. The moment you hit the gas, the frame tears itself apart."
Engineering analogy — internal design review
The trap
An LLM generates conversational output at lightning speed. But if the surrounding infrastructure — data pipelines, policy guardrails, routing logic — is built for rigid rules-based interactions, the LLM will confidently generate policy violations at scale. It will hallucinate a refund policy and promise a customer $1,000 in ten seconds.
The real question
Every prior helpdesk was designed to help agents close tickets faster. But speed isn't the constraint. Trust is. The real design problem was: how do you build an interface that makes high-stakes AI decisions safe enough to actually act on?
The shift
We stopped designing for chat throughput and started designing for decision confidence. That changed everything — context architecture, policy retrieval, approval flows, audit trails. The interface had to earn trust before it could earn autonomy.
Before
Optimizing Conversations
After
Optimizing Decisions
01
Bot and human conversations must exist in one system of record. No more baton thrown into the crowd.
02
Policies, context, and decision-making must be applied uniformly — not dependent on which agent picks up the ticket.
03
As case volume grows, the system must help teams handle complexity with confidence — not just speed.
Before any wireframe, my co-designer and I spent weeks mapping the existing support system. Not the ideal flow. The real one. How conversations entered. Where context broke. How decisions were made across teams and tools.
During beta, there was zero user telemetry data. No click tracking. No heatmaps. So we did manual archaeology of 200+ raw chat transcripts — reading timestamps to see context loss happening in real time. We didn't guess the information architecture. We reverse-engineered it from the failures of the past.
01
Audit tickets. Map failure modes.
02
Map information flows and trust gaps.
03
Information hierarchy. Decision scaffolding.
04
Validate with edge cases and real agents.
05
Measure confidence. Refine and ship.
Every escalated case arrived in Zendesk as a wall of plain text. The first thing every agent typed was: "Can I get your order number?"
We deliberately violated the industry norm. Intercom centers the chat thread. We rejected that hierarchy — centering case metadata, customer history, and AI decision tools instead. The interface literally enforces the behavior shift: agents read the customer's lifetime value before they read the complaint.
Trade-off → We limited initial integrations to WhatsApp, Email, and Messenger. 80% of volume. Owned those three flawlessly before anything else.
Policy documents lived on a corporate intranet nobody read. Decisions were made from memory. When memory failed, agents guessed.
Ask Copilot uses RAG to search the company's proprietary SOP database, retrieve the exact governing paragraph, and force the LLM to reason exclusively on that retrieved policy — not general knowledge.
Trade-off → Sources last updated >90 days are flagged. Low-confidence responses show a "Verify before sending" warning — never presented as ready-to-send.
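To make the grounding mechanism concrete, here is a minimal TypeScript sketch of the guardrails described above. Every name in it (PolicySource, groundedPrompt, applyGuardrails) is hypothetical, and the 0.7 low-confidence threshold is an assumption; only the 90-day staleness window comes from the actual design. Retrieval and model calls are out of scope; the sketch shows how citation, staleness, and the "Verify before sending" state fit together.

```typescript
// Hypothetical sketch, not the production API.
interface PolicySource {
  id: string;
  article: string;     // e.g. "Refund Policy, Article 4.2"
  text: string;        // the exact governing paragraph
  lastUpdated: Date;
}

interface GroundedAnswer {
  draft: string;
  citations: string[];
  confidence: number;           // 0..1, from the model or a calibration layer
  staleSource: boolean;         // source older than 90 days
  verifyBeforeSending: boolean;
}

const STALE_AFTER_DAYS = 90; // from the design: flag sources updated >90 days ago
const LOW_CONFIDENCE = 0.7;  // assumed threshold, illustrative only

function daysSince(d: Date): number {
  return (Date.now() - d.getTime()) / (1000 * 60 * 60 * 24);
}

// Constrain the model to the retrieved paragraph: the prompt forbids
// answering from general knowledge, which is the core of the design.
function groundedPrompt(question: string, source: PolicySource): string {
  return [
    "Answer using ONLY the policy text below. If it does not cover the",
    "question, say so and escalate. Cite the article in the answer.",
    `--- ${source.article} ---`,
    source.text,
    "--- Question ---",
    question,
  ].join("\n");
}

// Decide how the draft is presented to the agent. A low-confidence or
// stale-source answer is never shown as ready-to-send.
function applyGuardrails(
  draft: string,
  confidence: number,
  source: PolicySource
): GroundedAnswer {
  const staleSource = daysSince(source.lastUpdated) > STALE_AFTER_DAYS;
  return {
    draft,
    citations: [source.article],
    confidence,
    staleSource,
    verifyBeforeSending: confidence < LOW_CONFIDENCE || staleSource,
  };
}
```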
Teams couldn't trust AI with high-stakes decisions. The trust ceiling sat at 40% automation — not because AI wasn't capable, but because there was no human checkpoint for the cases that mattered.
The design decision → Controlled autonomy. Not full automation. Not human-only. A precise boundary: AI handles what it can do safely alone. Anything with policy risk, financial impact, or ambiguity goes through an Approval Required checkpoint.
AI handles automatically
Approval Required checkpoint
Route to Engineering + Open $500 bounty
Policy 4.2 · Matched 3 similar cases
83% confidence
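A rough sketch of how that boundary might be encoded follows. The risk tiers, threshold values, and field names are all assumptions; in the product, thresholds are configurable per organization.

```typescript
// Illustrative routing logic for "controlled autonomy". Not the real system.
type Route = "auto" | "approval_required" | "engineering_escalation";

interface CaseAssessment {
  policyRisk: boolean;     // touches an ambiguous or regulated policy
  financialImpact: number; // dollars at stake
  confidence: number;      // 0..1 model confidence
}

interface OrgThresholds {
  maxAutoDollars: number;    // e.g. 50
  minAutoConfidence: number; // e.g. 0.9
}

function routeCase(c: CaseAssessment, t: OrgThresholds): Route {
  // Cases the system cannot classify confidently go to engineering
  // (the "open a bounty" path above).
  if (c.confidence < 0.5) return "engineering_escalation";

  // Safe-alone work: no policy risk, trivial financial impact, high confidence.
  if (
    !c.policyRisk &&
    c.financialImpact <= t.maxAutoDollars &&
    c.confidence >= t.minAutoConfidence
  ) {
    return "auto";
  }

  // Everything else passes through the human checkpoint.
  return "approval_required";
}
```

Note the default: the human checkpoint is the fallback route. Automation is the exception a case has to qualify for, not the other way around.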
The philosophy
This is aviation autopilot design applied to enterprise software. The system flies the plane efficiently, but visibly alerts the pilot when it encounters conditions it isn't programmed to navigate.
What if a human approves something incorrectly?
Approval creates an auditable trail: who approved, when, against which policy version. Anomalies are flagged in analytics.
What if AI is wrong about risk classification?
Agents can manually escalate any case. Risk thresholds are configurable per organization.
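The audit record itself can be sketched in a few lines. Field names here are hypothetical; what the design guarantees is that every approval binds a human identity to a specific policy version and model version, and that the log is append-only.

```typescript
// Minimal sketch of an immutable audit trail. Illustrative names only.
interface AuditRecord {
  caseId: string;
  action: string;        // e.g. "refund_issued"
  policyId: string;
  policyVersion: string; // the exact version the decision was made against
  modelVersion: string;  // which AI model produced the recommendation
  approvedBy: string;    // human agent id
  approvedAt: string;    // ISO timestamp
  aiConfidence: number;
}

// Append-only: approving never mutates history, so anomalies can be
// flagged in analytics after the fact instead of blocked inline.
function recordApproval(
  log: ReadonlyArray<AuditRecord>,
  record: AuditRecord
): ReadonlyArray<AuditRecord> {
  return [...log, record];
}
```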
A quality gate before AI ever touches a live customer. Before deployment, supervisors evaluate AI answers. If poor, the system identifies which content source caused the failure, generates recommendations, and the supervisor re-tests.
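As a sketch of how that failure attribution could work (the EvalCase shape and names are assumptions): each supervisor grade carries the content source that grounded the answer, so poor answers roll up per source and point at the document to fix.

```typescript
// Hypothetical pre-deployment quality gate: attribute failures to sources.
interface EvalCase {
  question: string;
  aiAnswer: string;
  sourceId: string;                 // content source that grounded the answer
  supervisorGrade: "good" | "poor";
}

// Count poor answers per content source so the supervisor knows
// exactly which document is causing failures and can re-test it.
function failingSourceReport(cases: EvalCase[]): Map<string, number> {
  const failures = new Map<string, number>();
  for (const c of cases) {
    if (c.supervisorGrade === "poor") {
      failures.set(c.sourceId, (failures.get(c.sourceId) ?? 0) + 1);
    }
  }
  return failures;
}
```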
Phase 01
Case Understanding
Phase 02
Risk Assessment
Phase 03
Policy Grounding (RAG)
Phase 04
Recommendation Generation
Phase 05
Human Control Gate
Phase 06
Action Execution
Phase 07
Feedback Collection
Phase 08
Continuous Learning
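Read as code, the eight phases form an ordered pipeline with exactly one halting point. The phase names below come from this case study; the structure itself is an illustration, not the production system.

```typescript
// The pipeline as an ordered type. Only the human control gate can halt it.
type Phase =
  | "case_understanding"
  | "risk_assessment"
  | "policy_grounding"    // RAG retrieval against the SOP database
  | "recommendation"
  | "human_control_gate"  // approval checkpoint for risky actions
  | "action_execution"
  | "feedback_collection"
  | "continuous_learning";

const PIPELINE: Phase[] = [
  "case_understanding",
  "risk_assessment",
  "policy_grounding",
  "recommendation",
  "human_control_gate",
  "action_execution",
  "feedback_collection",
  "continuous_learning",
];

// Execution never proceeds past the gate without an explicit human
// decision; continuous learning feeds back into the next case.
function nextPhase(current: Phase, approved: boolean): Phase | "halted" {
  if (current === "human_control_gate" && !approved) return "halted";
  const i = PIPELINE.indexOf(current);
  return i < PIPELINE.length - 1 ? PIPELINE[i + 1] : PIPELINE[0];
}
```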
01
An AI that gives an answer is a black box. An AI that gives an answer with sources is a tool. The difference between a black box and a tool is the citation. We made citations non-negotiable.
02
We designed confidence scores as first-class UI. Not buried in a tooltip. Visible on the action card. An agent seeing "83% confidence" makes a fundamentally different decision than an agent seeing "Answer generated."
03
Every AI-generated state has a human override path. No dead ends. No locked decisions. The AI proposes. The human disposes. Always.
04
We spent more time designing the failure states than the success states. What does a low-confidence response look like? What happens when the policy source is stale? Designing for failure first means success states feel obvious by comparison.
05
Every interaction is training data. So we designed feedback to be frictionless: one click for good/poor, optional text field, shown immediately after resolution — not in a separate review flow.
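A sketch of that one-click capture, with hypothetical field names:

```typescript
// Frictionless feedback: one click, optional note, captured inline.
interface ResolutionFeedback {
  caseId: string;
  rating: "good" | "poor"; // the one-click signal
  note?: string;           // optional free text
  submittedAt: string;     // ISO timestamp
}

// Shown immediately after resolution, not in a separate review flow.
// Every record becomes training data for the continuous-learning phase.
function captureFeedback(
  caseId: string,
  rating: "good" | "poor",
  note?: string
): ResolutionFeedback {
  return { caseId, rating, note, submittedAt: new Date().toISOString() };
}
```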
Sarah, the customer with the broken oak table, messages on WhatsApp: "Table arrived broken. Refund?" — same message, new system.
Agent View — Case #4521
AI Automation Summary
Classified as damaged goods, medium risk, policy retrieved. Customer history reviewed. Premium tier confirmed.
Issue full refund per policy 4.2 + 15% voucher for Premium customers
Company Refund Policy, Article 4.2 · Freight Damage
91% confidence
Total agent time: 90 seconds ✓
Resolution time: was 15–20 min, now 2–5 min. AI gathered context before the agent opened the case.
Policy accuracy: was 65–85%, now 95%+. Grounded in cited policy, verified by a human.
Automation trust: was 30–40%, now 60%. Teams could see exactly when and why AI asked for help.
Frontline Agents
No longer data gatherers frantically copy-pasting between five browser tabs. Reduced cognitive load. Reduced burnout from repetitive low-value work.
Support Leads & Managers
Structured, consistent, auditable data on exactly how decisions are being made. If a policy is failing, they can see exactly which phase of the AI pipeline is misinterpreting the rule and fix it globally in minutes.
Audit, Finance & Compliance
Every refund, account change, clinical authorization leaves an immutable audit trail: which policy, which AI model version, which human approved it.
TechnoWizards repositioned entirely. No longer viewed as a chatbot vendor that plugs into Zendesk. Became a primary AI-native customer service suite. Moved significantly upmarket — capable of displacing the very legacy systems they used to merely augment.
Start with the failure states
We spent too much time designing happy paths early. The most valuable design decisions came from asking "what happens when this goes wrong?" I'd flip the order next time.
Involve compliance teams earlier
We brought in the compliance perspective late. Their requirements reshaped the audit trail design significantly. Earlier involvement would have saved two iteration cycles.
Build the feedback loop UI in Phase 1
We deferred agent feedback UI as "Phase 2." But without it, we were flying blind on AI quality in the first weeks of deployment. It should have been MVP.
Design the analytics dashboard last
We had instincts about what metrics to track. We should have waited until we saw real usage patterns. Vanity metrics crept in early and had to be pruned.
Learning 01
AI design is about trust, not intelligence
The underlying model was sophisticated. What made the product work was the infrastructure of trust around it — citations, confidence scores, audit trails, human gates.
Learning 02
System thinking before screen thinking
The most impactful design decisions happened before Figma was opened. The system map defined the product. Figma documented it.
Learning 03
Design for decisions, not just interfaces
Every screen exists to help someone make a better decision faster. If a screen isn't helping someone decide something, it's wasting space.
Learning 04
Constraints are the design
No telemetry. No user requirements. Eight months. Each constraint forced a creative solution that led to a better outcome than a comfortable process would have produced.
Visit → Benchmark platform — Fin AI resolves 87% of routine queries. Studied their helpdesk architecture, Copilot design, and handoff patterns.
Visit → Legacy helpdesk system that TechnoWizards clients were using before Wiz AI. Used as competitive baseline for workflow and UI patterns.
Read → Original case study documentation — the full written record of the design process, research findings, and design decisions.
The next time you interact with a piece of software, look past the interface and ask: is this system optimizing the conversation — or the decision?