Research Co-Pilot — YOLANI OLIVAS

Making research too easy to skip

Talking to users kept losing out to deadlines, so the whole team was designing on guesswork. I built a chained AI system that closed that gap in minutes instead of days, and I built it in a shape that can grow into something autonomous, not a script that dead-ends the moment it ships.

At a glance

Two systems, not one. A 5-workflow kit anyone on the team could trust from day one, and a true autonomous agent, built in Claude's agent tooling, that runs the same pipeline end to end from a single trigger
Hours of setup became minutes, removing the main reason research got skipped
Built for a ~50-designer team, earned one person at a time. I had no authority to mandate it, so I converted 5 to 6 openly hesitant designers into regular users myself, on their own real work
One agent, three verticals. Credit Cards, Personal Loans, and Tax run on the same system, with vertical context swapped in as configuration instead of the agent being rebuilt per team

Context

Before a team builds or changes a product, the smartest thing it can do is talk to the people who will use it. That is user research: showing your ideas to real people, watching where they struggle, and listening to what they actually need.

Everyone agrees it matters. Almost everyone skips it anyway. Not because designers do not care, but because getting a study off the ground is slow, manual work. You have to write a plan, write the questions that find the right participants, script the interview, run the sessions, and then turn hours of messy notes and transcripts into something a team can act on. When a deadline is breathing down your neck, that overhead is exactly what gets cut.

And when research gets cut, teams guess. Decisions get slower, debates get louder, and products get built on opinion instead of evidence.

My role

Nobody asked for this project. It came from watching talented designers skip research over and over, for the same fixable reason, and deciding that the bottleneck itself was a design problem.

I designed and built the system, ran the research behind it, and led the rollout: training, support, and the push to make it a habit rather than a tool that dies in a folder.

The problem

The bottleneck was never the conversations with users. Those take an hour. The bottleneck was everything wrapped around them: the setup before and the synthesis after.

That insight reframed the whole project. The goal was not "teach designers to do more research" or "hire more researchers." It was to collapse the overhead so far that running a study becomes easier than guessing, and to do it in a shape built to keep evolving rather than as a one-off hack that gets rebuilt from scratch the next time the tooling changes.

Treat the team like users

The first move was to research the researchers. Short interviews with designers across the team asked where studies actually break down: what they avoid, what they procrastinate on, what feels hardest to start, and how they really write their plans and questions today. The output was a map of the top pain points and a clear picture of the gap between how research should work and how it actually works under deadline pressure. That mattered for a simple reason: the system that came next solved problems the team had named, not problems I assumed they had.

The answer was a kit of five AI workflows, one for each stage where studies stall, chained so each stage's output becomes the next stage's input. A plan feeds the screener. The screener and the goal feed the interview script. Transcripts feed synthesis. Insights feed the readout. Think of each one as a smart template: you bring your half-formed idea, it brings the structure, and it hands its output to the next step instead of dead-ending.
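To make the chaining concrete, here is a minimal sketch of how each stage's output could be passed to the next. It is an illustration under assumptions, not the kit itself: `StudyArtifacts`, `run_model`, `prepare_study`, and `wrap_up_study` are hypothetical names, and `run_model` is a stand-in for the AI call that only reports which inputs a stage received.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StudyArtifacts:
    """Everything one study accumulates as it moves through the stages."""
    goal: str
    plan: str = ""
    screener: str = ""
    script: str = ""
    insights: str = ""
    readout: str = ""


def run_model(stage: str, context: dict[str, str]) -> str:
    # Hypothetical placeholder for the AI call. It returns a label listing
    # the inputs the stage was given, so the data flow is easy to inspect.
    return f"{stage}({', '.join(sorted(context))})"


def prepare_study(goal: str) -> StudyArtifacts:
    """Stages that run before any session: plan, screener, interview script."""
    a = StudyArtifacts(goal=goal)
    a.plan = run_model("plan", {"goal": goal})
    a.screener = run_model("screener", {"plan": a.plan})
    a.script = run_model("script", {"screener": a.screener, "goal": goal})
    return a


def wrap_up_study(a: StudyArtifacts, transcripts: str) -> StudyArtifacts:
    """Stages that run after the sessions: synthesis, then the readout."""
    a.insights = run_model("insights", {"transcripts": transcripts})
    a.readout = run_model("readout", {"insights": a.insights})
    return a
```

The split into two functions mirrors the text: the sessions themselves sit between setup and synthesis, and they stay with a person.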

The system: a pipeline, not five prompts

Test Plan Generator. Turns a vague "I want to test this" into a structured study plan in minutes.
Screener Generator. Writes the questions that find the right participants, using the plan as context.
Interview Script Creator. Drafts a discussion guide tuned to what you are trying to learn.
Synthesis Engine. Takes hours of messy notes and transcripts and pulls out the patterns and insights.
Readout Formatter. Turns those insights into a clean summary a team can act on.

Each workflow shipped with the same four things: when to use it, what to feed it, the exact prompt to copy, and an example of what good output looks like. No prompt-engineering skills required. Plug and play for a busy designer.
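Those four parts can be pictured as one small record per workflow. The sketch below is an assumption about shape only: `WorkflowCard`, its field names, and the prompt wording are hypothetical, and the real prompts are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowCard:
    """One workflow as handed to a designer: the four things listed above."""
    name: str
    when_to_use: str
    what_to_feed: str
    prompt: str          # the exact text to copy, with {placeholders}
    example_output: str  # what good output looks like

    def fill(self, **inputs: str) -> str:
        # Substitute the designer's own input into the copy-ready prompt.
        return self.prompt.format(**inputs)


# Hypothetical example card; the wording is illustrative only.
card = WorkflowCard(
    name="Test Plan Generator",
    when_to_use="You have an idea to test but no study plan yet.",
    what_to_feed="One or two sentences describing what you want to learn.",
    prompt="Act as a UX researcher. Draft a study plan for: {idea}",
    example_output="(a sample plan would go here)",
)
```

Calling `card.fill(idea="new checkout flow")` returns the prompt with the idea substituted in, which is all a busy designer has to do.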

That structure (a defined role, a structured input, a predictable output) wasn't an accident. It's the same shape you need before you can responsibly hand a stage off to an autonomous agent instead of a person clicking "generate." More on that below.

The part most tools skip

Most internal tools die quietly, because building the thing is the easy half. The hard half is adoption, so I designed the rollout with the same care as the workflows.

A hands-on training session, built around the team's own named pain points, with live demos and time to use the workflows on real current work, not toy examples.
Office hours every two weeks. Bring your messy research problem, leave with something usable.
A help channel and a prompt of the week, keeping momentum and surfacing wins.
Tracking time saved and collecting success stories, with a path toward team-approved standard workflows so the system outlives any one person, including me.

Where this is headed: from workflows to agents

I built this as five human-triggered workflows, not one autonomous agent, on purpose. Right now, a person still decides what's worth testing, still clicks "generate" at each stage, and still makes the call on what an insight means. That's deliberate: choosing what to test and deciding what an insight means are the two places judgment actually matters, and everything in between was pure overhead worth automating.

But the reason this holds up as more than a clever prompt library is that it was architected for what comes next. Every stage already has a defined role, a structured input, and a predictable output contract, which is exactly the prerequisite for handing that stage to an autonomous agent instead of a human. The natural next step, and the one I'd bring into a conversation about where this goes: an orchestrator that takes a research question, runs the plan, screener, and script stages on its own, checkpoints with a human before anything reaches real participants, then runs synthesis and the readout without anyone touching a prompt. What I shipped is that system with the training wheels on, safe to trust with a whole team on day one, and built so removing the training wheels later is an extension, not a rebuild.
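A minimal sketch of that orchestrator, under stated assumptions: `orchestrate`, `stub_stage`, `approve`, and `get_transcripts` are hypothetical names, the stage runner is a stub in place of a real model call, and the checkpoint is modeled as a callback that a human reviewer answers.

```python
from typing import Callable, Dict


def stub_stage(stage: str, context: Dict[str, str]) -> str:
    # Hypothetical stand-in for the model call behind each stage.
    return f"<{stage}>"


def orchestrate(
    question: str,
    approve: Callable[[Dict[str, str]], bool],
    get_transcripts: Callable[[], str],
    run: Callable[[str, Dict[str, str]], str] = stub_stage,
) -> Dict[str, str]:
    """Run setup stages, pause for human approval, then run the rest."""
    state: Dict[str, str] = {"question": question}

    # Setup stages run on their own, each seeing everything produced so far.
    for stage in ("plan", "screener", "script"):
        state[stage] = run(stage, dict(state))

    # Human checkpoint: nothing reaches real participants without sign-off.
    if not approve(dict(state)):
        state["status"] = "stopped_at_checkpoint"
        return state

    # Sessions happen outside the orchestrator; it only receives transcripts.
    state["transcripts"] = get_transcripts()

    for stage in ("synthesis", "readout"):
        state[stage] = run(stage, dict(state))

    state["status"] = "complete"
    return state
```

If the reviewer declines at the checkpoint, the run stops with the plan, screener, and script intact, so a person can revise them and start again without losing the setup work.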

That's the part of this project I'd point to as evidence that I'm designing for where AI product work is actually going, toward agentic systems that chain judgment and automation deliberately, and not just for what's comfortable to build today.

The impact

The change this drives is bigger than saved hours. When starting a study takes minutes instead of days, the calculation flips: it becomes easier to check than to guess. More decisions get made with evidence. Fewer debates run on opinion. And the team's designers spend their time on the part that actually requires judgment, the conversations and the decisions, instead of the paperwork around them.

The same system that made that true today is the substrate for a fully agentic version tomorrow, which is the part that compounds.