The AI Operator's Field Guide
Build Your Own Token Dashboard From Scratch
Most people measure AI adoption by vibes. "The team's really using it now." "We've gone all-in on Claude." Ask for a number and the room goes quiet.
A token dashboard is the cure. It turns a vague feeling into a line you can read: where the work is actually happening, where it stalled, and who on your team is quietly becoming the person you'll want leading your next rollout.
Here's how to build one yourself in an afternoon, plus the five things to watch once it's live. The exact prompt and the full build video are linked at the end.
1. Start from a kit, not a blank page
The blank page is where most builds die. You open an editor, stare at an empty file, and the project quietly slides to "next week."
Don't start there. Start from a ready-made kit that matches your stack:
- If you live in Codex, begin from a small repo scaffold: a usage-log file, a parser, and a chart component. Let Codex extend it rather than invent it.
- If you live in Claude, describe the dashboard you want and let it generate a single self-contained file you can run immediately, then iterate in the same thread.
- If you live in ChatGPT, start from a one-file HTML or notebook template and grow it section by section.
- If you bounce between all of them, pick whichever tool owns your usage data first, build the skeleton there, and only then port it.
The point isn't the tool. It's that you're editing something that already runs instead of summoning something from nothing. A kit that does 20% of the job removes 80% of the resistance.
2. Assistant work vs. computer work, and why being stuck has nothing to do with the model
This is the distinction that separates people who get value from AI from people who keep saying "it's not that good yet."
Assistant work is conversation. You ask, it answers, you copy the answer somewhere. Drafting, brainstorming, explaining. The output is text you then act on.
Computer work is execution. The AI reads your files, runs the code, builds the thing, checks its own output, and hands you something that already works. The output is the result itself.
A token dashboard is computer work. If you try to build it through assistant work (pasting snippets back and forth, fixing errors by hand), you'll conclude the model is weak. It isn't. You're just on the wrong side of the line.
The fix is almost never a smarter model. It's moving the task to a tool that can touch your environment: an agentic coding setup, a code-execution mode, a build canvas. When people say "the AI couldn't do it," they were usually doing computer work in an assistant's chat box.
Rule of thumb: if the task ends in a file, a run, or a deployment, it's computer work. Use a tool built for that.
3. The build β step by step
- Find your usage data. Most tools expose token or usage logs via an export, a billing/usage page, or an API endpoint. Grab a small sample first; you need the shape of the data, not all of it.
- Define the rows. At minimum: timestamp, tool, task label, tokens in, tokens out. If you can tag a task ("client report," "code review," "research"), do it. That tag is where the real insight lives later.
- Build the skeleton. Ask your tool to produce a single runnable dashboard that reads your sample and renders one line chart of tokens over time. Get something on screen before you make it pretty.
- Add the three core views: tokens over time, tokens by tool, and tokens by task type. These answer "when," "where," and "on what."
- Add a weekly rollup. A small table totalling the last seven days against the week before. This is the view you'll actually open every Monday.
- Wire it to live data. Only now point it at the real log or endpoint. Doing this last means you debug the chart and the connection separately, not both at once.
- Ship it ugly, then refine. A working ugly dashboard beats a beautiful one that never launched.
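The first few steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a finished dashboard: the sample rows, the `SAMPLE_CSV` name, and the helpers `to_int`, `load_records`, and `totals_by` are all hypothetical, and a real export from your tool will almost certainly use different column names. It assumes every row has an ISO-format timestamp; missing token counts and labels fall back to defaults.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in the row shape described above; a real export will differ.
SAMPLE_CSV = """timestamp,tool,task_label,tokens_in,tokens_out
2024-05-01T09:15:00,claude,client report,1200,800
2024-05-01T14:40:00,codex,code review,3000,1500
2024-05-02T10:05:00,claude,research,500,
2024-05-03T16:20:00,chatgpt,,700,300
"""

def to_int(value):
    """Treat a missing or non-numeric count as 0 rather than failing."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0

def load_records(text):
    """Parse CSV text into dicts holding a day, a tool, a task label, and a token total."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "day": datetime.fromisoformat(row["timestamp"]).date(),
            "tool": row.get("tool") or "unknown",
            "task_label": row.get("task_label") or "untagged",
            "tokens": to_int(row.get("tokens_in")) + to_int(row.get("tokens_out")),
        })
    return records

def totals_by(records, key):
    """Sum tokens grouped by one field: 'day', 'tool', or 'task_label'."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    return dict(totals)

records = load_records(SAMPLE_CSV)
by_day = totals_by(records, "day")          # the "when" view
by_tool = totals_by(records, "tool")        # the "where" view
by_task = totals_by(records, "task_label")  # the "on what" view
```

Each of the three dictionaries maps straight onto one of the core views, so whatever charting layer you pick only has to plot keys against values.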
4. The prompt
Adapt this to your stack and your data: swap in your own field names, chart choices, and constraints.
Starter Prompt
Build a single, runnable token-usage dashboard.
Input: a file of usage records with the fields timestamp, tool, task_label, tokens_in, tokens_out. I'll attach a sample.
Show me three charts: (1) total tokens per day over time, (2) tokens grouped by tool, (3) tokens grouped by task_label.
Add a weekly summary table: last 7 days vs. the previous 7 days, with the percentage change.
Constraints: one self-contained file I can run immediately, no external services, sensible defaults if a field is missing. Start from a working version with my sample data, then we'll iterate.
The magic line is the last one. "Start from a working version, then we'll iterate" keeps you out of the blank-page trap and turns the build into a conversation about improvements rather than a single make-or-break request.
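If you want to sanity-check whatever the tool generates, the weekly summary the prompt asks for is simple enough to compute by hand. The sketch below is one plausible reading of "last 7 days vs. the previous 7 days"; the function name `weekly_summary`, the dictionary keys, and the sample totals are hypothetical, and it assumes you already have token totals keyed by date.

```python
from datetime import date, timedelta

def weekly_summary(daily_totals, today):
    """Compare the 7 days ending on `today` with the 7 days before them."""
    def window_total(end, days=7):
        # Days with no usage are simply absent from the dict and count as 0.
        return sum(daily_totals.get(end - timedelta(days=i), 0) for i in range(days))

    this_week = window_total(today)
    last_week = window_total(today - timedelta(days=7))
    # Percentage change is undefined when the earlier week is zero.
    change = None if last_week == 0 else (this_week - last_week) / last_week * 100
    return {"last_7_days": this_week, "previous_7_days": last_week, "pct_change": change}

# Hypothetical daily totals, keyed by date.
daily = {
    date(2024, 5, 3): 4000,
    date(2024, 5, 5): 6000,
    date(2024, 5, 13): 9000,
    date(2024, 5, 14): 3500,
}
summary = weekly_summary(daily, today=date(2024, 5, 14))
# summary["pct_change"] is 25.0: 12,500 tokens this week against 10,000 the week before.
```

Returning `None` instead of dividing by zero is a design choice for the first week of data, when there is nothing to compare against.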
5. Five rules for reading your chart
- Read the trend, not the day. One huge day means nothing. A line that climbs for three weeks means a habit is forming. A line that flattens means it didn't.
- A quiet stretch is more dangerous than a spike. A spike means someone wrestled with something hard, and that's engagement. A quiet stretch usually means people quietly reverted to the old way of working. The drop-off is silent, which is exactly why it's the thing to watch.
- Watch the task mix, not just the total. Rising tokens on the same three tasks is depth. Tokens spreading across new task types is adoption. They look identical on a total-only chart and mean completely different things.
- Spikes are clues, not problems. When usage jumps, go find out what the work was. Your best spikes are tomorrow's templates.
- Falling tokens on a task can be a win. If a workflow gets templated and reused, tokens per outcome should drop. Efficiency looks like a decline, so don't punish it.
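Two of these rules are easy to make concrete. The sketch below smooths a single big day into a 7-day average (reading the trend, not the day) and counts which task labels appear in each week (watching the mix, not just the total). The helper names, the 7-day window, and the sample records are assumptions for illustration; nothing here is a standard metric.

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_average(daily_totals, end, window=7):
    """Mean tokens per day over the `window` days ending on `end`; missing days count as 0."""
    total = sum(daily_totals.get(end - timedelta(days=i), 0) for i in range(window))
    return total / window

def task_mix_by_week(records):
    """Map each ISO (year, week) to the set of task labels that appeared in it."""
    mix = defaultdict(set)
    for day, task_label, _tokens in records:
        year, week, _weekday = day.isocalendar()
        mix[(year, week)].add(task_label)
    return dict(mix)

# One huge day on its own barely moves the weekly average.
one_big_day = {date(2024, 5, 14): 14000}
average = rolling_average(one_big_day, end=date(2024, 5, 14))

# Hypothetical (day, task_label, tokens) records across two weeks.
records = [
    (date(2024, 5, 6), "code review", 3000),
    (date(2024, 5, 7), "code review", 3500),
    (date(2024, 5, 13), "code review", 4000),
    (date(2024, 5, 14), "research", 1200),
    (date(2024, 5, 15), "client report", 900),
]
mix = task_mix_by_week(records)
labels_per_week = {week: len(labels) for week, labels in mix.items()}
```

In this made-up sample the label count goes from one to three between weeks, which is the "spreading across new task types" signal a total-only chart would hide.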
6. The fifteen-minute weekly review
Block fifteen minutes every Friday. This is where a dashboard stops being a toy and starts compounding.
- Minutes 1-5: Find the wins. Open this week's spikes. Which one-off runs actually worked? Which produced something you'd want again?
- Minutes 5-12: Promote them. Take your two best runs and turn them into reusable workflows: a saved prompt, a template file, a documented recipe. The goal is to never rebuild that thing from scratch again.
- Minutes 12-15: Note the quiet. Where did usage go silent that shouldn't have? Write one line about why, and one thing to try next week.
Most people's AI usage is a graveyard of brilliant one-off runs they never used twice. This fifteen minutes is what converts those runs into a workflow library. Do it for a quarter and your dashboard stops measuring effort and starts measuring leverage.
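If you'd rather not eyeball the chart every Friday, the two things the review looks for can be pulled out automatically. This is a rough sketch under stated assumptions: a "spike" is defined here as any day above twice the median day, which is an arbitrary threshold you should tune, and "quiet" means a task label that had usage last week and none this week. The function names and sample numbers are hypothetical.

```python
from statistics import median

def find_spikes(daily_totals, factor=2.0):
    """Return days whose total exceeds `factor` times the median day, largest first."""
    if not daily_totals:
        return []
    baseline = median(daily_totals.values())
    spikes = [day for day, total in daily_totals.items() if total > factor * baseline]
    return sorted(spikes, key=lambda day: daily_totals[day], reverse=True)

def gone_quiet(last_week_tasks, this_week_tasks):
    """Task labels that had usage last week and none this week, in alphabetical order."""
    return sorted(set(last_week_tasks) - set(this_week_tasks))

# Hypothetical week of daily totals.
week = {"Mon": 2000, "Tue": 2200, "Wed": 9000, "Thu": 1800, "Fri": 2100}
spikes = find_spikes(week)

quiet = gone_quiet(
    last_week_tasks=["research", "code review", "client report"],
    this_week_tasks=["code review"],
)
```

The spike list tells you which runs to open in the first five minutes; the quiet list is your starting point for the last three.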
7. Why ranking your team by token volume backfires
The moment you put token counts on a leaderboard, you've created the wrong incentive. People optimise for the number you display, and token volume is a terrible number to optimise.
High volume can mean someone is doing hard, valuable work. It can equally mean they're inefficient, looping on the same task, or burning context they never needed. You cannot tell the difference from the total. Rank by it and you reward the loudest user, not the best one, and you punish the person who just templated a workflow so it runs in a fraction of the tokens.
The record that actually shows who can lead an AI rollout
It isn't who consumed the most. It's who converted runs into reusable workflows that others adopted. Look for:
- Prompts and templates other people on the team started using.
- Tasks where tokens-per-outcome fell over time: proof they made something efficient.
- The person colleagues quietly ask "how did you do that?"
That person isn't topping your volume chart. They're building the rails everyone else will run on. Measure that, and your dashboard tells you who to put in charge.
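Tokens-per-outcome is the one signal here that needs data the dashboard doesn't have by default: a count of outcomes. The sketch below assumes you log that count yourself (reports delivered, reviews completed, whatever "done" means for the task); the `outcomes` figure, the function names, and the four-week sample are all hypothetical.

```python
def tokens_per_outcome(weekly):
    """Given (week, tokens, outcomes) rows, return (week, tokens per outcome) pairs.

    Weeks with zero outcomes are skipped, since the ratio is undefined for them.
    """
    return [(week, tokens / outcomes) for week, tokens, outcomes in weekly if outcomes > 0]

def is_falling(series):
    """True when every value is lower than the one before it."""
    values = [value for _week, value in series]
    return len(values) > 1 and all(b < a for a, b in zip(values, values[1:]))

# Hypothetical: one person's "client report" workflow over four weeks.
weekly = [("W1", 40000, 4), ("W2", 36000, 4), ("W3", 30000, 5), ("W4", 24000, 6)]
series = tokens_per_outcome(weekly)
improving = is_falling(series)
```

In this made-up sample the person's total tokens drop every week while their output rises, so a volume leaderboard would rank them lower each week for getting better.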