Evaluate the quality of an AI-generated output using the seven-dimension rubric — scored by you.
Most people using AI have a sense of when something feels wrong with what they got back. Fewer have a reliable way of knowing why it feels off, or of explaining it to someone else.
This tool gives you a structure for that. It walks you through seven dimensions of quality, asks you to score the output you're looking at, and produces a record of your evaluation. No specialist knowledge required. The framework does the framing while you bring the judgement.
The evaluation covers two things: the quality of what the AI produced, and the quality of how you engaged with it to get there. Both matter.
You'll need two things. First, the AI-generated output you want to evaluate — a piece of writing, a response to a question, a structured document, or anything else produced by an AI tool.
Second, your conversation notes or the exchange itself. The evaluation asks you to reflect not just on what the AI gave you, but on how you worked with it: how you prompted, whether you challenged the output, how you refined it. The AI-generated version of this evaluation does that analysis automatically; when you evaluate the output yourself, you need to bring that thinking consciously.
This is a reflection exercise, not a test. The evaluation takes around fifteen to twenty minutes.
The rubric has seven dimensions. Each one looks at a different aspect of quality: not just whether the output reads well, but whether it's actually doing what it should.
For each dimension, you'll find a short description of what it's assessing. Read it, consider your output against it, and select a score. There's a notes field for each dimension if you want to capture your thinking. That's optional, but it makes the final record more useful.
Work through all seven in order. At the end, there's a short overall summary field before you generate your report.
Each dimension is scored across five bands, and the band descriptions are intentionally honest. Adequate, for example, means the output passed a minimum threshold and nothing more. Capable is genuinely strong. Exemplary means nothing more could reasonably be asked of it. Score what you actually see, not what you were hoping for.
At the end, you can download a structured PDF of your evaluation. It records your scores, your notes, and a summary, formatted for reference or sharing.
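If it helps to picture what that record contains, here is a minimal sketch in TypeScript. It is illustrative only: the field names, the shape, and the assumption that scores run 1 to 5 are ours for the example, not the tool's actual export format.

```typescript
// Hypothetical shape of a completed evaluation record.
// Field names and the 1-5 score range are assumptions for
// illustration; the tool's actual export format may differ.
interface DimensionResult {
  dimension: string; // e.g. "Fit to Context"
  band: string;      // e.g. "Adequate", "Capable", "Exemplary"
  score: number;     // assumed 1-5, one point per band
  notes?: string;    // optional per-dimension thinking
}

interface EvaluationRecord {
  outputDescription: string;     // what was evaluated and how it was produced
  dimensions: DimensionResult[]; // the seven rubric dimensions, in order
  aiVoice: DimensionResult;      // the holistic final measure
  overallSummary: string;        // your two or three most significant findings
}
```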
Once you've completed a self-evaluation, you may want to see how an AI-generated assessment of the same output compares with your own. That's available as a separate tool, and the differences between the two evaluations can be as instructive as the scores themselves.
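As a sketch of what that comparison might surface, assuming the record shape above and that both evaluations list the dimensions in the same order, the disagreements are straightforward to pull out:

```typescript
// Minimal sketch: list the dimensions where a self-evaluation and an
// AI-generated evaluation disagree, assuming the EvaluationRecord shape
// above and matching dimension order in both records.
function scoreDivergences(self: EvaluationRecord, ai: EvaluationRecord): string[] {
  return self.dimensions
    .map((dim, i) => ({ dim, delta: dim.score - ai.dimensions[i].score }))
    .filter(({ delta }) => delta !== 0)
    .map(({ dim, delta }) =>
      `${dim.dimension}: you scored ${delta > 0 ? "higher" : "lower"} by ${Math.abs(delta)}`
    );
}
```

Dimensions where the two records diverge by more than a point are usually the most productive places to revisit your notes.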
Before scoring, tell us a little about the output you're evaluating and how it was produced.
Work through each dimension in order. Select the band that best describes what you see. Add notes if they'd be useful to you later.
After scoring the seven dimensions, consider the output as a whole for the final, holistic measure: AI Voice.
In your own words, what are the two or three most significant things you found? What single change would most improve the output?
Draw on your dimensional scores and notes. You do not need to repeat every detail, just the things that matter most.
| Dimension | Band | Score |
|---|---|---|
| Fit to Context | — | — |
| Evidence and Grounding | — | — |
| Analytical Depth | — | — |
| Purposeful Structure | — | — |
| Appropriate Register | — | — |
| Critical Integrity | — | — |
| Evaluative Judgement | — | — |
| AI Voice (holistic) | — | — |
When you're satisfied with your evaluation, generate your PDF record.