Evaluate the quality of an AI-generated output using the seven-dimension rubric — scored by you.
Most people using AI have a sense of when something feels wrong with what they got back. Fewer have a reliable way of knowing why it feels off, or of explaining it to someone else.
This tool gives you a structure for that. It walks you through seven dimensions of quality, asks you to score the output you're looking at, and produces a record of your evaluation. No specialist knowledge required. The framework does the framing while you bring the judgement.
The evaluation covers two things: the quality of what the AI produced, and the quality of how you engaged with it to get there. Both matter.
You'll need two things. First, the AI-generated output you want to evaluate — a piece of writing, a response to a question, a structured document, or anything else produced by an AI tool.
Second, your conversation notes or the exchange itself. The evaluation asks you to reflect not just on what the AI gave you, but on how you worked with it: how you prompted, whether you challenged the output, how you refined it. The AI-generated version of this evaluation does that analysis automatically; when you evaluate the output yourself, you need to bring that thinking consciously.
This is a reflection exercise, not a test. The evaluation takes around fifteen to twenty minutes.
The rubric has seven dimensions. Each one looks at a different aspect of quality: not just whether the output reads well, but whether it's actually doing what it should.
For each dimension, you'll find a short description of what it's assessing. Read it, consider your output against it, and select a score. There's a notes field for each dimension if you want to capture your thinking. That's optional, but it makes the final record more useful.
Work through all seven in order. At the end, there's a short overall summary field before you generate your report.
Each dimension is scored across five bands, and the band descriptions are intentionally honest. Adequate, for example, means the output passed a minimum threshold and nothing more. Capable is genuinely strong. Exemplary means nothing more could reasonably be asked of it. Score what you actually see, not what you were hoping for.
At the end, you can download a structured PDF of your evaluation. It records your scores, your notes, and a summary, formatted for reference or sharing.
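If it helps to picture what that record contains, here is a minimal sketch in TypeScript. It is illustrative only: the field names, the shape, and the assumption that scores run 1 to 5 are ours for the example, not the tool's actual export format.

```typescript
// Hypothetical shape of a completed evaluation record.
// Field names and the 1-5 score range are assumptions for
// illustration; the tool's actual export format may differ.
interface DimensionResult {
  dimension: string; // e.g. "Fit to Context"
  band: string;      // e.g. "Adequate", "Capable", "Exemplary"
  score: number;     // assumed 1-5, one point per band
  notes?: string;    // optional per-dimension thinking
}

interface EvaluationRecord {
  outputDescription: string;     // what was evaluated and how it was produced
  dimensions: DimensionResult[]; // the seven rubric dimensions, in order
  aiVoice: DimensionResult;      // the holistic final measure
  overallSummary: string;        // your two or three most significant findings
}
```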
Once you've completed a self-evaluation, you may want to see how an AI-generated assessment of the same output compares with your own. That's available as a separate tool, and the differences between the two evaluations can be as instructive as the scores themselves.
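As a sketch of what that comparison might surface, assuming the record shape above and that both evaluations list the dimensions in the same order, the disagreements are straightforward to pull out:

```typescript
// Minimal sketch: list the dimensions where a self-evaluation and an
// AI-generated evaluation disagree, assuming the EvaluationRecord shape
// above and matching dimension order in both records.
function scoreDivergences(self: EvaluationRecord, ai: EvaluationRecord): string[] {
  return self.dimensions
    .map((dim, i) => ({ dim, delta: dim.score - ai.dimensions[i].score }))
    .filter(({ delta }) => delta !== 0)
    .map(({ dim, delta }) =>
      `${dim.dimension}: you scored ${delta > 0 ? "higher" : "lower"} by ${Math.abs(delta)}`
    );
}
```

Dimensions where the two records diverge by more than a point are usually the most productive places to revisit your notes.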
Before scoring, tell us a little about the output you're evaluating and how it was produced.
Work through each dimension in order. Select the band that best describes what you see. Add notes if they'd be useful to you later.
After scoring the seven dimensions, consider the output as a whole for the final, holistic measure: AI Voice.
In your own words, what are the two or three most significant things you found? What single change would most improve the output?
Draw on your dimensional scores and notes. You do not need to repeat every detail, just the things that matter most.
| Dimension | Band | Score |
|---|---|---|
| Fit to Context | — | — |
| Evidence and Grounding | — | — |
| Analytical Depth | — | — |
| Purposeful Structure | — | — |
| Appropriate Register | — | — |
| Critical Integrity | — | — |
| Evaluative Judgement | — | — |
| AI Voice (holistic) | — | — |
When you're satisfied with your evaluation, generate your PDF record.