By the end of this session, you can…
- LO 4.1: Locate your target difficulty as a zone, not a single point, and describe the adjustment knobs that keep learners inside it.
- LO 4.2: Distinguish four kinds of feedback by timing and grain, and choose the right kind for each objective type.
- LO 4.3: Design a failure loop that teaches (short, costly-enough, recoverable, and instructive), and say when not to design one.
- LO 4.4: Annotate your D2 with the specific challenge/feedback/failure risks per row; identify the highest-risk row first.
Difficulty is a zone, not a point
"Flow" is popular and mostly right but too coarse to design against. Think instead about two ranges inside one envelope: the tolerance band (where most learners can continue without help) and the productive-struggle band (where they stall but can recover within one or two attempts). Your game should spend 70% of time in the first and 25% in the second. The remaining 5% is genuine failure — see §04.
Time pressure
Shorten the clock and every other difficulty goes up with it. Most powerful for retrieval, discrimination, and procedural fluency; dangerous for judgment under uncertainty (it hides reasoning).
Information completeness
Hide features, delay readouts, obscure second-order effects. Most powerful for conceptual reasoning and judgment; can frustrate retrieval/discrimination if overused.
Distractor similarity
Make the wrong options look more like the right one. The fulcrum of discrimination learning. Too subtle and learners guess; too crude and they never have to look at the discriminating feature.
Stake
What the player loses on error. Progress, resources, relationship, identity. Low stake = low engagement and low attention. High stake = avoidance. Calibrate against the learner's risk tolerance.
Every mechanic in your game should have at least one tunable knob that you — or an adaptive system — can turn during play. If you have no knobs, your game teaches one learner and fails the rest.
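A minimal sketch of what "a knob an adaptive system can turn" might look like. The `Knob` class, the `adjust` function, the step sizes, and the `slack` threshold are all hypothetical; only the 70% tolerance-band target comes from this session. It also counts attempts rather than time, which is a simplification.

```python
from dataclasses import dataclass

# Target share of play in the tolerance band, taken from the text above.
TARGET_TOLERANCE = 0.70


@dataclass
class Knob:
    """One tunable difficulty parameter (hypothetical structure)."""
    name: str
    value: float   # higher = harder
    low: float
    high: float
    step: float

    def turn(self, direction: int) -> float:
        """Move one step harder (+1) or easier (-1), clamped to [low, high]."""
        moved = self.value + direction * self.step
        self.value = min(self.high, max(self.low, moved))
        return self.value


def adjust(knob: Knob, recent_bands: list, slack: float = 0.10) -> float:
    """Nudge the knob so the tolerance share drifts back toward the target.

    recent_bands holds one label per recent attempt:
    "tolerance", "struggle", or "failure".
    The slack value is an assumption, not a recommendation.
    """
    if not recent_bands:
        return knob.value
    share = recent_bands.count("tolerance") / len(recent_bands)
    if share > TARGET_TOLERANCE + slack:
        return knob.turn(+1)   # too comfortable: make it harder
    if share < TARGET_TOLERANCE - slack:
        return knob.turn(-1)   # too much stalling: ease off
    return knob.value


similarity = Knob("distractor_similarity", value=0.5, low=0.0, high=1.0, step=0.1)
adjust(similarity, ["tolerance"] * 10)   # all easy attempts, so one step harder
```

The same structure would work for time pressure, information completeness, or stake, as long as you define "higher" to mean "harder" for each.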
Four kinds, four purposes
Most educational games default to immediate, verification-grain feedback ("correct!"). That choice is only defensible for retrieval. Every other objective type wants something else.
| Kind | Timing / grain | Best used for | Characteristic mistake |
|---|---|---|---|
| Verification | Immediate; correct/incorrect. | Retrieval, basic discrimination. | Used for judgment tasks — short-circuits the reasoning you wanted. |
| Elaborative | Immediate; why the answer was right or wrong. | Discrimination, procedural fluency. | Too long — players skim past it and lose the point. |
| Consequence | Delayed; change in world state. | Judgment under uncertainty, conceptual reasoning. | Consequence is ambiguous or lucky — learner cannot trace cause. |
| Reflection | Post-loop; learner-generated. | Conceptual reasoning, transfer. | Written prompts treated as paperwork; no integration with play. |
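One way to keep the table honest during design reviews is to encode the "Best used for" column as a lookup and check each event against it. This is a sketch: the dictionary restates the table, while the function names `kinds_for` and `fits` are hypothetical. Note that the table recommends verification only for basic discrimination, a nuance the lookup flattens.

```python
# "Best used for" column of the table above, keyed by feedback kind.
BEST_USED_FOR = {
    "verification": ["retrieval", "discrimination"],  # table says *basic* discrimination
    "elaborative": ["discrimination", "procedural fluency"],
    "consequence": ["judgment under uncertainty", "conceptual reasoning"],
    "reflection": ["conceptual reasoning", "transfer"],
}


def kinds_for(objective: str) -> list:
    """Return the feedback kinds the table lists for an objective type."""
    return [kind for kind, objectives in BEST_USED_FOR.items()
            if objective in objectives]


def fits(kind: str, objective: str) -> bool:
    """True if the table pairs this feedback kind with this objective type."""
    return objective in BEST_USED_FOR.get(kind, [])


print(kinds_for("conceptual reasoning"))
print(fits("verification", "judgment under uncertainty"))
```

A `False` from `fits` is a prompt to look at the "Characteristic mistake" column, not an automatic veto.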
Immediate feedback maximizes engagement but can destroy transfer. Learners practicing judgment and reasoning need time to sit in uncertainty, so give feedback after a round, a shift, or a day of game-time. Your players will hate it in week 1 and thank you in week 4.
Designing a failure loop that teaches
Kapur showed it in classrooms and every designer who has shipped a roguelite knows it: failure before instruction beats instruction-then-practice for conceptual transfer. But only if the failure is productive — short, costly-enough, recoverable, and instructive.
- Short: One failure-to-retry cycle takes under 5 minutes for first-hour play. If recovery is slow, learners quit before they reach the lesson.
- Costly: Losing has to feel like losing, not a reset-with-everything. Resource or progress loss is fine; identity loss almost never is.
- Recoverable: The retry must be available immediately and changeable; the player must be able to try a different approach. Same-exact retry teaches nothing.
- Instructive: The failure state reveals a cue the player can use next attempt. If the player cannot tell why they lost, you have frustration, not failure-as-teacher.
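The four criteria can be sketched as a checklist you run against a description of one loop. Everything here is illustrative: the `FailureLoop` fields and the `failing_criteria` function are hypothetical names, and reducing each criterion to one or two booleans is a simplification. The 5-minute limit is the first-hour figure from the list above.

```python
from dataclasses import dataclass


@dataclass
class FailureLoop:
    """A rough description of one failure-to-retry cycle (hypothetical fields)."""
    retry_minutes: float        # time from failure to the next attempt
    loses_something: bool       # resource or progress is actually lost
    loses_identity: bool        # the loss attacks the player's identity
    retry_immediate: bool       # retry is available right away
    can_change_approach: bool   # the retry allows a different approach
    reveals_cue: bool           # failure state shows a usable cue


def failing_criteria(loop: FailureLoop, max_minutes: float = 5.0) -> list:
    """Return the names of the criteria this loop does not meet."""
    failing = []
    if loop.retry_minutes >= max_minutes:
        failing.append("short")
    # Costly means a felt loss, but identity loss is treated as a failure here.
    if not loop.loses_something or loop.loses_identity:
        failing.append("costly")
    if not (loop.retry_immediate and loop.can_change_approach):
        failing.append("recoverable")
    if not loop.reveals_cue:
        failing.append("instructive")
    return failing


loop = FailureLoop(retry_minutes=12, loses_something=True, loses_identity=False,
                   retry_immediate=True, can_change_approach=False, reveals_cue=True)
print(failing_criteria(loop))  # ['short', 'recoverable']
```

An empty list does not prove the loop teaches; it only means none of the four obvious faults is present.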
Do not design a failure loop for retrieval-only games, or for high-stakes professional domains where modeling failure is itself the harm (e.g., giving clinical trainees explicit "how to miss a diagnosis" patterns). In those cases, use consequence feedback without framing it as a failure loop.
Dial your failure loop
Productive failure has four criteria: short, costly, recoverable, instructive. The criteria trade off. Turn the dials, watch the verdict change, and see which combinations collapse into "homework," "punishment," or a real teach.
Productive-failure sandbox
Drag any dial. Imagine a single iteration of your failure loop: the time between a player getting a call wrong and their next meaningful decision. Adjust the dials to fit your game. The verdict updates live.
Three clips, three verdicts
Watch each clip in triads. Log challenge, feedback, and failure design choices; judge whether each one fits the game's apparent objective. Deliverables: one-paragraph verdict per clip.
Writing feedback copy that lands
Feedback is copy before it is UX. "Correct!" is a copywriting decision. Elaborative feedback is 1-2 lines of writing that must do three jobs in 30 words: confirm or contradict, explain the discriminating feature, and leave the player attentive for the next round. AI Studio is excellent at generating that kind of short-form copy — if you brief it properly.
Use case · Draft feedback copy for every event in your loop
Gemini 2.5 · temperature 0.5
Give the model your event → feedback map (even a draft) and ask for copy by feedback kind. You will get 30–50 candidate lines in one pass. Most will have the wrong tone. Three will be right, and three is more than you had five minutes ago.
You write in-game feedback copy for educational games. You follow four
feedback kinds with distinct rules:
- VERIFICATION: <=5 words, no elaboration. "Correct." "Miss."
- ELABORATIVE: <=30 words, names the discriminating feature or
corrects the specific error. Second person. No praise words.
- CONSEQUENCE: 0 words on-screen; describe instead what changes in
world state (score, NPC reaction, resource).
- REFLECTION: a question the player answers, not a statement. One
sentence. Not leading.
For each event I give you, produce three candidates in EACH of the four
kinds (12 lines total). Label them. Never invent content I did not
give you; if a field is missing, ask.
Do not use: "Great job," "Awesome," "Oops," exclamation points,
emoji, "Let's," or the word "learn."
Event: Resident picks a non-discriminating test (e.g., CBC when the
discriminator was lactate).
Context:
- Learner: 1st-year IM resident, night shift, overnight on call.
- Objective type: discrimination.
- Role: the intern.
- Tone: plain clinical; no cheerleading; no clinical shorthand a
layperson would need decoded.
Give me 12 lines across the four feedback kinds.
If AI Studio gives you a line that could appear in a toothpaste ad, delete it. Educational game copy has a register; match your learner's professional culture, not the model's default cheerful LMS voice.
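Before you read 30–50 candidates by eye, a small filter can discard the lines that break the mechanical rules in the prompt above. This is a sketch, not a substitute for your judgment on tone: the `violations` function is hypothetical, it checks only the word limits, the banned words, exclamation points, and the question form for reflection, and it does not attempt emoji detection or the consequence rule.

```python
import re

# Word limits and banned terms copied from the prompt above.
WORD_LIMITS = {"VERIFICATION": 5, "ELABORATIVE": 30}
BANNED = ["great job", "awesome", "oops", "let's", "learn"]


def violations(kind: str, line: str) -> list:
    """Return the mechanical rule violations for one candidate line."""
    problems = []
    # Normalize case and curly apostrophes so "Let’s" matches "let's".
    text = line.lower().replace("\u2019", "'")
    words = re.findall(r"[a-z0-9']+", text)

    limit = WORD_LIMITS.get(kind)
    if limit is not None and len(words) > limit:
        problems.append(f"over {limit} words")

    for phrase in BANNED:
        # Whole-word match, so "learning" is not flagged as "learn".
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            problems.append(f"banned: {phrase}")

    if "!" in line:
        problems.append("exclamation point")

    if kind == "REFLECTION" and not line.rstrip().endswith("?"):
        problems.append("not a question")

    return problems


print(violations("VERIFICATION", "Great job!"))
# ['banned: great job', 'exclamation point']
```

Lines that pass still need the toothpaste-ad test; the filter only removes candidates that were never eligible.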
Use it when
You have an event→feedback map (even a rough one from S8 drafts) and need to populate each cell with copy candidates. The model is faster than you at the first 20 lines; you are better at choosing.
Don't use it when
You have not decided which feedback kind each event uses. The choice is a design decision that carries from S4's taxonomy — not a copywriting detail.
Use case · Diagnose a failure loop that feels "stuck"
Adversarial
If your paper prototype's failure loop is frustrating playtesters and you cannot tell why, walk the model through the loop and ask it to locate which of the four failure-design criteria (short / costly / recoverable / instructive) is failing.
Here is a failure loop from my game, described step by step:
1. Player makes a triage call.
2. Scene advances; they see more patients.
3. At shift end, the critical patient they under-triaged is revealed.
4. Score summary screen.
5. "Retry shift" button.
Playtesters say it "feels like homework." Using the four criteria —
short, costly, recoverable, instructive — identify which is failing and
why. Do not propose a fix yet. Diagnose first. Be specific about which
step in my loop carries the failure.
Annotate your crosswalk
Return to your D2. For each row, write one sentence on challenge (which knob dominates), one on feedback (which kind, why), one on failure (loop / no loop, why). Star the row with the most unresolved risk; that row drives your prototype priorities in Session 07.