Prolog Extraction Test Ladder

Generated 2026-04-13 00:48:06 UTC

Rung | Scenario | Utterances | Validations | Latest Run
1 | rung_449_frontier_multibind_uncle_query | 7 | 4 | passed (json)
2 | rung_448_confirmation_gate_no_then_yes | 5 | 3 | passed (json)
3 | rung_447_confirmation_gate_single_yes | 3 | 2 | passed (json)
4 | rung_446_frontier_policy_noisy_rebind_loop | 17 | 7 | passed (json)
5 | rung_445_frontier_compound_write_query_braid | 10 | 6 | passed (json)
6 | rung_444_frontier_unpunctuated_coref_sweep | 11 | 6 | passed (json)
7 | rung_443_frontier_dual_item_handoff_coref | 16 | 7 | passed (json)
8 | rung_442_frontier_policy_multirevision_guard | 22 | 9 | passed (json)
9 | rung_441_frontier_pronoun_bucket_shuffle | 15 | 7 | passed (json)
10 | rung_440_frontier_policy_revision_loop | 17 | 7 | passed (json)
11 | rung_439_frontier_plural_coref_exception_guard | 13 | 5 | passed (json)
12 | rung_438_frontier_multibind_query_pressure | 15 | 6 | passed (json)
13 | rung_437_frontier_policy_override_flow | 17 | 7 | passed (json)
14 | rung_436_frontier_noise_typo_coref | 15 | 6 | passed (json)
15 | rung_435_frontier_checkpoint_compound_turns | 15 | 6 | passed (json)
16 | rung_434_dual_pronoun_flip_guard | 9 | 7 | passed (json)
17 | rung_433_noisy_inverse_retarget_repair | 8 | 6 | passed (json)
18 | rung_432_noise_pronoun_inversion_chain | 7 | 6 | passed (json)
19 | rung_431_book_goldilocks_raw_chaptered_qa | 20 | 7 | not run
20 | rung_430_goldilocks_roundtrip_retry | 53 | 4 | not run
21 | rung_420_progress_focus_shift_transition | 6 | 2 | passed (json)
22 | rung_410_progress_goal_context_steering | 5 | 2 | passed (json)
23 | rung_400_progress_relevance_repair | 3 | 1 | passed (json)
24 | rung_390_progress_goal_directed_clarification | 4 | 1 | passed (json)
25 | rung_380_progress_irrelevant_fact_filter | 6 | 2 | passed (json)
26 | rung_370_progress_feasibility_alignment | 5 | 2 | passed (json)
27 | rung_360_ce_story_branch_merge_noise | 14 | 12 | passed (json)
28 | rung_350_ce_story_multi_round_revision | 18 | 11 | passed (json)
29 | rung_340_ce_story_pronoun_transfer | 19 | 11 | passed (json)
30 | rung_330_story_booklet_cross_scene_rebind | 17 | 15 | passed (json)
31 | rung_320_story_temporal_exception_rebinding | 19 | 17 | passed (json)
32 | rung_310_story_cross_clause_pronoun_weave | 14 | 15 | passed (json)
33 | rung_300_story_nested_corrections | 19 | 17 | passed (json)
34 | rung_290_story_multi_branch_pronoun_pressure | 14 | 12 | passed (json)
35 | rung_280_story_revision_temporal_shift | 16 | 12 | passed (json)
36 | rung_270_story_lineage_fragmented_ingest | 16 | 11 | passed (json)
37 | rung_261_sim_fantasy_overlord_natural_flow | 19 | 5 | not run
38 | rung_260_sim_fantasy_state_repair | 10 | 3 | not run
39 | rung_251_ops_indie_warroom_natural_flow | 19 | 6 | not run
40 | rung_250_ops_indie_launch_uncertainty_routing | 8 | 4 | not run
41 | rung_241_ops_hospital_cpm_natural_flow | 17 | 6 | not run
42 | rung_240_ops_hospital_vendor_delay_core | 10 | 5 | not run
43 | rung_230_fuzzy_ce_branch_exclusion_language | 10 | 6 | passed (json)
44 | rung_220_fuzzy_ce_rule_timing_branch_swap | 13 | 9 | passed (json)
45 | rung_210_fuzzy_ce_selective_edge_rebuild | 9 | 6 | passed (json)
46 | rung_200_ce_selective_branch_repair_queries | 11 | 8 | passed (json)
47 | rung_190_ce_midstream_retarget_queries | 9 | 6 | passed (json)
48 | rung_180_ce_noisy_pronoun_reverse_guard | 7 | 6 | passed (json)
49 | rung_170_ce_pronoun_followup_no_qmark | 5 | 4 | passed (json)
50 | rung_160_ce_soft_retract_noise | 7 | 3 | passed (json)
51 | rung_150_ce_typo_uncertainty_chain | 5 | 4 | passed (json)
52 | rung_140_ce_pronoun_typo_missing_qmark | 5 | 4 | passed (json)
53 | rung_130_fuzzy_tail_soft_retract_language | 8 | 4 | not run
54 | rung_120_fuzzy_tail_name_noise | 6 | 4 | not run
55 | rung_110_fuzzy_tail_fragmented_syntax | 6 | 3 | not run
56 | rung_100_fuzzy_tail_directional_chatty | 12 | 6 | not run
57 | rung_99_spacing_max_english_directional_stress | 12 | 6 | passed (json)
58 | rung_90_spacing_direction_consistency_stress | 10 | 4 | not run
59 | rung_80_spacing_query_inversion_guard | 8 | 4 | not run
60 | rung_70_spacing_multi_branch_inverse_repair | 10 | 4 | not run
61 | rung_60_spacing_hedged_correction_direction | 7 | 4 | not run
62 | rung_50_spacing_passive_inverse_mix | 7 | 4 | not run
63 | rung_45_spacing_inverse_parent_bundle | 6 | 4 | not run
64 | rung_40_robustness_hard_hedged_inversion_bundle | 8 | 3 | not run
65 | rung_40_spacing_hedged_inverse_guard | 8 | 3 | not run
66 | rung_35_robustness_hard_passive_voice_repair | 7 | 3 | not run
67 | rung_35_spacing_passive_direction_repair | 6 | 3 | not run
68 | rung_30_robustness_hard_role_inversion_parent_form | 5 | 3 | not run
69 | rung_30_spacing_role_inversion_pressure | 5 | 4 | not run
70 | rung_28_robustness_hard_parallel_branch_retarget | 10 | 3 | not run
71 | rung_27_robustness_hard_midstream_query_repair | 10 | 3 | not run
72 | rung_26_robustness_hard_double_repair_chain | 9 | 3 | not run
73 | rung_25_robustness_hard_branch_preservation | 9 | 3 | not run
74 | rung_24_robustness_hard_passive_retarget | 8 | 3 | not run
75 | rung_23_robustness_hard_repair_bridge | 8 | 3 | not run
76 | rung_22_robustness_hard_retarget_lineage | 9 | 3 | not run
77 | rung_21_robustness_hard_hedged_retract_shift | 7 | 3 | not run
78 | rung_20_robustness_hard_inversion_chain | 5 | 2 | not run
79 | rung_19_robustness_hard_hedged_retarget | 8 | 3 | not run
80 | rung_18_robustness_easy_inversion_retract | 6 | 2 | not run
81 | rung_17_robustness_easy_paraphrase_chain | 5 | 2 | not run
82 | acid_16_rule_stack_retarget | 7 | 3 | not run
83 | acid_15_dual_track_repair | 7 | 3 | not run
84 | acid_14_unary_conjunction_retract_effect | 6 | 2 | not run
85 | acid_13_branch_preservation_after_repair | 6 | 3 | not run
86 | acid_12_compound_repair_with_query | 6 | 3 | not run
87 | acid_11_batched_fact_rule_retract_mix | 6 | 3 | not run
88 | acid_10_compound_retract_unpacking | 9 | 3 | not run
89 | acid_09_compound_rule_unpacking | 4 | 2 | not run
90 | acid_08_contradiction_reconciliation | 8 | 3 | not run
91 | acid_07_relation_drift_pressure | 8 | 3 | not run
92 | acid_06_compound_unpacking | 8 | 3 | not run
93 | acid_05_long_context_lineage | 20 | 5 | passed (json)
94 | acid_04_alias_pressure | 10 | 3 | passed (json)
95 | acid_03_temporal_override | 10 | 3 | passed (json)
96 | stage_03_transitive_chain | 4 | 1 | passed (json)
97 | stage_02_rule_ingest | 2 | 1 | passed (json)
98 | stage_01_facts_only | 2 | 2 | passed (json)
99 | stage_00_foreign_unseen_probe | 8 | 0 | passed (json)
100 | stage_00_multilingual_probe | 7 | 0 | passed (json)