Prethinker Report Hub

generated 2026-04-19 20:05:59 UTC | runs=34 passed=33 pass_rate=97% prompts=1 curated=30
curated manifest: runs manifest
Runs: 34
Passed: 33
Pass Rate: 97%
Scenarios: 31
Prompts: 1
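The 97% pass rate reported above is consistent with 33 passed runs out of 34, rounded to a whole percent. A minimal sketch of that arithmetic follows. It assumes the hub rounds to the nearest integer, which this page does not state, and `pass_rate_percent` is a hypothetical helper name, not part of the hub.

```python
def pass_rate_percent(passed: int, runs: int) -> int:
    """Return the pass rate as a whole-number percentage.

    Assumption: nearest-integer rounding; the hub's actual rule is not shown.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    return round(100 * passed / runs)


# Figures taken from the summary: 33 passed out of 34 runs.
print(f"{pass_rate_percent(33, 34)}%")
```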
Newest Success Highlights
public signal view
Finished | Scenario | Model | Validation | Report
2026-04-12 22:37 | rung_449_frontier_multibind_uncle_query | qwen35-semparse:9b | 4/4 | report
2026-04-12 22:36 | rung_446_frontier_policy_noisy_rebind_loop | qwen35-semparse:9b | 7/7 | report
2026-04-12 22:36 | rung_445_frontier_compound_write_query_braid | qwen35-semparse:9b | 6/6 | report
2026-04-12 22:35 | rung_444_frontier_unpunctuated_coref_sweep | qwen35-semparse:9b | 6/6 | report
2026-04-12 22:35 | rung_443_frontier_dual_item_handoff_coref | qwen35-semparse:9b | 7/7 | report
2026-04-12 22:34 | rung_442_frontier_policy_multirevision_guard | qwen35-semparse:9b | 9/9 | report
2026-04-12 22:33 | rung_441_frontier_pronoun_bucket_shuffle | qwen35-semparse:9b | 7/7 | report
2026-04-12 22:32 | rung_440_frontier_policy_revision_loop | qwen35-semparse:9b | 7/7 | report
2026-04-12 22:32 | rung_439_frontier_plural_coref_exception_guard | qwen35-semparse:9b | 5/5 | report
2026-04-12 22:31 | rung_438_frontier_multibind_query_pressure | qwen35-semparse:9b | 6/6 | report
2026-04-12 22:30 | rung_437_frontier_policy_override_flow | qwen35-semparse:9b | 7/7 | report
2026-04-12 22:29 | rung_436_frontier_noise_typo_coref | qwen35-semparse:9b | 6/6 | report
Spot Checks
curated checkpoints (not exhaustive)
Finished | Scenario | Status | Validation | Model | Report
2026-04-11 23:51 | stage_01_facts_only | passed | 2/2 | qwen3.5:9b | report
2026-04-11 23:51 | stage_02_rule_ingest | passed | 1/1 | qwen3.5:9b | report
2026-04-11 23:51 | stage_03_transitive_chain | passed | 1/1 | qwen3.5:9b | report
2026-04-11 23:51 | acid_03_temporal_override | passed | 3/3 | qwen3.5:9b | report
2026-04-11 23:52 | acid_04_alias_pressure | passed | 3/3 | qwen3.5:9b | report
2026-04-11 23:52 | acid_05_long_context_lineage | passed | 5/5 | qwen3.5:9b | report
2026-04-12 22:37 | rung_449_frontier_multibind_uncle_query | passed | 4/4 | qwen35-semparse:9b | report
2026-04-12 22:36 | rung_446_frontier_policy_noisy_rebind_loop | passed | 7/7 | qwen35-semparse:9b | report
2026-04-12 22:36 | rung_445_frontier_compound_write_query_braid | passed | 6/6 | qwen35-semparse:9b | report
2026-04-12 22:35 | rung_444_frontier_unpunctuated_coref_sweep | passed | 6/6 | qwen35-semparse:9b | report
2026-04-12 22:35 | rung_443_frontier_dual_item_handoff_coref | passed | 7/7 | qwen35-semparse:9b | report
2026-04-12 22:34 | rung_442_frontier_policy_multirevision_guard | passed | 9/9 | qwen35-semparse:9b | report
Progress by Scenario
trend window: last 6 runs per scenario
Scenario | Latest | Status | Validation | Trend Pass % | Report
rung_449_frontier_multibind_uncle_query | 2026-04-12 22:37 | passed | 4/4 | 100% (1/1) | latest
rung_446_frontier_policy_noisy_rebind_loop | 2026-04-12 22:36 | passed | 7/7 | 100% (2/2) | latest
rung_445_frontier_compound_write_query_braid | 2026-04-12 22:36 | passed | 6/6 | 100% (2/2) | latest
rung_444_frontier_unpunctuated_coref_sweep | 2026-04-12 22:35 | passed | 6/6 | 100% (2/2) | latest
rung_443_frontier_dual_item_handoff_coref | 2026-04-12 22:35 | passed | 7/7 | 100% (1/1) | latest
rung_442_frontier_policy_multirevision_guard | 2026-04-12 22:34 | passed | 9/9 | 100% (1/1) | latest
rung_441_frontier_pronoun_bucket_shuffle | 2026-04-12 22:33 | passed | 7/7 | 100% (1/1) | latest
rung_440_frontier_policy_revision_loop | 2026-04-12 22:32 | passed | 7/7 | 100% (1/1) | latest
rung_439_frontier_plural_coref_exception_guard | 2026-04-12 22:32 | passed | 5/5 | 100% (1/1) | latest
rung_438_frontier_multibind_query_pressure | 2026-04-12 22:31 | passed | 6/6 | 100% (1/1) | latest
rung_437_frontier_policy_override_flow | 2026-04-12 22:30 | passed | 7/7 | 100% (1/1) | latest
rung_436_frontier_noise_typo_coref | 2026-04-12 22:29 | passed | 6/6 | 100% (1/1) | latest
rung_435_frontier_checkpoint_compound_turns | 2026-04-12 22:29 | passed | 6/6 | 100% (1/1) | latest
rung_434_dual_pronoun_flip_guard | 2026-04-12 22:28 | passed | 7/7 | 100% (1/1) | latest
rung_433_noisy_inverse_retarget_repair | 2026-04-12 22:28 | passed | 6/6 | 100% (1/1) | latest
rung_432_noise_pronoun_inversion_chain | 2026-04-12 22:27 | passed | 6/6 | 100% (1/1) | latest
rung_448_confirmation_gate_no_then_yes | 2026-04-12 22:25 | passed | 3/3 | 100% (1/1) | latest
rung_447_confirmation_gate_single_yes | 2026-04-12 22:24 | passed | 2/2 | 100% (1/1) | latest
demo_04_reimbursement_violation_check | 2026-04-12 13:05 | passed | 3/3 | 100% (1/1) | latest
demo_03_story_world_interrogator | 2026-04-12 13:04 | failed | 0/3 | 0% (0/1) | latest
demo_02_policy_stress_test_machine | 2026-04-12 13:04 | passed | 2/2 | 100% (1/1) | latest
rung_360_ce_story_branch_merge_noise | 2026-04-12 10:22 | passed | 12/12 | 100% (1/1) | latest
rung_350_ce_story_multi_round_revision | 2026-04-12 10:22 | passed | 11/11 | 100% (1/1) | latest
rung_340_ce_story_pronoun_transfer | 2026-04-12 10:21 | passed | 11/11 | 100% (1/1) | latest
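The trend column reads as "passed runs / counted runs" over each scenario's most recent runs, capped at six. A rough sketch of how such a value could be derived is shown below. The `(finished, scenario, status)` record shape and the names `trend_pass` and `format_trend` are assumptions for illustration, not the hub's actual schema or code; the two sample records are taken from the table above.

```python
from collections import defaultdict


def trend_pass(runs, window=6):
    """Map scenario -> (passed, total) over its most recent `window` runs.

    `runs` is an iterable of (finished, scenario, status) tuples.
    This record shape is an assumption, not the hub's real schema.
    """
    by_scenario = defaultdict(list)
    for finished, scenario, status in runs:
        by_scenario[scenario].append((finished, status))

    trend = {}
    for scenario, items in by_scenario.items():
        # "YYYY-MM-DD HH:MM" strings sort chronologically as plain text.
        recent = sorted(items)[-window:]
        passed = sum(1 for _, status in recent if status == "passed")
        trend[scenario] = (passed, len(recent))
    return trend


def format_trend(passed, total):
    """Render a trend cell in the table's style, e.g. '100% (2/2)'."""
    return f"{round(100 * passed / total)}% ({passed}/{total})"


runs = [
    ("2026-04-12 22:37", "rung_449_frontier_multibind_uncle_query", "passed"),
    ("2026-04-12 13:04", "demo_03_story_world_interrogator", "failed"),
]
for scenario, (passed, total) in trend_pass(runs).items():
    print(scenario, format_trend(passed, total))
```

With only one recorded run per scenario, the window has no effect; it matters once a scenario has more than six runs, at which point older results drop out of the percentage.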
Run Explorer (Curated)
Finished | Scenario | KB | Backend/Model | Status | Validation | Prompt | Report | JSON
2026-04-11 23:51 | stage_01_facts_only | people_stage | ollama/qwen3.5:9b | passed | 2/2 | sp-1e43c641b01b | report | json
2026-04-11 23:51 | stage_02_rule_ingest | people_stage | ollama/qwen3.5:9b | passed | 1/1 | sp-1e43c641b01b | report | json
2026-04-11 23:51 | stage_03_transitive_chain | people_stage | ollama/qwen3.5:9b | passed | 1/1 | sp-1e43c641b01b | report | json
2026-04-11 23:51 | acid_03_temporal_override | acid_temporal | ollama/qwen3.5:9b | passed | 3/3 | sp-1e43c641b01b | report | json
2026-04-11 23:52 | acid_04_alias_pressure | acid_alias | ollama/qwen3.5:9b | passed | 3/3 | sp-1e43c641b01b | report | json
2026-04-11 23:52 | acid_05_long_context_lineage | acid_lineage | ollama/qwen3.5:9b | passed | 5/5 | sp-1e43c641b01b | report | json
2026-04-12 22:37 | rung_449_frontier_multibind_uncle_query | frontierv6_r1_rung_449_frontier_multibind_uncle_query | ollama/qwen35-semparse:9b | passed | 4/4 | sp-1e43c641b01b | report | json
2026-04-12 22:36 | rung_446_frontier_policy_noisy_rebind_loop | frontierv6_r1_rung_446_frontier_policy_noisy_rebind_loop | ollama/qwen35-semparse:9b | passed | 7/7 | sp-1e43c641b01b | report | json
2026-04-12 22:36 | rung_445_frontier_compound_write_query_braid | frontierv6_r1_rung_445_frontier_compound_write_query_braid | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:35 | rung_444_frontier_unpunctuated_coref_sweep | frontierv6_r1_rung_444_frontier_unpunctuated_coref_sweep | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:35 | rung_443_frontier_dual_item_handoff_coref | frontierv6_r1_rung_443_frontier_dual_item_handoff_coref | ollama/qwen35-semparse:9b | passed | 7/7 | sp-1e43c641b01b | report | json
2026-04-12 22:34 | rung_442_frontier_policy_multirevision_guard | frontierv6_r1_rung_442_frontier_policy_multirevision_guard | ollama/qwen35-semparse:9b | passed | 9/9 | sp-1e43c641b01b | report | json
2026-04-12 22:33 | rung_441_frontier_pronoun_bucket_shuffle | frontierv6_r1_rung_441_frontier_pronoun_bucket_shuffle | ollama/qwen35-semparse:9b | passed | 7/7 | sp-1e43c641b01b | report | json
2026-04-12 22:32 | rung_440_frontier_policy_revision_loop | frontierv6_r1_rung_440_frontier_policy_revision_loop | ollama/qwen35-semparse:9b | passed | 7/7 | sp-1e43c641b01b | report | json
2026-04-12 22:32 | rung_439_frontier_plural_coref_exception_guard | frontierv6_r1_rung_439_frontier_plural_coref_exception_guard | ollama/qwen35-semparse:9b | passed | 5/5 | sp-1e43c641b01b | report | json
2026-04-12 22:31 | rung_438_frontier_multibind_query_pressure | frontierv6_r1_rung_438_frontier_multibind_query_pressure | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:30 | rung_437_frontier_policy_override_flow | frontierv6_r1_rung_437_frontier_policy_override_flow | ollama/qwen35-semparse:9b | passed | 7/7 | sp-1e43c641b01b | report | json
2026-04-12 22:29 | rung_436_frontier_noise_typo_coref | frontierv6_r1_rung_436_frontier_noise_typo_coref | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:29 | rung_435_frontier_checkpoint_compound_turns | frontierv6_r1_rung_435_frontier_checkpoint_compound_turns | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:28 | rung_434_dual_pronoun_flip_guard | frontierv6_r1_rung_434_dual_pronoun_flip_guard | ollama/qwen35-semparse:9b | passed | 7/7 | sp-1e43c641b01b | report | json
2026-04-12 22:28 | rung_433_noisy_inverse_retarget_repair | frontierv6_r1_rung_433_noisy_inverse_retarget_repair | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:27 | rung_432_noise_pronoun_inversion_chain | frontierv6_r1_rung_432_noise_pronoun_inversion_chain | ollama/qwen35-semparse:9b | passed | 6/6 | sp-1e43c641b01b | report | json
2026-04-12 22:25 | rung_448_confirmation_gate_no_then_yes | confprobe_r4_rung_448_confirmation_gate_no_then_yes | ollama/qwen35-semparse:9b | passed | 3/3 | sp-1e43c641b01b | report | json
2026-04-12 22:24 | rung_447_confirmation_gate_single_yes | confprobe_r4_rung_447_confirmation_gate_single_yes | ollama/qwen35-semparse:9b | passed | 2/2 | sp-1e43c641b01b | report | json
2026-04-12 13:05 | demo_04_reimbursement_violation_check | demo_04_reimbursement_violation_check | ollama/qwen35-semparse:9b | passed | 3/3 | sp-1e43c641b01b | report | json
2026-04-12 13:04 | demo_02_policy_stress_test_machine | demo_02_policy_stress_test_machine | ollama/qwen35-semparse:9b | passed | 2/2 | sp-1e43c641b01b | report | json
2026-04-12 10:22 | rung_360_ce_story_branch_merge_noise | rung_360_ce_story_branch_merge_noise | ollama/qwen35-semparse:9b | passed | 12/12 | sp-1e43c641b01b | report | json
2026-04-12 10:22 | rung_350_ce_story_multi_round_revision | rung_350_ce_story_multi_round_revision | ollama/qwen35-semparse:9b | passed | 11/11 | sp-1e43c641b01b | report | json
2026-04-12 10:21 | rung_340_ce_story_pronoun_transfer | rung_340_ce_story_pronoun_transfer | ollama/qwen35-semparse:9b | passed | 11/11 | sp-1e43c641b01b | report | json
2026-04-12 02:23 | rung_230_fuzzy_ce_branch_exclusion_language | rung_230_fuzzy_ce_branch_exclusion_language | ollama/qwen3.5:9b | passed | 6/6 | sp-1e43c641b01b | report | json
This explorer is curated for signal. The full history is in the runs manifest.
Prompt Evolution
latest 6 snapshots
Prompt ID | Runs | Pass % | Avg Validation % | Last Seen | Scenarios | Models
sp-1e43c641b01b | 34 | 97% | 97% | 2026-04-12 22:37 | 31 | 2
KB Snapshots
KB | Updated | Size
people_core.html | 2026-04-09 15:56 | 5 KB
people_ladder.html | 2026-04-09 15:56 | 5 KB
people_ladder_tune.html | 2026-04-09 15:56 | 5 KB