Fact Extraction Steering Matrix

Source: fact-extraction-steering-matrix.md | Rendered: 2026-04-07 23:55:28 UTC

Tracks local LM Studio runs for buried-fact extraction quality.

Each run compares models and prompt sets on one synthetic corpus.

Run 2026-04-07 21:48:50 UTC

model	prompt set	docs	precision	recall	f1	uncertainty acc	uncertain recall	evidence support	invalid predicate	invalid arity	parse failures	overconfident	notes
qwen/qwen3.5-9b	strict_v1	2	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0	0	0	0	ok

Run 2026-04-07 21:51:04 UTC

model	prompt set	docs	precision	recall	f1	uncertainty acc	uncertain recall	evidence support	invalid predicate	invalid arity	parse failures	overconfident	notes
qwen/qwen3.5-9b	strict_v1	2	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0	0	2	0	parse_failures_present

Run 2026-04-07 22:12:24 UTC

model	prompt set	docs	precision	recall	f1	uncertainty acc	uncertain recall	evidence support	invalid predicate	invalid arity	parse failures	overconfident	notes
qwen/qwen3.5-9b	strict_v1	2	100.0%	50.0%	66.7%	50.0%	50.0%	50.0%	0	0	1	0	parse_failures_present

Run 2026-04-07 22:21:46 UTC

model	prompt set	docs	precision	recall	f1	uncertainty acc	uncertain recall	evidence support	invalid predicate	invalid arity	parse failures	overconfident	notes
qwen3.5-4b	baseline_v1	8	100.0%	44.4%	61.5%	50.0%	37.5%	50.0%	0	0	4	0	parse_failures_present
qwen3.5-4b	strict_v1	8	100.0%	44.4%	61.5%	37.5%	25.0%	37.5%	0	0	5	0	parse_failures_present
qwen/qwen3.5-9b	baseline_v1	8	100.0%	55.6%	71.4%	62.5%	50.0%	62.5%	0	0	3	0	parse_failures_present
qwen/qwen3.5-9b	strict_v1	8	100.0%	66.7%	80.0%	62.5%	37.5%	62.5%	0	0	3	0	parse_failures_present
qwen3.5-27b@q4_k_m	baseline_v1	8	100.0%	66.7%	80.0%	54.2%	12.5%	62.5%	0	0	3	2	parse_failures_present
qwen3.5-27b@q4_k_m	strict_v1	8	100.0%	77.8%	87.5%	75.0%	50.0%	75.0%	0	0	2	0	parse_failures_present