In production
Drafter vs Critic — the NCN writing loop
Two local agents per article: one drafts, one critiques. No cloud APIs — the same debate loop that publishes every headline on this site.
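As a rough illustration of the draft-then-critique idea, here is a minimal sketch. It is not the NCN implementation: the `debate_loop` function, the `Drafter`/`Critic` signatures, the `max_rounds` limit and the toy stand-ins are all hypothetical, and in a real setup each callable would wrap a call to a local model.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures (assumptions, not the site's actual interfaces).
Drafter = Callable[[str, Optional[str]], str]  # (brief, last critique) -> draft
Critic = Callable[[str], Tuple[bool, str]]     # draft -> (approved, critique)


def debate_loop(
    brief: str, drafter: Drafter, critic: Critic, max_rounds: int = 3
) -> Tuple[str, int]:
    """Alternate drafting and critiquing until approval or the round limit.

    Returns the last draft and the number of rounds used.
    """
    critique: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = drafter(brief, critique)      # revise using the previous critique
        approved, critique = critic(draft)    # critic either approves or explains
        if approved:
            return draft, round_no
    return draft, max_rounds                  # round limit hit without approval


# Toy stand-ins so the sketch runs without any model.
def toy_drafter(brief: str, critique: Optional[str]) -> str:
    return brief if critique is None else f"{brief} (revised: {critique})"


def toy_critic(draft: str) -> Tuple[bool, str]:
    if "revised" in draft:
        return True, ""
    return False, "add a source"
```

Calling `debate_loop("Headline", toy_drafter, toy_critic)` runs one rejected round and one approved round. Capping the rounds keeps a stubborn critic from stalling the run.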
NCN Labs
Reproducible benchmarks, local-only inference, zero cloud APIs. The same rigor we apply to news — applied to models.
Flagship · Local LLM Arena
Same prompts, same machine, local judge. Full rankings across coding, reasoning, tools and vision.
Experiment
Pair two local models on a controversial headline: same system prompt, opposite stances, and a human picks the winner. This calibrates which model argues better before we trust it in the pipeline.
Playbook
Quant picks (Q4 vs Q8 vs FP16), context windows, keep_alive, batching, and when a 7B beats a 32B on Strix Halo — lessons from running NCN 24/7 on Ollama.
In production
Every new post runs through qwen3.6 for full ES translation, native slugs and WebP heroes before deploy. We log failures instead of shipping English by accident.
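One way to "log failures instead of shipping English" is to treat a missing or unchanged translation as a failure and return nothing publishable. The sketch below assumes exactly that: `translate_or_log`, the `translate` callable, `post_id` and the "unchanged output means failure" check are hypothetical and not taken from the NCN pipeline.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("translate")


def translate_or_log(
    text: str, translate: Callable[[str], str], post_id: str
) -> Optional[str]:
    """Return the translation, or None (with a log entry) if it looks unusable."""
    try:
        result = translate(text)
    except Exception:
        # The model call itself failed; record the traceback and bail out.
        logger.exception("translation failed for %s", post_id)
        return None
    # Assumption: empty or unchanged output means the text was not translated.
    if not result.strip() or result.strip() == text.strip():
        logger.error("translation for %s is empty or unchanged", post_id)
        return None
    return result
```

A deploy step could then skip or flag any post for which this returns `None`, rather than publishing the source text under the translated slug.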
Benchmark
Identical prompts across quantization levels on the same GPU. Score output quality vs tokens/sec to find the sweet spot for daily inference.
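A "sweet spot" between quality and speed can be framed as a Pareto front: keep only the configurations that no other configuration matches or beats on both axes. The sketch below assumes that framing. The `Run` fields and the quality scale are hypothetical, and no benchmark numbers are implied.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    quant: str             # label such as "Q4", "Q8" or "FP16"
    quality: float         # higher is better; the scoring method is an assumption
    tokens_per_sec: float  # higher is better


def pareto_front(runs: List[Run]) -> List[Run]:
    """Keep runs that are not dominated on both quality and speed."""
    front = []
    for r in runs:
        dominated = any(
            o.quality >= r.quality
            and o.tokens_per_sec >= r.tokens_per_sec
            and (o.quality > r.quality or o.tokens_per_sec > r.tokens_per_sec)
            for o in runs
        )
        if not dominated:
            front.append(r)
    return front
```

Everything left on the front is a defensible trade-off. Picking one point still depends on how much quality a daily workload can give up for throughput.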
Coming soon
Hero images for NCN articles: same brief, blind human + VLM judge. Which local stack produces usable editorial art without Midjourney?
Coming soon
Debate rounds, token spend, gen_image latency, translate time and deploy duration — a dashboard for the full NCN cron run.
Coming soon
Partial evidence, stale chunks, wrong citations. Synthetic corpora to measure how often local models hallucinate despite having the “right” context.