WIP · Phase 0 proof of concept

Better European language models, aligned with European values

Open models are trained on English first and everything else second. We think we can do better: higher-quality output for European languages, better aligned with how Europeans communicate and what we value. This is our proof of concept, starting with Dutch. Let's be honest: this is a fine-tune, not a new foundation model. But a good fine-tune on the right data, properly measured, is a useful starting point.

How we measure

Two tracks: automated benchmarks and human evaluation.

EuroEval (euroeval.com) is a benchmarking framework for European languages. It tests reading comprehension, summarization, named-entity recognition (NER), and linguistic quality across 12+ languages, including Dutch. It's maintained by the Alexandra Institute, funded by the EU's TrustLLM project, and ships as a Python package, so anyone can reproduce the results.
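
For illustration, a run might look roughly like the sketch below. The `Benchmarker` interface and its arguments are assumptions based on our reading of the package documentation, so check euroeval.com for the current API before copying this:

```python
# Hedged sketch: the Benchmarker class, method name, and arguments here are
# assumed from the euroeval package docs; verify against euroeval.com.
from euroeval import Benchmarker

benchmarker = Benchmarker()

# Score a candidate model on the Dutch benchmark suite.
benchmarker.benchmark(model="<candidate-model-id>", language="nl")
```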

Arena eval: we also run a blind human comparison. The same prompt goes to both the baseline and our tuned model; Dutch speakers see the two outputs side by side, anonymized, and pick the better one. This gives us something benchmarks can't: "do real people actually prefer this model's Dutch?"
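
A minimal sketch of how one blind trial could be assembled. The function and field names are hypothetical, chosen for illustration; they are not the project's actual tooling:

```python
import random

def make_arena_trial(prompt: str, baseline_out: str, tuned_out: str) -> dict:
    """Build one blind comparison: randomize which model appears as A or B."""
    pair = [("baseline", baseline_out), ("tuned", tuned_out)]
    random.shuffle(pair)
    return {
        "prompt": prompt,
        "A": pair[0][1],
        "B": pair[1][1],
        # Kept out of the rater's view; used only to score votes later.
        "key": {"A": pair[0][0], "B": pair[1][0]},
    }

def record_vote(trial: dict, choice: str) -> str:
    """Map the rater's 'A' or 'B' pick back to the model that produced it."""
    return trial["key"][choice]
```

Aggregated over many prompts and raters, the votes give a simple win rate per model.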

Starting point

We run EuroEval on a shortlist of strong open models first; whichever scores best on Dutch becomes the base for fine-tuning.

Training data

No copyrighted books, no unauthorized subtitles. Every source gets a license check.
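
One way to make that check mechanical is an allowlist filter, sketched below. The license set and record format are illustrative assumptions, not the project's final policy:

```python
# Hypothetical allowlist: licenses we would accept without further review.
ALLOWED_LICENSES = {"cc0-1.0", "cc-by-4.0", "cc-by-sa-4.0", "apache-2.0", "mit"}

def license_ok(source: dict) -> bool:
    """Keep a source only if it declares a license on the allowlist."""
    return source.get("license", "").lower() in ALLOWED_LICENSES

sources = [
    {"name": "nl-wiki-dump", "license": "CC-BY-SA-4.0"},
    {"name": "scraped-subtitles", "license": ""},  # no verifiable license: dropped
]
kept = [s for s in sources if license_ok(s)]  # only nl-wiki-dump survives
```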

Method

QLoRA fine-tuning on EU-hosted GPUs (a single A100 is enough for a 7-9B model).
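
For a concrete picture, here is a minimal QLoRA setup sketch using Hugging Face transformers, peft, and bitsandbytes. The model id and every hyperparameter below are placeholders, not project decisions:

```python
# Sketch only: model id and hyperparameters are illustrative placeholders.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base = "<base-model-id>"  # whichever model wins the EuroEval comparison

bnb = BitsAndBytesConfig(            # 4-bit NF4 quantization: the "Q" in QLoRA
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(                   # small trainable adapters on attention projections
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically well under 1% of all weights
```

The frozen base stays in 4-bit, which is why a 7-9B model fits comfortably on a single A100.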

Cost

Realistic total: €200-500 including iteration. All compute on EU providers. Price reference: OVHcloud A100 at €2.75/hour.
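
For scale: at €2.75/hour, the €200-500 budget corresponds to roughly 70-180 A100-hours.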

Timeline

Volunteer project, not a corporate sprint.

Deliverables
