Introducing Cormsor 1.0 — a Turkish-native language model
Cormsor 1.0 is a 128M-parameter Turkish language model, with a tokenizer and weights trained from scratch on 15 million Turkish sentences. It will soon be available here for live public testing.
Most Turkish LLM usage today routes through English-first foundation models. Their tokenizers were built on English-dominant corpora, so Turkish's agglutinative morphology gets fragmented into inefficient subwords, wasting context, inflating cost, and often missing idiomatic nuance.
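The fragmentation problem can be sketched with a single word. The example below is illustrative only: the morpheme segmentation is hand-annotated, and the English-biased split shown is a hypothetical vocabulary outcome, not the output of any real tokenizer.

```python
# Illustrative sketch: why agglutinative Turkish words fragment badly
# under English-biased subword vocabularies.

word = "evlerimizden"  # Turkish for "from our houses"

# Morpheme-aligned segmentation: root + plural + first-person-plural
# possessive + ablative case. Four meaningful units.
morphemes = ["ev", "ler", "imiz", "den"]
assert "".join(morphemes) == word

# The same meaning in English spreads across four separate words, each of
# which an English-trained tokenizer typically covers in about one token:
english_words = "from our houses".split()

# An English-dominant BPE vocabulary has no entries for Turkish suffixes,
# so it tends to split the single Turkish word into more, semantically
# arbitrary pieces (this particular split is hypothetical):
biased_split = ["ev", "l", "er", "im", "iz", "den"]
assert "".join(biased_split) == word

print(f"morpheme-aligned pieces: {len(morphemes)}")
print(f"english-biased pieces:   {len(biased_split)}")
```

A tokenizer trained on Turkish text can learn the productive suffixes (`-ler`, `-imiz`, `-den`) as single vocabulary entries, so each inflected word costs only a few context tokens instead of many.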
Cormsor 1.0 is a focused step in the other direction: a compact transformer sized for fast inference and fine-tuning, with a tokenizer and training distribution built from scratch on Turkish text. It is not a frontier model — it is a purpose-built Turkish foundation intended as infrastructure for retrieval, classification, and downstream chat applications that demand low latency and native language fluency.
A public test endpoint will be published on this page. The first wave of user prompts and error reports will shape the Cormsor 2.0 training run.