Rootcomputer builds small language models from the ground up — architecture, training software, corpora, and alignment — to understand how these systems work and how to make them reliable.
Most of our work happens below 7 billion parameters. We think small models are undervalued as research tools — they train fast, fail legibly, and let you test ideas in days instead of weeks.
Our smallest model, Haiku Mini (~296M parameters), serves as a baseline for every architectural and training decision we make. Ideas that work at Haiku scale get promoted to larger runs; ideas that don't are caught early.
This isn't a philosophical stance against scale. It's a practical choice: constraints sharpen experiments.
We write our own training code. Our pipelines handle pretraining, supervised fine-tuning, and alignment end-to-end, with fine-grained control over optimization schedules, data mixing, and curriculum design.
Building the tooling ourselves means we can change anything at any layer of the stack — architecture, data pipeline, optimization — and trace the effect on downstream behavior.
We treat training data as a first-class research problem, not a commodity input.
Rootcomputer assembles and curates multi-source pretraining corpora with explicit control over domain composition, quality filtering, and mixing ratios. Our current pretraining mix draws from FineWeb, PubMed Central, Project Gutenberg, Common Crawl News, arXiv, Wikipedia, and PubMed Abstracts — each weighted to balance broad language coverage with domain-specific depth.
What a model sees during training shapes everything it does afterward. We invest accordingly.
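As a rough illustration of what explicit control over mixing ratios can look like, here is a minimal Python sketch that samples a source for each training document according to per-source weights. The source names are the ones listed above. The weight values, the `MIX_WEIGHTS` name, and the `normalize` and `sample_sources` helpers are hypothetical placeholders chosen for this example. They are not Rootcomputer's actual ratios or pipeline code.

```python
import random
from collections import Counter

# Source names come from the text above. The weights are made-up
# placeholders for illustration, NOT the actual mixing ratios.
MIX_WEIGHTS = {
    "FineWeb": 0.50,
    "PubMed Central": 0.10,
    "Project Gutenberg": 0.05,
    "Common Crawl News": 0.10,
    "arXiv": 0.10,
    "Wikipedia": 0.10,
    "PubMed Abstracts": 0.05,
}


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def sample_sources(weights, n, seed=0):
    """Draw n source names, each with probability equal to its normalized weight."""
    rng = random.Random(seed)  # seeded so a given mix is reproducible
    probs = normalize(weights)
    names = list(probs)
    return rng.choices(names, weights=[probs[s] for s in names], k=n)


if __name__ == "__main__":
    draws = sample_sources(MIX_WEIGHTS, n=10_000, seed=0)
    counts = Counter(draws)
    for name in MIX_WEIGHTS:
        print(f"{name:20s} {counts[name] / len(draws):.3f}")
```

Changing one entry in the weight table and re-running with the same seed is one simple way to isolate the effect of a single domain on the resulting mix. A real pipeline would more likely weight by tokens than by documents.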
Not every problem needs a general-purpose model. We actively develop specialist models trained for constrained domains and specific tasks.
These models trade breadth for reliability.
We think specialization is one of the more underexplored paths toward safe and dependable AI systems.
Alongside engineering work, Rootcomputer engages with foundational questions about intelligence, cognition, and what it means to build systems that process language.
Our thinking draws from computer science, cognitive science, neuroscience, and philosophy.
We treat topics like consciousness and understanding as open scientific questions — worth investigating carefully, not worth claiming prematurely.
Safety isn't a post-training patch. We study alignment as something that emerges from architecture, training data, and optimization.
The goal is to build systems that are predictable, understandable, and aligned with human values — starting from the architecture up.