_rootcomputer

Research

Rootcomputer builds small language models from the ground up — architecture, training software, corpora, and alignment — to understand how these systems work and how to make them reliable.

Small Models First

Most of our work happens below 7 billion parameters. We think small models are undervalued as research tools — they train fast, fail legibly, and let you test ideas in days instead of weeks.

Our smallest model, Haiku Mini (~296M parameters), serves as a baseline for every architectural and training decision we make. Ideas that work at Haiku scale get promoted to larger runs. Ideas that don't work are caught early.

This isn't a philosophical stance against scale. It's a practical choice: constraints sharpen experiments.

Training Software & Infrastructure

We write our own training code. Our pipelines handle pretraining, supervised fine-tuning, and alignment end-to-end, with fine-grained control over optimization schedules, data mixing, and curriculum design.

Custom transformer training and evaluation pipelines
Streaming data loaders with weighted multi-source interleaving
Multi-phase training support (pretrain → SFT → DPO)
Designed to run on limited hardware without sacrificing reproducibility

Building the tooling ourselves means we can change anything at any layer of the stack — architecture, data pipeline, optimization — and trace the effect on downstream behavior.
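As an illustration of the weighted multi-source interleaving mentioned above, here is a minimal sketch. It is not Rootcomputer's actual loader: the function name `interleave_weighted`, its parameters, and the policy of dropping a source once it is exhausted are all assumptions made for the example.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple


def interleave_weighted(
    sources: Dict[str, Iterable[str]],
    weights: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (source_name, document) pairs, picking the source by weight.

    Hypothetical policy: a source is dropped once it runs out, and the
    remaining sources keep their relative weights. Iteration stops when
    every source is exhausted. Weights are assumed to be positive.
    """
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    iterators = {name: iter(docs) for name, docs in sources.items()}
    while iterators:
        names = list(iterators)
        name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        try:
            yield name, next(iterators[name])
        except StopIteration:
            # This source is empty; remove it and sample again.
            del iterators[name]
```

Because each source is consumed lazily through an iterator, the same function works on generators that stream documents from disk. Seeding the random generator is one simple way to make the resulting order repeatable across runs.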

Curated Corpora & Data Design

We treat training data as a first-class research problem, not a commodity input.

Rootcomputer assembles and curates multi-source pretraining corpora with explicit control over domain composition, quality filtering, and mixing ratios. Our current pretraining mix draws from FineWeb, PubMed Central, Project Gutenberg, Common Crawl News, arXiv, Wikipedia, and PubMed Abstracts — each weighted to balance broad language coverage with domain-specific depth.

Corpus cleaning pipelines with extensive regex-based boilerplate removal
Weighted multi-file interleaving with document-level shuffle buffers
Supervised fine-tuning datasets built locally with deduplication and quality controls
Alignment data designed for behavioral constraint without sycophancy

What a model sees during training shapes everything it does afterward. We invest accordingly.
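The cleaning, deduplication, and shuffle-buffer steps listed above can be sketched roughly as follows. Everything here is illustrative rather than a description of the actual pipeline: the patterns in `BOILERPLATE_PATTERNS` are hypothetical examples, deduplication is exact-match on normalized text (the source does not say which method is used), and the function names are invented for the sketch.

```python
import hashlib
import random
import re
from typing import Iterable, Iterator

# Hypothetical boilerplate patterns; a real pipeline would carry many more.
_FLAGS = re.IGNORECASE | re.MULTILINE
BOILERPLATE_PATTERNS = [
    re.compile(r"^[ \t]*(share this|log in|sign up|subscribe to)\b.*$", _FLAGS),
    re.compile(r"^[ \t]*(copyright|©).*$", _FLAGS),
    re.compile(r"^[ \t]*page \d+ of \d+[ \t]*$", _FLAGS),
]


def clean_document(text: str) -> str:
    """Blank out lines matching a boilerplate pattern, then collapse blank runs."""
    for pattern in BOILERPLATE_PATTERNS:
        text = pattern.sub("", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def deduplicate(docs: Iterable[str]) -> Iterator[str]:
    """Drop exact duplicates after case and whitespace normalization."""
    seen = set()
    for doc in docs:
        normalized = " ".join(doc.lower().split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            yield doc


def shuffle_buffer(docs: Iterable[str], buffer_size: int, seed: int = 0) -> Iterator[str]:
    """Approximate streaming shuffle: once the buffer is full, emit a random entry."""
    rng = random.Random(seed)
    buffer = []
    for doc in docs:
        buffer.append(doc)
        if len(buffer) >= buffer_size:
            idx = rng.randrange(len(buffer))
            # Swap the chosen document to the end so pop() is O(1).
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Flush whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer
```

The three functions compose as generators, for example `shuffle_buffer(deduplicate(clean_document(d) for d in docs), buffer_size=10_000)`, so documents can be processed without loading a whole corpus into memory. A larger buffer gives a more thorough shuffle at the cost of more memory.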

Specialist & Narrow Models

Not every problem needs a general-purpose model. We actively develop specialist models trained for constrained domains and specific tasks.

These models trade breadth for reliability:

Lower hallucination rates within their target domain
Stronger behavioral guarantees through scope limitation
Clear, measurable evaluation criteria
Practical deployment at lower compute cost

We think specialization is one of the more underexplored paths toward safe and dependable AI systems.

AI Theory, Intelligence, and the Mind

Alongside engineering work, Rootcomputer engages with foundational questions about intelligence, cognition, and what it means to build systems that process language.

Our thinking draws from computer science, cognitive science, neuroscience, and philosophy:

What constitutes intelligence in artificial systems, and how we would recognize it
The relationship between language modeling and reasoning
Emergent behavior and internal representation in trained networks
The limits of simulation, understanding, and machine cognition

We treat topics like consciousness and understanding as open scientific questions: worth investigating carefully, and not to be claimed prematurely.

Alignment, Safety, and Constitutional AI

Safety isn't a post-training patch. We study alignment as a property that emerges from architecture, training data, and optimization together.

Constitutional approaches to behavioral constraint during training
Failure mode discovery through systematic probing at small scale
How alignment pressure changes as models grow
Long-horizon interaction safety and drift detection

The goal is to build systems that are predictable, understandable, and aligned with human values — starting from the architecture up.