Open Source Code Examples

Local Model Inference Server

This script is a simplified, open-source Python inference server that illustrates how Haiku models handle inference at runtime. The source is syntax-highlighted and annotated line by line, with the commentary kept separate from the implementation for readability.

inference_example.py

What’s included in this example:

Simple tokenizer loader
Transformer model definition + generation
Thread-safe Flask inference endpoint (/api/chat)
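
The three components listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of inference_example.py: ToyTokenizer, EchoModel, handle_chat, and the JSON field names "prompt", "max_new_tokens", and "reply" are all hypothetical. The Flask wiring is left out so the sketch runs on the standard library alone; handle_chat stands in for the body of a POST /api/chat view, and a single lock serializes access to the shared model.

```python
import threading


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for the tokenizer loader."""

    def __init__(self, vocab):
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}
        self.unk_id = self.token_to_id["<unk>"]

    def encode(self, text):
        # Unknown words map to the <unk> id.
        return [self.token_to_id.get(t, self.unk_id) for t in text.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


class EchoModel:
    """Hypothetical stub for the transformer: repeats the last prompt token."""

    def generate(self, input_ids, max_new_tokens):
        if not input_ids:
            return []
        return [input_ids[-1]] * max_new_tokens


TOKENIZER = ToyTokenizer(["<unk>", "hello", "world"])
MODEL = EchoModel()
MODEL_LOCK = threading.Lock()  # one request at a time may call the model


def handle_chat(payload):
    """Validate a request payload and return (response_dict, http_status)."""
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt.strip():
        return {"error": "prompt must be a non-empty string"}, 400

    max_new_tokens = payload.get("max_new_tokens", 2)
    if not isinstance(max_new_tokens, int) or max_new_tokens < 1:
        return {"error": "max_new_tokens must be a positive integer"}, 400

    input_ids = TOKENIZER.encode(prompt)
    # Tokenization happens outside the lock; only generation is serialized.
    with MODEL_LOCK:
        output_ids = MODEL.generate(input_ids, max_new_tokens)
    return {"reply": TOKENIZER.decode(output_ids)}, 200
```

In a Flask application, a view registered for /api/chat could pass the parsed JSON body to handle_chat and return the resulting dictionary and status code. Holding the lock only around generate keeps request parsing and tokenization concurrent while still protecting the shared model.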

What’s not included:

All training code
Any training server endpoints