Open Source Code Examples

Local Model Inference Server

This script is a simplified, open-source Python inference server that illustrates how Haiku models handle runtime inference. The code is syntax-highlighted and explained line by line, with the commentary kept separate from the implementation for readability.

Interactive Source

inference_example.py

What’s included in this example:

  • Simple tokenizer loader
  • Transformer model definition + generation
  • Thread-safe Flask inference endpoint (/api/chat)
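
The three pieces listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of inference_example.py: ToyTokenizer, ToyModel, VOCAB, chat, create_app, and the "prompt"/"reply" JSON fields are all hypothetical names, and the "model" is a stand-in that simply counts upward through the vocabulary rather than running a transformer. The sketch also assumes that thread safety is achieved with a single lock around generation, which is only one of several possible approaches.

```python
import threading


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real vocabulary loader."""

    def __init__(self, vocab):
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_token = dict(enumerate(vocab))
        self.unk_id = self.token_to_id["<unk>"]

    def encode(self, text):
        # Lowercase, split on whitespace, map unknown words to <unk>.
        return [self.token_to_id.get(tok, self.unk_id) for tok in text.lower().split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


class ToyModel:
    """Placeholder for a transformer: the next id is (last id + 1) mod vocab size."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def generate(self, ids, max_new_tokens):
        out = list(ids)
        for _ in range(max_new_tokens):
            last = out[-1] if out else 0
            out.append((last + 1) % self.vocab_size)
        # Return only the newly generated ids, not the prompt.
        return out[len(ids):]


VOCAB = ["<unk>", "hello", "world", "how", "are", "you"]
tokenizer = ToyTokenizer(VOCAB)
model = ToyModel(len(VOCAB))

# Assumption: one lock serializes access to the shared model so that
# concurrent requests never run generation at the same time.
model_lock = threading.Lock()


def chat(prompt, max_new_tokens=3):
    """Encode the prompt, generate under the lock, and decode the new tokens."""
    ids = tokenizer.encode(prompt)
    with model_lock:
        new_ids = model.generate(ids, max_new_tokens)
    return tokenizer.decode(new_ids)


def create_app():
    """Build the Flask app; Flask is imported here so chat() works without it."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/api/chat", methods=["POST"])
    def api_chat():
        payload = request.get_json(silent=True) or {}
        prompt = payload.get("prompt", "")
        if not isinstance(prompt, str) or not prompt.strip():
            return jsonify({"error": "field 'prompt' must be a non-empty string"}), 400
        return jsonify({"reply": chat(prompt)})

    return app
```

To try the sketch locally, one option is to call create_app().run(host="127.0.0.1", port=8000, threaded=True) and POST a JSON body such as {"prompt": "hello"} to /api/chat. A single lock keeps the example simple but serializes all generation; a real server might batch requests or use a worker queue instead.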

What’s not included:

  • All training code
  • Any training server endpoints