Haiku Mini Tokenizer

Full BPE tokenizer used by Haiku Mini models.

haiku_50kbpe.json

About this tokenizer

This is the 50,000-token Byte Pair Encoding (BPE) vocabulary used across the Haiku Mini model line. It is a general-purpose tokenizer: the vocabulary covers a broad range of common words, subwords, and high-frequency phrases, so everyday text encodes efficiently.
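To illustrate how a BPE vocabulary turns text into subword tokens, here is a minimal sketch in Python. It assumes a tiny, made-up merge table (`TOY_MERGES`) and a hypothetical helper `bpe_encode`; it does not read haiku_50kbpe.json, whose internal layout is not described here, and it is not the Haiku Mini implementation.

```python
def bpe_encode(word, merge_ranks):
    """Split a word into characters, then repeatedly apply the best-ranked merge."""
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Choose the adjacent pair with the lowest rank (the merge learned earliest).
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no known merge applies to any remaining pair
        merged = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge table for illustration only; a real vocabulary has tens of thousands of entries.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]
TOY_RANKS = {pair: rank for rank, pair in enumerate(TOY_MERGES)}

tokens = bpe_encode("lower", TOY_RANKS)
print(tokens)  # ['low', 'er']
```

With this toy table, a frequent word like "lower" collapses into two subword tokens, while a word with no applicable merges stays as individual characters. A larger vocabulary has more merges to apply, which is why common text tends to encode into fewer tokens.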