UnigramLM: An Attempt at Writing The Missing Manual

This post is my attempt to write down the UnigramLM tokenization algorithm cleanly and explicitly, because no such derivation appears to exist, and I think understanding the theory …