I wanted to make a random word generator to help write stories. After all, naming is hard. What's a convincing name for a new city? Or a person? Or a food?
I ended up with a hash-table approach to figuring out probable letter combinations. You can feed the program a file, say a list of French words, and it will work out which letter combinations occur, along with the probability of each.
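To make the hash-table idea concrete, here's a minimal Python sketch. It is not the actual Wizardly Words code: the function names, the `^` and `$` boundary markers, and the tiny French word list are all assumptions made for illustration. Each key is a short run of letters (the context), and each value counts the letters seen right after it.

```python
from collections import Counter, defaultdict

# Hypothetical boundary markers (not from the original project).
START, END = "^", "$"

def build_table(words, context_size=3):
    """Map each context (the preceding letters) to counts of the next letter."""
    table = defaultdict(Counter)
    for word in words:
        padded = START * context_size + word.lower() + END
        for i in range(context_size, len(padded)):
            context = padded[i - context_size:i]
            table[context][padded[i]] += 1
    return table

def probabilities(table, context):
    """Turn the raw counts for one context into probabilities."""
    counts = table.get(context, Counter())
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

# Stand-in for a file of French words.
words = ["chat", "chien", "cheval"]
table = build_table(words, context_size=2)
print(probabilities(table, "ch"))
```

In this toy list, "ch" is followed once each by "a", "i", and "e", so each of those letters gets a probability of one third.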
There are some parallels with large language models, which is why I named the actual method [[Small Character Model]]: like an LLM, it uses a context window to pick a likely next letter. For comparison (at the time of this writing), ChatGPT uses a context window of 128k tokens to determine the next token. Wizardly Words uses fewer than 10 letters. Yes, I know, not quite as beefy as ChatGPT, but it also doesn't cost a reported $700,000 a day to run. In fact, it's... free! 💸
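Here's a rough sketch of what "use the context window to pick the next letter" could look like in Python. Again, this is an illustration under my own assumptions rather than the project's real code: `build_table`, `generate`, the boundary markers, and the `max_length` cap are all hypothetical, and it assumes a context size of at least 1.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers (not from the original project).
START, END = "^", "$"

def build_table(words, context_size):
    """Map each context to counts of the letter that follows it."""
    table = defaultdict(Counter)
    for word in words:
        padded = START * context_size + word.lower() + END
        for i in range(context_size, len(padded)):
            table[padded[i - context_size:i]][padded[i]] += 1
    return table

def generate(table, context_size, rng, max_length=12):
    """Build a word one letter at a time, sampling by observed frequency."""
    context = START * context_size
    letters = []
    while len(letters) < max_length:
        counts = table[context]
        next_letter = rng.choices(list(counts), weights=list(counts.values()))[0]
        if next_letter == END:
            break
        letters.append(next_letter)
        # Slide the window: keep only the last `context_size` characters.
        context = (context + next_letter)[-context_size:]
    return "".join(letters)

words = ["chat", "chien", "cheval"]
table = build_table(words, context_size=2)
print(generate(table, 2, random.Random()))
```

With a word list this small, every context leads down a single path, so the sketch can only spit out one of the three input words. A real dictionary gives the sampler many more branches to wander down.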
Back to how it works: context matters, but it's also important not to overfit. With a context of 5 letters instead of 3, we run the risk of creating words that sound too familiar, because a longer context tends to reproduce the source words. Here's the difference between words based on the English dictionary that use a context of 3 letters vs. 5 letters:
| 3 Character Context | 5 Character Context |
| ------------------- | ------------------- |
| strate | parcel |
| potem | cale |
| votingl | talist |
| omicrom | fade |
| negapost | ventra |
| twent | stimulale |
| scro | adonize |
| intraino | squa |
| sumper | butter |
| aire | encourage |
You'll notice that the 5-character context comes up with more "realistic" sounding words, but more often lands on words that already exist. The 3-character context produces words that sound less cohesive, but it also allows for more interesting variants.
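One way to put a number on that trade-off is to generate a batch of words at each context size and count how many already appear in the source list. The Python sketch below does that. It rests on my own assumptions (the helper names, the boundary markers, the eight-word list, and the sample count are all hypothetical), and it isn't how Wizardly Words measures anything.

```python
import random
from collections import Counter, defaultdict

# Hypothetical boundary markers (not from the original project).
START, END = "^", "$"

def build_table(words, k):
    """Map each k-letter context to counts of the letter that follows it."""
    table = defaultdict(Counter)
    for word in words:
        padded = START * k + word + END
        for i in range(k, len(padded)):
            table[padded[i - k:i]][padded[i]] += 1
    return table

def generate(table, k, rng, max_length=20):
    """Sample one word, letter by letter, from the table."""
    context, letters = START * k, []
    while len(letters) < max_length:
        counts = table[context]
        letter = rng.choices(list(counts), weights=list(counts.values()))[0]
        if letter == END:
            break
        letters.append(letter)
        context = (context + letter)[-k:]
    return "".join(letters)

def known_word_rate(words, k, samples=500, seed=0):
    """Fraction of generated words that already exist in the source list."""
    rng = random.Random(seed)
    table = build_table(words, k)
    known = set(words)
    hits = sum(generate(table, k, rng) in known for _ in range(samples))
    return hits / samples

words = ["strategy", "strange", "parcel", "parade",
         "butter", "button", "votes", "voting"]
for k in (1, 3, 5, 8):
    print(k, known_word_rate(words, k))
```

Once the context is as long as the longest word in the list (8 letters here), the sketch can only replay words it has already seen, which is overfitting taken to its extreme. Shorter contexts leave room for new combinations.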
If you're interested, you can clone [the repo on my GitHub](https://github.com/joshgrant/WizardlyWords).