nanoGPT was developed by Andrej Karpathy, the former Director of AI at Tesla.
"The simplest, fastest repository for training/finetuning medium-sized GPTs."
— Andrej Karpathy
It is basically just two files and a few hundred lines of code, all written from scratch. It is a great way to learn the ins and outs of a large language model.
So I went through the code line by line and followed the tutorial. I wanted to see why large language models (LLMs) are so magical.
Here are the things I experimented with.
Plot and animate each layer
I want to see the difference between layers when I train from scratch, fine-tune, overfit, turn all of a layer's weights to zero, etc.
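As a rough sketch of the plotting part (assuming a checkpoint saved by nanoGPT's default train.py, which stores the weights under the 'model' key; the file path and the specific parameter name here are just illustrations):

import torch
import matplotlib.pyplot as plt

# Load a nanoGPT checkpoint; train.py stores the weights under the 'model'
# key (path and key are assumptions based on the default training script).
ckpt = torch.load('out/ckpt.pt', map_location='cpu')
state_dict = ckpt['model']

# Pick one weight matrix to look at, e.g. the fused QKV projection of
# block 0 (the name follows nanoGPT's model.py; adjust for other models).
name = 'transformer.h.0.attn.c_attn.weight'
weights = state_dict[name].float().numpy()

# Draw the matrix as a heatmap; doing this for every layer (or for the
# same layer across training checkpoints) and stitching the frames
# together gives the animation.
plt.imshow(weights, cmap='viridis', aspect='auto')
plt.colorbar()
plt.title(name)
plt.savefig('layer0_attn.png')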
Remove some attention layers and generate text from the result
Basically, I want to see what happens to the generated output if I remove the last layer, the first layer, a middle layer, etc.
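Something along these lines should work (a sketch, assuming nanoGPT's GPT class from model.py, where the transformer blocks live in the model.transformer.h ModuleList; the checkpoint layout again follows the default train.py):

import torch
import torch.nn as nn
from model import GPT, GPTConfig  # nanoGPT's model.py

# Rebuild the trained model from a checkpoint (keys follow nanoGPT's
# train.py; a compiled model's keys may carry an '_orig_mod.' prefix
# that would need stripping first).
ckpt = torch.load('out/ckpt.pt', map_location='cpu')
model = GPT(GPTConfig(**ckpt['model_args']))
model.load_state_dict(ckpt['model'])
model.eval()

# Drop one transformer block, here the last one; GPT.forward simply
# iterates over model.transformer.h, so the remaining blocks still run.
drop = len(model.transformer.h) - 1
model.transformer.h = nn.ModuleList(
    b for i, b in enumerate(model.transformer.h) if i != drop
)

# Generate from the truncated model and compare against the full one.
start = torch.zeros((1, 1), dtype=torch.long)  # a dummy start token
out = model.generate(start, max_new_tokens=100)
print(out[0].tolist())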
Create an artificial language
I will write a separate blog post for this. But basically: create a language with maybe five characters [0,1,2,3,4] plus a space, and define some patterns for the characters. For example, if I see a '2', the next two characters will always be '3'; if I see '43', the next character will be a space; etc.
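A minimal sketch of such a corpus generator, with only the two rules above hard-coded and everything else uniformly random (the function name and parameters are mine, just for illustration):

import random

def generate_corpus(length=10_000, seed=0):
    """Toy language over '0'-'4' plus space with two hard rules:
    after a '2', the next two characters are always '3';
    after the bigram '43', the next character is always a space."""
    rng = random.Random(seed)
    chars = list('01234 ')
    out = []
    while len(out) < length:
        if out and out[-1] == '2':
            out += ['3', '3']                    # rule 1: '2' -> '33'
        elif len(out) >= 2 and out[-2] + out[-1] == '43':
            out.append(' ')                      # rule 2: '43' -> ' '
        else:
            out.append(rng.choice(chars))        # otherwise: random
    return ''.join(out[:length])

print(generate_corpus(200))

Training nanoGPT on a corpus like this makes it easy to check whether the model actually picks up the rules: sample from it and count how often a '2' is followed by '33' and a '43' by a space.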