nanoGPT was developed by Andrej Karpathy, the former Director of AI at Tesla.
"The simplest, fastest repository for training/finetuning medium-sized GPTs."
— Andrej Karpathy
It is basically just two files and a few hundred lines of code, all written from scratch. It is a great way to learn the ins and outs of a large language model.
So I went through the code line by line and followed the tutorial. I wanted to see why large language models (LLMs) are so magical.
Here are the things I experimented with.
Plot and animate each layer
I want to see the difference between layers when I train from scratch, fine-tune, overfit, turn all of a layer's weights to zero, etc.
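As a rough sketch of the plotting part (assuming a checkpoint saved by nanoGPT's default train.py, which stores the weights under the 'model' key; the file path and the specific parameter name here are just illustrations):

import torch
import matplotlib.pyplot as plt

# Load a nanoGPT checkpoint; train.py stores the weights under the 'model'
# key (path and key are assumptions based on the default training script).
ckpt = torch.load('out/ckpt.pt', map_location='cpu')
state_dict = ckpt['model']

# Pick one weight matrix to look at, e.g. the fused QKV projection of
# block 0 (the name follows nanoGPT's model.py; adjust for other models).
name = 'transformer.h.0.attn.c_attn.weight'
weights = state_dict[name].float().numpy()

# Draw the matrix as a heatmap; doing this for every layer (or for the
# same layer across training checkpoints) and stitching the frames
# together gives the animation.
plt.imshow(weights, cmap='viridis', aspect='auto')
plt.colorbar()
plt.title(name)
plt.savefig('layer0_attn.png')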
Remove some attention layers and generate text from the result
Basically, I want to see what happens to the generated output if I remove the last layer, the first layer, a middle layer, etc.
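Something along these lines should work (a sketch, assuming nanoGPT's GPT class from model.py, where the transformer blocks live in the model.transformer.h ModuleList; the checkpoint layout again follows the default train.py):

import torch
import torch.nn as nn
from model import GPT, GPTConfig  # nanoGPT's model.py

# Rebuild the trained model from a checkpoint (keys follow nanoGPT's
# train.py; a compiled model's keys may carry an '_orig_mod.' prefix
# that would need stripping first).
ckpt = torch.load('out/ckpt.pt', map_location='cpu')
model = GPT(GPTConfig(**ckpt['model_args']))
model.load_state_dict(ckpt['model'])
model.eval()

# Drop one transformer block, here the last one; GPT.forward simply
# iterates over model.transformer.h, so the remaining blocks still run.
drop = len(model.transformer.h) - 1
model.transformer.h = nn.ModuleList(
    b for i, b in enumerate(model.transformer.h) if i != drop
)

# Generate from the truncated model and compare against the full one.
start = torch.zeros((1, 1), dtype=torch.long)  # a dummy start token
out = model.generate(start, max_new_tokens=100)
print(out[0].tolist())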
Create an artificial language
I will write a separate blog post for this. But basically: create a language with maybe five characters [0,1,2,3,4] plus a space, and define some patterns for the characters. For example, if I see a '2', the next two characters will always be '3'; if I see '43', the next character will be a space; etc.
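A minimal sketch of such a corpus generator, with only the two rules above hard-coded and everything else uniformly random (the function name and parameters are mine, just for illustration):

import random

def generate_corpus(length=10_000, seed=0):
    """Toy language over '0'-'4' plus space with two hard rules:
    after a '2', the next two characters are always '3';
    after the bigram '43', the next character is always a space."""
    rng = random.Random(seed)
    chars = list('01234 ')
    out = []
    while len(out) < length:
        if out and out[-1] == '2':
            out += ['3', '3']                    # rule 1: '2' -> '33'
        elif len(out) >= 2 and out[-2] + out[-1] == '43':
            out.append(' ')                      # rule 2: '43' -> ' '
        else:
            out.append(rng.choice(chars))        # otherwise: random
    return ''.join(out[:length])

print(generate_corpus(200))

Training nanoGPT on a corpus like this makes it easy to check whether the model actually picks up the rules: sample from it and count how often a '2' is followed by '33' and a '43' by a space.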