Andrej Karpathy releases concise GPT implementation. Why has he bothered to do this: doesn't he work for OpenAI, at least indirectly? [D] [N]
It's nice to see a concise implementation of GPT in PyTorch. It's true that Hugging Face's Transformers is excellent, but it is quite difficult to trace: they are constantly building it out with loads of features, so you get lost.
I'm wondering what Andrej's reason for doing this is. His wiki states he has worked for OpenAI, and Tesla is at least affiliated with OpenAI. It's also very far from the computer vision domain, so why spend the time on an open-source implementation and make guesses about GPT-2/GPT-3?
His implementation is easy to follow, which is nice; most reimplementations I see have bugs or are unnecessarily complex. I am surprised at how basic his trainer code is: he pretty much uses the very standard stuff.
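For context, the "very standard stuff" amounts to something roughly like the plain PyTorch training loop sketched below. This is a minimal illustration on dummy data, not Karpathy's actual trainer; the tiny model and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a plain PyTorch training loop; the toy model, random data,
# and hyperparameters are placeholders, not minGPT's real trainer.
vocab_size, seq_len, batch_size = 100, 16, 8
model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

model.train()
for step in range(100):
    # random token batches as a stand-in for a real dataloader
    x = torch.randint(0, vocab_size, (batch_size, seq_len))
    y = torch.randint(0, vocab_size, (batch_size, seq_len))
    logits = model(x)                                    # (batch, seq, vocab)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 1.0)    # gradient clipping
    optimizer.step()
```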
Andrej also states 'A vanilla multi-head masked self-attention layer with a projection at the end. I believe I could have just used torch.nn.MultiheadAttention but their documentation is all but absent and code ugly so I don't trust it, rolling my own here.'
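To make that concrete, a vanilla multi-head masked self-attention layer with a projection at the end can be sketched roughly as follows. The class name, dimensions, and defaults here are illustrative, not the actual minGPT code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Sketch of vanilla multi-head masked self-attention with an output projection."""
    def __init__(self, n_embd: int = 128, n_head: int = 4, block_size: int = 64):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value in one matmul
        self.proj = nn.Linear(n_embd, n_embd)      # the projection at the end
        # lower-triangular mask so each position only attends to the past
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                  # (B, n_head, T, head_dim)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```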
I usually trust the torch team blindly to come up with a more efficient implementation than I ever could (don't they stringently test their code before releasing?), so I am also surprised by this.
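For comparison, using the built-in module with a causal mask would look roughly like the sketch below. This assumes a recent PyTorch (batch_first is a newer argument), and the attn_mask convention (True = position not allowed to attend) is exactly the kind of detail the docs make you double-check.

```python
import torch
import torch.nn as nn

# Sketch: causal self-attention via the built-in torch.nn.MultiheadAttention.
n_embd, n_head, T = 128, 4, 16
attn = nn.MultiheadAttention(embed_dim=n_embd, num_heads=n_head, batch_first=True)

x = torch.randn(2, T, n_embd)  # (batch, seq, embed)
# boolean mask where True marks future positions that may NOT be attended to
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
y, _ = attn(x, x, x, attn_mask=causal_mask, need_weights=False)
print(y.shape)  # torch.Size([2, 16, 128])
```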