Andrej Karpathy releases concise GPT implementation. Why has he bothered to do this: doesn't he work for OpenAI, at least indirectly? [D] [N]
It's nice to see a concise implementation of GPT in PyTorch. It's true that Hugging Face's Transformers is excellent, but it is quite difficult to trace: they are constantly building it out with loads of features, so you get lost.
I'm wondering what Andrej's reason for doing this is. His wiki states he has worked for OpenAI, and Tesla is at least affiliated with OpenAI. It's also very far from the computer vision domain, so why spend the time on an open-source implementation and make guesses about GPT-2/GPT-3?
His implementation is easy to follow, which is nice; most reimplementations I see have bugs or are unnecessarily complex. I am surprised at how basic his trainer code is: he pretty much uses the very standard stuff.
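For context, the "very standard stuff" amounts to something roughly like the plain PyTorch training loop sketched below. This is a minimal illustration on dummy data, not Karpathy's actual trainer; the tiny model and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a plain PyTorch training loop; the toy model, random data,
# and hyperparameters are placeholders, not minGPT's real trainer.
vocab_size, seq_len, batch_size = 100, 16, 8
model = nn.Sequential(nn.Embedding(vocab_size, 64), nn.Linear(64, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)

model.train()
for step in range(100):
    # random token batches as a stand-in for a real dataloader
    x = torch.randint(0, vocab_size, (batch_size, seq_len))
    y = torch.randint(0, vocab_size, (batch_size, seq_len))
    logits = model(x)                                    # (batch, seq, vocab)
    loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 1.0)    # gradient clipping
    optimizer.step()
```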
Andrej also states 'A vanilla multi-head masked self-attention layer with a projection at the end. I believe I could have just used torch.nn.MultiheadAttention but their documentation is all but absent and code ugly so I don't trust it, rolling my own here.'
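To make that concrete, a vanilla multi-head masked self-attention layer with a projection at the end can be sketched roughly as follows. The class name, dimensions, and defaults here are illustrative, not the actual minGPT code.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Sketch of vanilla multi-head masked self-attention with an output projection."""
    def __init__(self, n_embd: int = 128, n_head: int = 4, block_size: int = 64):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value in one matmul
        self.proj = nn.Linear(n_embd, n_embd)      # the projection at the end
        # lower-triangular mask so each position only attends to the past
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                  # (B, n_head, T, head_dim)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```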
I usually trust the torch team blindly to come up with a more efficient implementation than I ever could (don't they stringently test their code before releasing?), so I am also surprised by this.
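For comparison, using the built-in module with a causal mask would look roughly like the sketch below. This assumes a recent PyTorch (batch_first is a newer argument), and the attn_mask convention (True = position not allowed to attend) is exactly the kind of detail the docs make you double-check.

```python
import torch
import torch.nn as nn

# Sketch: causal self-attention via the built-in torch.nn.MultiheadAttention.
n_embd, n_head, T = 128, 4, 16
attn = nn.MultiheadAttention(embed_dim=n_embd, num_heads=n_head, batch_first=True)

x = torch.randn(2, T, n_embd)  # (batch, seq, embed)
# boolean mask where True marks future positions that may NOT be attended to
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
y, _ = attn(x, x, x, attn_mask=causal_mask, need_weights=False)
print(y.shape)  # torch.Size([2, 16, 128])
```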