Building GPT from scratch using Apple MLX

I recently spent some time recreating the GPT model with the Apple MLX machine learning framework, following along with Andrej Karpathy's video.

I found it to be a fun exercise and learned a ton. MLX is developed by Apple specifically for Apple Silicon, and its API largely mirrors PyTorch, with some exceptions.

I trained the model on my M4 Max MacBook Pro with 64 GB of RAM and was able to recreate it with around 30 minutes of training. Check out the code below if you are interested.

https://github.com/zsiegel/mlx-gpt
