Building GPT from scratch using Apple MLX
I recently spent some time recreating the GPT model, following along with Andrej Karpathy's video, using Apple's MLX machine learning framework.
I found it a fun exercise and learned a ton. MLX is designed specifically for Apple Silicon, and its API largely mirrors PyTorch, with some exceptions.
I trained the model on my M4 Max MacBook Pro with 64GB of RAM and was able to recreate the model with around 30 minutes of training. Check out the code below if you are interested.
https://github.com/zsiegel/mlx-gpt