A little talk on what we can learn from implementing LLM architectures from scratch in Python and PyTorch, and on how I approach new open-weight models, compare them against reference implementations, etc.
PS: New write-up in the works, featuring the latest LLM architecture design choices.
May 14 at 1:45 PM