In this issue of LLM Watch:
A potential alternative to MLPs
Not all layers are created equal
Going beyond single-token prediction