2026.02.02

A Gentle Introduction to Multi-Head Attention and Grouped-Query Attention