mHC: Manifold-Constrained Hyper-Connections (Paper Review)
Author: SheepML
Uploaded: 2026-01-07
Views: 441
In this video, I explain DeepSeek's latest paper, mHC: Manifold-Constrained Hyper-Connections (arXiv:2512.24880).
Hyper-Connections (HC) extended the classic residual connection paradigm by widening the residual stream and introducing learnable mixing matrices. This brought performance gains, but it also broke the identity mapping property, which causes training instability and exploding gradients at scale (signal gain of up to 3000× in 27B models).
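To make the instability concrete, here is a minimal sketch of how a composite residual mapping can amplify signals once the per-layer mixing matrix is no longer the identity. Everything in it is an illustrative assumption rather than the paper's setup: the 2×2 matrix, the depth of 60, the helper names, and the use of the maximum absolute row sum as a stand-in for signal gain.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def identity(n):
    """n x n identity matrix: the mapping a plain residual connection applies."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def max_abs_row_sum(m):
    """Hypothetical gain proxy: largest absolute row sum of the matrix."""
    return max(sum(abs(x) for x in row) for row in m)


def composite_gain(layer_matrix, depth):
    """Gain of the product of `depth` copies of the same mixing matrix."""
    total = identity(len(layer_matrix))
    for _ in range(depth):
        total = matmul(layer_matrix, total)
    return max_abs_row_sum(total)


# Plain residual stream: every layer passes the signal through unchanged.
plain_residual = identity(2)

# Hypothetical unconstrained mixing matrix whose rows sum to 1.1, not 1.
unconstrained = [[1.0, 0.1],
                 [0.1, 1.0]]

gain_identity = composite_gain(plain_residual, depth=60)
gain_unconstrained = composite_gain(unconstrained, depth=60)

print(f"identity gain after 60 layers:      {gain_identity:.1f}")
print(f"unconstrained gain after 60 layers: {gain_unconstrained:.1f}")
```

A row sum only 10% above 1 compounds to 1.1^60 across 60 layers, a gain in the hundreds, while the identity mapping stays at exactly 1.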
mHC solves this by projecting the residual connection matrices onto the Birkhoff polytope (doubly stochastic matrices) using the Sinkhorn-Knopp algorithm. This restores stable signal propagation while preserving the flexibility and performance benefits of Hyper-Connections.
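The projection step can be sketched as follows. This is a simplified illustration, not the paper's implementation: the function name `sinkhorn_knopp`, the exponential used to make entries positive, the iteration count, and the example matrix are all assumptions chosen for clarity.

```python
import math


def sinkhorn_knopp(logits, num_iters=50):
    """Approximately map a square matrix of real-valued scores to a doubly
    stochastic matrix by alternating row and column normalization.

    Assumption: entries are made strictly positive with exp() first.
    """
    n = len(logits)
    m = [[math.exp(x) for x in row] for row in logits]
    for _ in range(num_iters):
        # Normalize each row so that it sums to 1.
        normalized = []
        for row in m:
            total = sum(row)
            normalized.append([x / total for x in row])
        m = normalized
        # Normalize each column so that it sums to 1.
        col_totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_totals[j] for j in range(n)] for i in range(n)]
    return m


# Hypothetical unconstrained parameters for a 3-stream residual mixing matrix.
raw = [[0.5, -1.0, 2.0],
       [1.5, 0.0, -0.5],
       [-2.0, 1.0, 0.3]]

ds = sinkhorn_knopp(raw)
row_sums = [sum(row) for row in ds]
col_sums = [sum(ds[i][j] for i in range(3)) for j in range(3)]

print("row sums:", [round(s, 6) for s in row_sums])
print("col sums:", [round(s, 6) for s in col_sums])
```

Because every row and column of a doubly stochastic matrix sums to 1 with non-negative entries, each output stream is a convex combination of the input streams. Products of such matrices are again doubly stochastic, which is why stacking many layers does not compound the gain.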
Key takeaways:
Why standard residual connections work (identity mapping)
How Hyper-Connections break this property
The Birkhoff polytope constraint and Sinkhorn-Knopp projection
Empirical results: stable training + better downstream performance
📄 Paper: https://arxiv.org/abs/2512.24880