Beyond LayerNorm: Introducing Derf for Normalization-Free Transformers
Author: PaperLens
Uploaded: 2025-12-22
Views: 11
Dynamic erf (Derf) is a simple point-wise function designed to replace normalization layers such as LayerNorm and RMSNorm in Transformers. The research, by Mingzhi Chen, Taiming Lu, Jiachen Zhu, Mingjie Sun, and Zhuang Liu (from Princeton, NYU, and CMU), reports that Derf consistently outperforms standard normalization across vision, speech, and DNA modeling, and attributes the gains to improved model generalization.
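To make "point-wise" concrete, here is a minimal sketch in plain Python. The description does not give the formula, so the form used here, y = gamma * erf(alpha * x + shift) + beta, is an assumption, and the names `derf`, `alpha`, `shift`, `gamma`, and `beta` are hypothetical. A simplified layer normalization (no learned scale or bias) is included only for contrast.

```python
import math


def derf(xs, alpha=0.5, shift=0.0, gamma=1.0, beta=0.0):
    """Apply an assumed Derf-style map to each element independently.

    Assumed form (not stated in the description):
        y = gamma * erf(alpha * x + shift) + beta
    In a trained model these four values would be learned parameters;
    here they are fixed illustrative constants.
    """
    return [gamma * math.erf(alpha * x + shift) + beta for x in xs]


def layer_norm(xs, eps=1e-5):
    """Simplified layer normalization over one vector (no learned affine)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


# Same first element, different second element.
a = [1.0, 100.0]
b = [1.0, -3.0]

# Point-wise: the first output depends only on the first input.
print(derf(a)[0] == derf(b)[0])

# Normalization: the first output changes because mean and variance change.
print(layer_norm(a)[0] == layer_norm(b)[0])
```

The contrast is the point of the sketch: a point-wise function needs no statistics over the vector, whereas a normalization layer must compute a mean and variance before it can produce any output.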