[Team Seminar] Why do LLMs attend to the first token? (AI/NLP, 2025. 5. 10. 16:22)
This was a paper I had been curious about anyway, so it was nice that a teammate happened to review it for the team seminar.

https://arxiv.org/abs/2504.02732

From the abstract: "Large Language Models (LLMs) tend to attend heavily to the first token in the sequence -- creating a so-called attention sink. Many works have studied this phenomenon in detail, proposing various ways to either leverage or alleviate.."
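A quick sketch (not from the paper or the review; the model name "gpt2" and the averaging choices are just assumptions for illustration) of how one might see the attention sink for themselves with Hugging Face transformers: print, per layer, the average attention mass each query position places on the first token of the input.

```python
# Minimal sketch of measuring "attention sink" mass on the first token.
# Assumptions: "gpt2" as an example model, mean over batch/heads/query positions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative choice; any causal LM that returns attentions works similarly
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Large Language Models tend to attend heavily to the first token in the sequence."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer, each shaped
# (batch, heads, query_len, key_len); key index 0 is the first token.
for layer_idx, attn in enumerate(out.attentions):
    sink_mass = attn[..., 0].mean().item()  # average attention paid to the first token
    print(f"layer {layer_idx:2d}: mean attention on first token = {sink_mass:.3f}")
```

If the sink behavior the paper describes shows up, the printed fraction should be far above the uniform baseline (1 / sequence length) in many layers.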