Attention mechanisms enable models to focus dynamically on the most relevant parts of their input, much as humans attend to specific words when reading or specific regions when looking at an image.
How It Works
Attention computes a weighted combination of input values, where the weights reflect how relevant each input is to the current context. In practice, the weights are obtained by comparing a query against a set of keys (typically via a scaled dot product followed by a softmax) and then used to combine the corresponding values, the so-called query-key-value (QKV) computation.
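As an illustration, here is a minimal sketch of scaled dot-product attention in NumPy. The function name and the toy shapes are chosen for this example and are not taken from the original text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal attention: weights = softmax(Q K^T / sqrt(d_k)), output = weights @ V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # relevance of each key to each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights                               # weighted combination of values

# Toy example: 3 query positions, 4 key/value positions, feature dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(Q, K, V)
print(output.shape)  # (3, 8): one context vector per query
print(attn.shape)    # (3, 4): one weight per query-key pair
```

In a trained model, Q, K, and V come from learned linear projections of the token embeddings rather than random data; the mechanics of the weighting are the same.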
Types
- Self-Attention: Attention within the same sequence
- Cross-Attention: Attention between two different sequences
- Multi-Head Attention: Multiple attention operations in parallel
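The sketch below, which reuses scaled_dot_product_attention from the example above, shows how these three variants differ in practice: self-attention feeds one sequence as queries, keys, and values; cross-attention takes queries from one sequence and keys/values from another; and multi-head attention runs several attention operations in parallel. Splitting the feature dimension stands in for the per-head learned projections, which are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))   # "source" sequence, 5 tokens
y = rng.normal(size=(3, 8))   # "target" sequence, 3 tokens

# Self-attention: queries, keys, and values all come from the same sequence.
self_out, _ = scaled_dot_product_attention(x, x, x)

# Cross-attention: queries come from one sequence, keys/values from another.
cross_out, _ = scaled_dot_product_attention(y, x, x)

# Multi-head (simplified): split the feature dimension into 2 heads of size 4,
# attend in each head independently, then concatenate the results.
heads = [scaled_dot_product_attention(q, k, v)[0]
         for q, k, v in zip(np.split(x, 2, axis=-1),
                            np.split(x, 2, axis=-1),
                            np.split(x, 2, axis=-1))]
multi_out = np.concatenate(heads, axis=-1)

print(self_out.shape, cross_out.shape, multi_out.shape)  # (5, 8) (3, 8) (5, 8)
```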
Related Terms
Multi-Head Attention
Running multiple attention operations in parallel with different learned projections, capturing diverse relational patterns.
Query-Key-Value
The three learned projections in attention mechanisms used to compute attention weights and outputs.
Self-Attention
A mechanism where each token attends to all other tokens in the sequence to understand contextual relationships.
Transformer
A neural network architecture introduced in 'Attention is All You Need' (2017) that relies entirely on attention mechanisms, dispensing with recurrence and convolution, and has become the foundation for modern LLMs.