Model Architectures

Attention mechanisms let a model dynamically focus on the most relevant parts of its input, much as a person concentrates on particular words when reading or particular regions when looking at an image.

How It Works

Attention computes a weighted combination of input values, where the weights reflect how relevant each input is to the current context. This is typically done with query-key-value computations: every query is compared against every key to produce relevance scores, the scores are normalized with a softmax, and the resulting weights are used to combine the values.
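
To make the query-key-value computation concrete, here is a minimal NumPy sketch of scaled dot-product attention; the function name, array shapes, and toy data are illustrative assumptions rather than anything specified in this entry.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q: (num_queries, d_k), K: (num_keys, d_k), V: (num_keys, d_v).
        d_k = Q.shape[-1]
        # Relevance score of every key for every query, scaled for stability.
        scores = Q @ K.T / np.sqrt(d_k)
        # Softmax turns each row of scores into weights that sum to 1.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        # The output is a weighted combination of the values.
        return weights @ V

    # Toy usage: self-attention, so queries, keys, and values share one source.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))   # 4 tokens, 8-dimensional representations
    print(scaled_dot_product_attention(x, x, x).shape)   # (4, 8)

Scaling by the square root of the key dimension keeps the dot products from growing with dimensionality, which would otherwise saturate the softmax.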

Types

  • Self-Attention: queries, keys, and values all come from the same sequence, so each position attends to the other positions in that sequence
  • Cross-Attention: queries come from one sequence while keys and values come from another, for example a decoder attending to encoder outputs
  • Multi-Head Attention: several attention operations run in parallel on different learned projections and their outputs are concatenated (see the sketch after this list)
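
The variants above differ only in where the queries, keys, and values come from. Below is a minimal NumPy sketch of a multi-head layer, called once as self-attention and once as cross-attention; the weight matrices, dimensions, and variable names are illustrative assumptions rather than anything specified in this entry.

    import numpy as np

    def attention(Q, K, V):
        # Scaled dot-product attention, as in the earlier sketch.
        scores = Q @ K.T / np.sqrt(Q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        return (weights / weights.sum(axis=-1, keepdims=True)) @ V

    def multi_head_attention(x_q, x_kv, num_heads, W_q, W_k, W_v, W_o):
        # x_q supplies the queries; x_kv supplies the keys and values.
        d_head = x_q.shape[-1] // num_heads
        Q, K, V = x_q @ W_q, x_kv @ W_k, x_kv @ W_v
        # Each head attends independently on its own slice of the projections.
        heads = [
            attention(Q[:, h * d_head:(h + 1) * d_head],
                      K[:, h * d_head:(h + 1) * d_head],
                      V[:, h * d_head:(h + 1) * d_head])
            for h in range(num_heads)
        ]
        return np.concatenate(heads, axis=-1) @ W_o

    rng = np.random.default_rng(0)
    d_model, num_heads = 16, 4
    W_q, W_k, W_v, W_o = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
    src = rng.normal(size=(6, d_model))   # e.g. an encoder's output sequence
    tgt = rng.normal(size=(3, d_model))   # e.g. a decoder's current sequence

    self_out = multi_head_attention(tgt, tgt, num_heads, W_q, W_k, W_v, W_o)    # self-attention
    cross_out = multi_head_attention(tgt, src, num_heads, W_q, W_k, W_v, W_o)   # cross-attention
    print(self_out.shape, cross_out.shape)   # (3, 16) (3, 16)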

Tags

architecture mechanism nlp

Added: January 15, 2025