Transformer (Part 3: Transformer Architecture)

Encoder & Decoder

The Transformer consists of two main parts: an encoder and a decoder. The two are connected by cross-attention, through which the decoder attends to the encoder's output.

  • Encoder: Processes the input sequence using multiple layers of self-attention and feed-forward networks.
  • Decoder: Takes the encoder’s output and generates the target sequence using self-attention and cross-attention mechanisms (see the sketch below).
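
As a minimal sketch of how these two parts fit together, the PyTorch snippet below wires a 6-layer encoder to a 6-layer decoder using the library's built-in nn.Transformer module. The tensor shapes and hyperparameters here are illustrative only (d_model = 512 and 6 layers match the original paper, but the batch size and sequence lengths are arbitrary).

```python
import torch
import torch.nn as nn

d_model = 512  # model width; 512 is the value used in the original paper

# A stack of encoder layers (self-attention + feed-forward) and decoder layers
# (self-attention + cross-attention + feed-forward), wired together so that the
# decoder's cross-attention reads the encoder's output.
model = nn.Transformer(
    d_model=d_model,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    batch_first=True,
)

src = torch.randn(2, 10, d_model)  # (batch, source length, d_model)
tgt = torch.randn(2, 7, d_model)   # (batch, target length, d_model)

memory = model.encoder(src)        # encoder output, consumed by cross-attention
out = model.decoder(tgt, memory)   # decoder attends to itself and to `memory`
print(out.shape)                   # torch.Size([2, 7, 512])
```

In practice the decoder is also given a causal mask so each position can only attend to earlier target positions, and the inputs would be token embeddings plus positional encodings rather than random tensors.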

The full Transformer architecture diagram appears in the original paper, "Attention Is All You Need": https://arxiv.org/pdf/1706.03762
