The lifecycle of an LLM — pre-training, fine-tuning, alignment, and how inference works when you send a prompt
**Pre-training:** The model reads trillions of tokens and learns to predict the next token. Cost: $10-50M. Duration: weeks.
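The pre-training objective can be stated very compactly: at every position in the text, the target is simply the token that comes next. A minimal sketch of how raw text turns into (context, target) training pairs, using a toy word-level token list rather than a real tokenizer:

```python
# Illustrative sketch, not any specific model's pipeline: pre-training data
# is just the text shifted by one position.
def make_training_pairs(tokens):
    """For each position i, the input is tokens[:i+1] and the target is tokens[i+1]."""
    return [(tokens[:i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]

tokens = ["The", "cat", "sat", "on", "the", "mat"]
for context, target in make_training_pairs(tokens):
    print(context, "->", target)
```

A real run does this over trillions of tokens and trains a neural network to assign high probability to each target; the data preparation, though, really is this simple shift.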
**Supervised fine-tuning:** The model learns to follow instructions from human-written example conversations. Cost: $10-100K.
**Alignment (RLHF):** Humans rank model outputs by quality, and the model learns to be helpful, harmless, and honest. Cost: $5-50K.
**Inference.** When you send a prompt:

1. The prompt is split into tokens.
2. A forward pass through the model produces a probability distribution over the next token.
3. One token is chosen from that distribution (temperature controls how random the choice is).
4. The chosen token is appended to the sequence and the loop repeats.
5. Generation stops at a stop token or length limit, and the tokens are decoded back into text.
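The inference loop above can be sketched with a stand-in for the model. Here `toy_model` is a hypothetical function returning a fixed distribution over a four-word vocabulary; a real LLM computes this distribution with a neural network forward pass, but the surrounding generate-append-repeat loop looks the same:

```python
import random

VOCAB = ["hello", "world", "!", "<eos>"]

def toy_model(tokens):
    # Hypothetical stand-in for the forward pass: returns next-token
    # probabilities over VOCAB. Once the sequence is long, it puts all
    # probability mass on the end-of-sequence token.
    if len(tokens) >= 4:
        return [0.0, 0.0, 0.0, 1.0]
    return [0.4, 0.3, 0.2, 0.1]

def generate(prompt_tokens, max_new_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = toy_model(tokens)                         # step 2: forward pass
        next_token = random.choices(VOCAB, weights=probs)[0]  # step 3: sample
        if next_token == "<eos>":                         # step 5: stop condition
            break
        tokens.append(next_token)                         # step 4: append, repeat
    return tokens
```

Each new token requires another full forward pass, which is why long outputs take proportionally longer to generate.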
| Concept | Definition |
|---|---|
| Token | Unit of text (word/subword) |
| Context Window | Max tokens a model can process |
| Temperature | Controls randomness (0=deterministic) |
| Training vs Inference | Training = learning; Inference = using |
**Quick check:**

1. What is a language model fundamentally trained to do?
   a) Answering questions b) Predicting the next token c) Following instructions

2. What does RLHF stand for?
   a) Recurrent Language Hidden Framework
   b) Reinforcement Learning from Human Feedback
   c) Recursive Language Hierarchical Function