What is an LLM?

Rohit Kumar

27 December 2025



What is an LLM? A Beginner’s Guide

LLMs (large language models) are sophisticated mathematical functions that predict the next token (a word or piece of a word) in a sequence by assigning a probability to every token in their vocabulary.
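To make the idea concrete, here is a minimal Python sketch of the final step of that prediction: turning raw scores (logits) into a probability distribution with softmax. The four-word vocabulary and the scores are invented for illustration; a real model computes logits over a vocabulary of tens of thousands of tokens.

```python
import math


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Turn raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max score avoids overflow and does not change the result.
    m = max(logits.values())
    exps = {tok: math.exp(score - m) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores a model might assign after the prompt "The cat sat on the"
logits = {"mat": 3.0, "sofa": 2.0, "moon": 0.5, "banana": -1.0}
probs = softmax(logits)
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>7}: {p:.3f}")
```

Every word gets some probability, but plausible continuations get most of it.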

Key Points:

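  • Training: Models learn from enormous amounts of text (a human reading GPT-3’s training data nonstop would need more than 2,600 years). They start with random parameters, which are refined over billions of examples using backpropagation, an algorithm that computes how to nudge each parameter so the correct next word becomes more likely.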

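  • Scale: Training the largest models involves so many additions and multiplications that, at one billion operations per second, it would take well over 100 million years; it is feasible only because GPUs run enormous numbers of these operations in parallel.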

  • Transformers (introduced by Google researchers in 2017): Unlike older models such as recurrent neural networks, which processed text one word at a time, transformers process all the tokens in the input in parallel using:

    • Attention mechanisms: Allow words to influence each other’s meanings based on context
    • Feed-forward networks: Store language patterns learned during training

  • Fine-tuning: After pre-training, models undergo reinforcement learning from human feedback (RLHF), in which human reviewers flag unhelpful or problematic outputs; their corrections further adjust the model’s parameters so it better matches user preferences.
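The Scale figure is easier to grasp as a raw operation count. The sketch below assumes a serial rate of one billion operations per second (needed to convert "years" into operations) and a hypothetical cluster throughput of 1e18 operations per second; neither number describes any specific model or data center.

```python
# Convert "100 million years at one billion operations per second" into a
# total operation count, then see how parallel hardware shrinks the time.
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.156e7 seconds
OPS_PER_SECOND_SERIAL = 1e9             # assumed serial rate: one billion ops/s
YEARS_SERIAL = 100e6                    # "over 100 million years"

total_ops = YEARS_SERIAL * SECONDS_PER_YEAR * OPS_PER_SECOND_SERIAL
print(f"Total operations: {total_ops:.2e}")       # Total operations: 3.16e+24

# Hypothetical aggregate GPU-cluster throughput (an assumption, not a measurement).
OPS_PER_SECOND_PARALLEL = 1e18
days_parallel = total_ops / OPS_PER_SECOND_PARALLEL / 86400
print(f"Days at 1e18 ops/s: {days_parallel:.1f}")  # Days at 1e18 ops/s: 36.5
```

Since the quoted figure is "well over" 100 million years, the result is a lower bound; the point is that parallelism turns geological time into weeks.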

How Does an LLM Chatbot Work?

LLM chatbots take a user prompt, encode it into tokens, and generate a reply by repeatedly predicting the next token until a stopping condition is reached.
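A minimal sketch of that loop, assuming a hypothetical word-level tokenizer and a hand-written lookup table in place of a real model. It shows the two usual stopping conditions: an end-of-sequence token and a maximum length.

```python
# Hypothetical toy "model": the most likely next token given the previous one.
# Real LLMs condition on the whole context; this table looks only at the last
# token, which is enough to show the generation loop.
NEXT_TOKEN = {
    "hello": "there",
    "there": "friend",
    "friend": "<eos>",
}


def tokenize(text: str) -> list[str]:
    """Hypothetical word-level tokenizer (real tokenizers use subword units)."""
    return text.lower().split()


def generate(prompt: str, max_new_tokens: int = 10) -> list[str]:
    tokens = tokenize(prompt)
    for _ in range(max_new_tokens):               # stop condition 1: length limit
        nxt = NEXT_TOKEN.get(tokens[-1], "<eos>")
        if nxt == "<eos>":                        # stop condition 2: end-of-sequence
            break
        tokens.append(nxt)                        # append, then predict again
    return tokens


print(generate("Hello"))  # ['hello', 'there', 'friend']
```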

LLM Chatbot Process

Conversation format

  • Prompts are often structured with explicit markers to provide role and intent, e.g.:
    • User: or Q: for input
    • Assistant: or A: for the expected response
  • Consistent formatting helps the model understand turn-taking and desired output style.
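A sketch of rendering a conversation with explicit role markers. The `User:`/`Assistant:` markers follow the examples above; production chat models usually define their own special-token templates, so `format_conversation` is a hypothetical helper, not any vendor’s format.

```python
def format_conversation(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) turns with role markers, then leave an open
    'Assistant:' line so the model's next tokens are read as the reply."""
    lines = [f"{role}: {text}" for role, text in turns]
    lines.append("Assistant:")
    return "\n".join(lines)


prompt = format_conversation([
    ("User", "What is a token?"),
    ("Assistant", "A small unit of text, such as a word or part of a word."),
    ("User", "Roughly how many tokens is a sentence?"),
])
print(prompt)
```

Ending on an open `Assistant:` line is what cues the model that it is its turn to speak.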

Generation process (high level)

  • Tokenization: input text → discrete tokens the model understands.
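  • Encoding: tokens are converted to embedding vectors and passed through the transformer’s layers (attention + feed-forward) to compute contextual representations.
  • Decoding: the model predicts the next token, appends it to the sequence, and repeats (autoregressively) until it emits an end-of-sequence token or reaches a length limit.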
  • Decoding strategies: greedy, beam search, top-k/top-p sampling, and temperature control how deterministic or diverse outputs are.
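The decoding strategies above can be sketched on a toy distribution. This covers greedy selection, temperature, top-k, and top-p (nucleus) sampling; beam search is omitted for brevity. The vocabulary and logits are made up, and the helper names are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Probabilities from logits; temperature < 1 sharpens, > 1 flattens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Always pick the highest-scoring token (deterministic)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_filter(probs, k):
    """Keep only the k most likely tokens, then renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def sample(probs, rng):
    """Draw one token index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


vocab = ["mat", "sofa", "moon", "banana"]
logits = [3.0, 2.0, 0.5, -1.0]          # hypothetical scores
rng = random.Random(0)

print("greedy:", vocab[greedy(logits)])
print("T=0.5:", [round(q, 3) for q in softmax(logits, 0.5)])
print("T=2.0:", [round(q, 3) for q in softmax(logits, 2.0)])
probs = softmax(logits)
print("top-k=2 sample:", vocab[sample(top_k_filter(probs, 2), rng)])
print("top-p=0.9 sample:", vocab[sample(top_p_filter(probs, 0.9), rng)])
```

Lower temperature and smaller k or p concentrate probability on the top choices (more predictable); higher values spread it out (more diverse).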

Context and transformers

  • Attention lets tokens influence each other so responses reflect context from the entire input window.
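  • During training, transformers process all positions of a sequence in parallel (a causal mask keeps each position from seeing later tokens); during generation, output tokens are produced one at a time because each new token depends on the ones before it.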
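Attention itself is compact enough to write out. This is a minimal pure-Python sketch of scaled dot-product attention with an optional causal mask (each token sees only itself and earlier tokens), the variant used in decoder-only LLMs. For simplicity it uses the input vectors directly as queries, keys, and values; real models first apply learned projections and run many attention heads in parallel.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=True):
    """Scaled dot-product attention over lists of vectors (one per token).
    With causal=True, token i may attend only to tokens 0..i."""
    d = len(K[0])
    out = []
    for i, q in enumerate(Q):
        visible = range(i + 1) if causal else range(len(K))
        # Similarity of this token's query to each visible key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d) for j in visible]
        weights = softmax(scores)
        # Output = weighted mix of the visible tokens' value vectors.
        mixed = [sum(w * V[j][c] for w, j in zip(weights, visible))
                 for c in range(len(V[0]))]
        out.append(mixed)
    return out


# Three tokens with made-up 2-D vectors; real models use learned, much
# higher-dimensional representations.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(X, X, X)
for row in out:
    print([round(v, 3) for v in row])
```

The first token can only attend to itself, so its output equals its own value vector; later tokens blend in context from earlier ones.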

Augmentation and retrieval

  • Retrieval-Augmented Generation (RAG) and similar patterns combine an information-retrieval step with the LLM:
    • Retrieve relevant documents or facts.
    • Condition the model on retrieved context to produce more accurate, grounded answers.
  • Useful for up-to-date or domain-specific queries the base model didn’t memorize.
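A toy version of the retrieve-then-generate pattern, assuming a three-document in-memory store and plain word-overlap scoring. Production RAG systems typically use embedding similarity and a search index; `retrieve` and `build_prompt` are hypothetical helpers, and the resulting prompt would be sent to whichever LLM you use.

```python
# Hypothetical document store.
DOCS = [
    "The office cafeteria opens at 8 a.m. on weekdays.",
    "Expense reports are due on the 5th of each month.",
    "The VPN must be used when working from public Wi-Fi.",
]


def words(text: str) -> set[str]:
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query."""
    q = words(query)
    ranked = sorted(docs, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str) -> str:
    """Condition the model on retrieved text by placing it in the prompt."""
    context = "\n".join(retrieve(query, DOCS))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"User: {query}\nAssistant:"
    )


print(build_prompt("When are expense reports due?"))
```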

Determinism and variability

  • The model’s weights are fixed for a given checkpoint, but outputs can vary due to decoding choices (sampling, temperature) and nondeterministic runtime factors (e.g., floating-point differences in parallel GPU computation).
  • Re-running the same prompt with deterministic decoding (e.g., greedy) generally yields the same output; sampling introduces deliberate variability.
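The difference between deterministic and sampled decoding can be shown with a hypothetical three-token distribution. Greedy decoding always returns the same token; sampling varies between runs unless the random seed is fixed. This sketch does not model hardware-level nondeterminism.

```python
import random

probs = {"mat": 0.6, "sofa": 0.3, "moon": 0.1}  # hypothetical next-token distribution


def greedy_pick(p):
    """Deterministic: always the most probable token."""
    return max(p, key=p.get)


def sampled_run(p, seed=None, n=5):
    """Draw n tokens at random in proportion to their probabilities."""
    rng = random.Random(seed)  # seed=None: seeded from system entropy
    return [rng.choices(list(p), weights=list(p.values()), k=1)[0] for _ in range(n)]


print("greedy:     ", [greedy_pick(probs) for _ in range(5)])
print("sampled:    ", sampled_run(probs))           # varies from run to run
print("seeded (42):", sampled_run(probs, seed=42))  # identical on every run
```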

Training and model behavior

  • Model behavior emerges from billions (in the largest models, hundreds of billions) of learned parameters refined by gradient-based training (e.g., backpropagation) on large corpora.
  • Post-training alignment (fine-tuning, RLHF) adjusts behavior toward safer or more helpful responses.
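Gradient-based training can be demonstrated at toy scale. The sketch below trains a bigram model (one table of logits: "given the previous word, score each next word") on a nine-word corpus by gradient descent on cross-entropy loss. With a single layer, the gradient `p - onehot` is exactly what backpropagation would compute; real LLMs use frameworks that backpropagate through many layers automatically. The corpus, learning rate, and step count are arbitrary choices for illustration.

```python
import math
import random

corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
pairs = [(idx[a], idx[b]) for a, b in zip(corpus, corpus[1:])]
V = len(vocab)

# Parameters start random: W[prev][next] is the logit for "next" after "prev".
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(V)] for _ in range(V)]


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def loss_and_grad(W):
    """Average cross-entropy of predicting each next word, and its gradient."""
    loss = 0.0
    grad = [[0.0] * V for _ in range(V)]
    for prev, nxt in pairs:
        p = softmax(W[prev])
        loss -= math.log(p[nxt])
        # d(-log p[nxt]) / d(logit j) = p[j] - 1{j == nxt}
        for j in range(V):
            grad[prev][j] += (p[j] - (1.0 if j == nxt else 0.0)) / len(pairs)
    return loss / len(pairs), grad


lr = 1.0
initial_loss, _ = loss_and_grad(W)
for step in range(200):
    _, grad = loss_and_grad(W)
    for i in range(V):
        for j in range(V):
            W[i][j] -= lr * grad[i][j]  # gradient-descent update
final_loss, _ = loss_and_grad(W)

print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print("after 'sat':", vocab[max(range(V), key=lambda j: W[idx["sat"]][j])])
```

The loss falls as the random table is nudged toward the word pairs actually seen in the corpus, the same principle that, at vastly larger scale, shapes an LLM’s parameters.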

Practical tips

  • Provide clear role markers and concise context.
  • Include examples or desired formats in the prompt for consistent output.
  • Use retrieval or tool calls for factual, up-to-date, or long-context needs.
  • Adjust decoding settings to trade off creativity vs. predictability.
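As an illustration of including examples and a desired format in the prompt, here is a hypothetical few-shot prompt builder: two worked examples show the exact output format before the real input. The task, examples, and `label:` format are invented for the sketch.

```python
# Hypothetical worked examples demonstrating the expected answer format.
EXAMPLES = [
    ("The battery lasts all day.", "positive"),
    ("It broke after one week.", "negative"),
]


def few_shot_prompt(new_input: str) -> str:
    parts = ["Classify the sentiment of each review. "
             "Answer with 'label: positive' or 'label: negative'.\n"]
    for text, label in EXAMPLES:
        parts.append(f"User: {text}\nAssistant: label: {label}\n")
    parts.append(f"User: {new_input}\nAssistant:")
    return "\n".join(parts)


print(few_shot_prompt("Setup took five minutes and everything worked."))
```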