# LLM Resources

## Knowledge

- [Video: 3Blue1Brown — "But what is a GPT?"](https://www.youtube.com/watch?v=wjZofJX0v4M)
  Visual introduction to how LLMs work. Best starting point for building intuition. Uses custom animations to explain tokens, embeddings, and text generation.

- [Video: 3Blue1Brown — "Attention in Transformers, visually explained"](https://www.youtube.com/watch?v=eMlx5fFNoYc)
  Deep dive into self-attention, the core mechanism of Transformer-based LLMs. Essential viewing for understanding how each token's representation is updated with context from the tokens around it.

- [Article: "The Illustrated Transformer" — Jay Alammar](https://jalammar.github.io/illustrated-transformer/)
  The classic visual walkthrough of the Transformer architecture. Clear diagrams, minimal math. Use as reference when you need to revisit specific components.

- [Article: "The Illustrated GPT-2" — Jay Alammar](https://jalammar.github.io/illustrated-gpt2/)
  Extends the Transformer explanation to GPT-style decoder-only models. Covers token generation, causal masking, and how GPT-2 works step by step.

- [Article: "The Illustrated Word2vec" — Jay Alammar](https://jalammar.github.io/illustrated-word2vec/)
  Explains word embeddings, the idea of representing words as vectors whose geometry reflects meaning. LLMs build on this with learned token embeddings, so it is useful background for understanding how they capture semantic similarity.

- [Paper: "Attention Is All You Need" — Vaswani et al. (2017)](https://arxiv.org/abs/1706.03762)
  The original Transformer paper. Dense but important. Read after the visual explanations to solidify understanding. Use as a reference, not a tutorial.

- [Book: "Build a Large Language Model (From Scratch)" — Sebastian Raschka (2024)](https://www.manning.com/books/build-a-large-language-model-from-scratch)
  Practical, code-first approach. Even if you don't implement everything, the explanations connect theory to practice. Good for bridging from application development to deeper understanding.

- [Blog: Anthropic — "How Claude works"](https://www.anthropic.com/news/how-claude-works)
  High-level explanation from an AI lab perspective. Useful for understanding the training pipeline (pretraining → RLHF → deployment).

## Wisdom (Communities)

- [r/LocalLLaMA](https://reddit.com/r/LocalLLaMA)
  High-signal community focused on running open-weight LLMs locally. Active discussion of model capabilities, benchmarks, and application patterns. Good for staying current.

- [Hacker News](https://news.ycombinator.com)
  Search for LLM-related threads. High-quality technical discussions, often with practitioners sharing real-world experience.

- [Latent Space Podcast](https://www.latent.space/)
  Interviews with AI researchers and practitioners. Good for building wisdom about where the field is heading and what matters in practice.

## Gaps

- Need a good Chinese-language resource that explains LLM internals at an application-developer level
- Need practical "mental model" resources that help developers predict LLM behavior without deep math