Module 04

Hallucinations & Mitigations

Duration: ~30 minutes of video content
Timestamps: 1:20:32 - 2:07:28

4.1 Understanding Hallucinations

What Are Hallucinations?

Hallucination: When an LLM generates plausible-sounding but factually incorrect or fabricated information.

User: Who wrote the book "The Azure Sky"?
Model: "The Azure Sky" was written by Jonathan Mitchell in 1987.

Reality: This book doesn't exist. The model invented it.

Root Causes

| Cause | Explanation |
|---|---|
| Training Objective | Models learn to generate plausible text, not truthful text |
| Pattern Completion | Statistical patterns don't encode truth |
| People-Pleasing | Models prefer giving answers over admitting ignorance |
| Compression Loss | Parameters store approximations, not facts |

Key Insight from Karpathy

"Models fabricate information due to their statistical nature. They learn they must always provide answers, even for nonsensical questions."


4.2 Types of Hallucinations

Categorization

┌─────────────────────────────────────────────────────────────┐
│                  HALLUCINATION TAXONOMY                      │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. FACTUAL HALLUCINATIONS                                   │
│     • Wrong dates, numbers, names                            │
│     • Invented citations or quotes                           │
│     • Non-existent events or people                          │
│                                                              │
│  2. FABRICATED CONTENT                                       │
│     • Made-up books, papers, products                        │
│     • Fictional URLs or sources                              │
│     • Invented code libraries                                │
│                                                              │
│  3. CONFLATION                                               │
│     • Mixing up similar entities                             │
│     • Combining features of different things                 │
│     • Temporal confusion (wrong time periods)                │
│                                                              │
│  4. SELF-HALLUCINATION                                       │
│     • Claiming capabilities it doesn't have                  │
│     • Inventing its own creation story                       │
│     • False claims about training data                       │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Self-Identity Hallucinations

Untuned base models confidently make up their origins:

User: Who created you?
Base Model: I was created by [random company], a team of researchers at [random university]...

Reality: Without hardcoded system prompts, models confabulate their identity.

4.3 Mitigation Strategy 1: Training for Uncertainty

Meta's Factuality Research

A systematic approach to reduce hallucinations:

┌─────────────────────────────────────────────────────────────┐
│              META FACTUALITY PIPELINE                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Step 1: Extract training snippets                          │
│          (Get passages from training data)                   │
│                     ↓                                        │
│  Step 2: Generate factual questions                          │
│          (Create Q&A pairs from passages)                    │
│                     ↓                                        │
│  Step 3: Produce model answers                               │
│          (Run model on questions multiple times)             │
│                     ↓                                        │
│  Step 4: Score accuracy                                      │
│          (Compare answers to source passages)                │
│                     ↓                                        │
│  Step 5: Train refusal behavior                              │
│          (Model learns to say "I don't know" when uncertain) │
│                                                              │
└─────────────────────────────────────────────────────────────┘
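
A minimal sketch of this pipeline in Python, assuming hypothetical helper callables (make_qa, sample_answer, is_supported) rather than Meta's actual tooling:

def build_refusal_examples(snippets, make_qa, sample_answer, is_supported,
                           n_samples=5, threshold=0.5):
    """Turn training snippets into SFT pairs, replacing unreliable answers
    with an explicit refusal (Steps 2-5 of the pipeline above)."""
    examples = []
    for snippet in snippets:                                   # Step 1: extracted snippets
        question, reference = make_qa(snippet)                 # Step 2: factual Q&A pair
        answers = [sample_answer(question) for _ in range(n_samples)]  # Step 3
        accuracy = sum(is_supported(a, reference) for a in answers) / n_samples  # Step 4
        if accuracy < threshold:
            # Step 5: the model can't reliably answer this, so train it to say so
            completion = "I don't have reliable information about that."
        else:
            completion = reference
        examples.append({"prompt": question, "completion": completion})
    return examples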

Teaching "I Don't Know"

Include training examples like:

User: What was the GDP of Atlantis in 2020?
Assistant: I don't have information about this. Atlantis is a mythical
location, so it wouldn't have economic statistics.

User: Who won the 2030 World Cup?
Assistant: I don't have information about events after my training
cutoff date. The 2030 World Cup hasn't occurred yet.
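
In chat-formatted SFT data, these refusals are ordinary prompt/response pairs; a minimal illustration (the record schema here is an assumption, not any specific vendor's format):

refusal_examples = [
    {"messages": [
        {"role": "user", "content": "What was the GDP of Atlantis in 2020?"},
        {"role": "assistant", "content": "I don't have information about this. "
                                         "Atlantis is a mythical location."},
    ]},
    {"messages": [
        {"role": "user", "content": "Who won the 2030 World Cup?"},
        {"role": "assistant", "content": "I don't have information about events "
                                         "after my training cutoff date."},
    ]},
]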

4.4 Mitigation Strategy 2: Tool Integration

The Tool Use Pattern

Instead of answering from parametric memory, models can call external tools:

┌─────────────────────────────────────────────────────────────┐
│                   TOOL-AUGMENTED LLM                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  User: What is Apple's current stock price?                  │
│                                                              │
│  ┌──────────────────┐                                       │
│  │    LLM MODEL     │                                       │
│  └────────┬─────────┘                                       │
│           │ Recognizes need for real-time data               │
│           ▼                                                  │
│  <SEARCH_START>Apple stock price<SEARCH_END>                │
│           │                                                  │
│           ▼                                                  │
│  ┌──────────────────┐                                       │
│  │  SEARCH ENGINE   │ → Returns: "$192.45 as of 2:30 PM"    │
│  └──────────────────┘                                       │
│           │                                                  │
│           ▼                                                  │
│  Model: Based on my search, Apple (AAPL) is currently        │
│  trading at $192.45.                                         │
│                                                              │
└─────────────────────────────────────────────────────────────┘
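
A minimal orchestration loop for this pattern, assuming illustrative tool tokens, a run_search callable, and a generic model.generate() interface (none of these names come from a specific vendor's API):

SEARCH_START, SEARCH_END = "<SEARCH_START>", "<SEARCH_END>"

def answer_with_search(model, run_search, user_message, max_rounds=3):
    # Build a running transcript that the model continues on each round.
    transcript = f"User: {user_message}\nAssistant: "
    for _ in range(max_rounds):
        output = model.generate(transcript)            # model may emit a tool call
        transcript += output
        if SEARCH_START not in output:
            return output                              # plain answer, no tool needed
        # Extract the query between the tool tokens and run the search.
        query = output.split(SEARCH_START, 1)[1].split(SEARCH_END, 1)[0]
        result = run_search(query)
        # Feed the result back so the model can ground its final answer in it.
        transcript += f"\n<SEARCH_RESULT>{result}</SEARCH_RESULT>\n"
    return transcript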

Available Tools

| Tool | Purpose | Reduces Hallucinations For |
|---|---|---|
| Web Search | Current information | Facts, news, prices |
| Calculator | Math operations | Arithmetic, formulas |
| Code Interpreter | Execute code | Complex calculations |
| Database | Structured queries | Company data, statistics |
| Knowledge Base | Verified facts | Domain-specific info |

Training Tool Use

Models learn tool patterns through examples:

User: What's 17.5% of 4,382?
Assistant: <CALC>4382 * 0.175</CALC>
System: 766.85
Assistant: 17.5% of 4,382 is 766.85.
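
Wiring up the <CALC> round-trip above takes only a tag parser plus a restricted evaluator; a sketch (the tag names come from the example, the safe-evaluation approach is an assumption):

import ast
import operator as op

# Allow only basic arithmetic so the tool can't execute arbitrary code.
_OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
        ast.Div: op.truediv, ast.USub: op.neg}

def safe_eval(expr):
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def run_calc_tag(model_output):
    """Extract the expression inside <CALC>...</CALC> and evaluate it."""
    expr = model_output.split("<CALC>", 1)[1].split("</CALC>", 1)[0]
    return safe_eval(expr)

print(run_calc_tag("<CALC>4382 * 0.175</CALC>"))   # -> 766.85 (up to float rounding)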

For KeenDreams: This pattern directly applies - KeenDreams can be a "memory tool" that the LLM learns to query when project context is needed.


4.5 Mitigation Strategy 3: Context Injection

Retrieval-Augmented Generation (RAG)

Provide relevant information in the context window:

┌─────────────────────────────────────────────────────────────┐
│                    RAG PATTERN                               │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. User Query: "What are the project requirements?"         │
│                         ↓                                    │
│  2. Retrieval: Search knowledge base for relevant docs       │
│                         ↓                                    │
│  3. Context Construction:                                    │
│     ┌─────────────────────────────────────────────────────┐ │
│     │ System: You are a helpful assistant. Use the        │ │
│     │ following documents to answer questions.            │ │
│     │                                                     │ │
│     │ [Document 1: Requirements.md content...]            │ │
│     │ [Document 2: Specs.md content...]                   │ │
│     │                                                     │ │
│     │ User: What are the project requirements?            │ │
│     └─────────────────────────────────────────────────────┘ │
│                         ↓                                    │
│  4. Model generates answer from provided context             │
│                                                              │
└─────────────────────────────────────────────────────────────┘
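
A minimal RAG sketch, assuming a hypothetical retrieve() function over the knowledge base and the same generic model.generate() interface used in the other sketches:

def answer_with_rag(model, retrieve, question, top_k=3):
    # Step 2: fetch the documents most relevant to the query.
    docs = retrieve(question, top_k=top_k)
    # Step 3: build a context window that contains the retrieved documents.
    doc_block = "\n\n".join(f"[Document {i + 1}: {doc}]" for i, doc in enumerate(docs))
    prompt = (
        "System: You are a helpful assistant. Use the following documents "
        "to answer questions.\n\n"
        f"{doc_block}\n\n"
        f"User: {question}\nAssistant:"
    )
    # Step 4: the model answers from the provided context, not parametric memory.
    return model.generate(prompt)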

Why Context > Parameters

| Aspect | Parametric | Contextual |
|---|---|---|
| Accuracy | Approximate | Exact |
| Updatability | Requires retraining | Instant |
| Verifiability | Opaque | Traceable |
| Reliability | Variable | High |

Key Insight

"Pasting information directly into context windows produces higher-quality outputs than relying on parametric knowledge."

For KeenDreams: This is the core value proposition - acting as the retrieval layer that surfaces relevant project memories into context, dramatically reducing hallucinations about project state.


4.6 Mitigation Strategy 4: Verification Chains

Multi-Step Verification

Have models check their own work:

User: Summarize this research paper.

Model (Step 1): [Initial summary]

Model (Step 2 - Verification):
Let me verify my summary against the source:
- Claim 1: "The study found X" ✓ (Page 3, paragraph 2)
- Claim 2: "Results showed Y" ✓ (Table 2)
- Claim 3: "The authors concluded Z" ⚠️ (I should reread the conclusion)

Model (Step 3): [Corrected summary with citations]
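
One way to run this chain programmatically is a second and third pass over the model's own output; a sketch using the same assumed model.generate() interface:

def summarize_with_verification(model, source_text):
    # Pass 1: initial summary.
    summary = model.generate(f"Summarize this paper:\n\n{source_text}")
    # Pass 2: audit each claim against the source.
    audit = model.generate(
        "Check each claim in the summary against the source. Mark every claim "
        "SUPPORTED or UNSUPPORTED and cite where you found it.\n\n"
        f"Source:\n{source_text}\n\nSummary:\n{summary}"
    )
    # Pass 3: rewrite, keeping only supported claims and adding citations.
    return model.generate(
        "Rewrite the summary so it contains only SUPPORTED claims, with citations.\n\n"
        f"Summary:\n{summary}\n\nAudit:\n{audit}"
    )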

Cross-Model Verification

Use multiple models to check each other:

# Illustrative pseudocode: generate(), verify(), is_supported, and
# flag_for_human_review are assumed interfaces, not a specific library's API.

# Generate with Model A
response_a = model_a.generate(prompt)

# Verify with Model B against the original source documents
verification = model_b.verify(
    claim=response_a,
    context=original_documents,
)

# Return the verified response, or flag it for human review
final_response = (response_a if verification.is_supported
                  else flag_for_human_review(response_a))

4.7 Practical Hallucination Detection

Red Flags

Watch for these patterns that often indicate hallucinations:

| Pattern | Example | Risk Level |
|---|---|---|
| Specific numbers without source | "Studies show 73.2% of..." | High |
| Named citations | "According to Smith et al. (2019)..." | High |
| Detailed URLs | "Available at example.com/specific/path" | Very High |
| Confident edge cases | Obscure historical details | High |
| Technical specifics | Exact code library versions | Medium |
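
These patterns can be screened for mechanically before a human or a second model takes a closer look; a rough regex-based sketch (the patterns mirror the table above and are heuristics, not a validated detector):

import re

# Heuristic red-flag patterns: (regex, description, risk level).
RED_FLAGS = [
    (r"\b\d{1,3}(\.\d+)?%", "specific percentage without source", "high"),
    (r"\bet al\.\s*\(\d{4}\)", "named citation", "high"),
    (r"https?://\S+/\S+", "detailed URL", "very high"),
    (r"\bversion\s+\d+\.\d+(\.\d+)?\b", "exact library version", "medium"),
]

def flag_hallucination_risks(response):
    """Return (description, risk, matched text) for every red-flag hit."""
    hits = []
    for pattern, description, risk in RED_FLAGS:
        for match in re.finditer(pattern, response, flags=re.IGNORECASE):
            hits.append((description, risk, match.group(0)))
    return hits

# flag_hallucination_risks("Studies show 73.2% of users...")
# -> [("specific percentage without source", "high", "73.2%")]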

Verification Techniques

  1. Ask for sources: "What source supports this claim?"
  2. Cross-reference: Search for claimed facts independently
  3. Challenge specifics: "Are you certain about that number?"
  4. Request uncertainty: "Rate your confidence 1-10"

4.8 Key Takeaways

Summary

| Strategy | Approach | Effectiveness |
|---|---|---|
| Uncertainty Training | Train model to say "I don't know" | Medium |
| Tool Integration | External search, calculators | High |
| Context Injection | RAG, document retrieval | Very High |
| Verification Chains | Self-check, multi-model | High |

For AI Analytics Platforms

Critical Monitoring Points:

  1. Hallucination Detection: Flag responses with high-risk patterns
  2. Source Attribution: Track whether responses cite provided context
  3. Confidence Scores: Capture model-expressed uncertainty
  4. Tool Usage: Monitor when models invoke external tools
  5. Verification Loops: Log correction chains

For KeenDreams

Applicable Learnings:

  1. Memory as Grounding: KeenDreams provides factual anchor for project context
  2. Source Tracking: Always include provenance with retrieved memories
  3. Confidence Metadata: Store reliability scores with memories
  4. Verification Integration: Enable models to cross-check against cloud brain

Practice Questions

  1. Why do LLMs hallucinate even when they "know" the correct answer?
  2. How does tool integration reduce hallucinations?
  3. When is parametric knowledge acceptable vs. requiring context?
  4. How would you design a hallucination detection system?

Next Module

Module 5: Reinforcement Learning


Timestamps: 1:20:32 - Hallucinations & Tool Use | 1:41:46 - Knowledge of Self | Research on factuality and mitigation strategies