
Will LLMs Get Us To AGI?
LLMs have made tremendous progress in modeling human language. But can they go beyond that to make new discoveries and move the needle on novel scientific progress? We sat down with distinguished Columbia CS professor Vishal Misra to discuss this, plus why chain-of-thought reasoning works so well, and what real AGI would look like.
What is AGI and how does it differ from current LLMs?
Artificial General Intelligence Definition
AGI represents a fundamental leap beyond current language models - it's the ability to create genuinely new knowledge rather than just recombining existing information.
Key Distinction:
- Current LLMs: Process and recombine information from training data
- True AGI: Creates entirely new paradigms, theories, and scientific discoveries
The Einstein Example:
Any LLM trained on pre-1915 physics would never have developed the theory of relativity. Einstein had to:
- Reject existing frameworks - Move beyond Newtonian physics
- Create new concepts - Introduce the space-time continuum
- Rewrite fundamental rules - Establish entirely new paradigms
AGI Requirements:
- New science creation - Developing novel theories and discoveries
- Mathematical innovation - Creating new mathematical frameworks
- Paradigm shifts - Moving beyond training data limitations
- Original thinking - Going beyond pattern recognition to genuine creativity
When an AGI system can independently develop something equivalent to the theory of relativity - a breakthrough that fundamentally changes our understanding - that's when we'll have achieved true artificial general intelligence.
How do LLMs reduce complex reality into geometric manifolds?
Dimensional Reduction in AI Reasoning
LLMs perform a remarkable feat of dimensional reduction, transforming the infinite complexity of reality into manageable geometric structures that enable reasoning.
The Process:
- Complex Input: LLMs ingest data from a multi-dimensional, heavy-tailed, stochastic universe
- Manifold Creation: Reduce this complexity to geometric manifolds with fewer degrees of freedom
- Predictable Movement: Enable formal specification of reasoning paths within these manifolds
Key Characteristics:
- Reduced state space - Fewer variables to manage
- Geometric structure - Mathematical framework for reasoning
- Predictable paths - Ability to forecast reasoning directions
- Formal specification - Mathematical description of reasoning boundaries
Human Parallel:
Humans appear to use a similar process:
- Take complex, chaotic reality
- Reduce it to manageable geometric manifolds
- Reason by moving along these simplified structures
Practical Implications:
- Confident reasoning - When LLMs stay within manifolds, they produce coherent outputs
- Hallucination risk - Moving outside manifolds leads to confident but incorrect responses
- Information theory foundation - Provides mathematical framework for understanding AI reasoning
This manifold-based approach offers both the power of dimensional reduction and the risk of oversimplification.
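The dimensional-reduction idea is easiest to see with a toy example. The sketch below is a loose analogy only: PCA is linear, while the manifolds discussed here are learned and nonlinear, and all the data in it is synthetic. It shows high-dimensional points that secretly vary along just a few directions, which a decomposition recovers.

```python
import numpy as np

# Loose analogy: PCA-style reduction of synthetic data that lives near a
# low-dimensional surface. The manifolds discussed above are nonlinear and
# learned, but the core idea -- few true degrees of freedom -- is the same.
rng = np.random.default_rng(0)

latent = rng.normal(size=(1000, 3))        # 3 true degrees of freedom
mixing = rng.normal(size=(3, 50))          # embedded in a 50-D space
data = latent @ mixing + 0.01 * rng.normal(size=(1000, 50))

# The singular values reveal that ~3 components carry nearly all variance.
singular_values = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
variance_share = singular_values**2 / np.sum(singular_values**2)
print(np.round(variance_share[:5], 4))     # first 3 dominate, rest are ~0
```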
How do LLMs create probability distributions for next token prediction?
The Core Mechanism of Language Model Operation
All LLMs, regardless of their sophistication, fundamentally operate through the same basic process of creating probability distributions for predicting the next token in a sequence.
The Basic Process:
- Input Processing: Given a prompt, the LLM analyzes the context
- Distribution Creation: Generates a probability distribution across all possible next tokens
- Token Selection: Uses algorithms to pick from this distribution
- Iteration: Repeats the process for subsequent tokens
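A minimal sketch of this loop, with a five-word toy vocabulary and made-up logits standing in for a real model's output (a production LLM scores every token in a vocabulary of tens of thousands):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and invented logits; a real LLM emits one logit per token.
vocab = ["mat", "hat", "table", "ship", "whale"]
logits = np.array([3.0, 2.1, 1.8, -1.0, -2.5])

# Distribution creation: softmax turns logits into a probability distribution.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Token selection: sample the next token from that distribution.
next_token = rng.choice(vocab, p=probs)
print(dict(zip(vocab, np.round(probs, 3))), "->", next_token)
```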
Architecture Components:
- Transformer architecture - The underlying neural network structure
- Loss function - Guides the training process
- Post-training techniques - RLHF (Reinforcement Learning from Human Feedback) and other refinements
Manifold Integration:
The training process creates Bayesian manifolds that:
- Represent the reduced dimensional space of possibilities
- Enable confident prediction when reasoning stays within bounds
- Lead to hallucination when the model ventures outside these structures
Confidence vs. Accuracy:
- Within manifolds: Confident and coherent responses
- Outside manifolds: Confident but nonsensical outputs
- The challenge: Models don't always know when they've left reliable territory
This token-by-token prediction process, enhanced by manifold-based reasoning, forms the foundation of all modern language model capabilities.
What is entropy in LLM token prediction and why does it matter?
Understanding Information Entropy in Language Models
Entropy measurement provides crucial insights into LLM confidence and reasoning quality by analyzing the distribution of probability across possible next tokens.
Shannon Entropy Basics:
- Not thermodynamic entropy - Information theory concept
- Vocabulary scope - Measured across entire token vocabulary (e.g., 50,000 tokens)
- Distribution analysis - Examines probability spread across possible choices
Practical Example:
Prompt: "The cat sat on the..."
- High probability tokens: "mat", "hat", "table"
- Low probability tokens: "ship", "whale"
- Entropy level: Depends on how concentrated the probability is
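To make this concrete, here is a small Shannon-entropy calculation over the example above; the probabilities are illustrative guesses, not real model outputs:

```python
import numpy as np

def shannon_entropy(probs):
    """H = -sum(p * log2 p), in bits; zero-probability entries are skipped."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Illustrative distribution for "The cat sat on the..." over five candidates:
# mat, hat, table, ship, whale.
peaked = [0.70, 0.15, 0.10, 0.03, 0.02]
uniform = [0.20] * 5                      # no preference at all

print(shannon_entropy(peaked))   # ~1.37 bits: concentrated, confident
print(shannon_entropy(uniform))  # ~2.32 bits: the maximum for 5 tokens
```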
Two Types of Entropy:
High Entropy Distribution:
- Many viable paths - Multiple tokens have significant probability
- Less predictable - Model has many reasonable choices
- Broader possibilities - More creative potential but less certainty
Low Entropy Distribution:
- Few clear choices - Probability concentrated on small set of tokens
- More predictable - Model has strong preference for specific tokens
- Focused output - Higher confidence in specific direction
Strategic Importance:
Understanding entropy helps predict:
- Model confidence levels
- Likelihood of coherent vs. creative responses
- Risk of hallucination or deviation
This entropy analysis forms the foundation for understanding how context and prompting affect LLM behavior.
How does context richness affect LLM prediction accuracy?
The Information Entropy vs. Prediction Entropy Relationship
LLMs demonstrate optimal performance when processing prompts with high information entropy but low prediction entropy - a counterintuitive but powerful principle.
The Two Types of Prompt Entropy:
High Information Entropy Prompts:
- Rich context - Contain specific, detailed information
- Rare combinations - Less frequently seen in training data
- Distinctive patterns - Unique contextual signatures
Low Information Entropy Prompts:
- Generic context - Common, frequently seen phrases
- Broad possibilities - Many different continuation paths
- Less distinctive - Harder to pin down specific direction
Real-World Example:
Generic Prompt: "I'm going out for dinner"
- High prediction entropy - Many possible continuations
- Paths: "tonight", "to McDonald's", "with friends", etc.
- LLM challenge - Too many viable options
Context-Rich Prompt: "I'm going to dinner with Martin Casado"
- High information entropy - Specific, rare combination
- Low prediction entropy - Constrained possibilities
- Logical constraint: Martin only goes to Michelin-starred restaurants, not McDonald's
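The same entropy arithmetic makes the contrast quantitative. Both distributions below are invented for illustration, but they show how the richer prompt collapses the prediction entropy:

```python
import numpy as np

def entropy_bits(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Invented next-continuation distributions for the two prompts above.
generic = [0.18, 0.16, 0.15, 0.14, 0.13, 0.12, 0.12]   # "...for dinner"
specific = [0.85, 0.08, 0.04, 0.02, 0.01]              # "...with Martin Casado"

print(entropy_bits(generic))   # ~2.79 bits: many viable continuations
print(entropy_bits(specific))  # ~0.86 bits: context has collapsed the options
```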
The Optimization Principle:
LLMs perform best when:
- Context is information-rich - Provides specific, detailed constraints
- Prediction space is narrow - Fewer but higher-quality options
- Constraints are logical - Context naturally limits possibilities
This explains why detailed, specific prompts often yield more accurate and useful responses than generic ones.
Summary from [0:00-7:59]
Essential Insights:
- AGI Definition - True artificial general intelligence requires creating new science and paradigms, not just recombining existing knowledge
- Manifold Reasoning - LLMs reduce complex reality into geometric manifolds, enabling predictable reasoning paths but risking hallucination outside these bounds
- Entropy Optimization - LLMs work best with high information entropy prompts that create low prediction entropy outcomes
Actionable Insights:
- Prompt Engineering: Use specific, context-rich prompts to constrain LLM outputs and improve accuracy
- Understanding Limitations: Recognize that current LLMs recombine rather than create truly novel knowledge
- Confidence Assessment: Monitor when models venture outside their training manifolds to identify potential hallucinations
References from [0:00-7:59]
People Mentioned:
- Albert Einstein - Used as example of paradigm-shifting scientific thinking that current LLMs cannot replicate
- Martin Casado - General Partner at a16z and podcast co-host, with a networking background similar to the guest's
- Vishal Misra - Columbia University Professor and Vice Dean of Computing, expert in networking and AI reasoning models
- Harvey Baller Christian - Recommended Vishal's MIT talk on understanding LLMs
Companies & Products:
- a16z - Venture capital firm hosting the podcast
- Columbia University - Vishal Misra's academic institution
- MIT - Where Vishal gave his influential talk on LLM understanding
Technologies & Tools:
- Transformers - Neural network architecture underlying modern LLMs
- RLHF (Reinforcement Learning from Human Feedback) - Post-training technique for improving LLM performance
- Bayesian Manifolds - Mathematical structures that LLMs create for reasoning
Concepts & Frameworks:
- Shannon Entropy - Information theory concept for measuring uncertainty in probability distributions
- Token Prediction - Core mechanism of how LLMs generate text
- Manifold Theory - Geometric approach to understanding LLM reasoning patterns
- Information Entropy vs. Prediction Entropy - Framework for understanding optimal prompting strategies
How does chain-of-thought reasoning reduce prediction entropy in LLMs?
Mathematical Problem-Solving Through Step-by-Step Breakdown
Chain-of-thought reasoning works by transforming high-entropy prediction problems into low-entropy sequential steps that LLMs have been trained on.
The Entropy Reduction Process:
- Initial High Entropy: When asked "What is 769 × 1025?", the next token distribution is diffuse - you have no clear idea of the answer
- Algorithm Invocation: By breaking it into steps (writing down numbers, following multiplication algorithm), each stage becomes predictable
- Low Entropy Steps: At each stage of the process, you know exactly what to do next because you've learned the algorithm
Why This Works for LLMs:
- Pattern Recognition: LLMs break problems into small steps they've seen during training
- Familiar Concepts: Even with different numbers, the underlying concepts have been trained on
- Sequential Confidence: Once broken down, the model becomes confident in the A→B→C→D progression
- Reduced State Space: Instead of guessing the final answer, the model navigates through known intermediate states
The Core Mechanism:
The fundamental insight is that LLMs can't directly solve complex problems with high prediction entropy, but they excel at following learned algorithms through sequential, low-entropy steps. This explains both the power and limitations of current language models.
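A small sketch of the entropy-reduction idea: unrolling 769 × 1025 into the partial products of grade-school long multiplication. Each intermediate step is fully determined by the algorithm, which is exactly the low-entropy progression described above:

```python
def long_multiply(a: int, b: int) -> int:
    """Unroll a * b into deterministic partial-product steps."""
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10**place     # one low-entropy step
        total += partial
        print(f"{a} x {digit} x 10^{place} = {partial:>7} (running total: {total})")
    return total

assert long_multiply(769, 1025) == 769 * 1025 == 788225
```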
What is Vishal Misra's background in networking and cricket entrepreneurship?
From Academic Networking Research to Cricket Analytics Pioneer
Vishal Misra combines deep technical expertise in networking with entrepreneurial success in sports analytics, providing a unique perspective on AI applications.
Academic Foundation:
- PhD and Early Work: Focused on networking research at Columbia University
- Similar Background: Shares networking expertise with Martin Casado
- Current Role: Professor and Vice Dean of Computing at Columbia
Cricket Entrepreneurship Legacy:
- CricInfo Founder: Co-founded the portal in the 1990s
- Massive Scale: At its peak, CricInfo drew more hits than Yahoo globally
- ESPN Acquisition: Sold to ESPN in 2006, demonstrating significant market value
Current Cricket Involvement:
- Team Ownership: Minority owner of the San Francisco Unicorns cricket team
- Continued Passion: Maintains active involvement in cricket analytics and business
The Stats Guru Innovation:
- Complex Database: Built the world's most comprehensive cricket statistics database
- Free Access: Made searchable stats available to fans since 2000
- Technical Challenge: Cricket statistics are exponentially more complex than baseball's
- Interface Problem: Required 25 checkboxes, 15 text fields, and 18 dropdowns - creating a daunting user experience
This combination of networking expertise and sports analytics entrepreneurship positioned Misra uniquely to recognize AI's potential for solving complex query interfaces.
How did Vishal Misra accidentally invent RAG while trying to fix cricket statistics?
The Serendipitous Discovery That Preceded ChatGPT by 15 Months
Misra's frustration with a cricket website's terrible interface led to accidentally creating Retrieval-Augmented Generation (RAG) in 2021.
The Original Problem:
- Stats Guru Interface: ESPN's cricket database had an unusable web form with 25 checkboxes, 15 text fields, and 18 dropdowns
- User Experience Nightmare: Only hardcore nerds could navigate the complex interface
- Long-Standing Frustration: The problem had bothered Misra since ESPN acquired CricInfo in 2006
The GPT-3 Breakthrough Moment:
- July 2020: First version of GPT-3 was released during the pandemic
- Inspiration: Saw someone use GPT-3 to write SQL queries from natural language
- The Question: "Can I use this to fix Stats Guru?"
Technical Challenges Overcome:
- Context Window Limitation: GPT-3's context window was only about 2,048 tokens - too small to fit the complex database schema
- No Instruction Following: Early GPT-3 didn't follow instructions like modern models
- Database Complexity: Stats Guru's backend was too intricate for direct translation
The RAG Solution:
- Example Database: Created 1,500 natural language query examples paired with structured queries
- Domain-Specific Language: Built a DSL that translated to REST calls to Stats Guru
- Similarity Matching: For new queries, found 6-7 most relevant examples from the database
- Prompt Engineering: Used examples as prefix with new query for GPT-3 completion
- High Accuracy: The system worked remarkably well in production
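A minimal sketch of that retrieve-then-prompt pattern. The example store, the word-overlap similarity, and the DSL strings below are all invented for illustration; the real system used roughly 1,500 curated examples and presumably a stronger similarity measure.

```python
# Hypothetical examples pairing natural-language queries with DSL strings.
EXAMPLES = [
    ("most runs by a batsman in 2019",
     "STATS type=batting metric=runs year=2019 sort=desc"),
    ("best bowling figures at Lord's",
     "STATS type=bowling metric=figures ground=lords sort=asc"),
    ("highest team totals in World Cups",
     "STATS type=team metric=total event=world_cup sort=desc"),
]

def similarity(a: str, b: str) -> float:
    """Crude word-overlap (Jaccard) score; a real system might use embeddings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def build_prompt(query: str, k: int = 2) -> str:
    """Prefix the k most similar examples, then append the new query."""
    ranked = sorted(EXAMPLES, key=lambda ex: similarity(query, ex[0]), reverse=True)
    shots = "\n".join(f"Q: {q}\nDSL: {dsl}" for q, dsl in ranked[:k])
    return f"{shots}\nQ: {query}\nDSL:"   # sent to the LLM for completion

print(build_prompt("most wickets by a bowler in 2019"))
```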
Historical Significance:
- Production Since September 2021: Running 15 months before ChatGPT launched
- Accidental Innovation: Didn't call it "RAG" but created the core concept
- Preceded the Revolution: This was before RAG became widely popular in AI
Summary from [8:06-15:56]
Essential Insights:
- Chain-of-Thought Mechanics - Works by reducing prediction entropy through step-by-step breakdown of problems into familiar algorithmic patterns
- Academic-Entrepreneur Hybrid - Vishal Misra combines Columbia networking research with cricket analytics entrepreneurship, creating unique AI application perspectives
- Accidental RAG Innovation - Frustration with cricket database interfaces led to inventing RAG 15 months before ChatGPT, demonstrating practical AI problem-solving
Actionable Insights:
- LLMs excel at following learned algorithms through sequential steps rather than direct complex problem solving
- Real-world interface problems can drive breakthrough AI innovations when combined with domain expertise
- Early AI adoption requires creative workarounds for technical limitations like context windows and instruction-following capabilities
References from [8:06-15:56]
People Mentioned:
- Martin Casado - General Partner at a16z, shares networking background with Vishal Misra
Companies & Products:
- CricInfo - Sports portal co-founded by Misra in the 1990s, later acquired by ESPN
- ESPN - Acquired CricInfo in 2006, maintained the Stats Guru interface
- San Francisco Unicorns - Cricket team where Misra is a minority owner
- Yahoo - Used as comparison point for CricInfo's massive traffic at its peak
- OpenAI GPT-3 - First version released July 2020, used by Misra to develop RAG solution
Technologies & Tools:
- Stats Guru - Cricket statistics database with complex searchable interface, available since 2000
- RAG (Retrieval-Augmented Generation) - AI technique accidentally invented by Misra in 2021 for natural language database queries
- DSL (Domain-Specific Language) - Created by Misra to translate natural language queries into REST calls
Concepts & Frameworks:
- Chain-of-Thought Reasoning - AI technique that reduces prediction entropy by breaking problems into sequential steps
- Prediction Entropy - Measure of uncertainty in next token prediction, reduced through algorithmic step-by-step processes
How did a cricket problem lead Vishal Misra to develop LLM mathematical models?
Personal Journey into AI Research
The Cricket Problem Origin:
- Initial Challenge - Misra needed to solve a specific cricket-related problem using quick information retrieval
- Unexpected Success - Built a working solution using transformer architecture but had no understanding of why it worked
- Research Motivation - This confusion sparked his journey to develop mathematical models explaining LLM functionality
From Problem to Research:
- Started by staring at transformer architecture diagrams and reading papers
- Couldn't comprehend the underlying mechanisms despite the working solution
- Began developing formal mathematical frameworks to understand the "how" and "why"
- This cricket problem became the catalyst for his entire AI and LLM research trajectory
Research Philosophy:
The experience taught him that practical success doesn't guarantee theoretical understanding - a principle that would guide his formal approach to LLM analysis.
What has most surprised Vishal Misra about LLM development since GPT-3?
The Unprecedented Pace of Progress
Initial State vs. Current Reality:
- GPT-3 Limitations - Started as a "nice parlor trick" requiring complex workarounds for useful applications
- Rapid Evolution - ChatGPT, chain of thought reasoning, instruction following, and GPT-4 created a polished experience
- Workplace Integration - LLMs evolved from novelty tools to essential co-workers and brainstorming partners
Transformation Timeline:
- Early GPT-3: Could write poems and limericks, answered questions with hallucinations
- Current Capabilities: Millions treat these models as intern-level collaborators for diverse work tasks
- Unexpected Scale: The pace of capability emergence far exceeded initial predictions
Personal Impact:
When Misra first worked with GPT-3, he could clearly see its limitations and boundaries. The transformation into today's versatile AI assistants was completely unforeseen in both speed and scope.
Is LLM progress plateauing like iPhone development?
The iPhone Analogy for AI Development
Current Plateau Indicators:
- Cross-Industry Pattern - Not limited to one company or model (OpenAI, Anthropic, Google, and open-source models such as Mistral)
- Incremental Improvements - LLMs are getting better but haven't crossed into fundamentally different capabilities
- No Breakthrough Advances - Similar to how iPhones improved cameras and memory without revolutionary changes
The iPhone Comparison:
- Early iPhone Era: Constant amazement at new capabilities with each iteration
- Recent iPhone Years: Minor improvements (better cameras, more memory) without fundamental advances
- LLM Current State: Following a similar pattern of refinement rather than revolution
Capability Assessment:
While LLMs have improved significantly, they haven't fundamentally changed what they're capable of doing. The core limitations and boundaries remain consistent across different models and companies.
Why did Vishal Misra choose formal modeling over AGI rhetoric?
The Scientific Approach to Understanding LLMs
The Problem with Existing Approaches:
- Fanciful Rhetoric - Claims about AGI and recursive self-improvement without evidence
- Reductionist Dismissals - Calling LLMs "just stochastic parrots" or "just a database"
- Lack of Precision - No formal frameworks to reason about actual capabilities
Misra's Methodology:
- Formal Model Development - Create mathematical abstractions to understand mechanisms
- Evidence-Based Reasoning - Use formal models to make logical deductions about capabilities
- Practical Applications - Apply models to explain previously mysterious phenomena
Research Impact:
His work provided the first real formal explanation for in-context learning through:
- Matrix abstraction framework
- Mapping in-context learning to Bayesian reasoning
- Explaining why these mechanisms worked when nobody else could
Current Research Direction:
Developing generalized models for the state space of model outputs regarding confidence, creating manifold representations of LLM behavior.
How does Vishal Misra's matrix model explain LLM token prediction?
The Gigantic Matrix Abstraction
Matrix Structure:
- Rows - Every possible prompt that could be input to the LLM
- Columns - Every token in the LLM's vocabulary (what it can output)
- Values - Probability distribution over vocabulary for each prompt
Scale Visualization:
- GPT-3 Example: a 2,000-token context window over a 50,000-token vocabulary - on the order of 50,000^2,000 possible prompts
- Matrix Size: More rows than atoms across all known galaxies
- Impossibility: Cannot be represented exactly even with trillion parameters
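The scale is easy to verify with logarithms. The numbers below follow the figures quoted above (2,000-token context, 50,000-token vocabulary):

```python
import math

vocab_size = 50_000      # columns: tokens the model can output
context_length = 2_000   # tokens available to form a prompt

# Rows = every possible prompt = vocab_size ** context_length.
# Far too large to compute directly, so work in base-10 logarithms.
log10_rows = context_length * math.log10(vocab_size)
print(f"rows ~ 10^{log10_rows:.0f}")  # ~10^9398

# Estimates put the atoms in the observable universe near 10^80;
# the matrix dwarfs any physical comparison.
```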
Sparsity Characteristics:
Real-World Constraints:
- Most arbitrary token collections never appear as actual prompts
- Most vocabulary tokens have zero probability for any given prompt
- Example: "The cat sat on the" is unlikely to be followed by numbers or random tokens
Practical Implementation:
- Models train on subset of meaningful rows from training data
- For new prompts, models interpolate using learned patterns
- Even after removing gibberish prompts, matrix remains too large for exact representation
Summary from [16:02-23:59]
Essential Insights:
- Research Origins - Misra's LLM research began from solving a cricket problem, leading to mathematical model development when he couldn't understand why his transformer solution worked
- Development Pace - The speed of LLM evolution from GPT-3's "parlor trick" status to current co-worker capabilities has been the most surprising aspect
- Plateau Pattern - LLM progress appears to be plateauing similar to iPhone development, with incremental improvements rather than fundamental capability breakthroughs
Actionable Insights:
- Formal mathematical modeling provides better understanding than rhetoric about AGI or reductionist dismissals
- The matrix abstraction reveals why LLMs work through sparse representation of prompt-to-token probability distributions
- Current LLM limitations stem from the impossibility of representing the complete matrix of all possible prompts and responses
References from [16:02-23:59]
People Mentioned:
- Vishal Misra - Professor and Vice Dean of Computing at Columbia University, discussing his research journey into LLM mathematical modeling
Companies & Products:
- OpenAI - Mentioned as developer of GPT-3, ChatGPT, and GPT-4 models showing incremental progress
- Anthropic - Referenced as another company showing similar plateau patterns in LLM development
- Google - Cited as example of company with LLMs that haven't fundamentally changed capabilities
- Mistral - Open-source model mentioned as part of the broader LLM plateau trend
Technologies & Tools:
- GPT-3 - Early LLM model described as "parlor trick" requiring workarounds for useful applications
- ChatGPT - Advancement over GPT-3 that improved usability and capabilities
- GPT-4 - Polished version that made LLMs more practical for widespread use
- Transformer Architecture - Core technology that Misra studied to understand LLM functionality
- Chain of Thought Reasoning - Advanced capability that emerged during rapid LLM development phase
Concepts & Frameworks:
- Matrix Abstraction - Misra's mathematical model representing prompts as rows and vocabulary as columns with probability distributions
- In-Context Learning - LLM capability that Misra mapped to Bayesian reasoning in his formal analysis
- Bayesian Reasoning - Mathematical framework used to explain how in-context learning functions
- Sparsity - Key characteristic of the matrix model where most values are zero due to practical constraints
How do LLMs generate responses to prompts they've never seen before?
Bayesian Inference and Matrix Compression
LLMs operate through a sophisticated process that goes beyond simple stochastic parroting. When encountering new prompts, they use Bayesian inference on compressed training data to generate contextually appropriate responses.
The Core Mechanism:
- Matrix Representation - LLMs store training data as a compressed matrix of token relationships
- Bayesian Posterior - New prompts serve as evidence to compute updated probability distributions
- Interpolation Process - Models interpolate from training data using prompt context as new evidence
Real-World Example:
When processing "I'm going out for dinner with Martin tonight," the model:
- Recognizes Pattern Variants - Has seen similar phrases in training data
- Uses Context as Evidence - "Martin" becomes evidence for the Bayesian posterior
- Generates Focused Distribution - Produces next tokens focusing on likely dinner locations
Technical Implementation:
- Compressed Storage - Training data represented efficiently in matrix form
- Universal Response - Same mechanism handles all prompts regardless of content
- Evidence Integration - Prompt context directly impacts posterior distribution calculations
The model essentially treats every new prompt as evidence to update its understanding, allowing it to respond meaningfully to novel combinations of familiar concepts.
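A toy Bayes-rule update in the spirit of the dinner example. The venues, prior, and likelihoods are all invented; the point is only to show the prompt acting as evidence:

```python
# Prior belief over dinner venues before seeing the prompt (invented numbers).
prior = {"McDonald's": 0.50, "Michelin-starred": 0.10, "neighborhood spot": 0.40}

# P(prompt mentions "Martin" | venue): Martin favors fine dining (invented).
likelihood = {"McDonald's": 0.01, "Michelin-starred": 0.90, "neighborhood spot": 0.09}

# Posterior = prior * likelihood, renormalized: the prompt is the evidence.
unnormalized = {v: prior[v] * likelihood[v] for v in prior}
total = sum(unnormalized.values())
posterior = {v: p / total for v, p in unnormalized.items()}

for venue, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{venue}: {p:.3f}")
# "Michelin-starred" now dominates despite having the smallest prior.
```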
How can LLMs learn custom DSLs without any training?
In-Context Learning Through Few-Shot Examples
LLMs demonstrate remarkable ability to learn entirely new Domain Specific Languages (DSLs) through in-context learning, using the same underlying mechanism as regular text generation.
The Cricket DSL Experiment:
Professor Misra created a custom DSL that:
- Mapped Natural Language - Converted cricket queries to structured DSL format
- Translated to APIs - DSL could convert to SQL queries or REST API calls
- Never Seen Before - OpenAI had no access to this custom DSL design
Learning Process:
- Few Examples Provided - Only a handful of natural language to DSL mappings shown
- Immediate Understanding - Model learned the pattern instantly
- No Special Instructions - No explicit "this is few-shot learning" guidance given
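Here is what such a few-shot prompt might look like. The DSL syntax below is invented, not Misra's actual language; the key point is the absence of any instruction - the examples alone carry the pattern.

```python
# Invented DSL syntax; note there is no "learn this language" instruction.
prompt = """\
Q: runs scored by Tendulkar in Australia
DSL: query(player=tendulkar, metric=runs, country=australia)
Q: centuries by Lara after 2000
DSL: query(player=lara, metric=centuries, year_min=2001)
Q: wickets taken by Warne in England
DSL:"""

# Completing this prompt, an LLM plausibly continues with something like:
#   query(player=warne, metric=wickets, country=england)
print(prompt)
```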
Key Insight - Unified Mechanism:
The same process handles both:
- In-Context Learning - Learning from examples in the prompt
- Regular Generation - Standard text continuation tasks
- No Distinction Made - LLM processes both scenarios identically
Technical Implications:
- Pattern Recognition - Models identify structural similarities to known DSLs
- Evidence-Based Learning - Examples serve as evidence for Bayesian inference
- Universal Processing - Same inferencing mechanism regardless of task type
This demonstrates that in-context learning isn't a separate capability but rather the natural result of the model's core Bayesian inference process.
Why can't LLMs achieve recursive self-improvement?
The Inductive Closure Limitation
LLMs face fundamental mathematical constraints that prevent true recursive self-improvement, even when multiple models interact with each other.
The Core Principle:
Inductive Closure - LLM output represents the inductive closure of its training data, meaning it can only generate variations and combinations of what it has already learned.
Two Scenarios Analyzed:
- Single Model Feedback - Feeding LLM output back as input produces no improvement
- Multiple Model Interaction - Even with any number of LLMs talking to each other, no new information entropy is gained
Matrix Model Explanation:
- Subset Representation - Models represent only a subset of possible knowledge rows
- Limited Extrapolation - Some missing rows can be filled through algorithmic unrolling
- Boundary Constraints - Beyond certain points, models cannot generate truly novel knowledge
Self-Improvement Limitations:
What LLMs Can Do:
- Algorithmic Unrolling - Execute embedded step-by-step processes (like multiplication)
- Pattern Completion - Fill in missing information using learned algorithms
- Limited Expansion - Improve within the bounds of existing knowledge
What They Cannot Do:
- Generate New Paradigms - Cannot create fundamentally new theoretical frameworks
- Reject Training Assumptions - Cannot question or override foundational training concepts
- True Discovery - Cannot make genuine scientific breakthroughs requiring paradigm shifts
The mathematical structure of LLMs inherently prevents them from transcending their training data boundaries, regardless of architectural complexity or multi-model interactions.
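One information-theoretic way to frame that boundary (our illustration, not a proof from the episode): deterministically reprocessing samples can never increase their entropy, so a model re-consuming its own output gains nothing new.

```python
import numpy as np
from collections import Counter

def entropy_bits(samples):
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
x = rng.integers(0, 8, size=100_000)   # source samples: ~3 bits each

# Any deterministic reprocessing f(X) can only preserve or destroy
# information, never create it: H(f(X)) <= H(X).
y = x % 4                              # many-to-one map collapses outcomes

print(f"H(X)    = {entropy_bits(x):.3f} bits")   # ~3.000
print(f"H(f(X)) = {entropy_bits(y):.3f} bits")   # ~2.000, never above H(X)
```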
What would true AGI require that LLMs cannot provide?
Paradigm-Breaking Discovery vs. Pattern Recognition
True Artificial General Intelligence requires the ability to generate fundamentally new knowledge by rejecting existing paradigms - something current LLMs cannot achieve due to their training-bound nature.
Historical Examples of True Discovery:
Einstein's Relativity Theory:
- Paradigm Rejection - Had to completely reject Newtonian physics
- Novel Framework - Introduced space-time continuum concept
- Rule Rewriting - Fundamentally rewrote the rules of physics
- Training Limitation - An LLM trained only on pre-1915 physics could not have generated this breakthrough
Quantum Mechanics:
- Conceptual Revolution - Introduced wave-particle duality
- Probabilistic Framework - Replaced deterministic models with probabilistic ones
- Energy Quantization - Discovered energy is not continuous but quantized
- Paradigm Shift - Required rejecting classical Newtonian assumptions
Gödel's Incompleteness Theorem:
- Axiomatic Transcendence - Had to go outside existing mathematical axioms
- Meta-Mathematical Insight - Proved fundamental limitations of formal systems
- System-Breaking Discovery - Showed inherent incompleteness in mathematical frameworks
AGI Requirements:
- External Knowledge Generation - Ability to discover information not present in training data
- Paradigm Rejection - Capacity to question and override foundational assumptions
- Creative Synthesis - Generate genuinely novel theoretical frameworks
- Meta-Cognitive Reasoning - Think beyond the constraints of learned patterns
Current LLM Limitations:
- Training Boundary - Cannot transcend the knowledge boundaries of training data
- Pattern Dependency - Relies on interpolation and combination of existing patterns
- No Paradigm Breaking - Cannot reject fundamental assumptions from training
True AGI would require systems capable of genuine discovery and paradigm creation, not just sophisticated pattern matching and recombination.
Summary from [24:05-31:54]
Essential Insights:
- Bayesian Inference Mechanism - LLMs use sophisticated Bayesian inference on compressed training data, treating new prompts as evidence to compute updated probability distributions
- Unified Learning Process - In-context learning and regular text generation use identical underlying mechanisms, with no special processing for few-shot examples
- Recursive Self-Improvement Impossibility - Mathematical constraints prevent LLMs from true recursive self-improvement, as they can only generate the inductive closure of their training data
Actionable Insights:
- Prompt Engineering Strategy - Understanding that LLMs treat context as Bayesian evidence can improve prompt design for better results
- Custom DSL Applications - LLMs can learn new domain-specific languages through few-shot examples without additional training
- AGI Development Focus - True AGI requires paradigm-breaking capabilities that current LLMs fundamentally cannot achieve
References from [24:05-31:54]
People Mentioned:
- Albert Einstein - Referenced for developing theory of relativity by rejecting Newtonian physics and creating space-time continuum framework
- Kurt Gödel - Mentioned for incompleteness theorem, demonstrating ability to go outside mathematical axioms to prove fundamental limitations
Companies & Products:
- OpenAI - Referenced as the company providing API access for the cricket DSL experiment, with no access to the custom DSL structure
Technologies & Tools:
- SQL - Mentioned as target translation format for the custom cricket DSL
- REST API - Referenced as alternative output format for DSL translation
Concepts & Frameworks:
- Bayesian Inference - Core mechanism explaining how LLMs process new prompts using prior training as evidence
- In-Context Learning - Learning capability demonstrated through few-shot examples without additional training
- Inductive Closure - Mathematical concept describing the limitation that LLM output represents only combinations of training data
- Domain Specific Language (DSL) - Custom programming language created for cricket queries to demonstrate LLM learning capabilities
- Theory of Relativity - Einstein's paradigm-breaking physics theory used as example of true AGI-level discovery
- Quantum Mechanics - Revolutionary physics framework exemplifying knowledge generation that requires rejecting existing paradigms
- Incompleteness Theorem - Gödel's mathematical proof used as example of meta-cognitive reasoning beyond training constraints
Can LLMs Create New Scientific Discoveries or Just Connect Existing Knowledge?
Current Limitations of Large Language Models
What LLMs Can Do Well:
- Connect Known Results - They excel at linking existing knowledge in sequences to solve problems
- Refine Existing Solutions - They can improve and fill gaps where answers already exist
- Navigate Known Patterns - They operate within trained data manifolds with low entropy paths
What LLMs Cannot Do:
- Create Fundamental New Science - Cannot generate novel theories or paradigms
- Invent New Mathematical Branches - Limited to using existing axioms and mathematical frameworks
- Generate Original Discoveries - Cannot produce results that go beyond their training data
International Math Olympiad Example:
The recent IMO results demonstrate this limitation perfectly. Whether humans or LLMs solve these problems, they're not inventing new mathematics. Instead, they:
- Connect known mathematical results
- Follow sequences of established steps
- Explore solution paths where entropy collapses
- Use trained knowledge to arrive at answers
The Architectural Challenge:
Current transformer architectures can get better at connecting known dots but struggle with creating new dots. This fundamental limitation suggests we need architectural advances rather than just more data or compute.
How Does Columbia Professor Vishal Misra Define True AGI?
Beyond Stochastic Parrots: A New Framework
Current LLM Capabilities:
- Sophisticated Reasoning - More advanced than simple stochastic parrots
- Bayesian Navigation - They perform Bayesian reasoning over trained data
- Manifold Exploration - They navigate through known knowledge manifolds
AGI Definition - The Creation Test:
- Current Models: Navigate existing manifolds
- True AGI: Creates entirely new manifolds
The High Bar for AGI:
- Generate New Science - Create novel scientific theories and frameworks
- Develop New Mathematics - Invent new axioms and mathematical branches
- Produce Original Paradigms - Go beyond training data to create unprecedented results
The Einstein Standard:
True AGI would be capable of developing something equivalent to the theory of relativity - a completely new paradigm that fundamentally changes our understanding. This represents the kind of creative leap that current architectures cannot achieve.
Key Distinction:
- Current LLMs: Sophisticated pattern matching and connection
- Future AGI: Genuine creation of new knowledge domains
Why Can't More Data and Compute Alone Achieve AGI Breakthroughs?
The Fundamental Data Scaling Problem
The Manifold Evolution Challenge:
- Existing Data Dominance - LLMs trained on massive datasets create established manifolds
- New Data Gets Absorbed - Additional data gets consumed into existing patterns rather than creating new ones
- Incremental Impact - Small data additions have minimal effect on vast training foundations
The Compute Plateau Effect:
- Diminishing Returns - More compute leads to smoother manifolds, not new ones
- iPhone Analogy - Like iPhone 15, 16, 17 - incremental improvements without revolutionary change
- Architectural Necessity - Fundamental breakthroughs require new architectures, not just scaling
The Multimodal Limitation:
Even giving LLMs "eyes and ears" through multimodal capabilities won't solve the core problem:
- Still Pattern-Based - They remain dependent on pattern recognition from training
- Limited Learning Style - Transformers don't learn like human brains with few examples
- Data Volume Dependency - They require massive datasets unlike human learning
The New Architecture Imperative:
- Beyond Current Limits - Need architectural leaps to create new manifolds
- Not Just Smoothing - More data only smoothens existing manifolds
- Creative Generation - Requires fundamentally different approaches to knowledge creation
What Promising Research Directions Could Lead Beyond LLM Limitations?
Exploring New Architectural Frontiers
Current Position on LLMs:
- Productivity Powerhouses - LLMs will dramatically increase productivity
- Not the Final Answer - They're fantastic but insufficient for AGI
- Stepping Stone Technology - Valuable but require architectural additions
Yann LeCun's Perspective:
- Dead End Theory - LeCun considers LLMs a distraction and dead end
- Moderate Disagreement - Misra sees them as useful but incomplete
- New Architecture Need - Both agree on requiring fundamental changes
Promising Research Directions:
Energy-Based Architectures:
- Yann LeCun's JEPA - Joint Embedding Predictive Architecture showing promise
- Energy-Based Models - Alternative approaches to current transformer limitations
ARC Prize Reverse Engineering:
- Benchmark Analysis - Understanding why LLMs fail on ARC tests
- Architecture Insights - Using failure patterns to design better systems
- Problem-Solution Mapping - Reverse engineering successful architectures from test requirements
Beyond Language-Centric Models:
- Simulation-Based Thinking - Models that perform mental simulations rather than language translation
- Visual-Spatial Processing - Like catching a ball without converting to language
- Approximate Simulation Capabilities - Testing ideas through internal modeling
The Human Learning Advantage:
Human brains learn with very few examples - a capability that transformers fundamentally lack and new architectures must address.
Summary from [32:00-39:54]
Essential Insights:
- LLM Limitation Boundary - Current models excel at connecting existing knowledge but cannot create fundamentally new science or mathematics
- AGI Definition Framework - True AGI requires the ability to create new knowledge manifolds, not just navigate existing ones
- Scaling Plateau Reality - More data and compute alone won't achieve AGI breakthroughs; architectural advances are essential
Actionable Insights:
- Recognize LLMs as productivity tools rather than paths to general intelligence
- Focus research efforts on architectural innovations beyond transformer limitations
- Investigate energy-based models and simulation-capable architectures as promising directions
- Use benchmark failures (like ARC Prize) to reverse-engineer better architectural approaches
References from [32:00-39:54]
People Mentioned:
- Yann LeCun - Meta's Chief AI Scientist who considers LLMs a "dead end" and advocates for energy-based architectures
- Mike Knoop - Co-creator of the ARC Prize benchmark for testing AI reasoning capabilities
- François Chollet - Creator of Keras and co-developer of the ARC Prize benchmark
Concepts & Frameworks:
- Bayesian Manifold - The mathematical space where LLMs navigate through known patterns and low-entropy paths
- International Math Olympiad (IMO) - Competition used as benchmark for AI mathematical reasoning capabilities
- JEPA (Joint Embedding Predictive Architecture) - Yann LeCun's proposed alternative to transformer architectures
- ARC Prize - Benchmark test designed to measure AI's ability to perform abstract reasoning and pattern recognition
- Energy-Based Models - Alternative AI architectures that use energy functions rather than autoregressive prediction
- Entropy Collapse - The phenomenon where AI models follow low-uncertainty paths in their solution space
Did humans develop language because of intelligence or intelligence because of language?
The Chicken-and-Egg Problem of Human Cognition
This fundamental question explores whether language emerged as a result of existing intelligence or if developing language actually accelerated our cognitive abilities.
The Evidence Debate:
- Anecdotal Examples: Cases like Guatemalan or Nicaraguan sign language, where deaf students developed their own communication systems without formal instruction
- Research Limitations: These examples lack proper controls and could involve unrecorded teaching influences
- Observational Challenges: So few documented cases exist that sloppy observation could explain the findings
The Networking Perspective:
Language definitely accelerated human intelligence through:
- Communication Networks - Enabling information exchange between individuals
- Knowledge Storage - Allowing information to be preserved and transmitted
- Replication Systems - Creating ways to duplicate and spread ideas across populations
Current Scientific Status:
- The causal direction remains unknown and outstanding
- Both directions likely played important roles in human development
- The question represents a classic problem in understanding cognitive evolution
How does the AI community respond to formal modeling approaches?
Reception of Information Theory and Systems Thinking in AI Research
The integration of formal modeling techniques from networking and information theory into AI research faces mixed reception and cultural challenges.
Community Reception:
- Partial Acceptance: Some researchers are receptive to formal modeling approaches
- Review Process Challenges: Large AI conferences have inconsistent and sometimes random reviewing standards
- Cultural Divide: Different methodological backgrounds create communication barriers
The Modeling vs. Empiricism Tension:
- Traditional Approach: Focus on building models first, then conducting experiments
- Current AI Trend: Heavy emphasis on empirical measurement without underlying models
- Review Expectations: Conferences often demand large-scale experiments even for foundational modeling work
Historical Comparison:
- Systems Field Evolution: Started with models, then moved to measurement when systems became complex
- AI Field Pattern: Opposite trajectory - measuring complex systems while trying to develop models afterward
- Current State: Easy artifact creation has led to measurement-first approaches
Why is "prompt engineering" not real engineering?
The Distinction Between True Engineering and Prompt Manipulation
The term "prompt engineering" misrepresents what actual engineering entails and reflects a fundamental misunderstanding of rigorous technical practice.
Traditional Engineering Standards:
- Reliability Requirements: Achieving 99.999% uptime and dependability
- Mission-Critical Applications: Sending humans to space with predictable outcomes
- Systematic Methodology: Following established principles and proven processes
Prompt Engineering Reality:
- Actually Prompt Twiddling: Random experimentation without systematic approach
- Unpredictable Outcomes: Small changes produce dramatically different results
- Trial-and-Error Method: Fiddling with inputs until desired output appears
Academic Impact:
- Paper Proliferation: Hundreds of papers documenting prompt variations and observations
- Review Burden: Overwhelming reviewers with empirical work lacking theoretical foundation
- Quality Dilution: Focus on experimentation rather than understanding underlying mechanisms
Preference for Modeling:
The approach should prioritize:
- Understanding First: Develop models before extensive experimentation
- Theoretical Foundation: Build conceptual frameworks to guide empirical work
- Systematic Analysis: Apply rigorous methodology rather than random exploration
What benchmarks would prove LLMs are approaching AGI?
Real-World Tasks That Would Signal Genuine Progress
Despite having the most training data and structure in coding, current LLMs still demonstrate fundamental limitations that reveal their distance from AGI.
The Coding Domain Challenge:
- Optimal Conditions: Coding has the most available training data and inherent structure
- Current Reality: Tools like Cursor and Claude continue to hallucinate and generate unreasonable code
- Supervision Required: Constant babysitting remains necessary for all coding tasks
Two Critical Benchmarks:
1. Autonomous Software Development:
- The Test: Creating large software projects without human supervision
- Current Gap: Requires continuous oversight and correction
- Significance: Would demonstrate practical problem-solving at scale
2. Novel Scientific Discovery:
- The Ultimate Test: Generating genuinely new scientific knowledge
- Higher Bar: Creating discoveries that advance human understanding
- True AGI Indicator: Ability to expand beyond existing knowledge boundaries
The Definitional Challenge:
With billions of dollars in funding, models can be trained to excel in any specific domain by collecting targeted data. The real question becomes whether they can transcend their training distribution.
What would prove AGI has truly arrived?
The Manifold Test for Genuine Intelligence
The definitive test for AGI lies in whether systems can transcend their training data to create genuinely new knowledge domains.
The Manifold Framework:
- Current State: LLMs operate within a manifold defined by their training data
- The Test: Producing something completely outside the existing data distribution
- Significance: Creating new manifolds rather than navigating existing ones
The Einstein Standard:
- Historical Examples: Figures like Einstein created entirely new conceptual frameworks
- New Manifold Creation: Developing knowledge that transcends existing understanding
- Beyond Interpolation: Moving past computational steps from known information
Current LLM Limitations:
- Existing Manifold Navigation: Getting better at working within training data boundaries
- Powerful but Limited: Extremely capable within their domain but unable to transcend it
- World-Changing Impact: Will transform many areas while remaining fundamentally constrained
The Counter-Argument:
Perhaps all human intelligence operates within manifolds, and breakthrough discoveries are simply fortunate navigation rather than true transcendence.
The Verdict:
Until LLMs demonstrate the ability to create genuinely new knowledge domains (new manifolds), they remain sophisticated pattern-matching systems rather than truly intelligent agents.
What architectural breakthrough could enable new manifold creation?
Future Research Directions for Transcending Current AI Limitations
The next phase of AI research focuses on identifying the architectural innovations needed to move beyond current training data constraints.
Core Research Question:
What architectural leap is needed to create new manifolds?
Key Research Areas:
- Multimodal Data Integration: Exploring how different data types can expand solution spaces
- Architectural Innovation: Developing new model structures beyond current transformer approaches
- Manifold Expansion: Creating systems that can transcend their training distributions
Practical Implementation:
- Entropy-Based Inference: Following minimum entropy paths to improve model reasoning
- Model Development: Building and training systems based on entropic path principles
- Incremental Progress: Taking systematic steps toward architectural breakthroughs
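A toy reading of what "following minimum entropy paths" could mean operationally (our sketch, not the Token Probe implementation): among candidate continuations, prefer the one that leaves the model most confident about its next step. The candidates and distributions below are invented.

```python
import numpy as np

def entropy_bits(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Invented candidates, each with the next-token distribution the model
# would face after taking that path; a real system would query the LLM.
candidates = {
    "step-by-step": [0.90, 0.06, 0.04],          # confident follow-up
    "direct answer": [0.30, 0.25, 0.25, 0.20],   # diffuse follow-up
    "analogy": [0.50, 0.30, 0.20],
}

for name, dist in candidates.items():
    print(f"{name:>14}: {entropy_bits(dist):.2f} bits")

# Greedy minimum-entropy selection.
best = min(candidates, key=lambda name: entropy_bits(candidates[name]))
print("chosen path:", best)
```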
Current Tools and Validation:
The Token Probe software demonstrates these principles in action:
- Shows entropy reduction during in-context learning
- Visualizes confidence building with each new example
- Provides real-time validation of the underlying model
- Available for public testing and exploration
This research represents the bridge between current LLM capabilities and potential AGI development.
Summary from [40:03-50:28]
Essential Insights:
- Language-Intelligence Paradox - The causal relationship between language development and intelligence remains an unresolved scientific question with limited empirical evidence
- AI Community Methodology Gap - Formal modeling approaches face resistance in an empirically-driven field that prioritizes measurement over theoretical understanding
- AGI Benchmarks - True artificial general intelligence would require creating autonomous software projects and generating novel scientific discoveries, not just improving existing capabilities
Actionable Insights:
- Prompt Engineering Critique: Recognize that current "prompt engineering" is actually prompt manipulation without the rigor of true engineering principles
- Manifold Test for AGI: Evaluate AI progress by whether systems can transcend their training data to create genuinely new knowledge domains
- Research Direction: Focus on architectural breakthroughs and multimodal data integration to enable manifold expansion beyond current limitations
References from [40:03-50:28]
People Mentioned:
- Einstein - Used as the standard for creating new manifolds of knowledge and breakthrough scientific discoveries
Companies & Products:
- Cursor - AI-powered code editor mentioned as an example of current LLM limitations in coding tasks
- Claude - Anthropic's AI assistant referenced for its coding capabilities and continued hallucination issues
- a16z - Venture capital firm providing server infrastructure for the Token Probe software
Technologies & Tools:
- Token Probe - Software tool developed to visualize entropy reduction and confidence building in LLM inference, running on a16z servers for public testing
Concepts & Frameworks:
- Manifold Theory - Mathematical framework used to understand AI training data boundaries and the potential for creating new knowledge domains
- Information Theory - Theoretical foundation applied to modeling LLM behavior and understanding their limitations
- In-Context Learning - Learning paradigm where models improve performance through examples within the same conversation
- Entropy Path - Minimum entropy approach to improving LLM inference and reasoning capabilities
- Nicaraguan Sign Language - Spontaneous language development case study examining the relationship between intelligence and language emergence