
Scaling and the Road to Human-Level AI | Anthropic Co-founder Jared Kaplan

Jared Kaplan on June 16th, 2025 at AI Startup School in San Francisco.

Jared Kaplan started out as a theoretical physicist chasing questions about the universe. Then he helped uncover one of AI's most surprising truths: that intelligence scales in a predictable, almost physical way. That insight became foundational to the modern era of large language models, and it led him to co-found Anthropic.

• July 29, 2025 • 40:47

🚀 How Does a Theoretical Physicist End Up Co-founding an AI Company?

From Faster-Than-Light Dreams to AI Reality

Jared Kaplan's journey to AI wasn't conventional. Starting as a theoretical physicist with dreams inspired by his science fiction writer mother, he spent the vast majority of his career in academia before making a dramatic pivot to artificial intelligence.

The Physics Foundation:

  1. Childhood Inspiration - Mother was a science fiction writer, sparking dreams of building faster-than-light drives
  2. Core Motivation - Deep fascination with understanding the universe's fundamental workings
  3. Big Questions - Is the universe deterministic? Do we have free will? How do the biggest trends underlying everything emerge?

The Academic Journey:

  • Diverse Specializations: Large Hadron Collider physics, particle physics, cosmology, string theory
  • Growing Frustration: Progress felt too slow, becoming bored with the pace of discovery
  • Key Connections: Met future Anthropic co-founders during his physics career

The AI Skepticism and Conversion:

Jared Kaplan
I didn't believe them. I was really skeptical. I thought, well, AI, people have been working on it for 50 years. SVMs aren't that exciting.
  • Initial Dismissal: Thought AI was overhyped based on 2005-2009 knowledge of SVMs
  • Friend Pressure: Physics colleagues kept insisting AI was becoming "a really big deal"
  • Lucky Break: Knew the right people at the right time to make the transition

Timestamp: [0:00-2:16]

🧠 What Are the Two Secret Ingredients That Make Modern AI Work?

The Foundation of ChatGPT, Claude, and All Contemporary AI Models

Modern AI success comes down to a surprisingly simple two-phase training process that transforms raw computational power into intelligent behavior.

Phase 1: Pre-training - Learning the Patterns

What it does: Trains AI models to imitate human-written text and understand underlying correlations in data

The Process:

  • Models learn what words are likely to follow other words
  • Training on massive corpora of text (now multimodal data)
  • Understanding statistical patterns in human communication
  • Building foundational knowledge about language and concepts

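In code, "learning what words are likely to follow other words" is simply minimizing cross-entropy on next-token prediction. A minimal sketch in PyTorch (the toy embedding-plus-linear model and random tokens here are stand-ins, not a real training stack):

```python
import torch
import torch.nn.functional as F

# Toy vocabulary and a tiny batch of token ids (stand-ins for a real corpus).
vocab_size = 1000
tokens = torch.randint(0, vocab_size, (8, 128))  # (batch, sequence_length)

# Any autoregressive model fits here; embedding + linear keeps the sketch small.
embed = torch.nn.Embedding(vocab_size, 64)
lm_head = torch.nn.Linear(64, vocab_size)

hidden = embed(tokens[:, :-1])   # the model reads tokens 0..N-1
logits = lm_head(hidden)         # and predicts a distribution over the next token
targets = tokens[:, 1:]          # the "label" is simply whatever token came next

# Pre-training is this loss, minimized over a massive (now multimodal) corpus.
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```
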
Phase 2: Reinforcement Learning - Learning to Be Helpful

What it does: Optimizes models to perform useful tasks through human feedback

The Process:

  1. Conversation Interface - Early Claude versions had simple chat interfaces
  2. Human Preference Collection - Crowdworkers and users pick better responses
  3. Behavior Reinforcement - Reward helpful, honest, and harmless behaviors
  4. Behavior Discouragement - Penalize problematic or unhelpful responses

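The preference-collection step above can be sketched the same way: a reward model learns to score the response crowdworkers picked above the one they rejected. This Bradley-Terry-style pairwise loss is the standard RLHF recipe, shown here with placeholder tensors rather than a real reward model:

```python
import torch
import torch.nn.functional as F

# Scalar reward scores for a batch of (chosen, rejected) response pairs,
# as a reward model would produce from crowdworker comparisons.
reward_chosen = torch.randn(32, requires_grad=True)
reward_rejected = torch.randn(32, requires_grad=True)

# Pairwise loss: push the preferred response's reward above the rejected one's.
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()

# The trained reward model then scores candidate responses during RL,
# reinforcing helpful/honest/harmless behavior and discouraging the rest.
```
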
The Elegant Simplicity:

Jared Kaplan
Really all there is to training these models is learning to predict the next word and then doing reinforcement learning to learn to do useful tasks.

Timestamp: [2:16-4:22]

📈 What Happens When a Physicist Asks the "Dumbest Possible Question" About AI?

The Discovery That Changed Everything About AI Development

Sometimes the most profound discoveries come from asking embarrassingly simple questions. Kaplan's physicist training led him to uncover one of AI's most important secrets.

The "Dumb" Questions That Sparked Discovery:

  1. About Big Data - "How big should the data be? How important is it? How much does it help?"
  2. About Model Size - "How much better do these models perform [when they're larger]?"
  3. The Physicist Approach - "As a physicist, that's what you're trained to do. You sort of look at the big picture and you ask really dumb things."

The Shocking Discovery - Scaling Laws:

What They Found: AI performance follows precise, predictable mathematical relationships

  • Performance improves systematically as you increase compute, data, and model size
  • The relationships are "as precise as anything that you see in physics or astronomy"
  • Trends hold across many orders of magnitude

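Concretely, the trends take a power-law form: loss falls as L(N) ≈ (N_c / N)^α in model size N (and similarly in data and compute), which plots as a straight line in log-log space. A small illustration of how such a trend is made precise, using synthetic data rather than real training runs:

```python
import numpy as np

# Synthetic (model_size, loss) points standing in for real training runs.
N = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
L = 5.0 * (N / 1e6) ** -0.076        # a power law with exponent alpha = 0.076

# A power law is a straight line in log-log space: log L = slope * log N + b.
slope, intercept = np.polyfit(np.log(N), np.log(L), deg=1)
print(f"fitted exponent alpha: {-slope:.3f}")   # recovers ~0.076

# Extrapolating the fitted line across orders of magnitude is exactly what
# gave "conviction" that models would keep improving predictably.
predicted_loss = np.exp(intercept) * 1e12 ** slope
```
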
The Revolutionary Implications:

Jared Kaplan
This really blew us away that there are these nice trends that are as precise as anything that you see in physics or astronomy. And these gave us a lot of conviction to believe that AI was just going to keep getting smarter and smarter in a very predictable way.

Why This Mattered:

  • Predictability - Could forecast AI improvements with scientific precision
  • Confidence - Evidence the trend would continue for many orders of magnitude
  • Investment Justification - Clear ROI on scaling up compute and data

Timestamp: [4:22-6:09]

🎯 How Did a Solo Researcher with One GPU Prove Scaling Works Beyond Language?

The Underrated Discovery That Connected Chess Ratings to AI Progress

While everyone focused on language models, a lone researcher made a crucial discovery that proved scaling laws work across different types of AI training - using nothing but a single GPU and a simple board game.

The Researcher and Setup:

  • Who: Andy Jones, working independently about four years ago
  • Resources: Just his own single GPU (in the "ancient days" of limited compute)
  • Challenge: Couldn't study the expensive AlphaGo setup, so chose the simpler game Hex
  • Goal: Test if scaling laws applied to reinforcement learning

The Breakthrough Discovery:

Elo Scores Applied to AI: Used the chess rating system to measure AI model performance

  • Elo scores measure the likelihood of one player beating another
  • Now used to benchmark how often humans prefer one AI model over another
  • Back then, it was just the classic chess rating application

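The Elo arithmetic Jones reused is a simple logistic win-probability model, and the same few lines now rank AI models from human preference votes:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating: float, expected: float, actual: float, k: float = 32) -> float:
    """Nudge a rating toward the observed result (actual: 1 = win, 0 = loss)."""
    return rating + k * (actual - expected)

# A 400-point gap implies ~91% win probability, whether the "players" are
# chess engines, Hex agents, or two AI models compared by human raters.
print(f"{elo_expected_score(1600, 1200):.2f}")  # ~0.91
```
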
The Results:

  • Studied different models training to play Hex (simpler than Go)
  • Found "remarkable straight lines" in performance scaling
  • Clear evidence that RL (reinforcement learning) follows scaling laws too

Why This Was Overlooked:

Jared Kaplan
I think it went unnoticed. I think people didn't focus on this sort of scaling behavior in RL soon enough, but eventually it came to pass.

The Unified Theory:

Both Phases Scale: You can scale up compute in both pre-training AND reinforcement learning for predictable improvements

Timestamp: [6:09-8:01]

⚙️ What's Really Driving AI Progress - Genius or Just Good Engineering?

The Surprising Truth About Why AI Is Getting Better So Fast

The explosive progress in AI isn't what most people think. It's not about sudden breakthroughs in intelligence or researchers getting smarter - it's about something much more systematic and predictable.

The Real Driver of Progress:

Jared Kaplan
It's not that AI researchers are really smart or they suddenly got smart. It's that we found a very simple way of making AI better systematically and we're turning that crank.

The Systematic Approach:

Scaling Both Phases:

  1. Pre-training Compute - More computational power for initial training
  2. Reinforcement Learning Compute - More resources for human feedback optimization
  3. Predictable Results - Better and better performance following mathematical laws

Why This Approach Works:

  • Simplicity - No need for complex algorithmic breakthroughs
  • Reliability - Performance improvements are predictable and consistent
  • Scalability - Can continue "turning the crank" for continued progress
  • Evidence-Based - Proven across many orders of magnitude

The Implications:

Systematic Progress: Rather than waiting for genius insights, AI advancement becomes an engineering and resource allocation problem

Future Confidence: If scaling laws hold, continued investment in compute will yield continued improvements

Timestamp: [8:01-8:18]

💎 Key Insights from [0:00-8:18]

Essential Insights:

  1. Cross-Disciplinary Advantage - Physics training provided unique perspective to ask "dumb" fundamental questions that revealed AI's scaling laws
  2. Two-Phase Training Foundation - All modern AI success reduces to next-word prediction plus reinforcement learning from human feedback
  3. Scaling Law Discovery - AI performance follows precise mathematical relationships as predictable as physical laws, giving confidence in continued progress

Actionable Insights:

  • Investment Strategy: Scaling laws provide scientific basis for predicting AI development ROI and resource allocation
  • Research Approach: Sometimes the most obvious questions yield the most profound discoveries - embrace beginner's mind
  • Industry Understanding: AI progress is now systematically engineerable rather than dependent on unpredictable breakthroughs

Timestamp: [0:00-8:18]

📚 References from [0:00-8:18]

People Mentioned:

  • Andy Jones - AI researcher at Anthropic who discovered scaling laws for reinforcement learning using a single GPU and the game Hex, author of "Scaling Scaling Laws with Board Games"
  • Jared Kaplan's Mother - Science fiction writer who inspired his initial interest in physics and faster-than-light travel

Companies & Products:

  • Anthropic - AI safety company co-founded by Kaplan, creator of Claude
  • OpenAI - Creator of ChatGPT and GPT-3, referenced as contemporary AI model
  • Claude - Anthropic's AI assistant, with early versions dating back to 2022
  • GPT-3 - OpenAI's language model that demonstrated scaling law principles

Games & Applications:

  • AlphaGo - DeepMind's Go-playing AI that demonstrated reinforcement learning success
  • Hex - Simple board game used by Andy Jones to study scaling laws in RL
  • Support Vector Machines (SVMs) - Earlier AI technique that Kaplan found unexciting in 2005-2009

Concepts & Frameworks:

  • Scaling Laws - Mathematical relationships showing predictable AI performance improvements with increased compute, data, and model size
  • Pre-training - First phase of AI training focused on next-word prediction from human text
  • Reinforcement Learning from Human Feedback (RLHF) - Second training phase optimizing for helpful, honest, and harmless behavior
  • Multimodal Data - Modern training data including text, images, and other formats

Timestamp: [0:00-8:18]

📊 How Do You Measure AI Progress on Two Critical Dimensions?

The Framework for Understanding Where AI Is Headed

Kaplan presents a compelling two-axis framework for understanding AI capabilities that reveals both current limitations and future potential.

The Y-Axis: Flexibility - Meeting Humans Where We Are

What it measures: The ability of AI to operate across different modalities and contexts

The Spectrum:

  • Bottom: AlphaGo - superhuman at Go but confined to a single domain
  • Current Progress: Large language models handling multiple modalities
  • Missing Pieces: AI models don't have a sense of smell yet (but "that's probably coming")
  • Future Goal: AI systems that can handle all human-relevant modalities

The X-Axis: Task Duration - The More Interesting Dimension

What it measures: How long it would take a person to complete tasks that AI can now do

The Scaling Discovery:

  • Task duration capability is "doubling roughly every 7 months"
  • Another systematic scaling trend, this one measured empirically by METR
  • Predictable progression from minutes → hours → days → weeks → months → years

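A 7-month doubling compounds fast; a back-of-the-envelope sketch (the one-hour starting point is an assumption for illustration, not a quoted figure):

```python
# horizon(t) = horizon(0) * 2 ** (months_elapsed / doubling_time)
doubling_months = 7
horizon_hours = 1.0  # assumed starting point: tasks of roughly one human-hour

for years in (1, 2, 3, 4):
    h = horizon_hours * 2 ** (years * 12 / doubling_months)
    print(f"after {years} yr: ~{h:.0f} human-hours per task")

# after 1 yr ~3 hours; 2 yrs ~11; 3 yrs ~35; 4 yrs ~116 hours,
# which is how "minutes and hours" turns into weeks of human work.
```
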
The Trajectory Implications:

Jared Kaplan
The increasing intelligence that is being baked into AI by scaling compute for pre-training and RL is leading to predictable useful tasks that the AI models can do, including longer and longer horizon tasks.

Timestamp: [8:25-10:15]

🔮 What Could AI Accomplish in 2027 and Beyond?

From Individual Tasks to Organizational-Level Work

The scaling trends point toward a future where AI doesn't just help with tasks - it could replace entire organizational functions and accelerate scientific progress by decades.

The 2027 Speculation:

Task Duration Expansion: AI models may handle tasks taking not just minutes or hours, but:

  • Days of human work
  • Weeks of complex projects
  • Months of sustained effort
  • Years of organizational initiatives

The Organizational Vision:

Collective AI Power: Millions of AI models working together could:

  • Perform work of entire human organizations
  • Handle tasks requiring whole scientific communities
  • Coordinate complex, multi-year initiatives

The Scientific Acceleration Example:

Jared Kaplan
You can imagine AI systems working together to make the kind of progress that the theoretical physics community makes in say 50 years in a matter of days, weeks, etc.

Why This Works for Science:

  • Math and theoretical physics progress through pure thinking
  • No physical constraints on rapid iteration
  • AI systems can collaborate without human coordination overhead
  • Massive parallelization of intellectual work

The Broader Implications:

Organizational Transformation: AI won't just automate individual jobs, but could fundamentally change how large-scale work gets accomplished across industries and research domains.

Timestamp: [10:15-11:13]

🧩 What Are the Missing Pieces for Human-Level AI?

The Three Critical Ingredients Still Being Developed

Despite dramatic scaling progress, Kaplan identifies specific capabilities that need development to reach broadly human-level AI.

1. Organizational Knowledge - Beyond the Blank Slate

The Challenge: AI models currently start fresh with each interaction
The Solution: Train models to work within specific organizational contexts

What This Means:

  • Understanding company-specific processes, culture, and history
  • Operating with institutional knowledge like long-term employees
  • Contextual awareness of organizational relationships and dynamics
  • Industry-specific expertise and unwritten rules

2. Memory - Tracking Long-Term Progress

The Distinction: Memory differs from general knowledge
The Purpose: Essential for extended, complex tasks

Memory Requirements:

  • Track progress on specific, long-duration tasks
  • Build and maintain task-relevant memories
  • Access and utilize accumulated context over time
  • Maintain continuity across work sessions

Current Development:

Jared Kaplan
That's something that we've begun to build into Claude 4 and I think will become increasingly important.

3. Oversight - Handling Nuanced, Fuzzy Tasks

Current Limitation: Easy to train AI on crisp success/failure tasks (code that passes tests, correct math answers)
The Challenge: Developing nuanced judgment for subjective tasks

Examples of Fuzzy Tasks:

  • Tell good jokes
  • Write good poems
  • Have good taste in research
  • Make nuanced creative decisions

The Solution: AI models that generate sophisticated reward signals to enable reinforcement learning on subjective tasks

Timestamp: [11:13-13:16]

🛠️ What Other Capabilities Need Development for Full AI?

The Simpler But Essential Remaining Ingredients

Beyond the three critical missing pieces, Kaplan outlines additional capabilities needed for comprehensive AI systems.

Progressive Complexity Training:

The Pathway: Work systematically up the capability ladder

  1. Text Models - Current foundation (largely solved)
  2. Multimodal Models - Handling images, audio, video alongside text
  3. Robotics - Physical world interaction and manipulation

Domain-Specific Scaling:

Continued Gains Expected: Scaling laws should continue applying as AI expands into:

  • Physical robotics applications
  • Real-world sensory integration
  • Complex multi-modal reasoning tasks
  • Embodied intelligence scenarios

The Scaling Confidence:

Jared Kaplan
I expect that over the next few years, we'll see increasing continued gains from scale when applied to these different domains.

Why These Are "Simpler":

  • Established Patterns: These follow known scaling law principles
  • Technical Challenges: Engineering problems rather than fundamental research questions
  • Resource Requirements: Mainly need more compute and data, not new theoretical breakthroughs

The Integration Challenge:

Moving from individual capabilities to comprehensive AI systems that can seamlessly operate across all these domains simultaneously.

Timestamp: [13:16-13:41]

🚀 Why Should You Build Things That Don't Work Yet?

The Counterintuitive Strategy for AI-Era Product Development

Kaplan's first major recommendation challenges conventional product wisdom: deliberately build products that current AI can't quite handle.

The Core Strategy:

Jared Kaplan
I think it's really a good idea to build things that don't quite work yet. This is probably always a good idea. We always want to have ambition, but I think specifically AI models right now are getting better very quickly.

Why This Works in the AI Era:

Rapid Capability Growth: AI models are improving at unprecedented speed

  • If Claude 4 is "still a little bit too dumb" for your product
  • Claude 5 will likely make that product work and "deliver a lot of value"
  • The gap between "almost works" and "works great" is shrinking rapidly

The Strategic Advantages:

  1. First-Mover Positioning - Ready when AI catches up to your vision
  2. Deep Understanding - Learn the problem space before solutions mature
  3. Competitive Timing - Launch when AI enables your solution
  4. Market Education - Build awareness before the technology is perfect

The Boundary Strategy:

Jared Kaplan
I always recommend experimenting on the boundaries of what AI can do because those boundaries are moving rapidly.

Practical Application:

  • Identify tasks AI almost but not quite handles well
  • Build products assuming next-generation AI capabilities
  • Focus on problems that seem just out of reach today
  • Prepare for rapid capability expansion

The Risk Mitigation:

This isn't reckless speculation - it's informed betting based on predictable scaling laws and systematic improvement trends.

Timestamp: [13:41-14:25]

🔄 How Can AI Help Solve Its Own Integration Problem?

Using AI to Accelerate AI Adoption

One of the biggest bottlenecks to AI progress isn't capability - it's integration speed. Kaplan proposes a meta-solution: leverage AI itself to solve this challenge.

The Integration Bottleneck:

Jared Kaplan
One of the main bottlenecks for AI is really just that it's developing so quickly that we haven't had time to integrate it into products, companies, and everything else that we do in science.

The Core Problem:

Speed Mismatch: AI capabilities are advancing faster than our ability to:

  • Integrate AI into existing products
  • Adapt company workflows and processes
  • Modify scientific research methodologies
  • Update educational and training systems

The Meta-Solution:

Jared Kaplan
I think leveraging AI for AI integration is going to be very valuable.

Practical Applications of AI-Assisted Integration:

Product Development:

  • AI helping design AI-integrated workflows
  • Automated adaptation of existing systems for AI enhancement
  • AI-generated integration documentation and training materials

Organizational Change:

  • AI analyzing optimal integration points within companies
  • Automated process redesign incorporating AI capabilities
  • AI-driven change management for AI adoption

Technical Implementation:

  • AI writing integration code for AI systems
  • Automated testing and optimization of AI implementations
  • AI-generated APIs and interfaces for easier AI adoption

The Acceleration Effect:

This creates a positive feedback loop where AI capabilities help overcome the primary constraint on AI utilization, potentially dramatically speeding overall AI integration across society.

Timestamp: [14:25-14:55]

🎯 What's the Next Software Engineering for AI Adoption?

Finding the Next Explosive Growth Opportunity

Software engineering has become the poster child for rapid AI integration, but what domain will experience similar explosive adoption next?

The Software Engineering Success Story:

Jared Kaplan
We're seeing an explosion of AI integration for coding. And there are a lot of reasons why software engineering is a great place for AI.

Why Software Engineering Works So Well for AI:

Natural Fit Characteristics:

  • Clear Success Metrics - Code either works or doesn't
  • Immediate Feedback - Quick testing and iteration cycles
  • Digital Native - No physical world constraints
  • Modular Tasks - Breaking down complex problems into components
  • Rapid Iteration - Fast cycles of improvement and testing

The Big Strategic Question:

Jared Kaplan
But I think the big question is what's next? What beyond software engineering can grow that quickly?

Potential Candidate Domains:

What to Look For:

  • Clear success/failure criteria
  • Digital or easily digitized workflows
  • High-frequency iteration opportunities
  • Modular, decomposable tasks
  • Strong economic incentives for efficiency

The Honest Assessment:

Jared Kaplan
I don't know the answer, of course. But hopefully you guys will figure it out.

The Opportunity:

First-Mover Advantage: The domain that achieves software engineering-level AI integration next could see massive competitive advantages and market creation opportunities.

Strategic Approach: Look for fields with similar structural characteristics to software development but currently underserved by AI solutions.

Timestamp: [14:55-15:28]

💎 Key Insights from [8:25-15:46]

Essential Insights:

  1. Two-Dimensional Progress - AI advancement happens on both flexibility (handling more modalities) and task duration (7-month doubling of time horizon capabilities)
  2. Missing Pieces Are Specific - Human-level AI needs organizational knowledge, memory, and oversight for nuanced tasks - not just raw scaling
  3. Build Ahead Strategy - Deliberately create products that don't quite work yet, as AI capabilities are predictably improving to meet those needs

Actionable Insights:

  • Product Strategy: Focus on boundaries of current AI capability, knowing those boundaries move rapidly
  • Integration Acceleration: Use AI itself to solve AI integration challenges and speed adoption
  • Market Opportunity: Identify the next domain after software engineering for explosive AI adoption growth

Timestamp: [8:25-15:46]

📢 Promotional Content & Announcements

Program Announcements:

Y Combinator Applications:

  • Program: YC's next batch now accepting applications
  • Call to Action: Apply to YCombinator
  • Benefits: "It's never too early and filling out the app will level up your idea"
  • Timing: Applications currently open for next batch

Upcoming Content:

Interview Transition:

  • Format Change: Moving from presentation to fireside chat Q&A
  • Participants: Jared Kaplan and Diana Hu (YC General Partner)
  • Focus: Deep dive discussion on scaling laws and AI development

Timestamp: [15:34-15:46]

📚 References from [8:25-15:46]

People Mentioned:

  • Diana Hu - General Partner at Y Combinator, upcoming interview participant

Companies & Products:

  • Y Combinator - Startup accelerator program with applications currently open
  • Anthropic - Kaplan's company developing Claude 4 with new memory capabilities
  • AlphaGo - DeepMind's Go-playing AI used as example of narrow but superhuman intelligence
  • Claude 4 - Latest Anthropic model beginning to incorporate memory capabilities
  • Claude 5 - Future model referenced as likely improvement over Claude 4

Research & Studies:

  • AI 2027 Report - Study that examined and projected AI task duration capabilities
  • METR Study - Research discovering the 7-month doubling trend in AI task-duration capabilities, published on arXiv as "Measuring AI Ability to Complete Long Tasks"

Technologies & Tools:

  • Large Language Models - AI systems that can handle multiple modalities beyond single-domain applications
  • Multimodal Models - AI systems processing text, images, audio, and video
  • Robotics - Physical world AI applications as next frontier beyond digital domains

Concepts & Frameworks:

  • Task Duration Scaling - AI capability improvement measured by time horizon of completable tasks
  • Organizational Knowledge - AI understanding of company-specific context and institutional knowledge
  • AI Memory Systems - Capability for AI to maintain context and progress across extended tasks
  • Oversight for Fuzzy Tasks - AI ability to handle subjective tasks requiring nuanced judgment
  • AI Integration Bottleneck - Challenge of incorporating AI into existing systems faster than capabilities develop

Timestamp: [8:25-15:46]

🚀 What's Wrong with Being "Too Eager" in AI Development?

How Claude 4 Fixes the Overzealous Assistant Problem

The conversation shifts to Diana's question about Claude 4's impact, revealing a fascinating problem with previous models - they were actually too helpful.

The "Too Eager" Problem with Claude 3.7:

What Users Experienced:

  • Claude 3.7 Sonnet was excellent for coding applications
  • But it became overly enthusiastic about making tests pass
  • Would implement solutions users didn't actually want
  • Added unnecessary "try-except" blocks and workarounds

Jared Kaplan
Sometimes it just really wanted to make your tests pass. And it would do things that you don't really want. There are a lot of try-except blocks and things like that.

Claude 4's Improvements:

Enhanced Agency: Better ability to act as an agent for:

  • Coding applications with improved judgment
  • Search functionality
  • Various other application domains

Better Supervision: Improved oversight capabilities that:

  • Follow user directions more precisely
  • Improve overall code quality
  • Balance helpfulness with user intent

The Modeling Challenge:

Timeline Pressure:

Jared Kaplan
I think that we'll be in trouble if it's 12 months before an even better model comes out.

This reveals the intense competitive pressure in AI development, where 12-month cycles between major improvements could be considered slow.

Timestamp: [15:53-17:09]

🧠 How Does Claude 4's Memory System Enable Multi-Session Projects?

Breaking Through Context Window Limitations for Long-Term Work

Claude 4 introduces a game-changing memory system that allows AI to work on complex projects that span far beyond single conversations.

The Memory Innovation:

  • Core Capability: Save and store memories as files or records
  • Strategic Retrieval: Access stored information to continue work across multiple context windows
  • Extended Collaboration: Enables Claude to work on projects that exceed single-session limitations

How It Works:

  1. Memory Storage - Claude can save important information, decisions, and progress as persistent records
  2. Context Bridging - When approaching context window limits, retrieve relevant memories
  3. Continuous Work - Maintain project continuity across "many many many context windows"
  4. File-Based Persistence - Memories stored as accessible files rather than just conversation history

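One plausible shape for this pattern, as a hypothetical sketch (the file names and record structure are invented for illustration, not Anthropic's implementation): the model writes compact records to disk and reloads the relevant ones when a fresh context window opens.

```python
import json
from pathlib import Path

MEMORY_DIR = Path("memories")   # hypothetical storage location
MEMORY_DIR.mkdir(exist_ok=True)

def save_memory(topic: str, record: dict) -> None:
    """Persist a decision or progress note beyond the current context window."""
    (MEMORY_DIR / f"{topic}.json").write_text(json.dumps(record, indent=2))

def load_memory(topic: str) -> dict:
    """Retrieve a stored record when a later session resumes the project."""
    path = MEMORY_DIR / f"{topic}.json"
    return json.loads(path.read_text()) if path.exists() else {}

save_memory("architecture", {"db": "postgres", "reason": "decided in session 3"})
print(load_memory("architecture"))   # available in any later context window
```
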
The Unlock Potential:

Jared Kaplan
I think the thing that I'm most excited about is memory unlocking longer and longer horizon tasks. I think that as time goes on we're going to see Claude as a collaborator that can take on larger and larger chunks of work.

Practical Applications:

  • Complex Software Projects - Maintain architecture decisions across development sessions
  • Research Projects - Track findings, hypotheses, and methodologies over time
  • Business Strategy - Remember organizational context and long-term planning decisions
  • Creative Projects - Maintain narrative consistency and character development

The Collaboration Evolution:

Moving from single-interaction assistant to persistent collaborative partner capable of sustained, complex work relationships.

Timestamp: [17:09-18:27]

📈 Are We Really at the "Hours-Long Task" Stage Already?

Measuring Current AI Capability Against the Scaling Predictions

Diana probes whether Kaplan's theoretical scaling predictions are already manifesting in current AI capabilities, particularly around task duration.

Current Capability Assessment:

  • Software Engineering Focus: METR's benchmarking reveals AI can now handle tasks taking hours of human time
  • Measurement Approach: Direct comparison of how long various tasks take humans versus AI
  • Imprecise but Real: While the measurement is "very imprecise," the trend is clear

The Scaling Law Manifestation:

Jared Kaplan
I think the picture that scaling laws paint is one of incremental progress. And so I think that what you'll see with Claude is that steadily it gets better in lots of different ways with each release.

The Trajectory Insight:

  • Smooth Progression: Rather than sudden breakthroughs, expect steady, predictable improvements
  • Multi-Dimensional Growth: Each release improves capabilities across various domains simultaneously
  • AGI Pathway: This smooth curve leads toward "human level AI or AGI"

Current State Validation:

Jared Kaplan
Yeah, I think so. I think it's a very imprecise measure, but I think that right now if you look at software engineering tasks, I think METR literally benchmarked how long it would take people to do various tasks and yeah, I think it's a time scale of hours.

This confirms that the theoretical scaling predictions are already becoming practical reality in specific domains like software engineering.

Timestamp: [18:27-18:55]

🤔 What's the Strangest Thing About AI Intelligence Compared to Humans?

The Judgment vs. Generation Gap That Defines AI Collaboration

Kaplan reveals a fundamental difference between human and AI intelligence that explains both AI's limitations and the optimal way to work with it.

The Human Intelligence Pattern:

Clear Separation: Humans often can't perform a task but can judge whether it was done correctly

Examples:

  • Can't write great poetry but recognize good poetry
  • Can't solve complex math but verify solutions
  • Can't code expertly but spot bugs and issues

The AI Intelligence Pattern:

Jared Kaplan
I think for AI the judgment versus the generative capability is much closer, which means that I think a major role people can play in interacting with AI is as managers to sanity check the work.
  • Compressed Gap: AI's ability to judge and generate are much more aligned
  • Implication: AI that can generate solutions is nearly as good at evaluating them

The Collaboration Model:

  • Human as Manager: People become supervisors and quality controllers
  • AI as Generator: AI produces the actual work output
  • Sanity Check Function: Humans provide oversight for reasonableness and correctness

The Dual Nature Challenge:

Jared Kaplan
I think broadly as people work with AI, I think that the people who are skeptics of AI will say correctly that AI makes lots of stupid mistakes. It can do things that are absolutely brilliant and surprise you, but it can also make basic errors.

This creates a unique collaboration dynamic where AI can be simultaneously brilliant and foolish, requiring human oversight despite high capability.

Timestamp: [18:55-19:41]

🔄 How Fast Are We Moving from Co-Pilot to Full Automation?

The Rapid Evolution Y Combinator Is Witnessing in Real-Time

Diana shares fascinating insights about how quickly AI product strategies are evolving, based on Y Combinator's unique vantage point across hundreds of startups.

The Co-Pilot Era (Last Year):

  • Customer Support Example: Companies selling AI as assistants requiring human approval
  • Human-in-the-Loop: Final human verification before customer-facing actions
  • Safety-First Approach: Conservative implementation ensuring human oversight

The Spring Batch Transformation:

Diana Hu
One thing that has changed just in the spring batch, I think a lot of the AI models are very capable to do tasks end to end, which is remarkable. Founders are selling now directly replacements of full workflows.

The New Reality:

  • End-to-End Capability: AI models handling complete workflows without human intervention
  • Full Replacement Strategy: Founders now selling AI as direct substitutes for entire processes
  • Workflow Automation: Moving beyond assistance to complete task ownership

The Speed of Change:

  • Timeline: Massive shift observed just between Y Combinator batches (approximately 6-month cycles)
  • Founder Confidence: Entrepreneurs now comfortable betting on full automation
  • Market Acceptance: Customers willing to adopt end-to-end AI solutions

The Validation:

This real-world evidence from hundreds of startups confirms Kaplan's scaling law predictions are manifesting in practical applications faster than expected.

Timestamp: [19:41-20:19]

⚖️ What Determines Whether You Need 70% or 99.9% Accuracy?

The Strategic Framework for Choosing AI Implementation Approaches

Kaplan provides a practical framework for understanding when different levels of AI accuracy are acceptable and how this impacts product development strategy.

The Accuracy Spectrum Decision:

Jared Kaplan
There are some tasks where getting it 70% right is good enough and others where you need 99.9% to deploy.

The 70-80% Sweet Spot:

Strategic Advantage:

Jared Kaplan
I think it's probably a lot more fun to build for use cases where 70-80% is good enough because then you can really get to the frontier of what AI is capable of.

Benefits of Lower Accuracy Requirements:

  • Access to cutting-edge AI capabilities
  • Faster time to market
  • More innovative applications
  • Greater competitive differentiation

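One way to see why the threshold matters: per-step reliability compounds across multi-step workflows. An illustrative calculation (the step counts are assumptions, not figures from the talk):

```python
# Probability an n-step workflow completes with no errors,
# assuming each step independently succeeds with probability p.
for p in (0.70, 0.99, 0.999):
    for steps in (1, 10, 50):
        print(f"p={p:.3f}, {steps:>2} steps -> {p ** steps:6.1%} end-to-end")

# At 70% per step, a 10-step workflow finishes cleanly ~2.8% of the time:
# fine when a human reviews the output, unacceptable for unattended deployment.
```
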
The 99.9% Necessity Cases:

High-Stakes Applications:

  • Medical diagnoses and treatment recommendations
  • Financial trading and investment decisions
  • Safety-critical system controls
  • Legal document generation and analysis

The Reliability Trajectory:

Continuous Improvement:

Jared Kaplan
I think that we're pushing up the reliability as well. So I think that we will see more and more of these tasks.

The Implementation Strategy:

  • Current Optimal Approach: Human-AI collaboration for advanced tasks
  • Future Evolution: Increasing full automation as reliability improves
  • Strategic Positioning: Build for current accuracy levels while preparing for higher reliability

The Collaboration Timeline:

Jared Kaplan
I think that right now human-AI collaboration is going to be the most interesting place because I think that for the most advanced tasks you're really going to need humans in the loop. But I do think in the longer term there will be more and more tasks that can be fully automated.

This provides a roadmap for when to implement different AI strategies based on accuracy requirements and risk tolerance.

Timestamp: [20:19-21:13]

💎 Key Insights from [15:53-21:13]

Essential Insights:

  1. Memory Revolution - Claude 4's persistent memory system enables multi-session collaboration on complex, long-horizon projects
  2. Intelligence Gap Analysis - AI's judgment and generation capabilities are more aligned than humans', requiring different collaboration models
  3. Rapid Market Evolution - Y Combinator data shows startups moving from co-pilot to full automation strategies within 6-month cycles

Actionable Insights:

  • Accuracy Strategy: Choose 70-80% accuracy applications for cutting-edge AI capabilities vs. 99.9% for high-stakes deployment
  • Collaboration Model: Position humans as managers/supervisors rather than co-workers when working with AI
  • Implementation Timing: Current optimal approach is human-AI collaboration with preparation for increasing full automation

Timestamp: [15:53-21:13]

📚 References from [15:53-21:13]

People Mentioned:

  • Diana Hu - General Partner at Y Combinator conducting the interview

Companies & Products:

  • Anthropic - Developer of Claude 4 with new memory and supervision capabilities
  • Y Combinator - Startup accelerator observing rapid evolution from co-pilot to full automation strategies
  • Claude 3.7 Sonnet - Previous Anthropic model that was "too eager" in coding applications
  • Claude 4 - Latest model with improved memory, supervision, and agent capabilities
  • Claude 5 - Future model referenced as next improvement iteration

Research & Benchmarking:

  • METR - Organization that benchmarked AI task duration capabilities against human performance using the "50%-task-completion time horizon" metric
  • Y Combinator Spring Batch - Recent cohort showing dramatic shift toward end-to-end AI automation

Technologies & Tools:

  • Context Windows - AI conversation memory limitations that Claude 4's memory system overcomes
  • Memory Storage System - Claude 4's ability to save information as files/records across sessions
  • Agent Capabilities - AI's ability to act autonomously in coding, search, and other applications

Concepts & Frameworks:

  • Human-AI Collaboration Model - Humans as managers providing oversight and sanity checks for AI work
  • Accuracy Requirements Spectrum - 70-80% vs 99.9% accuracy determining implementation strategy
  • Co-pilot to Full Automation Evolution - Rapid transition from human-supervised to fully automated AI workflows
  • Judgment vs Generation Gap - Fundamental difference between human and AI intelligence patterns
  • Scaling Law Manifestation - Theoretical predictions now visible in practical applications

Timestamp: [15:53-21:13]

🌟 What Does Dario's "Machines of Loving Grace" Vision Look Like in Practice?

From Optimistic Essays to Real-World Human-AI Collaboration

Diana references Dario Amodei's influential essay about AI's potential, prompting Kaplan to share concrete examples of how this vision is already materializing.

Current Reality in Biomedical Research:

Jared Kaplan
I think that we already see some of that happening. So at least when I talk to folks who work in say biomedical research, with the right orchestration I think it's possible to take frontier AI models now and produce interesting valuable insights for say drug discovery.

The Orchestration Key:

Critical Success Factor: "With the right sort of orchestration" - not just raw AI capability, but thoughtful integration and management

Drug Discovery Applications:

  • Frontier AI models already producing valuable insights
  • Real researchers achieving meaningful results
  • Practical applications beyond theoretical potential

The Optimistic Foundation:

  • Dario's Vision: "Machines of Loving Grace" paints an optimistic picture of AI-human collaboration
  • Current Evidence: Early manifestations already visible in high-stakes research domains
  • Implementation Reality: Success depends on skillful orchestration rather than just AI capability

The Bridge to Reality:

This represents the practical manifestation of ambitious AI visions - not just theoretical possibilities, but working applications in critical domains like healthcare and drug development.

Timestamp: [21:18-22:02]

🧠 Why Is AI's "Breadth vs. Depth" Advantage Perfect for Scientific Breakthroughs?

How AI's Unique Intelligence Pattern Unlocks New Research Possibilities

Kaplan reveals a fundamental distinction between types of intelligence that explains why AI may excel in certain scientific domains over others.

The Two Types of Intelligence:

Depth Intelligence:

  • Requires intensive focus on single problems
  • Example: Working on one theorem for a decade (Riemann Hypothesis, Fermat's Last Theorem)
  • Traditional strength of human experts

Breadth Intelligence:

  • Requires synthesizing vast amounts of information across domains
  • More common in biology, psychology, history
  • AI's natural advantage due to pre-training

AI's Unique Advantage:

Jared Kaplan
I think that AI models during the pre-training phase kind of imbibe all of human civilization's knowledge. And so I suspect that there's a lot of fruit to be picked in using that feature of AI that it knows much more than any one human expert.

The Cross-Domain Synthesis Opportunity:

  • AI's Superpower: Ability to connect insights across multiple areas of expertise
  • Human Limitation: No single human expert has knowledge spanning all relevant domains
  • Research Application: Eliciting insights that combine biology, chemistry, physics, and other fields simultaneously

The Knowledge Integration Advantage:

Jared Kaplan
I suspect that there's a particular overhang in areas where putting together knowledge that maybe no one human expert would have, where that kind of intelligence is very useful.

Practical Implications:

  • AI can synthesize insights across traditionally separate research silos
  • Breakthrough potential in interdisciplinary research
  • Leveraging AI's comprehensive knowledge base for novel connections

Timestamp: [22:02-23:46]

🔮 How Do You Predict the Unpredictable Future of AI Implementation?

Why Scaling Laws Work for Trends But Fall Short for Details

Kaplan provides a nuanced view of prediction in AI, distinguishing between what can be reliably forecasted and what remains fundamentally uncertain.

What Scaling Laws Can Predict:

  • Reliable Trend Continuation: The overall trajectory of AI capability improvement
  • Macro-Economic Parallels: GDP, economic growth, and other long-term trends provide precedent
  • Capability Progression: General advancement in AI performance and task complexity

What Remains Unpredictable:

Jared Kaplan
In terms of how exactly it will roll out, I really don't know. It's really hard to predict the future.
  • Implementation Details: Specific ways AI will be integrated into society and business
  • Adoption Patterns: Which industries will adopt AI first and how quickly
  • Social Dynamics: How humans will adapt to and interact with AI systems

The Prediction Framework:

Reliable Long-Term Trends:

Jared Kaplan
I think a lot of trends that we see over the long haul I expect will continue. I mean the economy, the GDP, these kinds of trends are really reliable indicators of the future.
  • Uncertain Specifics: The details of implementation, timing, and social adaptation
  • Scientific Approach: Use what can be predicted (scaling laws) while acknowledging uncertainty about specifics

The Intellectual Honesty:

Rather than overconfident predictions, Kaplan demonstrates scientific rigor by clearly distinguishing between what scaling laws can and cannot forecast about the future.

Timestamp: [23:46-24:20]

💼 What Are the Most Promising "Green Field" Opportunities for AI Builders?

Beyond Coding: Identifying the Next Wave of AI Applications

Diana asks about untapped opportunities, and Kaplan identifies specific domains ripe for AI transformation based on clear criteria.

The Green Field Criteria:

Jared Kaplan
In general any place where it requires a lot of skill and it's a task that mostly involves sitting in front of a computer interacting with data.

Specific High-Potential Domains:

Finance:

  • Complex data analysis and pattern recognition
  • Quantitative modeling and risk assessment
  • Algorithmic trading and portfolio management

Excel-Heavy Professionals:

  • Financial analysts and accountants
  • Business analysts and consultants
  • Operations managers and planners

Legal (with caveats):

  • Document review and contract analysis
  • Legal research and case law synthesis
  • BUT: "Maybe law is more regulated, requires more expertise as a stamp of approval"

The Meta-Opportunity:

AI Integration Services:

Jared Kaplan
How do we integrate AI into existing businesses? I think that like when electricity came along, there was some long adoption cycle and the very first simplest ways of using electricity weren't necessarily the best.

The Electricity Analogy:

  • Historical Parallel: Early electricity adoption simply replaced steam engines with electric motors
  • Better Approach: "You wanted to sort of remake the way that factories work"
  • AI Implication: Don't just replace human tasks - reimagine entire workflows and business processes

The Leverage Opportunity:

Jared Kaplan
I think that probably leveraging AI to integrate AI into parts of the economy as quickly as possible. I expect there's just a lot of leverage there.

Timestamp: [24:20-25:56]

🔬 How Does a Physicist's "Dumb Questions" Approach Revolutionize AI Research?

The Power of Precision in Identifying Breakthrough Opportunities

Diana explores how Kaplan's physics training contributed to discovering scaling laws, revealing a methodology that others can apply.

The Physics Mindset:

Jared Kaplan
I think the thing that was useful from a physics point of view is looking for the biggest picture, most macro trends and then trying to make them as precise as possible.

The "Dumb Questions" Method:

  • Real Example: Encountering brilliant AI researchers saying "learning is converging exponentially"

Critical Questions:

  • "Are you sure it's an exponential?"
  • "Could it just be a power law?"
  • "Is it quadratic?"
  • "Like exactly how is this thing converging?"

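The distinction behind those questions is easy to check mechanically: an exponential is a straight line on a semi-log plot, a power law on a log-log plot. A small sketch of that diagnostic on synthetic data:

```python
import numpy as np

steps = np.arange(1, 1000)
curve = 3.0 * steps ** -0.5   # secretly a power law, not an exponential

# Exponential hypothesis: log(y) is linear in x.
# Power-law hypothesis:  log(y) is linear in log(x).
r_exp = np.corrcoef(steps, np.log(curve))[0, 1]
r_pow = np.corrcoef(np.log(steps), np.log(curve))[0, 1]

print(f"semi-log correlation: {r_exp:.3f}")   # noticeably imperfect
print(f"log-log correlation:  {r_pow:.3f}")   # -1.000 exactly: a power law
```
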
Why Precision Matters:

Jared Kaplan
I think there was a lot of fruit to be picked and probably still is in trying to make the big trends that you see as precise as possible because that gives you a lot of tools.

The Strategic Value:

  • The Holy Grail: Finding a better slope to the scaling law
  • Competitive Advantage: "As you put in more compute, you're going to get a bigger and bigger advantage over other AI developers"
  • Systematic Progress: Know exactly what it means to improve and how to measure success

The Precision Requirement:

Jared Kaplan
Until you've made precise what the trend is that you see, you don't know exactly what it means to beat it and how much you can beat it by and how to know systematically whether you're achieving that end.

The Transferable Skill:

This approach isn't limited to physics - anyone can apply rigorous questioning to make vague trends precise and actionable.

Timestamp: [25:56-27:47]

🧮 What Physics Concepts Actually Transfer to AI Research?

From Matrix Limits to Naive Questions: The Real Physics Tools for AI

Diana probes deeper into specific physics techniques, but Kaplan reveals that the most powerful tools are surprisingly fundamental.

The Matrix Mathematics Connection:

Jared Kaplan
Something that you'll observe if you look at AI models is that they're big. Neural networks are big. They have billions, now trillions, of parameters. That means that they're made out of big matrices.
  • Practical Application: Studying approximations where neural networks are very large
  • Physics Heritage: Well-known approximation techniques from physics and mathematics
  • Current Relevance: Applied to understanding behavior of massive neural networks

The Counter-Intuitive Truth:

Jared Kaplan
I think generally it's really asking very naive dumb questions that gets you very far.

Why "Fancy Techniques" Aren't Needed:

  • AI's Youth: "AI is really in a certain sense only like maybe 10-15 years old in terms of the current incarnation"
  • Fundamental Gaps: "A lot of the most basic questions haven't been answered like questions of interpretability, how AI models really work"
  • Low-Hanging Fruit: More value in basic understanding than advanced mathematical techniques

The Specific Physics Reality:

Jared Kaplan
It wasn't necessarily like literally applying say quantum field theory to AI. I think that's a little bit too specific.

The Research Opportunity:

  • Basic Questions Remain: Fundamental interpretability and understanding challenges
  • New Field Advantage: Incredible opportunity for foundational discoveries
  • Simple Tools Win: Naive questioning more valuable than sophisticated mathematical machinery

Timestamp: [27:47-29:15]

🔍 Why Is AI Interpretability More Like Biology Than Physics?

The Advantage AI Has Over Neuroscience in Understanding Intelligence

Kaplan draws fascinating parallels between AI interpretability challenges and biological research, while highlighting AI's unique research advantages.

The Biological Analogy:

Jared Kaplan
I would say that interpretability is a lot more like biology. It's a lot more like neuroscience. So I think those are kind of the tools.
  • Research Approach: Similar to trying to understand brain features and neural networks
  • Complexity Level: More biological investigation than mathematical derivation
  • Methodology: Reverse engineering complex systems rather than deriving from first principles

AI's Massive Research Advantage:

Jared Kaplan
The benefit that you get with AI over neuroscience is that you can really measure everything in AI. You can't measure the activity of every neuron, every synapse in a brain, but you can do that in AI.

The Data Advantage:

  • Complete Observability: Every parameter, activation, and connection is measurable
  • Perfect Monitoring: Can track all neural network activity during training and inference
  • Unlimited Experimentation: Can modify and test AI systems in ways impossible with biological brains

The Research Implications:

  • Much More Data: "There's much much much more data for reverse engineering how AI models work"
  • Better Tools: Complete system access versus limited biological measurement capabilities
  • Faster Progress: Potential for more rapid interpretability breakthroughs than neuroscience

The Methodological Insight:

AI interpretability combines the systematic approach of biology with the complete data access that biological systems can never provide, creating unprecedented opportunities for understanding intelligence.

Timestamp: [29:15-29:45]

💎 Key Insights from [21:18-29:45]

Essential Insights:

  1. Breadth Intelligence Advantage - AI's ability to synthesize knowledge across all human domains creates unique research opportunities that no single human expert could achieve
  2. Physics Methodology Transfer - Asking precise, "dumb" questions about vague trends yields more breakthroughs than applying sophisticated mathematical techniques
  3. AI Research Superiority - Unlike neuroscience, AI interpretability benefits from complete observability of all system components and behaviors

Actionable Insights:

  • Research Strategy: Focus on interdisciplinary problems where AI can synthesize knowledge across multiple expert domains
  • Business Opportunities: Target skill-intensive, computer-based work in finance, Excel-heavy roles, and AI integration services
  • Scientific Approach: Make vague trends precise through rigorous questioning to identify competitive advantages and systematic improvement paths

Timestamp: [21:18-29:45]

📚 References from [21:18-29:45]

People Mentioned:

  • Dario Amodei - Anthropic CEO and author of "Machines of Loving Grace" essay painting optimistic AI future
  • Diana Hu - Y Combinator General Partner conducting the interview

Mathematical Concepts:

  • Riemann Hypothesis - Famous unsolved mathematical conjecture used as example of depth intelligence
  • Fermat's Last Theorem - Historical mathematical problem requiring decade-long focused work
  • Power Laws vs. Exponentials - Mathematical distinctions Kaplan uses to make AI trends precise
  • Matrix Mathematics - Large matrix approximation techniques from physics applied to neural networks

Technologies & Applications:

  • Drug Discovery - Biomedical research domain where AI is already producing valuable insights
  • Excel Spreadsheets - Business tool representing skill-intensive, computer-based work ripe for AI
  • Neural Network Parameters - AI models now have billions to trillions of parameters forming large matrices

Research Domains:

  • Biomedical Research - Field where AI orchestration is already producing meaningful results
  • Finance - High-potential domain for AI applications in data analysis and modeling
  • Legal Services - Promising but regulated domain requiring expertise approval
  • AI Integration Services - Meta-opportunity helping businesses adopt AI effectively

Concepts & Frameworks:

  • Breadth vs. Depth Intelligence - Distinction between synthesizing across domains vs. deep focus on single problems
  • Scaling Law Precision - Making vague trends mathematically precise to identify improvement opportunities
  • AI Interpretability - Understanding how AI models work, compared to neuroscience methodology
  • Electricity Adoption Analogy - Historical parallel for AI integration requiring workflow redesign rather than simple replacement

Timestamp: [21:18-29:45]

📉 What Would It Take to Convince a Scaling Laws Pioneer That the Curve Is Breaking?

The Contrarian Question That Reveals Deep Conviction About AI Progress

Diana poses a challenging contrarian question about scaling law durability, revealing just how robust these patterns have proven and why Kaplan remains convinced they'll continue.

The Remarkable Track Record:

Diana Hu
They've held for over five orders of magnitude, which is wild.
  • Scale of Validation: Scaling laws have remained consistent across enormous ranges of compute, data, and model sizes
  • Statistical Significance: Five orders of magnitude represents unprecedented consistency in empirical observations
  • Foundation for Confidence: This track record provides strong basis for continued belief

Kaplan's Diagnostic Approach:

Jared Kaplan
I mostly use scaling laws to diagnose whether AI training is broken or not.

The Default Assumption - Training Problems, Not Law Failures:

Jared Kaplan
I think that my first inclination is to think if scaling laws are failing, it's because we've screwed up AI training in some way.

Potential Issues:

  • Wrong neural network architecture
  • Hidden bottlenecks in training process
  • Precision problems in algorithms
  • Implementation errors rather than fundamental limits

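A hedged sketch of that diagnostic habit (the data points and the 10% threshold are invented for illustration): fit the power-law trend from previous healthy runs, then flag a new run whose loss sits well above the predicted curve.

```python
import numpy as np

# Compute budgets (FLOPs) and final losses from previous, healthy runs.
compute = np.array([1e18, 1e19, 1e20, 1e21])
loss = np.array([3.10, 2.62, 2.21, 1.87])

slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)

def predicted_loss(c: float) -> float:
    """Loss the established scaling trend predicts at compute budget c."""
    return float(np.exp(intercept) * c ** slope)

# A new run landing far above trend points to broken training
# (architecture, bottleneck, precision), not a failing law.
new_compute, new_loss = 1e22, 2.4
if new_loss > 1.1 * predicted_loss(new_compute):
    print("off-trend: suspect the training setup before blaming the scaling law")
```
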
The Experience-Based Conviction:

Jared Kaplan
So many times in my experience over the last 5 years when it seemed like scaling was broken it was because we were doing it wrong.

What It Would Take:

Jared Kaplan
I think it would take a lot to convince me at least that scaling was really no longer working at the level of these empirical laws.
Jared Kaplan, Anthropic | Co-founder
  • High Bar for Evidence: Would require overwhelming proof that the laws themselves, not implementation, are failing
  • Scientific Rigor: Distinguishes between execution problems and fundamental physical limits

Timestamp: [29:52-31:03]

⚡ How Far Down the Precision Ladder Will AI Go When Compute Gets Scarce?

From FP4 to Binary: The Future of Efficient AI Computing

Diana explores the technical challenge of maintaining scaling progress when compute becomes scarce, leading to fascinating insights about AI efficiency and the "back to binary" future.

The Compute Scarcity Challenge:

  • Current Reality: Massive compute requirements to maintain scaling curve progress
  • Future Constraint: Compute will become increasingly scarce and expensive
  • Technical Question: How low can precision go while maintaining performance?

The Current Inefficiency Situation:

Jared Kaplan
Right now AI is really inefficient because there's a lot of value in AI. So there's a lot of value in unlocking the most capable frontier model.
Jared Kaplan, Anthropic | Co-founder

The Dual Focus Strategy:

  • Frontier Capabilities: Priority on unlocking most advanced AI capabilities
  • Efficiency Improvements: Simultaneously making training and inference more efficient
  • Speed of Innovation: Companies like Anthropic moving as quickly as possible on both fronts

Current Efficiency Gains:

Jared Kaplan
Right now we're seeing sort of 3x to 10x gains algorithmically and in scaling up compute and inference efficiency per year.
Jared Kaplan, Anthropic | Co-founder

The "Back to Binary" Joke:

Jared Kaplan
The joke is that we're going to get computers back into binary. So I think that we will see much lower precision as one of the many avenues to make inference more efficient over time.
Jared Kaplan, Anthropic | Co-founder
  • Technical Evolution: From higher-precision floating point (FP32, FP16) toward FP4 and eventually binary representations
  • Efficiency Driver: Lower precision dramatically reduces computational requirements
  • Multiple Avenues: Precision reduction is one of many efficiency improvement strategies
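
As a rough illustration of the precision ladder, here is a toy symmetric uniform quantizer, including the one-bit "binary" extreme; real low-precision inference (FP8, FP4, and below) involves far more machinery, so treat this purely as a sketch:

```python
import numpy as np

# Quantize weights to n bits and measure what the lost precision costs.
def quantize(weights: np.ndarray, bits: int) -> np.ndarray:
    if bits == 1:
        # Binary: keep only the sign, scaled by the mean magnitude.
        return np.sign(weights) * np.abs(weights).mean()
    levels = 2 ** (bits - 1) - 1              # symmetric integer levels
    scale = np.abs(weights).max() / levels
    return np.round(weights / scale) * scale  # snap to the nearest level

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)

for bits in (16, 8, 4, 2, 1):
    err = np.abs(w - quantize(w, bits)).mean()
    print(f"{bits:>2}-bit: mean abs error {err:.4f}, memory vs FP32: {bits / 32:.3f}x")
```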

Timestamp: [31:03-32:38]

๐ŸŒช๏ธ Why Are We "Very Very Very Out of Equilibrium" with AI Development?

Understanding the Chaotic State of Current AI Progress

Kaplan describes the current AI landscape as fundamentally unstable, with implications for how we should think about efficiency, cost, and future development.

The Disequilibrium State:

Jared Kaplan
We're very out of equilibrium with AI development right now. AI is improving very rapidly. Things are changing very rapidly.
Jared Kaplan, Anthropic | Co-founder

Multiple Rapid Changes Simultaneously:

  • Capability Improvements: AI models getting smarter faster than expected
  • Unrealized Potential: Haven't fully exploited current model capabilities
  • New Capabilities: Continuously unlocking additional functionality
  • Implementation Lag: Can't integrate improvements as fast as they emerge

The Equilibrium Question:

Jared Kaplan
I think that what the equilibrium situation looks like where AI isn't changing that quickly, I think is one where AI is extremely inexpensive, but it's hard to know if we're even going to get there.
Jared Kaplan, Anthropic | Co-founder

The Potential Perpetual Acceleration:

Jared Kaplan
AI may just keep getting better so quickly that improvements in intelligence unlock so much more, and so we may continue to focus on that rather than, say, getting precision down to FP2.
Jared Kaplan, Anthropic | Co-founder

The Jevons Paradox Application:

  • Economic Principle: As efficiency increases, total consumption often rises, because demand grows faster than the efficiency savings
  • AI Context: Better AI creates more demand for AI capabilities
  • Cost Implications: Instead of cheaper AI, we get more capable (and expensive) AI

Diana Hu
As intelligence becomes better and better, people are going to want it more, not that it's driving the cost down, which is this irony.
Diana Hu, Y Combinator | General Partner
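
To make the Jevons dynamic concrete, here is a toy constant-elasticity demand model; the elasticity value and everything else below are invented for illustration:

```python
# If demand for intelligence is Q = k * p^(-e) with elasticity e > 1,
# a 10x price cut raises usage by 10^e, so total spend (p * Q) grows
# by 10^(e-1) instead of shrinking. All parameters are made up.
k, elasticity = 1.0, 1.5
for price in (1.0, 0.1, 0.01):  # each step is a 10x cost reduction
    quantity = k * price ** (-elasticity)
    spend = price * quantity
    print(f"price {price:>5}: usage {quantity:10.1f}, total spend {spend:8.2f}")
```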

Timestamp: [32:38-33:33]

💰 Will All the Value Stay at the AI Frontier or Spread to Cheaper Models?

The Strategic Question That Could Shape the Entire AI Economy

Kaplan grapples with a fundamental economic question about AI value distribution that has massive implications for businesses and developers.

The Core Strategic Question:

Jared Kaplan
Is all of the value at the frontier or is there a lot of value with cheaper systems that aren't quite as capable?
Jared Kaplan, Anthropic | Co-founder

The Task Complexity Framework:

  • Simple Tasks: Can be handled by cheaper, less capable models
  • Complex Tasks: Require frontier model capabilities for end-to-end completion

The Convenience Factor:

Jared Kaplan
I think it's just much more convenient to be able to use an AI model that can do a very complex task end to end rather than requiring us as humans to orchestrate a much dumber model to break the task down into very small slices and put them together.
Jared Kaplan, Anthropic | Co-founder

The Human Orchestration Challenge:

  • Complex Coordination: Breaking complex tasks into small pieces requires significant human oversight
  • Integration Overhead: Putting small task results together adds complexity and cost
  • End-to-End Value: Single capable model eliminates coordination overhead
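
A hypothetical back-of-envelope comparison makes the convenience argument concrete; every number below is invented purely for illustration:

```python
# Compare one end-to-end call to a frontier model against orchestrating
# a cheaper model over many subtasks plus human coordination overhead.
frontier_cost_per_task = 5.00    # hypothetical price of one capable call

cheap_cost_per_subtask = 0.10    # hypothetical price per weak-model call
subtasks = 12                    # pieces the task must be broken into
human_coordination_cost = 6.00   # engineer time to decompose + reassemble

orchestrated = subtasks * cheap_cost_per_subtask + human_coordination_cost
print(f"frontier end-to-end:      ${frontier_cost_per_task:.2f}")
print(f"orchestrated cheap model: ${orchestrated:.2f}")
# The cheap model wins only if coordination overhead stays small,
# which is exactly the convenience argument for frontier models.
```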

Kaplan's Expectation:

Jared Kaplan
I do expect that a lot of the value is going to come from the most capable models, but I might be wrong.
Jared Kaplan, Anthropic | Co-founder

The Uncertainty Factor:

  • Integration Capabilities: Depends on how efficiently humans can leverage less capable AI
  • Market Development: Could change based on AI integration tool sophistication
  • Economic Dynamics: Value distribution may shift as market matures

The Practical Implication:

This suggests frontier AI capabilities command premium pricing while commodity AI applications face cost pressure - a critical consideration for AI business strategy.

Timestamp: [33:33-34:44]

🎯 How Do You Stay Relevant When AI Models Become "So Awesome"?

Career Advice for Thriving in the Age of Superhuman AI

Diana asks the ultimate career question for young professionals: how to remain valuable when AI capabilities explode beyond current imagination.

The Direct Career Advice:

Jared Kaplan
I think, as I mentioned, there's a lot of value in understanding how these models work and being able to really efficiently leverage them and integrate them, and I think there's a lot of value in building at the frontier.
Jared Kaplan, Anthropic | Co-founder

The Three-Pronged Strategy:

1. Deep Model Understanding:

  • Technical Literacy: Understand how AI models actually function
  • Capability Assessment: Know what models can and cannot do
  • Limitation Awareness: Recognize current constraints and failure modes

2. Efficient Leverage and Integration:

  • Optimization Skills: Maximize AI model effectiveness for specific tasks
  • Integration Expertise: Seamlessly incorporate AI into existing workflows
  • Orchestration Abilities: Coordinate multiple AI capabilities for complex outcomes

3. Frontier Building:

  • Cutting-Edge Development: Work on the most advanced AI applications
  • Innovation Focus: Create novel uses of emerging capabilities
  • Early Adoption: Be among first to exploit new AI model features

The Meta-Skill Implication:

  • Adaptive Learning: The ability to quickly understand and leverage new AI capabilities as they emerge
  • Human-AI Collaboration: Becoming expert at the interface between human judgment and AI execution
  • System Thinking: Understanding how AI fits into larger workflows and business processes

The Positioning Strategy:

Rather than competing with AI, become the person who makes AI most effective - the translator, integrator, and optimizer who maximizes AI value creation.

Timestamp: [34:44-35:21]

💎 Key Insights from [29:52-35:21]

Essential Insights:

  1. Scaling Law Robustness - Five orders of magnitude validation creates extremely high confidence; apparent failures typically indicate implementation problems rather than fundamental limits
  2. AI Development Disequilibrium - Current rapid AI improvement creates chaotic conditions where efficiency may remain secondary to capability advancement
  3. Frontier Value Concentration - Most AI economic value likely concentrates in most capable models due to end-to-end task completion convenience

Actionable Insights:

  • Career Strategy: Focus on understanding AI models deeply, leveraging them efficiently, and building at the frontier rather than competing with AI
  • Business Planning: Expect continued prioritization of capability over efficiency while AI remains in rapid improvement phase
  • Technical Preparation: Prepare for dramatic efficiency improvements (including much lower precision) if and when AI development reaches equilibrium

Timestamp: [29:52-35:21]

📚 References from [29:52-35:21]

People Mentioned:

  • Diana Hu - Y Combinator General Partner asking probing questions about scaling law durability and career advice

Companies & Technologies:

  • Anthropic - Kaplan's company working on both frontier capabilities and efficiency improvements
  • FP4 and FP2 - Low-precision floating point formats for efficient AI computation
  • Binary Representations - Ultra-low precision computing format representing the efficiency frontier

Technical Concepts:

  • Five Orders of Magnitude - Scale across which scaling laws have remained consistent
  • Neural Network Architecture - System design that could break scaling laws if implemented incorrectly
  • Training Bottlenecks - Hidden constraints that could make scaling laws appear to fail
  • Algorithm Precision - Mathematical accuracy in AI training implementations

Economic Principles:

  • Jevons Paradox - Economic principle where efficiency improvements increase rather than decrease total resource consumption
  • Frontier vs. Commodity Value - Distribution of economic value between cutting-edge and basic AI capabilities
  • 3x to 10x Annual Gains - Current rate of algorithmic and inference efficiency improvements

Career & Strategic Concepts:

  • AI Model Understanding - Deep technical knowledge of how AI systems function
  • Efficient AI Leverage - Skills in maximizing AI model effectiveness for specific applications
  • Frontier Building - Working on the most advanced AI applications and capabilities
  • End-to-End Task Completion - AI's ability to handle complex workflows without human orchestration
  • Human-AI Orchestration - Coordinating less capable AI models to complete complex tasks

Timestamp: [29:52-35:21]

📈 Why Does Linear AI Progress Suddenly Become Exponential Task Duration?

The Mystery Behind METR's Surprising Finding on Time Horizon Scaling

An audience member asks a profound question about why scaling laws show linear progress in loss but exponential growth in task duration capabilities - a puzzle that even Kaplan finds intriguing.

The Scaling Paradox:

  • Linear Progress: Exponential increases in compute yield only smooth, roughly linear improvements in the scaling loss
  • Exponential Jump: Yet the duration of tasks models can complete grows exponentially over time
  • The Question: Why does steady progress on loss translate into exponential growth in task horizons?
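
A quick numerical sketch of the two shapes being contrasted; the constants are made up, and the roughly seven-month doubling cadence is METR's reported estimate:

```python
# Illustrative only: both curves below use invented constants.

# 1) Pre-training loss follows a power law in compute, L = a * C^(-alpha):
#    each 10x of compute shaves off a roughly constant amount of loss.
for log10_C in range(20, 26):
    loss = 10.0 * (10.0 ** log10_C) ** -0.05
    print(f"compute 1e{log10_C}: loss {loss:.3f}")

# 2) Task horizon doubles on a fixed calendar cadence (exponential growth).
doubling_months = 7  # roughly METR's reported doubling time
for months in range(0, 29, 7):
    horizon_hours = 2 ** (months / doubling_months)
    print(f"after {months:>2} months: ~{horizon_hours:.0f}-hour tasks")
```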

Kaplan's Honest Assessment:

Jared Kaplan
This is a really good question and I don't know. I mean, the METR finding was kind of an empirical finding.
Jared Kaplan, Anthropic | Co-founder

The Self-Correction Theory:

Jared Kaplan
In order to do more and more complex longer horizon tasks, what you really need is some ability to self-correct. You need to be able to identify that you make a plan and then you start executing the plan.
Jared Kaplan, Anthropic | Co-founder

The Plan Reality Check:

Jared Kaplan
Everyone knows that our plans are kind of worthless and we encounter reality. We get things wrong.
Jared Kaplan, Anthropic | Co-founder

The Mistake Detection Mechanism:

  • Core Capability: AI's ability to notice when it's doing something wrong and correct course
  • Information Efficiency: "It doesn't necessarily require a huge change in intelligence to sort of notice one or two more times that you've made a mistake"
  • Horizon Doubling: Each improvement in error detection could double the task horizon length

The Breakthrough Insight:

Jared Kaplan
If you fix your mistake, maybe you on the order of double the horizon length of the task, because instead of getting stuck here, you get stuck twice as far out.
Jared Kaplan, Anthropic | Co-founder
  • The Amplification Effect: Small improvements in self-correction create large improvements in task completion capability
  • Exponential Nature: Each mistake fixed extends capability exponentially rather than linearly
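
A simple Monte Carlo sketch of this argument, assuming a fixed per-step chance of a fatal mistake; both the model and the 2% failure rate are invented:

```python
import random

# An agent fails each step with probability p; each allowed self-correction
# lets it recover one more time before getting stuck for good.
def expected_horizon(p: float, fixes: int, trials: int = 20_000) -> float:
    total = 0
    for _ in range(trials):
        steps, remaining = 0, fixes
        while True:
            steps += 1
            if random.random() < p:      # the agent makes a mistake
                if remaining == 0:
                    break                # stuck for good
                remaining -= 1           # notice it, fix it, keep going
        total += steps
    return total / trials

random.seed(0)
p = 0.02  # 2% chance of a fatal mistake per step (made-up number)
for fixes in range(4):
    print(f"{fixes} self-corrections: ~{expected_horizon(p, fixes):.0f} steps")
```

In this geometric toy model the first self-correction roughly doubles the expected horizon (from about 50 to about 100 steps), while later ones add linearly; genuinely exponential growth needs more than this, which fits Kaplan's caution below that the empirical trend is the interesting part.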

The Empirical Focus:

Jared Kaplan
Those are just words. I think the empirical trend is maybe the most interesting thing.
Jared Kaplan, Anthropic | Co-founder

Timestamp: [35:27-37:20]

🎯 How Do You Train AI for Long-Horizon Tasks Without Perfect Verification?

The Challenge of Scaling Beyond Coding to Complex Real-World Domains

An audience member poses a critical question about training AI for extended tasks in domains where verification signals aren't as clear as coding success/failure.

The Training Philosophy:

"My mental model of neural networks is very simple. If you want them to do something, you train on such data." โ€” Audience Member

The Verification Signal Challenge:

  • Coding Success: Can deploy Claude agents and get clear verification signals from working/broken code
  • Other Domains: Lack clear binary success indicators for complex, long-term tasks
  • The Dilemma: Are we limited to "scaling data labelers to AGI"?

Kaplan's Worst-Case Scenario:

Jared Kaplan
There is some sort of very operationally intensive path where you just build more and more different tasks for AI models to do that are more and more complex, more and more long horizon, and you just turn the crank and train with RL on those more complicated tasks.
Jared Kaplan, Anthropic | Co-founder

Why the Worst Case Is Still Viable:

Jared Kaplan
I feel like that's the worst case for AI progress. And I mean given the level of investment in AI and I think the level of value that I think is being created with AI, I think people will do that if necessary.
Jared Kaplan, Anthropic | Co-founder
  • Economic Justification: Massive AI investment makes even operationally intensive approaches economically feasible
  • Value Creation: The potential returns justify extensive human supervision efforts

The Better Solution - AI Supervision:

  • The Vision: AI models trained to oversee and supervise other AI models
  • Granular Feedback: Instead of binary success/failure, provide detailed continuous guidance
  • Efficiency Gain: Avoid waiting years for final task completion to get training signal
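
A minimal sketch of the contrast between dense, supervisor-style feedback and a single end-of-task bit; supervisor_score is a hypothetical stub standing in for a trained AI supervisor model:

```python
from typing import Callable, List

def supervisor_score(step: str) -> float:
    """Stub: a real system would query a trained AI supervisor here."""
    return 1.0 if "tests pass" in step else -0.5 if "error" in step else 0.2

def trajectory_reward(steps: List[str],
                      score: Callable[[str], float] = supervisor_score) -> float:
    # Dense, per-step signal: every action gets graded as it happens,
    # rather than a single 0/1 outcome arriving only at the very end.
    return sum(score(s) for s in steps)

trajectory = [
    "wrote a plan for the refactor",
    "edited module, error: import failed",
    "fixed the import, tests pass",
]
print(f"dense reward: {trajectory_reward(trajectory):.2f}")
# vs. sparse reward: 1.0 only if the whole task ends in success, else 0.0
```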

The Ridiculous Example:

Jared Kaplan
Did you become a faculty member and get tenure? Will that take six or seven years? Is that like an end-to-end task where at the end you either get tenure or not over seven years? That's ridiculous. That's very inefficient.
Jared Kaplan, Anthropic | Co-founder
  • Better Approach: "You're doing this well, you're doing this poorly" throughout the process
  • Current Implementation: "I think we're already doing this to some extent"

Timestamp: [37:20-39:53]

🤖 Are Humans Still Needed to Create AI Training Tasks?

The Meta-Question About AI Creating Its Own Training Data

The final audience question explores whether AI can bootstrap its own training by generating the tasks it learns from - a recursive approach to AI development.

The Training Task Creation Question:

  • The Process: Developing complex tasks for reinforcement learning training
  • The Method: Training AI models on these tasks to improve long-horizon capabilities
  • The Meta-Question: Can AI create its own training tasks?

Kaplan's Current Reality:

Jared Kaplan
So I would say a mix. I mean, obviously we're building the tasks as much as possible using AI to generate tasks with code. We do also ask humans to create tasks.
Jared Kaplan, Anthropic | Co-founder

The Hybrid Approach:

  • AI-Generated Tasks: Leveraging AI to automatically create training scenarios, especially with code generation
  • Human-Created Tasks: Still involving humans in task design and creation
  • Mixed Strategy: Combining both approaches for optimal results
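
Here is a toy sketch of task generation with a built-in verifier, in the spirit of "using AI to generate tasks with code"; a fixed template stands in for the generating model, and make_task is a hypothetical helper:

```python
import random

# Generate (prompt, answer, checker) triples whose correctness can be
# verified automatically, with no human grader in the loop. A real
# pipeline would have an AI model emit far richer tasks and checkers.
def make_task(rng: random.Random):
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    prompt = f"Write a function solve() returning the product of {a} and {b}."
    answer = a * b
    def checker(candidate: int) -> bool:
        return candidate == answer        # automatic verification signal
    return prompt, answer, checker

rng = random.Random(0)
prompt, answer, checker = make_task(rng)
print(prompt)
print("right answer passes:", checker(answer))      # True
print("wrong answer fails: ", checker(answer + 1))  # False
```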

The Future Trajectory:

Jared Kaplan
I think that as AI gets better and better, hopefully we're able to leverage AI more and more, but of course the frontier of the difficulty of these tasks also increases.
Jared Kaplan, Anthropic | Co-founder

The Moving Target Challenge:

  • Increasing AI Capability: AI becomes better at generating training tasks
  • Rising Task Complexity: The frontier of difficult tasks also advances
  • Persistent Human Role: Humans remain necessary for the most challenging task design

The Bootstrap Limitation:

  • Self-Improvement Constraint: AI may struggle to create tasks significantly harder than its current capability level
  • Human Innovation: Humans still needed to push beyond current AI task design capabilities
  • Frontier Maintenance: Keeping humans involved ensures continued capability expansion

The Balanced Future:

AI will increasingly handle routine training task generation while humans focus on designing the most challenging and novel scenarios that push AI capabilities forward.

Timestamp: [39:53-40:43]

💎 Key Insights from [35:27-40:43]

Essential Insights:

  1. Self-Correction Amplification - Small improvements in AI's ability to detect and fix mistakes could exponentially extend task completion horizons through reduced failure points
  2. Supervision Efficiency - AI-supervised training with granular feedback throughout long tasks is more efficient than waiting for final binary success/failure signals
  3. Human-AI Task Creation Balance - Current optimal approach mixes AI-generated training tasks with human-designed challenges, with humans remaining essential for frontier task complexity

Actionable Insights:

  • Training Strategy: Focus on developing AI self-correction capabilities as a lever for dramatic task horizon improvements
  • Supervision Design: Implement detailed, continuous feedback systems rather than binary end-state evaluations for complex tasks
  • Development Approach: Plan for hybrid human-AI task creation where AI handles routine generation while humans design frontier challenges

Timestamp: [35:27-40:43]

📚 References from [35:27-40:43]

People Mentioned:

  • Audience Members - Y Combinator AI Startup School attendees asking technical questions about scaling laws and training approaches

Companies & Products:

  • Claude Agent - Anthropic's AI system used as an example of collecting verification signals in coding tasks

Research & Measurements:

  • METR Finding - Empirical research showing exponential growth in AI task-duration capabilities despite smooth, power-law improvements in scaling loss
  • METR arXiv Paper - Preprint detailing the methodology and results of the time-horizon study

Technical Concepts:

  • Scaling Loss - Cross-entropy measure of model performance that falls smoothly, as a power law, with increased compute
  • Time Horizon Tasks - Long-duration activities that AI models can complete, measured in hours, days, or weeks
  • Self-Correction Capability - AI's ability to identify mistakes and adjust course during task execution
  • Verification Signals - Feedback mechanisms that indicate whether AI task performance is successful
  • Reinforcement Learning (RL) - Training method using reward signals to improve AI performance on complex tasks

Training Methodologies:

  • AI Supervision - Using AI models to oversee and provide feedback to other AI models during training
  • Task Generation - Creating training scenarios for AI models, potentially using AI itself
  • Granular Feedback - Detailed, continuous guidance rather than binary success/failure signals
  • Long-Horizon Training - Teaching AI to complete tasks spanning extended time periods

Domain Examples:

  • Academic Tenure - Seven-year process used as an example of inefficient binary feedback for long-horizon tasks
  • Code Generation - Domain with clear verification signals making it ideal for AI training and deployment

Timestamp: [35:27-40:43]