
Scaling and the Road to Human-Level AI | Anthropic Co-founder Jared Kaplan
Jared Kaplan spoke on June 16th, 2025 at AI Startup School in San Francisco. Kaplan started out as a theoretical physicist chasing questions about the universe. Then he helped uncover one of AI's most surprising truths: that intelligence scales in a predictable, almost physical way. That insight became foundational to the modern era of large language models, and it led him to co-found Anthropic.
How Does a Theoretical Physicist End Up Co-founding an AI Company?
From Faster-Than-Light Dreams to AI Reality
Jared Kaplan's journey to AI wasn't conventional. Starting as a theoretical physicist with dreams inspired by his science fiction writer mother, he spent the vast majority of his career in academia before making a dramatic pivot to artificial intelligence.
The Physics Foundation:
- Childhood Inspiration - Mother was a science fiction writer, sparking dreams of building faster-than-light drives
- Core Motivation - Deep fascination with understanding the universe's fundamental workings
- Big Questions - Is the universe deterministic? Do we have free will? How do the biggest trends underlying everything emerge?
The Academic Journey:
- Diverse Specializations: Large Hadron Collider physics, particle physics, cosmology, string theory
- Growing Frustration: Progress felt too slow, becoming bored with the pace of discovery
- Key Connections: Met future Anthropic co-founders during his physics career
The AI Skepticism and Conversion:
- Initial Dismissal: Thought AI was overhyped based on 2005-2009 knowledge of SVMs
- Friend Pressure: Physics colleagues kept insisting AI was becoming "a really big deal"
- Lucky Break: Knew the right people at the right time to make the transition
What Are the Two Secret Ingredients That Make Modern AI Work?
The Foundation of ChatGPT, Claude, and All Contemporary AI Models
Modern AI success comes down to a surprisingly simple two-phase training process that transforms raw computational power into intelligent behavior.
Phase 1: Pre-training - Learning the Patterns
What it does: Trains AI models to imitate human-written text and understand underlying correlations in data
The Process:
- Models learn which words are likely to follow other words (see the toy sketch after this list)
- Training on massive corpora of text (now multimodal data)
- Understanding statistical patterns in human communication
- Building foundational knowledge about language and concepts
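To make the next-word objective concrete, here is a toy sketch (my illustration, not anything from the talk): a bigram model that learns which word tends to follow which by counting, standing in for the neural network that optimizes the same objective at vastly larger scale.

```python
# Toy illustration of the pre-training objective: estimate which words
# are likely to follow other words by counting co-occurrences in a corpus.
# (A real model replaces this count table with a neural network trained
# to minimize cross-entropy on the same next-token prediction task.)
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_distribution(word):
    """Return P(next word | current word) estimated from the counts."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```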
Phase 2: Reinforcement Learning - Learning to Be Helpful
What it does: Optimizes models to perform useful tasks through human feedback
The Process:
- Conversation Interface - Early Claude versions had simple chat interfaces
- Human Preference Collection - Crowdworkers and users pick the better of two responses (see the sketch after this list)
- Behavior Reinforcement - Reward helpful, honest, and harmless behaviors
- Behavior Discouragement - Penalize problematic or unhelpful responses
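As a rough sketch of how those preference picks become a training signal, the snippet below implements the Bradley-Terry-style loss commonly used for RLHF reward models. The scores and function names are assumptions for illustration, not Anthropic's implementation.

```python
import math

def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry model: probability the 'chosen' response is preferred,
    given scalar scores assigned by a reward model."""
    return 1 / (1 + math.exp(-(reward_chosen - reward_rejected)))

def reward_model_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the human's observed preference.
    Minimizing this pushes the reward model to score preferred
    (helpful, honest, harmless) responses above rejected ones."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))

# A crowdworker preferred response A (scored 1.2) over response B (scored 0.3):
print(reward_model_loss(1.2, 0.3))  # ≈ 0.341
```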
The Elegant Simplicity: That is essentially the entire recipe: imitate human text, then reinforce the behaviors people prefer.
What Happens When a Physicist Asks the "Dumbest Possible Question" About AI?
The Discovery That Changed Everything About AI Development
Sometimes the most profound discoveries come from asking embarrassingly simple questions. Kaplan's physicist training led him to uncover one of AI's most important secrets.
The "Dumb" Questions That Sparked Discovery:
- About Big Data - "How big should the data be? How important is it? How much does it help?"
- About Model Size - "How much better do these models perform [when they're larger]?"
- The Physicist Approach - "As a physicist, that's what you're trained to do. You sort of look at the big picture and you ask really dumb things."
The Shocking Discovery - Scaling Laws:
What They Found: AI performance follows precise, predictable mathematical relationships
- Performance improves systematically as you increase compute, data, and model size
- The relationships are "as precise as anything that you see in physics or astronomy"
- Trends hold across many orders of magnitude (see the fitting sketch below)
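A minimal sketch of what "precise mathematical relationships" means here: scaling laws are power laws, which appear as straight lines on log-log axes, so the exponent can be recovered with a simple linear fit. The constants below are invented for illustration, not published coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "loss vs. compute" data following a power law L = a * C^(-b).
# The constants are illustrative, not real scaling-law coefficients.
compute = np.logspace(0, 5, 20)                    # five orders of magnitude
loss = 3.0 * compute ** -0.05                      # systematic improvement
loss *= np.exp(rng.normal(0, 0.01, loss.shape))    # small measurement noise

# A power law is a straight line in log-log space: log L = log a - b log C,
# so a linear fit recovers the exponent.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
print(f"fitted exponent b ≈ {-slope:.3f}    (true value: 0.05)")
print(f"fitted prefactor a ≈ {np.exp(intercept):.2f}  (true value: 3.0)")
```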
The Revolutionary Implications - Why This Mattered:
- Predictability - Could forecast AI improvements with scientific precision
- Confidence - Evidence the trend would continue for many orders of magnitude
- Investment Justification - Clear ROI on scaling up compute and data
How Did a Solo Researcher with One GPU Prove Scaling Works Beyond Language?
The Underrated Discovery That Connected Chess Ratings to AI Progress
While everyone focused on language models, a lone researcher made a crucial discovery that proved scaling laws work across different types of AI training - using nothing but a single GPU and a simple board game.
The Researcher and Setup:
- Who: Andy Jones, working independently about four years ago
- Resources: Just his own single GPU (in the "ancient days" of limited compute)
- Challenge: Couldn't study expensive AlphaGo, so chose the simpler game Hex
- Goal: Test if scaling laws applied to reinforcement learning
The Breakthrough Discovery:
Elo Scores Applied to AI: Used the chess rating system to measure AI model performance
- Elo scores measure the likelihood of one player beating another (see the formula sketched below)
- Now used to benchmark how often humans prefer one AI model over another
- Back then, it was just the classic chess rating application
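For reference, the standard Elo expected-score formula (a well-established chess-rating fact, not specific to the talk):

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo formula: expected score (win probability, roughly)
    for player A. A 400-point rating gap corresponds to ~10:1 odds."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# A 1600-rated model playing a 1400-rated model:
print(elo_expected_score(1600, 1400))  # ≈ 0.76
```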
The Results:
- Studied different models training to play Hex (simpler than Go)
- Found "remarkable straight lines" in performance scaling
- Clear evidence that RL (reinforcement learning) follows scaling laws too
Why This Was Overlooked: Attention at the time was on language models; a single-GPU study of an obscure board game drew little notice.
The Unified Theory:
Both Phases Scale: You can scale up compute in both pre-training AND reinforcement learning for predictable improvements
What's Really Driving AI Progress - Genius or Just Good Engineering?
The Surprising Truth About Why AI Is Getting Better So Fast
The explosive progress in AI isn't what most people think. It's not about sudden breakthroughs in intelligence or researchers getting smarter - it's about something much more systematic and predictable.
The Real Driver of Progress: Systematically scaling compute in both training phases, rather than a stream of new conceptual breakthroughs.
The Systematic Approach:
Scaling Both Phases:
- Pre-training Compute - More computational power for initial training
- Reinforcement Learning Compute - More resources for human feedback optimization
- Predictable Results - Better and better performance following mathematical laws
Why This Approach Works:
- Simplicity - No need for complex algorithmic breakthroughs
- Reliability - Performance improvements are predictable and consistent
- Scalability - Can continue "turning the crank" for continued progress
- Evidence-Based - Proven across many orders of magnitude
The Implications:
Systematic Progress: Rather than waiting for genius insights, AI advancement becomes an engineering and resource allocation problem
Future Confidence: If scaling laws hold, continued investment in compute will yield continued improvements
Key Insights from [0:00-8:18]
Essential Insights:
- Cross-Disciplinary Advantage - Physics training provided unique perspective to ask "dumb" fundamental questions that revealed AI's scaling laws
- Two-Phase Training Foundation - All modern AI success reduces to next-word prediction plus reinforcement learning from human feedback
- Scaling Law Discovery - AI performance follows precise mathematical relationships as predictable as physical laws, giving confidence in continued progress
Actionable Insights:
- Investment Strategy: Scaling laws provide scientific basis for predicting AI development ROI and resource allocation
- Research Approach: Sometimes the most obvious questions yield the most profound discoveries - embrace beginner's mind
- Industry Understanding: AI progress is now systematically engineerable rather than dependent on unpredictable breakthroughs
References from [0:00-8:18]
People Mentioned:
- Andy Jones (Google Scholar) - AI researcher at Anthropic who discovered scaling laws for reinforcement learning using a single GPU and the game Hex, author of "Scaling Scaling Laws with Board Games"
- Jared Kaplan's Mother - Science fiction writer who inspired his initial interest in physics and faster-than-light travel
Companies & Products:
- Anthropic - AI safety company co-founded by Kaplan, creator of Claude
- OpenAI - Creator of ChatGPT and GPT-3, referenced as contemporary AI model
- Claude - Anthropic's AI assistant, with early versions dating back to 2022
- GPT-3 - OpenAI's language model that demonstrated scaling law principles
Games & Applications:
- AlphaGo - DeepMind's Go-playing AI that demonstrated reinforcement learning success
- Hex - Simple board game used by Andy Jones to study scaling laws in RL
- Support Vector Machines (SVMs) - Earlier AI technique that Kaplan found unexciting in 2005-2009
Technologies & Tools:
- Elo Scores - Chess rating system now used to benchmark AI model preferences
- Large Hadron Collider - Physics research facility where Kaplan worked before AI
Concepts & Frameworks:
- Scaling Laws - Mathematical relationships showing predictable AI performance improvements with increased compute, data, and model size
- Pre-training - First phase of AI training focused on next-word prediction from human text
- Reinforcement Learning from Human Feedback (RLHF) - Second training phase optimizing for helpful, honest, and harmless behavior
- Multimodal Data - Modern training data including text, images, and other formats
How Do You Measure AI Progress on Two Critical Dimensions?
The Framework for Understanding Where AI Is Headed
Kaplan presents a compelling two-axis framework for understanding AI capabilities that reveals both current limitations and future potential.
The Y-Axis: Flexibility - Meeting Humans Where We Are
What it measures: The ability of AI to operate across different modalities and contexts
The Spectrum:
- Bottom: AlphaGo - superhuman at Go but confined to a single domain
- Current Progress: Large language models handling multiple modalities
- Missing Pieces: AI models don't have a sense of smell yet (but "that's probably coming")
- Future Goal: AI systems that can handle all human-relevant modalities
The X-Axis: Task Duration - The More Interesting Dimension
What it measures: How long it would take a person to complete tasks that AI can now do
The Scaling Discovery:
- Task duration capability is "doubling roughly every 7 months"
- Another systematic scaling trend, identified by METR's research
- Predictable progression from minutes → hours → days → weeks → months → years (projected in the sketch below)
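A back-of-the-envelope projection of that doubling trend; the one-hour starting horizon is an assumption for illustration, not a measured value.

```python
def task_horizon(months_from_now, current_horizon_hours=1.0, doubling_months=7):
    """Project the task-duration trend: capability doubles every ~7 months.
    The 1-hour starting horizon is an illustrative assumption."""
    return current_horizon_hours * 2 ** (months_from_now / doubling_months)

for months in (0, 7, 14, 21, 28):
    print(f"+{months:2d} months: ~{task_horizon(months):.1f} hours")
# 1.0, 2.0, 4.0, 8.0, 16.0 hours respectively
```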
The Trajectory Implications: If the 7-month doubling holds, models that handle hour-scale tasks today will reach day-, week-, and month-scale tasks within a few years.
What Could AI Accomplish in 2027 and Beyond?
From Individual Tasks to Organizational-Level Work
The scaling trends point toward a future where AI doesn't just help with tasks - it could replace entire organizational functions and accelerate scientific progress by decades.
The 2027 Speculation:
Task Duration Expansion: AI models may handle tasks taking not just minutes or hours, but:
- Days of human work
- Weeks of complex projects
- Months of sustained effort
- Years of organizational initiatives
The Organizational Vision:
Collective AI Power: Millions of AI models working together could:
- Perform work of entire human organizations
- Handle tasks requiring whole scientific communities
- Coordinate complex, multi-year initiatives
The Scientific Acceleration Example: Kaplan suggests that fields which advance through pure thought, like mathematics and theoretical physics, could see decades of progress dramatically compressed.
Why This Works for Science:
- Math and theoretical physics progress through pure thinking
- No physical constraints on rapid iteration
- AI systems can collaborate without human coordination overhead
- Massive parallelization of intellectual work
The Broader Implications:
Organizational Transformation: AI won't just automate individual jobs, but could fundamentally change how large-scale work gets accomplished across industries and research domains.
What Are the Missing Pieces for Human-Level AI?
The Three Critical Ingredients Still Being Developed
Despite dramatic scaling progress, Kaplan identifies specific capabilities that need development to reach broadly human-level AI.
1. Organizational Knowledge - Beyond the Blank Slate
The Challenge: AI models currently start fresh with each interaction
The Solution: Train models to work within specific organizational contexts
What This Means:
- Understanding company-specific processes, culture, and history
- Operating with institutional knowledge like long-term employees
- Contextual awareness of organizational relationships and dynamics
- Industry-specific expertise and unwritten rules
2. Memory - Tracking Long-Term Progress
The Distinction: Memory differs from general knowledge
The Purpose: Essential for extended, complex tasks
Memory Requirements:
- Track progress on specific, long-duration tasks
- Build and maintain task-relevant memories
- Access and utilize accumulated context over time
- Maintain continuity across work sessions
Current Development: Claude 4 is already beginning to incorporate memory capabilities of this kind (see the references for this segment).
3. Oversight - Handling Nuanced, Fuzzy Tasks
Current Limitation: Easy to train AI on crisp success/failure tasks (code that passes tests, correct math answers)
The Challenge: Developing nuanced judgment for subjective tasks
Examples of Fuzzy Tasks:
- Tell good jokes
- Write good poems
- Have good taste in research
- Make nuanced creative decisions
The Solution: AI models that generate sophisticated reward signals to enable reinforcement learning on subjective tasks
What Other Capabilities Need Development for Full AI?
The Simpler But Essential Remaining Ingredients
Beyond the three critical missing pieces, Kaplan outlines additional capabilities needed for comprehensive AI systems.
Progressive Complexity Training:
The Pathway: Work systematically up the capability ladder
- Text Models - Current foundation (largely solved)
- Multimodal Models - Handling images, audio, video alongside text
- Robotics - Physical world interaction and manipulation
Domain-Specific Scaling:
Continued Gains Expected: Scaling laws should continue applying as AI expands into:
- Physical robotics applications
- Real-world sensory integration
- Complex multi-modal reasoning tasks
- Embodied intelligence scenarios
The Scaling Confidence: Because scaling laws have held everywhere they have been tested, Kaplan expects the same predictable gains as AI expands into these new domains.
Why These Are "Simpler":
- Established Patterns: These follow known scaling law principles
- Technical Challenges: Engineering problems rather than fundamental research questions
- Resource Requirements: Mainly need more compute and data, not new theoretical breakthroughs
The Integration Challenge:
Moving from individual capabilities to comprehensive AI systems that can seamlessly operate across all these domains simultaneously.
Why Should You Build Things That Don't Work Yet?
The Counterintuitive Strategy for AI-Era Product Development
Kaplan's first major recommendation challenges conventional product wisdom: deliberately build products that current AI can't quite handle.
The Core Strategy: Deliberately build products at the edge of what current models can do, betting that the next generation will make them work.
Why This Works in the AI Era:
Rapid Capability Growth: AI models are improving at unprecedented speed
- If Claude 4 is "still a little bit too dumb" for your product
- Claude 5 will likely make that product work and "deliver a lot of value"
- The gap between "almost works" and "works great" is shrinking rapidly
The Strategic Advantages:
- First-Mover Positioning - Ready when AI catches up to your vision
- Deep Understanding - Learn the problem space before solutions mature
- Competitive Timing - Launch when AI enables your solution
- Market Education - Build awareness before the technology is perfect
The Boundary Strategy: Operate at the boundary of current AI capability, knowing that boundary moves quickly.
Practical Application:
- Identify tasks AI almost but not quite handles well
- Build products assuming next-generation AI capabilities
- Focus on problems that seem just out of reach today
- Prepare for rapid capability expansion
The Risk Mitigation:
This isn't reckless speculation - it's informed betting based on predictable scaling laws and systematic improvement trends.
How Can AI Help Solve Its Own Integration Problem?
Using AI to Accelerate AI Adoption
One of the biggest bottlenecks to AI progress isn't capability - it's integration speed. Kaplan proposes a meta-solution: leverage AI itself to solve this challenge.
The Integration Bottleneck: AI capabilities are advancing faster than products, workflows, and institutions can absorb them.
The Core Problem:
Speed Mismatch: AI capabilities are advancing faster than our ability to:
- Integrate AI into existing products
- Adapt company workflows and processes
- Modify scientific research methodologies
- Update educational and training systems
The Meta-Solution: Use AI itself to do the integration work, from writing glue code to redesigning workflows.
Practical Applications of AI-Assisted Integration:
Product Development:
- AI helping design AI-integrated workflows
- Automated adaptation of existing systems for AI enhancement
- AI-generated integration documentation and training materials
Organizational Change:
- AI analyzing optimal integration points within companies
- Automated process redesign incorporating AI capabilities
- AI-driven change management for AI adoption
Technical Implementation:
- AI writing integration code for AI systems
- Automated testing and optimization of AI implementations
- AI-generated APIs and interfaces for easier AI adoption
The Acceleration Effect:
This creates a positive feedback loop where AI capabilities help overcome the primary constraint on AI utilization, potentially dramatically speeding overall AI integration across society.
What's the Next Software Engineering for AI Adoption?
Finding the Next Explosive Growth Opportunity
Software engineering has become the poster child for rapid AI integration, but what domain will experience similar explosive adoption next?
The Software Engineering Success Story: Coding has become the poster child for rapid AI adoption, with models now handling substantial end-to-end development work.
Why Software Engineering Works So Well for AI:
Natural Fit Characteristics:
- Clear Success Metrics - Code either works or doesn't
- Immediate Feedback - Quick testing and iteration cycles
- Digital Native - No physical world constraints
- Modular Tasks - Breaking down complex problems into components
- Rapid Iteration - Fast cycles of improvement and testing
The Big Strategic Question: Which domain will see software-engineering-level AI adoption next?
Potential Candidate Domains:
What to Look For:
- Clear success/failure criteria
- Digital or easily digitized workflows
- High-frequency iteration opportunities
- Modular, decomposable tasks
- Strong economic incentives for efficiency
The Honest Assessment: Kaplan doesn't offer a definitive answer; the next breakout domain remains an open question.
The Opportunity:
First-Mover Advantage: The domain that achieves software engineering-level AI integration next could see massive competitive advantages and market creation opportunities.
Strategic Approach: Look for fields with similar structural characteristics to software development but currently underserved by AI solutions.
Key Insights from [8:25-15:46]
Essential Insights:
- Two-Dimensional Progress - AI advancement happens on both flexibility (handling more modalities) and task duration (7-month doubling of time horizon capabilities)
- Missing Pieces Are Specific - Human-level AI needs organizational knowledge, memory, and oversight for nuanced tasks - not just raw scaling
- Build Ahead Strategy - Deliberately create products that don't quite work yet, as AI capabilities are predictably improving to meet those needs
Actionable Insights:
- Product Strategy: Focus on boundaries of current AI capability, knowing those boundaries move rapidly
- Integration Acceleration: Use AI itself to solve AI integration challenges and speed adoption
- Market Opportunity: Identify the next domain after software engineering for explosive AI adoption growth
Promotional Content & Announcements
Program Announcements:
Y Combinator Applications:
- Program: YC's next batch now accepting applications
- Call to Action: Apply to Y Combinator
- Benefits: "It's never too early and filling out the app will level up your idea"
- Timing: Applications currently open for next batch
Upcoming Content:
Interview Transition:
- Format Change: Moving from presentation to fireside chat Q&A
- Participants: Jared Kaplan and Diana Hu (YC General Partner)
- Focus: Deep dive discussion on scaling laws and AI development
References from [8:25-15:46]
People Mentioned:
- Diana Hu - General Partner at Y Combinator, upcoming interview participant
Companies & Products:
- Y Combinator - Startup accelerator program with applications currently open
- Anthropic - Kaplan's company developing Claude 4 with new memory capabilities
- AlphaGo - DeepMind's Go-playing AI used as example of narrow but superhuman intelligence
- Claude 4 - Latest Anthropic model beginning to incorporate memory capabilities
- Claude 5 - Future model referenced as likely improvement over Claude 4
Research & Studies:
- AI 2027 Report - Study that examined and projected AI task duration capabilities
- METR Study - Research discovering 7-month doubling trend in AI task duration capabilities
- METR Study arXiv - Research paper on measuring AI ability to complete long tasks
Technologies & Tools:
- Large Language Models - AI systems that can handle multiple modalities beyond single-domain applications
- Multimodal Models - AI systems processing text, images, audio, and video
- Robotics - Physical world AI applications as next frontier beyond digital domains
Concepts & Frameworks:
- Task Duration Scaling - AI capability improvement measured by time horizon of completable tasks
- Organizational Knowledge - AI understanding of company-specific context and institutional knowledge
- AI Memory Systems - Capability for AI to maintain context and progress across extended tasks
- Oversight for Fuzzy Tasks - AI ability to handle subjective tasks requiring nuanced judgment
- AI Integration Bottleneck - Challenge of incorporating AI into existing systems faster than capabilities develop
What's Wrong with Being "Too Eager" in AI Development?
How Claude 4 Fixes the Overzealous Assistant Problem
The conversation shifts to Diana's question about Claude 4's impact, revealing a fascinating problem with previous models - they were actually too helpful.
The "Too Eager" Problem with Claude 3.7:
What Users Experienced:
- Claude 3.7 Sonnet was excellent for coding applications
- But it became overly enthusiastic about making tests pass
- Would implement solutions users didn't actually want
- Added unnecessary "try-except" blocks and workarounds


Claude 4's Improvements:
Enhanced Agency: Better ability to act as an agent for:
- Coding applications with improved judgment
- Search functionality
- Various other application domains
Better Supervision: Improved oversight capabilities that:
- Follow user directions more precisely
- Improve overall code quality
- Balance helpfulness with user intent
The Modeling Challenge:
Timeline Pressure: Kaplan noted how quickly successive models are expected to ship, revealing the intense competitive pressure in AI development, where 12-month cycles between major improvements could be considered slow.
How Does Claude 4's Memory System Enable Multi-Session Projects?
Breaking Through Context Window Limitations for Long-Term Work
Claude 4 introduces a game-changing memory system that allows AI to work on complex projects that span far beyond single conversations.
The Memory Innovation:
- Core Capability: Save and store memories as files or records
- Strategic Retrieval: Access stored information to continue work across multiple context windows
- Extended Collaboration: Enables Claude to work on projects that exceed single-session limitations
How It Works:
- Memory Storage - Claude can save important information, decisions, and progress as persistent records
- Context Bridging - When approaching context window limits, retrieve relevant memories
- Continuous Work - Maintain project continuity across "many many many context windows"
- File-Based Persistence - Memories stored as accessible files rather than just conversation history (a toy sketch of this pattern follows)
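Below is a hypothetical sketch of the file-based pattern described above: persist notes as records on disk, then reload them when a fresh session begins. The `MemoryStore` class and its methods are invented for illustration and are not Anthropic's implementation.

```python
import json
from pathlib import Path

class MemoryStore:
    """Hypothetical sketch of file-based memory: persist notes as records
    on disk so work can resume in a fresh context window. The class and
    method names are illustrative, not Anthropic's implementation."""

    def __init__(self, path="project_memory.json"):
        self.path = Path(path)
        self.records = json.loads(self.path.read_text()) if self.path.exists() else []

    def save(self, topic, note):
        # Persist a decision or progress note for future sessions.
        self.records.append({"topic": topic, "note": note})
        self.path.write_text(json.dumps(self.records, indent=2))

    def recall(self, topic):
        # Retrieve earlier notes on a topic when a new session begins.
        return [r["note"] for r in self.records if r["topic"] == topic]

memory = MemoryStore()
memory.save("architecture", "Chose SQLite over Postgres for the prototype.")
print(memory.recall("architecture"))
```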
The Unlock Potential: Persistent memory turns Claude from a single-session assistant into a collaborator on projects spanning many context windows.
Practical Applications:
- Complex Software Projects - Maintain architecture decisions across development sessions
- Research Projects - Track findings, hypotheses, and methodologies over time
- Business Strategy - Remember organizational context and long-term planning decisions
- Creative Projects - Maintain narrative consistency and character development
The Collaboration Evolution:
Moving from single-interaction assistant to persistent collaborative partner capable of sustained, complex work relationships.
Are We Really at the "Hours-Long Task" Stage Already?
Measuring Current AI Capability Against the Scaling Predictions
Diana probes whether Kaplan's theoretical scaling predictions are already manifesting in current AI capabilities, particularly around task duration.
Current Capability Assessment:
- Software Engineering Focus: METR's benchmarking reveals AI can now handle tasks taking hours of human time
- Measurement Approach: Direct comparison of how long various tasks take humans versus AI
- Imprecise but Real: While the measurement is "very imprecise," the trend is clear (the 50% time-horizon method is sketched below)
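A toy version of the "50%-task-completion time horizon" measurement cited in the references: fit success probability against the log of human task time, then read off where the curve crosses 50%. The data and fitting details are invented for illustration, not METR's actual benchmark.

```python
import numpy as np

# Synthetic benchmark: human completion time (hours) per task, and whether
# the AI solved it. These numbers are made up for illustration.
human_hours = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
ai_solved   = np.array([1,   1,   1,   1,   0,   1,   0,   0])

# Fit P(success) = sigmoid(w * log(hours) + b) by simple gradient descent.
x = np.log(human_hours)
w, b = 0.0, 0.0
for _ in range(5000):
    p = 1 / (1 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((p - ai_solved) * x)   # gradient of logistic loss
    b -= 0.5 * np.mean(p - ai_solved)

# The "50% time horizon" is where the fitted curve crosses p = 0.5,
# i.e. where w * log(h) + b = 0.
horizon = np.exp(-b / w)
print(f"50% task-completion time horizon ≈ {horizon:.1f} hours")
```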
The Scaling Law Manifestation: The METR trend line is visible in shipped models, with each generation completing longer-horizon work.
The Trajectory Insight:
- Smooth Progression: Rather than sudden breakthroughs, expect steady, predictable improvements
- Multi-Dimensional Growth: Each release improves capabilities across various domains simultaneously
- AGI Pathway: This smooth curve leads toward "human level AI or AGI"
Current State Validation:
This confirms that the theoretical scaling predictions are already becoming practical reality in specific domains like software engineering.
What's the Strangest Thing About AI Intelligence Compared to Humans?
The Judgment vs. Generation Gap That Defines AI Collaboration
Kaplan reveals a fundamental difference between human and AI intelligence that explains both AI's limitations and the optimal way to work with it.
The Human Intelligence Pattern:
Clear Separation: Humans often can't perform a task but can judge if it was done correctly
Examples:
- Can't write great poetry but recognize good poetry
- Can't solve complex math but verify solutions
- Can't code expertly but spot bugs and issues
The AI Intelligence Pattern:
- Compressed Gap: AI's ability to judge and generate are much more aligned
- Implication: AI that can generate solutions is nearly as good at evaluating them
The Collaboration Model:
- Human as Manager: People become supervisors and quality controllers
- AI as Generator: AI produces the actual work output
- Sanity Check Function: Humans provide oversight for reasonableness and correctness
The Dual Nature Challenge:
This creates a unique collaboration dynamic where AI can be simultaneously brilliant and foolish, requiring human oversight despite high capability.
How Fast Are We Moving from Co-Pilot to Full Automation?
The Rapid Evolution Y Combinator Is Witnessing in Real-Time
Diana shares fascinating insights about how quickly AI product strategies are evolving, based on Y Combinator's unique vantage point across hundreds of startups.
The Co-Pilot Era (Last Year):
- Customer Support Example: Companies selling AI as assistants requiring human approval
- Human-in-the-Loop: Final human verification before customer-facing actions
- Safety-First Approach: Conservative implementation ensuring human oversight
The Spring Batch Transformation: Within roughly one batch cycle, founders flipped from selling human-supervised co-pilots to selling full automation.
The New Reality:
- End-to-End Capability: AI models handling complete workflows without human intervention
- Full Replacement Strategy: Founders now selling AI as direct substitutes for entire processes
- Workflow Automation: Moving beyond assistance to complete task ownership
The Speed of Change:
- Timeline: Massive shift observed just between Y Combinator batches (approximately 6-month cycles)
- Founder Confidence: Entrepreneurs now comfortable betting on full automation
- Market Acceptance: Customers willing to adopt end-to-end AI solutions
The Validation:
This real-world evidence from hundreds of startups confirms Kaplan's scaling law predictions are manifesting in practical applications faster than expected.
What Determines Whether You Need 70% or 99.9% Accuracy?
The Strategic Framework for Choosing AI Implementation Approaches
Kaplan provides a practical framework for understanding when different levels of AI accuracy are acceptable and how this impacts product development strategy.
The Accuracy Spectrum Decision: Whether an application can tolerate 70-80% accuracy or demands 99.9% determines which AI capabilities you can deploy, and when.
The 70-80% Sweet Spot:
Strategic Advantage: Tolerating occasional errors lets you deploy frontier capabilities long before they are reliable enough for high-stakes use.
Benefits of Lower Accuracy Requirements:
- Access to cutting-edge AI capabilities
- Faster time to market
- More innovative applications
- Greater competitive differentiation
The 99.9% Necessity Cases:
High-Stakes Applications:
- Medical diagnoses and treatment recommendations
- Financial trading and investment decisions
- Safety-critical system controls
- Legal document generation and analysis
The Reliability Trajectory:
Continuous Improvement: Reliability climbs with each model generation, steadily pulling more high-stakes tasks into automation range.
The Implementation Strategy:
- Current Optimal Approach: Human-AI collaboration for advanced tasks
- Future Evolution: Increasing full automation as reliability improves
- Strategic Positioning: Build for current accuracy levels while preparing for higher reliability (a toy break-even calculation follows)
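One way to reason about the accuracy threshold is a simple expected-value model (my framing, with assumed numbers): automation pays off when accuracy exceeds cost/(value + cost), which is low for easily corrected drafting errors and near 1 for catastrophic ones.

```python
def automation_break_even(value_per_success, cost_per_error):
    """Minimum accuracy at which automating a task has positive expected
    value: p * value - (1 - p) * cost > 0  =>  p > cost / (value + cost)."""
    return cost_per_error / (value_per_success + cost_per_error)

# A drafting task where an error is cheap to catch and fix:
print(automation_break_even(value_per_success=10, cost_per_error=3))     # ≈ 0.23
# A high-stakes task where an error is catastrophic:
print(automation_break_even(value_per_success=10, cost_per_error=9990))  # ≈ 0.999
```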
The Collaboration Timeline:
This provides a roadmap for when to implement different AI strategies based on accuracy requirements and risk tolerance.
Key Insights from [15:53-21:13]
Essential Insights:
- Memory Revolution - Claude 4's persistent memory system enables multi-session collaboration on complex, long-horizon projects
- Intelligence Gap Analysis - AI's judgment and generation capabilities are more aligned than humans', requiring different collaboration models
- Rapid Market Evolution - Y Combinator data shows startups moving from co-pilot to full automation strategies within 6-month cycles
Actionable Insights:
- Accuracy Strategy: Choose 70-80% accuracy applications for cutting-edge AI capabilities vs. 99.9% for high-stakes deployment
- Collaboration Model: Position humans as managers/supervisors rather than co-workers when working with AI
- Implementation Timing: Current optimal approach is human-AI collaboration with preparation for increasing full automation
References from [15:53-21:13]
People Mentioned:
- Diana Hu - General Partner at Y Combinator conducting the interview
Companies & Products:
- Anthropic - Developer of Claude 4 with new memory and supervision capabilities
- Y Combinator - Startup accelerator observing rapid evolution from co-pilot to full automation strategies
- Claude 3.7 Sonnet - Previous Anthropic model that was "too eager" in coding applications
- Claude 4 - Latest model with improved memory, supervision, and agent capabilities
- Claude 5 - Future model referenced as next improvement iteration
Research & Benchmarking:
- METR - Organization that benchmarked AI task duration capabilities against human performance using the "50%-task-completion time horizon" metric
- Y Combinator Spring Batch - Recent cohort showing dramatic shift toward end-to-end AI automation
Technologies & Tools:
- Context Windows - AI conversation memory limitations that Claude 4's memory system overcomes
- Memory Storage System - Claude 4's ability to save information as files/records across sessions
- Agent Capabilities - AI's ability to act autonomously in coding, search, and other applications
Concepts & Frameworks:
- Human-AI Collaboration Model - Humans as managers providing oversight and sanity checks for AI work
- Accuracy Requirements Spectrum - 70-80% vs 99.9% accuracy determining implementation strategy
- Co-pilot to Full Automation Evolution - Rapid transition from human-supervised to fully automated AI workflows
- Judgment vs Generation Gap - Fundamental difference between human and AI intelligence patterns
- Scaling Law Manifestation - Theoretical predictions now visible in practical applications
What Does Dario's "Machines of Loving Grace" Vision Look Like in Practice?
From Optimistic Essays to Real-World Human-AI Collaboration
Diana references Dario Amodei's influential essay about AI's potential, prompting Kaplan to share concrete examples of how this vision is already materializing.
Current Reality in Biomedical Research: Researchers are already extracting valuable drug-discovery insights from frontier models.
The Orchestration Key:
Critical Success Factor: "With the right sort of orchestration" - not just raw AI capability, but thoughtful integration and management
Drug Discovery Applications:
- Frontier AI models already producing valuable insights
- Real researchers achieving meaningful results
- Practical applications beyond theoretical potential
The Optimistic Foundation:
- Dario's Vision: "Machines of Loving Grace" paints an optimistic picture of AI-human collaboration
- Current Evidence: Early manifestations already visible in high-stakes research domains
- Implementation Reality: Success depends on skillful orchestration rather than just AI capability
The Bridge to Reality:
This represents the practical manifestation of ambitious AI visions - not just theoretical possibilities, but working applications in critical domains like healthcare and drug development.
Why Is AI's "Breadth vs. Depth" Advantage Perfect for Scientific Breakthroughs?
How AI's Unique Intelligence Pattern Unlocks New Research Possibilities
Kaplan reveals a fundamental distinction between types of intelligence that explains why AI may excel in certain scientific domains over others.
The Two Types of Intelligence:
Depth Intelligence:
- Requires intensive focus on single problems
- Example: Working on one theorem for a decade (Riemann Hypothesis, Fermat's Last Theorem)
- Traditional strength of human experts
Breadth Intelligence:
- Requires synthesizing vast amounts of information across domains
- More common in biology, psychology, history
- AI's natural advantage due to pre-training
AI's Unique Advantage: Pre-training across essentially all recorded human knowledge gives models a breadth no individual expert can match.
The Cross-Domain Synthesis Opportunity:
- AI's Superpower: Ability to connect insights across multiple areas of expertise
- Human Limitation: No single human expert has knowledge spanning all relevant domains
- Research Application: Eliciting insights that combine biology, chemistry, physics, and other fields simultaneously
The Knowledge Integration Advantage - Practical Implications:
- AI can synthesize insights across traditionally separate research silos
- Breakthrough potential in interdisciplinary research
- Leveraging AI's comprehensive knowledge base for novel connections
How Do You Predict the Unpredictable Future of AI Implementation?
Why Scaling Laws Work for Trends But Fall Short for Details
Kaplan provides a nuanced view of prediction in AI, distinguishing between what can be reliably forecasted and what remains fundamentally uncertain.
What Scaling Laws Can Predict:
- Reliable Trend Continuation: The overall trajectory of AI capability improvement
- Macro-Economic Parallels: GDP, economic growth, and other long-term trends provide precedent
- Capability Progression: General advancement in AI performance and task complexity
What Remains Unpredictable:
- Implementation Details: Specific ways AI will be integrated into society and business
- Adoption Patterns: Which industries will adopt AI first and how quickly
- Social Dynamics: How humans will adapt to and interact with AI systems
The Prediction Framework:
Reliable Long-Term Trends: Like GDP and long-run economic growth, scaling-style trends can be extrapolated with reasonable confidence.
- Uncertain Specifics: The details of implementation, timing, and social adaptation
- Scientific Approach: Use what can be predicted (scaling laws) while acknowledging uncertainty about specifics
The Intellectual Honesty:
Rather than overconfident predictions, Kaplan demonstrates scientific rigor by clearly distinguishing between what scaling laws can and cannot forecast about the future.
What Are the Most Promising "Green Field" Opportunities for AI Builders?
Beyond Coding: Identifying the Next Wave of AI Applications
Diana asks about untapped opportunities, and Kaplan identifies specific domains ripe for AI transformation based on clear criteria.
The Green Field Criteria: Skill-intensive, computer-based work where AI's breadth and speed create clear economic leverage.
Specific High-Potential Domains:
Finance:
- Complex data analysis and pattern recognition
- Quantitative modeling and risk assessment
- Algorithmic trading and portfolio management
Excel-Heavy Professionals:
- Financial analysts and accountants
- Business analysts and consultants
- Operations managers and planners
Legal (with caveats):
- Document review and contract analysis
- Legal research and case law synthesis
- BUT: "Maybe law is more regulated, requires more expertise as a stamp of approval"
The Meta-Opportunity:
AI Integration Services: Helping businesses figure out how to adopt AI effectively is itself a major business opportunity.
The Electricity Analogy:
- Historical Parallel: Early electricity adoption simply replaced steam engines with electric motors
- Better Approach: "You wanted to sort of remake the way that factories work"
- AI Implication: Don't just replace human tasks - reimagine entire workflows and business processes
The Leverage Opportunity: The biggest gains go to builders who reimagine entire workflows around AI rather than slotting it into existing processes.
How Does a Physicist's "Dumb Questions" Approach Revolutionize AI Research?
The Power of Precision in Identifying Breakthrough Opportunities
Diana explores how Kaplan's physics training contributed to discovering scaling laws, revealing a methodology that others can apply.
The Physics Mindset: Look at the big picture, ask embarrassingly simple questions, and insist on precise answers.
The "Dumb Questions" Method:
- Real Example: Encountering brilliant AI researchers saying "learning is converging exponentially"
Critical Questions (a quick check is sketched after this list):
- "Are you sure it's an exponential?"
- "Could it just be a power law?"
- "Is it quadratic?"
- "Like exactly how is this thing converging?"
Why Precision Matters: Pinning down the exact functional form turns a vague observation into a law you can measure progress against, and optimize.
The Strategic Value:
- The Holy Grail: Finding a better slope to the scaling law
- Competitive Advantage: "As you put in more compute, you're going to get a bigger and bigger advantage over other AI developers"
- Systematic Progress: Know exactly what it means to improve and how to measure success
The Precision Requirement: You have to know exactly what quantity is improving, and how, before you can systematically improve it.
The Transferable Skill:
This approach isn't limited to physics - anyone can apply rigorous questioning to make vague trends precise and actionable.
What Physics Concepts Actually Transfer to AI Research?
From Matrix Limits to Naive Questions: The Real Physics Tools for AI
Diana probes deeper into specific physics techniques, but Kaplan reveals that the most powerful tools are surprisingly fundamental.
The Matrix Mathematics Connection: Neural networks are built from very large matrices, and physics has well-developed tools for studying large-matrix limits.
- Practical Application: Studying approximations where neural networks are very large
- Physics Heritage: Well-known approximation techniques from physics and mathematics
- Current Relevance: Applied to understanding behavior of massive neural networks
The Counter-Intuitive Truth: The most useful transferable tools are naive questions, not sophisticated mathematical machinery.
Why "Fancy Techniques" Aren't Needed:
- AI's Youth: "AI is really in a certain sense only like maybe 10-15 years old in terms of the current incarnation"
- Fundamental Gaps: "A lot of the most basic questions haven't been answered like questions of interpretability, how AI models really work"
- Low-Hanging Fruit: More value in basic understanding than advanced mathematical techniques
The Specific Physics Reality: Beyond large-matrix approximations, surprisingly little specialized physics transfers directly.
The Research Opportunity:
- Basic Questions Remain: Fundamental interpretability and understanding challenges
- New Field Advantage: Incredible opportunity for foundational discoveries
- Simple Tools Win: Naive questioning more valuable than sophisticated mathematical machinery
Why Is AI Interpretability More Like Biology Than Physics?
The Advantage AI Has Over Neuroscience in Understanding Intelligence
Kaplan draws fascinating parallels between AI interpretability challenges and biological research, while highlighting AI's unique research advantages.
The Biological Analogy:
- Research Approach: Similar to trying to understand brain features and neural networks
- Complexity Level: More biological investigation than mathematical derivation
- Methodology: Reverse engineering complex systems rather than deriving from first principles
AI's Massive Research Advantage:
The Data Advantage:
- Complete Observability: Every parameter, activation, and connection is measurable
- Perfect Monitoring: Can track all neural network activity during training and inference
- Unlimited Experimentation: Can modify and test AI systems in ways impossible with biological brains (a toy illustration follows this list)
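A toy illustration of that observability gap: even in this minimal network, every weight and every intermediate activation can be recorded exactly, something no neuroscience experiment can do with a biological brain.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer MLP. Unlike a biological brain, *every* parameter and
# every intermediate activation is directly measurable.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x, trace):
    h = np.maximum(0, W1 @ x + b1)   # hidden-layer activations (ReLU)
    y = W2 @ h + b2                  # output activations
    trace.append({"input": x, "hidden": h, "output": y})
    return y

trace = []                           # complete record of internal state
forward(np.array([1.0, -0.5, 2.0]), trace)
print(trace[0]["hidden"])            # perfect visibility into every "neuron"
```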
The Research Implications:
- Much More Data: "There's much much much more data for reverse engineering how AI models work"
- Better Tools: Complete system access versus limited biological measurement capabilities
- Faster Progress: Potential for more rapid interpretability breakthroughs than neuroscience
The Methodological Insight:
AI interpretability combines the systematic approach of biology with the complete data access that biological systems can never provide, creating unprecedented opportunities for understanding intelligence.
Key Insights from [21:18-29:45]
Essential Insights:
- Breadth Intelligence Advantage - AI's ability to synthesize knowledge across all human domains creates unique research opportunities that no single human expert could achieve
- Physics Methodology Transfer - Asking precise, "dumb" questions about vague trends yields more breakthroughs than applying sophisticated mathematical techniques
- AI Research Superiority - Unlike neuroscience, AI interpretability benefits from complete observability of all system components and behaviors
Actionable Insights:
- Research Strategy: Focus on interdisciplinary problems where AI can synthesize knowledge across multiple expert domains
- Business Opportunities: Target skill-intensive, computer-based work in finance, Excel-heavy roles, and AI integration services
- Scientific Approach: Make vague trends precise through rigorous questioning to identify competitive advantages and systematic improvement paths
References from [21:18-29:45]
People Mentioned:
- Dario Amodei - Anthropic CEO and author of "Machines of Loving Grace" essay painting optimistic AI future
- Diana Hu - Y Combinator General Partner conducting the interview
Publications & Essays:
- "Machines of Loving Grace" - Dario Amodei's influential essay about optimistic AI collaboration future
Mathematical Concepts:
- Riemann Hypothesis - Famous unsolved mathematical conjecture used as example of depth intelligence
- Fermat's Last Theorem - Historical mathematical problem requiring decade-long focused work
- Power Laws vs. Exponentials - Mathematical distinctions Kaplan uses to make AI trends precise
- Matrix Mathematics - Large matrix approximation techniques from physics applied to neural networks
Technologies & Applications:
- Drug Discovery - Biomedical research domain where AI is already producing valuable insights
- Excel Spreadsheets - Business tool representing skill-intensive, computer-based work ripe for AI
- Neural Network Parameters - AI models now have billions to trillions of parameters forming large matrices
Research Domains:
- Biomedical Research - Field where AI orchestration is already producing meaningful results
- Finance - High-potential domain for AI applications in data analysis and modeling
- Legal Services - Promising but regulated domain requiring expertise approval
- AI Integration Services - Meta-opportunity helping businesses adopt AI effectively
Concepts & Frameworks:
- Breadth vs. Depth Intelligence - Distinction between synthesizing across domains vs. deep focus on single problems
- Scaling Law Precision - Making vague trends mathematically precise to identify improvement opportunities
- AI Interpretability - Understanding how AI models work, compared to neuroscience methodology
- Electricity Adoption Analogy - Historical parallel for AI integration requiring workflow redesign rather than simple replacement
What Would It Take to Convince a Scaling Laws Pioneer That the Curve Is Breaking?
The Contrarian Question That Reveals Deep Conviction About AI Progress
Diana poses a challenging contrarian question about scaling law durability, revealing just how robust these patterns have proven and why Kaplan remains convinced they'll continue.
The Remarkable Track Record:
- Scale of Validation: Scaling laws have remained consistent across enormous ranges of compute, data, and model sizes
- Statistical Significance: Five orders of magnitude represents unprecedented consistency in empirical observations
- Foundation for Confidence: This track record provides strong basis for continued belief
Kaplan's Diagnostic Approach: When scaling appears to break, suspect the training setup before the law itself.
The Default Assumption - Training Problems, Not Law Failures:
Potential Issues:
- Wrong neural network architecture
- Hidden bottlenecks in training process
- Precision problems in algorithms
- Implementation errors rather than fundamental limits
The Experience-Based Conviction: In Kaplan's experience, apparent breaks in the curve have so far traced back to fixable implementation problems.
What It Would Take:
- High Bar for Evidence: Would require overwhelming proof that the laws themselves, not implementation, are failing
- Scientific Rigor: Distinguishes between execution problems and fundamental physical limits
How Far Down the Precision Ladder Will AI Go When Compute Gets Scarce?
From FP4 to Binary: The Future of Efficient AI Computing
Diana explores the technical challenge of maintaining scaling progress when compute becomes scarce, leading to fascinating insights about AI efficiency and the "back to binary" future.
The Compute Scarcity Challenge:
- Current Reality: Massive compute requirements to maintain scaling curve progress
- Future Constraint: Compute will become increasingly scarce and expensive
- Technical Question: How low can precision go while maintaining performance?
The Current Inefficiency Situation: Labs are prioritizing frontier capability over efficiency, leaving large optimization gains on the table.
The Dual Focus Strategy:
- Frontier Capabilities: Priority on unlocking most advanced AI capabilities
- Efficiency Improvements: Simultaneously making training and inference more efficient
- Speed of Innovation: Companies like Anthropic moving as quickly as possible on both fronts
Current Efficiency Gains: Algorithmic and inference efficiency are already improving on the order of 3x to 10x per year.
The "Back to Binary" Joke:


- Technical Evolution: From high-precision floating point (FP16, FP32) toward binary representations (the effect is sketched after this list)
- Efficiency Driver: Lower precision dramatically reduces computational requirements
- Multiple Avenues: Precision reduction is one of many efficiency improvement strategies
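A rough numpy sketch of what lowering precision does to stored weights, using plain uniform quantization; real low-precision formats such as FP4 are more sophisticated, so treat this only as intuition for the capacity-versus-error trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.1, size=10_000).astype(np.float32)

def quantize(w, bits):
    """Uniform symmetric quantization to the given bit width, then back
    to float, exposing the rounding error. Pushing toward 1 bit is the
    'back to binary' limit."""
    levels = 2 ** (bits - 1) - 1           # e.g. 7 positive levels at 4 bits
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale

for bits in (8, 4, 2):
    err = np.sqrt(np.mean((weights - quantize(weights, bits)) ** 2))
    print(f"{bits}-bit weights, RMS rounding error: {err:.5f}")
```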
Why Are We "Very Very Very Out of Equilibrium" with AI Development?
Understanding the Chaotic State of Current AI Progress
Kaplan describes the current AI landscape as fundamentally unstable, with implications for how we should think about efficiency, cost, and future development.
The Disequilibrium State: Kaplan describes the field as "very very very out of equilibrium," with capabilities, usage, and infrastructure all shifting at once.
Multiple Rapid Changes Simultaneously:
- Capability Improvements: AI models getting smarter faster than expected
- Unrealized Potential: Haven't fully exploited current model capabilities
- New Capabilities: Continuously unlocking additional functionality
- Implementation Lag: Can't integrate improvements as fast as they emerge
The Equilibrium Question: Will AI development ever settle into a stable state, and what would efficiency and cost look like if it does?
The Potential Perpetual Acceleration: It may never settle; better AI keeps creating more demand for still better AI.
The Jevons Paradox Application:
- Economic Principle: As efficiency increases, demand often increases more than efficiency savings
- AI Context: Better AI creates more demand for AI capabilities
- Cost Implications: Instead of cheaper AI, we get more capable (and expensive) AI (see the toy arithmetic below)
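The Jevons logic in two lines of arithmetic (illustrative numbers, not forecasts):

```python
# Toy Jevons-paradox arithmetic: a 10x efficiency gain that unlocks
# 30x more demand *raises* total compute consumption.
cost_per_task_before, tasks_before = 1.0, 100
cost_per_task_after, tasks_after = 0.1, 3000   # cheaper => far more usage

print("total compute before:", cost_per_task_before * tasks_before)  # 100.0
print("total compute after: ", cost_per_task_after * tasks_after)    # 300.0
```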


Will All the Value Stay at the AI Frontier or Spread to Cheaper Models?
The Strategic Question That Could Shape the Entire AI Economy
Kaplan grapples with a fundamental economic question about AI value distribution that has massive implications for businesses and developers.
The Core Strategic Question: Will economic value concentrate in the most capable frontier models, or diffuse to cheaper commodity models?
The Task Complexity Framework:
- Simple Tasks: Can be handled by cheaper, less capable models
- Complex Tasks: Require frontier model capabilities for end-to-end completion
The Convenience Factor: A single frontier model that completes a task end-to-end is far more convenient than orchestrating many cheaper models to do it piecemeal.
The Human Orchestration Challenge:
- Complex Coordination: Breaking complex tasks into small pieces requires significant human oversight
- Integration Overhead: Putting small task results together adds complexity and cost
- End-to-End Value: Single capable model eliminates coordination overhead
Kaplan's Expectation: He expects most of the value to concentrate at the frontier, at least for now.
The Uncertainty Factor:
- Integration Capabilities: Depends on how efficiently humans can leverage less capable AI
- Market Development: Could change based on AI integration tool sophistication
- Economic Dynamics: Value distribution may shift as market matures
The Practical Implication:
This suggests frontier AI capabilities command premium pricing while commodity AI applications face cost pressure - a critical consideration for AI business strategy.
How Do You Stay Relevant When AI Models Become "So Awesome"?
Career Advice for Thriving in the Age of Superhuman AI
Diana asks the ultimate career question for young professionals: how to remain valuable when AI capabilities explode beyond current imagination.
The Direct Career Advice: Understand the models deeply, learn to leverage and integrate them efficiently, and build at the frontier.
The Three-Pronged Strategy:
1. Deep Model Understanding:
- Technical Literacy: Understand how AI models actually function
- Capability Assessment: Know what models can and cannot do
- Limitation Awareness: Recognize current constraints and failure modes
2. Efficient Leverage and Integration:
- Optimization Skills: Maximize AI model effectiveness for specific tasks
- Integration Expertise: Seamlessly incorporate AI into existing workflows
- Orchestration Abilities: Coordinate multiple AI capabilities for complex outcomes
3. Frontier Building:
- Cutting-Edge Development: Work on the most advanced AI applications
- Innovation Focus: Create novel uses of emerging capabilities
- Early Adoption: Be among first to exploit new AI model features
The Meta-Skill Implication:
- Adaptive Learning: The ability to quickly understand and leverage new AI capabilities as they emerge
- Human-AI Collaboration: Becoming expert at the interface between human judgment and AI execution
- System Thinking: Understanding how AI fits into larger workflows and business processes
The Positioning Strategy:
Rather than competing with AI, become the person who makes AI most effective - the translator, integrator, and optimizer who maximizes AI value creation.
Key Insights from [29:52-35:21]
Essential Insights:
- Scaling Law Robustness - Five orders of magnitude validation creates extremely high confidence; apparent failures typically indicate implementation problems rather than fundamental limits
- AI Development Disequilibrium - Current rapid AI improvement creates chaotic conditions where efficiency may remain secondary to capability advancement
- Frontier Value Concentration - Most AI economic value likely concentrates in most capable models due to end-to-end task completion convenience
Actionable Insights:
- Career Strategy: Focus on understanding AI models deeply, leveraging them efficiently, and building at the frontier rather than competing with AI
- Business Planning: Expect continued prioritization of capability over efficiency while AI remains in rapid improvement phase
- Technical Preparation: Prepare for dramatic efficiency improvements (including binary precision) when AI development eventually reaches equilibrium
References from [29:52-35:21]
People Mentioned:
- Diana Hu - Y Combinator General Partner asking probing questions about scaling law durability and career advice
Companies & Technologies:
- Anthropic - Kaplan's company working on both frontier capabilities and efficiency improvements
- FP4 and FP2 - Low-precision floating point formats for efficient AI computation
- Binary Representations - Ultra-low precision computing format representing the efficiency frontier
Technical Concepts:
- Five Orders of Magnitude - Scale across which scaling laws have remained consistent
- Neural Network Architecture - System design that could break scaling laws if implemented incorrectly
- Training Bottlenecks - Hidden constraints that could make scaling laws appear to fail
- Algorithm Precision - Mathematical accuracy in AI training implementations
Economic Principles:
- Jevons Paradox - Economic principle where efficiency improvements increase rather than decrease total resource consumption
- Frontier vs. Commodity Value - Distribution of economic value between cutting-edge and basic AI capabilities
- 3x to 10x Annual Gains - Current rate of algorithmic and inference efficiency improvements
Career & Strategic Concepts:
- AI Model Understanding - Deep technical knowledge of how AI systems function
- Efficient AI Leverage - Skills in maximizing AI model effectiveness for specific applications
- Frontier Building - Working on the most advanced AI applications and capabilities
- End-to-End Task Completion - AI's ability to handle complex workflows without human orchestration
- Human-AI Orchestration - Coordinating less capable AI models to complete complex tasks
Why Does Linear AI Progress Suddenly Become Exponential Task Duration?
The Mystery Behind METR's Surprising Finding on Time Horizon Scaling
An audience member asks a profound question about why scaling laws show linear progress in loss but exponential growth in task duration capabilities - a puzzle that even Kaplan finds intriguing.
The Scaling Paradox:
- Linear Progress: Exponentially more compute yields roughly linear improvements in the scaling loss
- Exponential Jump: But task duration capability shows exponential growth patterns
- The Question: Why does the relationship change from linear to exponential?
Kaplan's Honest Assessment: He doesn't claim a derivation; the exponential time-horizon trend is an empirical puzzle he finds genuinely interesting.
The Self-Correction Theory: Long tasks fail through accumulated mistakes, so modest gains in catching and fixing errors stretch how long a model can stay on track.
The Plan Reality Check: Long tasks rarely go exactly to plan, so the ability to recover from errors matters more than planning perfectly.
The Mistake Detection Mechanism:
- Core Capability: AI's ability to notice when it's doing something wrong and correct course
- Information Efficiency: "It doesn't necessarily require a huge change in intelligence to sort of notice one or two more times that you've made a mistake"
- Horizon Doubling: Each improvement in error detection could double the task horizon length (a toy calculation follows this list)
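A toy model of that amplification effect (an assumption-laden sketch, not Kaplan's math): if a task of n steps succeeds only when no step derails, then halving the per-step error rate roughly doubles the number of steps achievable at 50% success.

```python
import math

def horizon_at_50pct(per_step_error_rate):
    """Toy model: an n-step task succeeds only if no step fails, so
    P(success) = (1 - p)^n. The 50%-success horizon is the n where this
    hits 0.5; halving p roughly doubles that horizon."""
    return math.log(0.5) / math.log(1 - per_step_error_rate)

for p in (0.04, 0.02, 0.01):
    print(f"per-step error {p:.2f} -> ~{horizon_at_50pct(p):.0f}-step horizon")
# 0.04 -> ~17 steps, 0.02 -> ~34 steps, 0.01 -> ~69 steps
```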
The Breakthrough Insight:
- The Amplification Effect: Small improvements in self-correction create large improvements in task completion capability
- Exponential Nature: Each mistake fixed extends capability exponentially rather than linearly
The Empirical Focus: Kaplan treats the exponential horizon growth as an observed regularity to be explained, not something derived from theory.
How Do You Train AI for Long-Horizon Tasks Without Perfect Verification?
The Challenge of Scaling Beyond Coding to Complex Real-World Domains
An audience member poses a critical question about training AI for extended tasks in domains where verification signals aren't as clear as coding success/failure.
The Training Philosophy:
"My mental model of neural networks is very simple. If you want them to do something, you train on such data." โ Audience Member
The Verification Signal Challenge:
- Coding Success: Can deploy Claude agents and get clear verification signals from working/broken code
- Other Domains: Lack clear binary success indicators for complex, long-term tasks
- The Dilemma: Are we limited to "scaling data labelers to AGI"?
Kaplan's Worst-Case Scenario: If nothing smarter works, hire enormous numbers of people to supervise and label long-horizon tasks.
Why the Worst Case Is Still Viable:
- Economic Justification: Massive AI investment makes even operationally intensive approaches economically feasible
- Value Creation: The potential returns justify extensive human supervision efforts
The Better Solution - AI Supervision:
- The Vision: AI models trained to oversee and supervise other AI models
- Granular Feedback: Instead of binary success/failure, provide detailed continuous guidance
- Efficiency Gain: Avoid waiting years for final task completion to get training signal
The Ridiculous Example: Academic tenure, where the only feedback arrives after roughly seven years as a single yes-or-no decision, an absurdly sparse training signal.
- Better Approach: "You're doing this well, you're doing this poorly" throughout the process
- Current Implementation: "I think we're already doing this to some extent"
Are Humans Still Needed to Create AI Training Tasks?
The Meta-Question About AI Creating Its Own Training Data
The final audience question explores whether AI can bootstrap its own training by generating the tasks it learns from - a recursive approach to AI development.
The Training Task Creation Question:
- The Process: Developing complex tasks for reinforcement learning training
- The Method: Training AI models on these tasks to improve long-horizon capabilities
- The Meta-Question: Can AI create its own training tasks?
Kaplan's Current Reality: Today it's a mix, with AI already generating many training tasks while humans design the rest.
The Hybrid Approach:
- AI-Generated Tasks: Leveraging AI to automatically create training scenarios, especially with code generation
- Human-Created Tasks: Still involving humans in task design and creation
- Mixed Strategy: Combining both approaches for optimal results
The Future Trajectory: AI will generate a growing share of training tasks, but the frontier of genuinely hard tasks keeps moving with it.
The Moving Target Challenge:
- Increasing AI Capability: AI becomes better at generating training tasks
- Rising Task Complexity: The frontier of difficult tasks also advances
- Persistent Human Role: Humans remain necessary for the most challenging task design
The Bootstrap Limitation:
- Self-Improvement Constraint: AI may struggle to create tasks significantly harder than its current capability level
- Human Innovation: Humans still needed to push beyond current AI task design capabilities
- Frontier Maintenance: Keeping humans involved ensures continued capability expansion
The Balanced Future:
AI will increasingly handle routine training task generation while humans focus on designing the most challenging and novel scenarios that push AI capabilities forward.
Key Insights from [35:27-40:43]
Essential Insights:
- Self-Correction Amplification - Small improvements in AI's ability to detect and fix mistakes could exponentially extend task completion horizons through reduced failure points
- Supervision Efficiency - AI-supervised training with granular feedback throughout long tasks is more efficient than waiting for final binary success/failure signals
- Human-AI Task Creation Balance - Current optimal approach mixes AI-generated training tasks with human-designed challenges, with humans remaining essential for frontier task complexity
Actionable Insights:
- Training Strategy: Focus on developing AI self-correction capabilities as a lever for dramatic task horizon improvements
- Supervision Design: Implement detailed, continuous feedback systems rather than binary end-state evaluations for complex tasks
- Development Approach: Plan for hybrid human-AI task creation where AI handles routine generation while humans design frontier challenges
References from [35:27-40:43]
People Mentioned:
- Audience Members - Y Combinator AI Startup School attendees asking technical questions about scaling laws and training approaches
Companies & Products:
- Claude Agent - Anthropic's AI system used as example for verification signal collection in coding tasks
Research & Measurements:
- METR Finding - Empirical research showing exponential growth in AI task duration capabilities despite linear scaling loss improvements
- METR arXiv Paper - Peer-reviewed publication detailing the methodology and results of the study
Technical Concepts:
- Scaling Loss - Mathematical measure of AI model performance that improves linearly with compute
- Time Horizon Tasks - Long-duration activities that AI models can complete, measured in hours, days, or weeks
- Self-Correction Capability - AI's ability to identify mistakes and adjust course during task execution
- Verification Signals - Feedback mechanisms that indicate whether AI task performance is successful
- Reinforcement Learning (RL) - Training method using reward signals to improve AI performance on complex tasks
Training Methodologies:
- AI Supervision - Using AI models to oversee and provide feedback to other AI models during training
- Task Generation - Creating training scenarios for AI models, potentially using AI itself
- Granular Feedback - Detailed, continuous guidance rather than binary success/failure signals
- Long-Horizon Training - Teaching AI to complete tasks spanning extended time periods
Domain Examples:
- Academic Tenure - Seven-year process used as example of inefficient binary feedback for long-term tasks
- Code Generation - Domain with clear verification signals making it ideal for AI training and deployment