
Prompt Engineering Advice From Top AI Startups

At first, prompting seemed to be a temporary workaround for getting the most out of large language models. But over time, it's become critical to the way we interact with AI. On the Lightcone, Garry, Harj, Diana, and Jared break down what they've learned from working with hundreds of founders building with LLMs: why prompting still matters, where it breaks down, and how teams are making it more reliable in production. They share real examples of prompts that failed, how companies are testi...

May 30, 2025 (31:26)

Table of Contents

0:00-12:05: Prompt anatomy, prompt architecture, and meta-prompting
12:12-17:23: Managing long prompts, thinking traces, and evals as crown jewels
17:29-23:12: The Forward Deployed Engineer model
23:19-27:23: Vertical AI agents, model personalities, and enterprise deals
27:29-31:04: Rubrics, scoring, and closing thoughts

🚀 Introduction: The New Frontier of Prompt Engineering

Welcome to the Lightcone Podcast, where YC Partners Garry, Harj, Diana, and Jared dive into what's actually happening inside the best AI startups when it comes to prompt engineering. They've surveyed more than a dozen companies to pull back the curtain on practical techniques from the frontier of building AI products.

"Metaprompting is turning out to be a very very powerful tool that everyone's using now. It kind of actually feels like coding in you know 1995 like the tools are not all the way there we're you know in this new frontier but personally it also kind of feels like learning how to manage a person where it's like how do I actually communicate uh you know the things that they need to know in order to make a good decision." - Garry Tan

The conversation sets up the reality that while prompting may have seemed like a temporary workaround initially, it has become critical to how we interact with AI systems effectively.

Timestamp: [0:00-0:58]

🎯 Real-World Example: Parahelp's Production Prompt

Jared shares an exclusive look at a production prompt from Parahelp, an AI customer support company that powers support for major AI companies like Perplexity, Replit, and Bolt. This represents a rare opportunity to see the "crown jewels" of a vertical AI agent company's intellectual property.

The Parahelp team graciously agreed to open-source their actual prompt that powers their AI agent, providing unprecedented insight into how professional-grade AI customer support actually works behind the scenes. When you email a customer support ticket to Perplexity, what's responding is actually Parahelp's AI agent using this sophisticated prompt structure.

This example demonstrates the level of sophistication required for AI agents operating in production environments where reliability and consistency are paramount.

Timestamp: [0:58-1:44]

📋 Anatomy of a Professional Prompt: Six Pages of Precision

Diana walks through the detailed structure of Parahelp's production prompt, revealing it to be six pages long with very specific architectural decisions. The prompt demonstrates several key principles that separate professional-grade prompts from amateur attempts.

The prompt begins by establishing the LLM's role as "a manager of a customer service agent" and breaks down responsibilities into clear bullet points. It then defines the specific task of approving or rejecting tool calls, since the system orchestrates agent calls from multiple other agents.

The structure follows a step-by-step approach with numbered steps (one through five) and includes important constraints about what kinds of tools it should not call. Output formatting is meticulously specified because agents need to integrate with other agents, requiring precise API-like interactions.

"The big thing that a lot of the best prompts start with is this concept of setting up the role of the LLM... then the big thing is telling the task which is to approve or reject a tool call because it's orchestrating agent calls from all these other ones." - Diana Hu

The prompt uses markdown-style formatting with clear headings and sub-bullet sections, making it easier for LLMs to parse and follow. It includes three major sections covering planning methodology, step creation processes, and high-level planning examples.
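
To make the structure concrete, here is a minimal sketch of a system prompt written in this style. It is an illustrative composite, not Parahelp's actual prompt; the role, steps, constraints, and XML tags are all invented for the example.

```python
# Illustrative sketch only: a system prompt in the style described above.
# This is not Parahelp's actual prompt; the role, steps, and tags are invented.
SYSTEM_PROMPT = """\
# Role
You are a manager of a customer service agent. Your responsibilities:
- Review every tool call the agent proposes.
- Approve or reject each call before it executes.

# Task
Approve or reject the proposed tool call by following the steps below.

# Steps
1. Read the customer's ticket and the agent's proposed tool call.
2. Check the call against the list of allowed tools.
3. Verify the arguments are consistent with the ticket.
4. If anything is ambiguous, reject the call and explain why.
5. Output your decision in the format below.

# Constraints
- Never approve tools that issue refunds or delete customer data.

# Output format
<decision>
  <verdict>approve | reject</verdict>
  <reason>one short sentence</reason>
</decision>
"""
```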

Timestamp: [1:44-4:15]

🛠️ The Programming-Like Nature of Modern Prompts

The conversation reveals how sophisticated prompts have evolved to look more like programming than natural English writing. The Parahelp prompt uses XML tag formatting to specify plans and structure, which has proven more effective than traditional prose approaches.

"One thing that's interesting about this it looks more like programming than writing English because it has this XML tag kind of format to specify sort of the plan. We found that it makes it a lot easier for LLMs to follow because a lot of LLMs were post-trained in LHF with kind of XML type of input and it turns out to produce better results." - Diana Hu

This technical approach stems from understanding how LLMs were trained - many were post-trained with XML-type input during their RLHF (Reinforcement Learning from Human Feedback) process, making them naturally better at parsing structured, tag-based instructions.

The hosts note that what they're seeing is just the general system prompt, with customer-specific examples and workflows handled in subsequent stages of the pipeline. This separation allows for scalability while maintaining customization.

Timestamp: [3:34-4:44]

🏗️ Prompt Architecture: System, Developer, and User Layers

The hosts break down the emerging architecture of professional prompt systems into three distinct layers, each serving different purposes in the AI application stack.

System Prompt: Defines the high-level API of how the company operates. The Parahelp example represents a pure system prompt with nothing customer-specific - it establishes the fundamental operating principles and capabilities of the AI agent.

Developer Prompt: Contains all the customer-specific context and workflows. For Parahelp, this layer would include specific instructions for handling Perplexity's FAQ questions differently from Bolt's technical support needs. This is where customization happens without rebuilding the entire system.

User Prompt: Contains the end-user input. For products like Replit or Cursor, this would be where users type requests like "generate me a site that has these buttons." Parahelp doesn't have a user prompt layer since their product isn't consumed directly by end users.

This architectural approach addresses a critical challenge for vertical AI agent companies: how to build flexible, general-purpose products without becoming consulting companies that build custom prompts for every customer. The layered approach allows for systematic scaling while maintaining necessary customization.
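
As a minimal sketch of how the layers might be assembled at request time: the "system"/"developer"/"user" role names follow common chat-API conventions, and build_messages plus the example contents are hypothetical illustrations, not Parahelp's code.

```python
# Sketch of the three-layer prompt architecture described above.
# build_messages() and the layer contents are hypothetical illustrations.

def build_messages(system_prompt: str, developer_prompt: str, user_input: str | None) -> list[dict]:
    """Assemble the layered prompt in the order system -> developer -> user."""
    messages = [
        {"role": "system", "content": system_prompt},        # company-wide operating rules
        {"role": "developer", "content": developer_prompt},  # customer-specific workflows (e.g. one customer's FAQ handling)
    ]
    if user_input is not None:  # omitted for products with no direct end user, like Parahelp
        messages.append({"role": "user", "content": user_input})
    return messages
```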

Timestamp: [5:02-6:04]

🔧 The Automation Opportunity in Prompt Engineering

The conversation identifies significant startup opportunities in building tooling around prompt engineering, particularly for automatically generating and optimizing worked examples - a critical component for improving AI output quality.

The hosts envision an ideal scenario where an agent automatically extracts the best examples from customer datasets and seamlessly integrates them into the appropriate pipeline layer without manual intervention. Currently, companies like Parahelp need high-quality worked examples specific to each customer, but this process requires significant manual effort.

"In your dream world what you want is just like an agent itself that can pluck out the best examples from like the customer data set and then software that just like ingests that straight into like wherever it should belong in the pipeline without you having to manually go out and plug that all and ingest it in all of yourself." - Harj Taggar

This automation challenge represents a natural segue into meta-prompting, where AI systems help improve their own prompting strategies. The need for better tooling around example selection, prompt optimization, and pipeline integration suggests a rich ecosystem of potential solutions for teams building AI products at scale.
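
A hedged sketch of what that example-extraction automation could look like: pick the strongest resolved tickets from a customer's history and splice them into the developer-prompt layer. The functions, the csat field, and the scoring heuristic are all invented for illustration.

```python
# Hedged sketch: auto-select worked examples from a customer's ticket history
# and append them to the developer prompt. The scoring heuristic is invented.

def select_examples(tickets: list[dict], k: int = 3) -> list[dict]:
    """Pick the k resolved tickets with the highest customer-satisfaction score."""
    resolved = [t for t in tickets if t.get("resolved")]
    return sorted(resolved, key=lambda t: t.get("csat", 0), reverse=True)[:k]

def inject_examples(developer_prompt: str, examples: list[dict]) -> str:
    """Render the chosen tickets as worked examples inside the developer-prompt layer."""
    rendered = "\n\n".join(
        f"<example>\n{e['question']}\n---\n{e['answer']}\n</example>" for e in examples
    )
    return developer_prompt + "\n\n# Worked examples\n" + rendered
```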

Timestamp: [6:11-6:55]

🔄 Meta-Prompting: AI Helping AI Get Better

Garry introduces meta-prompting through the example of Tropir, a YC startup that helps companies debug and understand multi-stage AI workflows. They've developed "prompt folding" - a technique where one prompt dynamically generates better versions of itself.

The concept works by taking an existing prompt that may have failed or underperformed, feeding it to an LLM along with examples of where it went wrong, and asking the AI to improve the prompt rather than manually rewriting it. This approach leverages the fact that LLMs understand themselves surprisingly well.

"You can actually go in take the existing prompt that you have and actually feed it more examples where maybe the prompt failed or didn't quite do what you wanted and you can actually instead of you having to go and rewrite the prompt you just put it into the raw LLM and say help me make this prompt better and because it knows itself so well strangely metaprompting is turning out to be a very very powerful tool that everyone's using now." - Garry Tan

A practical example involves classifier prompts that generate specialized prompts based on previous queries, creating a dynamic optimization loop where the AI system continuously improves its own performance based on real-world usage patterns.
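
A minimal sketch of prompt folding under the assumption of a generic call_llm client; this is the shape of the technique, not Tropir's implementation.

```python
# Sketch of "prompt folding": ask the model to rewrite its own prompt
# using cases where it misbehaved. call_llm() is a stand-in for your model client.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider of choice")

def fold_prompt(current_prompt: str, failures: list[dict]) -> str:
    """Return an improved prompt, given examples where the current one failed."""
    failure_block = "\n\n".join(
        f"Input: {f['input']}\nBad output: {f['output']}\nWhat was wrong: {f['critique']}"
        for f in failures
    )
    meta_prompt = (
        "Here is a prompt and examples where it produced bad results.\n\n"
        f"<prompt>\n{current_prompt}\n</prompt>\n\n"
        f"<failures>\n{failure_block}\n</failures>\n\n"
        "Rewrite the prompt so these failures would not happen again. "
        "Return only the improved prompt."
    )
    return call_llm(meta_prompt)
```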

Timestamp: [6:55-8:02]

📚 Complex Tasks: Learning from Expert Examples

For particularly complex tasks, Diana discusses how companies like Jasberry use sophisticated example-based prompting. Jasberry builds automatic bug-finding tools for code, which requires the AI to identify subtle issues that even expert programmers find challenging.

Their approach involves feeding the AI numerous examples of complex bugs that only expert programmers could typically identify. For instance, detecting N+1 query problems requires understanding both database optimization and code structure patterns that are difficult to describe in prose alone.

"The way they do it is they feed a bunch of really hard examples that only expert programmers could do. Let's say if you want to find an N+1 query it's actually hard for today for even like the best LLMs to find those and the way to do those is they find parts of the code then they add those into the prompt a meta prompt that's like hey this is an example of n plus1 type of error and then that works it out." - Diana Hu

This pattern of using examples instead of trying to write detailed prose instructions works particularly well because it helps LLMs reason around complicated tasks and provides concrete steering mechanisms. The approach resembles unit testing in programming - it's like test-driven development for LLM behavior.

When tasks are too complex to parameterize exactly, showing the AI what good and bad outputs look like becomes more effective than trying to describe the nuances in natural language.
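
An illustrative sketch of packing an expert example into the prompt rather than describing the rule in prose: the N+1 snippet (Django-style ORM code) and the surrounding tags are invented, not Jasberry's actual data.

```python
# Sketch of steering with an expert example instead of a prose rule.
# The bug example below is illustrative only.

N_PLUS_ONE_EXAMPLE = """\
<example type="n_plus_one_query">
# Bad: one extra query per post to fetch the author
for post in Post.objects.all():
    print(post.author.name)

# Good: fetch the related rows in a single joined query
for post in Post.objects.select_related("author"):
    print(post.author.name)
</example>
"""

BUG_FINDER_PROMPT = (
    "You are reviewing code for performance bugs. "
    "Here is an example of an N+1 query and its fix:\n\n"
    + N_PLUS_ONE_EXAMPLE
    + "\nFlag any similar patterns in the code you are given, citing the lines involved."
)
```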

Timestamp: [8:02-9:00]

⚠️ The Hallucination Trap: When AI Tries Too Hard to Help

Tropir discovered a critical insight about LLM behavior: models are so eager to help that they'll fabricate responses rather than admit uncertainty. When asked for output in a specific format, LLMs will often generate plausible-looking responses even when they lack sufficient information.

"The model really wants to actually help you so much that if you just tell it give me back output in this particular format even if it doesn't quite have the information it needs it'll actually just tell you what it thinks you want to hear and it's literally a hallucination." - Garry Tan

The solution involves providing LLMs with explicit "escape hatches" - clear instructions to stop and ask for clarification rather than making up answers. This requires telling the AI that if it doesn't have enough information to make a determination, it should pause and request additional context rather than generating a potentially incorrect response.

This insight challenges the common approach of being overly prescriptive about output formats without considering the model's tendency to comply even when inappropriate. Building in explicit uncertainty handling becomes crucial for reliable AI systems in production environments.
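
A hedged sketch of what an escape hatch could look like in practice; the JSON fields and wording are assumptions, not any specific company's format.

```python
# Sketch of an explicit "escape hatch": let the model say it lacks information
# instead of forcing it into the requested format. Field names are illustrative.
import json

ESCAPE_HATCH_INSTRUCTION = """\
If you do not have enough information to answer, do NOT guess.
Instead respond with exactly:
{"status": "need_more_info", "question": "<what you need to know>"}
Otherwise respond with:
{"status": "ok", "answer": "<your answer>"}
"""

def handle_response(raw: str) -> str:
    reply = json.loads(raw)
    if reply["status"] == "need_more_info":
        # Route back to the user (or a human agent) instead of shipping a hallucination.
        return f"Follow-up needed: {reply['question']}"
    return reply["answer"]
```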

Timestamp: [9:06-9:41]

🔧 YC's Debug Info Innovation: AI That Reports Its Own Problems

Harj describes an inventive approach developed at YC for giving LLMs an escape hatch through structured debugging information. Instead of just asking the AI to stop when confused, they built a systematic way for the AI to report issues back to developers.

Their response format includes a dedicated "debug info" parameter where the LLM can essentially file complaints about confusing or underspecified information it receives. This creates a feedback loop where the AI actively helps developers identify problems with their prompts and workflows.

"We came up with a different way which is in the response format to give it the ability to have part of the response be essentially a complaint to you the developer that like you have given it confusing or underspecified information and it doesn't know what to do." - Jared Friedman

The system runs in production with real user data, allowing developers to review outputs and extract actionable feedback. The debug info parameter becomes a to-do list for agent developers, with the AI itself identifying specific areas that need improvement.

"It literally ends up being like a to-do list that you the agent developer has to do it's like really kind of mind-blowing stuff." - Jared Friedman

This approach transforms the AI from a passive tool into an active participant in improving the development process, creating a collaborative dynamic between human developers and AI systems.
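
A sketch in the spirit of the debug-info idea; the schema, field names, and helper below are illustrative assumptions, not YC's actual response format.

```python
# Hedged sketch of a "debug info" channel: reserve a field in the structured
# response where the model can complain about unclear or underspecified instructions.

RESPONSE_FORMAT_INSTRUCTION = """\
Respond as JSON with exactly these keys:
- "answer": your answer for the user
- "debug_info": a list of complaints about anything in your instructions or
  input that was confusing, contradictory, or underspecified (empty if none)
"""

def collect_todos(responses: list[dict]) -> list[str]:
    """Aggregate the model's complaints from production traffic into a developer to-do list."""
    return [item for r in responses for item in r.get("debug_info", [])]
```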

Timestamp: [9:41-10:47]

🎓 Getting Started: Simple Meta-Prompting for Everyone

Harj provides practical advice for hobbyists and developers interested in experimenting with meta-prompting techniques. The approach is surprisingly accessible and follows the same structural principles used by professional teams.

The simple method involves giving the AI a role as an expert prompt engineer who provides detailed critiques and improvement advice. You then feed it your existing prompt and ask for feedback and enhancement suggestions. This creates an iterative improvement loop that often yields significantly better results.

"A very simple way to get started with meta prompting is to follow the same structure of the prompt is give it a role and make the role be like you know you're a expert prompt engineer who gives really like detailed great critiques and advice on how to improve prompts and give it the prompt that you had in mind and it will spit you back a much more expanded better prompt." - Harj Taggar

The process works surprisingly well and can be repeated multiple times, with each iteration potentially improving the prompt further. This democratizes access to sophisticated prompt optimization techniques that were previously available only to teams with extensive AI expertise.
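
As a starting point, the recipe can be as small as the template below; the wording is illustrative and works with any capable model.

```python
# Minimal starter meta-prompt in the spirit described above; wording is illustrative.
CRITIQUE_PROMPT_TEMPLATE = """\
You are an expert prompt engineer who gives detailed, specific critiques.
Critique the prompt below, then rewrite it as a significantly better prompt.

<prompt>
{prompt}
</prompt>
"""
# Usage: fill in {prompt}, run it through any capable model, and repeat the
# process on the improved version for another iteration.
```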

Timestamp: [10:47-11:13]

⚡ Production Optimization: Big Models Train Small Models

Companies frequently use meta-prompting with large, powerful models to create optimized prompts that can then run efficiently on smaller, faster models. This approach balances quality with performance requirements, particularly important for applications requiring low latency.

The typical workflow involves using a large frontier model (such as Claude 3.5 or GPT-4) to perform meta-prompting and generate highly refined prompts. These optimized prompts are then deployed on smaller, faster models that can respond quickly enough for real-time applications.

This pattern is especially common among voice AI agent companies, where response latency is critical for maintaining the illusion of natural conversation. If there's too much pause before the agent responds, humans can detect that something is artificial, breaking the conversational flow.

"Sometimes for companies when they need to get responses from elements in their product a lot quicker they do the meta prompting with a bigger beefier model... and they do this meta prompting and then they have a very good working one that then they use into the distilled model... specifically sometimes for voice AI agents companies because latency is very important to get this whole turing test to pass." - Diana Hu

The result is a two-stage optimization process: use powerful models for prompt development, then deploy optimized prompts on fast models for production use. This allows companies to achieve both high quality and low latency in their AI applications.
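
A sketch of the two-stage pattern with stand-in client functions; the function names and the split between offline optimization and serving are assumptions for illustration.

```python
# Sketch of the distillation-style workflow: refine the prompt offline with a
# large model, then serve the refined prompt on a small, low-latency model.
# call_large_model / call_small_model are stand-ins for your provider's clients.

def call_large_model(prompt: str) -> str:
    raise NotImplementedError

def call_small_model(prompt: str) -> str:
    raise NotImplementedError

def optimize_offline(draft_prompt: str, failure_notes: str) -> str:
    """One-off, latency-insensitive step: let the big model rewrite the prompt."""
    return call_large_model(
        f"Improve this prompt given these observed problems:\n{failure_notes}\n\n{draft_prompt}"
    )

def serve_request(optimized_prompt: str, user_turn: str) -> str:
    """Hot path (e.g. one turn of a voice agent): the small model runs the refined prompt."""
    return call_small_model(optimized_prompt + "\n\nUser: " + user_turn)
```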

Timestamp: [11:13-12:05]

💎 Key Insights

  • Modern prompt engineering resembles programming more than natural language writing, with XML tags and structured formatting proving more effective than prose
  • Professional prompts follow a three-layer architecture: system prompts (general operations), developer prompts (customer-specific context), and user prompts (end-user input)
  • Meta-prompting allows AI systems to improve their own prompts through iterative feedback loops, with LLMs surprisingly effective at self-optimization
  • Complex tasks benefit more from expert examples than detailed written instructions, similar to test-driven development approaches
  • LLMs need explicit "escape hatches" to avoid hallucinating responses when they lack sufficient information
  • Production systems increasingly use large models to optimize prompts that then run on smaller, faster models for latency-sensitive applications
  • The biggest opportunity lies in automating the extraction and integration of worked examples from customer datasets

Timestamp: [0:00-12:05]

📚 References

Companies:

  • Parahelp - AI customer support company powering Perplexity, Replit, and Bolt
  • Tropir - YC startup helping companies debug multi-stage AI workflows
  • Jasberry - Company building automatic bug-finding tools for code
  • Perplexity - AI search company using Parahelp for customer support
  • Replit - Online coding platform using Parahelp for customer support
  • Bolt - Development platform using Parahelp for customer support
  • YC (Y Combinator) - Startup accelerator mentioned as context for examples

Technical Concepts:

  • Meta-prompting - Using AI to improve its own prompts through iterative feedback
  • Prompt folding - Technique where prompts dynamically generate better versions of themselves
  • RLHF (Reinforcement Learning from Human Feedback) - Training process that makes LLMs better at parsing XML-structured input
  • N+1 query problems - Database optimization issues that expert programmers must identify

AI Models:

  • Claude 3.5 - Large language model used for meta-prompting optimization
  • GPT-4 - Large language model used for meta-prompting optimization

Timestamp: [0:00-12:05]

📝 Managing Long Prompts: Documentation and Iteration Strategies

As prompts grow into large working documents spanning multiple pages, managing their evolution becomes critical. The panel discusses practical strategies for tracking improvements and managing complex prompt development cycles.

One effective approach involves maintaining a Google Doc to note down specific issues with outputs or areas for improvement. Rather than trying to fix everything immediately, developers can collect observations about where the AI isn't performing as expected and batch these notes for systematic improvement.

"As the prompt gets longer and longer like it becomes a large working doc... one thing I found useful is as you're using it if you just note down in a Google doc things that you're seeing just the outputs not being how you want or not ways that you can think of to improve it you can just write those in note form and then give Gemini Pro like your notes plus the original prompt and ask it to suggest a bunch of edits to the prompt to incorporate these in well and it does that quite well." - Host

This documentation-driven approach allows teams to systematically collect feedback and then leverage AI tools to suggest specific improvements rather than making ad-hoc changes that might introduce new problems.
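
A sketch of that loop: batch your observations, then ask a model to fold them into the prompt in one pass. call_llm is a stand-in for Gemini or any capable model.

```python
# Sketch of the notes-doc loop: collect observations, then ask a model to
# incorporate them into the prompt in one pass.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model client")

def revise_prompt(original_prompt: str, notes: list[str]) -> str:
    note_block = "\n".join(f"- {n}" for n in notes)
    request = (
        "Here is a prompt and notes on where its outputs fell short.\n\n"
        f"<prompt>\n{original_prompt}\n</prompt>\n\n"
        f"<notes>\n{note_block}\n</notes>\n\n"
        "Suggest concrete edits, then return the fully revised prompt."
    )
    return call_llm(request)
```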

Timestamp: [12:12-12:42]

🔍 Thinking Traces: The Hidden Debug Information

Gemini 2.5 Pro's thinking traces provide unprecedented insight into how AI models process prompts and make decisions. These traces reveal the internal reasoning process, showing exactly where prompts succeed or fail in guiding the model's behavior.

The thinking traces function as critical debug information that was previously unavailable through API access. This capability has recently been added to the API, allowing developers to integrate this debugging information directly into their development tools and workflows.

"The thinking traces are like the critical debug information to like understand like what's wrong with your prompt they just added it to the API so you can now actually like pipe that back into your developer tools and workflows." - Jared Friedman

The long context windows in Gemini Pro enable a particularly effective debugging approach where developers can process prompts example by example, watching the reasoning trace in real time to understand how to better steer the model toward desired outcomes.

Timestamp: [12:42-13:38]

🛠️ REPL-Style Debugging with Large Context Windows

Gemini Pro's extensive context window enables a REPL (Read-Eval-Print Loop) style debugging approach where developers can interactively test and refine prompts. This method involves putting a prompt alongside one example and literally watching the reasoning trace in real time.

"I think it's an underrated consequence of Gemini Pro having such long context windows is you can effectively use it like a REPL... put your prompt on like one example then literally watch the reasoning trace in real time to figure out like how you can steer it in the direction you want." - Harj Taggar

YC's software team has built specialized workbenches for debugging AI workflows, but the panel notes that sometimes direct interaction through gemini.google.com proves more effective. The platform allows developers to drag and drop JSON files directly into the interface without requiring special containers or complex setup.

This approach democratizes access to sophisticated debugging capabilities, making advanced prompt optimization techniques available even to teams without custom tooling infrastructure.
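
A sketch of that REPL-style loop; generate_with_trace is a hypothetical wrapper (the exact API call for retrieving thinking traces isn't reproduced here), and PROMPT is a placeholder for your working prompt.

```python
# Sketch of REPL-style debugging: run the prompt on one example at a time and
# read the reasoning trace before touching the prompt again.
# generate_with_trace() is a hypothetical wrapper around whichever API exposes traces.

def generate_with_trace(prompt: str, example: str) -> tuple[str, str]:
    """Return (answer, reasoning_trace). Hypothetical; adapt to your provider."""
    raise NotImplementedError

PROMPT = "<your working prompt>"
examples = ["<paste one real input here>"]

for ex in examples:
    answer, trace = generate_with_trace(PROMPT, ex)
    print("--- reasoning trace ---")
    print(trace)   # look for the step where the model drifts off course
    print("--- answer ---")
    print(answer)
    input("Tweak the prompt, then press Enter for the next example...")
```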

Timestamp: [13:27-14:02]

🏆 Evals: The True Crown Jewels of AI Companies

While prompts often get attention as the core intellectual property of AI companies, the panel reveals that evaluations (evals) represent the true crown jewels. Parahelp's willingness to open-source their prompt stems from their belief that the real value lies in their evaluation systems.

"One reason that Parahelp was willing to open source the prompt is they told me that they actually don't consider the prompts to be the crown jewels like the evals are the crown jewels because without the evals you don't know why the prompt was written the way that it was and it's very hard to improve it." - Jared Friedman

Evals provide the critical context for understanding why specific prompt decisions were made and enable systematic improvement over time. Without comprehensive evaluation systems, even the most sophisticated prompts become difficult to maintain, debug, or enhance as requirements evolve.

This insight reframes the competitive landscape for AI companies, suggesting that sustainable advantages come from evaluation capabilities rather than prompt engineering alone.
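
A deliberately tiny sketch of what an eval harness is: labeled cases, a grading rule, and a score you can track across prompt revisions. The case and grading rule below are invented; real evals encode the hard-won domain judgment discussed in this episode.

```python
# Minimal sketch of an eval harness. The case and grading rule are illustrative.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your model client")

EVAL_CASES = [
    {"input": "Customer asks for a refund outside the 30-day window.",
     "must_contain": "cannot"},   # expected behavior: politely decline
]

def run_evals(prompt: str) -> float:
    """Run the prompt over every case and return the pass rate."""
    passed = 0
    for case in EVAL_CASES:
        output = call_llm(prompt + "\n\n" + case["input"])
        if case["must_contain"].lower() in output.lower():
            passed += 1
    return passed / len(EVAL_CASES)   # track this number across prompt changes
```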

Timestamp: [14:24-14:53]

🚜 The Nebraska Principle: Deep Domain Knowledge as Competitive Moat

YC funds numerous vertical AI and SaaS companies, but the panel emphasizes that true competitive advantage comes from intimate understanding of specific industries and workflows. This requires founders to literally sit side-by-side with domain experts to understand their real-world needs.

"You can't get the eval unless you sitting literally side by side with people who are doing X Y or Z knowledge work... you need to sit next to the tractor sales regional manager and understand well you know this person cares you know this is how they get promoted this is what they care about this is that person's reward function." - Garry Tan

The process involves taking in-person interactions from places like Nebraska and codifying that knowledge into very specific evaluations. For example, understanding how a particular user wants outcomes handled when an invoice comes in and they need to decide whether to honor a tractor warranty.

"If you are out there in particular places understanding that user better than anyone else and having the software actually work for those people that's the moat." - Garry Tan

This deep domain expertise addresses concerns about AI companies being mere "wrappers" by establishing defensible competitive advantages through superior understanding of specific user needs and workflows.

Timestamp: [14:59-16:10]

🎯 The Founder Profile: Technical Excellence Meets Domain Obsession

The panel describes the core competency required of successful AI startup founders today: maniacal obsession with the details of specific user workflows combined with technical sophistication. This represents a unique intersection of skills that creates significant barriers to entry.

"That's your job as a founder of a company like this is to be really good at that thing and like maniacally obsessed with like the details of the regional tractor sales manager workflow." - Jared Friedman

The challenge lies in finding founders who are simultaneously great engineers and technologists while also understanding parts of the world that very few people understand. This creates a narrow but valuable opportunity space for those who can bridge both domains.

"The classic view is that the best founders in the world they're you know sort of really great cracked engineers and technologists and just really brilliant and then at the same time they have to understand some part of the world that very few people understand and then there's this little sliver that is you know the founder of a multi-billion dollar startup." - Garry Tan

The example of Ryan Peterson from Flexport illustrates this principle perfectly - someone who understands software development but also became the third biggest importer of medical hot tubs for an entire year, giving him unique insights into logistics and import/export workflows.

Timestamp: [16:10-17:23]

💎 Key Insights

  • Long prompts require systematic documentation and iteration strategies, with Google Docs serving as effective tracking mechanisms for improvement opportunities
  • Thinking traces in Gemini 2.5 Pro provide critical debug information that was previously unavailable, now accessible through API integration
  • Large context windows enable REPL-style debugging where developers can watch reasoning traces in real time for immediate feedback
  • Evaluations, not prompts, represent the true intellectual property and competitive advantage for AI companies
  • Sustainable competitive moats require deep domain expertise gained through direct interaction with end users in their actual work environments
  • Successful AI startup founders need the rare combination of technical excellence and obsessive understanding of specific industry workflows
  • The "weirder" the domain knowledge a technical founder possesses, the greater the startup opportunity potential

Timestamp: [12:12-17:23]

📚 References

People:

  • Eric Bacon - YC's head of data who has helped with meta-prompting and using Gemini Pro 2.5 as a REPL
  • Ryan Peterson - Founder of Flexport, example of technical founder with deep domain expertise (was third biggest importer of medical hot tubs)

Companies:

  • Flexport - Logistics company founded by Ryan Peterson, example of technical founder with unique domain knowledge
  • Parahelp - AI customer support company mentioned for their approach to prompts vs. evals as intellectual property

Technologies:

  • Gemini Pro 2.5 - Google's large language model with long context windows and thinking traces
  • Thinking traces - Debug information showing AI model reasoning process, recently added to Gemini API

Concepts:

  • REPL (Read-Eval-Print Loop) - Interactive debugging approach enabled by large context windows
  • Evals - Evaluation systems that represent the true crown jewels of AI companies
  • Nebraska Principle - The idea that competitive advantage comes from deep understanding of specific geographic/industry domains

Timestamp: [12:12-17:23]

🚀 The Forward Deployed Engineer: Palantir's Revolutionary Approach

Garry explains how the concept of Forward Deployed Engineers (FDE) originated at Palantir and why it's become the essential model for AI startup founders today. The term traces back to Palantir's core recognition that Fortune 500 companies and government agencies lacked technologists who truly understood computer science at the highest level.

Palantir's founders - Peter Thiel, Alex Karp, Stephen Cohen, Joe Lonsdale, and Nathan Gettings - identified that these organizations faced multi-billion and sometimes trillion-dollar problems but had no one in the room who could apply cutting-edge technology to solve them.

"Go into anywhere in the Fortune 500 go into any government agency in the world including the United States and nobody who understands computer science and technology at the level that you at the highest possible level would ever even be in that room." - Garry

Before AI became mainstream, these organizations were drowning in data - giant databases of people, things, and transactions - with no idea how to extract value. Palantir's insight was to deploy the world's best technologists directly into these environments to build software that could make sense of petabytes of data and find needles in haystacks.

Timestamp: [17:29-19:26]

👮 Inside the FBI: Turning File Cabinets into Software

The Forward Deployed Engineer role involved literally sitting next to FBI agents investigating domestic terrorism, observing their actual workflows in their offices. This immersive approach revealed the stark reality of how critical work was being done with primitive tools.

"How do you sit right next to them in their actual office and see what does the case coming in look like what are all the steps when you actually need to go to the federal prosecutor what are the things that they're sending is it I mean what's funny is like literally it's like word documents and Excel spreadsheets right." - Garry

Forward deployed engineers would observe these file cabinet and fax machine workflows, then convert them into clean, powerful software. The goal was ambitious: make investigation work at three-letter agencies as easy as posting a photo to Instagram.

"The classic view is that it should be as easy to actually do an investigation at a three-letter agency as going and taking a photo of your lunch on Instagram and posting it to all your friends." - Garry

This direct observation and rapid iteration approach has proven so effective that many former Palantir Forward Deployed Engineers have become some of the most successful founders in YC's current portfolio.

Timestamp: [19:33-20:36]

🔨 Engineers vs. Salespeople: A Fundamental Difference in Approach

Palantir's breakthrough came from sending engineers instead of traditional salespeople to engage with clients. While other companies deployed relationship-focused sales teams with lengthy sales cycles, Palantir revolutionized the process by putting technical builders directly in front of decision-makers.

Traditional enterprise sales involved charismatic salespeople with "hair and teeth" taking clients to steakhouses, building relationships over months or years, trying to secure seven-figure contracts through personality and promises. The timeline could stretch from 6 weeks to 5 years, and often the software would never actually work as promised.

"Instead of sending someone who's like hair and teeth and they're in there and you know let's go to the let's go to the steakhouse you know it's all like relationship and you'd have one meeting uh they would really like the salesperson and then through sheer force of personality you'd try to get them to give you a seven-figure contract... and the software would never work." - Garry

Palantir's approach was radically different: put an engineer in the room with Palantir Foundry (their core data visualization and mining suite), and instead of the next meeting being about reviewing contracts or specifications, it would be about demonstrating working software.

"Instead of the next meeting being reviewing 50 pages of you know sort of sales documentation or a contract or a spec or anything like that it's literally like 'Okay we built it.' And then you're getting like real live feedback within days." - Garry

Timestamp: [20:47-22:03]

🥊 David vs. Goliath: How Engineers Beat Enterprise Giants

The Forward Deployed Engineer model provides a blueprint for how small startups can compete against enterprise giants like Salesforce, Oracle, and Booz Allen. The key isn't trying to out-sales the big companies with their fancy offices and strong handshakes - it's showing something revolutionary.

"How does a really good engineer with a weak handshake go in there and beat them it's actually you show them something that they've never seen before and like make them feel super heard you have to be super empathetic about it like you actually have to be a great designer and product person." - Garry

Success requires engineers who can combine technical excellence with deep empathy and design thinking. The goal is to create software so powerful that when clients see something that makes them feel truly understood, they want to buy it immediately.

The strategy works because it cuts through traditional enterprise sales friction with demonstrable value. Instead of lengthy relationship-building cycles, engineers can create immediate "wow" moments that translate directly into business outcomes.

"You can just blow them away like the software is so powerful that you know the second you see something that you know makes you feel seen you want to buy it on the spot." - Garry

This approach represents the biggest opportunity for startup founders today - the ability to compete not on sales process but on actual problem-solving capability delivered through superior technology.

Timestamp: [22:09-22:47]

👥 Founders as Forward Deployed Engineers: The Non-Delegatable Advantage

The panel emphasizes that founders must personally embody the Forward Deployed Engineer role - this critical function cannot be outsourced or delegated. Technical founders need to become the ethnographer, designer, and product person all in one.

"Founders should think about themselves as being the forward deployed engineers of their own company... you definitely can't farm this out like literally the founders themselves they're technical they have to be the great product people they have to be the ethnographer they have to be the designer." - Jared Friedman

The goal is to create such a compelling demonstration in the second meeting that prospects have never seen anything like it. This requires founders to personally observe user workflows, understand pain points, and rapidly build solutions that address real needs.

"You want the person on the second meeting to see the demo you put together based on the stuff you heard and you want them to say 'Wow I've never seen anything like that.' And take my money." - Garry Tan

This hands-on approach ensures that founders maintain direct connection to user needs and can iterate quickly based on real-world feedback. It's the difference between building what you think users want versus building what they actually need based on firsthand observation.

The Forward Deployed Engineer mindset transforms founders from distant product managers into embedded problem-solvers who understand their users' worlds better than anyone else.

Timestamp: [22:47-23:12]

💎 Key Insights

  • Forward Deployed Engineers originated at Palantir to bridge the gap between world-class technologists and organizations with trillion-dollar problems
  • The most critical work in major institutions often relies on primitive tools like Word documents and Excel spreadsheets, creating massive optimization opportunities
  • Sending engineers instead of traditional salespeople fundamentally changes the sales process from relationship-building to value demonstration
  • Small startups can beat enterprise giants by showing revolutionary capabilities rather than competing on traditional sales processes
  • Founders must personally serve as their company's Forward Deployed Engineers - this role cannot be delegated or outsourced
  • Success requires combining technical excellence with empathy, design thinking, and ethnographic observation skills
  • The goal is creating immediate "wow" moments where prospects say "take my money" after seeing demonstrations built from direct user observation

Timestamp: [17:29-23:12]

📚 References

People:

  • Peter Thiel - Co-founder of Palantir
  • Alex Karp - Co-founder of Palantir
  • Stephen Cohen - Co-founder of Palantir
  • Joe Lonsdale - Co-founder of Palantir
  • Nathan Gettings - Co-founder of Palantir

Companies:

  • Palantir - Data analytics company that pioneered the Forward Deployed Engineer model
  • Salesforce - Enterprise software company mentioned as competition
  • Oracle - Enterprise database company mentioned as competition
  • Booz Allen - Consulting firm mentioned as competition
  • Meta (formerly Facebook) - Social media company referenced for comparison
  • Google - Technology company referenced for comparison

Technologies/Products:

  • Palantir Foundry - Palantir's core data visualization and data mining suite

Concepts:

  • Forward Deployed Engineer (FDE) - Palantir's model of embedding engineers directly with clients
  • Data mining - What machine learning was called before AI became mainstream

Timestamp: [17:29-23:12]

🤖 Vertical AI Agents: The FDE Model Accelerated

Vertical AI agents are successfully leveraging the Forward Deployed Engineer model to close unprecedented deals with large enterprises. The combination of FDE methodology with AI capabilities creates a powerful acceleration effect that's transforming enterprise sales cycles.

These companies can meet with end buyers and champions at big enterprises, capture that context, and immediately integrate it into their prompts. What previously required teams of engineers and longer development cycles can now be accomplished by just two founders working rapidly.

"This is why we're seeing a lot of the vertical AI agents take off is precisely this because they can have these meetings with the end buyer and champion at these big enterprises they take that context and then they stuff it basically in the prompt and then they can quickly come back in a meeting like just the next day." - Diana Hu

The speed advantage is remarkable - while Palantir might have taken longer with a full engineering team, AI-enabled startups can iterate overnight and return with working demonstrations. This has enabled them to close six and seven-figure deals with large enterprises, something that was previously impossible for small teams.

This model represents a fundamental shift in how enterprise software can be built and sold, with AI serving as a force multiplier for the Forward Deployed Engineer approach.

Timestamp: [23:19-24:03]

🎤 Giga ML: Engineering Excellence Meets Forward Deployment

Giga ML exemplifies how talented software engineers can succeed by forcing themselves into the Forward Deployed Engineer role, even when they're not natural salespeople. The company specializes in customer support, particularly voice support, and has closed significant deals through technical demonstration rather than traditional sales approaches.

The founders physically go on-site following the Palantir model - after closing deals, they sit with customer support teams to continuously tune and optimize their LLM performance. However, their real innovation comes in the demo phase, where they win deals through superior technical capabilities.

"They force themselves to be essentially forward deployed engineers and they closed a huge deal with Zepto and then a couple of other companies they can't announce yet... once they close the deal they go on site and they sit there with all the customer support people and figuring out how to keep tuning and getting the software or the LLM to work even better." - Harj Taggar

Their competitive advantage lies in RAG pipeline innovations that enable voice responses to be both accurate and extremely low latency - a technically challenging combination that creates impressive demonstrations. This technical differentiation allows them to win against incumbents in ways that weren't possible before LLMs.

Timestamp: [24:03-25:04]

⚡ The Demo Differentiation Advantage

The current LLM landscape has created unprecedented opportunities for technical differentiation in the demo phase of enterprise sales. Previously, it was nearly impossible to beat established players like Salesforce with incremental improvements to CRM interfaces or user experience.

"In the like pre sort of the current LLM rise you couldn't necessarily differentiate enough in the demo phase of sales to beat out incumbent so you can really beat Salesforce by having a slightly better CRM with a better UI." - Harj Taggar

Now, because AI technology evolves rapidly and achieving the last 5-10% of performance is extremely difficult, Forward Deployed Engineers can create dramatic competitive advantages. The process involves meeting with prospects, rapidly tweaking systems for their specific needs, and returning with demonstrations that create "wow" moments.

"Now because the technology evolves so fast and it's so hard to get this like last five 10 five to 10% correct you can actually if you're a forward deployed engineer go in do the first meeting tweak it so that it works really well for that customer go back with the demo and just get that oh wow like we've not seen anyone else pull this off before experience and close huge deals." - Harj Taggar

This represents a fundamental shift where technical excellence in AI implementation can directly translate to sales success, bypassing traditional enterprise sales processes through superior product demonstration.

Timestamp: [25:04-25:35]

📞 Happy Robot: Seven-Figure Success in Logistics

Happy Robot demonstrates the scalability of the Forward Deployed Engineer model with AI voice agents in the logistics industry. They've achieved remarkable success by selling seven-figure contracts to the top three largest logistics brokers in the world, showcasing how quickly this approach can scale to major enterprise deals.

The company builds AI voice agents specifically for logistics brokers and follows the FDE model by engaging directly with CIOs and decision-makers. Their success stems from rapid product iteration and extremely quick turnaround times based on direct customer feedback.

"Happy Robot who has sold seven figure contracts to the top three largest logistic brokers in the world they build AI voice agents for that they are the ones doing the forward deploy engineer model and talking to like the CIOs of these companies and quickly shipping a lot of product like very very quick turnaround." - Diana Hu

Their trajectory demonstrates the acceleration possible with this model - starting with six-figure deals and progressing to seven-figure contracts within just a couple of months. This rapid scaling illustrates how sophisticated prompt engineering combined with the FDE approach can compress traditional enterprise sales timelines dramatically.

"It started from six figure deals now doing closing and seven figure deals which is crazy this is just a couple months after so that's the kind of stuff that you can do with uh I mean unbelievably very very smart prompt engineering actually." - Diana Hu

Timestamp: [25:35-26:11]

🎭 The Personalities of Different LLMs

Each large language model exhibits distinct personalities and characteristics that make them suitable for different types of tasks and interactions. Understanding these personalities has become crucial for founders who need to select the right model for specific use cases.

Claude is recognized as the more approachable and human-steerable model, making it easier to work with for applications requiring nuanced human-like interactions. Its personality lends itself well to tasks where empathy and natural communication are important.

"Claude is sort of the more happy and more human steerable model." - Diana Hu

In contrast, Llama models require significantly more steering and feel more like interacting with a developer. This could be an artifact of having less Reinforcement Learning from Human Feedback (RLHF) training, making them more challenging to work with but potentially more powerful for users skilled in advanced prompting techniques.

"Llama 4 is one that needs a lot more steering it's almost like talking to a developer and part of it could be an artifact of not having done as much RLHF on top of it so is a bit more rough to work with but you could actually steer it very well if you actually are good at actually doing a lot of prompting and almost doing a bit more RLHF but it's a bit harder to work with." - Diana Hu

This personality understanding helps founders choose the right model for their specific applications and user interactions, similar to selecting different team members for different types of projects.

Timestamp: [26:11-27:06]

📊 LLMs for Investment Decision Scoring

YC has begun using LLMs internally to help founders evaluate potential investors using structured scoring rubrics. This represents a practical application of AI for high-stakes decision-making where clear, quantifiable guidance is essential.

The system uses a straightforward 0-100 scoring rubric where 0 represents "never ever take their money" and 100 means "take their money right away - they help you so much that you'd be crazy not to take their money." This graduated scale helps founders make critical funding decisions with more objective analysis.

"Sometimes you need a very straightforward rubric a zero to 100 zero being never ever take their money and 100 being take their money right away like they actually help you so much that you'd be crazy not to take their money." - Garry Tan

The approach demonstrates how LLMs can be applied to complex business decisions that traditionally relied on intuition and experience. By systematizing investor evaluation, YC can help founders make more informed decisions about whose money to accept - a choice that can significantly impact startup trajectories.

This application showcases the versatility of LLMs beyond customer-facing applications, extending into internal business operations and strategic decision-making processes.

Timestamp: [27:06-27:23]

💎 Key Insights

  • Vertical AI agents can compress enterprise sales cycles from months to days by integrating customer context directly into prompts
  • The combination of Forward Deployed Engineer methodology with AI creates unprecedented acceleration in enterprise deal-making
  • Technical differentiation in AI demos can now overcome traditional enterprise sales advantages that incumbents previously held
  • Engineers without natural sales skills can succeed by forcing themselves into Forward Deployed Engineer roles and leading with technical excellence
  • Different LLMs have distinct personalities requiring different interaction approaches - Claude is more human-steerable while Llama needs more technical steering
  • Startups are achieving six to seven-figure enterprise deals within months using FDE+AI methodology
  • LLMs can be effectively applied to internal business decisions like investor evaluation using structured scoring rubrics

Timestamp: [23:19-27:23]

📚 References

Companies:

  • Giga ML - Customer support company specializing in voice support that closed deals with Zepto using FDE model
  • Zepto - Company that became a customer of Giga ML
  • Happy Robot - AI voice agent company that sold seven-figure contracts to top logistics brokers
  • Salesforce - Enterprise CRM company mentioned as incumbent competition

AI Models:

  • Claude - LLM described as more happy and human-steerable
  • Llama - LLM that requires more steering and feels like talking to a developer

Technical Concepts:

  • RAG pipeline - Retrieval Augmented Generation pipeline that Giga ML innovated for voice responses
  • RLHF (Reinforcement Learning from Human Feedback) - Training process that affects LLM personality and steerability

Industries:

  • Logistics brokers - Industry where Happy Robot achieved seven-figure contracts with top three largest companies
  • Customer support - Industry focus for Giga ML's voice support solutions

Timestamp: [23:19-27:23]

📊 Rubrics: The Foundation for Numerical Scoring

Best practices for LLM prompting include providing detailed rubrics, especially when seeking numerical outputs. Rubrics help models understand how to evaluate and differentiate between score levels, such as distinguishing between an 80 versus a 90 rating.

However, rubrics are inherently imperfect tools with inevitable exceptions and edge cases. The key insight is that different models handle these limitations in dramatically different ways, revealing distinct approaches to following guidelines versus exercising judgment.

"It's certainly best practice to give LLMs rubrics especially if you want to get a numerical score as the output you want to give it a rubric to help it understand like how should I think through and what's like a 80 versus a 90 but these rubrics are never perfect there's often always exceptions." - Harj Taggar

This fundamental tension between structured guidance and flexible judgment becomes particularly important when evaluating complex, nuanced scenarios where strict adherence to rules might miss important contextual factors.

The effectiveness of rubric-based prompting depends heavily on understanding how different models interpret and apply scoring frameworks.
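
A sketch of a 0-100 rubric prompt in the spirit of the investor-scoring example from earlier in the episode; the score bands and wording are invented for illustration.

```python
# Illustrative 0-100 rubric prompt; the band descriptions are invented.
RUBRIC_PROMPT = """\
Score this investor from 0 to 100 using the rubric, then give one sentence of reasoning.

Rubric:
- 0-20: never take their money (ghosts founders, adverse terms)
- 21-60: acceptable but unremarkable (slow, little help after investing)
- 61-90: strong (responsive, clear process, real portfolio support)
- 91-100: take their money right away (immaculate process, outsized help)

If the notes below do not contain enough signal to score confidently, say so
instead of guessing.

Notes:
{notes}
"""
```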

Timestamp: [27:29-27:53]

⚖️ Model Personalities: Soldier vs. High-Agency Employee

Comparing O3 and Gemini 2.5 Pro reveals fascinating differences in how models approach rubric interpretation. These differences reflect distinct "personalities" that affect their suitability for different types of evaluation tasks.

O3 demonstrates rigid adherence to provided rubrics, functioning like a disciplined soldier who follows instructions precisely. It heavily penalizes anything that doesn't fit the established criteria, prioritizing consistency and strict rule-following over contextual interpretation.

"O3 was very rigid actually like it really sticks to the rubric it's heavily penalizes for anything that doesn't fit like the rubric that you've given it." - Harj Taggar

In contrast, Gemini 2.5 Pro exhibits flexibility and judgment, behaving more like a high-agency employee. While it applies the rubric as guidance, it can reason through exceptions and adjust scores based on contextual factors that the rubric might not fully capture.

"Gemini 2.5 Pro was actually quite good at being flexible in that it would apply the rubric but it could also sort of almost reason through why someone might be like an exception or why you might want to push something up more positively or negatively than the rubric might suggest." - Harj Taggar

"O3 felt a little bit more like the soldier sort of like okay I'm definitely like check check check check check and Gemini Pro 2.5 felt a little bit more like a high agency sort of employee was like 'Oh okay I think this makes sense but this might be an exception in this case.'" - Harj Taggar

Timestamp: [27:53-28:57]

💰 Real-World Investor Evaluation Examples

The investor scoring system reveals how LLM personality differences play out in practical applications. Some investors clearly merit immediate acceptance based on their exceptional processes and track records.

Top-tier firms like Benchmark and Thrive represent the high end of the scale - investors whose processes are so polished that founders should "take their money right away." These investors never ghost potential portfolio companies, respond to emails faster than most founders, and maintain consistently impressive operational standards.

"Sometimes you have investors like a Benchmark or a Thrive it's like 'Yeah take their money right away their process is immaculate they never ghost anyone they answer their emails faster than most founders it's you know very impressive.'" - Garry Tan

However, many situations involve nuanced judgment calls where exceptional investors might have operational weaknesses. Some investors have outstanding track records and genuinely care about their portfolio companies but struggle with time management, leading to slow responses and accidental ghosting despite good intentions.

"There are plenty of investors who are just overwhelmed and maybe they're just not that good at managing their time and so they might be really great investors and their track record bears that out but they're sort of slow to get back they seem overwhelmed all the time they accidentally probably not intentionally ghost people." - Garry Tan

These scenarios demonstrate exactly why LLMs are valuable - they can process complex, contradictory signals and provide nuanced scores (like 91 instead of 89) based on comprehensive evaluation rather than simple rule application.

Timestamp: [29:03-29:49]

🧠 The Art of Communication: Managing AI Like People

The hosts reflect on prompt engineering as fundamentally similar to managing and communicating with people. The process requires clearly conveying information needed for good decision-making and establishing transparent evaluation criteria.

"It kind of actually feels like coding in you know 1995 like the tools are not all the way there there's a lot of stuff that's unspecified we're you know in this new frontier but personally it also kind of feels like learning how to manage a person where it's like how do I actually communicate uh you know the things that they need to know in order to make a good decision." - Garry Tan

This management analogy extends to ensuring AI systems understand how they'll be evaluated and scored, similar to setting clear expectations for human employees. The communication challenge involves translating complex requirements into clear, actionable guidance that enables consistent performance.

The parallel between AI prompting and people management suggests that many traditional management and communication skills translate directly to working with AI systems, making this a more intuitive process than purely technical approaches might suggest.

Timestamp: [30:15-30:37]

🏭 Kaizen: The Meta-Prompting Manufacturing Philosophy

The podcast concludes with a powerful analogy connecting meta-prompting to Kaizen, the Japanese manufacturing philosophy that revolutionized car production in the 1990s. This principle holds that the people actually doing the work are best positioned to improve the process.

"There's this aspect of Kaizen you know this manufacturing technique that created really really good cars for Japan in the '90s and that principle actually says that the people who are the absolute best at improving the process are the people actually doing it that's literally why Japanese cars got so good in the '90s and that's metaprompting to me." - Garry Tan

Just as Japanese automakers achieved superior quality by empowering front-line workers to continuously refine manufacturing processes, meta-prompting enables AI systems to improve their own performance through iterative refinement. The practitioners using the prompts daily are positioned to identify improvements and optimize performance.

This philosophical framework positions meta-prompting not as a technical hack but as a systematic approach to continuous improvement that mirrors proven manufacturing excellence methodologies. It suggests that the future of AI development lies in creating systems that can self-improve through structured feedback loops.

Timestamp: [30:37-30:58]

🎬 Conclusion: A Brave New World

The hosts wrap up by acknowledging they're operating in uncharted territory - a "brave new world" where the tools and best practices are still emerging. Despite being early in this frontier, the principles and techniques discussed provide a foundation for the rapidly evolving field of prompt engineering.

The episode concludes with encouragement for listeners to experiment with these concepts and develop their own prompting innovations, recognizing that the community is collectively building the knowledge base for this new discipline.

"I don't know it's a brave new world we're sort of in this new moment so with that we're out of time but can't wait to see what kind of prompts you guys come up with and we'll see you next time." - Garry Tan

Timestamp: [30:58-31:04]

💎 Key Insights

  • Rubrics are essential for numerical scoring but must account for exceptions and edge cases that strict rule-following might miss
  • Different AI models exhibit distinct personalities: O3 acts like a rigid soldier while Gemini 2.5 Pro behaves like a high-agency employee with judgment
  • Real-world evaluation scenarios often require nuanced scoring that balances multiple contradictory factors
  • Prompt engineering resembles people management more than pure programming, requiring clear communication and expectation-setting
  • Meta-prompting follows Kaizen principles where practitioners are best positioned to improve the processes they use daily
  • The field is still in its early stages, comparable to coding in 1995, with tools and best practices rapidly evolving
  • Success requires experimentation and community knowledge-building as the discipline continues to mature

Timestamp: [27:29-31:04]

📚 References

AI Models:

  • O3 - OpenAI model that demonstrates rigid adherence to rubrics, described as soldier-like
  • Gemini 2.5 Pro - Google model that shows flexibility and judgment in applying rubrics

Investment Firms:

  • Benchmark - Venture capital firm cited as example of top-tier investor with impeccable process
  • Thrive - Investment firm mentioned alongside Benchmark as having exceptional operational standards

Business Concepts:

  • Kaizen - Japanese manufacturing philosophy emphasizing continuous improvement by front-line workers
  • Rubrics - Scoring frameworks used to guide LLM evaluation and decision-making

Historical Context:

  • Japanese car manufacturing in the 1990s - Example of how Kaizen principles led to superior automotive quality

Timestamp: [27:29-31:04]