Your AI Application Needs Evals: Evaluation-driven development in the era of prompts

Presentation by Uras Mutlu

This talk introduces a crucial but often overlooked aspect of AI application development: evaluation-driven development (EDD). Using a simple LangGraph agent as a practical example, we'll demonstrate why and how to build a robust evaluation framework that goes beyond simple unit tests. We'll explore the importance of continuous evaluation during the development cycle and how this practice directly translates to the need for comprehensive observability in production, ensuring your AI application remains accurate, reliable, and effective in the real world.
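To make the contrast with unit tests concrete, here is a minimal sketch of an evaluation loop in TypeScript. It is not taken from the talk and does not use LangGraph: the names `EvalCase`, `Agent`, `scoreAnswer`, `runEvals`, and `stubAgent`, the keyword-based scoring, and the 0.8 threshold are all assumptions made for illustration. The agent is a synchronous stub so the example stays self-contained; a real agent call would be asynchronous and non-deterministic, which is why the harness reports an aggregate score over a dataset rather than a single pass/fail assertion.

```typescript
// Hypothetical types and names for illustration; none come from the talk or from LangGraph.
type EvalCase = { input: string; mustInclude: string[] };
type Agent = (input: string) => string;

// Score one answer as the fraction of required phrases it contains (case-insensitive).
function scoreAnswer(answer: string, mustInclude: string[]): number {
  if (mustInclude.length === 0) return 1;
  const lower = answer.toLowerCase();
  const hits = mustInclude.filter((p) => lower.includes(p.toLowerCase())).length;
  return hits / mustInclude.length;
}

// Run every case through the agent and return the mean score across the dataset.
function runEvals(agent: Agent, cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce(
    (sum, c) => sum + scoreAnswer(agent(c.input), c.mustInclude),
    0,
  );
  return total / cases.length;
}

// Stand-in for a real agent call (assumption: a real one would call a model and be async).
const stubAgent: Agent = (input) =>
  input.includes("refund")
    ? "Refunds are processed within 5 days."
    : "I am not sure.";

// A tiny hand-written dataset; real eval sets are larger and curated from actual usage.
const dataset: EvalCase[] = [
  { input: "How long does a refund take?", mustInclude: ["refund", "5 days"] },
  { input: "What is your shipping policy?", mustInclude: ["shipping"] },
];

const THRESHOLD = 0.8; // assumed pass bar, chosen arbitrarily for this sketch
const meanScore = runEvals(stubAgent, dataset);
console.log(`mean score ${meanScore.toFixed(2)}, pass: ${meanScore >= THRESHOLD}`);
```

Tracking a score like this across prompt or model changes during development, and computing similar scores over sampled production traffic, is one simple way to connect development-time evaluation with production observability.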

Presented at these Events

Tue, Sep 30th · 5PM · In-Person

JSMonthly September Meetup

JavaScript Monthly London Meetup

Presented with these Guilds

JavaScript Monthly London Meetup

JavaScript's evolution has sped up (a lot) in recent years, and even the most veteran developers find it hard to keep up with the latest trends. This meetup group aims to bring you monthly bite-sized updates on the world of JavaScript along with a healthy dose of nice people, beer and pizza.

Please use your full name when registering, as some of our venues require a full list of attendees beforehand. Have an idea and want to be a speaker?

We are always looking for more speakers - submit your talk here (https://docs.google.com/forms/d/e/1FAIpQLSdFaatfveOUbrmer47jYb5J4J4ttxAFc1CgTjUDltBXmDOJmg/viewform)

1.2K Members

Similar Presentations

Intro to Building and Observing LangGraph Agents

AI agents are everywhere, but how do you go from demo to dependable? In this talk, we’ll build agents using LangGraph, and take it beyond toy examples into enterprise-grade readiness.

We’ll start with context engineering concepts such as control flow, prompt engineering, prompt management and tool calling. From there, we’ll build an agent step-by-step, enabling contextual reasoning via Retrieval-Augmented Generation (RAG), and integrating tool use for dynamic task execution. We'll also cover agent observability with open source tools such as LangFuse to bring full visibility into the decisions our agents make.

Whether you're exploring AI agents or planning your first production deployment, you'll leave this session with the patterns, tools, and clarity to do it right.

Carl Lapierre

Taming your LLM Application

It's pretty common to evaluate new changes with a vibe check but that makes it difficult to really know what's up with your system. We'll talk a bit about how to build more complex apps.

Ivan Leo

Agentic retrieval in practice - Multi stage agentic RAG and bringing down LangChain

Evals are the difference between building a demo and a product you can actually use. This talk is a little story about building a new class of product at StackOne: a front door for AI agents into the HR tech ecosystem. I'll talk about building workflows, agentic RAG and how to make sure it doesn't all break all the time.

Matt Carey

Make Agents Talk to You

Agentic React apps, where UIs autonomously act, chain tool calls, and evolve state, bring complexity that traditional observability tools can't unravel. How do you actually understand what your agents are doing in production?

In this talk, we'll explore agent observability: capturing not just logs, metrics, and traces, but the reasoning, tool usage, and decision paths behind agent behavior. You'll learn how to bring visibility into agent workflows, how OpenTelemetry semantic conventions help standardize telemetry, and how modern tools let your agents "talk back," so you can debug, understand, and trust what they're doing.

By the end, you will take away a practical workflow to instrument, trace, and interrogate agents—so your apps aren't operating in the dark.

Sergiy Dybskiy
