
Join experts from RapDev to explore best practices for instrumenting and monitoring production LLM systems in Datadog, including OpenAI, Anthropic, Amazon Bedrock, and Azure OpenAI integrations.

Tuesday, Mar 31, 2026
12:00 pm – 1:00 pm ET
Online (Zoom), Webinar

Deploying LLM-powered applications but struggling to manage cost, latency, and unpredictable model behavior? What if you could bring structure and full observability to your AI workloads from day one?

This webinar explores the new observability challenges introduced by large language models, including token-based cost variability, latency fluctuations, prompt and response quality concerns, and downstream service dependencies. You’ll walk away with actionable guidance to ensure your LLM workloads are observable, governed, and production-ready.

In this session, we’ll dive into:

  • Establishing token usage conventions and cost tracking across LLM providers, along with safe prompt logging, model drift detection, and restricted access to sensitive data
  • Defining SLIs such as latency, error rate, cost per request, and hallucination proxy metrics
  • Correlating LLM prompts with APM traces to measure their impact on application performance (see the first sketch after this list)
  • Instrumenting the OpenAI, Anthropic, Bedrock, and Azure OpenAI integrations in Datadog
  • Monitoring retries, timeouts, fallback models, and anomalous usage spikes (see the second sketch below)
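
To give a taste of what that trace correlation looks like in practice, here is a minimal sketch, assuming the ddtrace and openai Python packages: it wraps a chat completion in a Datadog APM span and records the model and token counts on that span. The span name, service name, tag keys, and model are illustrative placeholders rather than prescribed conventions; the session also walks through the OpenAI, Anthropic, Bedrock, and Azure OpenAI integrations that Datadog provides.

    # Minimal sketch (not the session's exact code): correlate one OpenAI call
    # with a Datadog APM trace using the ddtrace and openai Python packages.
    # The span/service names, tag keys, and model are illustrative placeholders.
    from ddtrace import tracer
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def answer(question: str) -> str:
        # The span joins whatever trace is already active (e.g. the web request),
        # so LLM latency and errors are attributed to the calling service.
        with tracer.trace("llm.chat_completion", service="support-bot",
                          resource="gpt-4o-mini", span_type="llm") as span:
            span.set_tag("llm.provider", "openai")
            span.set_tag("llm.model", "gpt-4o-mini")

            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": question}],
            )

            # Token counts recorded as span metrics enable per-trace cost analysis.
            span.set_metric("llm.tokens.prompt", response.usage.prompt_tokens)
            span.set_metric("llm.tokens.completion", response.usage.completion_tokens)
            span.set_metric("llm.tokens.total", response.usage.total_tokens)

            return response.choices[0].message.content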
Don't miss the expert session.
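
For the SLI and fallback-monitoring items above, a second sketch shows how the raw numbers might be emitted as custom metrics over DogStatsD; the metric names, tags, and per-1K-token prices are placeholder assumptions, not Datadog or RapDev conventions.

    # Minimal sketch (illustrative only): emit cost-per-request, latency, token,
    # and fallback metrics through DogStatsD using the `datadog` Python package.
    # Metric names, tags, and the per-1K-token prices are placeholder assumptions.
    from datadog import initialize, statsd

    initialize(statsd_host="localhost", statsd_port=8125)

    # Placeholder USD prices per 1K tokens; substitute your provider's real rates.
    PRICE_PER_1K = {"gpt-4o-mini": {"prompt": 0.00015, "completion": 0.0006}}


    def record_llm_request(model, prompt_tokens, completion_tokens,
                           latency_ms, used_fallback):
        tags = [f"model:{model}", "app:support-bot"]
        price = PRICE_PER_1K.get(model, {"prompt": 0.0, "completion": 0.0})
        cost_usd = (prompt_tokens * price["prompt"]
                    + completion_tokens * price["completion"]) / 1000.0

        # Core SLIs, all sliceable by model and application tag.
        statsd.distribution("llm.request.latency_ms", latency_ms, tags=tags)
        statsd.distribution("llm.request.cost_usd", cost_usd, tags=tags)
        statsd.distribution("llm.tokens.total",
                            prompt_tokens + completion_tokens, tags=tags)

        # Counting fallbacks lets a monitor alert on an unusual fallback rate.
        if used_fallback:
            statsd.increment("llm.request.fallback", tags=tags)

With metrics like these in place, monitors for cost spikes, latency regressions, or a rising fallback rate follow the usual Datadog monitor workflow.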

SPEAKERS
Alex Glenn
Senior Datadog Engineer
RapDev

Join our session for insights into some lessons we've learned along the way.
