
Join experts from RapDev to explore best practices for instrumenting and monitoring production LLM systems in Datadog, including OpenAI, Anthropic, Amazon Bedrock, and Azure OpenAI integrations.

Tuesday, Mar 31, 2026
12:00 pm – 1:00 pm ET
Online (Zoom), Webinar

Deploying LLM-powered applications but struggling to manage cost, latency, and unpredictable model behavior? What if you could bring structure and full observability to your AI workloads from day one?

This webinar explores the new observability challenges introduced by large language models, including token-based cost variability, latency fluctuations, prompt and response quality concerns, and downstream service dependencies. You’ll walk away with actionable guidance to ensure your LLM workloads are observable, governed, and production-ready.

In this session, we’ll dive into:

  • Establishing token usage conventions and cost tracking across LLM providers, along with safe prompt logging, model drift detection, and restricted access to sensitive data
  • Defining SLIs such as latency, error rate, cost per request, and hallucination proxy metrics
  • Correlating LLM prompts with APM traces to measure their impact on application performance (see the first sketch after this list)
  • Instrumenting the OpenAI, Anthropic, Bedrock, and Azure OpenAI integrations in Datadog
  • Monitoring retries, timeouts, fallback models, and anomalous usage spikes (see the second sketch below)
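
To give a taste of what that trace correlation looks like in practice, here is a minimal sketch, assuming the ddtrace and openai Python packages: it wraps a chat completion in a Datadog APM span and records the model and token counts on that span. The span name, service name, tag keys, and model are illustrative placeholders rather than prescribed conventions; the session also walks through the OpenAI, Anthropic, Bedrock, and Azure OpenAI integrations that Datadog provides.

    # Minimal sketch (not the session's exact code): correlate one OpenAI call
    # with a Datadog APM trace using the ddtrace and openai Python packages.
    # The span/service names, tag keys, and model are illustrative placeholders.
    from ddtrace import tracer
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment


    def answer(question: str) -> str:
        # The span joins whatever trace is already active (e.g. the web request),
        # so LLM latency and errors are attributed to the calling service.
        with tracer.trace("llm.chat_completion", service="support-bot",
                          resource="gpt-4o-mini", span_type="llm") as span:
            span.set_tag("llm.provider", "openai")
            span.set_tag("llm.model", "gpt-4o-mini")

            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": question}],
            )

            # Token counts recorded as span metrics enable per-trace cost analysis.
            span.set_metric("llm.tokens.prompt", response.usage.prompt_tokens)
            span.set_metric("llm.tokens.completion", response.usage.completion_tokens)
            span.set_metric("llm.tokens.total", response.usage.total_tokens)

            return response.choices[0].message.content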
Don't miss the expert session.
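
For the SLI and fallback-monitoring items above, a second sketch shows how the raw numbers might be emitted as custom metrics over DogStatsD; the metric names, tags, and per-1K-token prices are placeholder assumptions, not Datadog or RapDev conventions.

    # Minimal sketch (illustrative only): emit cost-per-request, latency, token,
    # and fallback metrics through DogStatsD using the `datadog` Python package.
    # Metric names, tags, and the per-1K-token prices are placeholder assumptions.
    from datadog import initialize, statsd

    initialize(statsd_host="localhost", statsd_port=8125)

    # Placeholder USD prices per 1K tokens; substitute your provider's real rates.
    PRICE_PER_1K = {"gpt-4o-mini": {"prompt": 0.00015, "completion": 0.0006}}


    def record_llm_request(model, prompt_tokens, completion_tokens,
                           latency_ms, used_fallback):
        tags = [f"model:{model}", "app:support-bot"]
        price = PRICE_PER_1K.get(model, {"prompt": 0.0, "completion": 0.0})
        cost_usd = (prompt_tokens * price["prompt"]
                    + completion_tokens * price["completion"]) / 1000.0

        # Core SLIs, all sliceable by model and application tag.
        statsd.distribution("llm.request.latency_ms", latency_ms, tags=tags)
        statsd.distribution("llm.request.cost_usd", cost_usd, tags=tags)
        statsd.distribution("llm.tokens.total",
                            prompt_tokens + completion_tokens, tags=tags)

        # Counting fallbacks lets a monitor alert on an unusual fallback rate.
        if used_fallback:
            statsd.increment("llm.request.fallback", tags=tags)

With metrics like these in place, monitors for cost spikes, latency regressions, or a rising fallback rate follow the usual Datadog monitor workflow.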

SPEAKERS
Alex Glenn
Senior Datadog Engineer
RapDev

Join our session for insights into some lessons we've learned along the way.
