Building and Scaling GenAI Workloads with Amazon EKS

This event has limited capacity. Register today to secure your spot.

Hands-on workshop, Level 400

Upcoming Sessions

  • English, Online: 5 seats left
  • English, Online: 9 seats left
  • Spanish, Online: 7 seats left

About the event

A hands-on workshop where you'll deploy a real large language model on Amazon EKS with NVIDIA GPUs, then push it to the performance ceiling using the same patterns AWS customers run in production today.

Most teams can stand up a model. Far fewer know how to serve it efficiently, prove it under load, or scale it to many users without GPU costs spiraling. By the end of this workshop, you will.

What you'll build:

  • Serve a real model: Deploy an open-source Mistral LLM on EKS with GPU-accelerated nodes, behind a standard inference API your apps can plug into
  • See what's actually happening: Wire up Grafana dashboards for token throughput, latency, and GPU utilization in real time
  • Build an agent on top: Ship a working agentic app that calls tools against your self-hosted model — the pattern behind real assistants and copilots
  • Benchmark and optimize: Measure your deployment under load, find where it breaks, then apply targeted fixes and verify the gains with data
  • Cut response time up to 3.6× with KV cache offloading: Reuse cached prompt history across requests and pods, which helps most for chatbots, RAG, and any workload with repeated context
  • Scale with distributed serving: Combine vLLM with Ray to handle bursty traffic, autoscale across GPUs, and serve users through a chat UI
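
To make the KV cache reuse idea above concrete, here is a minimal toy sketch in Python. It is not the workshop's code and not how vLLM or any serving engine implements caching: the `PrefixCache` class, the whitespace "tokenizer", and the sample prompts are all hypothetical, and real engines typically cache attention state in blocks rather than per token. It only counts how many prompt tokens still need to be processed when requests share a prefix such as a system prompt.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixCache:
    """Toy stand-in for a shared KV cache keyed by token prefixes (hypothetical)."""

    seen: set = field(default_factory=set)

    def tokens_to_compute(self, tokens: list[str]) -> int:
        """Return how many tokens are NOT covered by an already cached prefix."""
        hit = 0
        # Look for the longest prefix of this request that was cached earlier.
        for i in range(len(tokens), 0, -1):
            if tuple(tokens[:i]) in self.seen:
                hit = i
                break
        # Record every prefix of this request so later requests can reuse it.
        for i in range(1, len(tokens) + 1):
            self.seen.add(tuple(tokens[:i]))
        return len(tokens) - hit


# Assumption: whitespace splitting stands in for a real tokenizer.
system_prompt = "You are a helpful assistant .".split()
requests = [
    system_prompt + "What is EKS ?".split(),
    system_prompt + "What is vLLM ?".split(),
]

cache = PrefixCache()
computed = [cache.tokens_to_compute(r) for r in requests]
total_tokens = sum(len(r) for r in requests)

print("tokens per request:", [len(r) for r in requests])
print("tokens computed with prefix reuse:", computed)
print("total without reuse:", total_tokens, "with reuse:", sum(computed))
```

In this toy, the second request only pays for the tokens that differ from the first. That is the intuition behind why workloads with repeated context benefit; actual gains depend on the model, hardware, and traffic.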

Why it matters:

A working deployment is the easy part. Production GenAI workloads fail on cost, latency, and operability — not on whether the model loads. This workshop gives you the measurement and optimization habits to defend GPU spend and hit your latency goals.

Who should attend:

ML and data engineers, platform engineers, developers, and technical founders who want production answers, not a demo. Familiarity with Kubernetes basics helps. We bring the GPUs, cluster, and code.

You'll leave with a working stack, the Terraform to redeploy it in your own account, and the playbook to keep optimizing.
