This event has limited capacity. Register today to secure your spot.
A hands-on workshop where you'll deploy a real large language model on Amazon EKS with NVIDIA GPUs, then push it to the performance ceiling using the same patterns AWS customers run in production today.
Most teams can stand up a model. Far fewer know how to serve it efficiently, prove it under load, or scale it to many users without GPU costs spiraling. By the end of this workshop, you will.
What you'll build:
Why it matters:
A working deployment is the easy part. Production GenAI workloads fail on cost, latency, and operability, not on whether the model loads. This workshop gives you the measurement and optimization habits to defend GPU spend and hit your latency goals.
Who should attend:
ML and data engineers, platform engineers, developers, and technical founders who want production answers, not a demo. Familiarity with Kubernetes basics helps. We bring the GPUs, cluster, and code.
You'll leave with a working stack, the Terraform to redeploy it in your own account, and the playbook to keep optimizing.