ECS GenAI Inference: vLLM on AWS Inferentia with Neuron #250
Description
This new example in the ECS Blueprints project demonstrates how to set up infrastructure for running GenAI inference using vLLM with AWS Neuron on Inferentia 2 instances. It creates an ECS cluster with an Auto Scaling group of inf2 instances, deploys a vLLM service to handle inference requests, and sets up an Application Load Balancer to expose the service endpoint. The solution uses pre-compiled Neuron-compatible models and is designed for scalable GenAI workloads.

The blueprint includes steps for preparing a custom Docker image with vLLM and the necessary dependencies, deploys the infrastructure with Terraform, and provides an example of how to send inference requests to the deployed service. It offers a streamlined way to leverage AWS's specialized AI hardware for efficient large language model inference within an ECS environment.
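As a rough illustration of the inference-request step described above, the sketch below builds a request against vLLM's OpenAI-compatible `/v1/completions` API. The endpoint and model name are placeholders, not values from this blueprint: substitute the ALB DNS name and the pre-compiled Neuron model configured in your deployment.

```python
import json
import urllib.request

def build_completion_request(endpoint: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for vLLM's OpenAI-compatible /v1/completions API."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 64,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{endpoint}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_completion_request(
    "http://<alb-dns-name>",  # placeholder: ALB DNS name from the Terraform outputs
    "<neuron-model-name>",    # placeholder: the pre-compiled Neuron model served by vLLM
    "What is AWS Inferentia?",
)
# To actually send it (requires the deployed service to be reachable):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

The request shape is the standard OpenAI completions payload that vLLM accepts; only the host and model name are deployment-specific.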
Motivation and Context
This change addresses a significant gap in the available examples for implementing inference workloads on ECS, particularly when compared to existing resources for EKS and EC2. Multiple AWS partners have requested comparable ECS-based solutions, especially for projects like vLLM. This example fills that need by providing a functional, ECS-specific implementation modeled after recent examples for other platforms. It ensures that ECS users have access to up-to-date, practical guidance for deploying GenAI inference workloads, bringing ECS documentation in line with resources available for other AWS compute services.
How Has This Been Tested?
Ran `pre-commit run -a` on the `examples/*` projects in my pull request - NOTE: this also applied changes to files in other examples within the project, which explains the changes made outside of my added example.