ECS GenAI Inference: vLLM on AWS Inferentia with Neuron #250

Open
wants to merge 2 commits into
base: main

Conversation

boringgeek

Description

This PR adds a new example to the ECS Blueprints project that sets up infrastructure for running GenAI inference using vLLM with AWS Neuron on Inferentia2 instances. It creates an ECS cluster backed by an Auto Scaling group of inf2 instances, deploys a vLLM service to handle inference requests, and exposes the service endpoint through an Application Load Balancer. The solution uses pre-compiled, Neuron-compatible models and is designed for scalable GenAI workloads. It includes steps for building a custom Docker image with vLLM and its dependencies and for deploying the infrastructure with Terraform, along with an example of sending inference requests to the deployed service. The blueprint offers a streamlined way to use AWS's specialized AI hardware for large language model inference within ECS.
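For reviewers who want a feel for the request flow, here is a minimal sketch of a client call through the load balancer. It assumes the service runs vLLM's OpenAI-compatible server and exposes `/v1/completions`; the example in this PR may use a different entrypoint or path. `ALB_URL` and `MODEL_ID` are hypothetical placeholders, not values taken from the blueprint.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the ALB DNS name from the Terraform
# outputs and the model identifier the vLLM service was started with.
ALB_URL = "http://my-vllm-alb.example.com"
MODEL_ID = "my-neuron-compiled-model"


def build_completion_request(base_url, model, prompt, max_tokens=64):
    """Build a POST request for an OpenAI-style /v1/completions endpoint.

    Assumption: the deployed service speaks the OpenAI-compatible API.
    """
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, max_tokens=64, timeout=60):
    """Send the request and return the text of the first completion choice."""
    req = build_completion_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Once the service is healthy behind the load balancer, calling `complete(ALB_URL, MODEL_ID, "Explain AWS Inferentia in one sentence.")` should return the generated text. Request construction is kept separate from the network call so the payload can be inspected without a live endpoint.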

Motivation and Context

This change addresses a significant gap in the available examples for implementing inference workloads on ECS, particularly when compared to existing resources for EKS and EC2. Multiple AWS partners have requested comparable ECS-based solutions, especially for projects like vLLM. This example fills that need by providing a functional, ECS-specific implementation modeled after recent examples for other platforms. It ensures that ECS users have access to up-to-date, practical guidance for deploying GenAI inference workloads, bringing ECS documentation in line with resources available for other AWS compute services.

How Has This Been Tested?

  • I have tested and validated these changes using one or more of the provided examples/* projects
  • [x] I have executed pre-commit run -a on my pull request. NOTE: this also modified files in other examples within the project, which explains the changes outside of my new example.

@boringgeek boringgeek requested a review from a team as a code owner December 28, 2024 08:41