Welcome to "From model weights to API endpoint with TensorRT-LLM," presented at the AI Engineer World's Fair!
We're your hosts, Pankaj Gupta and Philip Kiely from Baseten, and we're thrilled to have you here today.
This workshop has three live coding components, which correspond to numbered folders:
- Building a TensorRT engine manually with TensorRT-LLM
- Building an engine automatically on deployment with Truss
- Benchmarking deployed models
Specific instructions for each component are in the respective folders' READMEs.
Let's get some TPS (tokens per second)!
— Pankaj and Philip