Many customers have machine learning (ML) applications with intermittent or unpredictable traffic patterns. Selecting a compute instance with the best price-performance for deploying ML models is a complicated, iterative process that can take weeks of experimentation. Rather than provisioning for peak capacity up front, which can result in idle capacity, or building complex workflows to shut down idle instances, you can now use Amazon SageMaker Serverless Inference and Amazon SageMaker Inference Recommender. In this session, learn when to select serverless inference for deploying your ML model, and how Amazon SageMaker automatically provisions, scales, and turns off compute capacity based on the volume of inference requests. Use Amazon SageMaker Inference Recommender to load test your model and automatically select the right compute instance type, instance count, container parameters, and model optimizations for inference, maximizing performance while minimizing cost. Dive deep into these new features, now available in preview.
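Both features are driven through the standard SageMaker APIs. As a minimal sketch of serverless deployment with boto3 (the model name, endpoint names, memory size, and concurrency limit below are illustrative placeholders, not values from the session), you create an endpoint config with a ServerlessConfig instead of an instance type:

```python
# Sketch: deploy a model to a SageMaker serverless endpoint with boto3.
# All names and the memory/concurrency values are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

# Specify ServerlessConfig instead of InstanceType; SageMaker then
# provisions, scales, and turns off compute based on request volume.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",   # hypothetical name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",                 # hypothetical; must already exist
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,   # memory allocated to the endpoint
            "MaxConcurrency": 5,      # max concurrent invocations
        },
    }],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",       # hypothetical name
    EndpointConfigName="my-serverless-config",
)
```

Similarly, a default Inference Recommender job can be started against a versioned model package; the sketch below assumes a placeholder IAM role ARN and a model package already registered in the SageMaker Model Registry:

```python
# Sketch: run a default Inference Recommender load-testing job with boto3.
# JobName, RoleArn, and ModelPackageVersionArn are placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_inference_recommendations_job(
    JobName="my-recommender-job",    # hypothetical name
    JobType="Default",               # "Default" runs a standard set of load tests
    RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    InputConfig={
        "ModelPackageVersionArn":
            "arn:aws:sagemaker:us-east-1:123456789012:model-package/my-pkg/1",
    },
)

# Poll the job; once complete, the response lists recommended instance
# types, counts, and container parameters with measured latency and cost.
job = sm.describe_inference_recommendations_job(JobName="my-recommender-job")
print(job["Status"])
```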
Speaker: Kapil Pendse, Principal Solutions Architect, AWS