Pure serverless machine learning inference with AWS Lambda and Layers

6 years ago

Anonymous $L9wC17otzH

https://medium.com/merapar/pure-serverless-machine-learning-inference-with-aws-lambda-and-layers-979702d9ae49

After designing and training an ML model, the hardest part is actually running and maintaining it in production. AWS offers to host and deploy models through On-Demand ML Hosting on various instance types and sizes, packaged as the SageMaker service. These start at ml.t2.medium and go up to GPU-accelerated ml.p3.16xlarge instances.
To stay serverless in an already serverless deployment, keep costs low, and scale flexibly, pure AWS Lambda based ML model inference is desirable. Such approaches exist and have been described elsewhere, but they come with caveats, such as running behind a REST interface and an API Gateway endpoint. In a larger production deployment built on Lambda and Step Functions, the inference Lambda can instead be invoked directly as part of an event-triggered flow, without the detour through REST and API Gateway. Since much AI and ML tooling is written in Python and relies on Python libraries such as scikit-learn, a Python based Lambda runtime is chosen.
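To make the idea concrete, here is a minimal sketch of what such an event-triggered inference handler could look like. It is an illustration under stated assumptions, not the implementation from this article: the path `/opt/model.pkl`, the `features` event key, and the optional `model` argument are hypothetical names. It also assumes the model was serialized with `pickle`, exposes a scikit-learn style `predict` method, and returns numeric predictions. The one AWS-specific fact relied on is that Lambda extracts Layer contents under `/opt`.

```python
import pickle

# Hypothetical location of a pickled model shipped in a Lambda Layer.
# Lambda extracts Layer contents under /opt; the file name is an assumption.
MODEL_PATH = "/opt/model.pkl"

# Module-level cache: survives across warm invocations of the same container,
# so the model is deserialized only on a cold start.
_MODEL = None


def load_model(path=MODEL_PATH):
    """Load the pickled model once and reuse it on later invocations."""
    global _MODEL
    if _MODEL is None:
        with open(path, "rb") as f:
            _MODEL = pickle.load(f)
    return _MODEL


def handler(event, context, model=None):
    """Lambda entry point, invoked directly with an event (no API Gateway).

    `event["features"]` is an assumed payload shape: one flat list of numbers.
    The optional `model` argument exists only to make local testing easy;
    Lambda itself calls handler(event, context).
    """
    if model is None:
        model = load_model()
    features = event["features"]
    # scikit-learn style estimators expect a 2-D input: one row per sample.
    prediction = model.predict([features])
    # Convert to plain floats so the result is JSON-serializable
    # (assumes numeric predictions, e.g. a regressor).
    return {"prediction": [float(p) for p in prediction]}
```

Because the handler takes a plain event dictionary and returns a plain dictionary, it can be called from another Lambda or a Step Functions state without any HTTP request or response wrapping.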