Pure serverless machine learning inference with AWS Lambda and Layers
https://medium.com/merapar/pure-serverless-machine-learning-inference-with-aws-lambda-and-layers-979702d9ae49
After designing and learning a ML model, the hardest part is actually running and maintaining it in production. AWS is offering to host and deploy models via On-Demand ML Hosting on various instance types and sizes, packaged as the SageMaker service. These start with a ml.t2.medium and go up to GPU accelerated ml.p3.16xlarge instances.
To stay serverless in an already serverless deployment and also to keep costs low and allow for flexible scaling, a pure AWS Lambda based ML model inference is desirable. Such approaches exist and have already been described elsewhere, but have some caveats, like running via a REST interface and an API Gateway endpoint. In a bigger Lambda and step functions production deployment it is possible just to invoke a pure Lambda event triggered flow without the REST and API gateway detour. Since a lot of AI and ML tooling is based on Python and uses Python based tools like scikit-learn, a Python based Lambda runtime is chosen.