Serverless ML: Lessons from Capital One

Best practices for migrating an ML model to serverless for scalability and manageability.

When Capital One started its migration to the cloud with Amazon Web Services (AWS) a decade ago, we made history as the first U.S. financial services company to go all-in on the public cloud. Today, we run more than 2,000 applications in AWS and have pivoted to a serverless-first approach, with more than a third of our apps leveraging serverless infrastructure. 

The case for serverless ML

Seeing the many benefits of running serverless at scale across the enterprise, our team wanted to try leveraging serverless for our anomaly detection model. However, there was doubt about whether machine learning (ML) models would even qualify as a viable use case. 

What our team does

Our team’s mission is to provide early detection of volume anomalies for critical transactions on our digital platforms, such as our mobile app or web browsers. For example, we need to identify anomalies that customers experience when logging into the mobile app. (Yes, login is considered a critical transaction: with a login failure, customers cannot perform their desired financial transactions.) We built an ML solution that detects volume anomalies for those critical transactions, allowing any necessary fixes to be made as quickly as possible so we can minimize the impact on customers. 

Why we considered serverless

Unsurprisingly, this model requires a lot of data and processing time to function efficiently. As our model was scaled up to monitor additional critical transactions, our team became burdened by the manual toil and cost of maintaining the ML code. The scalability we desired was not as easy to come by as it was for other Capital One teams with serverless applications. It was at that time that our team started to consider whether our ML model would be a viable use case for serverless computing. 

The benefits of a serverless ML model

Long story short, it is a viable use case. We successfully moved our ML model to serverless architecture, and found three major improvements: 

  • Provisioning. Because serverless is a full stack of services you pay for as you go, provisioning is simple. Functions respond to requests and specific events, and they can scale to zero when no longer in use.

  • Scaling. With our model now on serverless architecture, when we send data up to the host, our databases automatically scale up to handle the expanded requirements. We’re still paying for the read/write traffic, and storage fees are worked into our subscription with AWS, but that’s it.

  • Maintaining. Before going serverless, our ML model took up an inordinate amount of our DevOps teams’ working time. Scaling a container architecture requires someone working in Kubernetes, whereas serverless hands that work off to the cloud provider. Now we are entirely hands-off for maintenance. 

Overcoming the challenges to serverless ML

When we first got started, the concern was that constraints such as code package size and runtime seemed to rule out ML for AWS Lambda. Many ML models and their libraries are simply too large to load into serverless compute, and a model built on years of training data and heavy in compute might not be a good fit for serverless architecture. However, we determined that our model was lightweight enough, and that with a few workarounds we could reap the benefits of serverless that many other teams at Capital One were already enjoying.

In migrating our ML model to serverless architecture, we encountered challenges that tested our capabilities. Some were anticipated and planned for, while others caught us off guard. And at times, we found that tackling a problem led to us improving our ML model. 

Problem 1: Data size

The team determined that the training data set was too large; it was taking far too much time to train our ML model. This was a new problem for us, since the legacy system had always been built with extra slack for databases and processing. Serverless ML depends on efficiency, which meant we were struggling to keep up at exactly the points where we needed the most speed and processing power.

Solution

We started to sample our datasets to reduce the data size while still keeping enough information to train the model. Through experimenting with data sampling, we were able to reduce the data volume by roughly an order of magnitude from the original. We immediately saw vastly reduced processing times without sacrificing model performance.
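
To make this concrete, here is a minimal sketch of the kind of time-based sampling we describe above, assuming the training data sits in a pandas DataFrame with a timestamp column. The column name, cutoff window and sampling fraction are illustrative assumptions, not our production values.

```python
# Minimal sketch of down-sampling training data: keep recent history in full,
# sample older history. Column names, cutoff and fraction are illustrative.
import pandas as pd

def sample_training_data(df: pd.DataFrame,
                         recent_days: int = 30,
                         older_fraction: float = 0.1,
                         seed: int = 42) -> pd.DataFrame:
    cutoff = df["timestamp"].max() - pd.Timedelta(days=recent_days)
    recent = df[df["timestamp"] >= cutoff]                  # keep in full
    older = df[df["timestamp"] < cutoff].sample(frac=older_fraction,
                                                random_state=seed)
    return pd.concat([recent, older]).sort_values("timestamp")
```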

Problem 2: Memory

Problem number two was memory. Machine learning needs large libraries to function properly, and it’s hard to imagine an exception to this rule. Even though our model was considered relatively lightweight, it was still too big to fit comfortably into AWS Lambda as is.

Primary solution

The first fix we tried with our own model was Lambda layers. The Capital One team used layers to package each of the libraries. With layers, a function’s total unzipped deployment package can be up to 250MB, spread across up to five layers per function. In effect, we uploaded each library to its own layer; from the Lambda function, we could then call on the related layer and import the library directly from it.
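
As a rough illustration of this approach, the sketch below publishes a library zip stored in S3 as a Lambda layer and attaches it to a function with boto3. The bucket, key, layer and function names are placeholder assumptions, and the zip is assumed to follow the python/ directory layout Lambda expects.

```python
# Hedged sketch: publish a library as a Lambda layer and attach it to a
# function. All resource names below are placeholders, not real resources.
import boto3

lambda_client = boto3.client("lambda")

# The zip at s3://my-ml-layers/sklearn-layer.zip is assumed to contain the
# library under python/ (e.g. python/sklearn/...), the layout Lambda expects.
layer = lambda_client.publish_layer_version(
    LayerName="sklearn-layer",
    Content={"S3Bucket": "my-ml-layers", "S3Key": "sklearn-layer.zip"},
    CompatibleRuntimes=["python3.9"],
)

# Attach the layer to the function. Inside the function, `import sklearn`
# then resolves because layer contents are mounted under /opt, and
# /opt/python is on sys.path for Python runtimes.
lambda_client.update_function_configuration(
    FunctionName="anomaly-detection-fn",
    Layers=[layer["LayerVersionArn"]],
)
```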

Secondary solution

Even at our limit of five layers per function, we were still straining to fit. So, we decided to try dynamic library installation. We compressed and uploaded the libraries, which were then downloaded into Lambda at runtime, decompressed and imported. This approach can compromise runtime, so we didn’t use it everywhere. The best approach was to pick and choose which libraries to compress and leave the others to load normally.
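
Here is a minimal sketch of what such a dynamic installation can look like: the compressed library is pulled from S3 into Lambda’s /tmp directory at cold start, decompressed, and added to sys.path before it is imported. The bucket, key and library names are placeholders rather than our actual packages.

```python
# Hedged sketch of dynamic library installation in a Lambda handler.
# Bucket, key and library names are placeholder assumptions.
import sys
import zipfile
import boto3

LIB_DIR = "/tmp/vendored"

def load_library(bucket: str = "my-ml-layers",
                 key: str = "statsmodels.zip") -> None:
    """Download a compressed library from S3 and make it importable."""
    zip_path = "/tmp/library.zip"
    boto3.client("s3").download_file(bucket, key, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(LIB_DIR)              # decompress into /tmp
    if LIB_DIR not in sys.path:
        sys.path.insert(0, LIB_DIR)         # make the contents importable

# Runs once at cold start (module import), not on every invocation.
load_library()
import statsmodels.api as sm                # now resolvable from /tmp

def handler(event, context):
    # The handler can use the dynamically installed library as usual.
    return {"loaded": sm.__name__}
```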

Problem 3: Runtime

Capital One uses AWS Lambda, which works well for us, but one challenge we have found is the runtime constraint. Lambda’s limit at the moment is 15 minutes. That’s usually enough for applications that don’t run ML code, but some of the functions within our model needed longer than 15 minutes to run.

Solution

We attacked this issue by reviewing any code that ran longer than 15 minutes and deciding on a case-by-case basis whether we should use a different solution or library to get the same effect. We also learned that our cloud provider offered ML-specific tools, including one with an asynchronous inference option to handle long processing times. The same solution can work with other tools, but we opted for the most challenging one. Ultimately, this exercise actually helped us improve our ML: because we had to find faster solutions, our model now runs in less time overall. 
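
In AWS terms, an asynchronous inference option like the one we describe maps to SageMaker Asynchronous Inference, which a lightweight caller can invoke along the lines of the sketch below; the endpoint name and S3 input location are placeholder assumptions.

```python
# Hedged sketch: hand long-running inference to an asynchronous endpoint so
# the caller is not bound by Lambda's 15-minute limit. Endpoint name and S3
# locations are placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint_async(
    EndpointName="volume-anomaly-endpoint",
    InputLocation="s3://my-ml-bucket/requests/login-volume.json",
    ContentType="application/json",
)

# The call returns immediately; results are written to the output S3 location
# configured on the endpoint, so no caller waits out the full processing time.
print(response["OutputLocation"])
```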

Lessons learned in the journey to serverless architecture

It wasn’t a straight line to the serverless ML model we’re using now, but the struggle was worth the reward. We’re now able to share what we learned along the way, so you too can find success. 

Tips to ensure success prior to migration

Before you get started, make sure you are set up to succeed. These tips will help position your model for an easier migration:

  • Start from the cloud. The migration to serverless is far more efficient if everything is already in the right place to share across applications, so be on the cloud before starting the move. If nothing else, your DevOps crew will have a much easier time getting the various components to talk to each other in the same ecosystem.

  • Select the appropriate serverless option for your model. Once your ML ecosystem is entirely on the cloud, it’s time to decide on a serverless option that will work best for you. Check what options are offered by your cloud provider. Take your time with this stage. Weigh the pros and cons of each package they offer and then choose your path accordingly. 

  • Review your ML solution prior to the move. Decide whether your model seems to fit into the serverless architecture you have access to. If not, game out what changes to make before you commit to the move. 

Best practices for migrating to serverless ML

Now that your model is set up for the move, you can begin the shift to serverless. As you are transitioning to serverless, consider the following best practices:

  • Break your model down into smaller components. ML models and their libraries tend to be huge. Try breaking your model down into smaller pieces to make it more manageable, and experiment with the serverless services you’ve adopted until you’re comfortable working in the new environment. 

  • Decouple your data engineering. If at all possible, try to decouple your data engineering from the rest of the model. This part of the project is notoriously processing-heavy, so treat it like its own thing and work through the inevitable early hiccups until you have a smooth flow you’re comfortable with. 

  • Deploy each component of the model on its own. More than anything else, deploy each piece of the model on its own. Every component of your ML model can probably operate as something close to a standalone unit. Take advantage of that and decouple the components from each other into bite-sized microservices for better scalability and manageability (see the sketch after this list). 
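
To illustrate the kind of decomposition we mean, here is a hedged sketch in which data preparation, training and inference each run as their own Lambda handler and hand results to the next stage through S3. The bucket, object keys and the toy threshold “model” are purely illustrative assumptions, not our pipeline.

```python
# Hedged sketch of decomposing an ML workflow into standalone Lambda handlers
# that communicate through S3. All names and the toy model are illustrative.
import json
import statistics
import boto3

s3 = boto3.client("s3")
BUCKET = "my-ml-pipeline"   # placeholder bucket

def data_prep_handler(event, context):
    """Read raw transaction counts from S3 and write a cleaned feature file."""
    raw = json.loads(s3.get_object(Bucket=BUCKET,
                                   Key=event["raw_key"])["Body"].read())
    counts = [r["count"] for r in raw if r.get("count") is not None]
    s3.put_object(Bucket=BUCKET, Key="features.json", Body=json.dumps(counts))
    return {"features_key": "features.json"}

def train_handler(event, context):
    """Fit a trivial mean/stdev threshold model and persist it to S3."""
    counts = json.loads(s3.get_object(Bucket=BUCKET,
                                      Key=event["features_key"])["Body"].read())
    model = {"mean": statistics.mean(counts), "std": statistics.pstdev(counts)}
    s3.put_object(Bucket=BUCKET, Key="model.json", Body=json.dumps(model))
    return {"model_key": "model.json"}

def inference_handler(event, context):
    """Flag an observed count more than three standard deviations from the mean."""
    model = json.loads(s3.get_object(Bucket=BUCKET,
                                     Key=event["model_key"])["Body"].read())
    deviation = abs(event["observed_count"] - model["mean"])
    return {"anomaly": deviation > 3 * model["std"]}
```

Because each handler deploys as its own function, every stage can be sized, scaled and updated independently of the others.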

As you navigate to serverless architecture, continue to revisit the evolving machine learning tech and look for replacements for the older models you’ve been using. Above all else, we recommend that you stay flexible throughout the transition. 

Serverless ML models: The future of machine learning

We are excited to be part of leveraging new approaches and forming the basis of future best practices for serverless machine learning. Our success would not have been possible without the collective work of our team at Capital One. Fawaz Mohsin, Ethan Cole, Nour Omar and Gaurav Jain were instrumental in the shift to serverless ML. 

We hope that, like us, you embrace the thrill of discovering new learnings as you explore the uncharted territory of serverless ML models.

Learn more about machine learning at Capital One.


Fatma Ben Yemna and Lina Xiang, lead machine learning engineer & data scientist

Fatma Ben Yemna is a lead machine learning engineer, building predictive models for Cloud Productivity and Engineering teams at Capital One. She has a master’s degree in computer science. Lina Xiang is a data scientist for Capital One’s Enterprise Platforms Tech business. She has a master’s degree in management of information systems. At Capital One, Lina works on time series anomaly detection and root cause analysis ML modeling.