A/B Testing Architecture in the Cloud

Introduction

In a data-driven economy, being able to explore different models and compare them side by side in a real-life setup is paramount. There is therefore a serious need for architectures that can seamlessly deploy different models next to each other. In this article we explore different ways of exposing a Machine Learning model via an API and testing it against a different variant. We analyse the architectures that are viable when the application is deployed on in-house infrastructure and when it is deployed in the cloud. The article focuses on a very specific problem. Still, most of the considerations also apply to any substantial modification of a website in an A/B Testing setup.

Goal

The goal of this short tutorial is to illustrate and compare different architectures for large-scale A/B Testing in the cloud, with particular focus on the advantages and disadvantages of each implementation in a service-oriented framework.

Prerequisites

As with many things in life, conveying the general idea is often more important than providing too much detail. This article is therefore kept deliberately simple, so that any reader who understands the basics of a micro-service architecture should be able to follow.

Method

Here we compare three stages in the implementation of an A/B Testing architecture, generalised from a real-life business case.

Architecture 1

The first architecture we deal with is a port of a system designed to run on in-house hardware. Hardware is a scarce resource there, so the focus is on changes that are minimally invasive with regard to hardware requirements. The system works fine as long as the changes to the main application are minimal. However, deploying and running an A/B test is relatively cumbersome, as it requires the main application to be modified and tested thoroughly. Moreover, the process is not easily reversible: consolidating one model or the other requires persisting the changes in the codebase, which in turn interferes with the usual feature development of the software product.

Note that there is no trivial way to bail out of the test and restore the normal behavior of the website if, for example, the variant proves to be seriously detrimental.
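To make the coupling in this first architecture concrete, here is a minimal sketch of what test logic embedded in the main application might look like. Everything in it is an assumption made for illustration: the function names, the experiment name `ranking-model-test`, the placeholder models and the hash-based assignment are not taken from the original system.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' (hypothetical helper)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # first 32 bits scaled to [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def model_control(features: dict) -> float:
    # Placeholder standing in for the current production model.
    return 0.1 * features.get("visits", 0)


def model_treatment(features: dict) -> float:
    # Placeholder standing in for the candidate model under test.
    return 0.2 * features.get("visits", 0)


def handle_request(user_id: str, features: dict) -> dict:
    # The experiment branch sits inside the main application's request path,
    # so starting, stopping or consolidating the test means changing this code.
    variant = assign_variant(user_id, "ranking-model-test")
    if variant == "treatment":
        score = model_treatment(features)
    else:
        score = model_control(features)
    return {"variant": variant, "score": score}
```

Because the branch is part of the application code, removing the test or keeping the winning model is itself a code change that has to go through the normal release cycle, which is the reversibility problem described above.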

Architecture 2

The second architecture we analyse is a small improvement on the previous version. It is again bound to in-house infrastructure, so it needs to be conservative in its hardware provisioning. However, at this stage we see the first decoupling of the main web application from the logic that drives the test and splits the load across the variants of the experiment. The main website is unaffected: the test is totally transparent to the main web application, and the logic of the splitter is deployed as a single component. The advantages of this approach are multiple. The development of independent features of the main application is not affected by the test. We have an easy bailout device, which is simply redirecting the traffic to one or the other variant of the test. Once the test is consolidated, we can easily reconfigure the application to use the more convenient service.

The only small limitation of this architecture is that both services must implement the same interface and communication contract, which can be restrictive when this condition cannot be met.
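The splitter idea can be sketched as a small standalone component that sits in front of two services sharing one contract. This is only an illustration under stated assumptions: the `ScoringService` interface, the two placeholder services and the `Splitter` class are hypothetical names, and a real deployment would route network calls rather than in-process objects.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


class ScoringService(Protocol):
    """Contract that both variants must honour (hypothetical interface)."""

    def score(self, features: dict) -> float:
        ...


class CurrentModelService:
    def score(self, features: dict) -> float:
        # Placeholder for the service currently in production.
        return 0.1 * features.get("visits", 0)


class CandidateModelService:
    def score(self, features: dict) -> float:
        # Placeholder for the variant under test.
        return 0.2 * features.get("visits", 0)


@dataclass
class Splitter:
    control: ScoringService
    treatment: ScoringService
    treatment_share: float = 0.5  # fraction of users sent to the treatment

    def route(self, user_id: str) -> str:
        # Hash the user id so the same user always lands on the same variant.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
        return "treatment" if bucket < self.treatment_share else "control"

    def score(self, user_id: str, features: dict) -> float:
        service = self.treatment if self.route(user_id) == "treatment" else self.control
        return service.score(features)

    def bail_out(self) -> None:
        # Send all traffic back to the control service.
        self.treatment_share = 0.0

    def consolidate_treatment(self) -> None:
        # Send all traffic to the winning variant.
        self.treatment_share = 1.0
```

The main application only ever talks to the splitter, so bailing out or consolidating is a configuration change on one component. The price is the shared contract: any service placed behind the splitter has to expose the same `score` signature.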

Architecture 3

The third version of the architecture fully leverages the advantages offered by versioning systems and deployment in the cloud. The use of a versioning system (Bitbucket) makes it easy to branch and modify the main application, implement variations and fully deploy them in the cloud without affecting the main application. Once the test is ready to be activated, the two versions of the website run next to each other in a hot-hot configuration, with the load splitter (usually implemented at CDN level) redistributing the load across the two versions of the website. The advantages are multiple. The test is totally decoupled from the main website. There is an easy device to redistribute the load and bail out of an under-performing test. If the test is positive, it is generally easy to merge the test branch back to dev and cut a release. The time to market of each change is much shorter and one can implement as many changes as desired; the limiting factor becomes the user flow, as each test needs sufficient statistical power to validate the results.

If one adds a hint of GitOps to the CI/CD pipelines and a sufficiently sized, auto-scalable Kubernetes cluster, most of the effort required to set up and run the test boils down to a few operations on the versioning system, some small changes to the codebase to implement the variant, and a merge to the right branch. Automation takes care of deployment, load distribution and scaling, rebalancing the load across the variants without breaking the bank: the cost overhead is limited to a fraction of a single pod, while an application at full load usually saturates the capacity of 10+ pods.
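Since user flow becomes the limiting factor, it helps to estimate how many users a test needs before launching it. The sketch below uses the standard normal-approximation formula for comparing two conversion rates with a two-sided test. It is a rough planning aid, not part of the original system: the function names, the default significance level and power, and the example rates are all assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(p_control: float, p_treatment: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in EACH variant to detect the given difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def days_to_complete(users_per_variant: int, daily_users: int, variants: int = 2) -> int:
    """Days of traffic needed if daily users are split evenly across the variants."""
    return math.ceil(users_per_variant * variants / daily_users)


if __name__ == "__main__":
    # Hypothetical example: baseline conversion of 10%, hoping to detect 12%.
    n = sample_size_per_variant(0.10, 0.12)
    print(n, "users per variant,", days_to_complete(n, daily_users=1000), "days")
```

The smaller the difference one wants to detect, the more users each variant needs, which is why the number of tests that can run in parallel is bounded by traffic rather than by infrastructure in this setup.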

Results and Discussion

There are many ways of implementing a fully fledged architecture able to run A/B Testing. Each of the approaches fits the needs of a given point in the life cycle of an application, and can be adapted to brand new greenfield projects as well as legacy applications.

The main difference between the approaches is given by budget and infrastructural constraints. The cloud gives much higher flexibility and a cheaper and faster time to market. Working on-prem has a somewhat higher cost and a slightly slower time to market, as we can run only a limited number of tests in parallel. Still, it does not significantly impair the ability to run a proper A/B Test.

Note that adopting the proper architecture in a cloud-based environment, together with the right tools for versioning and an efficient CI/CD pipeline, dramatically improves productivity and reduces time to market: a higher number of tests can be run with the same number of developers.

Do you want to try it out on your project?

Are you ready to make smarter decisions?

Otherwise you can always drop a comment…