System design: A captcha service — Part 1 of 2
In this system design scenario, we will walk through adding CAPTCHA support to a website to make it more secure. This is a two-part series:
- Part 1: Introduction and architecture
- Part 2: Detailed design and implementation
Let's first understand what a CAPTCHA is. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. CAPTCHAs are tools you can use to differentiate between real users and automated users, such as bots. They present challenges that are difficult for computers to perform but relatively easy for humans, for example identifying stretched letters or numbers, or clicking in a specific area. A few example images are shown below.
What are the uses of CAPTCHA?
CAPTCHAs are used by any website that wishes to restrict usage by bots. A few uses are:
- Limit fake registrations created on your website by bots
- Prevent false comments on social media sites, message boards and channels
How Does CAPTCHA Work?
CAPTCHAs work by presenting information that the user has to interpret. Traditional CAPTCHAs show distorted or overlapping letters and numbers that the user must type into a form field. The distortion makes it difficult for bots to read the text, and access is blocked until the submitted characters are verified.
Since CAPTCHA was introduced, bots that use machine learning have been developed. These bots, with algorithms trained in pattern recognition, are better able to solve traditional CAPTCHAs. Because of this, newer CAPTCHA methods are based on more complex tests. For example:
- Picture captcha requires you to solve a picture puzzle by clicking on specific objects in parts of the image
- reCAPTCHA, in its checkbox form, requires clicking a checkbox, after which the service either lets the user through or presents a further challenge
Let's also understand the drawbacks of CAPTCHA.
The main benefit of CAPTCHA is that it is highly effective against all but the most sophisticated bad bots. However, CAPTCHA mechanisms can negatively affect the user experience on your website:
- Disruptive and frustrating for users
- May be difficult to understand or use for some audiences
- Some CAPTCHA types do not support all browsers
- Some CAPTCHA types are not accessible to users who view a website using screen readers or assistive devices
Problem statement
Let's say we want to make our website more secure by adding a captcha. We will start by listing the requirements and then create a high-level architecture. Based on the architecture, we will decide which tech stack to use. Finally, we will look at implementation details.
Let's list our functional and non-functional requirements.
First of all, we want to build a captcha service that generates a captcha and verifies the user's response to the captcha rendered in their browser.
Since there can be many types of captcha, we want our captcha service to support pluggable strategies. A strategy determines which type of captcha is generated and which response verification logic is applied, based on our requirements.
We will need to store the generated captcha information for response verification. It can be stored in memory (in the session or a local cache) or in a database. Later we will see when to use a cache or a database, and why.
We also want our captcha service to meet various non-functional requirements. There are many of them; in this system design, we will consider the following:
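The strategy idea above can be sketched roughly as follows. This is only an illustration, not the design from Part 2: the names Challenge, CaptchaStrategy, TextStrategy, ArithmeticStrategy and get_strategy are hypothetical, and the choice of a text captcha and an arithmetic captcha is an assumption made to show two interchangeable types.

```python
import secrets
import string
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    prompt: str           # what the user is shown (a real service would render an image)
    expected_answer: str  # kept on the server side, never sent to the browser


class CaptchaStrategy(ABC):
    """One captcha type: how to generate it and how to verify a response."""

    @abstractmethod
    def generate(self) -> Challenge:
        ...

    def verify(self, challenge: Challenge, response: str) -> bool:
        # Default check: exact match after trimming surrounding whitespace.
        return secrets.compare_digest(
            response.strip().encode(), challenge.expected_answer.encode()
        )


class TextStrategy(CaptchaStrategy):
    ALPHABET = string.ascii_uppercase + string.digits

    def __init__(self, length: int = 6) -> None:
        self.length = length

    def generate(self) -> Challenge:
        text = "".join(secrets.choice(self.ALPHABET) for _ in range(self.length))
        return Challenge(prompt=text, expected_answer=text)

    def verify(self, challenge: Challenge, response: str) -> bool:
        # Assumption: answers are treated as case-insensitive.
        return super().verify(challenge, response.upper())


class ArithmeticStrategy(CaptchaStrategy):
    def generate(self) -> Challenge:
        a, b = secrets.randbelow(10), secrets.randbelow(10)
        return Challenge(prompt=f"{a} + {b} = ?", expected_answer=str(a + b))


# Hypothetical registry: the service picks a strategy by configured name.
STRATEGIES = {"text": TextStrategy(), "arithmetic": ArithmeticStrategy()}


def get_strategy(name: str) -> CaptchaStrategy:
    return STRATEGIES[name]
```

With this shape, the rest of the service only calls generate and verify, so adding a new captcha type means adding one class and one registry entry.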
- High availability
- Scalability
- Performance
- Security
High-level architecture
Let's start with a basic high-level architecture.
Suppose we are developing a web application that users access through a web browser. The user sees a login page or a registration page. This is where we want to show a captcha, so that login and registration are protected from automated users or bots. This is shown by number 1 in the diagram.
We have a captcha service that provides a random captcha every time. This is represented by number 2 in the diagram.
The generated captcha needs to be stored in some data store so that the user's response can be verified. An obvious choice is the session or a local cache. However, if the machine where the captcha service is running goes down or is restarted, the data will be lost and we will not be able to verify the captcha. This is one reason to use an external cache service. The cache service can either store user sessions persistently or store the captcha information under a unique identifier. This is shown by number 3 in the diagram.
If we are not using a cache, we can use a database to store the session or captcha information. A database may have higher latency than a cache; nevertheless, it's an option you can consider. This is shown as number 4 in the diagram.
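Storing captcha information under a unique identifier could look roughly like the sketch below. It uses a plain dictionary as a stand-in for the external cache, so it only illustrates the interface. The class name CaptchaStore, the expiry time and the rule that an answer can be read only once are assumptions for illustration, not requirements stated in this article.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class CaptchaStore:
    """Stand-in for an external cache: captcha_id -> (answer, expiry time)."""

    def __init__(
        self,
        ttl_seconds: float = 120.0,  # assumption: captchas expire after two minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, answer: str) -> str:
        """Store the expected answer and return the unique id sent to the browser."""
        captcha_id = secrets.token_urlsafe(16)
        self._entries[captcha_id] = (answer, self._clock() + self._ttl)
        return captcha_id

    def take(self, captcha_id: str) -> Optional[str]:
        """Return the answer once, then forget it; None if unknown or expired."""
        entry = self._entries.pop(captcha_id, None)
        if entry is None:
            return None
        answer, expires_at = entry
        return answer if self._clock() < expires_at else None
```

The browser only ever receives the identifier and the rendered challenge; the expected answer stays in the store until the response comes back.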
Non-functional requirements
Now let's consider the non-functional requirements. Here is the final-state architecture; we will talk about its various aspects.
What if the captcha service goes down? This will impact all login and registration operations, which counts as system downtime and is not at all desirable. To improve availability, we need more instances of the captcha service, so let's add one more instance. Let's also add a load balancer that distributes traffic across the captcha service instances. If any one instance goes down, the other can serve requests, so the system will not be completely down. This is shown as point 2 in the diagram above.
For the same reason, we do not want the cache or the database to be a single point of failure. A highly available cache service with a master and multiple replicas can be used. This is shown as point 3 in the diagram.
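The failover behaviour described above can be sketched as a picker that rotates through the instances (round robin) and skips any instance marked as down. A real load balancer does this for you and detects failures with health checks; the class RoundRobinBalancer and the instance names are hypothetical and exist only to show the idea.

```python
from typing import Dict, List


class RoundRobinBalancer:
    """Rotate through instances, skipping the ones currently marked unhealthy."""

    def __init__(self, instances: List[str]) -> None:
        if not instances:
            raise ValueError("at least one instance is required")
        self._instances = list(instances)
        self._healthy: Dict[str, bool] = {name: True for name in instances}
        self._next = 0

    def mark(self, instance: str, healthy: bool) -> None:
        # In practice a periodic health check would call this.
        self._healthy[instance] = healthy

    def pick(self) -> str:
        # Try each instance at most once, starting from the rotation pointer.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if self._healthy[candidate]:
                return candidate
        raise RuntimeError("no healthy captcha service instance available")


balancer = RoundRobinBalancer(["captcha-1", "captcha-2"])
balancer.mark("captcha-1", False)   # captcha-1 goes down
target = balancer.pick()            # requests now go to captcha-2 only
```

With two instances the system survives the loss of either one; it is only fully down when both are unhealthy at the same time.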
Similarly, a SQL database can use an active/standby setup to improve availability: the standby becomes active if the active database instance goes down. It can also use a primary/secondary topology with multiple read replicas. This further improves availability and performance, since read operations can be redirected to the read replicas and the primary is not overloaded with read requests. This is shown as point 4 in the diagram.
Many NoSQL databases also support sharding, which spreads the data across several nodes so that the failure of one node affects only part of the data. Combined with replication, this again improves overall availability.
What if our website is a hit and more and more users log in and use the web application? The two instances of the captcha service may not be sufficient to handle the increased load. We can bump up the resources of each instance by adding CPU and memory. This is called vertical scaling. Its limitation is that a single machine can only be given so much CPU and memory.
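Redirecting reads to replicas while keeping writes on the primary can be sketched as below. The class ReplicaRouter and the node names are hypothetical; database drivers and proxies usually provide this routing themselves. One caveat worth keeping in mind: with asynchronous replication a replica can lag behind the primary, so a captcha that was just written may not be visible on a replica immediately.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Send writes to the primary and spread reads across the read replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self._primary = primary
        # cycle() yields the replicas in order, forever; None means "no replicas".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, is_write: bool) -> str:
        if is_write or self._replicas is None:
            return self._primary
        return next(self._replicas)


router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
write_target = router.route(is_write=True)    # always the primary
read_target = router.route(is_write=False)    # one of the replicas
```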
Alternatively, we can add more captcha service instances. This is called horizontal scaling. In the cloud, you would typically use a scale set or an auto scaling group to add instances when the load increases. The load is determined from service metrics such as CPU or memory usage. We will not go into those details for now. Note that in order to scale horizontally, your service must be stateless. This means that a service instance does not keep any user-specific data locally; otherwise, if the instance holding that data goes down, the other instances cannot serve that user's requests. This is shown as point 5 in the diagram.
The third non-functional requirement is performance. Horizontal scaling addresses the performance requirement to some extent: the load balancer can distribute requests across multiple captcha service instances in a round-robin fashion, which ensures that no single instance is overloaded. As discussed earlier, a cache service will typically have lower latency than a database, since the data is stored in memory and retrieved very fast. That speed, however, comes at a cost. Depending on your application, if a comparatively higher latency is acceptable, you can go with a database instead of a cache.
The fourth non-functional requirement is security. Your web application may be subjected to various attacks, including SQL injection, cross-site scripting, DDoS, man-in-the-middle and brute-force attacks, to name a few. Some of these need to be handled at the application level, while others can be mitigated using a firewall. We will not go into the details of which firewall to use and how to configure it, but let's add a firewall to protect our web app. This is shown as point 6 in the diagram.
From a security perspective, we cannot put the captcha service APIs behind authentication, since they must be callable before login or registration. We should, however, secure them using HTTPS.
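To make the stateless requirement concrete, here is a small sketch in which one instance issues a captcha and a different instance verifies it. The shared dictionary stands in for the external cache or database, and the class CaptchaServiceInstance, its methods and the instance names are hypothetical.

```python
import secrets
from typing import Dict

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


class CaptchaServiceInstance:
    """A stateless instance: everything it must remember lives in shared_store."""

    def __init__(self, name: str, shared_store: Dict[str, str]) -> None:
        self.name = name
        self._store = shared_store  # stand-in for the external cache or database

    def create(self) -> Dict[str, str]:
        captcha_id = secrets.token_urlsafe(16)
        answer = "".join(secrets.choice(ALPHABET) for _ in range(6))
        self._store[captcha_id] = answer
        # A real service would return a distorted image, not the plain text.
        return {"captcha_id": captcha_id, "prompt": answer}

    def verify(self, captcha_id: str, response: str) -> bool:
        expected = self._store.pop(captcha_id, None)  # single use
        if expected is None:
            return False
        return secrets.compare_digest(
            expected.encode(), response.strip().upper().encode()
        )


external_store: Dict[str, str] = {}
instance_a = CaptchaServiceInstance("captcha-1", external_store)
instance_b = CaptchaServiceInstance("captcha-2", external_store)

issued = instance_a.create()
del instance_a  # simulate captcha-1 going down after issuing the captcha
ok = instance_b.verify(issued["captcha_id"], issued["prompt"])
```

Because neither instance keeps the answer in its own memory, the load balancer is free to send the verification request to whichever instance is available.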
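As one example of a check handled at the application level, brute-force guessing can be slowed down by counting recent failed attempts per client. The sketch below is an assumption about how that might look, not part of the architecture above: the class FailedAttemptLimiter, the thresholds and the use of the client IP address as the key are all hypothetical, and in a horizontally scaled service the counters would live in the shared cache rather than in local memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class FailedAttemptLimiter:
    """Block a client after too many failed attempts inside a sliding time window."""

    def __init__(
        self,
        max_failures: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_failures
        self._window = window_seconds
        self._clock = clock
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def _recent(self, key: str) -> Deque[float]:
        # Drop failures that have fallen out of the window.
        cutoff = self._clock() - self._window
        failures = self._failures[key]
        while failures and failures[0] <= cutoff:
            failures.popleft()
        return failures

    def record_failure(self, key: str) -> None:
        self._recent(key).append(self._clock())

    def is_blocked(self, key: str) -> bool:
        return len(self._recent(key)) >= self._max


limiter = FailedAttemptLimiter(max_failures=3, window_seconds=60.0)
for _ in range(3):
    limiter.record_failure("203.0.113.7")      # three wrong captcha answers
blocked = limiter.is_blocked("203.0.113.7")    # further attempts are refused for now
```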
In the next article, we will take a look at the tech stack, detailed design and implementation details.