Meet The New Faster FastOrSlow.com
Last Friday, Fast or Slow got hit hard with traffic from Hacker News. As I mentioned in the comments on the HN thread, we had been working on a re-architecture of the application, along with a migration into AWS, to make Fast or Slow massively scalable. We soft-launched a few weeks earlier and were planning a hard launch once we had completed the AWS migration.
But as the quote goes: Life is what happens when you’re planning something else. And it did.
This post is fairly technical and we include the general architecture of Fast or Slow below. We’re hoping that other developers and ops teams may learn from some of our experience. We’ve certainly learned a lot.
This is what our traffic spike on Friday looked like:
For some of our sites, 15,000 visitors in a day is no big deal. On Wordfence.com we have seen over 200,000 visitors in a single day. That was fine because we had fast, horizontally scalable web servers with edge caching to handle application requests.
Fast or Slow is different. For each website profile request, the application performs a huge amount of work. We launch a headless Chrome instance in all of our locations, profiling performance and aggregating the results for every job.
Friday’s spike from HN arrived suddenly. It looked like this:
This was an interesting challenge. Our team had not yet battle-tested the infrastructure and we got slammed with what, for a new application that does this much work, was a lot of traffic. We remained at #1 on the Hacker News home page for hours, and traffic scaled up several times Friday night. The last time was at 3 AM Mountain Time (5 AM Eastern), when we briefly went down because our central infrastructure needed more resources. We upgraded and were back within a few minutes.
Thankfully Ryan B, our lead architect, had built a queuing system into Fast or Slow. It was breaking our hearts that HN was kind enough to take an interest in us, but users had to wait in line. Still, the queuing system served its purpose: it kept the site responsive through the night, with graceful degradation.
The HN comment thread was supportive. I think that most of the community knows what happens when your project ends up at #1 on HN; they are smart enough to understand how much work our application was doing.
Moving into AWS
Our early implementation of Fast or Slow used AWS for the central infrastructure, and a mix of Linode, Vultr and other VPS providers around the world for our headless Chrome instances. We call those geographically dispersed instances ‘Gnomons’. We wanted the Gnomons to be able to run anywhere, including places like South Africa. They needed to be flexible and platform agnostic.
As we rolled out to various locations, we discovered that some providers, particularly one in India and another in Johannesburg, South Africa, had severe jitter in their network and server performance. None of the machines running Gnomons in AWS showed these kinds of issues. When our customers are testing the performance of their own sites, we don’t want to have to account for large performance variance on the machine doing the benchmarking, or in the provider’s network.
So we removed the Johannesburg location and another that was causing problems, and we got more consistent results. AWS, by contrast, gave us consistent network and server performance, which was a catalyst for rethinking our design.
It also quickly became clear that we needed to scale up rapidly to handle demand spikes. We knew this before the HN traffic came along; our large audience via Wordfence allows us to throw thousands of concurrent users at anything we launch. We tested sending small traffic spikes to Fast or Slow, found it was sensitive, and learned that we had to be careful.
Scott started playing with spot instances in AWS. These are low-priced server instances that AWS offers from capacity that is otherwise sitting unused, with the catch that they can go away at any time. We found that spot instances were ideal for our purposes.
The final clincher was that AWS launched in Cape Town, South Africa. This gave us a high quality location on the tip of Africa. When we summed up the total locations we would get by committing 100% to AWS, we had a total of 18, which was awesome. Incidentally, I’m from Cape Town. So this made me doubly happy.
Moving from POST pushes to SQS queues
The first version of Fast or Slow had the central application server ‘push’ jobs out to our Gnomons via a signed HTTPS POST request. The server would hold that connection open and wait for the result to come back. This worked fine for version 1, but it was resource intensive on the main application server. The application server also had to know about all of the Gnomons, which meant that if we deployed a new Gnomon, we’d have to update the application config.
As part of the AWS migration, we wanted to just bring new workers online within minutes without updating the central application. So we moved to AWS SQS queues and to a ‘pull’ model instead. The new design inserts jobs into SQS queues. Each location has a manager Gnomon that pulls jobs off the queue and hands them to individual workers at that location. This way we don’t need to change anything on the central server as we bring more capacity online. We can simply bring more workers online, and the jobs disappear from the queue faster.
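As a rough illustration of the pull model, here is a minimal Python sketch, not the production code. An in-memory `queue.Queue` stands in for one location's SQS queue, and `run_manager` and `fake_profile` are hypothetical names chosen for the example. The manager pulls jobs and hands each one to a pool of local workers; nothing outside the location needs to know how many workers exist.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_manager(location_queue, worker_count, profile):
    """Pull jobs from one location's queue and hand them to local workers."""
    futures = []
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        while True:
            try:
                job = location_queue.get_nowait()  # pull: the manager asks for work
            except queue.Empty:
                break                              # queue drained, stop pulling
            futures.append(pool.submit(profile, job))
    # Leaving the `with` block waits for every worker to finish.
    return [f.result() for f in futures]


def fake_profile(url):
    # Stand-in for a headless Chrome run; the result shape is made up.
    return {"url": url, "status": "done"}


cape_town = queue.Queue()  # stand-in for a single location's SQS queue
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/c"]:
    cape_town.put(url)

results = run_manager(cape_town, worker_count=2, profile=fake_profile)
print([r["url"] for r in results])
```

Raising `worker_count` drains the queue faster without any change on the producing side, which is the property the pull design is after.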
The results are ‘pushed’ back to the central server by the server doing the profiling. Once the job is finished, the server doing the work will make a signed POST request back to the central server to deliver the results. Those are aggregated, along with results from all other locations, and the final report is presented to our user.
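The post does not say how the POST requests are signed. One common approach, shown here purely as an assumption, is an HMAC-SHA256 over the request body using a secret shared between worker and central server. The secret, the field names, and the `sign`/`verify` helpers below are all hypothetical.

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"example-secret"  # hypothetical; a real secret would not live in code


def sign(body: bytes, secret: bytes) -> str:
    """Worker side: compute a hex signature over the exact bytes being sent."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, signature: str, secret: bytes) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected, signature)


# Made-up result payload for illustration.
body = json.dumps(
    {"job_id": "job-123", "location": "cape-town", "load_ms": 840},
    sort_keys=True,
).encode("utf-8")
signature = sign(body, SHARED_SECRET)

print(verify(body, signature, SHARED_SECRET))         # untampered body
print(verify(body + b" ", signature, SHARED_SECRET))  # body altered in transit
```

The worker would send `signature` alongside the body (for example in a header), and the server would reject any request where `verify` returns false.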
The New Application Design
The diagram below describes the current application design, now that we are fully migrated into AWS, with 18 locations online.
The application request flow is as follows:
- A profiling request is kicked off from someone kind enough to visit Fast or Slow and give us a try. This is handled by the application server.
- The user’s browser establishes a websocket connection to the Echo Server, which is used to receive results in real-time. A Redis instance provides the back-end queue for Echo.
- The application server inserts a job into the SQS queue for each location. Each location has its own unique queue and can’t see the queues for other locations.
- At each geographical location, a manager Gnomon pulls a new job from its queue and hands it to one of the workers.
- The worker does the profiling and sends the result back to the application via a signed POST request that hits a REST endpoint.
- The results are written to the database and sent back to the customer in real-time via the Echo server, over the open websocket.
- If the customer has gone away, all they need to do is revisit the URL they were on when the job was running, and the results will appear. Or if the job is still running, they’ll reestablish the websocket and will see the job progress.
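The fan-out and aggregation in the steps above can be sketched as follows. This is a simplified in-memory model under stated assumptions: plain lists stand in for the per-location SQS queues, a dict stands in for the database, and `JobTracker`, its methods, and the three location names are hypothetical.

```python
from collections import defaultdict

LOCATIONS = ["cape-town", "frankfurt", "oregon"]  # hypothetical subset of locations


class JobTracker:
    def __init__(self, locations):
        self.locations = list(locations)
        self.queues = {loc: [] for loc in self.locations}  # one queue per location
        self.results = defaultdict(dict)                   # job_id -> {location: result}

    def submit(self, job_id, url):
        """Fan out: insert one copy of the job into every location's queue."""
        for loc in self.locations:
            self.queues[loc].append({"job_id": job_id, "url": url})

    def record(self, job_id, location, result):
        """Store one location's result; return True once every location has reported."""
        self.results[job_id][location] = result
        return self.is_complete(job_id)

    def is_complete(self, job_id):
        return set(self.results[job_id]) == set(self.locations)

    def report(self, job_id):
        """What a returning visitor sees: whatever has been stored so far."""
        return dict(self.results[job_id])


tracker = JobTracker(LOCATIONS)
tracker.submit("job-1", "https://example.com")
tracker.record("job-1", "cape-town", {"load_ms": 900})
tracker.record("job-1", "frankfurt", {"load_ms": 310})
done = tracker.record("job-1", "oregon", {"load_ms": 450})
print(done, sorted(tracker.report("job-1")))
```

Because results are stored as they arrive, a revisit to the job URL can show partial progress or the finished report from the same lookup.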
We have various graphics that are generated as SVGs, and the Gnomon workers write those directly to S3. Those images are served up straight from S3 to the customer. We need to render those images from SVG into PNGs for emails to subscribers. The PNGs are also used for unfurl images. So we have a set of render workers that grab SVGs from S3 and render them into PNGs.
Our database runs on Aurora and is partitioned by the date of the profile job. We keep performance profiling results for 90 days. Partitioning on job date lets us delete old data without running a query that has to remove individual records from a large data set. Instead, we just drop the partition that contains data older than our cutoff. This is as fast as dropping a table, and if you’ve played with MySQL or Aurora, you know that a drop or truncate (which is a drop and re-create) is way faster than deleting by query.
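To make the partition cleanup concrete, here is a small sketch that builds the MySQL `ALTER TABLE ... DROP PARTITION` statement for expired partitions. It assumes one partition per day named `pYYYYMMDD` and a table called `profile_results`; neither detail comes from the post, and the real schema may be partitioned at a different granularity.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # retention window stated in the post


def partition_name(day: date) -> str:
    # Hypothetical naming scheme: one partition per day, e.g. p20200301.
    return "p" + day.strftime("%Y%m%d")


def expired_partitions(existing, today, retention_days=RETENTION_DAYS):
    """Return partitions for days strictly older than the retention cutoff."""
    cutoff = partition_name(today - timedelta(days=retention_days))
    # Zero-padded names sort the same way as the dates they encode.
    return sorted(name for name in existing if name < cutoff)


def drop_statement(table, partitions):
    """Build one statement that drops all expired partitions, or None if none."""
    if not partitions:
        return None
    return f"ALTER TABLE {table} DROP PARTITION {', '.join(partitions)};"


existing = ["p20200301", "p20200302", "p20200303", "p20200601"]
stmt = drop_statement(
    "profile_results",  # hypothetical table name
    expired_partitions(existing, today=date(2020, 6, 1)),
)
print(stmt)
```

The statement removes whole partitions as a metadata operation, in contrast to a `DELETE ... WHERE job_date < ...` that would have to touch every expired row.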
You’ll notice a Redis cache that talks to the main application server. This is where we cache profile jobs to speed things up. It also handles other miscellaneous application caching.
The End Result and TODO
Since we’ve migrated into AWS, we now provide 18 locations for performance profiling. Each location is remarkably stable and consistent in its performance. We are able to rapidly scale up our infrastructure to handle massive spikes in traffic. Give Fast or Slow a try, and you’ll see what I mean. It’s incredibly snappy and waiting in line should be a thing of the past.
Our next release will include some additional horizontal scaling capability for the application server. But we’re reasonably confident that even if we see a spike like last Friday, Fast or Slow will be able to handle it.
Our release after that includes an internal tool to do large scale web performance surveys, which will allow us to publish statistically significant data on which platforms and configurations perform the best. Our new scalability allows us to ramp up capacity for those internal surveys, too.
We hope you find the new capacity we’ve provided helpful. Thanks for giving Fast or Slow a try.