TL;DR: I wrote a distributed, cloud-based compiler that I've now made into a public-facing API. Take a look at http://apidocs.mimirplatform.io/v1.0/docs for details.
My name is Jacobi Petrucciani, and I am one of the founders of Mimir (http://mimirhq.com), an EdTech startup focused on computer science. I'm happy to announce that we just released V1.0 of our cloud compilation API!
As a full-stack developer, I manage everything from the VPC infrastructure on AWS to the APIs and backend compilers that drive a large part of our web application. Here I want to walk through our journey developing a pretty cool backend service for our platform, and the design decisions we made along the way.
A year ago, we designed the first implementation of the compiler as more of a proof of concept than anything else. We built the first version of the web application almost entirely in PHP, using the CakePHP MVC framework as the front end (we now use Ruby on Rails). We made up the design as we went, and that left a lot of technical debt throughout the application.
Looking back, I cringe at our design and implementation, but working on version 2 means we can try to fix that!
In hindsight, this implementation was quite inefficient and not very user friendly. It was also written directly inside our old PHP application, so it was not modular at all (I know, I know: separation of concerns and all that). There was no concurrency at all, so it would not scale well past a few users at a time.
Our submission queue was implemented with CakeResque, a CakePHP plugin for queuing tasks in the background of the web application. The underlying messaging backend was Redis, which, while easy and fast to set up, had a few issues that I will explore later. This setup didn't support concurrent execution of tasks, so the queue was mainly useful for making sure every request was eventually handled, no matter how long it took. Its biggest pitfall was a lack of robustness: in our implementation, one failing job could bring down a worker or even the queue itself. That was somewhat acceptable for V1.0, since we did not have much traffic going through our application, but for V2.0 we decided it was unacceptable.
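The failure mode described above, where one bad job takes down a worker, is usually avoided by catching errors per job instead of letting them escape the worker loop. The sketch below illustrates that idea in Python; it is not CakeResque code (CakeResque is PHP), and `run_worker`, the `(job_id, payload)` tuples, and the handler are hypothetical names chosen for illustration.

```python
import logging
import queue

log = logging.getLogger("compile-worker")


def run_worker(jobs, handler):
    """Drain a queue of (job_id, payload) pairs, isolating each job's failure.

    A crash inside `handler` is logged and recorded instead of escaping the
    loop, so one bad submission cannot take the whole worker down.
    """
    results, failures = {}, {}
    while True:
        try:
            job_id, payload = jobs.get_nowait()
        except queue.Empty:
            break  # queue drained; a long-lived worker would block and wait instead
        try:
            results[job_id] = handler(payload)
        except Exception as exc:  # deliberately broad: any job error is contained
            log.warning("job %s failed: %r", job_id, exc)
            failures[job_id] = repr(exc)
    return results, failures
```

Recording the failure and moving on keeps the worker alive; the failed job can then be retried or reported back to the user.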
We started working on Version 2.0 of our platform in Ruby on Rails about a month ago, and it gave us a golden opportunity to fix the design and implementation.
Looking at our designs with fresh eyes, we decided there was a lot to improve. First off, scalability was a must, so concurrency was priority number one. I also figured it would be a good idea to separate the compiler entirely from our web platform, both to satisfy separation of concerns and to open it up to other developers in case an API could be useful to them. Lastly, queue robustness and reliable delivery were musts.
In choosing the underlying queue system for the backend, I looked back at our old implementation. The Redis server, while fast and easy to set up, was not very efficient and lacked reliable delivery. Reliable delivery was a must for V2.0, where many different users could be submitting concurrently: any hold-up in the message broker or queue could drop submissions and cause a string of errors in our platform. After a few hours of research, I settled on RabbitMQ as the message broker and Celery as the task queue, since the two are commonly used together. I chose RabbitMQ because it is widely deployed, tuned for speed and durability, and extremely efficient. It also supports reliable delivery, which was one of the main points that sold us on it. It took quite a bit longer to set up initially, but once it was running, it performed much better than the old Redis server.
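As an illustration of the reliable-delivery settings this combination makes available, here is a hypothetical Celery configuration module for a RabbitMQ broker. It uses the lowercase setting names of Celery 4 and later (older releases spelled them differently), and the host, credentials, and virtual host are placeholders, not our actual setup.

```python
# celeryconfig.py: a hypothetical Celery configuration for a RabbitMQ broker.
# Host, credentials, and vhost are placeholders, not real deployment values.

# RabbitMQ over AMQP: amqp://user:password@host:port/vhost
broker_url = "amqp://compiler:change-me@rabbitmq.internal:5672/compiler"

# Acknowledge a task only after it finishes, so if a worker dies mid-task
# the message stays unacknowledged and RabbitMQ redelivers it.
task_acks_late = True

# If the worker process is killed abruptly (e.g. out of memory), requeue
# the message instead of acknowledging it.
task_reject_on_worker_lost = True

# Reserve one message per worker process at a time, so long compile jobs
# are spread more evenly across workers.
worker_prefetch_multiplier = 1

# Persist messages to disk on the broker (already Celery's default).
task_default_delivery_mode = "persistent"

# Only accept JSON-serialized task messages.
task_serializer = "json"
accept_content = ["json"]
```

A Celery app would load it with `app.config_from_object("celeryconfig")`. One consequence of late acknowledgement is that a job can be delivered more than once if a worker dies mid-run, so compile-and-run tasks should be safe to repeat.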
After thinking for quite a long time about the design, we decided that a master/slave architecture would be the best way to achieve what we wanted in V2.0.
As a reminder, master/slave is a fairly common model in large-scale software applications, in which one main server (the master) controls what each of the connected slaves works on.
Our use of Celery helped here, since it was built as a distributed task queue. Once the workers were connected to the master, Celery automatically distributed tasks among whichever workers were available. With this architecture, we can spin up as many slaves as we want and have them all connect to the master, so the compiler scales with demand. In theory, it can keep scaling until the master server itself can no longer handle the volume of incoming requests.
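One detail worth noting: with Celery and RabbitMQ, the master does not assign work to a particular slave. Every worker consumes from a shared queue and the broker hands each message to one available consumer (the competing-consumers pattern), which is why adding slaves adds capacity without reconfiguring the master. The toy model below imitates that with Python threads, using `queue.Queue` as a stand-in for RabbitMQ; `master_dispatch` and `compile_fn` are hypothetical names.

```python
import queue
import threading


def master_dispatch(submissions, num_slaves, compile_fn):
    """Toy master/slave model: the master enqueues jobs, slaves compete for them.

    `submissions` is an iterable of (job_id, source) pairs; `compile_fn` is a
    hypothetical stand-in for the real compile-and-run step.
    """
    tasks = queue.Queue()  # stands in for the shared RabbitMQ queue
    results, handled_by = {}, {}
    lock = threading.Lock()

    def slave(name):
        while True:
            item = tasks.get()
            if item is None:  # sentinel: the master has no more work
                return
            job_id, source = item
            output = compile_fn(source)
            with lock:
                results[job_id] = output
                handled_by[job_id] = name

    slaves = [
        threading.Thread(target=slave, args=(f"slave-{i}",))
        for i in range(num_slaves)
    ]
    for t in slaves:
        t.start()
    for job_id, source in submissions:
        tasks.put((job_id, source))  # the master never picks a specific slave
    for _ in slaves:
        tasks.put(None)  # one sentinel per slave so every thread exits
    for t in slaves:
        t.join()
    return results, handled_by
```

Changing `num_slaves` is the toy equivalent of launching more slave instances: nothing about how jobs are submitted has to change.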
In the end, this is the stack we went with:
In addition to this stack, we use:
The basic flow of V2.0 (on Mimir Platform):
The basic flow of V2.0 (API):
With the architecture in place for distributing tasks, I could easily make AMIs of both the master and slave instances in EC2. That let me redeploy these machines anywhere in our AWS account and scale the number of slaves with the push of a button.
Thanks to this design, the compiler setup was already modular: the master handles incoming requests and decides what the slaves do, and the slaves only compile and run the code the master gives them. The compiler also now runs on its own set of servers instead of sharing a server with the web platform, as it did in V1.0.
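To make the slave's side of that division of labor concrete, here is a minimal sketch of a "run this submission" step. It assumes Python submissions so it runs anywhere; a C or Java job would add a compiler invocation before the run step. `run_submission` and its result fields are hypothetical, and this is not a sandbox: untrusted code needs real isolation such as containers and resource limits.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_submission(source, timeout=5.0):
    """Hypothetical slave-side step: run a Python submission with a time limit.

    The code runs in a throwaway directory and its output is captured.
    NOT a security sandbox; it only bounds wall-clock time.
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "main.py"
        path.write_text(source)
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,  # subprocess.run kills the child on timeout
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": "", "stderr": "", "returncode": None}
    return {
        "status": "ok" if proc.returncode == 0 else "error",
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
    }
```

For example, `run_submission("print(6 * 7)")` returns a result whose `stdout` is `"42\n"`; the returned dict is the kind of payload a slave could hand back to the master.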
When we first started work on V2.0, we had trouble deciding what to use to accept calls to the Compiler API. I needed something fast and lightweight. After researching languages and REST frameworks a bit, I didn't find many convincing arguments for one framework over another, so in the end I decided to use what I'm comfortable with and build it in Python.
Originally, we used the SimpleHTTPRequestHandler built into Python to write a script that constantly listens for API calls and handles them appropriately. This proved quite fast and fairly lightweight, and was definitely fun to build.
Over time, we added more and more features to the internal API, and configuring the API handler became increasingly time-consuming. We started looking for replacements and wound up using a framework called CherryPy, which made it surprisingly easy to handle requests and send responses. It seemed about as fast as the handler we built from scratch, but adding new API methods was much easier.
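For readers curious what a hand-rolled handler like that looks like, here is a minimal sketch using Python 3's `http.server`. `SimpleHTTPRequestHandler` is a subclass of `BaseHTTPRequestHandler` that adds static-file serving; for a JSON-only endpoint, subclassing `BaseHTTPRequestHandler` directly is enough, so the sketch does that. The `/v1/submit` route, the request fields, and the helper functions are hypothetical, not the real Mimir API.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CompilerAPIHandler(BaseHTTPRequestHandler):
    """Hypothetical hand-rolled JSON endpoint (illustrative, not Mimir's code)."""

    def do_POST(self):
        if self.path != "/v1/submit":
            self._send_json(404, {"error": "unknown endpoint"})
            return
        length = int(self.headers.get("Content-Length") or 0)
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and bad encodings
            self._send_json(400, {"error": "body must be JSON"})
            return
        if not isinstance(payload, dict) or not all(
            key in payload for key in ("code", "language")
        ):
            self._send_json(400, {"error": "expected 'code' and 'language'"})
            return
        # A real handler would enqueue the job here; this sketch only acknowledges it.
        self._send_json(202, {"status": "queued", "language": payload["language"]})

    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def start_server(host="127.0.0.1", port=0):
    """Serve in a background thread; port 0 lets the OS choose a free port."""
    server = HTTPServer((host, port), CompilerAPIHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(host, port, path, body):
    """POST a JSON body and return (status code, decoded JSON response)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            path,
            body=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()
```

Every new endpoint means another branch on `self.path` plus its own parsing and validation, which hints at why this approach got tedious as the API grew.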
Building this API out has been a fantastic learning experience for me. Looking into different architectures and different options for the infrastructure has opened my eyes to the world of distributed computing for speed and scalability.
Some things that I’d like to work on or explore next:
Hopefully this version won't make me cringe as much when we revisit its design in a few months.
Be sure to check out the documentation for our public compiler API at http://apidocs.mimirplatform.io/v1.0/docs.
Looking forward to seeing what people might use this for! Shoot us an email with what you’re working on, or if you have any questions, thoughts, or suggestions at email@example.com.