On November 11, 2014, Mozilla announced the Polaris Privacy Initiative. One key part of the initiative is supporting the Tor network by deploying Tor middle relay nodes. On January 15, 2015, our first proof of concept (POC) went live.
TL;DR: here are our Tor relays: https://globe.torproject.org/#/search/query=mozilla
When we started this POC, the requirements we had were:
The current design is fully redundant. This lets us perform maintenance, or lose a node, without taking down all of our traffic. The worst-case scenario is a 50% loss of capacity.
The design also allows us to easily add more servers in the event we need more capacity, with no anticipated impact.
There is a large body of knowledge available on building Tor nodes. I read mailing list archives, blog posts, and tutorials. I had exchanges with people already running large relays. There are still data points Mozilla needs to understand before our experiment is complete. This section is a “quick rundown” of some of those data points.
This seems to be more of a gut feeling from existing operators than a proven value (let me know if I’m wrong), but it makes sense. We do have available transit and capacity. Understanding throughput and resource utilization is a key criterion for us.
Important note: an operator running several relays must set the “MyFamily” option in torrc. This ensures a user’s circuit doesn’t bounce through several of your servers.
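As a minimal sketch of what that looks like in practice, the helper below builds a MyFamily line from a list of relay fingerprints. The function name and the placeholder fingerprints are hypothetical and only here for illustration; this is not taken from our Ansible configuration.

```python
import re

# A relay fingerprint is written as 40 hexadecimal characters.
FINGERPRINT_RE = re.compile(r"^[0-9A-F]{40}$")


def my_family_line(fingerprints):
    """Return a torrc MyFamily line listing every given relay fingerprint.

    Hypothetical helper for illustration; accepts fingerprints with or
    without the leading "$" and with or without spaces.
    """
    cleaned = []
    for fp in fingerprints:
        fp = fp.replace(" ", "").lstrip("$").upper()
        if not FINGERPRINT_RE.match(fp):
            raise ValueError(f"not a 40-character hex fingerprint: {fp!r}")
        if fp not in cleaned:  # drop duplicates, keep the original order
            cleaned.append(fp)
    return "MyFamily " + ",".join("$" + fp for fp in cleaned)


# Placeholder fingerprints, NOT real relays.
EXAMPLE_FINGERPRINTS = ["A" * 40, "B" * 40]
print(my_family_line(EXAMPLE_FINGERPRINTS))
```

The same line should go into the torrc of every relay you operate, since relays are only treated as one family when they list each other.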
A new Tor instance (identified by its private/public key pair) will take time (up to 2 months) to use all its available bandwidth. This is explained in this blog post: The lifecycle of a new relay. We will be updating our blog posts and are curious how closely our nodes mirror the lifecycle.
This is based on mailing list discussions, as we haven’t reached that bandwidth yet. We run several instances per physical server.
This helps people behind strict firewalls to access Tor. Don’t worry about running the process as root (needed to listen on ports < 1024): as long as you have the “User” option in torrc, Tor will drop its privileges after binding to the ports.
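Here is a rough sketch of the relevant torrc options, rendered from a small Python helper. The helper names, the nickname, the “debian-tor” account, and the choice of which service goes on port 443 and which on port 80 are all assumptions made for this example; adjust them to your own setup.

```python
def needs_privileged_bind(*ports):
    """Return True if any port is below 1024, so binding requires root."""
    return any(0 < port < 1024 for port in ports)


def render_relay_torrc(nickname, or_port=443, dir_port=80, user="debian-tor"):
    """Render a minimal torrc for a non-exit relay (illustrative only).

    The default ports and the "debian-tor" account are assumptions,
    not values taken from Mozilla's deployment.
    """
    lines = [
        f"Nickname {nickname}",
        f"ORPort {or_port}",        # relay traffic
        f"DirPort {dir_port}",      # directory information
        "SocksPort 0",              # relay only, no local client
        "ExitPolicy reject *:*",    # middle relay: never act as an exit
    ]
    if needs_privileged_bind(or_port, dir_port):
        # Tor is started as root to bind the low ports, then switches
        # to this unprivileged account.
        lines.append(f"User {user}")
    return "\n".join(lines) + "\n"


print(render_relay_torrc("examplerelay"))
```

With ports above 1023 the sketch leaves the “User” line out, because in that case the process can simply be started as the unprivileged account.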
We decided to use Ansible for configuration management. A few things motivated us to make that choice.
And look! Mozilla’s Ansible configuration is available on GitHub!
The security team helped us a lot throughout this project. Together we drew up a list of requirements, such as:
The only place for the infrastructure administration is the jumphost. Systems don’t accept management connections from anywhere else.
It is important to note that many of the security requirements align nicely with what is considered good practice in general system and network administration. Take enabling NTP or centralized syslog, for example: both are equally important for services to run smoothly, for troubleshooting, and for incident response. The same applies to the principle “make sure the security of the network devices is at least as good as that of the systems”.
We’ve also implemented a periodic security check that runs on these systems. All of them are scanned from the inside for security updates and from the outside for open ports.
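To give an idea of the “outside” half of such a check, here is a minimal sketch that probes a list of TCP ports and reports any that are open but not on an allow list. It is not the tool we run; the function names are hypothetical, and a real scan would more likely rely on a dedicated port scanner.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: treat all of these as closed.
        return False


def unexpected_open_ports(host, ports_to_probe, allowed_ports, timeout=1.0):
    """Return the probed ports that are open but not in the allow list."""
    allowed = set(allowed_ports)
    return [
        port
        for port in ports_to_probe
        if port not in allowed and is_port_open(host, port, timeout)
    ]
```

For a relay listening only on ports 80 and 443, calling `unexpected_open_ports(host, range(1, 1025), allowed_ports=[80, 443])` from a machine outside the network should return an empty list; anything else deserves a closer look.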
One of the questions we’re still working on is how to figure out whether we’re running an efficient relay (in terms of cost, participation in the Tor network, hardware efficiency, etc.). Which metrics should we use, and how should we use them?
Looking around, it seems like there is no “good answer”. We’re graphing everything we can about bandwidth and server utilization using Observium. The Tor network already has a project to collect relay statistics, called Tor metrics. Thanks to it, tools like Globe and others can exist.
Note that we have just started our relays and they are far from running at their maximum bandwidth (for the reasons listed above). We will share more information about performance and scaling down the road.
Depending on the results of the POC, we may move the nodes to a managed part of our infrastructure. As long as their private keys stay the same, their reputation will follow them wherever they go, with no new ramp-up period.
On the technical side there are a lot of possible things to do, like adding IPv6 connectivity. We’re reviewing opportunities to automate more parts of the deployment (like iptables, logs, etc.).
Here are a few links that you might find interesting:
[blog] IPredator – building a Tor server
[mailing list] [tor-dev] Scaling tor for a global population
[mailing list] How to Run High Capacity Tor Relays
[wiki] tor – archwiki
[blog] Run A Tor-Relay On Ubuntu Trusty
[mailing list] [tor-relays] Someone broke the tor-relay speed record?
[tor website] Configuring a Tor relay on Debian/Ubuntu
[wiki] tor exit full setup
Of course, none of that would have been possible without the help of Van, Michal (who wrote the part about security) and Opsec, Javaun, James, Moritz and the people of #tor!