# Latency Tricks

By: Clark Gaebel

Special thanks to reviewers: Ben Foppa, Norm Tasfi

## The Setup

Servers take time to respond to messages. In many systems, the common case is quite fast (tens of milliseconds), while rare events push response times very high (hundreds of milliseconds, or even seconds). The distribution of service latency in real systems tends to have a long, fat tail: the vast majority of requests are answered in a reasonable amount of time, but some see latencies an order of magnitude or more above the average.

Systems with this property aren't just annoying. For soft-realtime systems like websites, the infrequent requests that take a long time can frustrate users to the point of leaving. Worse, the more time a user spends on the site, the more likely they are to eventually hit one of these frustrating events.

This problem is so common that it has a folk solution: replicate the service across multiple servers, send each request to all of them at once, and return whichever answer arrives first.
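As a minimal sketch of this strategy (the replica function and its timings here are hypothetical stand-ins for real network calls):

```python
import asyncio
import random

async def query_replica(i: int) -> str:
    # Simulated replica: usually fast, occasionally very slow.
    # A stand-in for a real RPC; the delays are made up for illustration.
    delay = 0.01 if random.random() < 0.95 else 1.0
    await asyncio.sleep(delay)
    return f"answer from replica {i}"

async def first_response(n_replicas: int) -> str:
    # Send the request to every replica at once and return
    # whichever answers first, cancelling the rest.
    tasks = [asyncio.create_task(query_replica(i)) for i in range(n_replicas)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return done.pop().result()

print(asyncio.run(first_response(3)))
```

The whole request is only slow when *every* replica happens to be slow at the same time, which is what shrinks the tail.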

How does this strategy affect our latency? Intuitively, it should decrease: the average won't drop by much, but the tail of the latency PDF will be dramatically reduced.

Before trying to answer this question, we need to model the latency of a single server. The Pareto distribution is a common distribution with the properties mentioned above.

## Pareto Distributions

*[Figure: the probability density function (PDF) of a Pareto distribution]*

*[Figure: the cumulative distribution function (CDF) of a Pareto distribution]*
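For reference, the standard Pareto density and cumulative distribution functions (for $x \ge x_m$) are:

$$f(x) = \frac{\alpha \, x_m^\alpha}{x^{\alpha+1}}, \qquad F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha$$

Both are zero for $x < x_m$.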

A Pareto distribution has two parameters: $x_m$ and $\alpha$.

$x_m$ is the minimum (and most common) latency, which determines how far to the right the PDF lies. In the above illustrations, $x_m = 1$.

$\alpha$ is the "shape" parameter, which defines how fat the tail of the PDF is. The lower it is, the more likely high-latency events are to occur.

Let's explore this distribution a little. What does it look like for a variety of $\alpha$ values? To find out, we sample the distribution a few million times, sort the latencies into buckets, and plot the result: