Updated: Aug 16, 2021
Let's say you have developed a complex backend system, battle-tested it against all the functional requirements, and are ready to hit production and collect a bonus for the hard work 💰💰💰. But just before the deployment, your boss sends an email asking for these details:
What do the mean, median, max, p90, p95, and p99 latencies look like?
Also, at what load did you test?
And you have no idea what these buzzwords mean 😳
The behavior of software systems is hard for humans to understand, so we need some metrics to judge whether the system is running fine or not in particular scenarios.
Metrics are a proxy for reality !!!!
Latency Vs Response time:
Latency and Response time are often used synonymously but they are different.
Response time is what the client sees; besides the actual time to process the request (the service time), it includes network delays and queuing delays. Response times vary from request to request. For example, refer to Figure 1-1.
Suppose your service usually returns a response in 100 ms. You cannot guarantee 100 ms on every request, because those delays vary. We therefore need to think of response time not as a single number, but as a distribution of values that you can measure.
Credits: Designing Data-Intensive Applications by Martin Kleppmann
Latency is the duration that a request is waiting to be handled--during which it is latent, awaiting service.
In simple terms, latency is the time a request spends waiting before it is handled, so it is only one part of the response time.
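To make the distinction concrete, here is a toy simulation. It is only a sketch: the 100 ms service time comes from the example above, while the queuing and network delay distributions are made-up assumptions, not measurements.

```python
import random

SERVICE_TIME_MS = 100.0  # fixed processing time, taken from the example above


def simulate_request(rng):
    """Return (latency_ms, response_time_ms) for one simulated request."""
    queueing_ms = rng.expovariate(1 / 20.0)  # assumption: mean 20 ms waiting in a queue
    network_ms = rng.uniform(5.0, 30.0)      # assumption: 5-30 ms spent on the network
    latency_ms = queueing_ms                 # time spent waiting to be handled
    response_ms = queueing_ms + SERVICE_TIME_MS + network_ms
    return latency_ms, response_ms


rng = random.Random(42)  # fixed seed so repeated runs give the same samples
samples = [simulate_request(rng) for _ in range(1000)]
response_times = [response for _, response in samples]

print(f"fastest response: {min(response_times):.1f} ms")
print(f"slowest response: {max(response_times):.1f} ms")
```

Even with a constant service time, the response times spread out over a range, which is why a single number cannot describe them.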
Why Averages won't work?
Average response times don't give much information. Let's take an example of a system whose response times look like this:
Response times of 10 requests in ms: 60, 120, 30, 20, 40, 55, 25, 65, 90, 920
Average: 143 ms (rounded)
The average response time is 143 ms, but it does not tell you how many users actually experienced that delay. In fact, 9 of the 10 requests finished in 120 ms or less; the single 920 ms request pulled the average up.
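The effect of that one slow request can be checked with a few lines of Python. This is a small sketch using the ten sample values from above; the variable names are only illustrative.

```python
from statistics import mean

response_times_ms = [60, 120, 30, 20, 40, 55, 25, 65, 90, 920]

avg = mean(response_times_ms)
# Requests that were faster than the average.
below_avg = [t for t in response_times_ms if t < avg]
# The same data with the single 920 ms request removed.
without_outlier = [t for t in response_times_ms if t != 920]

print(f"mean: {avg} ms")  # 142.5, which rounds to 143
print(f"faster than the mean: {len(below_avg)} of {len(response_times_ms)}")
print(f"mean without the 920 ms request: {mean(without_outlier):.1f} ms")
```

Nine of the ten requests are faster than the average, so the average describes almost nobody's actual experience.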
Use Percentiles (p50, p95, p99, p99.999...):
Let's take the same example:
1. Sort the response times.
2. For the 50th percentile, multiply 50/100 by the number of requests to get an index N (0.5 * 10 = 5), rounding up if the result is not a whole number.
3. Take the Nth value in the sorted list, counting from 1, as the 50th percentile, or p50.
Response times of 10 requests in ms: 60, 120, 30, 20, 40, 55, 25, 65, 90, 920
1. Sort it: 20, 25, 30, 40, 55, 60, 65, 90, 120, 920
2. Take 50/100th value: (50/100)*10 = 0.5*10 = 5
3. Take Nth index value: 5th index value is 55ms
50th percentile or p50: 55ms (taking 1 as the starting index)
The 50th percentile, or p50, is 55 ms, which means half of your requests return a response in 55 ms or less, and the other half take longer.
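The steps above can be sketched as a small function. This is the simple "nearest-rank" way of picking a percentile, with the index rounded up when it is not a whole number; the function name is hypothetical and other percentile definitions (for example, ones that interpolate between values) will give slightly different answers.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: sort, then take the ceil(p/100 * n)-th value (1-based)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]


response_times_ms = [60, 120, 30, 20, 40, 55, 25, 65, 90, 920]
print("p50:", percentile(response_times_ms, 50))  # 55, matching the worked example
print("p90:", percentile(response_times_ms, 90))  # 120
print("p99:", percentile(response_times_ms, 99))  # 920
```

With only ten samples, every percentile above p90 lands on the slowest request, so real measurements need far more data points before high percentiles mean much.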
Why do we need this Metric?
Latency Percentiles are often used in Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
For example, say you run a company XYZ and have a contract with your clients that defines the expected performance and availability of your service, like below:
The 99th percentile (p99) response time is at most 900 ms
The 95th percentile (p95) response time is at most 500 ms
These Metrics set expectations for the clients and allow customers to demand a refund if the SLA is not met by the company.
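A check against such a contract could look like the sketch below. It assumes the two thresholds are upper bounds in milliseconds, reuses the ten sample response times from earlier, and uses the nearest-rank percentile; the names SLA_MS and check_sla are hypothetical.

```python
import math

# Thresholds from the example contract above, in milliseconds (assumed to be upper bounds).
SLA_MS = {95: 500, 99: 900}


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list of values."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def check_sla(response_times_ms, sla_ms=SLA_MS):
    """Return {percentile: (observed_ms, limit_ms, met)} for each SLA entry."""
    report = {}
    for p, limit in sla_ms.items():
        observed = percentile(response_times_ms, p)
        report[p] = (observed, limit, observed <= limit)
    return report


samples = [60, 120, 30, 20, 40, 55, 25, 65, 90, 920]
report = check_sla(samples)
for p, (observed, limit, met) in report.items():
    status = "met" if met else "VIOLATED"
    print(f"p{p}: {observed} ms (limit {limit} ms) -> {status}")
```

On this tiny sample both p95 and p99 fall on the 920 ms request, so both limits are violated even though nine of the ten requests were fast.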
To figure out how bad your outliers are, you need to look at higher percentiles like p95, p99, and p99.9. High percentiles of response times are also called tail latencies, and they are important because they directly affect the user experience. For example, if the p99 response time is 1.5 seconds, then 99 out of 100 requests complete within 1.5 seconds, and 1 out of 100 takes longer.
Why Tail Latency is Important?
An Amazon study showed that a 100 ms increase in response time reduced sales by 1%. Similarly, a Google study showed that a half-second delay in load time reduced site traffic by 20%. According to Amazon, the customers with the slowest requests are often those who have the most data on their accounts because they have made many purchases. They are among the most valuable customers, so losing them because of slow requests is an expensive mistake for a company.
Optimizing Tail Latency:
Optimizing tail latency is not as easy as you might think. For example, Amazon deemed optimizing the 99.99th percentile (the slowest 1 in 10,000 requests) too expensive for too little benefit. Reducing response times at very high percentiles is difficult because they are easily affected by random events outside of your control.
Percentiles in Practice:
The approach described earlier is naive, because it keeps and sorts every response time. Some algorithms can calculate a good approximation of percentiles at minimal CPU and memory cost, such as
HdrHistogram https://github.com/HdrHistogram/HdrHistogram
Gil Tene, the GC and latency guru, created this library in Java, and it has since been ported to other languages like Erlang, Go, etc. It is open source ❤️
Also, if you haven't watched any of his talks about latency or GC, please watch them.
The code below takes response times in microseconds and measures the percentiles.
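As a dependency-free sketch of the underlying idea, the class below counts values in buckets that each grow about 1% wider than the previous one, so memory depends on the range of values rather than on the number of requests. This is not HdrHistogram's API or its actual bucketing scheme; the class name, method names, and the relative_error parameter are all hypothetical.

```python
import math
from collections import Counter


class BucketedHistogram:
    """Toy approximate-percentile recorder (illustrative only, not HdrHistogram)."""

    def __init__(self, relative_error=0.01):
        self.base = 1.0 + relative_error  # each bucket is ~1% wider than the last
        self.counts = Counter()           # bucket index -> number of samples
        self.total = 0

    def record(self, value_us):
        """Record one response time given in microseconds."""
        if value_us < 1:
            raise ValueError("value must be at least 1 microsecond")
        index = int(math.log(value_us, self.base))
        self.counts[index] += 1
        self.total += 1

    def value_at_percentile(self, p):
        """Return the upper edge of the bucket holding the p-th percentile."""
        if self.total == 0:
            raise ValueError("no values recorded")
        rank = math.ceil(p * self.total / 100)  # nearest-rank position
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen >= rank:
                return self.base ** (index + 1)
        raise AssertionError("unreachable: rank never exceeds total")


hist = BucketedHistogram()
for ms in [60, 120, 30, 20, 40, 55, 25, 65, 90, 920]:
    hist.record(ms * 1000)  # convert milliseconds to microseconds

for p in (50, 90, 99):
    print(f"p{p} is roughly {hist.value_at_percentile(p) / 1000:.1f} ms")
```

The reported values are within about 1% of the exact ones, which is usually a fair trade for not having to store and sort every single measurement.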
Do not re-invent the wheel:
In the real world, you will deploy your service on multiple servers across the globe for scalability and availability, so calculating these percentiles yourself is not practical. Plenty of existing technologies can do it for you.
By now, you should understand that latency is an important metric for a company, and that it deserves as much weight as your functional testing.