Calculating Percentile (and Median) in PostgreSQL

Modern versions of PostgreSQL (≥ 9.4) make it easy to calculate the percentile for a list of values at any percentage using the percentile_cont and percentile_disc ordered-set aggregation functions.

The two functions work very similarly, but differ in how they merge the final result:

  • percentile_disc will return a value from the input set closest to the percentile you request
  • percentile_cont will return an interpolated value between multiple values based on the distribution. You can think of this as being more accurate, but can return a fractional value between the two values from the input

In these examples we'll imagine we have a table called things with a column called value.

Calculating The Median

The median is generally the 50th percentile. You can calculate it with the following query:

select percentile_disc(0.5) within group (order by value)
from things

The within group syntax is called an ordered-set aggregate

For a percentile that lies between values from the input, use percentile_cont to get the interpolated result:

select percentile_const(0.5) within group (order by value)
from things

Calculating A Perentile

Any percentile can be calculated by providing a fractional amount from 0 to 1. The percentile function can be used multiple times within the query:

  percentile_disc(0.25) within group (order by value),
  percentile_disc(0.5) within group (order by value),
  percentile_disc(0.75) within group (order by value)
from things

within group can be used with other claues like group by. Here’s an example where different percentiles are calculated for numbers less than 100, and everything else.

  value < 100 as less_than_100,
  percentile_disc(0.75) within group (order by value)
from things
group by 1

group by 1 groups by the first expression in the select statement

Calculating All Percentiles

You can easily calculate all percentiles using generate_series. In this example, every percentile from 1 to 100 is returned as individual rows:

select k, percentile_disc(k) within group (order by value)
from things, generate_series(0.01, 1, 0.01) as k
group by k

generate_series generates table values between a starting an ending value, with an optional step. In this example, it returns 0.01, 0.02, etc..