zblog/110.md at 02f704abe13b44ecb440dc40c6380788d4760235

gibheer

zblog

2.9 KiB

Raw Blame History

+++ title = "range types in postgres" date = "2014-08-08T20:23:36+00:00" author = "Gibheer" draft = false +++

Nearly two years ago, Postgres got a very nice feature - range types. These are available for timestamps, numerics and integers. The problem is, that till now, I didn't have a good example what one could do with it. But today someone gave me a quest to use it!

His problem was, that they had id ranges used by customers and they weren't sure if they overlapped. The table looked something like this:

create table ranges(
  range_id serial primary key,
  lower_bound bigint not null,
  upper_bound bigint not null
);

With data like this

insert into ranges(lower_bound, upper_bound) values
  (120000, 120500), (123000, 123750), (123750, 124000);

They had something like 40,000 rows of that kind. So this was perfect for using range type queries.

To find out, if there was an overlap, I used the following query

select *
  from ranges r1
  join ranges r2
    on int8range(r1.lower_bound, r1.upper_bound, '[]') &&
       int8range(r2.lower_bound, r2.upper_bound, '[]')
 where r1.range_id != r2.range_id;

In this case, int8range takes two bigint values and converts it to a range. The string [] defines if the two values are included or excluded in the range. In this example, they are included. The output for this query looked like the following

 range_id │ lower_bound │ upper_bound │ range_id │ lower_bound │ upper_bound
──────────┼─────────────┼─────────────┼──────────┼─────────────┼─────────────
        2 │      123000 │      123750 │        3 │      123750 │      124000
        3 │      123750 │      124000 │        2 │      123000 │      123750
(2 rows)

Time: 0.317 ms

But as I said, the table had 40,000 values. That means the set to filter has a size of 1.6 billion entries. The computation of the query took a very long time, so I used another nice feature of postgres - transactions.

The idea was to add a temporary index to get the computation done in a much faster time (the index is also described in the documentation).

begin;
create index on ranges using gist(int8range(lower_bound, upper_bound, '[]'));
select *
  from ranges r1
  join ranges r2
    on int8range(r1.lower_bound, r1.upper_bound, '[]') &&
       int8range(r2.lower_bound, r2.upper_bound, '[]')
 where r1.range_id != r2.range_id;
rollback;

The overall runtime in my case was 300ms, so the writelock wasn't that much of a concern anymore.

2.9 KiB Raw Blame History

2.9 KiB

Raw Blame History