zblog/content/post/110.md

+++
title = "range types in postgres"
date = "2014-08-08T20:23:36+00:00"
author = "Gibheer"
draft = false
+++

Nearly two years ago, Postgres got a very nice feature - [range types][range-types]. These are available for timestamps, numerics and integers.
The problem is, that till now, I didn't have a good example what one could do with it. But today someone gave me a quest to use it!

His problem was, that they had id ranges used by customers and they weren't sure if they overlapped. The table looked something like this:

    create table ranges(
      range_id serial primary key,
      lower_bound bigint not null,
      upper_bound bigint not null
    );

With data like this

    insert into ranges(lower_bound, upper_bound) values
      (120000, 120500), (123000, 123750), (123750, 124000);

They had something like 40,000 rows of that kind. So this was perfect for using range type queries.

To find out, if there was an overlap, I used the following query

    select *
      from ranges r1
      join ranges r2
        on int8range(r1.lower_bound, r1.upper_bound, '[]') &&
           int8range(r2.lower_bound, r2.upper_bound, '[]')
     where r1.range_id != r2.range_id;

In this case, int8range takes two bigint values and converts it to a range. The string `[]` defines if the two values are included or excluded in the range. In this example, they are included.
The output for this query looked like the following

     range_id │ lower_bound │ upper_bound │ range_id │ lower_bound │ upper_bound
    ──────────┼─────────────┼─────────────┼──────────┼─────────────┼─────────────
            2 │      123000 │      123750 │        3 │      123750 │      124000
            3 │      123750 │      124000 │        2 │      123000 │      123750
    (2 rows)

    Time: 0.317 ms

But as I said, the table had 40,000 values. That means the set to filter has a size of 1.6 billion entries. The computation of the query took a very long time, so I used another nice feature of postgres - transactions.

The idea was to add a temporary index to get the computation done in a much faster time (the index is also described in the [documentation][index]).

    begin;
    create index on ranges using gist(int8range(lower_bound, upper_bound, '[]'));
    select *
      from ranges r1
      join ranges r2
        on int8range(r1.lower_bound, r1.upper_bound, '[]') &&
           int8range(r2.lower_bound, r2.upper_bound, '[]')
     where r1.range_id != r2.range_id;
    rollback;

The overall runtime in my case was 300ms, so the writelock wasn't that much of a concern anymore.

[range-types]: http://www.postgresql.org/docs/current/static/rangetypes.html
[index]: http://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-INDEXING
initial commit and import This is the initial import of all blog posts and a pretty workable theme. More work will follow. 2015-10-25 01:41:33 +02:00			`+++`
			`title = "range types in postgres"`
			`date = "2014-08-08T20:23:36+00:00"`
			`author = "Gibheer"`
			`draft = false`
			`+++`

			`Nearly two years ago, Postgres got a very nice feature - [range types][range-types]. These are available for timestamps, numerics and integers.`
			`The problem is, that till now, I didn't have a good example what one could do with it. But today someone gave me a quest to use it!`

			`His problem was, that they had id ranges used by customers and they weren't sure if they overlapped. The table looked something like this:`

			`create table ranges(`
			`range_id serial primary key,`
			`lower_bound bigint not null,`
			`upper_bound bigint not null`
			`);`

			`With data like this`

			`insert into ranges(lower_bound, upper_bound) values`
			`(120000, 120500), (123000, 123750), (123750, 124000);`

			`They had something like 40,000 rows of that kind. So this was perfect for using range type queries.`

			`To find out, if there was an overlap, I used the following query`

			`select *`
			`from ranges r1`
			`join ranges r2`
			`on int8range(r1.lower_bound, r1.upper_bound, '[]') &&`
			`int8range(r2.lower_bound, r2.upper_bound, '[]')`
			`where r1.range_id != r2.range_id;`

			In this case, int8range takes two bigint values and converts it to a range. The string `[]` defines if the two values are included or excluded in the range. In this example, they are included.
			`The output for this query looked like the following`

			`range_id │ lower_bound │ upper_bound │ range_id │ lower_bound │ upper_bound`
			`──────────┼─────────────┼─────────────┼──────────┼─────────────┼─────────────`
			`2 │ 123000 │ 123750 │ 3 │ 123750 │ 124000`
			`3 │ 123750 │ 124000 │ 2 │ 123000 │ 123750`
			`(2 rows)`

			`Time: 0.317 ms`

			`But as I said, the table had 40,000 values. That means the set to filter has a size of 1.6 billion entries. The computation of the query took a very long time, so I used another nice feature of postgres - transactions.`

			`The idea was to add a temporary index to get the computation done in a much faster time (the index is also described in the [documentation][index]).`

			`begin;`
			`create index on ranges using gist(int8range(lower_bound, upper_bound, '[]'));`
			`select *`
			`from ranges r1`
			`join ranges r2`
			`on int8range(r1.lower_bound, r1.upper_bound, '[]') &&`
			`int8range(r2.lower_bound, r2.upper_bound, '[]')`
			`where r1.range_id != r2.range_id;`
			`rollback;`

			`The overall runtime in my case was 300ms, so the writelock wasn't that much of a concern anymore.`

			`[range-types]: http://www.postgresql.org/docs/current/static/rangetypes.html`
			`[index]: http://www.postgresql.org/docs/current/static/rangetypes.html#RANGETYPES-INDEXING`