The full script for this post can be downloaded here.

The following post was written primarily in Postgres 14 and tested in both Postgres and MySQL.

While the data generation and use of EXCEPT are not supported natively in MySQL, the principles of row comparisons are equivalent.

Setup

To demonstrate how a row constructor comparison can be confusing, let’s create some dummy data with two columns of integers:
--create a table to hold some random integers
drop table if exists random_numbers;
create temp table random_numbers (a int, b int);

--create 100 rows of random integers
--between 0 and 50
insert into random_numbers (a, b)
select floor(random()*50), ceiling(random()*50)
from generate_series(1,10000);

 

Comparisons

PostgreSQL allows us to compare two sets of values using what is known as a row constructor comparison. 

A row can be constructed with parentheses, like this:

--with columns in a query
(a_column, b_column)

--or with literals
(10, 40)

We can create complex rows in this way and then compare them with each other.

For example:

(a_column, b_column) = (10, 40)

This can be incredibly powerful and is a useful shortcut for complex logic. 

However, the usual Spiderman caveat applies. 

Comparing two row constructors is not always intuitive. 

= operator

Consider this:

select 'This: "where (a,b) = (10,40)"' as message, count(*) 
from (
    select a, b
    from random_numbers 
    where (a,b) = (10,40)
) x
union all 
select 'Is the same as: "where a = 10 and b = 40"' as message, count(*) 
from (
    select a, b
    from random_numbers 
    where a = 10 and b = 40
) x
union all 
select 'Subtracting one from the other using "except" gives us' as message, count(*)
from (
    select a, b
    from random_numbers 
    where a = 10 and b = 40
    except
    select a, b
    from random_numbers 
    where (a,b) = (10,40)
) x;


Which for the random data created when I wrote this post, returns this:

This is straightforward enough. Put simply: is the left side equal to the right side?

Intuitively, we expect a comparison between each “column” from the left side with the corresponding “column” of the right side.  For the equals operator, it works how we expect.

<> operator

Consider this:

select 'This: "where (a,b) <> (10,40)"' as message, count(*) 
from (
    select a, b
    from random_numbers 
    where (a,b) <> (10,40)
) x
union all 
select 'Is not the same as: "where a <> 10 and b <> 40"' as message, count(*) 
from (
    select a, b
    from random_numbers 
    where a <> 10 and b <> 40
) x
union all 
select 'It''s the same as: "where a <> 10 or (a = 10 and b <> 40)"' as message, count(*) 
from (
    select a, b
    from random_numbers 
    where a <> 10 or (a = 10 and b <> 40)
) x
union all 
select 'Subtracting row 1 from row 3 using "except" gives us' as message, count(*)
from (
    select a, b
    from random_numbers 
    where a <> 10 or (a = 10 and b <> 40)
    except
    select a, b
    from random_numbers 
    where (a,b) <> (10,40)
) x;

Which gives us:

This may not appear as intuitive. But think about it this way: as long as one of the two conditions is met, then we consider the entire condition met.  

When comparing two rows with the <> operator, if even one column is different, then the rows are different

We can either have a <> 10 and b = anything, or we can have a = 10 and b <> 40. Since both of those conditions together are equivalent to b <> 40 and a = anything, it doesn’t need to be specified. 

Pretty straightforward when you think about if for a second.

Things are less intuitive when we move to other operators.

< operator

Consider:

select 'This: "where (a,b) < (10,40)"' as where_clause, count(*) 
from (
    select a, b
    from random_numbers 
    where (a,b) < (10,40)
) x
union all 
select 'Is not the same as: "where a < 10 and b < 40"' as where_clause, count(*) 
from (
    select a, b
    from random_numbers 
    where a < 10 and b < 40
) x
union all 
select 'It''s the same as: "where a < 10 or (a = 10 and b < 40)"' as where_clause, count(*) 
from (
    select a, b
    from random_numbers 
    where a < 10 or (a = 10 and b < 40)
) x
union all 
select 'Subtracting row 1 from row 3 using "except" gives us' as message, count(*)
from (
    select a, b
    from random_numbers 
    where a < 10 or (a = 10 and b < 40)
    except
    select a, b
    from random_numbers 
    where (a,b) < (10,40)
) x;

We get this:

According to the documentation:

For the <, <=, > and >= cases, the row elements are compared left-to-right, stopping as soon as an unequal or null pair of elements is found. If either of this pair of elements is null, the result of the row comparison is unknown (null); otherwise comparison of this pair of elements determines the result

So, we are comparing one row with another, column-wise from left to right.

  1. Is a less than 10? If yes, include the row in the output. If not:
  2. Is a equal to 10 and b less than 40? If yes, include the row in the output. 

The notable omission here is where b is less than 40 but a is greater than 10. Why? 

Well, read the quote above again. 

Specifically:

stopping as soon as an unequal or null pair of elements is found

The implication here is that each subsequent comparison assumes equality in the prior comparison(s). So, the “2nd” comparison of b with 40 assumes equality between a and 10. 

And if that wasn’t confusing enough.

<= operator

Consider:

select 'This: "where (a,b) <= (10,40)"' as where_clause, count(*) 
from (
    select a, b
    from random_numbers 
    where (a,b) <= (10,40)
) x
union all 
select 'Is not the same as: "where a <= 10 and b <= 40"' as where_clause, count(*) 
from (
    select a, b
    from random_numbers 
    where a <= 10 and b <= 40
) x
union all 
select 'It''s the same as: "where a < 10 or (a = 10 and b <= 40)"' as where_clause, count(*) 
from (
    select a, b
    from random_numbers 
    where a < 10 or (a = 10 and b <= 40)
) x
union all 
select 'Subtracting row 1 from row 3 using "except" gives us' as message, count(*)
from (
    select a, b
    from random_numbers 
    where a < 10 or (a = 10 and b <= 40)
    except
    select a, b
    from random_numbers 
    where (a,b) <= (10,40)
) x;

That code gives us:

  • If a is less than 10, then include the row in the output, otherwise:
  • If a is equal to 10 and b is less than or equal to 40, then include the row in the output

We might have expected the first condition to be:

  • If a is less than or equal to 10

But again:

stopping as soon as an unequal or null pair of elements is found

So, if the first comparison was actually a<=10, those rows where a=10 would not cause the evaluation to stop, the implication being that it would be including rows where a < 10 and b = anything as well as a = 10 and b = anything, after which rows where a = 10 and b <= 40 would be evaluated. This latter evaluation is a subset of (a = 10 and b = anything), and so becomes entirely redundant. 

This behavior is similarly expressed with >= and >, but I won’t elaborate here, though they are included in the example script.

In summary

PostgreSQL allows us to use row constructors to compare sets of values.

The operators supported for such comparisons are =, <>, <, <=, > or >=

= and <> work as we might expect. 

The other operators are evaluated from left-to-right and stop evaluating when the first unequal non-null comparison is encountered. Subsequent comparisons assume equality of the prior comparisons.

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>