Array Functions in PostgreSQL


In the last post, we looked into how to write a query to return array values. We ended up with the following:

SELECT title, array_agg(comments.id) AS comment_ids
FROM posts INNER JOIN comments ON posts.id = comments.post_id
GROUP BY posts.title;
+--------+-------------+
| title  | comment_ids |
+--------+-------------+
| Post 1 | {1,2}       |
| Post 2 | {3}         |
| Post 3 | {4,5}       |
| Post 4 | {6}         |
| Post 5 | {7,8,9}     |
+--------+-------------+

How would we go about filtering these results? I want to focus on one specific use case. How would I find all post titles having a comment with the id 3? We can do this without aggregating the comment ids:

SELECT DISTINCT title
FROM posts INNER JOIN comments ON posts.id = comments.post_id
WHERE comments.id = 8
+--------+
| title  |
+--------+
| Post 5 |
+--------+

Looks good. How about selecting post titles whose comment ids are a subset of [4, 5]? We’d could write the following:

SELECT DISTINCT title
FROM posts INNER JOIN comments ON posts.id = comments.post_id
WHERE comments.id in (4,5)
+--------+
| title  |
+--------+
| Post 3 |
+--------+

And for our table values this would work as expected. But there is a problem here. Let’s say we wanted all posts whose comment ids are a subset of [1,3]. Our above query would return “Post 1” and “Post 2”, but that would be incorrect: we only get ‘Post 2’ as the comment ids for ‘Post 1’ are not a subset.

Postgres comes with a number of Array Functions and Operators, one of which may just help us here: the ‘contains’ function, @>. We need to use array_agg again and filter using @>. Our first naive attempt may be as follows:

SELECT title, array_agg(comments.id) AS comment_ids
FROM posts INNER JOIN comments ON posts.id = comments.post_id
WHERE (1, 3) @> comment_ids
GROUP BY posts.title;
ERROR:  column "comment_ids" does not exist
LINE 3: WHERE (1, 2, 3) @> comment_ids

So how come that didn’t work? Well, apparently aliases cannot be used in WHERE conditions (but can in ORDER BY). Lets try the following:

SELECT title
FROM posts INNER JOIN comments ON posts.id = comments.post_id
WHERE (1, 3) @> array_agg(comments.id)
GROUP BY posts.title;
ERROR:  aggregate functions are not allowed in WHERE
LINE 3: WHERE (1, 3) @> array_agg(comments.id)

Hmm. That didn’t work either. If we want to query aggregated data, as is the case when using GROUP BY, we need to use HAVING instead of where.

SELECT title
FROM posts INNER JOIN comments ON posts.id = comments.post_id
HAVING (1, 3) @> array_agg(comments.id)
GROUP BY posts.title;
ERROR:  syntax error at or near "GROUP"
LINE 4: GROUP BY posts.title;

Bah! Ok - HAVING must come after GROUP BY.

SELECT title, array_agg(comments.id) AS comment_ids
FROM posts INNER JOIN comments ON posts.id = comments.post_id
GROUP BY posts.title
HAVING (1, 3) @> array_agg(comments.id);
ERROR:  operator does not exist: record @> integer[]
LINE 3: GROUP BY posts.title HAVING (1,2,3) @> array_agg(comments.id...

Nggh! What does that even mean? Well, to declare an array in postgres, we need to use declare the value as being an array. There are two ways to do this: either ARRAY[1,3] or '{1,3}' will suffice:

SELECT title
FROM posts INNER JOIN comments ON posts.id = comments.post_id
GROUP BY posts.title
HAVING ARRAY[1,3] @> array_agg(comments.id);
+--------+
| title  |
+--------+
| Post 3 |
+--------+

Striker!