Aggregate Functions in PostgreSQL


So you have two tables: posts and comments. Posts can have many comments:

Posts              Comments
+----+---------+   +----+---------+
| id | title   |   | id | post_id |
+----+---------+   |----|---------|
|  1 | Post 1  |   |  1 |      1  |
|  2 | Post 2  |   |  2 |      1  |
|  3 | Post 3  |   |  3 |      2  |
|  4 | Post 4  |   |  4 |      3  |
|  5 | Post 5  |   |  5 |      3  |
+----+---------+   |  6 |      4  |
                   |  7 |      5  |
                   |  8 |      5  |
                   |  9 |      5  |
                   +----+---------+

How do you write a query which will return a list of all posts with title and an array of comment ids?

SELECT title, comments.id AS comment_ids
FROM posts INNER JOIN comments ON posts.id = comments.post_id;

Should work right? Wrong:

+--------+-------------+
| title  | comment_ids |
+--------+-------------+
| Post 1 |           1 |
| Post 1 |           2 |
| Post 2 |           3 |
| Post 3 |           4 |
| Post 3 |           5 |
| Post 4 |           6 |
| Post 5 |           7 |
| Post 5 |           8 |
| Post 5 |           9 |
+--------+-------------+

How can we get a list of distinct post titles with their comment ids? Maybe with distinct?

SELECT DISTINCT ON (title) title, comments.id AS comment_ids
FROM posts INNER JOIN comments ON posts.id = comments.post_id;

Nope.

+--------+-------------+
| title  | comment_ids |
+--------+-------------+
| Post 1 |           1 |
| Post 2 |           3 |
| Post 3 |           4 |
| Post 4 |           6 |
| Post 5 |           7 |
+--------+-------------+

Bah. There must be a way. We want to collect a list of comment ids for each post and pack them in our comments_ids column, as an array. We want to collect, or aggregate, an array of comment ids.

Array Aggregation

From the PostgreSQL manual:

Aggregate functions compute a single result from a set of input values. [source]

You’ve probably used sum or count in queries before. They are also aggregate functions: they take a set of values and run a computation on them. Any there are many more. The one we need for our particular use case is array_agg.

Behold:

SELECT title, array_agg(comments.id) AS comment_ids
FROM posts INNER JOIN comments ON posts.id = comments.post_id
GROUP BY posts.title;
+--------+-------------+
| title  | comment_ids |
+--------+-------------+
| Post 1 | {1,2}       |
| Post 2 | {3}         |
| Post 3 | {4,5}       |
| Post 4 | {6}         |
| Post 5 | {7,8,9}     |
+--------+-------------+

That’s more like it :)

Note the addition of GROUP BY in the last query. What happens if we omit this clause?

ERROR:  column "posts.title" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT title, array_agg(comments.id) AS comment_ids

This makes sense of course: array functions can only work with an array of results.