So you have two tables: posts and comments. Posts can have many comments:
Posts Comments
+----+---------+ +----+---------+
| id | title | | id | post_id |
+----+---------+ |----|---------|
| 1 | Post 1 | | 1 | 1 |
| 2 | Post 2 | | 2 | 1 |
| 3 | Post 3 | | 3 | 2 |
| 4 | Post 4 | | 4 | 3 |
| 5 | Post 5 | | 5 | 3 |
+----+---------+ | 6 | 4 |
| 7 | 5 |
| 8 | 5 |
| 9 | 5 |
+----+---------+
How do you write a query which will return a list of all posts with title and an array of comment ids?
Should work right? Wrong:
+--------+-------------+
| title | comment_ids |
+--------+-------------+
| Post 1 | 1 |
| Post 1 | 2 |
| Post 2 | 3 |
| Post 3 | 4 |
| Post 3 | 5 |
| Post 4 | 6 |
| Post 5 | 7 |
| Post 5 | 8 |
| Post 5 | 9 |
+--------+-------------+
How can we get a list of distinct post titles with their comment ids? Maybe with distinct?
Nope.
+--------+-------------+
| title | comment_ids |
+--------+-------------+
| Post 1 | 1 |
| Post 2 | 3 |
| Post 3 | 4 |
| Post 4 | 6 |
| Post 5 | 7 |
+--------+-------------+
Bah. There must be a way. We want to collect a list of comment ids for each post and pack them in our comments_ids column, as an array. We want to collect, or aggregate, an array of comment ids.
Array Aggregation
From the PostgreSQL manual:
Aggregate functions compute a single result from a set of input values. [source]
You’ve probably used sum
or count
in queries before. They are also aggregate functions: they take a set of values and run a computation on them. Any there are many more. The one we need for our particular use case is array_agg
.
Behold:
+--------+-------------+
| title | comment_ids |
+--------+-------------+
| Post 1 | {1,2} |
| Post 2 | {3} |
| Post 3 | {4,5} |
| Post 4 | {6} |
| Post 5 | {7,8,9} |
+--------+-------------+
That’s more like it :)
Note the addition of GROUP BY
in the last query. What happens if we omit this clause?
ERROR: column "posts.title" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT title, array_agg(comments.id) AS comment_ids
This makes sense of course: array functions can only work with an array of results.