Upserting in Postges: It's not just all or nothing
Upserting in Postgres lets you insert a new value or update an existing value in a single atomic statement. This avoids needing two separate statements, read and update/insert, wrapped in a transaction.
If you want to only update particular fields, and not all fields in the row, you can do that to.
Let’s explore upserting into a table tracking a location’s high temperatures per day.
CREATE TABLE IF NOT EXISTS temp_agg
(
id integer NOT NULL,
day date,
high NUMERIC(5, 2),
CONSTRAINT id_day UNIQUE (id, day)
)
Every hour our application polls for the temperature at a set of location endpoints and inserts the current temperature. If a location endpoint is down, unresponsive, or returning “bad” data, it will be skipped and re-polled in the next hourly run.
Our first attempt at writing the temperature INSERT
statement starts out like:
INSERT INTO temp_agg
(id, day, high)
VALUES
(1, DATE('2022-06-01'), 15.00),
(2, DATE('2022-06-01'), 15.50)
This runs and the table now looks like:
id | high | day |
---|---|---|
1 | 15.00 | 2022-06-01 |
2 | 15.50 | 2022-06-01 |
But the next time the application tries to insert data for 2022-06-01
the INSERT statement returns the error duplicate key value violates unique constraint "id_day"
because inserting the data would conflict with the constraint that each row have a unique (id, day) pair.
On Conflict…
The ON CONFLICT clause acts a bit like a row based “catch” of a “try catch…” statement. try
to insert the row, catch and handle any conflict
. In the ON CONFLICT
clause you declare what conflict you are interested in and how you want to deal with it at the row level.
There are two options for using ON CONFLICT, DO NOTHING
and DO UPDATE
ON CONFLICT DO NOTHING
ON CONFLICT ... DO NOTHING
inserts each row that doesn’t have a conflict and skips each row that does.
The following SQL has conflicts on ids 1 and 2 on the day ‘2022-06-01’. It ignores those two rows and inserts the row with id 3.
INSERT INTO temp_agg
(id, high, day)
VALUES
(1, 20.00, DATE('2022-06-01')),
(2, 20.00, DATE('2022-06-01')),
(3, 21.00, DATE('2022-06-01'))
ON CONFLICT (id, day)
DO NOTHING
Result:
id | high | day |
---|---|---|
1 | 15.00 | 2022-06-01 |
2 | 15.50 | 2022-06-01 |
3 | 21.00 | 2022-06-01 |
ON CONFLICT DO UPDATE
ON CONFLICT ... DO UPDATE
inserts each row that doesn’t have a conflict and updates each row that does.
In the DO UPDATE
statement you have access to both the existing row data and the new row data, though referencing them is not quite obvious. You reference them like so:
Existing data => <table_name>.<field>
ie temp_agg.high
If you have aliased the table, for example INSERT INTO temp_agg as t
, then you will reference the existing data as t.high
New data => EXCLUDED.<field>
ie EXCLUDED.high
If your insert statement already references a table named excluded
you need to alias it to avoid any naming conflicts.
The result we’re looking for is to update the high temperature for locations 1 & 2 on day 2022-06-01
but location 3 is unchanged.
The following SQL statements achieve this using different semantics
In the first example, the conflict rows are only updated if the new (EXCLUDED
) high temperature is greater than the existing high temperature.
INSERT INTO temp_agg
(id, high, day)
VALUES
(1, 20.00, DATE('2022-06-01')),
(2, 20.00, DATE('2022-06-01')),
(3, 18.00, DATE('2022-06-01'))
ON CONFLICT (id, day)
DO UPDATE SET (id, high, day) = (temp_agg.id, EXCLUDED.high, temp_agg.day)
WHERE EXCLUDED.high > temp_agg.high
In the second example the GREATEST
function is used to set the high temperature instead of filtering with a WHERE
clause.
INSERT INTO temp_agg
(id, high, day)
VALUES
(1, 20.00, DATE('2022-06-01')),
(2, 20.00, DATE('2022-06-01')),
(3, 18.00, DATE('2022-06-01'))
ON CONFLICT (id, day)
DO UPDATE SET (id, high, day) = (temp_agg.id, GREATEST(EXCLUDED.high, temp_agg.high), temp_agg.day)
The result is that the high temperature for locations 1 & 2 on day 2022-06-01
have been updated and the high temp for location 3 has stayed the same.
Result:
id | high | day |
---|---|---|
1 | 20.00 | 2022-06-01 |
2 | 20.00 | 2022-06-01 |
3 | 21.00 | 2022-06-01 |