Upserting in Postges: It's not just all or nothing
Upserting in Postgres lets you insert a new value or update an existing value in a single atomic statement. This avoids needing two separate statements, read and update/insert, wrapped in a transaction.
If you want to only update particular fields, and not all fields in the row, you can do that to.
Let’s explore upserting into a table tracking a location’s high temperatures per day.
CREATE TABLE IF NOT EXISTS temp_agg ( id integer NOT NULL, day date, high NUMERIC(5, 2), CONSTRAINT id_day UNIQUE (id, day) )
Every hour our application polls for the temperature at a set of location endpoints and inserts the current temperature. If a location endpoint is down, unresponsive, or returning “bad” data, it will be skipped and re-polled in the next hourly run.
Our first attempt at writing the temperature
INSERT statement starts out like:
INSERT INTO temp_agg (id, day, high) VALUES (1, DATE('2022-06-01'), 15.00), (2, DATE('2022-06-01'), 15.50)
This runs and the table now looks like:
But the next time the application tries to insert data for
2022-06-01 the INSERT statement returns the error
duplicate key value violates unique constraint "id_day" because inserting the data would conflict with the constraint that each row have a unique (id, day) pair.
The ON CONFLICT clause acts a bit like a row based “catch” of a “try catch…” statement.
try to insert the row,
catch and handle any conflict. In the
ON CONFLICT clause you declare what conflict you are interested in and how you want to deal with it at the row level.
There are two options for using ON CONFLICT,
DO NOTHING and
ON CONFLICT DO NOTHING
ON CONFLICT ... DO NOTHING inserts each row that doesn’t have a conflict and skips each row that does.
The following SQL has conflicts on ids 1 and 2 on the day ‘2022-06-01’. It ignores those two rows and inserts the row with id 3.
INSERT INTO temp_agg (id, high, day) VALUES (1, 20.00, DATE('2022-06-01')), (2, 20.00, DATE('2022-06-01')), (3, 21.00, DATE('2022-06-01')) ON CONFLICT (id, day) DO NOTHING
ON CONFLICT DO UPDATE
ON CONFLICT ... DO UPDATE inserts each row that doesn’t have a conflict and updates each row that does.
DO UPDATE statement you have access to both the existing row data and the new row data, though referencing them is not quite obvious. You reference them like so:
Existing data =>
If you have aliased the table, for example
INSERT INTO temp_agg as t, then you will reference the existing data as
New data =>
If your insert statement already references a table named
excluded you need to alias it to avoid any naming conflicts.
The result we’re looking for is to update the high temperature for locations 1 & 2 on day
2022-06-01 but location 3 is unchanged.
The following SQL statements achieve this using different semantics
In the first example, the conflict rows are only updated if the new (
EXCLUDED) high temperature is greater than the existing high temperature.
INSERT INTO temp_agg (id, high, day) VALUES (1, 20.00, DATE('2022-06-01')), (2, 20.00, DATE('2022-06-01')), (3, 18.00, DATE('2022-06-01')) ON CONFLICT (id, day) DO UPDATE SET (id, high, day) = (temp_agg.id, EXCLUDED.high, temp_agg.day) WHERE EXCLUDED.high > temp_agg.high
In the second example the
GREATEST function is used to set the high temperature instead of filtering with a
INSERT INTO temp_agg (id, high, day) VALUES (1, 20.00, DATE('2022-06-01')), (2, 20.00, DATE('2022-06-01')), (3, 18.00, DATE('2022-06-01')) ON CONFLICT (id, day) DO UPDATE SET (id, high, day) = (temp_agg.id, GREATEST(EXCLUDED.high, temp_agg.high), temp_agg.day)
The result is that the high temperature for locations 1 & 2 on day
2022-06-01 have been updated and the high temp for location 3 has stayed the same.