Upserting in Postgres lets you insert a new value or update an existing value in a single atomic statement. This avoids needing two separate statements, read and update/insert, wrapped in a transaction.

If you want to only update particular fields, and not all fields in the row, you can do that to.

Let’s explore upserting into a table tracking a location’s high temperatures per day.

CREATE TABLE IF NOT EXISTS temp_agg
(
    id integer NOT NULL,
    day date,
    high NUMERIC(5, 2),
    CONSTRAINT id_day UNIQUE (id, day)
)

Every hour our application polls for the temperature at a set of location endpoints and inserts the current temperature. If a location endpoint is down, unresponsive, or returning “bad” data, it will be skipped and re-polled in the next hourly run.

Our first attempt at writing the temperature INSERT statement starts out like:

INSERT INTO temp_agg 
(id, day, high)
VALUES
(1, DATE('2022-06-01'), 15.00),
(2, DATE('2022-06-01'), 15.50)

This runs and the table now looks like:

id high day
1 15.00 2022-06-01
2 15.50 2022-06-01

But the next time the application tries to insert data for 2022-06-01 the INSERT statement returns the error duplicate key value violates unique constraint "id_day" because inserting the data would conflict with the constraint that each row have a unique (id, day) pair.

On Conflict…

The ON CONFLICT clause acts a bit like a row based “catch” of a “try catch…” statement. try to insert the row, catch and handle any conflict. In the ON CONFLICT clause you declare what conflict you are interested in and how you want to deal with it at the row level.

There are two options for using ON CONFLICT, DO NOTHING and DO UPDATE

ON CONFLICT DO NOTHING

ON CONFLICT ... DO NOTHING inserts each row that doesn’t have a conflict and skips each row that does.

The following SQL has conflicts on ids 1 and 2 on the day ‘2022-06-01’. It ignores those two rows and inserts the row with id 3.

INSERT INTO temp_agg 
(id, high, day)
VALUES
(1, 20.00, DATE('2022-06-01')),
(2, 20.00, DATE('2022-06-01')),
(3, 21.00, DATE('2022-06-01'))
ON CONFLICT (id, day)
DO NOTHING

Result:

id high day
1 15.00 2022-06-01
2 15.50 2022-06-01
3 21.00 2022-06-01

ON CONFLICT DO UPDATE

ON CONFLICT ... DO UPDATE inserts each row that doesn’t have a conflict and updates each row that does.

In the DO UPDATE statement you have access to both the existing row data and the new row data, though referencing them is not quite obvious. You reference them like so:

Existing data => <table_name>.<field> ie temp_agg.high

If you have aliased the table, for example INSERT INTO temp_agg as t, then you will reference the existing data as t.high

New data => EXCLUDED.<field> ie EXCLUDED.high

If your insert statement already references a table named excluded you need to alias it to avoid any naming conflicts.

The result we’re looking for is to update the high temperature for locations 1 & 2 on day 2022-06-01 but location 3 is unchanged.

The following SQL statements achieve this using different semantics

In the first example, the conflict rows are only updated if the new (EXCLUDED) high temperature is greater than the existing high temperature.

INSERT INTO temp_agg 
(id, high, day)
VALUES
(1, 20.00, DATE('2022-06-01')),
(2, 20.00, DATE('2022-06-01')),
(3, 18.00, DATE('2022-06-01'))
ON CONFLICT (id, day)
DO UPDATE SET (id, high, day) = (temp_agg.id, EXCLUDED.high, temp_agg.day) 
WHERE EXCLUDED.high > temp_agg.high

In the second example the GREATEST function is used to set the high temperature instead of filtering with a WHERE clause.

INSERT INTO temp_agg 
(id, high, day)
VALUES
(1, 20.00, DATE('2022-06-01')),
(2, 20.00, DATE('2022-06-01')),
(3, 18.00, DATE('2022-06-01'))
ON CONFLICT (id, day)
DO UPDATE SET (id, high, day) = (temp_agg.id, GREATEST(EXCLUDED.high, temp_agg.high), temp_agg.day) 

The result is that the high temperature for locations 1 & 2 on day 2022-06-01 have been updated and the high temp for location 3 has stayed the same.

Result:

id high day
1 20.00 2022-06-01
2 20.00 2022-06-01
3 21.00 2022-06-01