Friday, April 18, 2014

Upsert : Update a row if it exists else insert a new row

Some applications need an "upsert" operation: update a row if it exists else insert a new row. The following is a summary of solutions I found online. 

Suppose we have the following table in a SQL Server database that is mapped to a C# class using Entity Framework. 

CREATE TABLE dbo.Foo
(
ID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY,
Bar int NOT NULL
);
GO

The following C# code is a naive implementation of the upsert logic: 

int id =3; 
int bar = 199;
Foo foo = this.dbContext.Foos.find(id);
if (foo == null) 
{
foo = new Foo 

ID = id,
Bar = bar
};

this.dbContext.Foos.Add(foo);

else 
{
foo.Bar = bar;
}


this.dbContext.SaveChanges();

The above code doesn't work correctly when multiple users run the same code simultaneously. A duplicated key error happens when more than one users  try to insert different rows with the same key value.  A possible solution is to catch the duplication key error and try the above operation again. The benefit is that we handle everything in C# code. The disadvantage is the added complexity because of the error handling and try logic. As described below, it has race conditions that don't work correctly for non-idempotent update. 

Adding C# synchronization such as a lock in a static class variable helps. However, the solution breaks when web garden is enabled in the hosting ASP.NET application pool. Each worker process has its own copy of a static class variable. To solve the problem, one needs to use named mutex to enable inter-processes synchronization. The named mutex solution fails when we run multiple Web applications in different server. Another cons is that  the above C# code requires two round trips to the back end database. 

The right solution is to move the upsert operation to the back end database engine and add appropriate database locks. The tricky thing is to find the right "locks". Though work correctly, using a table level exclusive lock hurts performance. This Conditional INSERT/UPDATE blog gives a good solution:  

CREATE PROCEDURE dbo.Insert_Or_Update_Foo
      @ID int,
      @Bar int
AS

SET NOCOUNT, XACT_ABORT ON

BEGIN TRAN

IF EXISTS(SELECT * FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK) WHERE ID = @ID)
  BEGIN
      UPDATE dbo.Foo
      SET bar = @bar
      WHERE ID = @ID
END
ELSE
BEGIN
      INSERT INTO dbo.Foo (ID, Bar)
      VALUES (@ID, @Bar)
END

COMMIT

RETURN @@ERROR

The key point in the above solution is setting the UPDLOCK and HOLDLOCK locks to avoid race conditions. At any time, only one instance of the stored procedure is allowed to run. The blog author also describes another solution using the SQL merge command:

CREATE PROCEDURE dbo.Merge_Foo
     @ID int,
     @Bar int
AS 
SET NOCOUNT, XACT_ABORT ON;  
 
MERGE dbo.Foo WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID, @Bar AS Bar) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE SET f.Bar = new_foo.Bar
WHEN NOT MATCHED THEN
INSERT (ID, Bar)
VALUES (new_foo.ID, new_foo.Bar);

RETURN @@ERROR;

GO

My original solution was to use a similar merge command but didn't set the HOLDLOCK option. As the blog author pointed out: without HOLDLOCK, the merge command releases its UPDLOCK after the existence checking and opens the door for multiple concurrent inserts. 

Another possible solution that doesn't use an explicit lock is to insert first, check error code for duplicated key error and update in the second step.

BEGIN TRY
 INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE dbo.Foo  SET bar = @bar  WHERE ID = @ID
END CATCH

This stackoverflow  answer discussed this solution and its potential performance issues of error catching. Except the first insert, all following updates incur the expense of error catching. This solution only works in a write-once-read-many situation that has very few updates. If there are many update operations, it is better to do an extra update first. 

UPDATE dbo.Foo  SET bar = @bar  WHERE ID = @ID
IF @@ROWCOUNT = 0
BEGIN 
    BEGIN TRY
INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
    END TRY
    BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE dbo.Foo  SET bar = @bar  WHERE ID = @ID
    END CATCH
END    

However, if the update is not an idempotent operation such as the above simple value setting, the above TRY-CATCH solutions (including both T-SQL stored procedure and the C# error handling solution) still have race conditions. For example, if the update is "SET bar = @bar + 1", those solutions don't work correctly. As a general upsert solution, the two stored procedures that use UPDLOCK and HOLDLOCK don't have a race condition and work correctly. 

1 comment:

  1. Wow,I think this is a very good solution to this answer here and I genuinely feel you have taken a lot of effort by providing answer with coding.

    SSIS Upsert

    ReplyDelete