Some applications need an "upsert" operation: update a row if it exists else insert a new row. The following is a summary of solutions I found online. 
Suppose we have the following table in a SQL Server database that is mapped to a C# class using Entity Framework. 
CREATE TABLE dbo.Foo
(
     ID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY,
     Bar int NOT NULL
);
GO
The following C# code is a naive implementation of the upsert logic: 
int id =3; 
int bar = 199;
Foo foo = this.dbContext.Foos.find(id);
if (foo == null) 
{
     foo = new Foo 
     { 
          ID = id,
          Bar = bar
     };
 
     this.dbContext.Foos.Add(foo);
} 
else 
{
     foo.Bar = bar;
}
this.dbContext.SaveChanges();
The above code doesn't work correctly when multiple users run the same code simultaneously. A duplicated key error happens when more than one users  try to insert different rows with the same key value.  A possible solution is to catch the duplication key error and try the above operation again. The benefit is that we handle everything in C# code. The disadvantage is the added complexity because of the error handling and try logic. As described below, it has race conditions that don't work correctly for non-idempotent update. 
Adding C# synchronization such as a lock in a static class variable helps. However, the solution breaks when web garden is enabled in the hosting ASP.NET application pool. Each worker process has its own copy of a static class variable. To solve the problem, one needs to use named mutex to enable inter-processes synchronization. The named mutex solution fails when we run multiple Web applications in different server. Another cons is that  the above C# code requires two round trips to the back end database. 
The right solution is to move the upsert operation to the back end database engine and add appropriate database locks. The tricky thing is to find the right "locks". Though work correctly, using a table level exclusive lock hurts performance. This Conditional INSERT/UPDATE blog gives a good solution:  
CREATE PROCEDURE dbo.Insert_Or_Update_Foo
      @ID int,
      @Bar int
AS
SET NOCOUNT, XACT_ABORT ON
BEGIN TRAN
IF EXISTS(SELECT * FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK) WHERE ID = @ID)
    BEGIN
        UPDATE dbo.Foo
        SET bar = @bar
        WHERE ID = @ID
   END
ELSE
    BEGIN
        INSERT INTO dbo.Foo (ID, Bar)
        VALUES (@ID, @Bar)
    END
COMMIT
RETURN @@ERROR
The key point in the above solution is setting the UPDLOCK and HOLDLOCK locks to avoid race conditions. At any time, only one instance of the stored procedure is allowed to run. The blog author also describes another solution using the SQL merge command:
CREATE PROCEDURE dbo.Merge_Foo
     @ID int,
     @Bar int
AS 
     SET NOCOUNT, XACT_ABORT ON;  
  
 MERGE dbo.Foo WITH (HOLDLOCK) AS f
 USING (SELECT @ID AS ID, @Bar AS Bar) AS new_foo
      ON f.ID = new_foo.ID
 WHEN MATCHED THEN
      UPDATE SET f.Bar = new_foo.Bar
 WHEN NOT MATCHED THEN
      INSERT (ID, Bar)
      VALUES (new_foo.ID, new_foo.Bar);
  
 RETURN @@ERROR;
GO
My original solution was to use a similar merge command but didn't set the HOLDLOCK option. As the blog author pointed out: without HOLDLOCK, the merge command releases its UPDLOCK after the existence checking and opens the door for multiple concurrent inserts. 
Another possible solution that doesn't use an explicit lock is to insert first, check error code for duplicated key error and update in the second step.
BEGIN TRY
    INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() = 2627
        UPDATE dbo.Foo  SET bar = @bar  WHERE ID = @ID
END CATCH
This stackoverflow  answer discussed this solution and its potential performance issues of error catching. Except the first insert, all following updates incur the expense of error catching. This solution only works in a write-once-read-many situation that has very few updates. If there are many update operations, it is better to do an extra update first. 
UPDATE dbo.Foo  SET bar = @bar  WHERE ID = @ID
IF @@ROWCOUNT = 0
BEGIN 
    BEGIN TRY
        INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 2627
             UPDATE dbo.Foo  SET bar = @bar  WHERE ID = @ID
    END CATCH
END    
However, if the update is not an idempotent operation such as the above simple value setting, the above TRY-CATCH solutions (including both T-SQL stored procedure and the C# error handling solution) still have race conditions. For example, if the update is "SET bar = @bar + 1", those solutions don't work correctly. As a general upsert solution, the two stored procedures that use UPDLOCK and HOLDLOCK don't have a race condition and work correctly. 
 
Wow,I think this is a very good solution to this answer here and I genuinely feel you have taken a lot of effort by providing answer with coding.
ReplyDeleteSSIS Upsert