This is my answer to a stackoverflow question:
One IIS server may have multiple application pools.
One web application binds to one application pool.
One application pool may have more than one worker process (when Web Garden is enable).
On worker process can have multiple app domains. One app domain lives only in one worker process.
One app domain may have multiple threads. One thread can be shared by different app domains in different time.
The meaning to ASP.NET developers: to make your web site scalable, don't use in-proc session and don't use static class variable lock for synchronization.
Good code brings value. However, code by itself is a liability. Making code correct, reliable, maintainable and minimum is a creative and fun job. Among many programming languages used in the past 20 years, I like C# for its LINQ and async/await features. I like JavaScript for its functional root and being the only language for full stack web application development.
Friday, April 18, 2014
Upsert : Update a row if it exists else insert a new row
Some applications need an "upsert" operation: update a row if it exists else insert a new row. The following is a summary of solutions I found online.
Suppose we have the following table in a SQL Server database that is mapped to a C# class using Entity Framework.
CREATE TABLE dbo.Foo
(
ID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY,
Bar int NOT NULL
);
GO
The following C# code is a naive implementation of the upsert logic:
int id =3;
int bar = 199;
Foo foo = this.dbContext.Foos.find(id);
if (foo == null)
{
foo = new Foo
{
ID = id,
Bar = bar
};
this.dbContext.Foos.Add(foo);
}
else
{
foo.Bar = bar;
}
this.dbContext.SaveChanges();
The above code doesn't work correctly when multiple users run the same code simultaneously. A duplicated key error happens when more than one users try to insert different rows with the same key value. A possible solution is to catch the duplication key error and try the above operation again. The benefit is that we handle everything in C# code. The disadvantage is the added complexity because of the error handling and try logic. As described below, it has race conditions that don't work correctly for non-idempotent update.
Adding C# synchronization such as a lock in a static class variable helps. However, the solution breaks when web garden is enabled in the hosting ASP.NET application pool. Each worker process has its own copy of a static class variable. To solve the problem, one needs to use named mutex to enable inter-processes synchronization. The named mutex solution fails when we run multiple Web applications in different server. Another cons is that the above C# code requires two round trips to the back end database.
The right solution is to move the upsert operation to the back end database engine and add appropriate database locks. The tricky thing is to find the right "locks". Though work correctly, using a table level exclusive lock hurts performance. This Conditional INSERT/UPDATE blog gives a good solution:
CREATE PROCEDURE dbo.Insert_Or_Update_Foo
@ID int,
@Bar int
AS
SET NOCOUNT, XACT_ABORT ON
BEGIN TRAN
IF EXISTS(SELECT * FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK) WHERE ID = @ID)
BEGIN
UPDATE dbo.Foo
SET bar = @bar
WHERE ID = @ID
END
ELSE
BEGIN
INSERT INTO dbo.Foo (ID, Bar)
VALUES (@ID, @Bar)
END
COMMIT
RETURN @@ERROR
The key point in the above solution is setting the UPDLOCK and HOLDLOCK locks to avoid race conditions. At any time, only one instance of the stored procedure is allowed to run. The blog author also describes another solution using the SQL merge command:
CREATE PROCEDURE dbo.Merge_Foo
@ID int,
@Bar int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID, @Bar AS Bar) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE SET f.Bar = new_foo.Bar
WHEN NOT MATCHED THEN
INSERT (ID, Bar)
VALUES (new_foo.ID, new_foo.Bar);
RETURN @@ERROR;
GO
My original solution was to use a similar merge command but didn't set the HOLDLOCK option. As the blog author pointed out: without HOLDLOCK, the merge command releases its UPDLOCK after the existence checking and opens the door for multiple concurrent inserts.
Another possible solution that doesn't use an explicit lock is to insert first, check error code for duplicated key error and update in the second step.
BEGIN TRY
INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE dbo.Foo SET bar = @bar WHERE ID = @ID
END CATCH
This stackoverflow answer discussed this solution and its potential performance issues of error catching. Except the first insert, all following updates incur the expense of error catching. This solution only works in a write-once-read-many situation that has very few updates. If there are many update operations, it is better to do an extra update first.
UPDATE dbo.Foo SET bar = @bar WHERE ID = @ID
IF @@ROWCOUNT = 0
BEGIN
BEGIN TRY
INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE dbo.Foo SET bar = @bar WHERE ID = @ID
END CATCH
END
However, if the update is not an idempotent operation such as the above simple value setting, the above TRY-CATCH solutions (including both T-SQL stored procedure and the C# error handling solution) still have race conditions. For example, if the update is "SET bar = @bar + 1", those solutions don't work correctly. As a general upsert solution, the two stored procedures that use UPDLOCK and HOLDLOCK don't have a race condition and work correctly.
Suppose we have the following table in a SQL Server database that is mapped to a C# class using Entity Framework.
CREATE TABLE dbo.Foo
(
ID int NOT NULL CONSTRAINT PK_Foo PRIMARY KEY,
Bar int NOT NULL
);
GO
The following C# code is a naive implementation of the upsert logic:
int id =3;
int bar = 199;
Foo foo = this.dbContext.Foos.find(id);
if (foo == null)
{
foo = new Foo
{
ID = id,
Bar = bar
};
this.dbContext.Foos.Add(foo);
}
else
{
foo.Bar = bar;
}
this.dbContext.SaveChanges();
The above code doesn't work correctly when multiple users run the same code simultaneously. A duplicated key error happens when more than one users try to insert different rows with the same key value. A possible solution is to catch the duplication key error and try the above operation again. The benefit is that we handle everything in C# code. The disadvantage is the added complexity because of the error handling and try logic. As described below, it has race conditions that don't work correctly for non-idempotent update.
Adding C# synchronization such as a lock in a static class variable helps. However, the solution breaks when web garden is enabled in the hosting ASP.NET application pool. Each worker process has its own copy of a static class variable. To solve the problem, one needs to use named mutex to enable inter-processes synchronization. The named mutex solution fails when we run multiple Web applications in different server. Another cons is that the above C# code requires two round trips to the back end database.
The right solution is to move the upsert operation to the back end database engine and add appropriate database locks. The tricky thing is to find the right "locks". Though work correctly, using a table level exclusive lock hurts performance. This Conditional INSERT/UPDATE blog gives a good solution:
CREATE PROCEDURE dbo.Insert_Or_Update_Foo
@ID int,
@Bar int
AS
SET NOCOUNT, XACT_ABORT ON
BEGIN TRAN
IF EXISTS(SELECT * FROM dbo.Foo WITH (UPDLOCK, HOLDLOCK) WHERE ID = @ID)
BEGIN
UPDATE dbo.Foo
SET bar = @bar
WHERE ID = @ID
END
ELSE
BEGIN
INSERT INTO dbo.Foo (ID, Bar)
VALUES (@ID, @Bar)
END
COMMIT
RETURN @@ERROR
The key point in the above solution is setting the UPDLOCK and HOLDLOCK locks to avoid race conditions. At any time, only one instance of the stored procedure is allowed to run. The blog author also describes another solution using the SQL merge command:
CREATE PROCEDURE dbo.Merge_Foo
@ID int,
@Bar int
AS
SET NOCOUNT, XACT_ABORT ON;
MERGE dbo.Foo WITH (HOLDLOCK) AS f
USING (SELECT @ID AS ID, @Bar AS Bar) AS new_foo
ON f.ID = new_foo.ID
WHEN MATCHED THEN
UPDATE SET f.Bar = new_foo.Bar
WHEN NOT MATCHED THEN
INSERT (ID, Bar)
VALUES (new_foo.ID, new_foo.Bar);
RETURN @@ERROR;
GO
My original solution was to use a similar merge command but didn't set the HOLDLOCK option. As the blog author pointed out: without HOLDLOCK, the merge command releases its UPDLOCK after the existence checking and opens the door for multiple concurrent inserts.
Another possible solution that doesn't use an explicit lock is to insert first, check error code for duplicated key error and update in the second step.
BEGIN TRY
INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE dbo.Foo SET bar = @bar WHERE ID = @ID
END CATCH
This stackoverflow answer discussed this solution and its potential performance issues of error catching. Except the first insert, all following updates incur the expense of error catching. This solution only works in a write-once-read-many situation that has very few updates. If there are many update operations, it is better to do an extra update first.
UPDATE dbo.Foo SET bar = @bar WHERE ID = @ID
IF @@ROWCOUNT = 0
BEGIN
BEGIN TRY
INSERT INTO dbo.Foo (ID, Bar) VALUES (@ID, @Bar)
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 2627
UPDATE dbo.Foo SET bar = @bar WHERE ID = @ID
END CATCH
END
However, if the update is not an idempotent operation such as the above simple value setting, the above TRY-CATCH solutions (including both T-SQL stored procedure and the C# error handling solution) still have race conditions. For example, if the update is "SET bar = @bar + 1", those solutions don't work correctly. As a general upsert solution, the two stored procedures that use UPDLOCK and HOLDLOCK don't have a race condition and work correctly.
Tuesday, April 8, 2014
JavaScript note: prototype, [[Protyotype]] and prototype property
JavaScript has two types of objects: function object and non-function object. Conceptually, all objects have a prototype (NOT A PROTOTYPE PROPERTY). Internally, JavaScript names an object's prototype as [[Prototype]].
There are two approaches to get any object (including non-function object)'s [[prototype]]: theObject.getPrototypeOf() method and the
__proto__
property. The __proto__
property is supported by many browsers and Node.js. It is to be standardized in ECMAScript 6.
Only a function (a callable) object has the prototype property. This prototype property is a regular property that has no direct relationship with the function's own [[prototype]]. When used as a constructor ( after the new operator), the function's prototype property will be assigned to the [[Prototype]] of a newly created object. In a non-function object, the prototype property is undefined . For example,
var objectOne = {x: 5}, objectTwo = Object.create({y: 6});
Both objectOne and objectTwo are non-function objects therefore they don't have a prototype property.
Thursday, April 3, 2014
A better way to structure a .net solution
Being able to quickly navigate source files in a large .net solution that has many projects is highly desirable for a programmer. My experience with some projects such as nopCommerce (http://www.nopcommerce.com/) was not a good one. Often it took me two or three tries to the right project folder for a namespace.
This YoutTube video made by Roger Harford describes a good idea to structure a .net project. Most of the 15 minutes 47 seconds video discusses the reasons why the Visual Studio default structure is not a good one when when source control and multiple projects are added. For example, we usually don't want to use the same name for a solution and its first project. The method to structure a .net solution is rather simple and is summarized below.
When you create a solution in Visual Studio at the beginning, choose the Installed --> Templates -->Other Project Types --> Visual Studio Solution --> Blank Solution. Then name your solution based on your company and product name: MyCompany.MyProduct. Visual Studio will create a folder and solution files based on the new name.
The "Blank Solution" template creates an empty solution. You can add to this solution a number of projects that are named after your namespace. For example: MyCompany.MyProduct.Web, MyCompany.MyProduct.Domain, and MyCompany.MyProduct.Database are good names for a MVC Web site project.
When you add more projects and source files, the above solution structure allows a clear and natural mapping between many file names, folder names and namespaces. Additionally, it is easy to move files and folders around and keep a consistent mapping between a namespace and its files.
This YoutTube video made by Roger Harford describes a good idea to structure a .net project. Most of the 15 minutes 47 seconds video discusses the reasons why the Visual Studio default structure is not a good one when when source control and multiple projects are added. For example, we usually don't want to use the same name for a solution and its first project. The method to structure a .net solution is rather simple and is summarized below.
When you create a solution in Visual Studio at the beginning, choose the Installed --> Templates -->Other Project Types --> Visual Studio Solution --> Blank Solution. Then name your solution based on your company and product name: MyCompany.MyProduct. Visual Studio will create a folder and solution files based on the new name.
The "Blank Solution" template creates an empty solution. You can add to this solution a number of projects that are named after your namespace. For example: MyCompany.MyProduct.Web, MyCompany.MyProduct.Domain, and MyCompany.MyProduct.Database are good names for a MVC Web site project.
When you add more projects and source files, the above solution structure allows a clear and natural mapping between many file names, folder names and namespaces. Additionally, it is easy to move files and folders around and keep a consistent mapping between a namespace and its files.
Subscribe to:
Posts (Atom)