WCF service and timeouts
An application that I work on has been in production for several years, but it has been experiencing timeouts between the UI (ASP.NET WebForms) and the services (WCF) with increasing frequency. The number of users and the amount of data have increased quite significantly since we rolled it out. Originally, we attributed the problem to an underperforming SQL Server cluster (which the services use) and migrated to a much more powerful cluster. However, the problem persists, and the number of timeouts we receive per day appears to be growing. We have engaged our DBAs, but we are unable to isolate a bottleneck on the SQL Server. I have also tested by calling the services directly from a test console application, and the problem presents itself there too, which leads me to think the problem is not with WebForms but with the WCF services. I am at a loss as to how to troubleshoot this theory (and begin to resolve it), as it only appears under seemingly high-traffic situations. Is there a known issue with WCF and scalability, or is it more likely that the current implementation of the services is flawed?
I suspect the issue is related to the interaction between SQL Server and the application layer. I will assume that you aren't using asynchronous programming (APM) in your application, since you make no mention of it. Not to mention, in most people's minds, APM is for making the UI work faster, right?

The Science Bit, Concentrate

ASP.NET/IIS gives you a limited number of threads by default. Remember that threads are expensive: each one takes up scheduler time and memory in the form of its stack and other bookkeeping. This is true of pretty much every computer in the world. In .NET, all work is done on threads. Hence, when there are no free threads, IIS puts incoming requests onto a queue to wait for one.

Now, normally you would think that if all the threads are in use, CPU utilization would be high. This is wrong. Modern CPUs are usually I/O-starved, which means they spend most of their time sleeping. In this case, what generally happens is this: a few requests come in. Each is picked up by its own thread, which then hits the database. Then the threads wait (sleep). Your CPU utilization sits near 0% while all your threads are in use. More requests come in and are put in the queue. The database calls return, and some of the queued requests are dequeued (but not all). Then the requests left on the queue time out.

Moar Thread!

How do we solve this problem? Obviously we want to get as much of the work off the IIS queue and onto the SQL server as quickly as possible, right? So the obvious answer is to increase the number of threads, right? But, as noted above, threads are expensive, so even with a massively powerful SQL server, your application server will give up the ghost well before the SQL server does, while still sitting at 0% CPU utilization. Clearly, more threads will not get us where we want to go.

Async/await magic sauce!

The accepted solution is to actually use asynchronous programming. But isn't async/await for UI and parallelisation? No.
It's most often demonstrated with UI and parallelisation because that's where the gains are easiest to visualize. It's much harder to mock up a 1M hits/s service in a one-hour demo. So when we send a query to the database, instead of sleeping on the result, the thread goes back to the IIS queue to service the next customer. When the result returns, the next available thread is notified and handles it. Thus, with async/await for your database calls, you can max out your CPU/network utilization instead of being limited by the latency to the database. In fact, you should find that you have shifted the bottleneck to the SQL server.

But where is my API?

Ah... here is the problem. Async/await is pretty darn new. You need VS 2012 and .NET 4.5 (well, sort of) to use it. Also, most database APIs don't yet have full support for async/await. For example, Entity Framework, Microsoft's flagship DB technology, only supports async/await in EF 6.0 ALPHA (as of this writing), and most likely only for MS SQL Server.
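A minimal sketch of the effect described above, written in Python rather than .NET purely for illustration (an assumption, not the poster's stack). A `ThreadPoolExecutor` with a small fixed pool stands in for the ASP.NET/IIS worker threads, `time.sleep` and `asyncio.sleep` stand in for a blocking and an awaited database call, and a request that waits longer than `QUEUE_TIMEOUT` for a thread is counted as timed out. All constants (`DB_LATENCY`, `THREADS`, `REQUESTS`, `QUEUE_TIMEOUT`) are hypothetical and scaled down so it runs quickly.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical numbers, scaled down so the sketch finishes in well under a second.
DB_LATENCY = 0.05     # seconds each request spends waiting on the "database"
THREADS = 4           # fixed worker pool, standing in for the ASP.NET/IIS threads
REQUESTS = 40         # requests that all arrive at the same moment
QUEUE_TIMEOUT = 0.12  # seconds a request may wait before it is counted as timed out


def handle_blocking(enqueued: float) -> bool:
    """Synchronous handler: the thread sleeps until the query returns."""
    if time.perf_counter() - enqueued > QUEUE_TIMEOUT:
        return False  # waited too long in the queue for a free thread
    time.sleep(DB_LATENCY)  # thread is held but idle, so CPU usage stays near 0%
    return True


def run_blocking() -> tuple[int, float]:
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        # Requests beyond THREADS wait in the executor's queue for a thread.
        futures = [pool.submit(handle_blocking, time.perf_counter())
                   for _ in range(REQUESTS)]
        served = sum(f.result() for f in futures)
    return served, time.perf_counter() - start


async def handle_async(enqueued: float) -> bool:
    """Asynchronous handler: awaiting the query gives the thread back."""
    if time.perf_counter() - enqueued > QUEUE_TIMEOUT:
        return False
    await asyncio.sleep(DB_LATENCY)  # event loop runs other requests meanwhile
    return True


async def run_async() -> tuple[int, float]:
    start = time.perf_counter()
    # A single thread drives every request; none of them waits for a free thread.
    results = await asyncio.gather(
        *(handle_async(time.perf_counter()) for _ in range(REQUESTS)))
    return sum(results), time.perf_counter() - start


if __name__ == "__main__":
    print("blocking pool: served=%d elapsed=%.2fs" % run_blocking())
    print("async/await:   served=%d elapsed=%.2fs" % asyncio.run(run_async()))
```

With these numbers, each blocking thread can finish at most a few requests before the rest of the queue exceeds `QUEUE_TIMEOUT`, so only a fraction of the 40 are served; the async version should serve all 40 in roughly one `DB_LATENCY`. The blocking version's ceiling is about `THREADS / DB_LATENCY` requests per second (here 4 / 0.05 s = 80 per second), however idle the CPU is. One difference from .NET: asyncio resumes everything on one event-loop thread, whereas ASP.NET resumes the continuation on whichever pool thread is free, but in both cases no thread is held while the query is outstanding.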