Tuesday, August 16, 2005

Scalability Testing - Part 1 of... Crap, who knows...

Alright, so here’s the first of my technical blogs. 

This blog is going to focus on a customer that I recently worked with and the project I was placed on there. The customer will remain unnamed, but the situation seems to be very typical of homegrown 3-tier applications and of software developed by vendors. The project was to design and run a test harness for a scalability test against this company’s application. If the application bottlenecked, the requirement was to identify the bottleneck so the customer could re-architect that tier or rewrite the particular piece of the application causing it, and then re-test with the new code.

I’m gonna be 2 part’in this topic, so look for the next one too. I figure that most of you will already know how to do this type of thing, but hopefully I say something worthwhile at some point to make this thing interesting.

First, I’ll give a rundown of the environment, and then I’ll talk about which parts of that information are important and which are not so important. Being able to identify the not-so-important information is almost as important as identifying the important stuff.

Customer A wants to test their application with 100, 150, and 200 users. Customer A’s application runs in a 3-tier architecture and is database agnostic (it can run on multiple database platforms, e.g., SQL Server, Oracle, MySQL). The web tier runs on Windows 2003 and IIS 6.0. The web tier sends requests to the application tier, which is also running on Windows 2003, and the application requests are run through a COBOL compiler so that the application tier also remains OS agnostic. The single application server sends static SQL statements (not dynamic, meaning they are not built on the fly by the application server) to the database server.

There were 2 web servers for the 100-user and 150-user tests and 3 web servers for the 200-user test; all of the web servers are current dual-proc 3.8GHz boxes with 2GB of RAM. There is a single application server, a current 4-way 3.5GHz box with hyper-threading and 3.5GB of RAM. The database server is a current 4-way 3.0GHz server with 3GB of RAM.

Alright, so important stuff point #1 is the fact that there are three tests: 100, 150, and 200 users. These are the metrics for measuring scalability, and the metric(s) need to be identified before you can even begin developing any scalability test.
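To make the metrics concrete before writing any harness code, here is a small Python sketch that records the test matrix described above (user counts plus the number of web servers per run, taken from the environment rundown). The `TestRun` class and `TEST_PLAN` list are hypothetical names for illustration, not part of the customer's actual harness.

```python
from dataclasses import dataclass


# Hypothetical record of one scalability test run: how many concurrent
# users were simulated and how many web servers shared that load.
@dataclass(frozen=True)
class TestRun:
    users: int
    web_servers: int

    @property
    def users_per_web_server(self) -> float:
        # Average number of users each web server has to absorb in this run.
        return self.users / self.web_servers


# The three runs from the environment described above.
TEST_PLAN = [
    TestRun(users=100, web_servers=2),
    TestRun(users=150, web_servers=2),
    TestRun(users=200, web_servers=3),
]

if __name__ == "__main__":
    for run in TEST_PLAN:
        print(f"{run.users} users on {run.web_servers} web servers: "
              f"{run.users_per_web_server:.1f} users per web server")
```

Running it shows 50, 75, and roughly 66.7 users per web server, so the 200-user run actually puts less average load on each web server than the 150-user run. That's worth keeping in mind when comparing web-tier numbers across the three tests.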

Important stuff points #2 and #3: the application is both database agnostic and OS agnostic. This introduces a huge scalability issue most of the time because, once you have an agnostic application, 9 times out of 10 it will not be optimized for any specific database platform. Instead, development is focused on making sure the functionality works on multiple database or OS platforms and that performance is "good" universally. An OS-agnostic application tier, more often than not, introduces scalability issues through additional overhead, especially if the application is a UNIX-based application that was ported to Windows (which it was). On the application server you can usually spot that overhead in CPU utilization and context switching. Context switching occurs when threads compete for processor time and are switched into and out of a processor. A good threshold is around 5000 context switches/sec per logical processor.
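Since the threshold is per logical processor, the total you compare against depends on the box. The sketch below does that arithmetic in Python, assuming one core per physical processor (with hyper-threading doubling the logical count) and assuming the observed value comes from the Perfmon counter System\Context Switches/sec. The function names are hypothetical.

```python
# Rule of thumb from the text: about 5,000 context switches/sec
# per logical processor.
CONTEXT_SWITCHES_PER_LOGICAL_CPU = 5000


def context_switch_threshold(physical_cpus: int, threads_per_core: int = 1) -> int:
    """Total Context Switches/sec threshold for a whole server.

    Assumes one core per physical processor; pass threads_per_core=2
    for a hyper-threaded box.
    """
    logical_cpus = physical_cpus * threads_per_core
    return logical_cpus * CONTEXT_SWITCHES_PER_LOGICAL_CPU


def exceeds_threshold(observed_per_sec: float, threshold: int) -> bool:
    # True when the observed System\Context Switches/sec value is above
    # the server-wide threshold.
    return observed_per_sec > threshold


if __name__ == "__main__":
    app_server = context_switch_threshold(4, threads_per_core=2)
    db_server = context_switch_threshold(4)
    print(f"App server threshold: {app_server}/sec")
    print(f"DB server threshold:  {db_server}/sec")
```

For the application server described above (4-way with hyper-threading, so 8 logical processors under that assumption), this gives 40,000/sec; the database server, with no hyper-threading mentioned, would come out at 20,000/sec.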

Important stuff point #4: the amount of RAM in each server. Without knowing this, you can pull a Perfmon counter like “% Available Memory”, but you still won’t get a good feel for how much memory you have left to use unless you also look at “Available MBytes” and compare both (or just one of them, if you prefer) to the amount of RAM in the system. Without the installed RAM figure, you won't be able to tell whether the application and web servers have memory pressure. The database server is a different story; there is more on that in “Not so important stuff point #3” below.
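Here's the arithmetic behind comparing an available-memory reading against installed RAM, as a Python sketch. The installed RAM figures come from the environment above; the 256 MB reading used below is made up purely for illustration, and `memory_headroom` is a hypothetical helper name.

```python
# Installed RAM per tier from the environment above (1 GB = 1024 MB).
INSTALLED_RAM_MB = {
    "web": 2 * 1024,          # dual-proc web servers, 2GB
    "app": int(3.5 * 1024),   # 4-way application server, 3.5GB
}


def memory_headroom(available_mb: float, installed_ram_mb: float) -> float:
    """Percentage of installed RAM still available.

    available_mb would come from the Perfmon counter Memory\\Available MBytes.
    """
    if installed_ram_mb <= 0:
        raise ValueError("installed_ram_mb must be positive")
    return 100.0 * available_mb / installed_ram_mb


if __name__ == "__main__":
    reading_mb = 256  # hypothetical Available MBytes reading
    for tier, ram_mb in INSTALLED_RAM_MB.items():
        pct = memory_headroom(reading_mb, ram_mb)
        print(f"{tier}: {reading_mb} MB free of {ram_mb} MB = {pct:.1f}%")
```

The same hypothetical 256 MB reading means 12.5% headroom on a 2GB web server but only about 7.1% on the 3.5GB application server, which is exactly why the raw counter alone doesn't tell you much.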

Not so important stuff point #1: the number of users for each test. This is not important because the number is going to vary from application to application. If you do not have intimate knowledge of how the application works and what its functions are (which I didn’t), you won’t be able to tie the users to the functions each one performs, and you won’t be able to tie that to individual measurements of each function.

Not so important stuff point #2: the speed of the processors doesn’t matter one bit when measuring performance unless you have an application doing a lot of calculations and compilations and the CPUs are pegged (higher than 85% over an extended period of time). If that is the case, faster processors will at the very least help the situation; more processors are even better; and better than that are multiple 64-bit processors (generally 8 CPUs or more is best; I’ll blog about this some other time).
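To decide whether the CPUs are actually pegged rather than just spiking, you can look for a sustained run of samples above 85%. This Python sketch assumes the samples are % Processor Time readings (for example from Processor(_Total)\% Processor Time) collected at a fixed interval. What counts as an "extended period" is your call, so `min_consecutive` is an assumed parameter, not a standard value.

```python
from typing import Sequence

PEGGED_THRESHOLD_PCT = 85.0  # threshold from the text


def is_cpu_pegged(samples: Sequence[float], min_consecutive: int,
                  threshold: float = PEGGED_THRESHOLD_PCT) -> bool:
    """True if at least `min_consecutive` consecutive samples exceed `threshold`.

    `samples` are % Processor Time readings taken at a fixed interval, so
    min_consecutive * interval is the "extended period" being checked.
    """
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    run = 0
    for value in samples:
        # Extend the current run of high samples, or reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    short_spike = [50.0, 95.0, 97.0, 60.0, 55.0]
    print(is_cpu_pegged(short_spike, min_consecutive=3))
```

For instance, with samples every 15 seconds, treating 10 minutes as "extended" would mean `min_consecutive=40`; both of those numbers are illustrative assumptions.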

Not so important stuff point #3: the amount of RAM in the database server. Now, I don't mean to say that memory is not important to SQL Server, because by God it is, and I don't mean to say that the amount of RAM you dedicate to SQL Server is not important, because we all know it is. All I am trying to say is that the installed RAM will not help you gauge whether SQL Server has memory pressure, because in SQL Server memory pressure is measured differently than on other systems. SQL Server has its own Perfmon counters to help determine memory pressure on the server. For example, Page Life Expectancy (PLE, as some call it) measures how many seconds a data page will stay in the buffer cache without being accessed before the lazy writer flushes it so that another data page can take its place... blah, blah, blah (don't want to get too long-winded, if it's not too late already). The generally accepted threshold for this counter is 300 seconds, or 5 minutes. You really should give it attention if the value is <= 300, because SQL Server could be forced to go to disk (causing more I/Os) to retrieve data pages more often than it really should. How much higher the value needs to be really depends on the applications running against that database server, and determining it accurately can be quite complex. As you evaluate this value regularly on various machines, you'll get a better “gut feel” for it.

This brings us to the next set of memory counters in SQL Server that I would look at: Target Server Memory (KB) and Total Server Memory (KB). These counters should normally be equal. Target Server Memory tells you how much memory SQL Server would like to have; Total Server Memory tells you how much memory SQL Server is actually using. If Target Server Memory is higher than Total Server Memory, then you know SQL Server would like more memory than it can physically acquire. Thus, memory pressure.
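The two SQL Server checks above can be sketched in Python like this. The counter names in the comments are the SQL Server Perfmon counters being discussed (on a named instance the object prefix differs); the dataclass, the function name, and the sample numbers below are hypothetical, and the 300-second cutoff is the rule of thumb from the text, not a hard limit.

```python
from dataclasses import dataclass
from typing import List

PLE_THRESHOLD_SECONDS = 300  # generally accepted threshold from the text


@dataclass
class SqlMemorySample:
    page_life_expectancy_sec: float  # SQLServer:Buffer Manager\Page life expectancy
    target_server_memory_kb: float   # SQLServer:Memory Manager\Target Server Memory (KB)
    total_server_memory_kb: float    # SQLServer:Memory Manager\Total Server Memory (KB)


def memory_pressure_warnings(s: SqlMemorySample) -> List[str]:
    """Return human-readable warnings for the two checks described above."""
    warnings = []
    # Check 1: pages are leaving the buffer cache too quickly.
    if s.page_life_expectancy_sec <= PLE_THRESHOLD_SECONDS:
        warnings.append(
            f"Page life expectancy is {s.page_life_expectancy_sec:.0f}s "
            f"(<= {PLE_THRESHOLD_SECONDS}s)")
    # Check 2: SQL Server wants more memory than it has been able to get.
    if s.target_server_memory_kb > s.total_server_memory_kb:
        shortfall = s.target_server_memory_kb - s.total_server_memory_kb
        warnings.append(f"Target exceeds Total Server Memory by {shortfall:.0f} KB")
    return warnings


if __name__ == "__main__":
    sample = SqlMemorySample(250, 2_500_000, 2_400_000)  # made-up readings
    for w in memory_pressure_warnings(sample):
        print(w)
```

A hypothetical sample with a PLE of 250 seconds and a Target Server Memory 100,000 KB above Total Server Memory produces both warnings, while a PLE of 1,200 seconds with Target equal to Total produces none.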
There are other important counters as well, around the mem-to-leave area and the caches. I can get into these next time. You've suffered enough, I think... well, for now anyway.

Maybe I should just blog about counters sometime and get away from this stuff for now. Let me know if you would be interested in that and I’ll put some time into it.

Anyway, it’s about 2 AM and time to hit the sack. Check back later for part 2 (or more) of this project.

Keep on truck’in,

Zach