Tuesday, May 27, 2008

Selling Application Monitoring

Take the word "security" and replace it with "application monitoring" in this article and you have my problem. Application Monitoring is really important.

Schneier on Security: How to Sell Security
How to Sell Security

It's a truism in sales that it's easier to sell someone something he wants than something he wants to avoid. People are reluctant to buy insurance, or home security devices, or computer security anything. It's not that they don't ever buy these things, but it's an uphill struggle.

Thursday, May 22, 2008

Recent performance-related articles

Every once in a while I take pen to paper (well, more correctly fingers to keyboard) and key out another article. Here are a couple I've written this year.

Comment lines: Alexandre Polozoff: Cultivating a performance specialist
Comment lines: Alexandre Polozoff: How well does traditional performance testing apply to SOA solutions?

Disaster Recovery & AIX GLVM

IBM eServer - Using the Geographic LVM in AIX 5L
"GLVM can help protect your business from a disaster by mirroring your mission-critical data to a remote disaster recovery site. If a disaster, such as a fire or flood, were to destroy the data at your production site, you would already have an up-to-date copy of the data at your disaster recovery site."

Absolutely critical technology if you are at all interested in DR (Disaster Recovery). I know at least two customers who have used this with synchronous updates (i.e. the local disk update is synchronized with the remote disk update) and have seen little overhead, even over a distance of about 150 miles between data centers. This is an ideal technology for people looking to set up DR for their WebSphere Application Server, Portal, Process Server, MQ, DB2, and the list goes on and on. I'm absolutely excited about this technology and the potential impact it can have for our high availability site customers.

If you haven't read up on GLVM I highly recommend taking the time.

JVM memory and high CPU

In the "Butterfly Effect" a butterfly flapping its wings on one side of the world can create a typhoon on the other.

In the world of Java, memory usage can cause high CPU. In the summer of 2007 I spent 3 weeks working with a customer in the UK, and we spent a few days measuring the memory usage of the application. We worked with the developers on reducing the memory footprint. In many cases these were simple code changes not requiring architectural or design changes to the app.

Reducing an application's memory footprint reduces the amount of garbage collection the JVM needs to execute. GC will use up CPU so obviously executing fewer GC cycles reduces the CPU load.

The UK application went from 80+% CPU down to 25-30% for the exact same load test, with better response times.

Verbose GC is your friend here too. Another application I'm looking at right now is suffering from high CPU, and we can see in verbose GC that during these high CPU events the JVM is actively GCing because it is running low on memory. Obviously the memory settings need to be changed here, but I wonder whether we would have to change them at all if we spent some time profiling the app and reducing its footprint. I guess it depends on whether they will take the time to work on this effort.
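To make the link between allocation and GC work a little more concrete, here is a minimal sketch that reads the JVM's own GC counters before and after an allocation-heavy loop. It assumes a Java 5 or later JVM (the java.lang.management API does not exist before that), and the class name GcOverhead and the churn workload are made up for illustration; they are not from the customer application described above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums the GC counters exposed by the JVM.
class GcOverhead {

    /** Total collections reported so far by all collectors in this JVM. */
    static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when the collector does not report it
            if (c > 0) count += c;
        }
        return count;
    }

    /** Approximate accumulated collection time in milliseconds, all collectors. */
    static long totalGcTimeMillis() {
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) millis += t;
        }
        return millis;
    }

    /** Made-up workload: allocates `iterations` short-lived buffers of `size` bytes. */
    static long churn(int iterations, int size) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] buf = new byte[size];   // becomes garbage right after this iteration
            buf[i % size] = 1;
            checksum += buf[i % size];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long countBefore = totalGcCount();
        long timeBefore = totalGcTimeMillis();

        long checksum = churn(200_000, 64 * 1024);

        System.out.println("checksum: " + checksum);
        System.out.println("GC cycles during run: " + (totalGcCount() - countBefore));
        System.out.println("GC time during run (ms): " + (totalGcTimeMillis() - timeBefore));
    }
}
```

Running it with a larger and then a smaller buffer size is a quick way to see the number of GC cycles move with the allocation rate. The exact numbers depend on the JVM, the heap settings and the collector in use, so treat them as relative indicators only.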

Wednesday, May 21, 2008

JDBC driver versions

In helping one of my colleagues this week I've come across another common problem that should be audited by everyone running a J2EE app server: the JDBC driver version.

Typical scenarios:
1. Your application has been working and all of a sudden starts to have strange SQL errors it never had before.
2. Your application works but has intermittent SQL errors (unrelated to SQL statement bugs).
3. You see a lot of StaleConnectionExceptions.

Check the version of the database server and then the version of the JDBC driver (in WebSphere Application Server the JDBC driver version is printed in SystemOut.log during the app server startup sequence). Typically the DBA just updates the server without telling anyone. This means that a bunch of clients are backlevel on the JDBC driver. I don't know what it is that some database vendors do in their fixpacks, but they often seem to break the protocol used by the previous client drivers. So... audit your JDBC drivers periodically, and if you can get your DBA to notify you when updates are being applied to the database servers you could even save yourself some grief in production.
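If you would rather collect both versions from code than dig through logs, the standard JDBC DatabaseMetaData interface reports them from any open connection. The following is only a sketch: the class name JdbcVersionAudit, the output format, and the command-line arguments (JDBC URL, user, password) are assumptions for illustration, and the matching JDBC driver has to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical audit helper; not part of WebSphere or any driver.
class JdbcVersionAudit {

    /** Formats one audit line; kept separate so it can be reused for logging. */
    static String describe(String driverName, String driverVersion,
                           String dbProduct, String dbVersion) {
        return "JDBC driver: " + driverName + " " + driverVersion
             + " | database: " + dbProduct + " " + dbVersion;
    }

    /** Reads driver and server versions from an open connection. */
    static String audit(Connection conn) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        return describe(md.getDriverName(), md.getDriverVersion(),
                        md.getDatabaseProductName(), md.getDatabaseProductVersion());
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: JdbcVersionAudit <jdbc-url> <user> <password>");
            return;
        }
        // Connection is closed automatically at the end of the try block.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println(audit(conn));
        }
    }
}
```

Driver and server version numbers do not necessarily follow the same numbering scheme, so the sketch only reports them side by side. Deciding which driver level goes with which server fixpack is still a matter of checking the vendor's documentation.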

Wednesday, November 21, 2007

Some common mistakes

Sending the right files
I spent a frustrating day yesterday reading log files from a server that I was told had stopped running (i.e. the process was gone). I spent a long time reading the logs looking for an indication of a problem and couldn't find one. On examining the IP addresses in the logs it was clear they had sent me the logs from the wrong server. Sysadmins, make sure you send the right log files. Once I had the right log files I had the answer within the hour and a fix to go forward with.

Verbose GC
Too many people run with verbose GC turned off. Unfortunately the word verbose is overloaded, and in this case verbose does not mean verbose in the Unix sense. Verbose GC is actually not that verbose and has a heck of a lot of great information in there. Run with the ST_VERIFY option if you are on pre-Java5 JVMs (i.e. earlier than WAS v6.1).
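One quick way to find out whether a given JVM is running with verbose GC is to ask the JVM itself. The sketch below assumes a Java 5 or later JVM (so it does not apply to the pre-Java5 case mentioned above) and only looks for the standard -verbose:gc flag; vendor-specific spellings and log-file options are not covered. The class name VerboseGcCheck is made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.List;

// Hypothetical check; reports whether verbose GC appears to be enabled.
class VerboseGcCheck {

    /** True if the standard -verbose:gc flag is among the JVM arguments. */
    static boolean hasVerboseGcFlag(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.equals("-verbose:gc")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        System.out.println("-verbose:gc on command line: " + hasVerboseGcFlag(jvmArgs));
        System.out.println("MemoryMXBean.isVerbose(): " + memory.isVerbose());
    }
}
```

In an application server the same two calls could be made from a small diagnostic servlet or JMX client instead of a main method.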

Stay current with maintenance
Time and time again I go in to debug a problem and what do I find? The problem is "known" and a fix has been available for some time. Just because things are working today doesn't mean that they won't break in the future. Yeah, I also understand the "if it ain't broke don't fix it" mentality, but application servers are one place I don't like to play that card. Run the maintenance through all your test environments (including your performance test environment... you do have one, right?) and things should be okay in production. But don't expect to run on a 5-year-old JVM and not run into problems one day...

I'm sure I'll think of more of my pet peeves so I'll add them later. In the meantime, if you're bored and need something to read see my paper on large topologies.

Monday, November 19, 2007

HTTP tuning

I often have to tune the performance of the HTTP side of the conversation, and here is a link to a decent overview of the basics:

Best Practices for Speeding Up Your Web Site

I also like the fact that they used IBM's Page Detailer (http://alphaworks.ibm.com/tech/pagedetailer), which is an awesome application that my colleague, performance whiz Phil Theiller, showed me. It is really great for getting a good visual feel for the HTTP side of things.