Thursday, July 31, 2008

Messaging engine and its database

Database tables usually need some periodic optimization. My good friend Tom Alcott explains why the messaging engine database tables generally do not, and for good reason.

The WebSphere Contrarian: Are you sure you want to reorg that messaging engine database?
The standard practice for database administration is to periodically check on the database and table organization to insure optimal performance -- but do these standard practices apply to a database used for JMS persistent message storage with IBM® WebSphere® Application Server?

New book on DataPower

If you are not aware of the performance boost DataPower can give your environment, or of the XML threat protection it provides, you really need to get up to speed on it. Some of my colleagues have taken the time to put this book together.

IBM WebSphere DataPower SOA Appliance Handbook: Bill Hines, John Rasmussen, Jaime Ryan, Simon Kapadia, Jim Brennan
IBM WebSphere DataPower SOA Appliance Handbook (Hardcover)

Wednesday, July 23, 2008

"free -m" on the Linux command line

One thing you must never do is overcommit the RAM available in the machine. If the application server fails to start and SystemOut.log contains a message like this:

JVMJ9VM015W Initialization error for library j9gc23(2): Failed to instantiate heap. 1G requested
Could not create the Java virtual machine.

Then it is highly likely that you tried to start an app server without enough free physical RAM. On Linux, run the "free -m" command to check. On other platforms, refer to your OS-specific manuals to find out how much free RAM you actually have.
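
If you would rather script the check than eyeball "free -m", here is a minimal Java sketch of the same idea. It assumes a Linux box where /proc/meminfo is readable, and the class and method names (HeapFitCheck, readKb, heapFits) are made up for illustration. It only compares the requested heap against MemFree, so treat it as a rough sanity check: the JVM needs native memory on top of the heap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class HeapFitCheck {
    /** Returns the value in kB for a /proc/meminfo key such as "MemFree", or -1 if absent. */
    static long readKb(List<String> meminfoLines, String key) {
        for (String line : meminfoLines) {
            if (line.startsWith(key + ":")) {
                // Lines look like "MemFree:         1234567 kB"
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    /** True when the requested heap (in MB) fits into the free memory (in kB). */
    static boolean heapFits(long requestedHeapMb, long freeKb) {
        return freeKb >= 0 && requestedHeapMb * 1024 <= freeKb;
    }

    public static void main(String[] args) throws IOException {
        // Default of 1024 MB mirrors the "1G requested" in the log message (assumption).
        long requestedHeapMb = args.length > 0 ? Long.parseLong(args[0]) : 1024;
        List<String> lines = Files.readAllLines(Paths.get("/proc/meminfo")); // Linux only
        long freeKb = readKb(lines, "MemFree");
        System.out.println("MemFree: " + (freeKb / 1024) + " MB, requested heap: " + requestedHeapMb + " MB");
        System.out.println(heapFits(requestedHeapMb, freeKb)
                ? "Heap should fit."
                : "Not enough free RAM for this heap.");
    }
}
```

Run it with the heap size in MB as the only argument, for example "java HeapFitCheck 1024", before you start the app server.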

Tuesday, July 22, 2008

Reliability and Availability

Reliability and availability are common topics in the performance space. The article linked below describes some of the problems that can occur in a cloud environment. This is the challenge of building reliable systems from unreliable components. From a hardware perspective this can be done if you have enough money for all the redundancy that is necessary. If you try to do it on the cheap, you will fail.

There is an interesting point in the article: Google is trying to solve this problem with software. While products like WebSphere XD provide software-level solutions for some problems, they can't solve the problem as easily as hardware can. For example, say the database is slow. Give the database faster disks, more RAM, or a RAM-backed SAN and you can eliminate that problem. There is relatively little you can do from a software perspective to fix it. Another example: a server goes down. Sure, we could route data to another server using a software component, but then why not just do it at the hardware level? Okay, so there are a couple of places where software is useful, like maintaining J2EE affinity, but the same can be done by a hardware load balancer. It just depends on where you put the smarts.

The problem with using software to fix this is that it introduces another, more complex layer of hardware and software, whereas redundant hardware makes things a lot simpler.

A few people will argue that software is cheaper. I don't agree with that argument. I think hardware is cheaper, especially when it makes troubleshooting that much easier (and quicker) than home-built software. If it takes 6-12 months to debug software in production, then that is a lot of money (and bad press earned) down the drain.

Amazon S3: For now at least, sometimes you have to reboot the cloud | News - Business Tech - CNET News
Afterward, Om Malik called cloud computing frail: "The S3 outage points to a bigger (and a larger) issue: the cloud has many points of failure--routers crashing, cable getting accidentally cut, load balancers getting misconfigured, or simply bad code." And he's right, to a degree, but there are three things that shouldn't be overlooked before writing cloud computing off as a failure.

Friday, July 18, 2008

Negative testing

The following article is a good example of why companies writing software need to hire subject matter experts when it comes to testing their applications, particularly for what is commonly referred to in performance speak as "negative testing." This is where we, subject matter experts on performance testing, purposely cause a negative event to occur. For example, I routinely disable the Network Interface Card (NIC) [also known as your ethernet card] while running load/stress tests just to see how the application environment handles the event. If the application breaks, then it fails the test, a defect is written up against the application, and back to development it goes. It is easy enough in Unix environments to disable a NIC, but if worse comes to worst I'll pull the ethernet cable out of the jack. Crude, but it works just as well.

Irish Examiner | Airport radar meltdown due to 'faulty' component
The malfunctioning network card, a component that allows computers to communicate with each other, was also blamed for previous glitches in the Dublin system.
It is unfortunate that the people who put that airport radar system together didn't conduct negative testing, because a problem like the one that occurred could have been completely avoided.

Likewise, while they are adding more monitoring, I'm dubious that it will help them. Given that they haven't tested for this negative event, what other negative events that they haven't thought of could occur? For example, some of the others I routinely test for are lost packets in the network, total network failure, network lag, 100%+ CPU, low memory, too many airplanes in the radar, duplicate radar images, and the list goes on and on.

All they need is for a different negative event to occur and they could (and probably will) suffer another outage. What they need to do is get a subject matter expert to teach them how to test their code.

BTW, notice the sentence about "delays were still being experienced at peak times"? Seems someone hasn't done stress testing either...
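
For the developers in the audience, the same negative-testing idea can be sketched in-process. This is not a substitute for pulling the cable during a real load test, just a minimal illustration. Everything here (NegativeTestSketch, FlakyLink, handleRequest) is a hypothetical name I made up, and the "down" flag plays the role of the disabled NIC.

```java
import java.io.IOException;

public class NegativeTestSketch {
    /** Hypothetical network dependency; the flag stands in for the unplugged cable. */
    static final class FlakyLink {
        volatile boolean down;

        String fetch() throws IOException {
            if (down) {
                throw new IOException("link down (injected fault)");
            }
            return "live data";
        }
    }

    /** Hypothetical application code under test: it must degrade, not crash, when the link fails. */
    static String handleRequest(FlakyLink link) {
        try {
            return link.fetch();
        } catch (IOException e) {
            return "degraded: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        FlakyLink link = new FlakyLink();
        System.out.println(handleRequest(link)); // normal path
        link.down = true;                        // inject the negative event mid-run
        System.out.println(handleRequest(link)); // must still return, not throw
        link.down = false;                       // "plug the cable back in"
        System.out.println(handleRequest(link)); // must recover
    }
}
```

The point of the exercise is the middle and last calls: the application has to survive the failure and then recover once the fault goes away. If either one blows up, write the defect.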

Wednesday, July 16, 2008

IBM Support Assistant

There is an update to the IBM Support Assistant. If you are running WebSphere Application Server and you do not have this tool then click on the next link and download it.

IBM Software Support - Overview
The IBM Support Assistant is a complimentary software serviceability workbench that helps you resolve questions and issues with IBM software.

Friday, July 11, 2008

createOrWaitForConnection and why we need a finally block

In previous versions of WebSphere Application Server this method had a name I liked better: createOrWaitForVictimConnection, where the victim connection was one that was not closed within the same thread that opened it and was eventually reaped by the app server. Either way, if you see a timeout message for this method in your log file, it means that somewhere, somehow, someone has not properly closed a connection to a pooled resource. Get the following three words into someone's vocabulary: try, catch, finally. I can't emphasize enough how important it is to close the connection in the finally block. If you don't, any exception that occurs can leave unclosed connections hanging around, and in a high-volume environment you'll find this to be a serious bottleneck! Follow this pseudocode...

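Connection con = null;
try {
    con = ds.getConnection();
    // do some work
} catch (Exception e) {
    // maybe log an error here if you like?
} finally {
    // always close, so the connection goes back to the pool even after an exception
    if (con != null) {
        try { con.close(); } catch (Exception ignored) { /* nothing more we can do */ }
    }
}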
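
To see why the finally block matters, here is a small self-contained sketch that you can run without a database. TinyPool and PooledConn are hypothetical stand-ins that I made up for a connection pool and its connections; they are not WebSphere or JDBC classes. The demo runs the same failing work twice against a pool of two connections, once closing inside the try body and once closing in finally.

```java
import java.util.concurrent.Semaphore;

public class PoolLeakDemo {
    /** Hypothetical stand-in for a connection pool; not a WebSphere or JDBC class. */
    static final class TinyPool {
        private final Semaphore permits;

        TinyPool(int size) { permits = new Semaphore(size); }

        PooledConn acquire() {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("pool exhausted");
            }
            return new PooledConn(this);
        }

        int available() { return permits.availablePermits(); }
    }

    /** Hypothetical pooled connection; close() hands the permit back to the pool. */
    static final class PooledConn {
        private final TinyPool pool;
        private boolean closed;

        PooledConn(TinyPool pool) { this.pool = pool; }

        void close() {
            if (!closed) {
                closed = true;
                pool.permits.release();
            }
        }
    }

    static void work(boolean fail) {
        if (fail) {
            throw new RuntimeException("simulated failure");
        }
    }

    /** close() sits in the try body, so an exception skips it and the connection leaks. */
    static void leaky(TinyPool pool, boolean fail) {
        try {
            PooledConn con = pool.acquire();
            work(fail);
            con.close();
        } catch (RuntimeException e) {
            // error swallowed, connection never returned
        }
    }

    /** close() sits in finally, so the connection is returned on every path. */
    static void safe(TinyPool pool, boolean fail) {
        PooledConn con = null;
        try {
            con = pool.acquire();
            work(fail);
        } catch (RuntimeException e) {
            // maybe log an error here
        } finally {
            if (con != null) {
                con.close();
            }
        }
    }

    public static void main(String[] args) {
        TinyPool a = new TinyPool(2);
        leaky(a, true);
        leaky(a, true);
        System.out.println("leaky: " + a.available() + " of 2 connections left");

        TinyPool b = new TinyPool(2);
        safe(b, true);
        safe(b, true);
        System.out.println("safe:  " + b.available() + " of 2 connections left");
    }
}
```

After two failures the leaky version has drained the pool, while the safe version still has both connections available. On Java 7 and later, try-with-resources gives the same guarantee with less code.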

Off topic alphaworks listing

I try not to go off topic as this is a performance blog but some folks might find the following technology preview useful.

alphaWorks Services | IBM Pass It Along | Overview
A peer-to-peer knowledge exchange network that builds communities of experts and learners around "nuggets" of knowledge.

Wednesday, July 2, 2008

Do you run WebSphere? Then you need this diagnostic tool!

IBM: IBM Support Assistant

I have used this tool (and its predecessors) so frequently that I don't go anywhere without them. Not only does it produce handy little graphs of Java GC activity, but it also provides some darned good analysis on recommended changes to the JVM command line parameters (especially if you're running on WAS v6.0.x or earlier, which do not run on Java 1.5) to improve your memory utilization. Now, of course, you can only get this kind of feedback from the tool if you followed one of my earlier recommendations to turn on verbose GC. You have turned on verbose GC by now, right? If you haven't, then go and do that right now.
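
If you are not sure whether verbose GC is actually on in a given JVM, a quick check is to look at the arguments the JVM was started with. The sketch below is a minimal example under a couple of assumptions: the class and method names are made up, and it only looks for the standard -verbose:gc flag and the IBM -Xverbosegclog option. Your app server may enable verbose GC some other way, for example through an admin console setting, so a "false" here is a hint to go look, not proof.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class VerboseGcCheck {
    /** True if any JVM argument starts with -verbose:gc or -Xverbosegclog. */
    static boolean hasVerboseGcFlag(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-verbose:gc") || arg.startsWith("-Xverbosegclog")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The arguments this JVM was launched with, as reported by the runtime MXBean.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("verbose GC flag present: " + hasVerboseGcFlag(jvmArgs));
    }
}
```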

So, go to the link for the IBM Support Assistant and download this tool.

ITCAM instrumenting your own method capture

One of the nice things that application monitoring tools provide is the ability to specifically measure information about your own method calls. This page describes how to do so with the IBM ITCAM tooling.

Help -
A custom request is an application class and method that you designate as an edge or nested request. When the method runs, a start and end request trace record is written to the Level 1 or Level 2 tracing.

WebSphere Process Server Performance

The first week of June I got to work in person with a colleague of mine, Richard Metzger, from the IBM labs in Böblingen on a process server engagement. Richard has started his own performance blog for process server!

WebSphere Process Server Performance
Thoughts and opinions around performance of and capacity planning for IBM WebSphere Process Server and other products (like e.g. DB2), as they are used in the context of business process management and business process automation solutions.