Monday, June 23, 2008

kill -3 does not produce a javacore

This is a sporadic problem I run into: kill -3 does not generate a javacore, and the process has to be terminated. The first thing to check is the service release of the JVM, making sure the JVM is at the correct level for the corresponding level of WebSphere Application Server. In some cases I had installed later fixes than those that had been tested with WebSphere Application Server.

Another suggestion from a colleague of mine was to generate an AIX core of the process (make sure the ulimits for file and core size are set to unlimited, but you already knew that because you followed the installation instructions for WebSphere Application Server). I don't remember which kill signal needs to be sent, but I'm sure a Google search will reveal that answer.
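Before requesting any core, it is worth confirming those ulimits up front. A minimal sketch of such a pre-check (the function name and messages are my own, not from any IBM tool):

```shell
#!/bin/sh
# Hypothetical pre-check: verify that a ulimit is "unlimited" before
# asking the JVM for a javacore or an AIX core of the process.
check_limit() {
  name=$1; value=$2
  if [ "$value" = "unlimited" ]; then
    echo "OK: $name ulimit is unlimited"
  else
    echo "WARN: $name ulimit is $value (expected unlimited)"
  fi
}

# Check the two limits the WebSphere install instructions call out.
check_limit "core" "$(ulimit -c)"
check_limit "file" "$(ulimit -f)"
```

Run this as the same user that owns the application server process; the limits of your own login shell may differ from the server's.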

Tuesday, June 17, 2008

Get thread dumps during supplemental load tests

I recently found a bug in an application that the developers were not aware of. The code had a synchronized block they thought would be low cost. Lo and behold, in our load testing we found that after a certain number of users were active, response times started to climb exponentially. I took some thread dumps and traced the contention back to that synchronized block of code.

For anyone interested in performance:

1. Load test
2. Get thread dumps during bad response times.

Skip either step and there will be problems in production. If you do not know how to analyze javacores, open a PMR with IBM and IBM Support can help identify the problem.
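Step 2 works best if you take several dumps a short interval apart, so you can see which threads stay parked on the same monitor. A sketch of that loop, assuming an IBM JVM where kill -3 (SIGQUIT) writes a javacore without stopping the process; the function name and defaults are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: request several javacores from a JVM while the
# load test is in its bad-response window.
request_javacores() {
  pid=$1; count=$2; interval=$3
  i=1
  while [ "$i" -le "$count" ]; do
    # SIGQUIT: an IBM JVM writes javacore*.txt and keeps running.
    kill -3 "$pid"
    echo "requested javacore $i of $count from pid $pid"
    sleep "$interval"
    i=$((i + 1))
  done
}

# Example: three dumps, 30 seconds apart (uncomment with a real pid).
# request_javacores "$JVM_PID" 3 30
```

Threads blocked on the same lock in dump after dump are your contention point; threads that move between dumps are healthy.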

Wednesday, June 11, 2008

WebSphere Process Server - database configuration

It is crucial that a WPS gold topology have its databases properly configured. If they are not (i.e., all pointing to the same database instance), there will be contention issues that will not resolve themselves. One of my esteemed colleagues wrote this great article.

Building clustered topologies in WebSphere Process Server V6.1
This leads you to the database settings screen, arguably the most complex of all the steps.
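A quick sanity check you can run against your own configuration is to list the JDBC URLs of the WPS databases and confirm no two point at the same target. The database names below (WPRCSDB, BPEDB, MEDB, EVENT) are the usual WPS 6.1 databases, but the hosts and the exact set are illustrative assumptions; substitute your own:

```shell
#!/bin/sh
# Hypothetical check: warn if any two WPS datasources share a target.
urls="
jdbc:db2://db1.example.com:50000/WPRCSDB
jdbc:db2://db2.example.com:50000/BPEDB
jdbc:db2://db3.example.com:50000/MEDB
jdbc:db2://db4.example.com:50000/EVENT
"
# uniq -d prints only lines that occur more than once.
dupes=$(echo "$urls" | sed '/^$/d' | sort | uniq -d)
if [ -n "$dupes" ]; then
  echo "WARN: duplicate datasource targets:"
  echo "$dupes"
else
  echo "OK: each database has its own target"
fi
```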

Wednesday, June 4, 2008

Why cross cell data centers are not a best practice for disaster recovery

This topic has been coming up time and time again. It is time that people read about the trade-offs of conducting DR with a single cell across multiple data centers. Yes, this might work. But more often than not, the various interconnects between the two data centers, and the interactions across them, lead to very negative consequences that defeat the intended DR effort.

Do the right thing. Isolate the two data centers with separate cells. You'll find this not only works much better but has a very high success rate if done correctly (i.e., you use scripting to build your environments, so you have a repeatable process across DCs).
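That scripted, repeatable build can be as simple as driving the same build script once per data center. A sketch under assumed names (the cell names, hostnames, and the buildCell.py Jython script are hypothetical; only the wsadmin.sh invocation pattern is real):

```shell
#!/bin/sh
# Hypothetical sketch: run the identical scripted build in each data
# center so the two cells stay in lockstep.
build_cell() {
  cell=$1; dmgr_host=$2
  echo "building cell $cell via dmgr $dmgr_host"
  # A real build would run the same wsadmin Jython script in each DC:
  # ./wsadmin.sh -lang jython -host "$dmgr_host" -f buildCell.py "$cell"
}

# One isolated cell per data center -- never a single cell spanning both.
build_cell dc1Cell dmgr.dc1.example.com
build_cell dc2Cell dmgr.dc2.example.com
```

Because both cells come from the same script, a DR failover exercises an environment that is configured identically to production rather than one assembled by hand.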

Comment lines: Tom Alcott: Everything you always wanted to know about WebSphere Application Server but were afraid to ask -- Part 3
While the notion of a single cell across data centers is bad from a risk aversion perspective, running a cluster across two data centers not only requires you to forget about minimizing risk, as noted above (since a cluster cannot span cells), but further increases risk along multiple dimensions.