One thing I cannot stress enough is the necessity of performance testing. I've already written an article on this subject (click here), and there is no reason not to do it. In fact, the #1 cause of critsits is either a complete lack of performance testing or very poorly executed testing. I refer to this as self-inflicted pain because, from my point of view, if you decide not to test thoroughly, you have accepted that you will have a production outage and that it will be difficult to resolve.
In addition, you should be monitoring your application with an appropriate tool such as IBM Tivoli's ITCAMfWAS (IT Composite Application Manager for WebSphere Application Server). v6.1 was just released, and I got a chance to use it at a critsit over the past couple of weeks. It is a good improvement over v6.0 and addresses some of the usability issues I had. Tools like this help me resolve a critsit in a matter of hours instead of days or weeks.