Wednesday, March 30, 2011

Backup Strategies

I was having a discussion recently with one of my colleagues about server backups.  He likened them to the spare tire of a car.  Yes, you can drive around without the spare tire but does anyone want to do that for long stretches of time?  Probably not.

Servers need to be backed up.  Simply put, if a server completely tips over, a backup means it can be duplicated, rebuilt and put back into service.

Monday, February 7, 2011

tprof switches

I wanted to capture this before I forgot about it ... tprof switches

Tuesday, November 30, 2010

Cognos

Looking for information about Cognos?  I was.  My colleague in the UK, Richard Collins, was able to point me to some informational links about Cognos. 

For full details on this product, see its home page here: http://www-01.ibm.com/software/analytics/cognos/business-intelligence/

However, your most useful starting point is probably going to be: http://www-01.ibm.com/support/docview.wss?uid=swg27014432

...and in particular, this one: http://www-01.ibm.com/support/docview.wss?uid=swg27019126

Tuesday, November 9, 2010

global transaction sharing

In the developer's resource references there is an attribute for global transaction sharing.  The value should be Shareable if the application uses global transactions.  But if one is using LTC then one does not need to share the connection.  Change the res-sharing-scope to Unshareable as noted above.  This eliminates a lot of contention for connection pool threads.  For more details on LTC see the link above to understand how it works.

Friday, September 24, 2010

The importance of dynamic circuit breakers

It is rare for anyone to provide details behind the root cause of a production outage.  Facebook put out a report about an outage they had.  If you're into troubleshooting and problem determination it is an interesting read. It sounds like they could turn off the particular function but had to completely restart the environment to do it.  This is why it is important to have circuit breakers that can be activated dynamically. 
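As a rough illustration of what "activated dynamically" can mean, here is a minimal Python sketch of a switch that disables one function at runtime without a restart.  The class name DynamicBreaker, the feature name "config-lookup" and the two helper functions are all hypothetical; none of this reflects how Facebook's systems actually work.

```python
import threading


class DynamicBreaker:
    """Hypothetical runtime kill switch: trip a feature off without restarting."""

    def __init__(self):
        self._tripped = set()          # names of features currently switched off
        self._lock = threading.Lock()  # protects the set across request threads

    def trip(self, feature):
        """Switch a feature off (open the breaker)."""
        with self._lock:
            self._tripped.add(feature)

    def reset(self, feature):
        """Switch a feature back on (close the breaker)."""
        with self._lock:
            self._tripped.discard(feature)

    def is_tripped(self, feature):
        with self._lock:
            return feature in self._tripped

    def call(self, feature, func, fallback):
        """Run func unless the feature is tripped; otherwise run fallback."""
        if self.is_tripped(feature):
            return fallback()
        return func()


# Illustrative stand-ins for a back-end call and a degraded answer.
def query_backend():
    return "value from database"


def cached_value():
    return "cached value"


breaker = DynamicBreaker()
before = breaker.call("config-lookup", query_backend, cached_value)
breaker.trip("config-lookup")   # an operator flips the switch at runtime
after = breaker.call("config-lookup", query_backend, cached_value)
```

In a real deployment the trip and reset calls would need to be reachable from an admin console or a shared configuration store, so that operators can shut off the back-end traffic while the servers keep running.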

One also wonders what infrastructural changes could be made in the environment to help.  It sounds like the application logic continued to retry requests.  This is why I'm not a fan of applications automatically retrying requests: when failure occurs the retries can quickly overwhelm the back-ends.  A firewall could have at least helped shut off the pipe to the database.  Though the consequences to the application would have been no different and would still have required a restart since there seemed to be no way to dynamically shut off that particular function.

Certainly the error logic sounded confusing at best.  And error paths through code are the ones least frequently tested so they tend to fail magnificently in production.

Tuesday, September 14, 2010

grep Exception SystemOut.log | wc -l

I'm reminded today that a simple check exists after performance tests, especially when adding more JVMs to a cluster.  Count the number of exceptions in the log files.  Of course, clear the logs before running the test.  The counts should all be roughly the same.  If one JVM's count is significantly skewed from the other app servers then it is clear there are issues with that JVM that need to be checked.  Sometimes it is configuration or a misplaced JAR file.
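The same check can be scripted across a whole cluster.  The Python sketch below counts matching lines per log the way grep piped to wc -l does, then flags any server whose count is well above the median.  The function names and the factor of 2 are assumptions chosen for illustration, not a recommended threshold.

```python
import statistics


def count_exceptions(log_path, needle="Exception"):
    """Count lines containing needle, like `grep needle file | wc -l`."""
    with open(log_path, errors="replace") as log:
        return sum(1 for line in log if needle in line)


def skewed_servers(counts, factor=2.0):
    """Return server names whose count exceeds factor times the median.

    counts maps a server name to its exception count.  The factor is an
    arbitrary illustrative threshold (assumption), not a tuned value.
    """
    median = statistics.median(counts.values())
    threshold = factor * max(median, 1)  # avoid a zero threshold on clean runs
    return sorted(name for name, n in counts.items() if n > threshold)


def check_cluster(log_paths):
    """log_paths maps a server name to the path of its SystemOut.log."""
    counts = {name: count_exceptions(path) for name, path in log_paths.items()}
    return counts, skewed_servers(counts)
```

Calling check_cluster with a dictionary of server names and log paths returns the per-server counts along with the list of servers worth a closer look.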

Tuesday, June 8, 2010

Been a busy 2010

I know I haven't kept up with the blog this year.  While the blog post I'm linking to today makes no mention of performance it has everything to do with performance.  Maybe one day I'll get a chance to sit down and explain my thinking.