Of course, I have somewhere between a few and several bugs in my monitoring process. So, from time to time, I need to restart the process. I just invited my first alpha user to test out the site, so while my audience is minuscule, I'm still very worried about the site being dysfunctional.
Enter RightScale. Today, in about 4 hours, I learned all about monitoring at RightScale (they use collectd) and I enabled it for my job monitoring servers. It was easy to add a plugin to monitor my custom application -- I just configured the standard processes plugin to track my daemon. Immediately, I was able to see process count, CPU usage, memory usage, and disk I/O for my process. Very useful. I added an escalation to email me when the process crashed*. That was neat... but then I had this vision of myself fishing with my son, getting an urgent email, making him quit fishing early (tears), and then speeding home, all just to type "kill -9
Another nifty trick - when I invited my testers to the site, I wanted to have separate staging and production environments. So, I clicked the "clone" button, and presto, my whole environment was replicated. heroku_san made it even easier for the web application.
Anyway, wish me luck as the first user tries out my new project!
* Yeah, some of my bugs are still crashing bugs. Sorry Joel Spolsky, I don't have a QA team for this either. I do have 200+ unit tests though!