Friday, July 3, 2009

If this isn't an adventure, I'm not sure what is...

I named this blog "adventures in perl" because the work I do is in Perl, and because it's fairly zany stuff compared to my previous work experience. Here we go.

I have applications that start up in the morning and shut down in the afternoon. Throughout the day they log very important data detailing their actions. The log files are gargantuan (think TB) every day, so we are forced to use LZMA to compress these suckers down; otherwise we'd be in the business of inflating Seagate's stock price ;-) Anyway... The compression job normally starts shortly after 6pm. That evening, a "spare" system mysteriously crashed, which caused one of the job's prerequisite ssh calls to hang (our engineer had forgotten to wrap the ssh call in a timeout, or to pass it a timeout option).
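For the record, here is roughly what the missing guard could look like. This is a minimal Python sketch, not our actual job (which is Perl and isn't shown); `ssh_argv` and `run_with_timeout` are hypothetical helpers and the timeout values are placeholders. ssh's `ConnectTimeout` option bounds the connection attempt, `BatchMode` stops it from waiting on a password prompt, and the overall timeout catches a connection that succeeds but then stalls.

```python
import subprocess

def ssh_argv(host, command, connect_timeout_s=10):
    """Build an ssh command line that fails fast instead of prompting or hanging."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                        # never stop to ask for a password
        "-o", f"ConnectTimeout={connect_timeout_s}",  # bound the connection attempt
        host,
        command,
    ]

def run_with_timeout(argv, timeout_s):
    """Run argv; return the CompletedProcess, or None if it exceeded timeout_s."""
    try:
        # The overall timeout also covers a connection that is established
        # but then stalls; subprocess.run kills the child when it expires.
        return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
```

A caller would do something like `run_with_timeout(ssh_argv("spare01", "ls /logs"), timeout_s=300)` (host and command made up) and, on `None`, raise an alarm and stop, rather than sitting there until morning.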

The script "unstuck itself" the next morning, somewhere between me troubleshooting why the logs hadn't been transferred and IT fixing the machine that crashed. So the program went ahead and lzma'd all the log files for the current day instead of yesterday's.

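I sat in horror as I realized that our programs were now writing to deleted files (lzma is kind enough to remove each file after it compresses it). Most of our programs shut down promptly at 4pm, and once a program exited and closed its last handle on a deleted file, that file's data would be gone for good, so we had to act fast.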
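If the "writing to deleted files" part sounds odd, here is a small Linux-only Python demo of the behavior (the function name is made up for illustration). Unlinking a file removes only its directory entry; the data lives on as long as some descriptor still refers to it, and `/proc/self/fd/N` can still reach it.

```python
import os
import tempfile

def demo_deleted_but_open():
    """Linux only: an unlinked file lives on while any descriptor refers to it."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.write(fd, b"first line\n")
    os.unlink(path)                     # what lzma does to the original after compressing
    name_exists = os.path.exists(path)  # False: the directory entry is gone
    os.write(fd, b"second line\n")      # but the writer carries on without error
    proc_path = f"/proc/self/fd/{fd}"
    link = os.readlink(proc_path)       # target ends in " (deleted)"
    with open(proc_path, "rb") as f:    # a second, independent handle on the same data
        data = f.read()
    os.close(fd)  # last reference dropped; the kernel can now free the blocks
    return link, data, name_exists
```

That final `os.close` is exactly what was going to happen at 4pm when our programs exited.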

The solution ended up being a simple Perl script that interrogated "lsof" for deleted files in our log directory on all of our machines, and simply opened a file handle on the /proc/<pid>/fd/<n> file descriptor matching each file in question. That added another open reference to the file, so even after the program writing to it went away, the kernel would not free the file's data.
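The original was Perl and isn't reproduced here, but the idea translates to a short Python sketch. The function names are hypothetical; it assumes Linux, root (to open other users' `/proc/<pid>/fd` entries), and that lsof's `+L1` (link count below 1) plus `-Fpfn` field output is available. The parser tolerates a trailing " (deleted)" on names in case lsof reports one.

```python
import os
import subprocess

DELETED_SUFFIX = " (deleted)"

def list_deleted_open_files():
    """Ask lsof for files that are still open but have no directory entry."""
    # +L1 selects files with a link count below 1 (unlinked but still open);
    # -Fpfn requests pid / fd / name fields, one per line, each tagged by its first char.
    # lsof exits non-zero when nothing matches, so the return code is ignored.
    result = subprocess.run(["lsof", "-nP", "+L1", "-Fpfn"],
                            capture_output=True, text=True)
    return result.stdout

def find_deleted_logs(lsof_output, log_dir):
    """Return (pid, fd, name) for every deleted file under log_dir."""
    prefix = log_dir.rstrip("/") + "/"
    found = []
    pid = fd = None
    for line in lsof_output.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":                      # a new process record starts
            pid, fd = int(value), None
        elif tag == "f":                    # numeric fds only; skip cwd, txt, mem, ...
            fd = int(value) if value.isdigit() else None
        elif tag == "n" and pid is not None and fd is not None:
            name = value
            if name.endswith(DELETED_SUFFIX):
                name = name[: -len(DELETED_SUFFIX)]
            if name.startswith(prefix):
                found.append((pid, fd, name))
    return found

def hold_open(entries):
    """Open /proc/<pid>/fd/<fd> for each entry so the data outlives the writer."""
    handles = {}
    for pid, fd, name in entries:
        if name in handles:
            continue                        # one extra reference per file is enough
        try:
            handles[name] = open(f"/proc/{pid}/fd/{fd}", "rb")
        except OSError:
            pass                            # process already exited, or not permitted
    return handles
```

Usage would be along the lines of `held = hold_open(find_deleted_logs(list_deleted_open_files(), "/var/log/ourapp"))` (the directory is made up), with the script then staying alive so the handles in `held` stay open.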

We also had a hook in the script that let us signal it to copy the files from /proc back to the file system, once we had cleaned up the lzma files that were already there. The best part: we got to use the totally underrated "cluster ssh" to take command of all of our servers at once and fix the problem one time, for every machine.
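A rough Python sketch of that copy-back hook, under stated assumptions: `handles` maps each original log path to a file object already held open on the deleted file, and the choice of `SIGUSR1` as the trigger is a guess, not what our script actually used. The helper names are hypothetical.

```python
import os
import shutil
import signal

def restore_all(handles):
    """Copy each held (deleted) file back to its original path.

    `handles` maps original path -> file object opened on /proc/<pid>/fd/<fd>.
    """
    restored = []
    for name, handle in handles.items():
        if os.path.exists(name):
            continue                      # something is already there; leave it alone
        handle.seek(0)                    # copy from the very beginning of the data
        with open(name, "wb") as out:
            shutil.copyfileobj(handle, out)
        restored.append(name)
    return restored

def install_restore_hook(handles, signum=signal.SIGUSR1):
    """Run restore_all(handles) whenever the process receives `signum`."""
    def on_signal(_signum, _frame):
        restore_all(handles)
    signal.signal(signum, on_signal)
```

With this in place, once the stray lzma files are removed, a single `kill -USR1 <pid>` typed into a cluster ssh session triggers the restore on every machine at once.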

Problem solved. The program was easy to write, easy to verify, and performed perfectly on the first try. Hooray Perl!