News & Announcements
Filesystem problems on NIH Biowulf (Biowulf)
Date: 09 June 2011 09:06:25
From: steven fellini (sfellini@NIH.GOV)
Biowulf users have probably noticed pauses while trying to access their data directories over the past week. In some cases, jobs have failed with "Stale NFS file handle" errors. We apologize for this disruption to your work.

During the downtime on April 26, a new version of IBM's GPFS file system was installed on one of the Biowulf storage systems, with the intent of improving reliability and availability. Unfortunately, beginning on May 29, something in our job mix uncovered a bug in the code that is causing kernel panics on the fileservers. Most pauses encountered by users occur during fileserver failovers.

The Biowulf staff is actively pursuing a resolution to the problem with the storage vendor and with IBM. We will send a follow-up email once the problem has been resolved.