Well, that sucks.

Polis is down for a somewhat extended count, as it's doing an intensive check on a huge disk.

It's going to take about ten hours.



( 4 comments — Leave a comment )
Nov. 5th, 2007 02:41 pm (UTC)
Thanks for letting us know.
Nov. 5th, 2007 06:54 pm (UTC)
For a long time, we had a primary/secondary IMAP server failover model. One fateful day, power and network issues conspired to cause a split-brain problem where both nodes had the shared mail spool SAN volume mounted at the same time.

Hello, 18-hour fsck on a single 400GB Reiser3 logical volume. That was a fun day. (Did I mention that it was my second or third week on the job, too?)

Amazingly, it pretty much worked, modulo a few hundred "orphan" files which couldn't be automatically re-associated with a parent directory inode. Thankfully, they were all basically RFC822 text, so we were able to grep through the recovery directory for any user that noticed something amiss in their inbox.
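
For anyone facing the same cleanup, the "grep through the recovery directory" step could look roughly like the sketch below. It is only an illustration, not what was actually run: the function name `find_orphans_for`, the directory layout, and the choice of headers to check are all assumptions. It relies on the orphans being parseable RFC822 text.

```python
import os
from email import message_from_binary_file
from email.utils import getaddresses


def find_orphans_for(recovery_dir, address):
    """Return sorted paths of recovered files addressed to `address`.

    Hypothetical helper: checks the To, Cc and Delivered-To headers only.
    """
    wanted = address.lower()
    matches = []
    for root, _dirs, files in os.walk(recovery_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as fh:
                    msg = message_from_binary_file(fh)
            except OSError:
                continue  # unreadable file; skip it
            # Collect every recipient-style header value present in the file.
            headers = []
            for field in ("To", "Cc", "Delivered-To"):
                headers.extend(msg.get_all(field, []))
            found = {addr.lower() for _name, addr in getaddresses(headers)}
            if wanted in found:
                matches.append(path)
    return sorted(matches)
```

Matching on parsed headers rather than a raw text search avoids hits where the address only appears in a message body, at the cost of missing orphans whose headers were damaged.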

Now we keep filesystems under 250GB and spread mail across a large number of logical volumes. Each mail server also has its own local RAID; mailboxes are divided between the two servers, and each server's mailboxes are copied to the other node as a read-only replica.
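
How mailboxes get assigned to volumes isn't spelled out here, so the following is just one possible scheme, sketched under assumptions: a stable hash of the mailbox name picks a volume from a fixed list. The name `volume_for` and the volume paths are made up for illustration.

```python
import hashlib


def volume_for(mailbox, volumes):
    """Pick a volume for `mailbox` using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha1(mailbox.encode("utf-8")).digest()
    # Use the first four bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(volumes)
    return volumes[index]


# Example volume list; the paths are placeholders, not a real layout.
VOLUMES = ["/mail/vol%02d" % n for n in range(16)]
print(volume_for("alice", VOLUMES))
```

The same mailbox always lands on the same volume, but changing the number of volumes remaps most mailboxes, so a real deployment might prefer an explicit lookup table.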

Regardless, good luck. Hope you have a good version of Tetris to keep yourself busy while you watch those inodes churn.
Nov. 6th, 2007 04:16 am (UTC)
Nov. 6th, 2007 04:16 am (UTC)
Yes, but it's better!