In Exchange 2010 Microsoft introduced the Database Availability Group to implement redundancy at the mailbox server level and the mailbox database level. If a mailbox database (or a server) fails, another one can take over. This concept was carried forward into Exchange 2013 and Exchange 2016 and has been improved ever since.
There are still customers that do not use a Database Availability Group and rely on a single server and a solid backup solution. A backup of the mailbox database is created every night and this continues to run for years. You hope. Until disaster strikes…
I got a call earlier today from a customer. He had been patching his Hyper-V host, and after a reboot his Exchange 2013 server didn’t come up properly. Well, after some questioning it turned out that the Exchange server had booted correctly, but that only one of the three Mailbox databases had mounted properly. So, two Mailbox databases (approx. 250 GB in size) seemed to be corrupt and this is where the pain begins.
To ‘resolve’ the issue the customer tried to reboot the box again, tried to restore the databases from backup, tried a ‘soft repair’ and tried a ‘hard repair’. No idea what the latter two are, by the way, but that is what the customer told me. But if you know anything about Mailbox databases in Exchange, then you also know that most destruction happens in the first 15 minutes!
If you rely on a single server and a backup solution for restoring services or for a disaster recovery scenario, you have to know the basics of Exchange database technology. Know what a mailbox database is (other than a large .edb file), know what transactional logging is and how the mailbox database, the transaction log files and the checkpoint file relate to each other. And related to this, it is of the utmost importance that you know how to replay transaction log files into a Mailbox database.
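To make the relationship between those three pieces a bit more concrete, here is a deliberately simplified toy model in Python. It is not how ESE is implemented; the class and its fields (`ToyDatabase`, `pages`, `logs`, `checkpoint`) are made up for illustration. It only shows the idea that changes land in the log first, that the checkpoint marks how far the database file has caught up, and that replay applies everything after the checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Toy model: 'pages' stands in for the .edb file, 'logs' for the log files."""
    pages: dict = field(default_factory=dict)  # what is really in the database file
    logs: list = field(default_factory=list)   # transaction log, always written first
    checkpoint: int = 0                        # log records already applied to 'pages'

    def write(self, key, value):
        # Every change is recorded in the log first; the database file lags behind.
        self.logs.append((key, value))

    def replay(self):
        # Apply every log record past the checkpoint, in order, then advance it.
        for key, value in self.logs[self.checkpoint:]:
            self.pages[key] = value
        self.checkpoint = len(self.logs)

    def flush(self):
        # Normal operation: bring the database file up to date with the log.
        self.replay()


db = ToyDatabase()
db.write("mail-1", "hello")
db.flush()                    # mail-1 is now in the database file
db.write("mail-2", "world")   # only in the log when the server "crashes"

# After the crash the database file is missing mail-2 ...
assert "mail-2" not in db.pages
# ... and replaying the log from the checkpoint brings it back.
db.replay()
assert db.pages == {"mail-1": "hello", "mail-2": "world"}
```

In this model, throwing away the log records after the checkpoint before they are replayed means mail-2 is gone for good, which is why you should never delete transaction log files of a database that was not shut down cleanly.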
Although old, these are good starting points:
Furthermore, you have to know how your backup solution works, and how to restore a mailbox database into a production environment. There are streaming backups, which are rare these days, and VSS snapshot backups. You can find more detailed information in the following articles:
Besides the technical knowledge about the Mailbox database technology, you have to perform regular ‘fire drills’. Restore a Mailbox database into production, restore using a recovery database, replay transaction log files, get your hands on ESEUTIL and see what the /G, /K, /P and /R switches do. Knowing the technology and the tools will save you a considerable amount of time and reduce the risk of data loss, and it lets you give your users, manager or customer a realistic estimate of when the mailboxes will be available again.
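As a starting point for such a fire drill, the small Python sketch below only builds and prints ESEUTIL command lines, so you can review them before running anything by hand on a lab copy of a database. The helper names (`check_command`, `recovery_command`), the log prefix E00 and the D:\ paths are assumptions for illustration; use the values from your own environment.

```python
# What each of the four ESEUTIL modes mentioned above is for.
ESEUTIL_MODES = {
    "/G": "integrity check of the database (read-only)",
    "/K": "checksum verification of the database pages (read-only)",
    "/P": "repair; can discard damaged data, so treat it as a last resort",
    "/R": "recovery: replay transaction log files into the database",
}


def check_command(mode, edb_path):
    """Build a read-only check (/G or /K) against a single .edb file."""
    mode = mode.upper()
    if mode not in ("/G", "/K"):
        raise ValueError("only /G and /K are read-only checks")
    return ["eseutil", mode.lower(), edb_path]


def recovery_command(log_prefix, log_dir, db_dir):
    """Build a /R command: log base name, then /l log folder and /d database folder."""
    return ["eseutil", "/r", log_prefix, "/l", log_dir, "/d", db_dir]


# Hypothetical paths and log prefix, for a lab copy only.
edb = r"D:\DB\MDB01\MDB01.edb"
commands = [
    check_command("/K", edb),
    check_command("/G", edb),
    recovery_command("E00", r"D:\Logs\MDB01", r"D:\DB\MDB01"),
]

for mode, purpose in ESEUTIL_MODES.items():
    print(mode, "-", purpose)
for cmd in commands:
    print(" ".join(cmd))
```

The sketch deliberately has no helper for /P: a repair is something you decide on consciously after the checks and a log replay, not something you script into a routine.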
If you don’t know this you’re playing with fire, and it will backfire on you, believe me!