Restore 500GB SQL2000 database from tape using NetBackup

The other day I was restoring a 500GB SQL Server 2000 database from tape using NetBackup and it didn’t work well. Well, it didn’t work at all. No matter how many times I tried, I kept getting this error:

Status Code 41: network connection timed out

This wasn’t the first database I’d restored, and the others had gone through without a problem, so I knew there was nothing wrong with the network. The only thing that set this one apart was its relatively large size.

At that point I was 50% sure the error was caused by a timeout setting somewhere in NetBackup. Since I had limited experience with NetBackup, I turned to Google and found the following article:

DOCUMENTATION: Restores of large Microsoft SQL server databases using the NetBackup for Microsoft SQL Server database extension fail before jobs start reading data from tape.

With Microsoft SQL server, the restore process starts by having SQL allocate all of the data files that will be used for the SQL Database. SQL then writes zeroes to all of these files. If this is a very large SQL database, this process can take a significant amount of time. Only after Microsoft SQL finishes writing zeros will it start requesting data from the NetBackup agent. Generally there is no appearance of activity until the Microsoft SQL server is ready to start requesting data to recover the SQL database.

If the NetBackup “Client Read Timeout” for that client is not large enough, the restore processes will have already timed out before the SQL Server requests the first byte of data.
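To get a feel for why the quoted step is slow, here is a minimal Python sketch of what zero-initializing a data file amounts to. It only mimics the effect (a file of the requested size, filled with zeros and flushed to disk); it is not how SQL Server implements it, and the function name zero_fill and the 8 MiB chunk size are my own illustrative choices. The point is that the loop has to write every byte of the file, so the time grows with the database size, and nothing travels over the network while it runs.

```python
import os


def zero_fill(path: str, size_bytes: int, chunk_bytes: int = 8 * 1024 * 1024) -> int:
    """Create a file of size_bytes and overwrite all of it with zeros.

    Illustrative only: mimics the effect of zero-initializing a data file,
    not SQL Server's actual implementation. Returns the bytes written.
    """
    zeros = bytes(chunk_bytes)  # reusable buffer of zero bytes
    written = 0
    with open(path, "wb") as f:
        # Every byte must be written, so a 500GB file means 500GB of writes
        # before anything else can happen.
        while written < size_bytes:
            n = min(chunk_bytes, size_bytes - written)
            f.write(zeros[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros actually reach the disk
    return written
```

Scaled up to hundreds of gigabytes, that loop is the long, silent stretch during which the restore job shows no activity.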

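It all made perfect sense to me. Luckily, I had read about instant file initialization before, so I knew this was exactly the problem: instant file initialization only arrived in SQL Server 2005, so SQL Server 2000 has no choice but to zero out every data file before the restore can start. Connecting the dots, I came to the following conclusion: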

if file initialization time > client read timeout
    status code 41
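The rule above can be turned into a rough back-of-the-envelope check. This is a sketch under loud assumptions: zeroing time is estimated as database size divided by a sustained disk write speed, and the 100 MiB/s speed and 300-second timeout in the example are hypothetical numbers, not measurements from this restore or NetBackup defaults. The function names are made up for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def zeroing_seconds(db_size_bytes: int, write_bytes_per_sec: float) -> float:
    # Assumption: SQL Server 2000 writes every byte of the data files before
    # requesting data from NetBackup, so time ~ size / disk write speed.
    return db_size_bytes / write_bytes_per_sec


def status_code_41(db_size_bytes: int, write_bytes_per_sec: float,
                   client_read_timeout_sec: float) -> bool:
    # The rule from the text: file initialization time > client read timeout.
    return zeroing_seconds(db_size_bytes, write_bytes_per_sec) > client_read_timeout_sec


if __name__ == "__main__":
    # Hypothetical numbers: 500 GiB database, 100 MiB/s writes, 300 s timeout.
    size = 500 * GIB
    speed = 100 * MIB
    print(f"zeroing takes ~{zeroing_seconds(size, speed) / 60:.0f} minutes")
    print("status code 41" if status_code_41(size, speed, 300) else "restore proceeds")
```

With those assumed numbers the zeroing phase takes about 85 minutes, far longer than the hypothetical timeout. At the same speed a 2TB database would need nearly six hours, which is why picking a bigger timeout quickly turns into guesswork.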

Since I had no control over the database size and had to restore to a SQL Server 2000 box, I definitely needed to modify “Client Read Timeout”, but to what?

A quick search on the net returned an overwhelming number of blog posts, including Symantec’s own documentation, advising people to increase that value.

That just didn’t seem right to me. In my experience, when you code something like a timeout, you always leave behind a way to disable it, and the usual convention is that zero means disabled.
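Here is a minimal sketch of the zero-means-disabled convention, written as a hypothetical read loop (wait_for_first_byte and its poll callback are illustrative names, not NetBackup internals). A positive timeout gives up after that many seconds without data; a timeout of 0 removes the deadline entirely, so a long silent zeroing phase can no longer kill the restore.

```python
import time
from typing import Callable, Optional


def wait_for_first_byte(poll: Callable[[], Optional[bytes]],
                        read_timeout_sec: float,
                        poll_interval_sec: float = 0.01) -> bytes:
    """Wait until poll() returns data (hypothetical helper).

    read_timeout_sec > 0: give up after that many seconds of silence.
    read_timeout_sec == 0: no deadline at all (0 = disabled).
    """
    deadline = None if read_timeout_sec == 0 else time.monotonic() + read_timeout_sec
    while True:
        data = poll()
        if data:
            return data
        if deadline is not None and time.monotonic() >= deadline:
            # Analogous to the client giving up and reporting a timeout.
            raise TimeoutError("network connection timed out")
        time.sleep(poll_interval_sec)
```

Not every API follows this convention: Python’s own socket.settimeout(0), for example, puts a socket into non-blocking mode, and settimeout(None) is what removes the limit. It is worth checking what zero means in a given product before relying on it.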

In the end, I set the timeout to zero and guess what? It worked. I was happily restoring the 500GB database and looking forward to restoring the 2TB data warehouse next.


For more info on instant file initialization, please refer to the following blog posts:
Instant Initialization – What, Why and How?
Misconceptions around instant file initialization
Search Engine Q&A #24: Why can’t the transaction log use instant initialization?