A tale of SQL Server installation

I had to reinstall SQL Server 2000 on a box a few weeks ago. The mission sounded simple when I first started but quickly turned into an eventful drama. Below is a recount of the events I had experienced.

One afternoon, a user reported that he could no longer run reports against a database. He also claimed to have successfully extracted data earlier in the day. The symptom didn’t sound too complicated.

After having difficulty connecting to the SQL Server from SSMS, I went onto the box and discovered that the SQL Server service was stopped.

That’s strange but still no big deal. I clicked on the green start button in the service manager, once…twice…three times and the server was still not started. I sensed something was not right.

A trip to the event viewer revealed the problem:

17052: SQL Server evaluation period has expired.

Huh? All along I didn’t realise we were running an evaluation version. Now what? We had proper licences and a reinstallation would be the logical step but due to the time constraint, I didn’t have time to mark around, it was getting late and the user needed his data urgently.

One of my team mate suggested a dirty quick trick at this point: move the system clock back one day and pretend we were still within the evaluation period. You know what it actually worked. I thought I was smart but this guy was a real “problem solver”.

The system clock trick was nice but it was just a sticky tape solution, we really needed to install a proper version.

Came next day, I naturally selected the enterprise edition (since it was to replace the evaluation enterprise edition) and happily clicked on setup.exe.

Microsoft SQL server 2000 enterprise edition server component is not supported on this operating system, Only client component will be available for installation.

Oops, I later found out that neither enterprise nor standard edition was supported on a Windows XP box, so I was left with no choice but to settle on a developer edition. Fast forward and I clicked on the setup.exe again but this time I was presented with a different message:

A previous program installation created pending file operations on the installation machine. You must restart the computer before running setup.

Fine, so I restarted the box…three times and the message was still lingering around. Between the restarts, I had uninstalled SQL Server 2000 (and inadvertently lost all the DTS packages stored in msdb, luckily I kept a pretty recent backup of msdb, phew!)

In the end, I had to delete a registry entry following instructions from the KB article: SQL Server 2000 installation fails with “…previous program installation…” error message

Thirty minutes later, SQL Server 2000 was successfully installed. OK, time to restore the msdb database to recover all the DTS packages and jobs, etc. It didn’t take me long to locate the msdb backup and type in the command to restore it…

Msg 3168, Level 16, State 1, Line 1
The backup of the system database on device C:\Backup\msdb.bak cannot be restored because it was created by a different version of the server (134219767) than this server (134217922).

Now what?! After staring at the error message for a while, I remembered that the previous SQL Server was on SP4. It got to be it.

Click…click…click, SP4 was installed without any hiccups. Attempted the msdb restore again and thank god it worked this time. I had finally restored the server to the state it was two day ago. What a “fun” experience!

I had fun deleting transaction logs today

I’ve read about people deleting their transaction log files but I have never tried it myself (for good reasons). Today, with some time on my hands I thought it would be fun to delete a transaction log and try to recover a database from that state.

I set up a test environment for this exercise:

use master
go

if @@TRANCOUNT > 0 
    rollback
go

if db_id('accident') is not null 
    drop database accident
go

create database accident 
on 
(
    name = 'accident'
    , filename = 'c:\temp\accident.mdf'
) 
log on 
(
    name = 'accident_log'
    , filename = 'c:\temp\accident_log.ldf'
)
go

use accident
go

create table critical 
(
    data varchar(10)
)
go

insert critical values ('good')
go

begin tran

insert critical values ('bad')

checkpoint -- flush dirty pages to disk

Once it was done, I opened another query window and had a look at the database options:

select 
    name
    , user_access_desc
    , state_desc
    , recovery_model_desc
    , databasepropertyex('accident', 'updateability') as 'updateability'
from sys.databases
where name = 'accident'

image

I knew if I just tried to delete the transaction log file while the database was still online, I would get a “file in use” error message from Windows.

image

In order to delete the file, I need to tell SQL Server to let go of the file, there are three ways of achieving it:

  1. Take database offline.
  2. Detach database.
  3. Shut down SQL Server.

1. Take database offline

According to Books Online, if the database is set to offline, it will shut down cleanly. I’d also found the following statement from Paul Randal’s blog post:

…there’s a feature in SQL Server 2005 that if you attach or startup a database without a transaction log file, and the database was cleanly shut down, SQL Server will create a new log file automatically.

Armed with the knowledge, I executed:

alter database accident set offline

To make it work, I had to:

  • Rollback the open transaction.
  • Change the current database context from “accident” to something else.

In other words, I had to make sure the database was not in use before issuing the command. Eventually I managed to switch off the database and deleted the transaction log file:

image

I was ready to bring the database back online again:

alter database accident set online

Unexpectedly, I received the following error messages:

image

I turned to the error log:

sp_readerrorlog

and found:

image

During upgrade? It didn’t make any sense to me. I googled and googled. I went back and read Paul Randal’s blog posts over and over again. After literally several hours of trying different combination of statements, I’d made one observation which led me to a solution. Pity that I still don’t understand why the statement failed in the first place.

The observation was that the database would be set to the simple recovery model after executing “dbcc checkdb” with the repair_allow_data_loss option. I will talk about this statement later. For now, I just needed to issue the following commands to bring the database back online:

alter database accident set recovery simple
alter database accident set online

image

Bingo! The database options were:

image

Recap:

  1. Take database offline.
  2. Delete transaction log file.
  3. Set recovery model to simple.
  4. Bring database online.

2. Detach database

I returned the database back to the initial state and confirmed the database options:

image

I tried to detach the database:

sp_detach_db 'accident'

After 20 seconds:

image

Just like before, I had to make sure the database was not in use before I could successfully detach it. Once the database was detached, I could delete the transaction log file.

After that, I then tried to attach the database with only the primary data file:

create database accident 
on 
(
    name = 'accident'
    , filename = 'c:\temp\accident.mdf'
) 
for attach

image

Because the database was cleanly shutdown, the log file was recreated. Have you noticed something? This time, I wasn’t required to set the recovery model to simple. I took a look at the database options:

image

The create database with attach statement had set the recovery model to simple itself.

Recap:

  1. Detach database.
  2. Delete transaction log file.
  3. Attach database.

3. Shut down SQL Server

I returned the database back to the initial state and confirmed the database options:

image

I was ready to shut down SQL Server:

image

Once the service was stopped successfully, I deleted the transaction log file and then started SQL Server:

image

Everything seemed to be fine, I quickly had a look at the database options:

image

SQL Server could not recover the database because the log file was missing. I decided to bring the database online and see what happens:

alter database accident set online;

image

Since the database wasn’t shutdown cleanly, neither setting it online or attaching data file would work, but I had another card up my sleeve:

dbcc checkdb('accident', repair_allow_data_loss) 
with no_infomsgs, all_errormsgs

Something wasn’t quite right:

image

No worries, Books Online to the rescue:

When a database has been set to emergency mode by using the ALTER DATABASE statement, DBCC CHECKDB can perform some special repairs on the database if the REPAIR_ALLOW_DATA_LOSS option is specified. These repairs may allow for ordinarily unrecoverable databases to be brought back online in a physically consistent state. These repairs should be used as a last resort and only when you cannot restore the database from a backup. When the database is set to emergency mode, the database is marked READ_ONLY, logging is disabled, and access is limited to members of the sysadmin fixed server role.

I couldn’t wait to try the emergency mode:

alter database accident set emergency

The database options showed:

image

So far so good, I then executed the repair command again:

dbcc checkdb('accident', repair_allow_data_loss) 
with no_infomsgs, all_errormsgs

image

Ok, no worries, I switched the database into single user mode and did it again:

alter database accident set single_user

dbcc checkdb('accident', repair_allow_data_loss) 
with no_infomsgs, all_errormsgs

image

Mmm, juicy, I like detailed error/warning messages. However, I’d made a few puzzling observations.

Observation 1

The third paragraph contradicts the second one, but that’s ok, my guess is the checkdb command encapsulates a few other statements underneath and it has no control over the output of those statements.

Observation 2

The database has been put in dbo-only mode? According to Books Online:

Transact-SQL Reference (SQL Server 2000) RESTORE

RESTRICTED_USER
Restricts access for the newly restored database to members of the db_owner, dbcreator, or sysadmin roles. In SQL Server 2000, RESTRICTED_USER replaces the DBO_ONLY option. DBO_ONLY is available only for backward compatibility.

Furthermore, since the DATABASEPROPERTY function has been superseded by DATABASEPROPERTYEX and the new function mentions RESTRICTED_USER but not DBO_ONLY, I was comfortable in assuming that :

dbo-only = restricted_user

The database options somehow disagreed:

image

The database was still in SINGLE_USER mode. I tried another command:

sp_helpdb 'accident'

image

Did you see that? UserAccess=MULTI_USER. From memory I knew the data in the status column was gathered from DATABASEPROPERTYEX, so I double checked:

select databasepropertyex('accident', 'useraccess')

image

It did return MULTI_USER. What is going on?

  • CHECKDB (or something ran by CHECKDB) said the database was in DBO-ONLY (aka RESTRICTED_USER) mode.
  • SYS.DATABASES said the database was in SINGLE_USER mode.
  • DATABASEPROPERTYEX said the database was in MULTI_USER mode.
  • The database was actually in SINGLE_USER mode.

image

Anyway, back to the topic. There was one more thing to note:

select * from accident..critical

image

Funny that I’d used REPAIR_ALLOW_DATA_LOSS, and instead of losing data, I’d got more data. Smile

Recap:

  1. Shut down SQL Server.
  2. Delete transaction log file.
  3. Start SQL Server.
  4. Set database to emergency mode.
  5. Set database to single user mode.
  6. Run dbcc checkdb with repair_allow_data_loss.

Summary

I have proven above that a database with a missing or corrupt transaction log file can be recovered. However, depending on the database state before the removal or corruption of the log file and depending on the methods used during recovery, there will be different changes made to the database after it is brought back online.

  Database shutdown cleanly Recovery model after database is back online User access after database is back online
1. Take database offline Yes Full -> Simple MULTI_USER -> MULTI_USER
2. Detach database Yes Full -> Simple MULTI_USER -> MULTI_USER
3. Shut down SQL Server No Full -> Simple MULTI_USER -> SINGLE_USER

Other side effects
When a database is detached, all its metadata is dropped. If the database was the default database of any login accounts, master becomes their default database.
Transactional consistency has been lost.

–oo00oo–

For more info on CHECKDB and EMERGENCY mode, please refer to the following blog posts:
TechEd Demo: Creating, detaching, re-attaching, and fixing a suspect database
Search Engine Q&A #4: Using EMERGENCY mode to access a RECOVERY PENDING or SUSPECT database
CHECKDB From Every Angle: EMERGENCY mode repair – the very, very last resort

Restore 500GB SQL2000 database from tape using NetBackup 6.5.3.1

The other day I was restoring a 500GB SQL Server 2000 database from tape using NetBackup 6.5.3.1 and it didn’t work well. Well, it didn’t work at all. No matter how many times I tried, I kept getting this error:

Status Code 41: network connection timed out

This was not the first database I’d tried and I was able to restore other databases successfully, so I knew there was nothing wrong with the network. One thing I could think of was the relatively large database size.

At that moment, I was 50% sure that the error was caused by a timeout setting somewhere in NetBackup. Since I had limited experiences working with NetBackup, I turned to Google and found the following article:

DOCUMENTATION: Restores of large Microsoft SQL server databases using the NetBackup for Microsoft SQL Server database extension fail before jobs start reading data from tape.

With Microsoft SQL server, the restore process starts by having SQL allocate all of the data files that will be used for the SQL Database. SQL then writes zeroes to all of these files. If this is a very large SQL database, this process can take a significant amount of time. Only after Microsoft SQL finishes writing zeros will it start requesting data from the NetBackup agent. Generally there is no appearance of activity until the Microsoft SQL server is ready to start requesting data to recover the SQL database.

If the NetBackup “Client Read Timeout” for that client is not large enough, the restore processes will have already timed out before the SQL Server requests the first byte of data.

It all made perfect sense to me and luckily I had read about instant file initialization before so I knew it was exactly it. I’d managed to connect all the dots and came to the understanding that:

if file initialization time > client read timeout
    status code 41
else
    ok

Since I had no control over the database size and had to restore to a SQL Server 2000 box, I definitely needed to modify “Client Read Timeout”, but to what?

A quick search on the net returned overwhelming amount of blog posts including Symantec documentation advising people to increase that number.

That just didn’t seem right to me. My experience tells me when you code something like a timeout, you always leave behind a way to disable it and the convention is: zero = disabled.

In the end, I set the timeout to zero and guess what? It worked. I was happily restoring the 500GB database and looking forward to restoring the 2TB data warehouse next.

–oo00oo–

For more info on instant file initialization, please refer to the following blog posts:
Instant Initialization – What, Why and How?
Misconceptions around instant file initialization
Search Engine Q&A #24: Why can’t the transaction log use instant initialization?

Rebuild Master failed with error -1:

The other day I was trying to rebuild the master database for a SQL Server 2000 box. Everything was going well until the “Configuring Server” dialog popped up.

D0B9A22DC37EAE4F_164_0

The progress bar went across about 4 times and then an error message popped up.

D0B9A22DC37EAE4F_164_1

That’s weird, I have rebuilt the master database a few times before and I have never encountered errors like this one. I looked for the error details and started with the cnfgsvr.out log file located in C:\Program Files\Microsoft SQL Server\MSSQL\Install

D0B9A22DC37EAE4F_164_2

By looking at the log file, I could confirm the error, but it didn’t help to explain as why the connections had failed. So I ran eventvwr. One of the system log error mentioned something about MSSQLSERVER service failed to start.

D0B9A22DC37EAE4F_164_3

At that point, I was still not sure what’d happened, so I then ran services.msc and tried to start the MSSQLSERVER service manually.

D0B9A22DC37EAE4F_164_4

The error message matched what I had seen in the event viewer. Path? What path? Suddenly as if stricken by lightening, I noticed “Path to executable:” had the short file names for the path. The question became obvious, what was MICROS~3?

D0B9A22DC37EAE4F_164_5

A quick look in the explorer revealed 5 long name folders starting with the word “Microsoft”.

D0B9A22DC37EAE4F_164_6

MICROS~3 happened to be “Microsoft Office” instead of “Microsoft SQL Server”. Now the cause of the issue had been identified, I needed a way to work around it. For what I was doing at the time, I could not just reinstall SQL Server, therefore I decided to move the “80” and “mssql” folders into “Microsoft Office”.

D0B9A22DC37EAE4F_164_7

What I had effectively done was making C:\PROGRA~1\MICROS~3\MSSQL\binn\sqlservr.exe a valid path.

Then I reran rebuildm.exe and as expected, everything worked.

D0B9A22DC37EAE4F_164_8

–oo00oo–

For more info on short file names, please refer to the following Microsoft Knowledge Base articles:
How Windows Generates 8.3 File Names from Long File Names
8.3 Short File Names not Retained after Tape Restore
How to Disable Automatic Short File Name Generation