The Realm of the Verbal Processor

Jarvis's Ramblings

SCCM SQL Cluster Problem

Earlier this week I had an issue backing up SCCM because Kerberos was not enabled on the SQL cluster. I got that fixed, but I kept noticing other things on my SCCM server that just didn’t seem right. (Instructions for how to enable Kerberos are in the link above.) In particular, my Site System Status was red. Looking into this, I saw that SCCM was referencing the SQL cluster nodes directly…not the SQL cluster. That’s not good. So I took a look at the Site Systems (under Site Settings), and here is what I saw:

[Screenshot: Bad SQL]

As you can see, the SQL cluster does NOT hold the site database role. That role is held directly by the SQL nodes. Although Kerberos must be enabled on the cluster for normal SCCM operation, the prerequisite checker apparently does not check for it. As a result, it let the install go through, and setup ended up installing directly to the nodes instead of to the SQL cluster…because it could not see the cluster without Kerberos enabled on it. All of that said, it’s a major problem. Site Status is red, and who knows what would happen in the event of a SQL node failover.

So I got to thinking. I’m pretty sure the problem is a result of the database being created on the SQL server before Kerberos was enabled. In theory, I should be able to move the DB elsewhere and then move it back (now that Kerberos is enabled), and everything should be lovely again. Nice theory. But will it work? Enough thinking…let’s find out.

I moved the DB to a SQL named instance on the same server using the instructions found here. [Note: at the time of this writing there is a mistake in the instructions. Between steps 2 and 3 there should be a step for actually going into SQL Server and detaching and attaching the site DB. I reported it, and Microsoft acknowledged that the step is missing and said it will be fixed in the next update of the documentation.] After bringing up SCCM on the named instance, I shut it back down and used the same process to move it back to the default instance. Here is what it looks like now:

[Screenshot: Good SQL]

Note that the site database role is now on the SQL cluster. The two nodes are still in the list, but they have no roles associated with them. Right-clicking them does not offer a delete option. According to Wally Mead and Stan White, those two should age out of the system after 30 days. The very nice thing is that my Site Status is now a lovely shade of green.

I reported this issue as a bug in SCCM. Got a great response from Wally Mead. He assigned it to the SCCM SP1 team for possible inclusion in SP1. Very cool!

February 1, 2008 | ConfigMgr | 6 Comments

SCCM Backup Issues

For the last week I have been attempting to back up my SCCM server before it goes into production. The backup has been failing, so I have been in major troubleshooting mode. The basic scenario is this: SCCM is installed on a VMware virtual machine, and the SQL database is offloaded to a clustered SQL server. When the backup ran, it would fail after about five seconds and leave the following four lines in smsbkup.log:

>>>>>>>>>>>>
Info: Sending message to start the SQL Backup...
Couldn't connect to \\SQLcluster registry
STATMSG: ID=5049 SEV=E LEV=M SOURCE="SMS Server" COMP="SMS_SITE_BACKUP" SYS=SCCMserver SITE=LHT PID=3400 TID=924 GMTDATE=Wed Jan 23 19:21:16.539 2008 ISTR0="" ISTR1="" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0
Error: Failed to send start message to the SqlBackup.
>>>>>>>>>>>>
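The STATMSG line above packs the status message into KEY=value pairs (ID, SEV, COMP, SITE, GMTDATE and so on). When a backup dies this quickly, it can help to pull just the error entries out of smsbkup.log. Below is a minimal Python sketch, assuming each STATMSG line looks exactly like the one quoted above: upper-case keys, values optionally wrapped in straight double quotes, and a GMTDATE value that contains spaces. It also assumes SEV=E marks an error, which matches the "Error:" line that follows it in the log. The names parse_statmsg and error_statmsgs are hypothetical helpers, not part of SCCM, and the parser has not been checked against every message type.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

# Assumption: a field boundary is whitespace followed by an upper-case KEY and '='.
# Splitting only there keeps values that contain spaces intact,
# e.g. SOURCE="SMS Server" or GMTDATE=Wed Jan 23 19:21:16.539 2008.
_FIELD_BOUNDARY = re.compile(r"\s+(?=[A-Z][A-Z0-9]*=)")


def parse_statmsg(line: str) -> Optional[Dict[str, str]]:
    """Parse one STATMSG log line into a {KEY: value} dict.

    Returns None for lines that are not STATMSG lines.
    Surrounding double quotes are stripped from values.
    """
    prefix = "STATMSG:"
    line = line.strip()
    if not line.startswith(prefix):
        return None
    fields: Dict[str, str] = {}
    for part in _FIELD_BOUNDARY.split(line[len(prefix):].strip()):
        key, sep, value = part.partition("=")
        if sep:  # skip any fragment that has no KEY= in it
            fields[key] = value.strip('"')
    return fields


def error_statmsgs(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield parsed STATMSG entries whose severity field is SEV=E."""
    for line in lines:
        msg = parse_statmsg(line)
        if msg is not None and msg.get("SEV") == "E":
            yield msg
```

Feeding the lines of smsbkup.log to error_statmsgs and printing ID, COMP and GMTDATE for each result should surface the 5049 message from SMS_SITE_BACKUP shown above. Splitting on the KEY= lookahead, rather than on every space, is the design choice that keeps multi-word values such as the timestamp in one piece.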

I re-confirmed that the SCCM server’s machine account was in the admin group on the SQL server. I also knew that I had already taken care of the SPN registration issue, so I posted on the TechNet SCCM forum. In hindsight, Stan White (a moderator on the forum) nailed the answer in his first reply…I just misunderstood what he was saying. After a lot more troubleshooting, I realized that from a cmd prompt running as Local System, I could map a drive to the administrative shares on the SQL server nodes, but I could NOT map a drive to the cluster. (For example, SQLcluster is made up of SQLserver1 and SQLserver2. I was able to map to \\SQLserver1\c$, but not to \\SQLcluster\c$.) That led me to a Google search, where I found this thread (and Ragnar’s post in particular), which pointed me in the right direction…the same direction Stan had pointed to.

The root problem is that Kerberos authentication was not enabled on the cluster. When Kerberos is enabled on the cluster, it publishes the cluster name to Active Directory. Until that is done, the server name “SQLcluster” does not exist in AD…so nothing can communicate with it via Kerberos. I found a few articles that explain in more detail how to enable Kerberos on the cluster: here, here, and here.

After our DBA enabled Kerberos on the cluster last night, I was able to get a successful backup. Now I can move on to other things.

I’d like to acknowledge that my friend Tim is the one who asked a couple of key questions about authentication that led me to Ragnar’s post above.

January 30, 2008 | ConfigMgr | 2 Comments
