
Troubles Clustering Windows 2003

This article isn't from Microsoft; it comes from field technicians who work with clusters in real-world situations. At Phoenix Synergy we are regularly contacted by local businesses (small and large) for any clustering help they may need. In this case the customer had seven Dell servers running Windows 2003, each with the standard dual Broadcom NIC that comes with 1U servers these days. As you know, those network interfaces can be "teamed" to form a single interface. The client wants to make their existing production environment as fault tolerant as possible. The seven servers are new and are set up in a lab environment, each running Windows 2003 Standard.

What we have to work with:

They have two dedicated domain controllers for Active Directory, which also run DNS for both internal and external name resolution. Their domain's zone records will be hosted here, and the two servers will become ns1 and ns2.

They have two web servers and three complus (COM+) servers. Later they will implement their SQL clusters, but we won't get into that here.

So far it's a straightforward configuration. They want the NICs teamed, with NIC1 from each server plugged into switch-1 and NIC2 plugged into switch-2, so that a switch can fail. A crossover cable between the two switches allows either NIC to fail. Each switch will be plugged into its own firewall/router. The default gateway on each server will point primarily to the firewall that switch-1 is plugged into, and we add a second gateway IP with a higher metric to cover any failure of the primary firewall. Each firewall is plugged into a different ISP and has a different external IP configured. This allows ns2 to be an IP on ISP-2, which covers a complete failure of the first ISP. Because all the host records on ns2 point to IPs from the second ISP, an entire segment of their line can fail completely.
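The two-gateway arrangement can be pictured with a small Python sketch. This is only a simplified model of "prefer the gateway with the lowest metric, fall back when it is down"; it is not how Windows itself detects a dead gateway, and the addresses and metric values are hypothetical placeholders rather than the customer's real settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gateway:
    address: str  # hypothetical address of a firewall/router
    metric: int   # lower metric is preferred
    alive: bool   # whether the firewall currently responds


def pick_gateway(gateways):
    """Return the reachable gateway with the lowest metric, or None."""
    reachable = [g for g in gateways if g.alive]
    return min(reachable, key=lambda g: g.metric, default=None)


# Normal operation: the firewall behind switch-1 wins on metric.
primary = Gateway("192.168.1.1", metric=1, alive=True)
secondary = Gateway("192.168.1.2", metric=10, alive=True)
normal_choice = pick_gateway([primary, secondary])

# Primary firewall down: traffic falls back to the second gateway.
failed_primary = Gateway("192.168.1.1", metric=1, alive=False)
failover_choice = pick_gateway([failed_primary, secondary])
```

The point of the different metrics is simply ordering: both gateways are configured at all times, but the second one is only used when the first is unavailable.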

That is the layout. Once AD is set up and DNS is configured, we team and set up the NICs. We pull a few plugs to test the theory of the setup, and we're confident everything is working well. So now we have to set up and test the cluster.
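Pulling plugs can also be rehearsed on paper. The following Python sketch models the layout described above as a graph and checks that a single server still reaches at least one ISP after any one component fails. The component names are labels chosen for this illustration, and the model ignores teaming modes, spanning tree and routing details.

```python
from collections import deque

# One server's view of the layout: two NICs, two switches joined by a
# crossover cable, one firewall per switch, one ISP per firewall.
LINKS = [
    ("server", "nic1"), ("server", "nic2"),
    ("nic1", "switch1"), ("nic2", "switch2"),
    ("switch1", "switch2"),
    ("switch1", "fw1"), ("switch2", "fw2"),
    ("fw1", "isp1"), ("fw2", "isp2"),
]


def reaches_internet(failed):
    """Breadth-first search from the server to any ISP, skipping `failed`."""
    adjacency = {}
    for a, b in LINKS:
        if a == failed or b == failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {"server"}, deque(["server"])
    while queue:
        node = queue.popleft()
        if node.startswith("isp"):
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


components = ["nic1", "nic2", "switch1", "switch2",
              "fw1", "fw2", "isp1", "isp2"]
results = {name: reaches_internet(name) for name in components}
```

In this model every entry in `results` is True, which matches what we saw in the lab: no single pulled plug takes the server off the network.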

The cluster:

Since we do not have a dedicated network load balancer, we have to balance the load across the web servers and complus servers by way of Microsoft's Network Load Balancing. We proceed with the NTLB management interface to cluster the web servers. Each of the two servers converges into the cluster without a hitch. When we try the same on the complus servers, it does not go as well.

The problem:

We add complus1 to the cluster. It adds fine; of course it does, it's the only member of the cluster. It says "converging" for a moment and then goes green. We attempt to add complus2 to the cluster and it says "converging" forever; it never converges. It stays in the "converging" state for over 30 minutes, through refresh after refresh, stopping and starting, pausing, trying anything. We cannot get the second node to converge. We try adding complus3 and get the same result. We retrace our steps, checking DNS for internal resolution of both the servers themselves and the cluster IPs; all looks good. We ping all the nodes, and every node can ping every other. IPConfig shows the cluster IP on each of the complus servers. NTLB is bound on each "Team" interface. Microsoft's support material insists there is a problem with the NIC, so we unteam and try each NIC individually. As we retrace our steps we find the same problem regardless of which NICs we use on any system. On a whim we uncluster each server, reboot, and add complus2 first. Then we add complus3 to the cluster, and the two converge within seconds. Trying to add complus1 fails. So we have isolated the problem to just one server.
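The "change the order and see who fails" approach amounts to leaving one node out at a time. Here is a rough Python sketch of that reasoning. The `converges` function is a hypothetical stand-in for manually building the cluster and watching the status; it hard-codes the behaviour we observed rather than talking to any real cluster.

```python
def find_suspects(nodes, converges):
    """Return the nodes whose removal lets the remaining nodes converge."""
    suspects = []
    for node in nodes:
        others = [n for n in nodes if n != node]
        if converges(others):
            suspects.append(node)
    return suspects


def converges(members):
    # Stand-in for a manual test: a lone node always goes green, and any
    # larger cluster converges only if the misconfigured node is absent.
    return len(members) <= 1 or "complus1" not in members


nodes = ["complus1", "complus2", "complus3"]
suspects = find_suspects(nodes, converges)
```

Note why the first attempt was misleading: complus1 went green on its own, so it looked like the healthy node and the later additions looked like the broken ones.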

The solution:

It turns out that on complus1, NTLB was bound to each physical NIC (both members of the "team"). Once we re-team the NICs and remove NTLB from nic1 and nic2, the server converges into the cluster without a problem.
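A check like this is easy to write down once you have noted, per server, which interfaces have NTLB ticked. The Python sketch below works on a hand-entered table; the interface names and the sample data are assumptions made for illustration, and nothing here reads the binding state from Windows.

```python
def misbound_interfaces(bindings, cluster_interface="Team"):
    """Report servers where the load balancing binding is not limited
    to the single interface used for the cluster."""
    problems = {}
    for server, interfaces in bindings.items():
        # Interfaces other than the cluster interface that are bound.
        extra = sorted(name for name, bound in interfaces.items()
                       if bound and name != cluster_interface)
        # True when the cluster interface itself is not bound.
        missing = not interfaces.get(cluster_interface, False)
        if extra or missing:
            problems[server] = {"extra": extra, "missing": missing}
    return problems


# Hand-entered sample reflecting what we found in the lab.
bindings = {
    "complus1": {"Team": True, "NIC1": True, "NIC2": True},
    "complus2": {"Team": True, "NIC1": False, "NIC2": False},
    "complus3": {"Team": True, "NIC1": False, "NIC2": False},
}
problems = misbound_interfaces(bindings)
```

With the sample data only complus1 is reported, with NIC1 and NIC2 listed as the extra bindings to clear.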

Summary:

When clustering, be sure to select the NTLB service on only one interface: the one being used for the cluster. No other NIC should have NTLB bound to it. As we continue with our proposed configuration, everything works well. All tests are successful, and it looks like they will have a great fault-tolerant production environment. Next are the SQL clusters: implementing two SQL clusters on an EMC SAN, live, with no tolerance for downtime. This should be fun. Until then...


Adam Yax is CTO of http://www.phoenixsynergy.com, which provides IT services in Phoenix, AZ.

