Effective Remote Management

Overview

Have you ever had a remote server hang or crash at 6:00 PM on a Friday, and now your Saturday morning is blown because you have to drive out and push the power button?

This guide may help save you when Something Bad happens to one of your remote servers.

If your server has “Lights-out Management,” also called IPMI (Intelligent Platform Management Interface), you can exploit this clever technology to shut down, reboot, and power on a server remotely, boot from alternate devices, and lots of other fun stuff.

Note that this guide covers generic IPMI, not proprietary solutions like iLO. There is a nod to Remote Insight cards at the bottom, more for my own notes than anything else.

A Note on Link Aggregation (LAG)

Unfortunately, IPMI does not work with 802.3ad link aggregation, as of the current IPMI version (2.0). Some vendors, such as Dell, claim that if the aggregated link fails over to the primary NIC only, IPMI will work. Theoretically you could disable one of the ports on your managed switch to force this condition. I tried it with an Intel Xserve and a ProCurve 4108gl; no go. I would take vendor claims with a grain of salt.

As a result of this issue, you have to decide between redundant + bonded NICs, or the ability to manage your server at the hardware level remotely. The way I approach it is as follows:

  • I can’t remember the last time I’ve seen a server NIC, or (properly installed) cable, actually fail.
  • If I’m concerned about link failure, I can always cable both interfaces, and give the second one a “spare” IP, for manual failover. (Log into the spare IP, disable the primary link, and change the spare’s IP to the primary.) So failover goes from an automatic to a manual process, though it can be done remotely, which is important.
  • If I’m okay with manual failover, then I must think about bandwidth. Will I be using a full 2Gb/sec? If so, can I get by with 1Gb/sec?
  • If I really need that bonded link, at this point I may consider installing an additional NIC in my server, so that I can use Ethernet 1 (onboard) for IPMI, and Ethernet 2 (onboard) and Ethernet 3 (add-on NIC) for LAG.
  • I typically sum it up like this: when I first roll out a server, instability is more likely to occur within the first month or so, while I’m tweaking things. And unless I’m really on the ball, I won’t know my bandwidth utilization. If it turns out my enterprise email server is only doing 300Mb/sec, then I’ll just leave the one link.
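
Once you have a measured peak, that last decision is a simple comparison. A minimal sketch, assuming a 1Gb/sec link by default; the helper name link_plan and the 80% headroom threshold are arbitrary choices for illustration, not a rule:

```shell
# link_plan PEAK_MBPS [LINK_MBPS]
# Prints "single" if one link covers the measured peak with headroom,
# otherwise "bonded". The 80% headroom threshold is an assumption.
link_plan() {
    peak=$1
    link=${2:-1000}
    # Integer math: compare the peak against 80% of the link capacity.
    if [ "$peak" -le $(( link * 80 / 100 )) ]; then
        echo single
    else
        echo bonded
    fi
}
```

For the 300Mb/sec email server above, link_plan 300 prints single.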

Tools

You’ll need:

  • A server with IPMI.
  • A workstation with ipmitool (if *nix or OS X) or ipmish (if Windows).
  • Ideally, a “lifeboat” computer on the local subnet with ipmitool or ipmish. It should be on the same broadcast domain as the IPMI interface: you may not be able to reach that interface remotely from within a VPN, and that’s when the lifeboat is useful. Any machine with ipmitool / ipmish will work.

Using ipmitool

Syntax

ipmitool -U (username) -H (host) (command)
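
With no password option, ipmitool prompts for the password on every call. If that gets tedious, a small wrapper can help. This is only a sketch: the function name ipmi and the variables IPMI_USER and IPMI_HOST are made up for this example, while -E is ipmitool's flag for reading the password from the IPMI_PASSWORD environment variable.

```shell
# Hypothetical wrapper; IPMI_USER and IPMI_HOST are names chosen for this sketch.
# -E tells ipmitool to take the password from the IPMI_PASSWORD environment
# variable instead of prompting for it.
ipmi() {
    ipmitool -E -U "${IPMI_USER:?set IPMI_USER}" -H "${IPMI_HOST:?set IPMI_HOST}" "$@"
}
```

With IPMI_USER, IPMI_HOST and IPMI_PASSWORD exported, ipmi chassis power status runs the status query without a prompt.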

Basic Examples

Note: Each manufacturer has its own IPMI management syntax. These examples are for Apple’s Xserve; Dell PowerEdge units don’t seem to respond to them.

Power cycle
This is actually five commands. I prefer to verify the system is on (and hanging or what have you), turn it off, check that it’s off, turn it back on, and verify it’s on.
ipmitool -U adurkee -H 1.2.3.4 chassis power status
ipmitool -U adurkee -H 1.2.3.4 chassis power off
ipmitool -U adurkee -H 1.2.3.4 chassis power status
ipmitool -U adurkee -H 1.2.3.4 chassis power on
ipmitool -U adurkee -H 1.2.3.4 chassis power status
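
The same five steps can be wrapped in a function that refuses to go on when a check fails. A rough sketch, assuming ipmitool answers with "Chassis Power is on" or "Chassis Power is off" and that the password comes from IPMI_PASSWORD via -E; the function names, the POLL_SECONDS variable and the limit of 10 checks are all invented for this example:

```shell
# power_is USER HOST STATE
# Succeeds if the chassis reports the given state (on or off).
power_is() {
    ipmitool -E -U "$1" -H "$2" chassis power status | grep -q "Chassis Power is $3"
}

# power_cycle_verified USER HOST
# Verify on, power off, poll until off, power on, verify on.
power_cycle_verified() {
    user=$1 host=$2
    power_is "$user" "$host" on || { echo "not powered on; aborting" >&2; return 1; }
    ipmitool -E -U "$user" -H "$host" chassis power off
    tries=0
    until power_is "$user" "$host" off; do
        tries=$(( tries + 1 ))
        [ "$tries" -ge 10 ] && { echo "still on after 10 checks" >&2; return 1; }
        sleep "${POLL_SECONDS:-3}"
    done
    ipmitool -E -U "$user" -H "$host" chassis power on
    sleep "${POLL_SECONDS:-3}"
    # The exit status of this last check is the result of the whole cycle.
    power_is "$user" "$host" on
}
```

Call it as power_cycle_verified adurkee 1.2.3.4 and check the exit status before walking away.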

Hard reset

ipmitool -U adurkee -H 1.2.3.4 chassis power reset

Check just the power status (am I on or off?)
ipmitool -U adurkee -H 1.2.3.4 chassis power status

Obtain full chassis information (is it locked, what’s the power status, etc.)
ipmitool -U adurkee -H 1.2.3.4 chassis status

Obtain full sensor information (thermal, fan, power)
ipmitool -U adurkee -H 1.2.3.4 sensor
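
The full sensor listing is long, so it can help to filter it down to the sensors that are complaining. A sketch only: it assumes the usual pipe-delimited layout of the ipmitool sensor listing (name | reading | unit | status | thresholds), treats "ok" and "na" as healthy, and skips discrete sensors because their status column is a hex code. The function name flag_sensors is made up.

```shell
# flag_sensors: read a sensor listing on stdin and print "name: status" for
# every threshold sensor whose status column is neither "ok" nor "na".
flag_sensors() {
    awk -F'|' '
        NF >= 4 {
            name = $1; unit = $3; status = $4
            gsub(/^ +| +$/, "", name)      # trim the column padding
            gsub(/^ +| +$/, "", unit)
            gsub(/^ +| +$/, "", status)
            if (unit != "discrete" && status != "ok" && status != "na")
                print name ": " status
        }'
}
```

Usage: ipmitool -U adurkee -H 1.2.3.4 sensor | flag_sensors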

Serial Over LAN (SOL)

This mode provides a virtual serial connection to the OS on the server. Useful when the server’s OS doesn’t have its network address configured correctly and you can’t SSH in. Useful when the GUI’s crashed, too. Heck, useful for all sorts of things.

Open a SOL connection
ipmitool -I lanplus -U adurkee -H 1.2.3.4 sol activate
Special escape sequences are provided to control the SOL session:

~. Terminate connection
~^Z Suspend ipmitool
~B Send break
~~ Send the escape character by typing it twice
~? Print the supported escape sequences
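
If an earlier session was not closed cleanly, the new one may be refused. One way around that is to deactivate before activating. A sketch under a few assumptions: the lanplus interface, the password supplied through IPMI_PASSWORD (-E), and a made-up function name sol_connect.

```shell
# sol_connect USER HOST
# Clear any leftover SOL session, then open a fresh one.
sol_connect() {
    # Ignore the error from deactivate when no session was active.
    ipmitool -I lanplus -E -U "$1" -H "$2" sol deactivate >/dev/null 2>&1 || true
    ipmitool -I lanplus -E -U "$1" -H "$2" sol activate
}
```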

Fixing a failed bond in SOL (OS X Server 10.5)
You might have done this accidentally: created a bond, destroyed it, and now your IP isn’t accessible.
First, remember that you’re probably logged in as an admin. You need to do this as root. Then:
ifconfig bond0 destroy

Ping the box from another machine to verify all’s OK.
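
The interface can take a moment to come back, so a retry loop saves retyping the ping. A minimal sketch to run from the other machine; the function name wait_for_ping and the default of 10 attempts are arbitrary:

```shell
# wait_for_ping HOST [TRIES]
# Pings HOST once per attempt until it answers or TRIES (default 10) run out.
wait_for_ping() {
    host=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if ping -c 1 "$host" >/dev/null 2>&1; then
            echo "$host is answering"
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep 1
    done
    echo "$host did not answer" >&2
    return 1
}
```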

Using ipmish (IPMIshell) – Windows equivalent to ipmitool

Syntax
ipmish -ip 10.9.8.7 -u adurkee -p mypassword subcommand

Basic Examples
Power cycle the server
ipmish -ip 10.9.8.7 -u adurkee -p mypassword power cycle

Display general system info
ipmish -ip 10.9.8.7 -u adurkee -p mypassword sysinfo

Advanced ipmitool usage

The following techniques are useful in larger environments when you need finer control.
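
For example, with more than a handful of servers you may want one status line per box. A sketch, assuming a plain text file with one IPMI address per line and the password in IPMI_PASSWORD (-E); the function name power_report is invented:

```shell
# power_report HOSTS_FILE USER
# Prints "host: <power status>" for every address listed in HOSTS_FILE.
power_report() {
    while read -r host; do
        [ -n "$host" ] || continue      # skip blank lines
        printf '%s: ' "$host"
        # Redirect stdin so ipmitool cannot swallow the rest of the host list.
        ipmitool -E -U "$2" -H "$host" chassis power status </dev/null || echo "no response"
    done < "$1"
}
```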

Using keys instead of passwords

Use the -f flag to specify a password file, so ipmitool reads the password from that file rather than prompting for it. This is useful when you want to use a key.
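
The file should be readable only by you. A small sketch for creating one; the function name make_pw_file and the path in the usage line are placeholders:

```shell
# make_pw_file PATH
# Reads a password from stdin and stores it in PATH with owner-only access.
make_pw_file() {
    ( umask 077 && cat > "$1" ) && chmod 600 "$1"
}
```

Usage: printf '%s\n' 'mypassword' | make_pw_file ~/.ipmi-pass, then ipmitool -U adurkee -H 10.9.8.7 -f ~/.ipmi-pass chassis power status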

Query the IPMI system to see what auth types are supported
ipmitool -U adurkee -H 10.9.8.7 lan print 1
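
The answer is buried in a long listing. A sketch that pulls out just the supported types, assuming the listing contains a line of the form "Auth Type Support : MD5 PASSWORD"; the function name auth_types is made up:

```shell
# auth_types: read a "lan print" listing on stdin and print the supported
# authentication types, one per line.
auth_types() {
    awk -F': *' '/^Auth Type Support/ {
        n = split($2, t, " ")
        for (i = 1; i <= n; i++) print t[i]
    }'
}
```

Usage: ipmitool -U adurkee -H 10.9.8.7 lan print 1 | auth_types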

Compaq/HP “Remote Insight” cards

These are “PCI video cards with Ethernet sockets on the back.” They convert video from the PC, even POST/BIOS screens, to a web interface that you can access over the network. Originally sold for Compaq servers, they can be used with any PC. They can flip the power switch by means of a pass-through cable. They have external power supplies so that the card remains running even while the PC is off, and they save their configs to non-volatile flash. They have PS/2 passthrough cables so that you can get into the BIOS, etc. at a very low level. If purchasing one, make sure it includes the external power adapter, the PS/2 passthrough cable (it’s a single cable for the keyboard and mouse), and the power switch passthrough cable. There’s also an internal ribbon cable, specific to Compaq servers, that is not used when the card is installed in a standard PC.

Resetting to defaults – headless (no monitor)
This is how you reset a card if you don’t even have a VGA monitor available.

  • Turn on PC
  • Wait about 5 sec and hit F8 to enter the Remote Insight setup screen. The card comes up after POST is done but before the boot devices start; you have about 3 sec to hit F8.
  • By default the setup goes to the Set Defaults menu item. Hit Enter 2x.
  • Wait about 5 sec for the card restart command to kick in. Then the diagnostic LEDs on the card will blink for 12 to 15 seconds. Then all lights will flash once, turn off, then the Network light (second from back) will turn on and resume its normal blinking.
  • Now we change the default password. Hit Right arrow 3x, down 2x, then hit Enter 1x.
  • Wait 10 sec for the User dialog to come up; it’s usually set to “Administrator.” Hit Enter 1x.
  • Hit Down 2x.
  • Type “administrator” (or other password), then hit Down 1x when done. (Not Enter!)
  • Type it again, then hit Enter 1x.
  • Wait 10 sec for card to write change to flash.
  • Now we exit. Hit Right 2x, then Down 1x (selects Exit), then hit Enter 3x. (To get out of the interface you always need to hit Enter 3x.)
  • As a side note, after you exit the card interface, it resumes boot – doesn’t restart the PC – so the PC will then attempt to boot from its HD/CD/whatever. You may want to Ctrl-Alt-Del.
  • Card takes 10-15 sec to get DHCP address, maybe a little longer than a normal PC. Depends on environment.
  • Find what IP the card has by comparing its MAC with the DHCP leases, and log in to the web interface at that IP. (Regular HTTP.) Accept the self-signed cert. Default username is “Administrator” (it is case sensitive) and the password, as mentioned, was set to “administrator” if you did it correctly.
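
The MAC lookup in the last step can be scripted if your DHCP server keeps a readable lease file. A sketch that assumes ISC dhcpd and its dhcpd.leases layout ("lease <ip> {" followed by a "hardware ethernet <mac>;" line); other DHCP servers store leases differently, and the function name ip_for_mac is made up:

```shell
# ip_for_mac MAC LEASES_FILE
# Prints the address of the last lease in the file whose "hardware ethernet"
# entry matches MAC (case-insensitive). Fails if there is no match.
ip_for_mac() {
    awk -v mac="$(printf '%s' "$1" | tr 'A-F' 'a-f')" '
        $1 == "lease" { ip = $2 }
        $1 == "hardware" && $2 == "ethernet" {
            m = tolower($3); sub(/;$/, "", m)
            if (m == mac) found = ip      # later entries override earlier ones
        }
        END { if (found != "") print found; else exit 1 }
    ' "$2"
}
```

Usage: ip_for_mac 00:11:22:33:44:55 /var/lib/dhcp/dhcpd.leases (the MAC and the lease file path are placeholders; the path varies by system).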

