Resolving a GlusterFS Server Failure

2015-09-30T19:30:00Z

I'm responsible for maintaining four servers that have been built for running automated browser tests using Selenium. Each server runs a Selenium hub and multiple XVNC desktops / Selenium nodes. The tests run in Firefox, and the Firefox profile utilised by Selenium imports the Firebug extension. The profile also enables HTTP Archive logging. As there are four different servers in this Selenium cluster, checking four different SMB shares for the correct HTTP Archive (HAR) file whenever a test fails could be tedious. Thus, each server uses a distributed file system (GlusterFS) for storing the HAR files, with each server configured as a 'brick' in a replicated GlusterFS volume. In fact there are two volumes; the second is used for storing files that some browser tests can upload if needed.

One day recently I arrived at work to find that one of the nodes in the Selenium/GlusterFS cluster had suffered a hard disk failure. Using four servers in the cluster gives plenty of redundancy, particularly given that the available bandwidth from the office where these servers are located severely limits the number of tests that can run in parallel. (I'm planning to spike VPS-based Selenium Grids at some point in the future in order to mitigate the bandwidth limitation.)

I replaced the disk in the failed server and then reprovisioned it by temporarily withdrawing one of the other good servers from the cluster and cloning its disk. In a production environment, of course, I'd try to avoid doing this and instead use an automated configuration management solution such as Puppet or Chef.

Before the clone server could be added back to the cluster, I needed to remove its existing GlusterFS configuration and then re-add it to the two GlusterFS volumes as a pair of new bricks. I couldn't just change the hostname, because GlusterFS identifies peers and bricks by UUID rather than by hostname. The clone server would have had the same UUIDs as the server it was cloned from, and thus any attempt to re-add its bricks to our two replicated GlusterFS volumes would fail.
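
For reference, the peer UUID lives in /var/lib/glusterd/glusterd.info and each brick directory carries GlusterFS extended attributes of its own. A quick, illustrative way to inspect both, using the brick path these servers use:

$ sudo cat /var/lib/glusterd/glusterd.info               # the node's peer UUID
$ sudo getfattr -d -m . -e hex /media/gluster-volume-1   # brick xattrs, including trusted.glusterfs.volume-id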

This article describes the process I used in re-adding the clone server to the distributed GlusterFS cluster.

Removing the failed node and its bricks from the replicated GlusterFS volumes

Cloning the source node and restoring the image onto the repaired node will take some time. Whilst those operations are running, you can remove the dead GlusterFS bricks from the two GlusterFS volumes that we use.
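
SSH onto one of the healthy nodes and check the state of the cluster first. Something along these lines should show the failed peer and the bricks it was serving:

$ sudo gluster peer status
$ sudo gluster volume status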

In the output, note that our first peer, 192.168.5.51 (glusnode1), is disconnected. Of course, the node we are using to perform these checks does not itself show in the list of peers, hence only three peers are returned by the gluster peer status command.
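
The dead bricks can then be dropped from each volume and the failed peer detached. A sketch of the commands, assuming the volume names and brick paths used on these servers and a replica count that drops from four to three (force is needed because the peer is unreachable):

$ sudo gluster volume remove-brick glusvol1 replica 3 192.168.5.51:/media/gluster-volume-1 force
$ sudo gluster volume remove-brick glusvol2 replica 3 192.168.5.51:/media/gluster-volume-2 force
$ sudo gluster peer detach 192.168.5.51 force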

You can now close this SSH session as we turn our attention to the rebuilt failed node.

Change the hostname and update the hosts file on our clone server

Assuming that you have now successfully repaired/cloned/rebuilt the failed node, we need to make some configuration changes before connecting it to the network. Boot it up and make the following changes from the console.
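
On a Debian/Ubuntu-style system the hostname lives in /etc/hostname and the matching loopback alias in /etc/hosts, so editing those two files with whichever editor you prefer is enough. For example:

$ sudo nano /etc/hostname   # replace the cloned hostname
$ sudo nano /etc/hosts      # fix the 127.0.1.1 entry
$ sudo hostname glusnode1   # apply the new name without rebooting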

In /etc/hostname, assuming we've cloned from glusnode2, change

glusnode2

to

glusnode1

And in /etc/hosts, change

127.0.1.1   glusnode2

to

127.0.1.1   glusnode1

Removing the existing GlusterFS bricks from our clone server

Before we remove the bricks from the clone server, we need to stop the GlusterFS volumes from being automatically mounted. Looking at /etc/fstab, you will see that our GlusterFS volumes are stored within the root partition; comment out both GlusterFS entries so that they are no longer mounted at boot.

Change

localhost:/glusvol1    /var/vol1mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2
localhost:/glusvol2    /var/log/vol2mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2

To

#localhost:/glusvol1    /var/vol1mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2
#localhost:/glusvol2    /var/log/vol2mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2
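
With the automatic mounts disabled, one approach (a sketch, assuming the Debian/Ubuntu glusterfs-server service name and the brick paths shown below) is to unmount the volumes, stop glusterd, discard the cloned state in /var/lib/glusterd, which contains the duplicated UUIDs, and then delete and recreate the brick directories before starting glusterd again:

$ sudo umount /var/vol1mnt /var/log/vol2mnt     # if currently mounted
$ sudo service glusterfs-server stop
$ sudo rm -rf /var/lib/glusterd                 # discard the cloned peer UUID and volume definitions
$ sudo rm -rf /media/gluster-volume-1 /media/gluster-volume-2
$ sudo mkdir -p /media/gluster-volume-1 /media/gluster-volume-2
$ sudo service glusterfs-server start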

It isn't absolutely necessary to delete and recreate these brick folders. The alternative is to remove the extended file attributes that GlusterFS cares about, which is arguably slightly more typing. If you want to retain the folders rather than deleting and recreating them, you can try:

$ sudo setfattr -x trusted.glusterfs.volume-id /media/gluster-volume-1
$ sudo setfattr -x trusted.gfid /media/gluster-volume-1
$ sudo rm -rf /media/gluster-volume-1/.glusterfs

$ sudo setfattr -x trusted.glusterfs.volume-id /media/gluster-volume-2
$ sudo setfattr -x trusted.gfid /media/gluster-volume-2
$ sudo rm -rf /media/gluster-volume-2/.glusterfs

If the node relies on a static IP configuration, then you will need to update "/etc/network/interfaces" with the correct IP address, otherwise the node will cause an IP address conflict with the node from which it was cloned.
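
For example, a static stanza for the rebuilt node might look something like the following (the netmask and gateway here are assumptions and should match your own network):

# /etc/network/interfaces (illustrative)
auto eth0
iface eth0 inet static
    address 192.168.5.51
    netmask 255.255.255.0
    gateway 192.168.5.1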

If the machine fails to connect to the network, it's likely that the ethernet interface has a different logical name, for example it may now be called eth1 instead of eth0. You can get the logical name of the network adapter with:

$ lshw -class network | grep "logical name"

You can check the returned result against /etc/network/interfaces. If you see references to a different logical interface, then you can amend as appropriate.

Add the clone server back into the GlusterFS cluster

From one of the existing cluster nodes, probe the rebuilt node, substituting its network address as appropriate, e.g.

$  sudo gluster peer probe 192.168.5.51
peer probe: success

Running "gluster peer status" confirms the node has been re-added:

$  sudo gluster peer status
Number of Peers: 3

Hostname: glusnode3.biscuit.ninja
Uuid: 7aeb75c3-6d54-4a1d-b8f4-623598f8da4a
State: Peer in Cluster (Connected)

Hostname: glusnode2.biscuit.ninja
Uuid: 3ba486b1-86e5-4d8d-899d-b9f969aa9079
State: Peer in Cluster (Connected)

Hostname: 192.168.5.51
Port: 24007
Uuid: 8fccf14e-4f84-44e8-9eeb-6d2d2b23e932
State: Peer in Cluster (Connected)
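
With the peer re-established, the bricks on the rebuilt node can be added back to the two volumes. A sketch, assuming the volume names and brick paths used on these servers and a replica count of four:

$ sudo gluster volume add-brick glusvol1 replica 4 192.168.5.51:/media/gluster-volume-1 force
$ sudo gluster volume add-brick glusvol2 replica 4 192.168.5.51:/media/gluster-volume-2 force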

The force is necessary because we have created GlusterFS "volumes" within the root file system, which isn't recommended. These servers are not in a production environment. They are built from recycled components with a high degree of redundancy, which makes this compromise acceptable.
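
The same status commands can then be re-run from the rebuilt node itself to verify that the peers and volumes all look healthy, for instance:

$ sudo gluster peer status
$ sudo gluster volume info
$ sudo gluster volume status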

I've run these checks from our clone server, hence the slight variation when compared to the checks run earlier. All is looking healthy and listing the contents of /media/gluster-volume-2 shows data is getting synchronised into our new brick:

$ ls /media/gluster-volume-2/
2015-09-28

Finally, re-enable the automatic mounts in /etc/fstab. Change

#localhost:/glusvol2    /var/log/vol2mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2
#localhost:/glusvol1    /var/vol1mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2

to

localhost:/glusvol2    /var/log/vol2mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2
localhost:/glusvol1    /var/vol1mnt   glusterfs  defaults,nobootwait,_netdev,fetch-attempts=10 0 2
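
Then remount the volumes (or simply reboot the node):

$ sudo mount -a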

The clone server is now ready to be re-added to the Selenium cluster. If there's a lot of data to be replicated onto the clone server, you may want to wait for the synchronisation to complete prior to re-adding it.