Resolving a GlusterFS Server Failure
Rebuilding a failed server and removing/re-adding bricks from/into a GlusterFS volume
Note: This is an old article and may contain content which is out of date.
Background
I set up four servers that have been built for running automated browser tests using Selenium. Each server runs a Selenium hub and multiple Xvnc desktops / Selenium nodes. The tests are run in Firefox, and the Firefox profile utilised by Selenium imports the Firebug extension. The profile also enables HTTP Archive logging. As there are four different servers in this Selenium cluster, in the event of a Selenium test failing, checking four different SMB shares for the correct HTTP Archive (HAR) file could be tedious. Thus, each server uses a distributed file system (GlusterFS) for storing the HAR files. Each server is configured as a ‘brick’ in a replicated GlusterFS volume. In fact there are two volumes: the second volume is used for storing files that some browser tests can upload if needed.
One of the nodes in the Selenium/GlusterFS cluster had suffered a hard disk failure. Thankfully, using four servers in the cluster gives plenty of redundancy.
I replaced the disk in the failed server and then reprovisioned it by temporarily withdrawing one of the other good servers from the cluster and cloning the disk. This is not recommended in a production environment. Ideally we would have the configuration for these servers written in Ansible, Chef or Puppet and stored in version control.
Before the clone server can be added back to the cluster, I needed to remove its existing GlusterFS configuration and then re-add it to the two GlusterFS volumes as a pair of new bricks. I couldn’t just change the hostname because GlusterFS uses UUIDs to identify each brick. The clone server would have had the same UUIDs for its bricks as the server it was cloned from, and thus any attempt to re-add its bricks to our two replicated GlusterFS volumes would fail.
This article describes the process I used in re-adding the clone server to the distributed GlusterFS cluster.
Removing the failed node and its bricks from the replicated GlusterFS volumes
Cloning the source node and restoring it to the repaired failed node will take some time. Whilst those operations are running, you can remove the dead GlusterFS bricks from the two GlusterFS volumes that we use.
-
On a third GlusterFS node, use the gluster command to retrieve some information about our gluster volumes.
$ sudo gluster volume info all

Volume Name: glusvol1
Type: Replicate
Volume ID: 09c0da39-d1b5-41ea-965b-0212ee316568
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glusnode1.biscuit.ninja:/media/gluster-volume-1
Brick2: glusnode2.biscuit.ninja:/media/gluster-volume-1
Brick3: glusnode3.biscuit.ninja:/media/gluster-volume-1
Brick4: glusnode4.biscuit.ninja:/media/gluster-volume-1
Options Reconfigured:
server.allow-insecure: on
auth.allow: *

Volume Name: glusvol2
Type: Replicate
Volume ID: 3553fcf7-cf6f-49ee-8c15-e7e02a9309b7
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glusnode1.biscuit.ninja:/media/gluster-volume-2
Brick2: glusnode2.biscuit.ninja:/media/gluster-volume-2
Brick3: glusnode3.biscuit.ninja:/media/gluster-volume-2
Brick4: glusnode4.biscuit.ninja:/media/gluster-volume-2
Options Reconfigured:
auth.allow: *
server.allow-insecure: on
-
In this example, glusnode1 is the failed node. We can confirm this:
$ sudo gluster peer status
Number of Peers: 3

Hostname: 192.168.5.51
Uuid: fe29cf69-45f5-476a-a542-686e136cf3fc
State: Peer in Cluster (Disconnected)

Hostname: glusnode3.biscuit.ninja
Uuid: 7aeb75c3-6d54-4a1d-b8f4-623598f8da4a
State: Peer in Cluster (Connected)

Hostname: glusnode2.biscuit.ninja
Uuid: 3ba486b1-86e5-4d8d-899d-b9f969aa9079
State: Peer in Cluster (Connected)
Note our first peer, 192.168.5.51 (glusnode1) is disconnected. Of course, the node we are using to perform these checks does not itself show in the list of peers, hence there are just three peers returned from the gluster peer command.
-
When we check the status of our volumes, the dead node is omitted from the results.
$ sudo gluster volume status all
Status of volume: glusvol1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick glusnode2.biscuit.ninja:/media/gluster-volume-1   49153   Y       1391
Brick glusnode3.biscuit.ninja:/media/gluster-volume-1   49153   Y       1345
Brick glusnode4.biscuit.ninja:/media/gluster-volume-1   49153   Y       1326
NFS Server on localhost                                 2049    Y       1340
Self-heal Daemon on localhost                           N/A     Y       1345
NFS Server on glusnode3.biscuit.ninja                   2049    Y       2862
Self-heal Daemon on glusnode3.biscuit.ninja             N/A     Y       2880
NFS Server on glusnode2.biscuit.ninja                   2049    Y       1400
Self-heal Daemon on glusnode2.biscuit.ninja             N/A     Y       1405

There are no active volume tasks

Status of volume: glusvol2
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick glusnode2.biscuit.ninja:/media/gluster-volume-2   49152   Y       1386
Brick glusnode3.biscuit.ninja:/media/gluster-volume-2   49152   Y       2852
Brick glusnode4.biscuit.ninja:/media/gluster-volume-2   49152   Y       1331
NFS Server on localhost                                 2049    Y       1340
Self-heal Daemon on localhost                           N/A     Y       1345
NFS Server on glusnode3.biscuit.ninja                   2049    Y       2862
Self-heal Daemon on glusnode3.biscuit.ninja             N/A     Y       2880
NFS Server on glusnode2.biscuit.ninja                   2049    Y       1400
Self-heal Daemon on glusnode2.biscuit.ninja             N/A     Y       1405
-
Now we’ve established the state of play, go ahead and remove the failed brick from each GlusterFS volume using the Gluster command:
$ sudo gluster volume remove-brick glusvol1 replica 3 glusnode1.biscuit.ninja:/media/gluster-volume-1
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success

$ sudo gluster volume remove-brick glusvol2 replica 3 glusnode1.biscuit.ninja:/media/gluster-volume-2
Removing brick(s) can result in data loss. Do you want to Continue? (y/n) y
volume remove-brick commit force: success
-
Detach the failed node:
$ sudo gluster peer detach glusnode1
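Before moving on, it is worth a quick sanity check that both volumes now report three bricks. A simple way (just filtering the standard volume info output; the brick count should now read 1 x 3 = 3 for each volume) is:
$ sudo gluster volume info all | grep -E "Volume Name|Number of Bricks"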
You can now close this SSH session as we turn our attention to the rebuilt failed node.
Change the hostname and update the hosts file on our clone server
Assuming that you have now successfully repaired, cloned and rebuilt the failed node, we need to make some configuration changes before connecting it to the network. Boot it up and, using the console, execute the following:
-
Change the hostname
$ sudo vi /etc/hostname
Assuming we’ve cloned from glusnode2, change
glusnode2
to
glusnode1
-
Update the hosts file
$ sudo vi /etc/hosts
Change
127.0.1.1 glusnode2
to
127.0.1.1 glusnode1
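If you prefer a one-liner to editing both files by hand, a sed substitution along these lines should achieve the same result (assuming, as above, that the clone was taken from glusnode2):
$ sudo sed -i 's/glusnode2/glusnode1/g' /etc/hostname /etc/hosts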
Removing the existing GlusterFS bricks from our clone server
Before we remove the old bricks from the clone server, we need to stop the GlusterFS volumes from being mounted automatically. You will also see from the fstab entries below that our GlusterFS bricks are stored within the root partition.
-
Remove the GlusterFS volumes from fstab by commenting out any lines that begin with localhost
$ sudo vi /etc/fstab
Change
localhost:/glusvol1 /var/vol1mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
localhost:/glusvol2 /var/log/vol2mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
To
#localhost:/glusvol1 /var/vol1mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
#localhost:/glusvol2 /var/log/vol2mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
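Alternatively, rather than editing the file by hand, something like this sed command should comment out both lines in one go:
$ sudo sed -i 's|^localhost:/glusvol|#localhost:/glusvol|' /etc/fstab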
-
Delete the old physical GlusterFS volumes (I’m not interested in saving any data):
$ sudo rm -rf /media/gluster*
-
Optionally, clear out the logs folder. Obviously these logs really belong to the source node which we cloned to rebuild the failed node, and leaving them behind could confuse the issue. You may want to go a step further and remove the contents of /var/log. If you have logrotate configured for all of your log files, you could run "logrotate --force" and remove any files suffixed .1, .2.gz and so on.
$ sudo rm -rf /var/log/glusterfs/*
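If you do want to take the logrotate route, a rough sketch would be to force a rotation and then delete the rotated files; adjust the filename patterns to suit your own logrotate configuration:
$ sudo logrotate --force /etc/logrotate.conf
$ sudo find /var/log -type f \( -name "*.1" -o -name "*.gz" \) -delete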
-
Delete existing GlusterFS peer/volume/brick metadata. GlusterFS will re-initialise this folder structure when it is restarted.
$ sudo rm -rf /var/lib/glusterd/*
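Incidentally, the peer UUID mentioned earlier lives in /var/lib/glusterd/glusterd.info. If you inspect that file on the clone before wiping this folder, you should see the same UUID as the node it was cloned from (glusnode2 in this example), which is exactly why the metadata has to go:
$ sudo cat /var/lib/glusterd/glusterd.info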
-
Recreate the folders used as glusterfs volumes:
$ sudo mkdir /media/gluster-volume-1
$ sudo mkdir /media/gluster-volume-2
$ sudo chmod 777 /media/gluster-volume-*
It isn’t absolutely necessary to delete and recreate these folders. The alternative is removing the file attributes glusterfs cares about. It’s arguably slightly more typing. If you want to retain the folders rather than deleting them and recreating them, you can try:
$ sudo setfattr -x trusted.glusterfs.volume-id /media/gluster-volume-1
$ sudo setfattr -x trusted.gfid /media/gluster-volume-1
$ sudo rm -rf /media/gluster-volume-1/.glusterfs
$ sudo setfattr -x trusted.glusterfs.volume-id /media/gluster-volume-2
$ sudo setfattr -x trusted.gfid /media/gluster-volume-2
$ sudo rm -rf /media/gluster-volume-2/.glusterfs
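To double-check that the extended attributes really are gone (or to see what was set in the first place), getfattr can dump them. This is purely an optional check:
$ sudo getfattr -d -m . -e hex /media/gluster-volume-1
$ sudo getfattr -d -m . -e hex /media/gluster-volume-2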
-
That’s our node cleaned up. Now, assuming that the node relies on DHCP, we can shut it down and reconnect it to the network:
$ sudo shutdown now
If the node relies on a static IP configuration, then you will need to update "/etc/network/interfaces" with the correct IP address, otherwise the node will cause an IP address conflict with the node from which it was cloned.
If the machine fails to connect to the network, it’s likely that the ethernet interface has a different logical name, for example it may now be called eth1 instead of eth0. You can get the logical name of the network adapter with:
$ lshw -class network | grep "logical name"
You can check the returned result against /etc/network/interfaces. If you see references to a different logical interface, then you can amend as appropriate.
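For reference, a static stanza in /etc/network/interfaces looks something like the following. The address shown is this article's example address for glusnode1; the netmask and gateway are placeholders to substitute with your own values:
auto eth0
iface eth0 inet static
    address 192.168.5.51
    netmask 255.255.255.0
    gateway 192.168.5.1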
Add the clone server back into the GlusterFS cluster
-
We’re now in a position to reconfigure GlusterFS on our clone server. Start a new SSH session and confirm the GlusterFS service is running:
$ sudo service glusterfs-server status
glusterfs-server start/running, process 5721
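Note that the output above comes from an upstart-based Ubuntu release. On a newer, systemd-based system the equivalent check would be along the lines of (the unit may be named glusterd or glusterfs-server, depending on the package):
$ sudo systemctl status glusterd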
-
Start a new SSH session on an existing good node and execute:
$ sudo gluster peer probe <ip address>
Substitute the network address of the rebuilt node, e.g.
$ sudo gluster peer probe 192.168.5.51
peer probe: success
Running “gluster peer status” confirms the node has been re-added:
$ sudo gluster peer status
Number of Peers: 3
Hostname: glusnode3.biscuit.ninja
Uuid: 7aeb75c3-6d54-4a1d-b8f4-623598f8da4a
State: Peer in Cluster (Connected)
Hostname: glusnode2.biscuit.ninja
Uuid: 3ba486b1-86e5-4d8d-899d-b9f969aa9079
State: Peer in Cluster (Connected)
Hostname: 192.168.5.51
Port: 24007
Uuid: 8fccf14e-4f84-44e8-9eeb-6d2d2b23e932
State: Peer in Cluster (Connected)
-
Now we can re-add the bricks for both GlusterFS volumes served from our rebuilt node:
$ sudo gluster volume add-brick glusvol1 replica 4 192.168.5.51:/media/gluster-volume-1 force
volume add-brick: success
$ sudo gluster volume add-brick glusvol2 replica 4 192.168.5.51:/media/gluster-volume-2 force
volume add-brick: success
The force option is necessary because we have created our GlusterFS bricks within the root file system, which isn’t recommended. These servers are not in a production environment; they are built from recycled components with a high degree of redundancy, which makes this compromise acceptable.
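Depending on your GlusterFS version, replication onto the new bricks may only happen lazily as files are accessed or healed in the background. If you want to proactively push the existing data onto the new bricks, you can try triggering a full self-heal (exact behaviour varies between versions):
$ sudo gluster volume heal glusvol1 full
$ sudo gluster volume heal glusvol2 full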
-
Check the status of the new bricks:
$ sudo gluster volume info all
Volume Name: glusvol1
Type: Replicate
Volume ID: 09c0da39-d1b5-41ea-965b-0212ee316568
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glusnode2.biscuit.ninja:/media/gluster-volume-1
Brick2: glusnode3.biscuit.ninja:/media/gluster-volume-1
Brick3: glusnode4.biscuit.ninja:/media/gluster-volume-1
Brick4: 192.168.5.51:/media/gluster-volume-1
Options Reconfigured:
auth.allow: *
server.allow-insecure: on

Volume Name: glusvol2
Type: Replicate
Volume ID: 3553fcf7-cf6f-49ee-8c15-e7e02a9309b7
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: glusnode2.biscuit.ninja:/media/gluster-volume-2
Brick2: glusnode3.biscuit.ninja:/media/gluster-volume-2
Brick3: glusnode4.biscuit.ninja:/media/gluster-volume-2
Brick4: 192.168.5.51:/media/gluster-volume-2
Options Reconfigured:
server.allow-insecure: on
auth.allow: *

$ sudo gluster peer status
Number of Peers: 3

Hostname: glusnode4.biscuit.ninja
Port: 24007
Uuid: aaa72f7a-ea87-4bc1-beda-e95f7aff4398
State: Peer in Cluster (Connected)

Hostname: glusnode3.biscuit.ninja
Uuid: 7aeb75c3-6d54-4a1d-b8f4-623598f8da4a
State: Peer in Cluster (Connected)

Hostname: glusnode2.biscuit.ninja
Uuid: 3ba486b1-86e5-4d8d-899d-b9f969aa9079
State: Peer in Cluster (Connected)

$ sudo gluster volume status all
Status of volume: glusvol1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick glusnode2.biscuit.ninja:/media/gluster-volume-1   49153   Y       1391
Brick glusnode3.biscuit.ninja:/media/gluster-volume-1   49153   Y       1345
Brick glusnode4.biscuit.ninja:/media/gluster-volume-1   49153   Y       1326
Brick 192.168.5.51:/media/gluster-volume-1              49152   Y       10332
NFS Server on localhost                                 2049    Y       10537
Self-heal Daemon on localhost                           N/A     Y       10544
NFS Server on glusnode4.biscuit.ninja                   2049    Y       15518
Self-heal Daemon on glusnode4.biscuit.ninja             N/A     Y       15525
NFS Server on glusnode3.biscuit.ninja                   2049    Y       17375
Self-heal Daemon on glusnode3.biscuit.ninja             N/A     Y       17400
NFS Server on glusnode2.biscuit.ninja                   2049    Y       19522
Self-heal Daemon on glusnode2.biscuit.ninja             N/A     Y       19535

There are no active volume tasks

Status of volume: glusvol2
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick glusnode2.biscuit.ninja:/media/gluster-volume-2   49152   Y       1386
Brick glusnode3.biscuit.ninja:/media/gluster-volume-2   49152   Y       2852
Brick glusnode4.biscuit.ninja:/media/gluster-volume-2   49152   Y       1331
Brick 192.168.5.51:/media/gluster-volume-2              49153   Y       10518
NFS Server on localhost                                 2049    Y       10537
Self-heal Daemon on localhost                           N/A     Y       10544
NFS Server on glusnode4.biscuit.ninja                   2049    Y       15518
Self-heal Daemon on glusnode4.biscuit.ninja             N/A     Y       15525
NFS Server on glusnode3.biscuit.ninja                   2049    Y       17375
Self-heal Daemon on glusnode3.biscuit.ninja             N/A     Y       17400
NFS Server on glusnode2.biscuit.ninja                   2049    Y       19522
Self-heal Daemon on glusnode2.biscuit.ninja             N/A     Y       19535

There are no active volume tasks
I’ve run these checks from our clone server, hence the slight variation when compared to the checks run earlier. All is looking healthy and listing the contents of /media/gluster-volume-2 shows data is getting synchronised into our new brick:
$ ls /media/gluster-volume-2/
2015-09-28
-
We can now remount the GlusterFS volumes locally. Edit /etc/fstab and uncomment the two lines beginning “#localhost:/”
$ sudo vi /etc/fstab
Change
#localhost:/glusvol2 /var/log/vol2mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
#localhost:/glusvol1 /var/vol1mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
to
localhost:/glusvol2 /var/log/vol2mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
localhost:/glusvol1 /var/vol1mnt glusterfs defaults,nobootwait,_netdev,fetch-attempts=10 0 2
-
Mount the GlusterFS volumes:
$ sudo mount -a
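To confirm that both volumes mounted cleanly, a quick check along these lines will do:
$ mount | grep glusterfs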
The clone server is now ready to be re-added to the Selenium cluster. If there’s a lot of data to be replicated onto the clone server, you may want to wait for the synchronisation to complete prior to re-adding it.
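One way to keep an eye on the synchronisation is to ask the self-heal daemon which entries are still pending; when both volumes report no outstanding entries, the new bricks are up to date. Again, the exact output varies between GlusterFS versions:
$ sudo gluster volume heal glusvol1 info
$ sudo gluster volume heal glusvol2 info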