
Replace Faulty Bricks in GlusterFS

Replacing Faulty GlusterFS Bricks:

Today we will simulate a hard drive failure and walk through the steps to replace the faulty brick.

Some background:

From our Distributed Replicated Volume we have the following volume structure:

Status of volume: gfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ip-172-31-44-169:/gluster/a/brick     49152     0          Y       7614 
Brick ip-172-31-47-175:/gluster/c/brick     49152     0          Y       7428 
Brick ip-172-31-44-169:/gluster/b/brick     49153     0          Y       7632 
Brick ip-172-31-47-175:/gluster/d/brick     49153     0          Y       7446 

I am using AWS, so my bricks are on EBS volumes, and I will simulate a hard disk failure by force-detaching a volume and then deleting it. The volume I chose to remove is /dev/xvdf 50G 33M 50G 1% /gluster/a, which backs Brick ip-172-31-44-169:/gluster/a/brick. So '/gluster/a/brick/file02.txt' on the node itself should be affected.
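The force-detach and delete can be done from the EC2 console, or scripted with the AWS CLI; a rough sketch, with a hypothetical volume ID:

```shell
# A sketch of the failure simulation using the AWS CLI.
# vol-0123456789abcdef0 is a hypothetical ID; look yours up
# in the console or with `aws ec2 describe-volumes`.
simulate_disk_failure() {
  aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force
  aws ec2 delete-volume --volume-id vol-0123456789abcdef0
}
```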

Detaching the Volume from the Instance:

Right after I Force-Detached the EBS volume from the instance, I got these messages on the shell:

Broadcast message from systemd-journald@ip-172-31-44-169 (Mon 2017-11-06 14:48:00 UTC):
gluster-a-brick[7614]: [2017-11-06 14:48:00.653127] M [MSGID: 113075] [posix-helpers.c:1821:posix_health_check_thread_proc] 0-gfs-posix: health-check failed, going down

ubuntu@ip-172-31-44-169:~$ 
Broadcast message from systemd-journald@ip-172-31-44-169 (Mon 2017-11-06 14:48:30 UTC):
gluster-a-brick[7614]: [2017-11-06 14:48:30.654082] M [MSGID: 113075] [posix-helpers.c:1827:posix_health_check_thread_proc] 0-gfs-posix: still alive! -> SIGTERM

Trying to list the directory which was mounted for the volume in question:

$ sudo ls /gluster/a 
ls: cannot access '/gluster/a': Input/output error
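A dead backing disk typically surfaces as exactly this kind of Input/output error, so a crude readability probe over the brick mount points can catch it early; a small sketch:

```shell
# probe: report whether a brick mount point is still readable.
# An unreadable path usually means the backing device is gone.
probe() {
  if ls "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "failed: $1"
  fi
}
# e.g.: for path in /gluster/a /gluster/b; do probe "$path"; done
```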

Having a look at the status again:

$ sudo gluster volume status gfs
Status of volume: gfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ip-172-31-44-169:/gluster/a/brick     N/A       N/A        N       N/A  
Brick ip-172-31-47-175:/gluster/c/brick     49152     0          Y       7428 
Brick ip-172-31-44-169:/gluster/b/brick     49153     0          Y       7632 
Brick ip-172-31-47-175:/gluster/d/brick     49153     0          Y       7446 
NFS Server on localhost                     2049      0          Y       8784 
Self-heal Daemon on localhost               N/A       N/A        Y       8792 
NFS Server on ip-172-31-47-175              2049      0          Y       7467 
Self-heal Daemon on ip-172-31-47-175        N/A       N/A        Y       7472 
 
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks

So at this point, we can definitely see the brick is down, as there is no PID for it in the status output.
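If you want to detect this from a script rather than by eye, the Online column can be parsed out of the status output; a sketch, assuming the column layout shown above:

```shell
# offline_bricks: print brick paths whose Online column is "N".
# usage: sudo gluster volume status gfs | offline_bricks
offline_bricks() {
  awk '/^Brick / && $(NF-1) == "N" { print $2 }'
}
```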

$ sudo gluster volume info
 
Volume Name: gfs
Type: Distributed-Replicate
Volume ID: 4b0d3931-73be-4dff-b1a5-56d791fccaea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ip-172-31-44-169:/gluster/a/brick
Brick2: ip-172-31-47-175:/gluster/c/brick
Brick3: ip-172-31-44-169:/gluster/b/brick
Brick4: ip-172-31-47-175:/gluster/d/brick
Options Reconfigured:
performance.readdir-ahead: on

From a GlusterFS level, as the volume is replicated, we should still be able to view file02.txt's content:

$ cat /mnt/file02.txt 
10082

which is served from this copy:

$ cat /gluster/c/brick/file02.txt 
10082

Repairing the Volume by Replacing the Brick:

Unmount the faulty volume from the Operating System:

$ sudo umount /gluster/a

Create a new EBS volume and attach it to the EC2 instance, note the device name it is attached as, then list the block devices:

$ sudo lsblk
NAME    MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   8G  0 disk 
└─xvda1 202:1    0   8G  0 part /
xvdg    202:96   0  50G  0 disk /gluster/b
xvdh    202:112  0  50G  0 disk 

Create a new directory for our new volume, format the disk with XFS, and then mount:

$ sudo mkdir -p /gluster/e
$ sudo mkfs.xfs /dev/xvdh
$ sudo mount /dev/xvdh /gluster/e

Update /etc/fstab so that the entry for /dev/xvdf /gluster/a is replaced with /dev/xvdh /gluster/e, then remount:
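For reference, the fstab change would look along these lines (the nofail option is my addition, so a missing EBS volume doesn't block boot):

```
# /etc/fstab: old entry for the failed disk:
# /dev/xvdf  /gluster/a  xfs  defaults,nofail  0  2
# new entry:
/dev/xvdh  /gluster/e  xfs  defaults,nofail  0  2
```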

$ sudo mount -a

Have a look at the disks again:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            488M     0  488M   0% /dev
/dev/xvda1      7.7G  1.3G  6.5G  17% /
/dev/xvdh        50G   33M   50G   1% /gluster/e
/dev/xvdg        50G   33M   50G   1% /gluster/b
localhost:/gfs  100G   65M  100G   1% /mnt

Now that the disk is mounted, create the brick directory and remove the old directory path:

$ sudo mkdir /gluster/e/brick
$ sudo chown -R ubuntu /gluster
$ sudo rm -rf /gluster/a

Note: when any brick in a replicated volume goes offline, the glusterd daemons on the remaining nodes keep track of all the files that were not replicated to the offline brick. We can inspect the pending heal state by running the following:

$ sudo gluster volume heal gfs info
Brick ip-172-31-44-169:/gluster/a/brick
Status: Transport endpoint is not connected

Brick ip-172-31-47-175:/gluster/c/brick
Number of entries: 0

Brick ip-172-31-44-169:/gluster/b/brick
Number of entries: 0

Brick ip-172-31-47-175:/gluster/d/brick
Number of entries: 0

Time to replace our brick. We specify the volume we want to operate on, then the source and target bricks:

$ sudo gluster volume replace-brick gfs ip-172-31-44-169:/gluster/a/brick ip-172-31-44-169:/gluster/e/brick commit force
volume replace-brick: success: replace-brick commit force operation successful

Let's have a look at the status:

$ sudo gluster volume status
Status of volume: gfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ip-172-31-44-169:/gluster/e/brick     49154     0          Y       9510 
Brick ip-172-31-47-175:/gluster/c/brick     49152     0          Y       7428 
Brick ip-172-31-44-169:/gluster/b/brick     49153     0          Y       7632 
Brick ip-172-31-47-175:/gluster/d/brick     49153     0          Y       7446 
NFS Server on localhost                     2049      0          Y       9519 
Self-heal Daemon on localhost               N/A       N/A        Y       9524 
NFS Server on ip-172-31-47-175              2049      0          Y       7928 
Self-heal Daemon on ip-172-31-47-175        N/A       N/A        Y       7935 
 
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks

We can see that the new brick is part of our volume, and it has a correlating Pid.

Testing Replication: Restored Volume

Let's test whether replication restored the data that was lost with the removed EBS volume:

$ cat /gluster/e/brick/file02.txt 
10082

Indeed, replication restored the data. The data was always available from the GlusterFS volume's perspective, but looking at the actual brick confirms that the file has been healed onto the new disk.
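To be thorough, you can compare the healed brick copy byte-for-byte against the copy served by the volume; a small sketch, using the paths from this post:

```shell
# verify_heal: compare a brick's copy of a file with the fuse-mounted copy.
# usage: verify_heal /gluster/e/brick/file02.txt /mnt/file02.txt
verify_heal() {
  if cmp -s "$1" "$2"; then
    echo "match"
  else
    echo "mismatch"
  fi
}
```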

And from our volume info output, we can see that our brick was replaced:

$ sudo gluster volume info gfs
 
Volume Name: gfs
Type: Distributed-Replicate
Volume ID: 4b0d3931-73be-4dff-b1a5-56d791fccaea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ip-172-31-44-169:/gluster/e/brick
Brick2: ip-172-31-47-175:/gluster/c/brick
Brick3: ip-172-31-44-169:/gluster/b/brick
Brick4: ip-172-31-47-175:/gluster/d/brick
Options Reconfigured:
performance.readdir-ahead: on

Thanks

Thanks for reading :)