Previous Posts:
From our GlusterFS Series we have covered the following:
- GlusterFS: Distributed Replicated Volume
- GlusterFS: Distributed Storage Volume
- GlusterFS: Replicated Storage Volume
- GlusterFS: Adding Bricks to your Volume
- GlusterFS: Replace Faulty Bricks
Replacing Faulty GlusterFS Bricks:
Today we will simulate a hard drive failure, and go through the steps on how to replace a faulty brick.
Some background:
From our Distributed Replicated Volume we have the following volume structure:
Status of volume: gfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick ip-172-31-44-169:/gluster/a/brick 49152 0 Y 7614
Brick ip-172-31-47-175:/gluster/c/brick 49152 0 Y 7428
Brick ip-172-31-44-169:/gluster/b/brick 49153 0 Y 7632
Brick ip-172-31-47-175:/gluster/d/brick 49153 0 Y 7446
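For reference, a volume with this layout would have been created along the lines of the following in the earlier posts of this series (a sketch; the exact command used there may have differed slightly):
$ sudo gluster volume create gfs replica 2 ip-172-31-44-169:/gluster/a/brick ip-172-31-47-175:/gluster/c/brick ip-172-31-44-169:/gluster/b/brick ip-172-31-47-175:/gluster/d/brick
$ sudo gluster volume start gfs
With replica 2 and four bricks, the bricks are paired into two replica sets and files are distributed across the pairs, which gives the 2 x 2 = 4 Distributed-Replicate layout.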
I am using AWS, so my bricks live on EBS volumes. I will simulate a hard disk failure by force-detaching one of the volumes and then deleting it. The volume I chose to remove is /dev/xvdf 50G 33M 50G 1% /gluster/a, which backs Brick ip-172-31-44-169:/gluster/a/brick. So '/gluster/a/brick/file02.txt' should be affected on that node itself.
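If you are using the AWS CLI rather than the console, the force-detach and delete would look something like this (the volume ID below is a placeholder):
$ aws ec2 detach-volume --volume-id vol-0123456789abcdef0 --force
$ aws ec2 delete-volume --volume-id vol-0123456789abcdef0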
Detaching the Volume from the Instance:
Right after I force-detached the EBS volume from the instance, I got these messages on the shell:
Broadcast message from systemd-journald@ip-172-31-44-169 (Mon 2017-11-06 14:48:00 UTC):
gluster-a-brick[7614]: [2017-11-06 14:48:00.653127] M [MSGID: 113075] [posix-helpers.c:1821:posix_health_check_thread_proc] 0-gfs-posix: health-check failed, going down
ubuntu@ip-172-31-44-169:~$
Broadcast message from systemd-journald@ip-172-31-44-169 (Mon 2017-11-06 14:48:30 UTC):
gluster-a-brick[7614]: [2017-11-06 14:48:30.654082] M [MSGID: 113075] [posix-helpers.c:1827:posix_health_check_thread_proc] 0-gfs-posix: still alive! -> SIGTERM
Trying to list the directory which was mounted for the volume in question:
$ sudo ls /gluster/a
ls: cannot access '/gluster/a': Input/output error
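To confirm the failure from the operating system's side, you can also check the kernel log for I/O errors on the detached device (output will vary):
$ sudo dmesg | grep -i xvdf | tail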
Having a look at the status again:
$ sudo gluster volume status gfs
Status of volume: gfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick ip-172-31-44-169:/gluster/a/brick N/A N/A N N/A
Brick ip-172-31-47-175:/gluster/c/brick 49152 0 Y 7428
Brick ip-172-31-44-169:/gluster/b/brick 49153 0 Y 7632
Brick ip-172-31-47-175:/gluster/d/brick 49153 0 Y 7446
NFS Server on localhost 2049 0 Y 8784
Self-heal Daemon on localhost N/A N/A Y 8792
NFS Server on ip-172-31-47-175 2049 0 Y 7467
Self-heal Daemon on ip-172-31-47-175 N/A N/A Y 7472
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks
So at this point we can clearly see that the brick is down, as there is no Pid listed for it.
$ sudo gluster volume info
Volume Name: gfs
Type: Distributed-Replicate
Volume ID: 4b0d3931-73be-4dff-b1a5-56d791fccaea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ip-172-31-44-169:/gluster/a/brick
Brick2: ip-172-31-47-175:/gluster/c/brick
Brick3: ip-172-31-44-169:/gluster/b/brick
Brick4: ip-172-31-47-175:/gluster/d/brick
Options Reconfigured:
performance.readdir-ahead: on
From a GlusterFS level, as the volume is replicated, we should still be able to view the contents of file02.txt:
$ cat /mnt/file02.txt
10082
which is served from this copy:
$ cat /gluster/c/brick/file02.txt
10082
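If you want to confirm which bricks hold a given file, the pathinfo extended attribute can be queried on the FUSE client mount (this needs getfattr from the attr package):
$ sudo getfattr -n trusted.glusterfs.pathinfo -e text /mnt/file02.txt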
Repairing the Volume by Replacing the Brick:
Unmount the faulty volume from the Operating System:
$ sudo umount /gluster/a
Create a new EBS volume and attach it to the EC2 instance, take note of the device name it is attached as, then list the block devices:
$ sudo lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvda 202:0 0 8G 0 disk
└─xvda1 202:1 0 8G 0 part /
xvdg 202:96 0 50G 0 disk /gluster/b
xvdh 202:112 0 50G 0 disk
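For completeness, creating and attaching the replacement volume via the AWS CLI would look something like this (size, availability zone and IDs are placeholders):
$ aws ec2 create-volume --size 50 --volume-type gp2 --availability-zone eu-west-1a
$ aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/xvdh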
Create a new directory for our new volume, format the disk with XFS, and then mount:
$ sudo mkdir -p /gluster/e
$ sudo mkfs.xfs /dev/xvdh
$ sudo mount /dev/xvdh /gluster/e
Update /etc/fstab so that the old entry for /dev/xvdf /gluster/a is replaced with /dev/xvdh /gluster/e, then remount:
$ sudo mount -a
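For reference, the replacement entry in /etc/fstab could look something like this (default XFS mount options assumed):
/dev/xvdh  /gluster/e  xfs  defaults  0  0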
Have a look at the disks again:
$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 488M 0 488M 0% /dev
/dev/xvda1 7.7G 1.3G 6.5G 17% /
/dev/xvdh 50G 33M 50G 1% /gluster/e
/dev/xvdg 50G 33M 50G 1% /gluster/b
localhost:/gfs 100G 65M 100G 1% /mnt
Now that the disk is mounted, create the brick directory and remove the old directory path:
$ sudo mkdir /gluster/e/brick
$ sudo chown -R ubuntu /gluster
$ sudo rm -rf /gluster/a
Note: when a brick in a replicated volume goes offline, the glusterd daemons on the remaining nodes keep track of all the files that have not yet been replicated to the offline brick. The pending heal entries can be inspected by running the following:
$ sudo gluster volume heal gfs info
Brick ip-172-31-44-169:/gluster/a/brick
Status: Transport endpoint is not connected
Brick ip-172-31-47-175:/gluster/c/brick
Number of entries: 0
Brick ip-172-31-44-169:/gluster/b/brick
Number of entries: 0
Brick ip-172-31-47-175:/gluster/d/brick
Number of entries: 0
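If any entries do show up as pending, a heal can also be triggered explicitly. The self-heal daemon normally takes care of this on its own, so this step is optional; the first command heals only the files flagged as needing healing, while the full variant crawls all files in the replica sets:
$ sudo gluster volume heal gfs
$ sudo gluster volume heal gfs full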
Time to replace our brick. We replace it by specifying the volume we want to operate on, followed by the source brick and the target brick:
$ sudo gluster volume replace-brick gfs ip-172-31-44-169:/gluster/a/brick ip-172-31-44-169:/gluster/e/brick commit force
volume replace-brick: success: replace-brick commit force operation successful
Let's have a look at the status:
$ sudo gluster volume status
Status of volume: gfs
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick ip-172-31-44-169:/gluster/e/brick 49154 0 Y 9510
Brick ip-172-31-47-175:/gluster/c/brick 49152 0 Y 7428
Brick ip-172-31-44-169:/gluster/b/brick 49153 0 Y 7632
Brick ip-172-31-47-175:/gluster/d/brick 49153 0 Y 7446
NFS Server on localhost 2049 0 Y 9519
Self-heal Daemon on localhost N/A N/A Y 9524
NFS Server on ip-172-31-47-175 2049 0 Y 7928
Self-heal Daemon on ip-172-31-47-175 N/A N/A Y 7935
Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks
We can see that the new brick is part of our volume, and it has a corresponding Pid.
As mentioned earlier, when a brick in a replicated volume goes offline, the glusterd daemons on the remaining nodes keep track of the files that have not been replicated to it. We can check the heal status again by running the following:
$ sudo gluster volume heal gfs info
Brick ip-172-31-44-169:/gluster/a/brick
Status: Transport endpoint is not connected
Brick ip-172-31-47-175:/gluster/c/brick
Number of entries: 0
Brick ip-172-31-44-169:/gluster/b/brick
Number of entries: 0
Brick ip-172-31-47-175:/gluster/d/brick
Number of entries: 0
Testing Replication: Restored Volume
Let's test whether replication restored the data that was lost with the removed EBS volume:
$ cat /gluster/e/brick/file02.txt
10082
Indeed, replication restored the data. The data was always available from a GlusterFS volume perspective, but looking at the actual brick we can now see that it has been restored there as well.
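As an extra sanity check, you could compare checksums of the file on the new brick and on its replica partner (the second command would be run on the other node, ip-172-31-47-175):
$ md5sum /gluster/e/brick/file02.txt
$ md5sum /gluster/c/brick/file02.txt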
And from our volume info output, we can see that our brick was replaced:
$ sudo gluster volume info gfs
Volume Name: gfs
Type: Distributed-Replicate
Volume ID: 4b0d3931-73be-4dff-b1a5-56d791fccaea
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: ip-172-31-44-169:/gluster/e/brick
Brick2: ip-172-31-47-175:/gluster/c/brick
Brick3: ip-172-31-44-169:/gluster/b/brick
Brick4: ip-172-31-47-175:/gluster/d/brick
Options Reconfigured:
performance.readdir-ahead: on
Resources:
- GlusterFS Server Failures
- GlusterFS Tips and Tricks
- GlusterFS Storage Administration Guide
- GlusterFS: Manage Volumes / Replacing Bricks
Thanks
Thanks for reading :)