Setup a Distributed Storage Volume with GlusterFS

GlusterFS is a Awesome Scalable Networked Filesystem, which makes it Easy to Create Large and Scalable Storage Solutions on Commodity Hardware.

Basic Concepts of GlusterFS:

  • Brick:
    In GlusterFS, a brick is the basic unit of storage, represented by a directory on the server in the trusted storage pool.

  • Gluster Volume:
    A Gluster volume is a Logical Collection of Bricks.

  • Distributed Filesystem:
    The concept is to enable multiple clients to concurrently access data which is spread across multple servers in a trusted storage pool. This is also a great solution to prevent data corruption, enable highly available storage systems, etc.

More concepts can be retrieved from their documentation.

Different GlusterFS Volume Types:

With GlusterFS you can create the following types of Gluster Volumes:

  • Distributed Volumes: (Ideal for Scalable Storage, No Data Redundancy)
  • Replicated Volumes: (Better reliability and data redundancy)
  • Distributed-Replicated Volumes: (HA of Data due to Redundancy and Scaling Storage)
  • More detail on GlusterFS Architecture

Setup a Distributed Gluster Volume:

In this guide we will setup a 3 Node Distributed GlusterFS Volume on Ubuntu 16.04.

For this use case we would like to achieve a storage solution to scale the size of our storage, and not really worried about redundancy as, with a Distributed Setup we can increase the size of our volume, the more bricks we add to our GlusterFS Volume.

Setup: Our Environment:

Each node has 2 disks, /dev/xvda for the Operating System wich is 20GB and /dev/xvdb which has 100GB. After we have created our GlusterFS Volume, we will have a Gluster Volume of 300GB.

Having a look at our disks:

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   20G  0 disk
└─xvda1 202:1    0   20G  0 part /
xvdb    202:16   0  100G  0 disk 

If you don't have DNS setup for your nodes, you can use your /etc/hosts file for all 3 nodes, which I will be using in this demonstration:

$ cat /etc/hosts
172.31.13.226   gluster-node-1
172.31.9.7      gluster-node-2
172.31.15.34    gluster-node-3
127.0.0.1       localhost

Install GlusterFS from the Package Manager:

Note that all the steps below needs to be performed on all 3 nodes, unless specified otherwise:

$ apt update && apt upgrade -y
$ apt install xfsprogs attr glusterfs-server glusterfs-client glusterfs-common -y

Format and Prepare the Gluster Disks:

We will create a XFS Filesystem for our 100GB disk, create the directory path where we will mount our disk onto, and also load it into /etc/fstab:

$ mkfs.xfs /dev/xvdb
$ mkdir /gluster
$ echo '/dev/xvdb /gluster xfs defaults 0 0' >> /etc/fstab
$ mount -a

After we mounted the disk, we should see that our disk is mounted to /gluster:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       20G  1.2G   19G   7% /
/dev/xvdb       100G   33M  100G   1% /gluster

After our disk is mounted, we can proceed by creating the brick directory on our disk that we mounted, from the step above:

$ mkdir /gluster/brick

Start GlusterFS Service:

Enable GlusterFS on startup, start the service and make sure that the service is running:

$ systemctl enable glusterfs-server
$ systemctl restart glusterfs-server
$ systemctl is-active glusterfs-server
active

Discover All the Nodes for our Cluster:

The following will only be done on one of the nodes. First we need to discover our other nodes.

The node that you are currently on, will be discovered by default and only needs the other 2 nodes to be discovered:

$ gluster peer probe gluster-node-2
$ gluster peer probe gluster-node-3

Let's verify this by listing all the nodes in our cluster:

$ gluster pool list
UUID                                    Hostname        State
6e02731c-6472-4ea4-bd48-d5dd87150e8b    gluster-node-2  Connected
9d4c2605-57ba-49e2-b5da-a970448dc886    gluster-node-3  Connected
608f027e-e953-413b-b370-ce84050a83c9    localhost       Connected

Create the Distributed GlusterFS Volume:

We will create a Distributed GlusterFS Volume across 3 nodes, and we will name the volume gfs:

$ gluster volume create gfs gluster-node-1:/gluster/brick gluster-node-2:/gluster/brick gluster-node-3:/gluster/brick
volume create: gfs: success: please start the volume to access data

Start the GlusterFS Volume:

Now start the gfs GlusterFS Volume:

$ gluster volume start gfs
volume start: gfs: success

To get information about the volume:

$ gluster volume info gfs

Volume Name: gfs
Type: Distribute
Volume ID: c08bc2e8-59b3-49e7-bc17-d4bc8d99a92f
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: gluster-node-1:/gluster/brick
Brick2: gluster-node-2:/gluster/brick
Brick3: gluster-node-3:/gluster/brick
Options Reconfigured:
performance.readdir-ahead: on

Status information about our Volume:

$ gluster volume status

Status of volume: gfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick gluster-node-1:/gluster/brick         49152     0          Y       7139
Brick gluster-node-2:/gluster/brick         49152     0          Y       7027
Brick gluster-node-3:/gluster/brick         49152     0          Y       7099
NFS Server on localhost                     2049      0          Y       7158
NFS Server on gluster-node-2                2049      0          Y       7046
NFS Server on gluster-node-3                2049      0          Y       7118

Task Status of Volume gfs
------------------------------------------------------------------------------
There are no active volume tasks

Mounting our GlusterFS Volume:

On all the clients, in this case our 3 nodes, load the mount information into /etc/fstab and then mount the GlusterFS Volume:

$ echo 'localhost:/gfs /mnt glusterfs defaults,_netdev,backupvolfile-server=gluster-node-1 0 0' >> /etc/fstab
$ mount -a

Now that the volume is mounted, have a look at your disk info, and you will find that you have a 300GB GlusterFS Volume mounted:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       20G  1.3G   19G   7% /
/dev/xvdb       100G   33M  100G   1% /gluster
localhost:/gfs  300G   98M  300G   1% /mnt

As mentioned before, this is most probably for a scenario where you would like to achieve a high storage size and not really concerned about data availability.

In the next couple of weeks I will also go through the Replicated, Distributed-Replicated and GlusterFS with ZFS setups.

Related Resources: