a sysadmin'z hard dayz: GlusterFS in a simple way

Here is the story of how I installed a two-node GlusterFS cluster on CentOS, plus one client for test purposes.
In my case the hostnames and the IPs were:

192.168.183.235 s1
192.168.183.236 s2
192.168.183.237 c1

Append these to the end of /etc/hosts to make sure that simple name resolution will work.
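A minimal sketch of that step, assuming a shell with `local` support and a system where `getent` is available. The helper names `add_host_entry` and `check_resolves` are made up for illustration. The first one appends an entry only if the name is not already listed, the second one confirms that the name really resolves.

```shell
# Hypothetical helpers; not part of GlusterFS or CentOS.

# Append "IP NAME" to a hosts file unless NAME is already listed there.
add_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # Look for NAME as a whole field on any non-comment line.
    if ! awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Succeed only if NAME resolves through the system resolver.
check_resolves() {
    getent hosts "$1" > /dev/null
}

# Usage on each machine (as root):
#   add_host_entry 192.168.183.235 s1
#   add_host_entry 192.168.183.236 s2
#   add_host_entry 192.168.183.237 c1
#   for h in s1 s2 c1; do check_resolves "$h" || echo "cannot resolve $h"; done
```

Running the loop on every machine, not just the one you are sitting at, is the point: each node has its own /etc/hosts.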
Execute the following on both servers.

rpm -ivh http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-5.noarch.rpm
wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.5/CentOS/glusterfs-epel.repo
yum -y install glusterfs glusterfs-fuse glusterfs-server

There is no need to install any of the Samba packages if you don't intend to use SMB.

systemctl enable glusterd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/glusterd.service to /usr/lib/systemd/system/glusterd.service.

Both servers had a second 20G disk named sdb. I created two LVs on it, one for each brick.

[root@s2 ~]# lvcreate -L 9G -n brick2 glustervg
Logical volume "brick2" created.
[root@s2 ~]# lvcreate -L 9G -n brick1 glustervg
Logical volume "brick1" created.
[root@s1 ~]# vgcreate glustervg /dev/sdb
Volume group "glustervg" successfully created
[root@s1 ~]# lvcreate -L 9G -n brick2 glustervg
Logical volume "brick2" created.
[root@s1 ~]# lvcreate -L 9G -n brick1 glustervg
Logical volume "brick1" created.
[root@s2 ~]# pvdisplay

--- Physical volume ---
PV Name               /dev/sdb
VG Name               glustervg
PV Size               20.00 GiB / not usable 4.00 MiB
Allocatable           yes
PE Size               4.00 MiB
Total PE              5119
Free PE               511
Allocated PE          4608
PV UUID               filZyX-wR7W-luFX-Asyn-fYA3-f7tf-q4xGyU
[...]
[root@s2 ~]# lvdisplay

--- Logical volume ---
LV Path                /dev/glustervg/brick2
LV Name                brick2
VG Name                glustervg
LV UUID                Rx3FPi-S3ps-x3Z0-FZrU-a2tq-IxS0-4gD2YQ
LV Write Access        read/write
LV Creation host, time s2, 2016-05-18 16:02:41 +0200
LV Status              available
# open                 0
LV Size                9.00 GiB
Current LE             2304
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     8192
Block device           253:3

--- Logical volume ---
LV Path                /dev/glustervg/brick1
LV Name                brick1
VG Name                glustervg
LV UUID                P5slcZ-dC7R-iFWv-e0pY-rvyb-YrPm-FM7YuP
LV Write Access        read/write
LV Creation host, time s2, 2016-05-18 16:02:43 +0200
LV Status              available
# open                 0
LV Size                9.00 GiB
Current LE             2304
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     8192
Block device           253:4
[...]

[root@s1 ~]# lvdisplay
--- Logical volume ---
LV Path                /dev/glustervg/brick2
LV Name                brick2
VG Name                glustervg
LV UUID                7yC2Wl-0lCJ-b7WZ-rgy4-4BMl-mT0I-CUtiM2
LV Write Access        read/write
LV Creation host, time s1, 2016-05-18 16:01:56 +0200
LV Status              available
# open                 0
LV Size                9.00 GiB
Current LE             2304
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     8192
Block device           253:2

--- Logical volume ---
LV Path                /dev/glustervg/brick1
LV Name                brick1
VG Name                glustervg
LV UUID                X6fzwM-qdRi-BNKH-63fa-q2O9-jvNw-u2geA2
LV Write Access        read/write
LV Creation host, time s1, 2016-05-18 16:02:05 +0200
LV Status              available
# open                 0
LV Size                9.00 GiB
Current LE             2304
Segments               1
Allocation             inherit
Read ahead sectors     auto
- currently set to     8192
Block device           253:3
[...]
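The numbers in the pvdisplay and lvdisplay output can be cross-checked with a bit of shell arithmetic. This is only a sanity check; the values are copied from the output above and the variable names are my own.

```shell
# Cross-check the pvdisplay/lvdisplay figures with integer arithmetic.
pe_size_mib=4        # "PE Size" from pvdisplay
total_pe=5119        # "Total PE" from pvdisplay
lv_size_gib=9        # size passed to lvcreate -L
lv_count=2           # brick1 and brick2

le_per_lv=$(( lv_size_gib * 1024 / pe_size_mib ))
allocated_pe=$(( le_per_lv * lv_count ))
free_pe=$(( total_pe - allocated_pe ))
free_mib=$(( free_pe * pe_size_mib ))

echo "LE per LV:    $le_per_lv"      # "Current LE" in lvdisplay
echo "Allocated PE: $allocated_pe"   # "Allocated PE" in pvdisplay
echo "Free PE:      $free_pe ($free_mib MiB left in the VG)"
```

So each 9 GiB LV takes 2304 extents, and just under 2 GiB of the disk stays unallocated.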

[root@s1 ~]# mkfs.xfs /dev/glustervg/brick1

meta-data=/dev/glustervg/brick1 isize=256    agcount=4, agsize=589824 blks
         =                       sectsz=4096 attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2359296, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@s1 ~]# mkfs.xfs /dev/glustervg/brick2

meta-data=/dev/glustervg/brick2 isize=256    agcount=4, agsize=589824 blks
         =                       sectsz=4096 attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=2359296, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

[root@s1 ~]# mkdir -p /gluster/brick{1,2}
[root@s2 ~]# mkdir -p /gluster/brick{1,2}
[root@s1 ~]# mount /dev/glustervg/brick1 /gluster/brick1 && mount /dev/glustervg/brick2 /gluster/brick2
[root@s2 ~]# mount /dev/glustervg/brick1 /gluster/brick1 && mount /dev/glustervg/brick2 /gluster/brick2

Add the following lines to /etc/fstab on both servers:

/dev/mapper/glustervg-brick1 /gluster/brick1 xfs rw,relatime,seclabel,attr2,inode64,noquota 0 0
/dev/mapper/glustervg-brick2 /gluster/brick2 xfs rw,relatime,seclabel,attr2,inode64,noquota 0 0
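The option list in those entries looks like it was copied from the output of mount. Assuming nothing brick-specific is needed, plain `defaults` should do the same job. Here is a small sketch that generates the entries; `brick_fstab_lines` is a hypothetical helper and `defaults` is my assumption, not what I actually used.

```shell
# Hypothetical helper: print one fstab line per brick of a volume group.
# "defaults" is an assumption; the original entries list the options explicitly.
brick_fstab_lines() {
    local vg="$1"
    shift
    local brick
    for brick in "$@"; do
        printf '/dev/mapper/%s-%s /gluster/%s xfs defaults 0 0\n' "$vg" "$brick" "$brick"
    done
}

# Review the output first, then append it on each server, e.g.:
#   brick_fstab_lines glustervg brick1 brick2 >> /etc/fstab
#   mount -a    # reports an error if an entry cannot be mounted
brick_fstab_lines glustervg brick1 brick2
```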

[root@s1 etc]# systemctl start glusterd.service

Making sure:
[root@s1 etc]# ps ax|grep gluster

1010 ?        Ssl    0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

[root@s1 etc]# gluster peer probe s2
peer probe: success.

[root@s2 etc]# gluster peer status
Number of Peers: 1
Hostname: 192.168.183.235
Uuid: f5bdc3f3-0b43-4a83-86c1-c174594566b9
State: Peer in Cluster (Connected)

[root@s1 etc]# gluster pool list
UUID                                    Hostname        State
01cf8a70-d00f-487f-875e-9e38d4529b57    s2              Connected
f5bdc3f3-0b43-4a83-86c1-c174594566b9    localhost       Connected
[root@s1 etc]# gluster volume status
No volumes present

[root@s2 etc]# gluster volume info
No volumes present

[root@s1 etc]# mkdir /gluster/brick1/mpoint1
[root@s2 etc]# mkdir /gluster/brick1/mpoint1
[root@s1 gluster]# gluster volume create myvol1 replica 2 transport tcp s1:/gluster/brick1/mpoint1 s2:/gluster/brick1/mpoint1
volume create: myvol1: failed: Staging failed on s2. Error: Host s1 is not in 'Peer in Cluster' state
Ooooops....
[root@s2 glusterfs]# ping s1
ping: unknown host s1
I forgot to check name resolution. When I fixed this and tried to create the volume again, I got:
[root@s1 glusterfs]# gluster volume create myvol1 replica 2 transport tcp s1:/gluster/brick1/mpoint1 s2:/gluster/brick1/mpoint1
volume create: myvol1: failed: /gluster/brick1/mpoint1 is already part of a volume

WTF ??
[root@s1 glusterfs]# gluster volume get myvol1 all
volume get option: failed: Volume myvol1 does not exist
[root@s1 glusterfs]# gluster
gluster>
exit         global       help         nfs-ganesha peer         pool         quit         snapshot     system::     volume
gluster> volume
add-brick      bitrot         delete         heal           inode-quota    profile        remove-brick   set            status         tier
attach-tier    clear-locks    detach-tier    help           list           quota          replace-brick start          stop           top
barrier        create         get            info           log            rebalance      reset          statedump      sync
gluster> volume l
list log
gluster> volume list
No volumes present in cluster
That's odd! Hmm. I thought it'd work:
[root@s1 /]# rm -r /gluster/brick1/mpoint1
[root@s1 /]# gluster volume create myvol1 replica 2 transport tcp s1:/gluster/brick1/mpoint1 s2:/gluster/brick1/mpoint1
volume create: myvol1: success: please start the volume to access data

[root@s1 /]# gluster volume list
myvol1
Yep. Success. Phuhh.
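Why did deleting the directory help? As far as I understand it, Gluster tags a brick directory with extended attributes (trusted.glusterfs.volume-id and trusted.gfid) and a .glusterfs directory, and the half-finished first attempt left such a tag behind on s1. If the directory already holds data you want to keep, the commonly cited alternative is to remove only the tags. The sketch below just prints those commands so they can be reviewed first; `brick_reset_cmds` is a hypothetical helper, and I have not verified this procedure on this exact setup.

```shell
# Hypothetical helper: print (not run) the commands commonly used to make a
# directory reusable as a brick. Review them before piping the output to sh.
brick_reset_cmds() {
    local brick="$1"
    # Show the attributes currently set on the directory.
    printf 'getfattr -d -m . -e hex %s\n' "$brick"
    # Drop the markers that make Gluster think the path belongs to a volume.
    printf 'setfattr -x trusted.glusterfs.volume-id %s\n' "$brick"
    printf 'setfattr -x trusted.gfid %s\n' "$brick"
    # Remove Gluster's internal metadata directory, if there is one.
    printf 'rm -rf %s/.glusterfs\n' "$brick"
}

brick_reset_cmds /gluster/brick1/mpoint1
```

The commands need root, because the trusted.* attribute namespace is only accessible to privileged users.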
[root@s1 /]# gluster volume start myvol1
volume start: myvol1: success
[root@s2 etc]# gluster volume list
myvol1
[root@s2 etc]# gluster volume status
Status of volume: myvol1
Gluster process                             TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick s1:/gluster/brick1/mpoint1            49152     0          Y       2528
Brick s2:/gluster/brick1/mpoint1            49152     0          Y       10033
NFS Server on localhost                     2049      0          Y       10054
Self-heal Daemon on localhost               N/A       N/A        Y       10061
NFS Server on 192.168.183.235               2049      0          Y       2550
Self-heal Daemon on 192.168.183.235         N/A       N/A        Y       2555

Task Status of Volume myvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@s1 ~]# gluster volume create myvol2 s1:/gluster/brick2/mpoint2 s2:/gluster/brick2/mpoint2 force
volume create: myvol2: success: please start the volume to access data
[root@s1 ~]# gluster volume start myvol2
volume start: myvol2: success
[root@s1 ~]# gluster volume info
Volume Name: myvol1
Type: Replicate
Volume ID: 633b765b-c630-4007-91ca-dc42714bead4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: s1:/gluster/brick1/mpoint1
Brick2: s2:/gluster/brick1/mpoint1
Options Reconfigured:
performance.readdir-ahead: on

Volume Name: myvol2
Type: Distribute
Volume ID: ebfa9134-0e6a-40be-8045-5b16436b88ed
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: s1:/gluster/brick2/mpoint2
Brick2: s2:/gluster/brick2/mpoint2
Options Reconfigured:
performance.readdir-ahead: on
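With more than a couple of volumes, the full info output gets long. A short sketch that boils it down to one line per volume; `volume_types` is a hypothetical helper that only relies on the "Volume Name:" and "Type:" lines shown above.

```shell
# Hypothetical helper: read `gluster volume info` output on stdin and print
# "<volume> <type>" for each volume.
volume_types() {
    awk -F': ' '
        $1 == "Volume Name" { name = $2 }
        $1 == "Type"        { print name, $2 }'
}

# Usage on a server:
#   gluster volume info | volume_types
```

For the two volumes here it should print "myvol1 Replicate" and "myvol2 Distribute".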

On the client:

[root@c1 ~]# wget -P /etc/yum.repos.d http://download.gluster.org/pub/gluster/glusterfs/LATEST/CentOS/glusterfs-epel.repo
[...]
[root@c1 ~]# yum -y install glusterfs glusterfs-fuse
[....]
[root@c1 ~]# mkdir /g{1,2}
[root@c1 ~]# mount.glusterfs s1:/myvol1 /g1
[root@c1 ~]# mount.glusterfs s1:/myvol2 /g2
[root@c1 ~]# mount
[...]
s1:/myvol1 on /g1 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
s2:/myvol2 on /g2 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@c1 ]# df -h
Filesystem               Size Used Avail Use% Mounted on
/dev/mapper/centos-root   28G 1.1G   27G   4% /
devtmpfs                 422M     0 422M   0% /dev
tmpfs                    431M     0 431M   0% /dev/shm
tmpfs                    431M 5.7M 426M   2% /run
tmpfs                    431M     0 431M   0% /sys/fs/cgroup
/dev/sda1                494M 164M 331M 34% /boot
tmpfs                     87M     0   87M   0% /run/user/0
s1:/myvol1               9.0G   34M 9.0G   1% /g1 (9G mirrored onto 9G because of replication, aka RAID1 over the network)
s2:/myvol2                18G   66M   18G   1% /g2 (9G+9G because of distribution, aka JBOD over the network)
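The sizes reported by df follow from simple arithmetic. A minimal sketch, assuming equally sized bricks; `usable_gib` is a hypothetical helper of mine.

```shell
# Hypothetical helper: usable capacity in GiB for equally sized bricks.
# replica=1 means plain distribute; replica=2 stores every file twice.
usable_gib() {
    local bricks="$1" brick_gib="$2" replica="${3:-1}"
    echo $(( bricks * brick_gib / replica ))
}

echo "myvol1 (replica 2):  $(usable_gib 2 9 2) GiB"   # the 9.0G on /g1
echo "myvol2 (distribute): $(usable_gib 2 9) GiB"     # the 18G on /g2
```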

What is the difference between distributing and striping? Here are two short snippets from the glusterhacker blog:
Distribute: A distribute volume is one in which all the data of the volume is distributed throughout the bricks. Based on an algorithm that takes into account the size available in each brick, the data will be stored in any one of the available bricks. [...] The default volume type is distribute, hence my myvol2 got distributed.
Stripe: A stripe volume is one in which the data being stored in the backend is striped into units of a particular size among the bricks. The default unit size is 128KB, but it's configurable. If we create a striped volume of stripe count 3, and then create a 300KB file at the mount point, the first 128KB will be stored in the first sub-volume (brick in our case), the next 128KB in the second, and the remaining 44KB in the third. The number of bricks should be a multiple of the stripe count.
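The striping example can be worked through with a few lines of awk. This is only a model of the description quoted above (fixed-size units handed out round-robin), not Gluster's actual code; `stripe_layout` is a hypothetical helper.

```shell
# Hypothetical helper: show how many KB of a file land on each brick when it
# is striped round-robin in fixed-size units (128 KB unless told otherwise).
stripe_layout() {
    awk -v size="$1" -v count="$2" -v unit="${3:-128}" 'BEGIN {
        # Hand out one unit at a time, cycling through the bricks.
        for (i = 0; size > 0; i++) {
            chunk = (size < unit) ? size : unit
            total[i % count] += chunk
            size -= chunk
        }
        for (b = 0; b < count; b++)
            printf "brick %d: %d KB\n", b + 1, total[b]
    }'
}

stripe_layout 300 3
```

For a 300 KB file and a stripe count of 3 this gives 128 KB, 128 KB and 44 KB (300 - 2 x 128). A larger file wraps around: the fourth unit goes to the first brick again.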

The very useable official howto is here.

Performance test, split brain, to be continued....

Thursday, May 19, 2016
