Friday, March 16, 2012

OCFS2 Cluster using Pacemaker and Corosync



==========================
OS : OpenSUSE 11.4
==========================
Node1               :- mysql01.ansil.com      - 192.168.56.11
Node2               :- mysql02.ansil.com      - 192.168.56.12
ClusterIP            :- mysql.ansil.com           - 192.168.56.10
Shared Storage   :- san.ansil.com              - 192.168.56.200
==========================

The setup is intended for testing purposes only.

Install the OS on all 3 hosts (host names and IPs are given above)
Ensure TWO disks are free on san.ansil.com (one for shared storage, one for fencing)
Make sure all hosts have access to the installation DVD/ISO
The /etc/hosts file on every host must be configured to resolve the names of all hosts
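For example, the same entries can go into /etc/hosts on every host (host names and IPs as listed above; adjust to your own network):

vi /etc/hosts

192.168.56.11    mysql01.ansil.com    mysql01
192.168.56.12    mysql02.ansil.com    mysql02
192.168.56.10    mysql.ansil.com      mysql
192.168.56.200   san.ansil.com        san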

Set up shared storage using "tgt" ( on san.ansil.com )
----------------------------------------------------------------------------------- 
The shared storage can be provided over iSCSI with tgt (http://stgt.sourceforge.net/)
Download the latest source from   https://github.com/fujita/tgt/archives/master
Install the "make" and "gcc" packages using zypper install make gcc
Untar the tgt tarball, go to the newly created directory and execute make
After compilation, install tgt using make install
Start tgt using nohup tgtd &
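A minimal end-to-end sketch of the build (the tarball and directory names are placeholders; they depend on the snapshot you downloaded):

zypper install make gcc
tar xzf tgt-master.tar.gz
cd tgt-master
make
make install
nohup tgtd &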
-----------------------------------------------------------------------------------
Create and Share LUNs using tgt
 ----------------------------------------------------------------------------------- 
 1. Create a new target and IQN
 tgtadm --lld iscsi --op new --mode target --tid 1 \
 -T iqn.2012-03.com.ansil:storage.tgt.cluster.ocfs.a 

2. Verify the current status of the targets
tgtadm --lld iscsi --op show --mode target 
   
Target 1: iqn.2012-03.com.ansil:storage.tgt.cluster.ocfs.a
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: null
            Backing store path: None
            Backing store flags:
    Account information:
    ACL information:
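
Before binding the disks in step 3, it helps to confirm which SCSI generic devices (/dev/sgN) correspond to the two free disks on san.ansil.com. If lsscsi is installed, the mapping can be read from its -g output (the lines below are only illustrative):

lsscsi -g
[0:0:0:0]   disk   ...   /dev/sda   /dev/sg0     (OS disk)
[0:0:1:0]   disk   ...   /dev/sdb   /dev/sg1     (first free disk)
[0:0:2:0]   disk   ...   /dev/sdc   /dev/sg2     (second free disk)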

3. Bind the disks (/dev/sg1 and /dev/sg2) to the target for sharing
tgtadm --lld iscsi --op new --mode logicalunit \
        --tid 1 --lun 1 --bstype=sg --device-type=pt -b /dev/sg1

tgtadm --lld iscsi --op new --mode logicalunit \
        --tid 1 --lun 2 --bstype=sg --device-type=pt -b /dev/sg2



4. Verify the status of the targets again
tgtadm --lld iscsi --op show --mode target  

The last lines will look like this:
        LUN: 1
            Type: passthrough
            SCSI ID: IET     00010001
            SCSI SN: beaf11
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: sg
            Backing store path: /dev/sg1
            Backing store flags:
        LUN: 2
            Type: passthrough
            SCSI ID: IET     00010002
            SCSI SN: beaf11
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            Backing store type: sg
            Backing store path: /dev/sg2
            Backing store flags:

    Account information:
    ACL information:


5. Change the ACL to allow the initiator hosts
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

Now our LUNs are ready for sharing

Cluster Setup

zypper install pacemaker corosync

The following NEW packages are going to be installed:
  OpenIPMI cluster-glue corosync libcorosync4 libdlm libdlm3 libesmtp libglue2
  libnet1 libopenais3 libpacemaker3 openais openhpi openhpi-daemon pacemaker
  python-curses python-tk resource-agents tk


cd /etc/corosync/
cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf

Change the bind address, multicast address and multicast port

vi /etc/corosync/corosync.conf

        interface {
                ringnumber:     0

                # The following values need to be set based on your environment
                bindnetaddr:    192.168.56.0
                mcastaddr:      226.94.1.1
                mcastport:      4000
        }
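For reference, this interface block lives inside the totem stanza of corosync.conf; with corosync 1.x the surrounding section looks roughly like this (other stanzas such as logging can keep the defaults from the example file, and secauth can be set to "on" if you want the authkey generated below to be enforced):

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber:     0
                bindnetaddr:    192.168.56.0
                mcastaddr:      226.94.1.1
                mcastport:      4000
        }
}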


Copy the cluster configuration file to the second node

scp /etc/corosync/corosync.conf mysql02:/etc/corosync/

Create an authentication key for corosync; both nodes must contain the same authkey file
mysql01:~ #  cd /etc/corosync
mysql01:~ #  corosync-keygen
mysql01:~ #  scp authkey mysql02:/etc/corosync/
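If you want the cluster stack to come up automatically after a reboot (optional for a test setup), the openais init script can be enabled on both nodes; on openSUSE something like the following should work (assumption, adjust to your init tooling):

chkconfig openais on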


Start the corosync service on mysql01

 mysql01:~ # rcopenais start

/var/log/messages may show the errors below; they will be taken care of while configuring the cluster.
The errors are related to STONITH, which we have not configured yet.
WARN: cluster_status: We do not have quorum - fencing and resource management disabled
ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity


Start corosync service on mysql02

mysql02:~ #  rcopenais start

After a few seconds, execute the crm_mon command

mysql01:~ # crm_mon -1
============
Last updated: Mon Mar 12 20:03:32 2012
Stack: openais
Current DC: mysql01 - partition with quorum
Version: 1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8
2 Nodes configured, 2 expected votes   --> Both nodes are configured
0 Resources configured.
============

Online: [ mysql01 mysql02 ]   --> Both nodes are online

mysql01:~ #

Two nodes are configured and both are online


Our Basic Cluster Setup is ready…!!!!!


Install OCFS2 Packages and their dependencies (Both Nodes)

mysql01:~# zypper install vte016-sharp
The following NEW packages are going to be installed:
  glib-sharp2 gtk-sharp2 libgdiplus0 libvte9 mono-core vte016-sharp

Download the latest OCFS2 packages from the openSUSE site and install them

mysql01:~# rpm -ivh ocfs2-tools-o2cb-1.8.0-5.2.i586.rpm ocfs2-tools-1.8.0-5.2.i586.rpm ocfs2console-1.8.0-5.2.i586.rpm

Check the OCFS2 cluster status

mysql01:~#  rco2cb status
Driver for "configfs": Not loaded
Driver for "ocfs2_dlmfs": Not loaded
mysql01:~#

Remember, you need to follow the same steps on mysql02 as well



Configure Shared Storage on Cluster Nodes


Follow the steps on both nodes

lsscsi -g                                               --> Note down the present SCSI disks / tapes
rcopen-iscsi start                                      --> Start the iSCSI initiator service
iscsiadm -m discovery -t sendtargets -p san.ansil.com   --> Discover the target IQNs

iscsiadm -m node -T iqn.2012-03.com.ansil:storage.tgt.cluster.ocfs.a -p san.ansil.com --login   --> Log in to the target

lsscsi -g                                               --> Note down the newly detected LUNs

Suppose the newly detected LUNs are like /dev/sdb and /dev/sdc

Create a partition on /dev/sdb using fdisk on the mysql01 node

fdisk /dev/sdb


After creating the new partition, execute partprobe on the second node, mysql02

partprobe /dev/sdb
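To double-check that both nodes see the same partition table before moving on, you can run on each node, for example:

cat /proc/partitions | grep sdb
fdisk -l /dev/sdb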


Create External STONITH device


Stop Cluster Service on both nodes

rcopenais stop

Create the SBD device on mysql01

sbd -d /dev/sdc create
sbd -d /dev/sdc dump

Do the following steps on both nodes

modprobe softdog
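To have softdog loaded again automatically after a reboot, on openSUSE it can be added to MODULES_LOADED_ON_BOOT in /etc/sysconfig/kernel (this assumes you are using the software watchdog; skip it if you have a hardware watchdog):

vi /etc/sysconfig/kernel

MODULES_LOADED_ON_BOOT="softdog"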

Create the file /etc/sysconfig/sbd so that the SBD daemon starts along with the cluster

vi /etc/sysconfig/sbd

SBD_DEVICE="/dev/sdc"
SBD_OPTS="-W"

Start the cluster service on both nodes

rcopenais start
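Once the cluster is up on both nodes, each node should have claimed a message slot on the SBD device; a quick check from either node:

sbd -d /dev/sdc list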

Verify the cluster using crm_verify command

mysql01:~ # crm_verify -L
crm_verify[5889]: 2012/03/13_02:56:39 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[5889]: 2012/03/13_02:56:39 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5889]: 2012/03/13_02:56:39 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid  -V may provide more details

Yes, the errors are about STONITH; we haven't configured it yet, so let's do that now.

You don't need to execute crm commands on both nodes; the cluster takes care of propagating the configuration update to the other node.

crm configure primitive stonith_sbd stonith:external/sbd params sbd_device="/dev/sdc"
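Optionally, make sure Pacemaker's stonith-timeout is larger than the msgwait value reported by sbd -d /dev/sdc dump (the timeout below is only an example for the default 10-second msgwait):

crm configure property stonith-timeout="30s"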

Now that STONITH is configured, crm_verify will not produce any errors or warnings

mysql01:~ # crm_verify -L
mysql01:~ #

Configure OCFS2 in Pacemaker

In a Pacemaker-managed OCFS2 cluster, the distributed locking and cluster membership/heartbeat are handled by Pacemaker itself. There is no role for OCFS2's internal (o2cb) cluster stack, so we don't have to create any OCFS2-specific configuration file.

Configure the DLM and o2cb resources in the cluster

mysql01:~ #  crm configure primitive p_controld ocf:pacemaker:controld
mysql01:~ #  crm configure primitive p_o2cb ocf:ocfs2:o2cb

Now both resources are running on mysql01, but we need both services to run simultaneously on both nodes. For that we will put both resources into a single group and then clone that group

mysql01:~ #  crm configure group base-group p_controld p_o2cb
mysql01:~ #  crm configure clone base-clone base-group meta interleave="true"
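After a few seconds crm_mon should show the clone running on both nodes; the relevant part of the output will look roughly like this (illustrative):

mysql01:~ # crm_mon -1
 stonith_sbd    (stonith:external/sbd): Started mysql01
 Clone Set: base-clone [base-group]
     Started: [ mysql01 mysql02 ]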



If the cluster resources do not start, install kernel-desktop-2.6.37.6-0.9.1.i586 and reboot both nodes. Then:

stop openais on the working node
clean up the resource from the second node:
crm_resource -r p_o2cb --cleanup

Our OCFS2 cluster is now managed by Pacemaker and the resource group is running on both nodes.
Now we can create an OCFS2 file system on the shared storage
(the partition was already created; refer to "Configure Shared Storage on Cluster Nodes")

mkfs.ocfs2 -N 2 /dev/sdb1
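To confirm the file system was created, ocfs2-tools ships a mounted.ocfs2 helper that lists detected OCFS2 devices with their labels and UUIDs:

mounted.ocfs2 -d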

Create a directory called /data on both nodes

Mount /dev/sdb1 on /data on both nodes. You can mount it simultaneously on both nodes; that is exactly why we did all these steps

Don't forget to unmount it again on both nodes, because the cluster will manage the mount and unmount operations

Now create a Filesystem resource in the cluster

crm configure primitive ocfs2-1 ocf:heartbeat:Filesystem params device="/dev/sdb1" directory="/data" fstype="ocfs2" options="acl" op monitor interval="20" timeout="40"

Wait..!!!..it will run on only one node, so we need to add the file system resource to the existing group

crm configure edit base-group
group base-group p_controld p_o2cb ocfs2-1   --> ocfs2-1 needs to be added to this line manually


Now check your cluster status with crm_mon and verify that /data is mounted on both nodes
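For example, on each node:

crm_mon -1
mount | grep /data
df -h /data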


Happy Clustering …I’ll be back with more exciting cluster topics


