Howto HA Cluster Wowza Servers with CRM / Pacemaker / Corosync and RHEL 6.x

04/18/12 | by coolzero [mail] | Categories: Just a story...

This howto explains how to set up a High Availability cluster with Wowza on Red Hat Enterprise Linux 6.x. Our setup is a cluster of two origin servers that share one VIP (Virtual IP address). The following addresses are used throughout this document:

11.11.11.11 = s100
22.22.22.22 = s101
99.99.99.99 = vip

Before we start you should know the following points.

1. You need a Red Hat account at https://rhn.redhat.com, a copy of RHEL 6.x and the High Availability Add-On.
2. If you don't have Red Hat, use CentOS 6.x or Scientific Linux 6.x and their repositories to get the packages.
3. You know network basics.
4. You know unix basics.
5. You know how to look at log files (very important when you are debugging).
6. You know RHEL 6.x / CentOS / Scientific Linux basics.
7. You know how Wowza configuration works.
8. Everything explained below must be done on both hosts. The only exceptions are the crm tool and cibadmin commands, which you run on one node only.

All checked? Let's start!

First, in your /etc/hosts file add the following:

11.11.11.11 s100
22.22.22.22 s101
99.99.99.99 vip

This is needed so that the names stay resolvable during a DNS outage.
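A minimal sketch of adding these entries idempotently, so that repeating the step on either node does not duplicate lines. The helper name add_host_entry is made up for illustration, and the demo writes to a scratch file instead of /etc/hosts.

```shell
#!/bin/bash
# add_host_entry FILE IP NAME (hypothetical helper, not part of any tool)
# Appends "IP NAME" to FILE unless that exact line is already present.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    # -x matches whole lines only, -F treats the pattern as a fixed string
    grep -qxF "$ip $name" "$file" 2>/dev/null || echo "$ip $name" >> "$file"
}

# Demo against a scratch file; use /etc/hosts (as root) on a real node.
hosts_file=$(mktemp)
add_host_entry "$hosts_file" 11.11.11.11 s100
add_host_entry "$hosts_file" 22.22.22.22 s101
add_host_entry "$hosts_file" 99.99.99.99 vip
add_host_entry "$hosts_file" 99.99.99.99 vip   # second call changes nothing
cat "$hosts_file"
```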

After this execute the following:

yum install -y corosync pacemaker
chkconfig pacemaker on
chkconfig corosync on
chown -R hacluster /var/log/cluster

After doing this, create a new file at /etc/corosync/corosync.conf and paste:

# Please read the corosync.conf.5 manual page
compatibility: whitetank

totem {
version: 2
secauth: off
threads: 0
interface {
ringnumber: 0
bindnetaddr: 11.11.11.11
mcastaddr: 226.94.1.1
mcastport: 5405
}
}

logging {
fileline: off
to_stderr: no
to_logfile: yes
to_syslog: yes
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}

amf {
mode: disabled
}

The only settings you need to change are:

bindnetaddr: 11.11.11.11
mcastaddr: 226.94.1.1
mcastport: 5405

Since this is a two-node cluster, use the following on the second node:

bindnetaddr: 22.22.22.22
mcastaddr: 226.94.1.1
mcastport: 5405
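Because only bindnetaddr differs between the nodes, the interface block can be generated from one template. This is just a sketch; render_totem_interface is an illustrative helper, and the defaults are the multicast address and port used in this howto.

```shell
#!/bin/bash
# render_totem_interface BINDADDR [MCASTADDR] [MCASTPORT] (hypothetical helper)
# Prints the interface block of the totem section for one node.
render_totem_interface() {
    local bindaddr=$1 mcastaddr=${2:-226.94.1.1} mcastport=${3:-5405}
    cat <<EOF
interface {
ringnumber: 0
bindnetaddr: $bindaddr
mcastaddr: $mcastaddr
mcastport: $mcastport
}
EOF
}

render_totem_interface 11.11.11.11   # node s100
render_totem_interface 22.22.22.22   # node s101
```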

When done, you need to create another file /etc/corosync/service.d/pcmk and paste:

service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}

Next, check whether iptables is running on your machines. If it is, you need the following rules so that the hosts can communicate with each other.

iptables -A INPUT -m udp -p udp -s 11.11.11.11 --dport 5405 -j ACCEPT
iptables -A INPUT -m udp -p udp -s 22.22.22.22 --dport 5405 -j ACCEPT
iptables -A INPUT -m udp -p udp -d 226.94.1.1 --dport 5405 -j ACCEPT

Of course, if you are an experienced iptables user, adjust the rules as you see fit. The important thing is that UDP port 5405 traffic is allowed between both hosts and to the multicast address. This needs to be done on BOTH hosts.
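A rough sketch that prints the rules for a list of peers so they can be reviewed before being applied. Two things here are assumptions and not from the setup above: the rules are inserted with -I instead of appended, because a stock RHEL 6 ruleset ends its INPUT chain with a REJECT rule and appended rules would land after it, and the multicast group is matched as a destination (-d), since that is where the packets are addressed. The helper name corosync_rules is made up.

```shell
#!/bin/bash
# corosync_rules PORT MCASTADDR PEER... (hypothetical helper)
# Prints iptables commands for review; it does not apply anything.
corosync_rules() {
    local port=$1 mcast=$2 peer
    shift 2
    for peer in "$@"; do
        # traffic sent by a cluster node carries that node's address as source
        echo "iptables -I INPUT -m udp -p udp -s $peer --dport $port -j ACCEPT"
    done
    # packets addressed to the multicast group
    echo "iptables -I INPUT -m udp -p udp -d $mcast --dport $port -j ACCEPT"
}

corosync_rules 5405 226.94.1.1 11.11.11.11 22.22.22.22
```

After applying rules by hand on RHEL 6, "service iptables save" writes them to /etc/sysconfig/iptables so they survive a reboot.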

After setting up iptables or simply disabling it, start the corosync process and pacemaker.

service corosync start
service pacemaker start

You should now see logging in /var/log/messages or /var/log/cluster/corosync.log. Check these logs for serious errors. At this point you should see the following errors:

ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.

This is expected, because we have not set up the crm configuration yet! The basics are ready; now we are going to install Java and Wowza Server itself. That's quite easy (again, on both nodes).

Download Java from http://www.oracle.com/technetwork/java/javase/downloads/index.html and pick the Java 7 JDK package. Why the JDK? You could also choose the JRE, but then some Wowza functions will not work. After downloading the right Java package, install it.

yum localinstall jdk-7u3-linux-x64.rpm

Next is Wowza.

wget http://www.wowza.com/downloads/WowzaMediaServer-3-1-0/WowzaMediaServer-3.1.0.rpm.bin
chmod 755 WowzaMediaServer-3.1.0.rpm.bin
./WowzaMediaServer-3.1.0.rpm.bin    # answer yes when prompted
/usr/local/WowzaMediaServer/bin/startup.sh    # enter the serial key, then press CTRL+C after startup

Now that Wowza is installed, edit /usr/local/WowzaMediaServer/conf/VHost.xml so that it only binds to the VIP address.

For example, search the VHost.xml file for IpAddress and set it to the VIP address, which is 99.99.99.99 in this case.

After Wowza is installed, you get a startup script from Wowza at /etc/init.d/WowzaMediaServer. Unfortunately this script is not LSB compliant: its status action returns 0 even when Wowza is stopped. I have made one small adjustment to this startup script. Just execute the following.

vi /etc/init.d/WowzaMediaServer

When in edit mode, search for:

localstatus() {
if [ -f $WMSLOCK_FILE ]; then
echo "$WMSBASE_NAME started"
else
echo "$WMSBASE_NAME stopped"
fi
RETVAL=0
}

And change it to this:

localstatus() {
if [ -f $WMSLOCK_FILE ]; then
echo "$WMSBASE_NAME started"
RETVAL=0
else
echo "$WMSBASE_NAME stopped"
RETVAL=3
fi
}
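The LSB init script convention expects the status action to exit with 0 when the service is running and 3 when it is stopped, which is what the change above provides. A small sketch of how that can be checked; status_code is an illustrative helper, and the demo runs it against a stand-in script in a temp directory instead of the real Wowza script.

```shell
#!/bin/bash
# status_code SCRIPT (hypothetical helper)
# Runs "SCRIPT status" quietly and prints its exit code.
status_code() {
    "$1" status >/dev/null 2>&1
    echo $?
}

# Stand-in init script that mimics the patched localstatus() logic.
workdir=$(mktemp -d)
lock="$workdir/lock"
cat > "$workdir/fake-init" <<EOF
#!/bin/bash
if [ -f "$lock" ]; then echo "started"; exit 0; else echo "stopped"; exit 3; fi
EOF
chmod +x "$workdir/fake-init"

echo "stopped: exit code $(status_code "$workdir/fake-init")"   # LSB expects 3
touch "$lock"
echo "running: exit code $(status_code "$workdir/fake-init")"   # LSB expects 0
```

On a real node the same helper can be pointed at /etc/init.d/WowzaMediaServer, once with Wowza stopped and once with it running.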

After editing these lines, we make another script called /etc/init.d/WowzaMediaServer2 and put this piece of code in the script:

#!/bin/bash
#
# WowzaMediaServer2 Startup script for Wowza
#
# chkconfig: - 85 15
# description: Wowza Streaming Server
# processname: WowzaMediaServerd
#
### BEGIN INIT INFO
# Provides: WowzaMediaServerd
# Required-Start: $local_fs $remote_fs $network $named
# Required-Stop: $local_fs $remote_fs $network
# Should-Start: distcache
# Short-Description: start and stop Wowza
# Description: Wowza Streaming Server
### END INIT INFO

# Source function library.
. /etc/rc.d/init.d/functions

prog=WowzaMediaServer
pidfile=${PIDFILE-/var/run/wowza/wowza.pid}
lockfile=${LOCKFILE-/var/lock/subsys/wowza}
RETVAL=0
STOP_TIMEOUT=${STOP_TIMEOUT-10}

start() {
    echo -n $"Starting $prog: "
    /etc/init.d/WowzaMediaServer start
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch ${lockfile}
    return $RETVAL
}

stop() {
    echo -n $"Stopping $prog: "
    /etc/init.d/WowzaMediaServer stop
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f ${lockfile} ${pidfile}
    return $RETVAL
}

# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        # Pass through the exit code of the patched localstatus()
        /etc/init.d/WowzaMediaServer status
        RETVAL=$?
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo $"Usage: $prog {start|stop|restart|status}"
        RETVAL=2
        ;;
esac

exit $RETVAL

This wrapper behaves according to the LSB standard while still using the original /etc/init.d/WowzaMediaServer script. Make sure it is executable (chmod 755 /etc/init.d/WowzaMediaServer2). The cluster needs this: if the script is not LSB compliant, you get strange errors and odd situations during failover. Next we are going to edit the CRM configuration.

crm configure edit

Your standard editor opens (vi in my case, but it could be nano as well). Paste the following. Some entries will already be present when you edit; leave those as they are and add the extras described below. Remember to replace 99.99.99.99 with your VIP address.

node s100 \
attributes standby="off"
node s101 \
attributes standby="off"
primitive VIP ocf:heartbeat:IPaddr \
params ip="99.99.99.99" \
op monitor interval="10s"
primitive WOWZA lsb:WowzaMediaServer2 \
op monitor interval="10s"
group StreamingCluster VIP WOWZA
colocation vip_with_wowza inf: VIP WOWZA
order wowza_after_vip inf: VIP WOWZA
property $id="cib-bootstrap-options" \
dc-version="1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"

If you edited the configuration correctly, you should see no errors when saving. To check that everything was saved correctly, execute:

crm configure show

Now verify that everything is working correctly. First, check the CRM monitor to see whether your cluster is working properly.

crm_mon -1

You should see something similar to this:

============
Last updated: Wed Apr 18 13:44:51 2012
Last change: Thu Apr 12 11:12:36 2012 via crm_attribute on s100
Stack: openais
Current DC: s101 - partition with quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ s100 s101 ]

Resource Group: StreamingCluster
VIP (ocf::heartbeat:IPaddr): Started s100
WOWZA (lsb:WowzaMediaServer2): Started s100

What you see here is that host s100 is currently doing all the heavy lifting for the Wowza services. So let's simulate a failover. That's easily done with the following:

crm node standby s100

When failover is working correctly, you should see this output:

crm_mon -1
============
Last updated: Wed Apr 18 13:48:45 2012
Last change: Wed Apr 18 13:48:40 2012 via crm_attribute on s100
Stack: openais
Current DC: s101 - partition with quorum
Version: 1.1.6-3.el6-a02c0f19a00c1eb2527ad38f146ebc0834814558
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Node s100: standby
Online: [ s101 ]

Resource Group: StreamingCluster
VIP (ocf::heartbeat:IPaddr): Started s101
WOWZA (lsb:WowzaMediaServer2): Started s101

Congratulations, you have successfully failed over to the other node, which took over the Wowza services and the VIP address. Now you can bring the first node online again.

crm node online s100

After this you will see that s100 is online again but does not take the services back from s101. This is correct and on purpose: the resource-stickiness setting prevents an automatic failback to the first node. In a worst-case scenario, such as a flapping network card on s100, the cluster would otherwise fail over every time it flaps, which is not what we want. If you do want to move back to s100, simply move the resource group. Example below:

crm resource move StreamingCluster s100

It's now running again on s100 instead of s101.
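To confirm from a script where a resource is running, the crm_mon -1 output can be parsed. This is only a sketch: resource_node is an illustrative helper, and it assumes the output layout shown earlier in this howto (resource name first, then "Started" and the node name at the end of the line).

```shell
#!/bin/bash
# resource_node NAME (hypothetical helper)
# Reads "crm_mon -1" output on stdin and prints the node where NAME is Started.
resource_node() {
    awk -v rsc="$1" '$1 == rsc && $(NF-1) == "Started" { print $NF }'
}

# Sample lines copied from the failover output above.
# On a live node use: crm_mon -1 | resource_node WOWZA
sample='Resource Group: StreamingCluster
VIP (ocf::heartbeat:IPaddr): Started s101
WOWZA (lsb:WowzaMediaServer2): Started s101'

echo "$sample" | resource_node WOWZA
```

Keep in mind that crm resource move works by adding a location constraint for the target node. Once the group is back where you want it, crm resource unmove StreamingCluster removes that constraint again, so that stickiness alone decides placement.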

If the crm configuration above gives errors, you can start all over again by executing the following:

cibadmin -E --force

NOTICE: this deletes the complete CRM setup, so be careful when executing this command. Below are some other handy commands:

cibadmin -Q : List the xml configuration of the cluster
cibadmin -Q > cib-backup.xml : Save the configuration of the cluster in the given file
crm_verify -LV : Check the xml for errors or orphaned resources
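Since cibadmin -E wipes everything, it may be worth saving a dated copy of the configuration first. A minimal sketch built on the cibadmin -Q command listed above; backup_cib and the /root target directory are illustrative choices, not part of the cluster tools.

```shell
#!/bin/bash
# backup_cib DIR (hypothetical helper)
# Dumps the live CIB into DIR/cib-YYYYmmdd-HHMMSS.xml and prints the file name.
backup_cib() {
    local out
    out="$1/cib-$(date +%Y%m%d-%H%M%S).xml"
    if cibadmin -Q > "$out"; then
        echo "$out"
    else
        rm -f "$out"   # do not leave an empty file behind
        return 1
    fi
}

# Only attempt a backup where the cluster tools are installed.
if command -v cibadmin >/dev/null 2>&1; then
    backup_cib /root
fi
```

A saved file can later be loaded back with cibadmin's --replace and --xml-file options; check cibadmin --help on your version before relying on that.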

This is the end of the howto. Much of the above was found by simply using Google and pulling it all together. Most of what I used is listed below:

http://www.clusterlabs.org/wiki/Documentation


Howto use Grub, Raid0, LVM2 and Fake Raid / dmraid (Silicon Image, Inc. SiI 3114) under Fedora, Redhat and Linux in general.

11/03/09 | by coolzero [mail] | Categories: Just a story...

Hi there,
I know I haven't been posting a lot lately, but I came across a weird problem with fake RAID on my Silicon Image 3114 controller. I was trying to reinstall the MBR with grub, because I had also installed Windows on the machine, on another partition. Normally, when using grub via the CLI, you get something like this:

[me@somehost ~]$ grub
Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub>

Now normally you could do this when using some kind of rescue cd.

grub> root (hd0,1)
grub> setup (hd0)

But unfortunately this simply does not work with fake RAID: with something like RAID 0, grub gives errors that it cannot find the partition. I spent two hours searching for the reason why it couldn't find my disk layout. Everything looked OK, even the device.map. It looked like this:

(hd0) /dev/mapper/sil_agababbifhbg

So how did I fix this problem? The solution is to tell grub the right geometry of the disk layout of your RAID 0 set. It sounds weird, but it worked just great. First look up the geometry:

cfdisk /dev/mapper/sil_agababbifhbg

In my case the output was like this:

Disk Drive: /dev/mapper/sil_agababbifhbg
Size: 148709441536 bytes, 148.7 GB
Heads: 255 Sectors per Track: 63 Cylinders: 18079

As you can see, I have 255 heads, 63 sectors per track and 18079 cylinders. These are exactly the values I need to get around the problem. Now let's get grub going.
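GRUB's geometry command takes its numbers in the order cylinders, heads, sectors, which is not the order cfdisk prints them in. A small sketch that turns the cfdisk header line into the matching command; grub_geometry_cmd is an illustrative helper and assumes the header format shown above.

```shell
#!/bin/bash
# grub_geometry_cmd DRIVE (hypothetical helper)
# Reads cfdisk's header on stdin and prints the matching GRUB "geometry" command.
grub_geometry_cmd() {
    awk -v drive="$1" '
        /Heads:/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "Heads:")     heads = $(i + 1)
                if ($i == "Track:")     sectors = $(i + 1)
                if ($i == "Cylinders:") cylinders = $(i + 1)
            }
            printf "geometry (%s) %s %s %s\n", drive, cylinders, heads, sectors
        }'
}

# Header line copied from the cfdisk output above.
echo "Heads: 255 Sectors per Track: 63 Cylinders: 18079" | grub_geometry_cmd hd0
```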

[me@somehost ~]$ grub --device-map=/dev/null
Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists the possible
completions of a device/filename.]
grub> device (hd0) /dev/mapper/sil_agababbifhbg
grub> geometry (hd0) 18079 255 63
grub> root (hd0,1)
grub> setup (hd0,1)
grub> setup (hd0)

And voila! It worked! This problem looks more like a bug than anything else; I found many people online having the same issue.

Look at this post for instance!

Anyway, I hope that the search engines are going to archive this correctly, so many people can be helped with this problem. Any comments are welcome of course!


Imtek the Pimptek

11/04/08 | by coolzero [mail] | Categories: Links

This site is of a friend of mine. Cool dude, with nice *nix skills.

http://www.imtek.nl/


RHEL 5.2 (CentOS) and the LSI / Symbios Logic SAS1068E Checking RAID Status

06/05/08 | by coolzero [mail] | Categories: Just a story...

Well, today I wanted to know whether there was a much smaller utility for monitoring the RAID status of my Dell 2950 server. Normally you can do this with Dell's OpenManage tools, but that's a big package, and I wanted a lightweight CLI tool for this.

Fortunately, somebody made that package. And best of all, it's open source! So this is the deal with Redhat Enterprise 5.2 (CentOS).

First, we install the kernel sources, that are needed to compile the package.

rpm -ivh kernel-2.6.18-92.1.1.el5.src.rpm

Then we run the prep stage of the kernel spec file, which unpacks the kernel sources and applies the patches.

rpmbuild -bp --target=$(uname -m) \
/usr/src/redhat/SPECS/kernel-2.6.spec

After waiting you must make a symbolic link like this:

ln -s /usr/src/redhat/BUILD/kernel-2.6.18/linux-2.6.18.x86_64 linux

Now we need to download the package of the CLI and unpack it.

wget http://www.drugphish.ch/~ratz/mpt-status/mpt-status-1.2.0.tar.gz
tar -zxf mpt-status-1.2.0.tar.gz
cd mpt-status-1.2.0
vi mpt-status.h

Now in VI you need to change the following line:

#include <linux/compiler.h>
to
#include "/usr/src/redhat/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/include/linux/compiler.h"

Next compile the program:

make

If there are no errors, you now have a working mpt-status in this directory. Let's check that it works!

./mpt-status
ioc0 vol_id 0 type IM, 2 phy, 231 GB, state OPTIMAL, flags ENABLED
ioc0 phy 3 scsi_id 9 ATA WDC WD2500YS-18S 6C07, 232 GB, state ONLINE, flags NONE
ioc0 phy 2 scsi_id 1 ATA WDC WD2500YS-18S 6C07, 232 GB, state ONLINE, flags NONE

./mpt-status -n -s
vol_id:0 OPTIMAL
phys_id:3 ONLINE
phys_id:2 ONLINE
spare_id:0 ONLINE
spare_id:1 ONLINE
scsi_id:3 100%
scsi_id:2 100%

It works! These two examples are simple. The first one shows what's running; the second one also shows what's running, plus hot spares and more, in a compact form. If you leave out "-s" you will see all the details.
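The compact output is easy to check from a cron job. A rough sketch, assuming the "-n -s" format shown above: raid_problems is an illustrative helper that prints every status line that is neither OPTIMAL nor ONLINE, and the failing sample below is invented for the demo.

```shell
#!/bin/bash
# raid_problems (hypothetical helper)
# Reads "mpt-status -n -s" output on stdin and prints every line whose state
# is neither OPTIMAL nor ONLINE. Percentage lines (scsi_id:N 100%) are skipped.
raid_problems() {
    awk 'NF < 2 || $2 ~ /%$/ { next }
         $2 != "OPTIMAL" && $2 != "ONLINE" { print }'
}

# Invented sample with one bad volume and one bad disk.
# On a real server use: ./mpt-status -n -s | raid_problems
sample='vol_id:0 DEGRADED
phys_id:3 ONLINE
phys_id:2 MISSING
scsi_id:3 100%'

problems=$(echo "$sample" | raid_problems)
if [ -n "$problems" ]; then
    echo "RAID needs attention:"
    echo "$problems"
fi
```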

I hope this example helps some souls! Good luck!

-CoolZero


Drac 5 / RHEL 5.2 (CentOS) Xen based and Serial Console Redirection

06/04/08 | by coolzero [mail] | Categories: Just a story...

So, it's Sunday, it's also a rainy day, and you think, hell, let's get to work on a nice Dell 2950 server with a DRAC 5 in it. I wanted to set up simple console redirection to make remote administration of the machine easier.

Now you think, hell, that's an easy job, and normally it really is! But I spent almost 2 hours getting this working. So let's see what steps I took to get the darn thing moving.

First, install Dell's OpenManage on the server, or do as I did and change the settings in the BIOS setup. With OpenManage this is very easy; just type in the following:

omconfig chassis biossetup attribute=extserial setting=rad
omconfig chassis biossetup attribute=fbr setting=57600
omconfig chassis biossetup attribute=serialcom setting=com2
omconfig chassis biossetup attribute=crab setting=enabled

In your BIOS setup, go to the console redirection options and use these values:

BIOS:
on with console redirection via com2
remote access device
57600
vt100/vt220
enabled

Alright, second, get in your Drac 5 and set it up so it can do ssh. When done, connect with ssh, and type in the following:

racadm config -g cfgSerial -o cfgSerialBaudRate 57600
racadm config -g cfgSerial -o cfgSerialConsoleEnable 1
racadm config -g cfgSerial -o cfgSerialHistorySize 2000

These commands simply set the speed and history size for the console redirection. That was the easy part. Now the part where I got stuck: I saw everything booting in the redirected serial console, but once the operating system started, nothing was shown anymore. The problem was that Xen needs its own options on the boot line to get the console going. To get everything working we need to edit the following files:

/boot/grub/grub.conf
/etc/inittab

In grub.conf it looks like this:

serial --unit=0 --speed=57600
terminal --timeout=10 console serial
default=0
timeout=5
title Red Hat Enterprise Linux Server (2.6.18-92.el5xen)
root (hd0,0)
kernel /xen.gz-2.6.18-92.el5 console=tty0 console=com2 com2=57600,8n1
module /vmlinuz-2.6.18-92.el5xen ro root=LABEL=/1 xencons=ttyS1 console=ttyS1
module /initrd-2.6.18-92.el5xen.img

And in inittab the following line needs to be added:

# Run gettys in standard runlevels
s1:2345:respawn:/sbin/agetty ttyS1 57600
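The same speed (57600 here) has to appear in the BIOS, the DRAC, grub.conf and inittab, and a mismatch shows up as garbage or an empty console. A small sketch that compares the two files; both helper names are made up, and the demo uses copies of the relevant lines in a temp directory instead of the real files.

```shell
#!/bin/bash
# grub_serial_speed FILE (hypothetical helper)
# Prints the --speed value of the "serial" line in a grub.conf.
grub_serial_speed() {
    sed -n 's/^serial .*--speed=\([0-9]*\).*/\1/p' "$1"
}

# inittab_serial_speed FILE TTY (hypothetical helper)
# Prints the word following TTY on the uncommented agetty line, i.e. the baud rate.
inittab_serial_speed() {
    awk -v tty="$2" '!/^#/ && /agetty/ {
        for (i = 1; i < NF; i++) if ($i == tty) print $(i + 1)
    }' "$1"
}

# Demo with the lines from this post; use /boot/grub/grub.conf and /etc/inittab on a real box.
tmp=$(mktemp -d)
printf 'serial --unit=0 --speed=57600\n' > "$tmp/grub.conf"
printf 's1:2345:respawn:/sbin/agetty ttyS1 57600\n' > "$tmp/inittab"

g=$(grub_serial_speed "$tmp/grub.conf")
i=$(inittab_serial_speed "$tmp/inittab" ttyS1)
if [ "$g" = "$i" ]; then
    echo "speeds match: $g"
else
    echo "mismatch: grub=$g inittab=$i"
fi
```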

Now we have everything in place for serial console redirection, so let's test it. SSH into your DRAC 5 and type in the following:

connect com2

To disconnect use "[CTRL]+[\]" (Press the Control key and the backslash key together to disconnect cleanly from the connection.)

If it says the port is in use by another user that probably means the connection was not cleanly terminated. Best way to clear that up is to reset the drac card with the following command:

racadm racreset

You should see something on your screen. To be sure it all works, reboot your server; you should see the whole startup on your screen. If not, something is already wrong in your BIOS setup. If you see the system starting but not the boot process of the OS, check the settings in grub.conf. If you see the boot process but no login prompt, check inittab.

I hope this helps people who are experiencing the same problems I had. For comments, you can mail me at jim (the monkey tale) coolzero dot info.

-CoolZero
