Simplest Enterprise Continuous Integration Solutions

Monday, May 30, 2011

Enterprise Linux - SGE automated installation

Requirement:
Set up a Grid Engine system as a single cluster with one qmaster host (e.g. linux64-server) and multiple execution hosts (e.g. linux64-client1, linux64-client2).
Note: $SGE_CELL is not set, so the cell name "default" is assumed.

Prerequisites:
1. Use a network installation directory to which the root user has read/write access and which is accessible from all hosts (the Grid Engine qmaster host and the execution hosts).
i.e.
An NFS file system with a mount point (e.g. /opt/SGE) to which all hosts have direct access. The $SGE_ROOT files can be placed on this network-wide file system.
2. Add a dedicated Grid Engine admin user account with the same UID on all hosts; this user needs read/write access to the network installation directory from every host.
i.e.
through NIS: useradd -u 5039 sgeadmin
3. Use local spooling on the Grid Engine master host and the execution hosts
i.e. 
DB_SPOOLING_DIR           /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb
QMASTER_SPOOL_DIR         /opt/SGE_LOCAL_SPOOL/default/spool/qmaster
EXECD_SPOOL_DIR           /opt/SGE_LOCAL_SPOOL/default/spool
EXECD_SPOOL_DIR_LOCAL     /opt/SGE_LOCAL_SPOOL/default/spool
Note: make sure /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb and /opt/SGE/default do not exist before the automated installation.
4. Configure the Grid Engine TCP/IP communication services sge_qmaster and sge_execd (e.g. in /etc/services on all hosts)
i.e.
sge_qmaster     6444/tcp
sge_execd       6445/tcp
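The prerequisites above can be checked on each host before the installer is run. The following is a minimal sketch in plain sh, assuming the ports, spool paths and sgeadmin UID used in this post; the function names (ensure_service, must_not_exist, check_admin_uid) are hypothetical, and SERVICES_FILE is an assumed override that defaults to /etc/services.

```shell
# Hypothetical pre-flight checks for the prerequisites listed above.
# SERVICES_FILE can be overridden; it defaults to /etc/services.
SERVICES_FILE="${SERVICES_FILE:-/etc/services}"

# Append "name port/tcp" unless an entry for this service name already exists.
ensure_service() {
    name="$1"; port="$2"
    if grep -Eq "^${name}[[:space:]]" "$SERVICES_FILE"; then
        echo "ok: $name already defined in $SERVICES_FILE"
    else
        printf '%-15s %s/tcp\n' "$name" "$port" >> "$SERVICES_FILE"
        echo "added: $name $port/tcp"
    fi
}

# Fail if a directory that the automated installation must create already exists.
must_not_exist() {
    if [ -e "$1" ]; then
        echo "error: $1 exists; remove it before running inst_sge -auto" >&2
        return 1
    fi
}

# Fail unless the admin user exists with the expected UID.
check_admin_uid() {
    uid=$(id -u "$1" 2>/dev/null) || { echo "error: user $1 not found" >&2; return 1; }
    [ "$uid" = "$2" ] || { echo "error: $1 has UID $uid, expected $2" >&2; return 1; }
}

# Run the checks only when invoked as: sh preflight.sh --run
if [ "${1:-}" = "--run" ]; then
    ensure_service sge_qmaster 6444
    ensure_service sge_execd 6445
    must_not_exist /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb
    must_not_exist /opt/SGE/default
    check_admin_uid sgeadmin 5039
fi
```

Run it as root on each host with the --run argument (the script name preflight.sh is arbitrary). ensure_service only appends a line when no entry for that service name exists yet, so running it twice does not create duplicates.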

Deployment:
Deploy the SGE installation files under $SGE_ROOT, with the dedicated Grid Engine admin user as owner of this directory
i.e.
cd /opt
gzip -dc /tmp/SGE62base.tar.gz | tar xvpf -
gzip -dc /tmp/SGEcell.tar.gz | tar xvpf -
chown -R sgeadmin /opt/SGE
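The deployment commands can be wrapped in a function that stops on the first archive that cannot be read or unpacked. A minimal sketch, assuming it runs as root, that the tarballs are given as absolute paths, and that they unpack into <dest>/SGE as the chown above implies; deploy_sge is a hypothetical name.

```shell
# Hypothetical wrapper for the deployment commands above.
deploy_sge() {
    dest="$1"       # where the tarballs are unpacked, e.g. /opt
    sge_root="$2"   # resulting $SGE_ROOT, e.g. /opt/SGE
    owner="$3"      # Grid Engine admin user, e.g. sgeadmin
    shift 3         # the remaining arguments are the tarballs
    (
        cd "$dest" || exit 1
        for tarball in "$@"; do
            [ -r "$tarball" ] || { echo "error: cannot read $tarball" >&2; exit 1; }
            # same as "gzip -dc file | tar xvpf -" without the verbose listing;
            # abort as soon as tar reports an error
            gzip -dc "$tarball" | tar xpf - || exit 1
        done
    ) || return 1
    # give the admin user ownership of the whole tree
    chown -R "$owner" "$sge_root"
}
```

Usage for this post's layout: deploy_sge /opt /opt/SGE sgeadmin /tmp/SGE62base.tar.gz /tmp/SGEcell.tar.gz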

Configuration:
Copy the SGE installation template $SGE_ROOT/util/install_modules/inst_template.conf and fill in the configuration values
i.e.
cp $SGE_ROOT/util/install_modules/inst_template.conf $SGE_ROOT/util/install_modules/sge_inst.conf
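Instead of editing sge_inst.conf by hand, the values can be set from a script. A minimal sketch, assuming GNU sed (for -i) and values that contain no |, & or backslash characters; set_conf is a hypothetical helper, not part of SGE.

```shell
# Hypothetical helper: set KEY="VALUE" in the copied install config file.
set_conf() {
    key="$1"; value="$2"; file="$3"
    if grep -q "^${key}=" "$file"; then
        # Replace the active assignment; commented "#KEY=" lines do not match.
        # Note: "|", "&" and "\" in the value would need escaping for sed.
        sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
    else
        # Key not present yet: append it
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}
```

For example, with CONF pointing at the copied sge_inst.conf: set_conf SGE_ROOT /opt/SGE "$CONF", then set_conf EXEC_HOST_LIST "linux64-client1 linux64-client2" "$CONF". Template lines such as #SGE_ROOT="Please enter path" stay untouched because the pattern only matches lines that start with the key.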

#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------

# Use always fully qualified pathnames, please

# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
#SGE_ROOT="Please enter path"
SGE_ROOT="/opt/SGE"

# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not enter it like this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_QMASTER_PORT="Please enter port"
SGE_QMASTER_PORT="6444"

# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not enter it like this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_EXECD_PORT="Please enter port"
SGE_EXECD_PORT="6445"

# SGE_ENABLE_SMF
# if set to false SMF will not control SGE services
SGE_ENABLE_SMF="false"

# SGE_ENABLE_ST
# if set to false Sun Service Tags will not be used
SGE_ENABLE_ST="true"

# SGE_CLUSTER_NAME
# Name of this cluster (used by SMF as a service instance name)
#SGE_CLUSTER_NAME="Please enter cluster name"
SGE_CLUSTER_NAME="my_test"

# SGE_JMX_PORT is used by qmasters JMX MBean server
# mandatory if install_qmaster -jmx -auto
# range: 1024-65500
SGE_JMX_PORT="Please enter port"

# SGE_JMX_SSL is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true, the mbean server connection uses
# SSL authentication
SGE_JMX_SSL="false"

# SGE_JMX_SSL_CLIENT is used by qmasters JMX MBean server
# if SGE_JMX_SSL_CLIENT=true, the mbean server connection uses
# SSL authentication of the client in addition
SGE_JMX_SSL_CLIENT="false"

# SGE_JMX_SSL_KEYSTORE is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true the server keystore found here is used
# e.g. /var/sgeCA/port//private/keystore
SGE_JMX_SSL_KEYSTORE="Please enter absolute path of server keystore file"

# SGE_JMX_SSL_KEYSTORE_PW is used by qmasters JMX MBean server
# password for the SGE_JMX_SSL_KEYSTORE file
SGE_JMX_SSL_KEYSTORE_PW="Please enter the server keystore password"

# SGE_JVM_LIB_PATH is used by qmasters jvm thread
# path to libjvm.so
# if value is missing or set to "none" JMX thread will not be installed
# when the value is empty or path does not exist on the system, Grid Engine
# will try to find a correct value, if it cannot do so, value is set to
# "jvmlib_missing" and JMX thread will be configured but will fail to start
#SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"

# SGE_ADDITIONAL_JVM_ARGS is used by qmasters jvm thread
# jvm specific arguments as -verbose:jni etc.
# optional, can be empty
SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"

# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"

# ADMIN_USER, if you want to use a different admin user than the owner,
# of SGE_ROOT, you have to enter the user name, here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user
#ADMIN_USER=""
ADMIN_USER="sgeadmin"

# The dir where qmaster spools the parts that are not spooled by DB
#(mandatory for qmaster installation)
#QMASTER_SPOOL_DIR="Please, enter spooldir"
QMASTER_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool/qmaster"

# The dir, where the execd spools (active jobs)
# This entry is needed, even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory 
#(mandatory for qmaster installation)
#EXECD_SPOOL_DIR="Please, enter spooldir"
EXECD_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool"

# For monitoring and accounting of jobs, every job will get a
# unique GID. So you have to enter a free GID range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
#GID_RANGE="Please, enter GID range"
GID_RANGE="20000-20100"

# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"

# Name of the Server, where the Spooling DB is running on
# if spooling method is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"

# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
#DB_SPOOLING_DIR="spooldb"
DB_SPOOLING_DIR="/opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb"

# This parameter sets the number of parallel installation processes.
# To prevent a system overload, or exceeding the number of open file
# descriptors, the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"

# A list of hosts which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#ADMIN_HOST_LIST="host1 host2 host3 host4"
ADMIN_HOST_LIST="linux64-server linux64-client1 linux64-client2"

# A list of hosts which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#SUBMIT_HOST_LIST="host1 host2 host3 host4"
SUBMIT_HOST_LIST="linux64-server linux64-client1 linux64-client2"

# A list of hosts which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
#EXEC_HOST_LIST="host1 host2 host3 host4"
EXEC_HOST_LIST="linux64-client1 linux64-client2"

# The dir, where the execd spools (local configuration)
# If you want to configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
#EXECD_SPOOL_DIR_LOCAL="Please, enter spooldir"
EXECD_SPOOL_DIR_LOCAL="/opt/SGE_LOCAL_SPOOL/default/spool"

# If true, the domainnames will be ignored, during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"

# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported if your hosts and rshd/sshd are configured
# not to ask for a password or prompt any message.
SHELL_NAME="ssh"

# This remote copy command is used for csp installation.
# The script needs the remote copy command for distributing
# the csp certificates. Using ssh, the command scp has to be entered;
# using the not so secure rsh, the command rcp has to be entered.
# Both need a passwordless ssh/rsh connection to the hosts, which
# should be connected to. (mandatory for csp installation mode)
COPY_COMMAND="scp"

# Enter your default domain, if you are using /etc/hosts or NIS configuration
#DEFAULT_DOMAIN="none"

# If a job stops, fails, or finishes, you can send a mail to this address
ADMIN_MAIL="none"

# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
#ADD_TO_RC="false"
ADD_TO_RC="true"

#If this is "true" the file permissions of executables will be set to 755
#and of ordinary files to 644.
SET_FILE_PERMS="true"

# This option is not implemented, yet.
# When an exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"

# Enter one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"

# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
#SHADOW_HOST="hostname"

# Remove these execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
#EXEC_HOST_LIST_RM="host1 host2 host3 host4"
EXEC_HOST_LIST_RM="linux64-client1 linux64-client2"

# This option is used for startup script removing.
# If true, all rc startup scripts will be removed during
# automatic deinstallation. If false, the scripts won't
# be touched.
# (mandatory for uninstallation of execution/qmaster hosts)
#REMOVE_RC="false"
REMOVE_RC="true"

# This is a Windows specific part of the auto installation template
# If you are going to install Windows execution hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"

# Enabling WINDOWS_SUPPORT requires the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_ADMIN_NAME="Administrator"

# This parameter is used to switch between local ADMINUSER and Windows
# Domain Adminuser. Setting the WIN_DOMAIN_ACCESS variable to true, the
# Adminuser will be a Windows Domain User. It is recommended that
# a Windows Domain Server is configured and the Windows Domain User is
# created. Setting this variable to false, the local Adminuser will be
# used as ADMINUSER. The install script tries to create this user account
# but we recommend, because it is safer, to create this user
# before running the installation.
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_DOMAIN_ACCESS="false"

# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installation, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
#CSP_RECREATE="true"
CSP_RECREATE="false"

# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"

# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"

# your state (mandatory for csp install)
CSP_STATE="Germany"

# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"

# your organisation (mandatory for csp install)
CSP_ORGA="Organisation"

# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"

# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"
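Because the file consists of shell-style KEY="value" assignments, it can be sourced and sanity-checked before inst_sge is run. The sketch below is an assumption-based pre-install check (check_inst_conf is a hypothetical name, and the key list reflects this post's setup rather than an official list): it flags empty or "Please enter ..." placeholder values, a malformed GID_RANGE, an existing DB_SPOOLING_DIR, and host names containing underscores, which are not valid in host names.

```shell
# Hypothetical pre-install check for sge_inst.conf.
check_inst_conf() {
    conf="$1"
    # "." searches $PATH for names without a slash, so make the path explicit
    case "$conf" in */*) ;; *) conf="./$conf" ;; esac
    (
        . "$conf" || exit 1
        rc=0
        # keys this setup depends on (most are marked mandatory in the template)
        for key in SGE_ROOT SGE_QMASTER_PORT SGE_EXECD_PORT CELL_NAME \
                   QMASTER_SPOOL_DIR EXECD_SPOOL_DIR GID_RANGE \
                   SPOOLING_METHOD DB_SPOOLING_DIR EXEC_HOST_LIST; do
            eval "value=\${$key:-}"
            case "$value" in
                ""|Please*) echo "error: $key is empty or still a placeholder" >&2; rc=1 ;;
            esac
        done
        # GID_RANGE must have the form <first>-<last>, e.g. 20000-20100
        case "$GID_RANGE" in
            *[!0-9-]*|-*|*-|*-*-*) echo "error: bad GID_RANGE '$GID_RANGE'" >&2; rc=1 ;;
            *-*) ;;
            *) echo "error: bad GID_RANGE '$GID_RANGE'" >&2; rc=1 ;;
        esac
        # the automated installation expects the DB spool directory not to exist yet
        if [ -n "$DB_SPOOLING_DIR" ] && [ -e "$DB_SPOOLING_DIR" ]; then
            echo "error: $DB_SPOOLING_DIR already exists" >&2; rc=1
        fi
        # host names may contain letters, digits and hyphens, but no underscores
        for host in $ADMIN_HOST_LIST $SUBMIT_HOST_LIST $EXEC_HOST_LIST; do
            case "$host" in *_*) echo "error: invalid host name '$host'" >&2; rc=1 ;; esac
        done
        exit $rc
    )
}
```

For example: check_inst_conf /opt/SGE/util/install_modules/sge_inst.conf && ./inst_sge -m -auto /opt/SGE/util/install_modules/sge_inst.conf. The file is sourced in a subshell, so the check does not change the calling shell's variables.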

Installation:
1. Install the qmaster host locally
i.e.
log on to the designated qmaster host (as root)
cd $SGE_ROOT
./inst_sge -m -auto /opt/SGE/util/install_modules/sge_inst.conf

2. Verify the qmaster installation log
cat /opt/SGE/default/common/install_logs/qmaster_install_mt-linux64-server_2010-05-25_06:17:35.log
Starting qmaster installation!



Installing Grid Engine as user >sgeadmin<



Your $SGE_ROOT directory: /opt/SGE

Using SGE_QMASTER_PORT >6444<.

Using SGE_EXECD_PORT >6445<.

Using >default< as CELL_NAME.



Your $SGE_CLUSTER_NAME: my_test


Using >/opt/SGE_LOCAL_SPOOL/default/spool/qmaster< as QMASTER_SPOOL_DIR.

Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >dtrace<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >start_gui_installer<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<

Your file permissions were set

Using >true< as IGNORE_FQDN_DEFAULT.
If it's >true<, the domain name will be ignored.

Making directories


Setting spooling method to dynamic

Dumping bootstrapping information
Initializing spooling database

Using >20000-20100< as gid range.
Using >/opt/SGE_LOCAL_SPOOL/default/spool< as EXECD_SPOOL_DIR.
Using >none< as ADMIN_MAIL.
Adding default parallel environments (PE)

cp /opt/SGE/default/common/sgemaster /etc/init.d/sgemaster.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.my_test

   starting sge_qmaster

Adding ADMIN_HOST linux64-server
adminhost "linux64-server" already exists
Adding ADMIN_HOST linux64-client1
linux64-client1 added to administrative host list
Adding ADMIN_HOST linux64-client2
linux64-client2 added to administrative host list


Creating the default queue and hostgroup
root@linux64-server added "@allhosts" to host group list
root@linux64-server added "all.q" to cluster queue list

Setting scheduler configuration to >Normal< setting!
changed scheduler configuration
sge_qmaster successfully installed!
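Reading the log by eye works for one host; the check can also be scripted. A sketch, assuming the log naming scheme shown in this post (<kind>_install_<host>_<timestamp>.log under /opt/SGE/default/common/install_logs) and treating lines seen in the excerpts ("sge_qmaster successfully installed!" for the qmaster, "starting sge_execd" for an execution host) as success markers; check_install_log is a hypothetical helper.

```shell
# Hypothetical helper: check the newest install log of a given kind.
check_install_log() {
    log_dir="$1"   # e.g. /opt/SGE/default/common/install_logs
    kind="$2"      # qmaster or execd
    case "$kind" in
        qmaster) marker='sge_qmaster successfully installed' ;;
        execd)   marker='starting sge_execd' ;;
        *) echo "error: unknown log kind '$kind'" >&2; return 1 ;;
    esac
    # newest log of this kind by modification time (log names contain no spaces)
    latest=$(ls -t "$log_dir"/"$kind"_install_*.log 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "error: no $kind install log in $log_dir" >&2
        return 1
    fi
    if grep -q "$marker" "$latest"; then
        echo "ok: $latest"
    else
        echo "error: '$marker' not found in $latest" >&2
        return 1
    fi
}
```

For example: check_install_log /opt/SGE/default/common/install_logs qmaster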

3. Configure the qmaster host
# modify the host group
i.e. add the qmaster host to the @allhosts host group (e.g. with qconf -mhgrp @allhosts)
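qconf -mhgrp @allhosts opens the host group in an editor. For unattended setups, qconf -aattr can add a value to a list attribute of an existing object instead. A sketch, assuming $SGE_ROOT and $SGE_CELL default to /opt/SGE and default as in this post; add_to_allhosts is a hypothetical wrapper.

```shell
# Hypothetical wrapper: add a host to the @allhosts host group non-interactively.
add_to_allhosts() {
    host="$1"
    settings="${SGE_ROOT:-/opt/SGE}/${SGE_CELL:-default}/common/settings.sh"
    # settings.sh sets SGE_ROOT/SGE_CELL and puts qconf on PATH
    if [ -r "$settings" ]; then . "$settings"; fi
    # split the host group definition into one token per line
    # (spaces, tabs and "\" line continuations become newlines)
    if qconf -shgrp @allhosts | tr -s ' \t\\' '\n' | grep -Fqx "$host"; then
        echo "$host is already in @allhosts"
    else
        # append $host to the hostlist attribute of @allhosts
        qconf -aattr hostgroup hostlist "$host" @allhosts
    fi
}
```

For example: add_to_allhosts linux64-server. Jobs are only dispatched to a host in @allhosts if sge_execd is running on it.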

4. Install the execution host locally
i.e.
log on to the designated execution host (as root)
cd $SGE_ROOT
./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf

5. Verify the execution host installation log
i.e.
cat /opt/SGE/default/common/install_logs/execd_install_linux64-client1_2010-05-26_15:48:46.log
Your $SGE_ROOT directory: /opt/SGE


Using cell: >default<


Using local execd spool directory [/opt/SGE_LOCAL_SPOOL/default/spool]



Creating local configuration for host >linux64-client1<

sgeadmin@linux64-client added "linux64-client1" to configuration list

Local configuration for host >linux64-client1< created.


Host >linux64-client1< already in submit host list!
Host >linux64-client2< already in submit host list!

cp /opt/SGE/default/common/sgeexecd /etc/init.d/sgeexecd.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgeexecd.my_test

   starting sge_execd
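Once sge_execd is running, qconf -sel lists the registered execution hosts. A sketch that compares this list with the expected hosts, assuming qconf is on PATH (e.g. after sourcing /opt/SGE/default/common/settings.sh) and that short host names are used, as with HOSTNAME_RESOLVING="true"; check_exec_hosts is a hypothetical name.

```shell
# Hypothetical post-install check: are the expected hosts registered as execution hosts?
check_exec_hosts() {
    registered=$(qconf -sel) || return 1   # one execution host per line
    missing=0
    for host in "$@"; do
        if printf '%s\n' "$registered" | grep -Fqx "$host"; then
            echo "ok: $host is an execution host"
        else
            echo "missing: $host" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example: check_exec_hosts linux64-client1 linux64-client2. qhost can then be used to confirm that the hosts report load values.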

6. (Optional) Install all execution hosts from the qmaster host
Note: the root user must be able to log on to all hosts through ssh without supplying a password
i.e.
root uses SSH public key authentication on all hosts
log on to the qmaster host after the SGE qmaster installation has completed
cat /tmp/execution_hosts_list
linux64-client1
linux64-client2


for host in $(cat /tmp/execution_hosts_list); do
    ssh root@"$host" "cd /opt/SGE && ./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf"
done
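A loop like this moves on when one host fails and gives no summary at the end. A sketch of a variant that collects the failures, assuming the same host list file and install paths; install_execd_hosts is a hypothetical name. It reads the list with while read, so ssh is given -n to keep it from consuming the rest of the list on stdin, and BatchMode=yes makes ssh fail instead of prompting for a password.

```shell
# Hypothetical variant of the loop that reports which hosts failed.
install_execd_hosts() {
    list_file="$1"
    conf="${2:-/opt/SGE/util/install_modules/sge_inst.conf}"
    failed=""
    while read -r host; do
        case "$host" in ""|\#*) continue ;; esac   # skip blank and comment lines
        echo "installing sge_execd on $host"
        # -n: do not read stdin (the host list); BatchMode: never prompt
        if ! ssh -n -o BatchMode=yes "root@$host" \
             "cd /opt/SGE && ./inst_sge -x -noremote -auto $conf"; then
            failed="$failed $host"
        fi
    done < "$list_file"
    if [ -n "$failed" ]; then
        echo "failed on:$failed" >&2
        return 1
    fi
}
```

For example: install_execd_hosts /tmp/execution_hosts_list. A non-zero exit status means at least one host needs attention.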