Simplest Enterprise Continuous Integration Solutions

Monday, May 30, 2011

Enterprise Linux - SGE automated installation

Requirement:
Set up a Grid Engine system as a single cluster with one qmaster host (e.g. linux64-server) and multiple execution hosts (e.g. linux64-client1, linux64-client2).
Note: $SGE_CELL is not set, so the cell name "default" is assumed.

Prerequisites:
1. Use a network installation directory that is accessible from all hosts (the Grid Engine qmaster host and the execution hosts) and to which the root user has read/write access.
e.g.
An NFS file system with a mount point (e.g. /opt/SGE) to which all hosts have direct access. The $SGE_ROOT files can be placed on this network-wide file system.
2. Add a dedicated Grid Engine admin user account with the same user ID on all hosts. This user needs read/write access to the network installation directory from every host.
e.g.
through NIS, useradd -u 5039 sgeadmin
3. Use local spooling for the Grid Engine master host and the execution hosts
e.g.
DB_SPOOLING_DIR           /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb
QMASTER_SPOOL_DIR         /opt/SGE_LOCAL_SPOOL/default/spool/qmaster
EXECD_SPOOL_DIR           /opt/SGE_LOCAL_SPOOL/default/spool
EXECD_SPOOL_DIR_LOCAL     /opt/SGE_LOCAL_SPOOL/default/spool
Note: make sure /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb and /opt/SGE/default do not exist before the automated installation.
4. Configure the Grid Engine TCP/IP communication services, sge_qmaster and sge_execd
e.g.
sge_qmaster     6444/tcp
sge_execd       6445/tcp
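
The prerequisites above can be checked before running the installer. The following is only a minimal sketch: the function names has_service and paths_absent are hypothetical helpers (not part of SGE), and it assumes the port entries live in /etc/services, which the post does not state explicitly.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; these names are illustrative, not SGE tools.

# Succeeds if the services file maps the given name to the given TCP port.
has_service() {
    # $1 = service name, $2 = port, $3 = services file (assumed: /etc/services)
    awk -v n="$1" -v p="$2/tcp" \
        '$1 == n && $2 == p { found = 1 } END { exit !found }' \
        "${3:-/etc/services}"
}

# Succeeds only if none of the given paths exist yet.
paths_absent() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "already exists: $p" >&2
            return 1
        fi
    done
    return 0
}
```

On the qmaster host this could be used as, for example, has_service sge_qmaster 6444 && has_service sge_execd 6445 && paths_absent /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb /opt/SGE/default, and the installation started only if the whole chain succeeds.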

Deployment:
Deploy the SGE installation files under $SGE_ROOT, with the dedicated Grid Engine admin user as owner of this directory
e.g.
cd /opt
gzip -dc /tmp/SGE62base.tar.gz | tar xvpf -
gzip -dc /tmp/SGEcell.tar.gz | tar xvpf -
chown -R sgeadmin /opt/SGE
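
After the chown, it may be worth confirming that the directory really belongs to the admin user. A small sketch, assuming a POSIX ls and awk; owner_uid is a hypothetical helper name, not an SGE command.

```shell
#!/bin/sh
# Hypothetical helper: prints the numeric UID that owns the given path.
owner_uid() {
    # -n makes ls print numeric IDs, so the result can be compared with id -u
    ls -ldn "$1" | awk '{ print $3 }'
}
```

For the setup in this post, a check such as [ "$(owner_uid /opt/SGE)" = "$(id -u sgeadmin)" ] should succeed on every host if the account has the same ID everywhere.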

Configuration:
Copy the SGE install config template from $SGE_ROOT/util/install_modules/inst_template.conf and complete the configuration
e.g.
cp $SGE_ROOT/util/install_modules/inst_template.conf $SGE_ROOT/util/install_modules/sge_inst.conf

#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------

# Use always fully qualified pathnames, please

# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
#SGE_ROOT="Please enter path"
SGE_ROOT="/opt/SGE"

# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_QMASTER_PORT="Please enter port"
SGE_QMASTER_PORT="6444"

# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_EXECD_PORT="Please enter port"
SGE_EXECD_PORT="6445"

# SGE_ENABLE_SMF
# if set to false SMF will not control SGE services
SGE_ENABLE_SMF="false"

# SGE_ENABLE_ST
# if set to false Sun Service Tags will not be used
SGE_ENABLE_ST="true"

# SGE_CLUSTER_NAME
# Name of this cluster (used by SMF as an service instance name)
#SGE_CLUSTER_NAME="Please enter cluster name"
SGE_CLUSTER_NAME="my_test"

# SGE_JMX_PORT is used by qmasters JMX MBean server
# mandatory if install_qmaster -jmx -auto
# range: 1024-65500
SGE_JMX_PORT="Please enter port"

# SGE_JMX_SSL is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true, the mbean server connection uses
# SSL authentication
SGE_JMX_SSL="false"

# SGE_JMX_SSL_CLIENT is used by qmasters JMX MBean server
# if SGE_JMX_SSL_CLIENT=true, the mbean server connection uses
# SSL authentication of the client in addition
SGE_JMX_SSL_CLIENT="false"

# SGE_JMX_SSL_KEYSTORE is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true the server keystore found here is used
# e.g. /var/sgeCA/port//private/keystore
SGE_JMX_SSL_KEYSTORE="Please enter absolute path of server keystore file"

# SGE_JMX_SSL_KEYSTORE_PW is used by qmasters JMX MBean server
# password for the SGE_JMX_SSL_KEYSTORE file
SGE_JMX_SSL_KEYSTORE_PW="Please enter the server keystore password"

# SGE_JVM_LIB_PATH is used by qmasters jvm thread
# path to libjvm.so
# if value is missing or set to "none" JMX thread will not be installed
# when the value is empty or path does not exit on the system, Grid Engine
# will try to find a correct value, if it cannot do so, value is set to
# "jvmlib_missing" and JMX thread will be configured but will fail to start
#SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"

# SGE_ADDITIONAL_JVM_ARGS is used by qmasters jvm thread
# jvm specific arguments as -verbose:jni etc.
# optional, can be empty
SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"

# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"

# ADMIN_USER, if you want to use a different admin user than the owner,
# of SGE_ROOT, you have to enter the user name, here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user
#ADMIN_USER=""
ADMIN_USER="sgeadmin"

# The dir, where qmaster spools this parts, which are not spooled by DB
#(mandatory for qmaster installation)
#QMASTER_SPOOL_DIR="Please, enter spooldir"
QMASTER_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool/qmaster"

# The dir, where the execd spools (active jobs)
# This entry is needed, even if your are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory 
#(mandatory for qmaster installation)
#EXECD_SPOOL_DIR="Please, enter spooldir"
EXECD_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool"

# For monitoring and accounting of jobs, every job will get
# unique GID. So you have to enter a free GID Range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
#GID_RANGE="Please, enter GID range"
GID_RANGE="20000-20100"

# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"

# Name of the Server, where the Spooling DB is running on
# if spooling methode is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"

# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
#DB_SPOOLING_DIR="spooldb"
DB_SPOOLING_DIR="/opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb"

# This parameter set the number of parallel installation processes.
# The prevent a system overload, or exeeding the number of open file
# descriptors the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"

# A List of Host which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#ADMIN_HOST_LIST="host1 host2 host3 host4"
ADMIN_HOST_LIST="linux64-server linux64-client1 linux64-client2"

# A List of Host which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#SUBMIT_HOST_LIST="host1 host2 host3 host4"
SUBMIT_HOST_LIST="linux64-server linux64-client1 linux64-client2"

# A List of Host which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
#EXEC_HOST_LIST="host1 host2 host3 host4"
EXEC_HOST_LIST="linux64-client1 linux64-client2"

# The dir, where the execd spools (local configuration)
# If you want configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
#EXECD_SPOOL_DIR_LOCAL="Please, enter spooldir"
EXECD_SPOOL_DIR_LOCAL="/opt/SGE_LOCAL_SPOOL/default/spool"

# If true, the domainnames will be ignored, during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"

# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported, if your hosts and rshd/sshd is configured,
# not to ask for a password, or promting any message.
SHELL_NAME="ssh"

# This remote copy command is used for csp installation.
# The script needs the remote copy command for distributing
# the csp certificates. Using ssl the command scp has to be entered,
# using  the not so secure rsh the command rcp has to be entered.
# Both need a passwordless ssh/rsh connection to the hosts, which
# should be connected to. (mandatory for csp installation mode)
COPY_COMMAND="scp"

# Enter your default domain, if you are using /etc/hosts or NIS configuration
#DEFAULT_DOMAIN="none"

# If a job stops, fails, finish, you can send a mail to this adress
ADMIN_MAIL="none"

# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
#ADD_TO_RC="false"
ADD_TO_RC="true"

#If this is "true" the file permissions of executables will be set to 755
#and of ordenary file to 644. 
SET_FILE_PERMS="true"

# This option is not implemented, yet.
# When a exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"

# Enter a one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"

# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
#SHADOW_HOST="hostname"

# Remove this execution hosts in automatic mode
# (mandatory for unistallation of execution hosts)
#EXEC_HOST_LIST_RM="host1 host2 host3 host4"
EXEC_HOST_LIST_RM="linux64-client1 linux64-client2"

# This option is used for startup script removing.
# If true, all rc startup scripts will be removed during
# automatic deinstallation. If false, the scripts won't
# be touched.
# (mandatory for unistallation of execution/qmaster hosts)
#REMOVE_RC="false"
REMOVE_RC="true"

# This is a Windows specific part of the auto isntallation template
# If you going to install windows executions hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"

# Enabling the WINDOWS_SUPPORT, recommends the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_ADMIN_NAME="Administrator"

# This parameter is used to switch between local ADMINUSER and Windows
# Domain Adminuser. Setting the WIN_DOMAIN_ACCESS variable to true, the
# Adminuser will be a Windows Domain User. It is recommended that
# a Windows Domain Server is configured and the Windows Domain User is
# created. Setting this variable to false, the local Adminuser will be
# used as ADMINUSER. The install script tries to create this user account
# but we recommend, because it will be saver, to create this user,
# before running the installation.
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_DOMAIN_ACCESS="false"

# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installtion, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
#CSP_RECREATE="true"
CSP_RECREATE="false"

# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"

# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"

# your state (mandatory for csp install)
CSP_STATE="Germany"

# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"

# your arganisation (mandatory for csp install)
CSP_ORGA="Organisation"

# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"

# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"
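
Before running the installer, a quick scan for settings that still carry the template's "Please enter ..." placeholder can catch forgotten values. This is only a sketch: unfilled_settings is a hypothetical helper name, and it assumes placeholders always start with the word "Please", as they do in the template shown above.

```shell
#!/bin/sh
# Hypothetical helper: lists active (uncommented) settings whose value is
# still a template placeholder such as "Please enter port".
unfilled_settings() {
    # $1 = path to the auto-install config file
    grep -E '^[A-Z_]+="Please' "$1" | cut -d= -f1
}
```

Run against the configuration in this post, it would report the JMX settings (for example SGE_JMX_PORT), which are left as placeholders here because the installer is not started with -jmx.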

Installation:
1. Install the qmaster host locally
e.g.
Log on to the future qmaster host
cd $SGE_ROOT
./inst_sge -m -auto /opt/SGE/util/install_modules/sge_inst.conf

2. Verify the qmaster installation log
cat /opt/SGE/default/common/install_logs/qmaster_install_mt-linux64-server_2010-05-25_06:17:35.log
Starting qmaster installation!



Installing Grid Engine as user >sgeadmin<



Your $SGE_ROOT directory: /opt/SGE

Using SGE_QMASTER_PORT >6444<.

Using SGE_EXECD_PORT >6445<.

Using >default< as CELL_NAME.



Your $SGE_CLUSTER_NAME: my_test


Using >/opt/SGE_LOCAL_SPOOL/default/spool/qmaster< as QMASTER_SPOOL_DIR.

Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >dtrace<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >start_gui_installer<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<

Your file permissions were set

Using >true< as IGNORE_FQDN_DEFAULT.
If it's >true<, the domain name will be ignored.

Making directories


Setting spooling method to dynamic

Dumping bootstrapping information
Initializing spooling database

Using >20000-20100< as gid range.
Using >/opt/SGE_LOCAL_SPOOL/default/spool< as EXECD_SPOOL_DIR.
Using >none< as ADMIN_MAIL.
Adding default parallel environments (PE)

cp /opt/SGE/default/common/sgemaster /etc/init.d/sgemaster.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.my_test

   starting sge_qmaster

Adding ADMIN_HOST linux64-server
adminhost "linux64-server" already exists
Adding ADMIN_HOST linux64-client1
linux64-client1 added to administrative host list
Adding ADMIN_HOST linux64-client2
linux64-client2 added to administrative host list


Creating the default queue and hostgroup
root@linux64-server added "@allhosts" to host group list
root@linux64-server added "all.q" to cluster queue list

Setting scheduler configuration to >Normal< setting!
changed scheduler configuration
sge_qmaster successfully installed!
3. Configure the qmaster host
# Modify the host group
e.g. if required, add the qmaster host to @allhosts
4. Install the execution host locally
e.g.
Log on to the future execution host
cd $SGE_ROOT
./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf
5. Verify the execution host installation log
e.g.
cat /opt/SGE/default/common/install_logs/execd_install_linux64-client1_2010-05-26_15:48:46.log
Your $SGE_ROOT directory: /opt/SGE


Using cell: >default<


Using local execd spool directory [/opt/SGE_LOCAL_SPOOL/default/spool]



Creating local configuration for host >linux64-client1<

sgeadmin@linux64-client added "linux64-client1" to configuration list

Local configuration for host >linux64-client1< created.


Host >linux64-client1< already in submit host list!
Host >linux64-client2< already in submit host list!

cp /opt/SGE/default/common/sgeexecd /etc/init.d/sgeexecd.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgeexecd.my_test

   starting sge_execd
6. (Optimized) Install multiple execution hosts from the qmaster host
Note: The root user must be able to access all hosts through ssh without supplying a password
e.g.
root uses SSH public key authentication on all hosts
Log on to the qmaster host after the SGE qmaster installation has completed
cat /tmp/execution_hosts_list
linux64-client1
linux64-client2


for host in $(cat /tmp/execution_hosts_list); do ssh root@"$host" "cd /opt/SGE && ./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf"; done
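
Two of the steps above lend themselves to small helpers: checking the qmaster log (step 2) and adding a host to @allhosts (step 3). The following is a sketch, not the author's procedure. The function names are hypothetical, the success string is taken from the qmaster log shown above, and the qconf call assumes the SGE environment is loaded and the caller is an SGE manager.

```shell
#!/bin/sh
# Illustrative helpers; the function names are hypothetical, not part of SGE.

# Succeeds if the newest qmaster install log in the given directory
# contains the success line that appears in the log shown above.
qmaster_install_ok() {
    # $1 = install log directory, e.g. /opt/SGE/default/common/install_logs
    latest=$(ls -t "$1"/qmaster_install_*.log 2>/dev/null | head -n 1)
    [ -n "$latest" ] && grep -q 'sge_qmaster successfully installed!' "$latest"
}

# Adds one host to the @allhosts host group.
add_to_allhosts() {
    # $1 = host name, e.g. linux64-server
    qconf -aattr hostgroup hostlist "$1" @allhosts
}
```

For example, qmaster_install_ok /opt/SGE/default/common/install_logs && add_to_allhosts linux64-server would add the qmaster host only after a successful install. The result can then be inspected with qconf -shgrp @allhosts.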


4 comments:

  1. Hi! Really helped! But should we copy config file to each exec node first before installing SGE ??
