Set up a Grid Engine system as a single cluster with one qmaster host (e.g. linux64-server) and multiple execution hosts (e.g. linux64-client1, linux64-client2).
Note: $SGE_CELL is not set, so the cell name "default" is assumed.
Prerequisites:
1. Use a network installation directory to which the root user has read/write access and which is accessible from all hosts (Grid Engine qmaster host and execution hosts)
i.e.
An NFS file system with a mount point (e.g. /opt/SGE) to which all hosts have direct access. The $SGE_ROOT files can be placed on this network-wide file system.
2. Add a dedicated Grid Engine admin user account with the same user ID on all hosts; this user needs read/write access to the network installation directory from every host
i.e.
through NIS, or with useradd -u 5039 sgeadmin on each host
3. Use local spooling for the Grid Engine master host and execution hosts
i.e.
DB_SPOOLING_DIR /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb
QMASTER_SPOOL_DIR /opt/SGE_LOCAL_SPOOL/default/spool/qmaster
EXECD_SPOOL_DIR /opt/SGE_LOCAL_SPOOL/default/spool
EXECD_SPOOL_DIR_LOCAL /opt/SGE_LOCAL_SPOOL/default/spool
Note: make sure /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb and /opt/SGE/default do not exist before the automated installation
4. Configure the Grid Engine TCP/IP communication services sge_qmaster and sge_execd
i.e.
sge_qmaster 6444/tcp
sge_execd 6445/tcp
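The prerequisites above can be checked on each host before the installer is run. The following is only a minimal sketch for a POSIX shell: the helper names paths_absent, has_service and uid_matches are hypothetical, and it assumes the two service entries are kept in /etc/services (they may instead come from a NIS services map).

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; none of these are part of Grid Engine.

# Succeed only if none of the given paths exist yet.
paths_absent() {
    rc=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "already exists: $p" >&2
            rc=1
        fi
    done
    return $rc
}

# Succeed if a services file maps a service name to the expected
# port/protocol, e.g. "sge_qmaster 6444/tcp".
# The third argument defaults to /etc/services (an assumption).
has_service() {
    name=$1 port=$2 file=${3:-/etc/services}
    awk -v n="$name" -v p="$port" \
        '$1 == n && $2 == p { found = 1 } END { exit !found }' "$file"
}

# Succeed if the account has the expected numeric user ID on this host.
uid_matches() {
    [ "$(id -u "$1" 2>/dev/null)" = "$2" ]
}
```

With the values used in this setup, the calls would be: uid_matches sgeadmin 5039, paths_absent /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb /opt/SGE/default, has_service sge_qmaster 6444/tcp and has_service sge_execd 6445/tcp.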
Deployment:
Deploy SGE installation files under $SGE_ROOT with the dedicated Grid Engine admin user as owner of this directory
i.e.
cd /opt
gzip -dc /tmp/SGE62base.tar.gz | tar xvpf -
gzip -dc /tmp/SGEcell.tar.gz | tar xvpf -
chown -R sgeadmin /opt/SGE
Configuration:
Copy the SGE installation configuration template $SGE_ROOT/util/install_modules/inst_template.conf and complete the configuration
i.e.
cp $SGE_ROOT/util/install_modules/inst_template.conf $SGE_ROOT/util/install_modules/sge_inst.conf
#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------
# Use always fully qualified pathnames, please
# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
#SGE_ROOT="Please enter path"
SGE_ROOT="/opt/SGE"
# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_QMASTER_PORT="Please enter port"
SGE_QMASTER_PORT="6444"
# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_EXECD_PORT="Please enter port"
SGE_EXECD_PORT="6445"
# SGE_ENABLE_SMF
# if set to false SMF will not control SGE services
SGE_ENABLE_SMF="false"
# SGE_ENABLE_ST
# if set to false Sun Service Tags will not be used
SGE_ENABLE_ST="true"
# SGE_CLUSTER_NAME
# Name of this cluster (used by SMF as an service instance name)
#SGE_CLUSTER_NAME="Please enter cluster name"
SGE_CLUSTER_NAME="my_test"
# SGE_JMX_PORT is used by qmasters JMX MBean server
# mandatory if install_qmaster -jmx -auto
# range: 1024-65500
SGE_JMX_PORT="Please enter port"
# SGE_JMX_SSL is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true, the mbean server connection uses
# SSL authentication
SGE_JMX_SSL="false"
# SGE_JMX_SSL_CLIENT is used by qmasters JMX MBean server
# if SGE_JMX_SSL_CLIENT=true, the mbean server connection uses
# SSL authentication of the client in addition
SGE_JMX_SSL_CLIENT="false"
# SGE_JMX_SSL_KEYSTORE is used by qmasters JMX MBean server
# if SGE_JMX_SSL=true the server keystore found here is used
# e.g. /var/sgeCA/port//private/keystore
SGE_JMX_SSL_KEYSTORE="Please enter absolute path of server keystore file"
# SGE_JMX_SSL_KEYSTORE_PW is used by qmasters JMX MBean server
# password for the SGE_JMX_SSL_KEYSTORE file
SGE_JMX_SSL_KEYSTORE_PW="Please enter the server keystore password"
# SGE_JVM_LIB_PATH is used by qmasters jvm thread
# path to libjvm.so
# if value is missing or set to "none" JMX thread will not be installed
# when the value is empty or path does not exit on the system, Grid Engine
# will try to find a correct value, if it cannot do so, value is set to
# "jvmlib_missing" and JMX thread will be configured but will fail to start
#SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"
# SGE_ADDITIONAL_JVM_ARGS is used by qmasters jvm thread
# jvm specific arguments as -verbose:jni etc.
# optional, can be empty
SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"
# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"
# ADMIN_USER, if you want to use a different admin user than the owner,
# of SGE_ROOT, you have to enter the user name, here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user
#ADMIN_USER=""
ADMIN_USER="sgeadmin"
# The dir, where qmaster spools this parts, which are not spooled by DB
#(mandatory for qmaster installation)
#QMASTER_SPOOL_DIR="Please, enter spooldir"
QMASTER_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool/qmaster"
# The dir, where the execd spools (active jobs)
# This entry is needed, even if your are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory
#(mandatory for qmaster installation)
#EXECD_SPOOL_DIR="Please, enter spooldir"
EXECD_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool"
# For monitoring and accounting of jobs, every job will get
# unique GID. So you have to enter a free GID Range, which
# is assigned to each job running on a machine.
# If you want to run 100 Jobs at the same time on one host you
# have to enter a GID-Range like that: 16000-16100
#(mandatory for qmaster installation)
#GID_RANGE="Please, enter GID range"
GID_RANGE="20000-20100"
# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"
# Name of the Server, where the Spooling DB is running on
# if spooling methode is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"
# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
#DB_SPOOLING_DIR="spooldb"
DB_SPOOLING_DIR="/opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb"
# This parameter set the number of parallel installation processes.
# The prevent a system overload, or exeeding the number of open file
# descriptors the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"
# A List of Host which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#ADMIN_HOST_LIST="host1 host2 host3 host4"
ADMIN_HOST_LIST="linux64-server linux64-client1 linux64-client2"
# A List of Host which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#SUBMIT_HOST_LIST="host1 host2 host3 host4"
SUBMIT_HOST_LIST="linux64-server linux64-client1 linux64-client2"
# A List of Host which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
#EXEC_HOST_LIST="host1 host2 host3 host4"
EXEC_HOST_LIST="linux64-client1 linux64-client2"
# The dir, where the execd spools (local configuration)
# If you want configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
#EXECD_SPOOL_DIR_LOCAL="Please, enter spooldir"
EXECD_SPOOL_DIR_LOCAL="/opt/SGE_LOCAL_SPOOL/default/spool"
# If true, the domainnames will be ignored, during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"
# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported, if your hosts and rshd/sshd is configured,
# not to ask for a password, or promting any message.
SHELL_NAME="ssh"
# This remote copy command is used for csp installation.
# The script needs the remote copy command for distributing
# the csp certificates. Using ssl the command scp has to be entered,
# using the not so secure rsh the command rcp has to be entered.
# Both need a passwordless ssh/rsh connection to the hosts, which
# should be connected to. (mandatory for csp installation mode)
COPY_COMMAND="scp"
# Enter your default domain, if you are using /etc/hosts or NIS configuration
#DEFAULT_DOMAIN="none"
# If a job stops, fails, finish, you can send a mail to this adress
ADMIN_MAIL="none"
# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
#ADD_TO_RC="false"
ADD_TO_RC="true"
#If this is "true" the file permissions of executables will be set to 755
#and of ordenary file to 644.
SET_FILE_PERMS="true"
# This option is not implemented, yet.
# When a exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"
# Enter a one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"
# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
#SHADOW_HOST="hostname"
# Remove this execution hosts in automatic mode
# (mandatory for unistallation of execution hosts)
#EXEC_HOST_LIST_RM="host1 host2 host3 host4"
EXEC_HOST_LIST_RM="linux64-client1 linux64-client2"
# This option is used for startup script removing.
# If true, all rc startup scripts will be removed during
# automatic deinstallation. If false, the scripts won't
# be touched.
# (mandatory for unistallation of execution/qmaster hosts)
#REMOVE_RC="false"
REMOVE_RC="true"
# This is a Windows specific part of the auto isntallation template
# If you going to install windows executions hosts, you have to enable the
# windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"
# Enabling the WINDOWS_SUPPORT, recommends the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_ADMIN_NAME="Administrator"
# This parameter is used to switch between local ADMINUSER and Windows
# Domain Adminuser. Setting the WIN_DOMAIN_ACCESS variable to true, the
# Adminuser will be a Windows Domain User. It is recommended that
# a Windows Domain Server is configured and the Windows Domain User is
# created. Setting this variable to false, the local Adminuser will be
# used as ADMINUSER. The install script tries to create this user account
# but we recommend, because it will be saver, to create this user,
# before running the installation.
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_DOMAIN_ACCESS="false"
# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installtion, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
#CSP_RECREATE="true"
CSP_RECREATE="false"
# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"
# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"
# your state (mandatory for csp install)
CSP_STATE="Germany"
# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"
# your arganisation (mandatory for csp install)
CSP_ORGA="Organisation"
# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"
# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"
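Before starting the automated installation, it can help to confirm that the mandatory entries no longer hold the template's "Please ..." placeholders. This is a minimal sketch, assuming the configuration file consists of plain shell variable assignments as in the listing above; the helper name check_conf is hypothetical and is not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper: report variables that are empty or still hold a
# "Please ..." placeholder from the template.
# The body runs in a subshell, so the sourced variables do not leak
# into the calling shell.
check_conf() (
    conf=$1; shift
    . "$conf" || exit 2
    rc=0
    for var in "$@"; do
        eval "val=\${$var:-}"
        case $val in
            ''|Please*)
                echo "unset or placeholder: $var" >&2
                rc=1
                ;;
        esac
    done
    exit $rc
)
```

For this setup a call could look like: check_conf /opt/SGE/util/install_modules/sge_inst.conf SGE_ROOT SGE_QMASTER_PORT SGE_EXECD_PORT CELL_NAME QMASTER_SPOOL_DIR EXECD_SPOOL_DIR GID_RANGE. The path should be given with a directory component, since the "." builtin searches $PATH for bare file names.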
Installation:
1. install qmaster host locally
i.e.
log on to the future qmaster host
cd $SGE_ROOT
./inst_sge -m -auto /opt/SGE/util/install_modules/sge_inst.conf
2. verify qmaster installation log
cat /opt/SGE/default/common/install_logs/qmaster_install_mt-linux64-server_2010-05-25_06:17:35.log
Starting qmaster installation!
Installing Grid Engine as user >sgeadmin<
Your $SGE_ROOT directory: /opt/SGE
Using SGE_QMASTER_PORT >6444<.
Using SGE_EXECD_PORT >6445<.
Using >default< as CELL_NAME.
Your $SGE_CLUSTER_NAME: my_test
Using >/opt/SGE_LOCAL_SPOOL/default/spool/qmaster< as QMASTER_SPOOL_DIR.
Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >dtrace<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >start_gui_installer<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<
Your file permissions were set
Using >true< as IGNORE_FQDN_DEFAULT.
If it's >true<, the domain name will be ignored.
Making directories
Setting spooling method to dynamic
Dumping bootstrapping information
Initializing spooling database
Using >20000-20100< as gid range.
Using >/opt/SGE_LOCAL_SPOOL/default/spool< as EXECD_SPOOL_DIR.
Using >none< as ADMIN_MAIL.
Adding default parallel environments (PE)
cp /opt/SGE/default/common/sgemaster /etc/init.d/sgemaster.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.my_test
starting sge_qmaster
Adding ADMIN_HOST linux64-server
adminhost "linux64-server" already exists
Adding ADMIN_HOST linux64-client1
linux64-client1 added to administrative host list
Adding ADMIN_HOST linux64-client2
linux64-client2 added to administrative host list
Creating the default
root@linux64-server added "@allhosts" to host group list
root@linux64-server added "all.q" to cluster queue list
Setting scheduler configuration to >Normal< setting!
changed scheduler configuration
sge_qmaster successfully installed!
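Rather than reading the whole log, the final success message can be tested for directly. A minimal sketch, assuming the log ends with the "sge_qmaster successfully installed!" line shown above; the helper name qmaster_install_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given install log contains the
# success message that the qmaster installation prints at the end.
qmaster_install_ok() {
    grep -q 'sge_qmaster successfully installed' "$1"
}
```

The newest qmaster log in this setup could be selected with ls -t /opt/SGE/default/common/install_logs/qmaster_install_*.log | head -n 1 and passed to the helper.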
3. configure qmaster host
# modify the host group
i.e. if needed, add the qmaster host to @allhosts
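One way to make this host group change from the command line is qconf -aattr, which appends a value to a list attribute of an existing object. The wrapper below is only a sketch: the function name add_to_hostgroup and the QCONF override variable are hypothetical and exist solely to allow a dry run.

```shell
#!/bin/sh
# Hypothetical wrapper: add a host to a host group with qconf -aattr.
# The group defaults to @allhosts. QCONF can be overridden
# (e.g. QCONF=echo) to print the arguments instead of running qconf.
add_to_hostgroup() {
    host=$1 group=${2:-@allhosts}
    ${QCONF:-qconf} -aattr hostgroup hostlist "$host" "$group"
}
```

For this setup the call would be add_to_hostgroup linux64-server, run on an admin host as a Grid Engine manager; qconf -shgrp @allhosts then shows the resulting host list.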
4. install execution host locally
i.e.
log on to the future execution host
cd $SGE_ROOT
./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf
5. verify execution host installation log
i.e.
cat /opt/SGE/default/common/install_logs/execd_install_linux64-client1_2010-05-26_15:48:46.log
Your $SGE_ROOT directory: /opt/SGE
Using cell: >default<
Using local execd spool directory [/opt/SGE_LOCAL_SPOOL/default/spool]
Creating local configuration for host >linux64-client1<
sgeadmin@linux64-client added "linux64-client1" to configuration list
Local configuration for host >linux64-client1< created.
Host >linux64-client1< already in submit host list!
Host >linux64-client2< already in submit host list!
cp /opt/SGE/default/common/sgeexecd /etc/init.d/sgeexecd.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgeexecd.my_test
starting sge_execd
6. (optimized) install multiple execution hosts from qmaster host
Note: The root user must be able to access all hosts through ssh without supplying a password
i.e.
root uses SSH public key authentication on all hosts
log on to the qmaster host after the SGE qmaster installation has completed
cat /tmp/execution_hosts_list
linux64-client1
linux64-client2
for host in `cat /tmp/execution_hosts_list`;do ssh root@$host "cd /opt/SGE;./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf"; done
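A slightly more defensive variant of the loop above reads the host list line by line, skips blank lines and reports the hosts on which the installer returned an error. It is a sketch only: the function name install_execds and the SSH override variable are hypothetical, and the installer command is the same one used above.

```shell
#!/bin/sh
# Hypothetical helper: run the execd auto-installation on every host
# listed in a file (one hostname per line) and report failures.
# SSH can be overridden with another command for a dry run.
install_execds() {
    list=$1 conf=$2 failed=""
    while read -r host; do
        [ -n "$host" ] || continue      # skip blank lines
        # -n keeps ssh from consuming the rest of the host list on stdin
        ${SSH:-ssh} -n "root@$host" \
            "cd /opt/SGE && ./inst_sge -x -noremote -auto $conf" \
            || failed="$failed $host"
    done < "$list"
    [ -z "$failed" ] || { echo "failed:$failed" >&2; return 1; }
}
```

With the files from this setup the call would be install_execds /tmp/execution_hosts_list /opt/SGE/util/install_modules/sge_inst.conf. The hosts are processed one after another, so a failure on one host does not stop the remaining installations.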