Set up a Grid Engine system as a single cluster with one qmaster host (e.g. linux64-server) and multiple execution hosts (e.g. linux64-client1 and linux64-client2)
Note: $SGE_CELL is not set, so the cell name defaults to "default".
Prerequisites:
1. Use a network installation directory that the root user can read and write, and that is accessible from all hosts (the Grid Engine qmaster host and the execution hosts)
e.g.
An NFS (Network File System) share mounted at /opt/SGE on all hosts, so that every host has direct access to it. The $SGE_ROOT files can then be put on this network-wide file system.
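A quick way to confirm that the shared directory really is an NFS mount on a given host is sketched below. It assumes a Linux host with GNU coreutils `stat`, whose `%T` format prints the file system type name (`nfs` for NFS mounts). The function name `is_nfs_dir` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if DIR is on an NFS file system, 1 if it is not,
# and 2 if DIR is missing or its type cannot be determined.
# Assumes GNU coreutils stat(1), where "%T" prints "nfs" for NFS mounts.
is_nfs_dir() {
    dir=$1
    [ -d "$dir" ] || return 2
    fstype=$(stat -f -c %T "$dir" 2>/dev/null) || return 2
    [ "$fstype" = "nfs" ]
}
```

For example, run `is_nfs_dir /opt/SGE && echo "/opt/SGE is on NFS"` on the qmaster host and on each execution host.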
2. Add a dedicated Grid Engine admin user account with the same user ID on all hosts. This user needs read/write access to the network installation directory from every host.
e.g.
distribute the account through NIS, created with a fixed UID: useradd -u 5039 sgeadmin
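The admin user must have the same UID on every host, so it can help to compare the UIDs that each host reports. The helper below is a hypothetical sketch. Collecting the UIDs over ssh, as shown in the comments, assumes passwordless ssh to every host.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if all arguments are identical
# (e.g. the sgeadmin UIDs collected from each host), 1 otherwise.
uids_consistent() {
    first=${1-}
    for uid in "$@"; do
        [ "$uid" = "$first" ] || return 1
    done
    return 0
}

# Example usage (assumes passwordless ssh to each host):
# HOSTS="linux64-server linux64-client1 linux64-client2"
# uids=$(for h in $HOSTS; do ssh "$h" id -u sgeadmin; done)
# uids_consistent $uids && echo "sgeadmin UID is consistent"
```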
3. Use local spooling on the Grid Engine qmaster host and the execution hosts
e.g.
DB_SPOOLING_DIR /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb
QMASTER_SPOOL_DIR /opt/SGE_LOCAL_SPOOL/default/spool/qmaster
EXECD_SPOOL_DIR /opt/SGE_LOCAL_SPOOL/default/spool
EXECD_SPOOL_DIR_LOCAL /opt/SGE_LOCAL_SPOOL/default/spool
Note: make sure that /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb and /opt/SGE/default do not exist before running the automated installation.
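The note above can be turned into a pre-flight check. This is a minimal sketch with a made-up function name. It only tests whether the paths exist and does not create or delete anything.

```shell
#!/bin/sh
# Hypothetical helper: print an error for every given path that already
# exists and return 1 if any does; return 0 if all paths are absent.
paths_absent() {
    status=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ERROR: $p already exists; remove it before the automated installation" >&2
            status=1
        fi
    done
    return $status
}
```

For example, run `paths_absent /opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb /opt/SGE/default` on the qmaster host before the installation.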
4. Configure the Grid Engine TCP/IP communication services sge_qmaster and sge_execd on all hosts (e.g. in /etc/services or through NIS)
e.g.
sge_qmaster 6444/tcp
sge_execd 6445/tcp
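To check that a host already has both service entries, the sketch below looks up a service name in a services(5)-style file. The helper name is hypothetical. On hosts that get the entries from NIS, `getent services sge_qmaster` queries the configured name service instead of reading /etc/services directly.

```shell
#!/bin/sh
# Hypothetical helper: print the "port/protocol" field registered for
# service NAME in a services(5)-style FILE (default /etc/services).
# Prints nothing if the service is not listed.
service_port() {
    name=$1
    file=${2:-/etc/services}
    awk -v n="$name" '$1 == n { print $2; exit }' "$file"
}
```

For example, `service_port sge_qmaster` should print `6444/tcp` and `service_port sge_execd` should print `6445/tcp` on every host.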
Deployment:
Unpack the SGE installation files under $SGE_ROOT and make the dedicated Grid Engine admin user the owner of this directory
e.g.
cd /opt
gzip -dc /tmp/SGE62base.tar.gz | tar xvpf -
gzip -dc /tmp/SGEcell.tar.gz | tar xvpf -
chown -R sgeadmin /opt/SGE
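After the `chown`, `find` with `! -user` is a quick way to spot files that are still not owned by the admin user. The wrapper below is only an illustration, and its name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list everything under DIR that is NOT owned by USER.
# An empty result means ownership was applied everywhere.
not_owned_by() {
    dir=$1
    user=$2
    find "$dir" ! -user "$user" -print
}
```

For example, `not_owned_by /opt/SGE sgeadmin | head` should print nothing after a successful `chown -R`.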
Configuration:
Copy the SGE auto-installation template $SGE_ROOT/util/install_modules/inst_template.conf and fill in the configuration in the copy
e.g.
cp $SGE_ROOT/util/install_modules/inst_template.conf $SGE_ROOT/util/install_modules/sge_inst.conf
#-------------------------------------------------
# SGE default configuration file
#-------------------------------------------------
# Always use fully qualified pathnames, please
# SGE_ROOT Path, this is basic information
#(mandatory for qmaster and execd installation)
#SGE_ROOT="Please enter path"
SGE_ROOT="/opt/SGE"
# SGE_QMASTER_PORT is used by qmaster for communication
# Please enter the port in this way: 1300
# Please do not use this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_QMASTER_PORT="Please enter port"
SGE_QMASTER_PORT="6444"
# SGE_EXECD_PORT is used by execd for communication
# Please enter the port in this way: 1300
# Please do not use this: 1300/tcp
#(mandatory for qmaster installation)
#SGE_EXECD_PORT="Please enter port"
SGE_EXECD_PORT="6445"
# SGE_ENABLE_SMF
# if set to false SMF will not control SGE services
SGE_ENABLE_SMF="false"
# SGE_ENABLE_ST
# if set to false Sun Service Tags will not be used
SGE_ENABLE_ST="true"
# SGE_CLUSTER_NAME
# Name of this cluster (used by SMF as a service instance name)
#SGE_CLUSTER_NAME="Please enter cluster name"
SGE_CLUSTER_NAME="my_test"
# SGE_JMX_PORT is used by qmaster's JMX MBean server
# mandatory if install_qmaster -jmx -auto
# range: 1024-65500
SGE_JMX_PORT="Please enter port"
# SGE_JMX_SSL is used by qmaster's JMX MBean server
# if SGE_JMX_SSL=true, the mbean server connection uses
# SSL authentication
SGE_JMX_SSL="false"
# SGE_JMX_SSL_CLIENT is used by qmaster's JMX MBean server
# if SGE_JMX_SSL_CLIENT=true, the mbean server connection uses
# SSL authentication of the client in addition
SGE_JMX_SSL_CLIENT="false"
# SGE_JMX_SSL_KEYSTORE is used by qmaster's JMX MBean server
# if SGE_JMX_SSL=true the server keystore found here is used
# e.g. /var/sgeCA/port//private/keystore
SGE_JMX_SSL_KEYSTORE="Please enter absolute path of server keystore file"
# SGE_JMX_SSL_KEYSTORE_PW is used by qmaster's JMX MBean server
# password for the SGE_JMX_SSL_KEYSTORE file
SGE_JMX_SSL_KEYSTORE_PW="Please enter the server keystore password"
# SGE_JVM_LIB_PATH is used by qmaster's jvm thread
# path to libjvm.so
# if value is missing or set to "none" JMX thread will not be installed
# when the value is empty or path does not exist on the system, Grid Engine
# will try to find a correct value, if it cannot do so, value is set to
# "jvmlib_missing" and JMX thread will be configured but will fail to start
#SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"
# SGE_ADDITIONAL_JVM_ARGS is used by qmaster's jvm thread
# jvm specific arguments as -verbose:jni etc.
# optional, can be empty
SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"
# CELL_NAME, will be a dir in SGE_ROOT, contains the common dir
# Please enter only the name of the cell. No path, please
#(mandatory for qmaster and execd installation)
CELL_NAME="default"
# ADMIN_USER, if you want to use a different admin user than the owner
# of SGE_ROOT, you have to enter the user name here
# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user
#ADMIN_USER=""
ADMIN_USER="sgeadmin"
# The dir, where qmaster spools the parts which are not spooled by DB
#(mandatory for qmaster installation)
#QMASTER_SPOOL_DIR="Please, enter spooldir"
QMASTER_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool/qmaster"
# The dir, where the execd spools (active jobs)
# This entry is needed, even if you are going to use
# berkeley db spooling. Only cluster configuration and jobs will
# be spooled in the database. The execution daemon still needs a spool
# directory
#(mandatory for qmaster installation)
#EXECD_SPOOL_DIR="Please, enter spooldir"
EXECD_SPOOL_DIR="/opt/SGE_LOCAL_SPOOL/default/spool"
# For monitoring and accounting of jobs, every job will get
# a unique GID. So you have to enter a free GID range, which
# is assigned to each job running on a machine.
# If you want to run 100 jobs at the same time on one host you
# have to enter a GID range like that: 16000-16100
#(mandatory for qmaster installation)
#GID_RANGE="Please, enter GID range"
GID_RANGE="20000-20100"
# If SGE is compiled with -spool-dynamic, you have to enter here, which
# spooling method should be used. (classic or berkeleydb)
#(mandatory for qmaster installation)
SPOOLING_METHOD="berkeleydb"
# Name of the server, where the spooling DB is running on
# if spooling method is berkeleydb, it must be "none", when
# using no spooling server and it must contain the servername
# if a server should be used. In case of "classic" spooling,
# can be left out
DB_SPOOLING_SERVER="none"
# The dir, where the DB spools
# If berkeley db spooling is used, it must contain the path to
# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb)
# Remember, this directory must be local on the qmaster host or on the
# Berkeley DB Server host. No NFS mount, please
#DB_SPOOLING_DIR="spooldb"
DB_SPOOLING_DIR="/opt/SGE_LOCAL_SPOOLDB/default/spool/spooldb"
# This parameter sets the number of parallel installation processes.
# To prevent a system overload, or exceeding the number of open file
# descriptors the user can limit the number of parallel install processes.
# eg. set PAR_EXECD_INST_COUNT="20", maximum 20 parallel execd are installed.
PAR_EXECD_INST_COUNT="20"
# A list of hosts which should become admin hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#ADMIN_HOST_LIST="host1 host2 host3 host4"
ADMIN_HOST_LIST="linux64-server linux64-client1 linux64-client2"
# A list of hosts which should become submit hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
#SUBMIT_HOST_LIST="host1 host2 host3 host4"
SUBMIT_HOST_LIST="linux64-server linux64-client1 linux64-client2"
# A list of hosts which should become exec hosts
# If you do not enter any host here, you have to add all of your hosts
# by hand, after the installation. The autoinstallation works without
# any entry
# (mandatory for execution host installation)
#EXEC_HOST_LIST="host1 host2 host3 host4"
EXEC_HOST_LIST="linux64-client1 linux64-client2"
# The dir, where the execd spools (local configuration)
# If you want to configure your execution daemons to spool in
# a local directory, you have to enter this directory here.
# If you do not want to configure a local execution host spool directory
# please leave this empty
#EXECD_SPOOL_DIR_LOCAL="Please, enter spooldir"
EXECD_SPOOL_DIR_LOCAL="/opt/SGE_LOCAL_SPOOL/default/spool"
# If true, the domain names will be ignored during the hostname resolving
# if false, the fully qualified domain name will be used for name resolving
HOSTNAME_RESOLVING="true"
# Shell, which should be used for remote installation (rsh/ssh)
# This is only supported, if your hosts and rshd/sshd are configured
# not to ask for a password or prompt with any message.
SHELL_NAME="ssh"
# This remote copy command is used for csp installation.
# The script needs the remote copy command for distributing
# the csp certificates. Using ssh the command scp has to be entered,
# using the not so secure rsh the command rcp has to be entered.
# Both need a passwordless ssh/rsh connection to the hosts, which
# should be connected to. (mandatory for csp installation mode)
COPY_COMMAND="scp"
# Enter your default domain, if you are using /etc/hosts or NIS configuration
#DEFAULT_DOMAIN="none"
# If a job stops, fails, or finishes, you can send a mail to this address
ADMIN_MAIL="none"
# If true, the rc scripts (sgemaster, sgeexecd, sgebdb) will be added,
# to start automatically during boottime
#ADD_TO_RC="false"
ADD_TO_RC="true"
#If this is "true" the file permissions of executables will be set to 755
#and of ordinary files to 644.
SET_FILE_PERMS="true"
# This option is not implemented, yet.
# When an exechost should be uninstalled, the running jobs will be rescheduled
RESCHEDULE_JOBS="wait"
# Enter one of the three distributed scheduler tuning configuration sets
# (1=normal, 2=high, 3=max)
SCHEDD_CONF="1"
# The name of the shadow host. This host must have read/write permission
# to the qmaster spool directory
# If you want to setup a shadow host, you must enter the servername
# (mandatory for shadowhost installation)
#SHADOW_HOST="hostname"
# Remove these execution hosts in automatic mode
# (mandatory for uninstallation of execution hosts)
#EXEC_HOST_LIST_RM="host1 host2 host3 host4"
EXEC_HOST_LIST_RM="linux64-client1 linux64-client2"
# This option is used for startup script removing.
# If true, all rc startup scripts will be removed during
# automatic deinstallation. If false, the scripts won't
# be touched.
# (mandatory for uninstallation of execution/qmaster hosts)
#REMOVE_RC="false"
REMOVE_RC="true"
# This is a Windows specific part of the auto installation template
# If you are going to install Windows execution hosts, you have to enable the
# Windows support. To do this, please set the WINDOWS_SUPPORT variable
# to "true". ("false" is disabled)
# (mandatory for qmaster installation, by default WINDOWS_SUPPORT is
# disabled)
WINDOWS_SUPPORT="false"
# Enabling the WINDOWS_SUPPORT, recommends the following parameter.
# The WIN_ADMIN_NAME will be added to the list of SGE managers.
# Without adding the WIN_ADMIN_NAME the execution host installation
# won't install correctly.
# WIN_ADMIN_NAME is set to "Administrator" which is default on most
# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with
# the windows domain name (eg. DOMAIN+Administrator)
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_ADMIN_NAME="Administrator"
# This parameter is used to switch between local ADMINUSER and Windows
# Domain Adminuser. Setting the WIN_DOMAIN_ACCESS variable to true, the
# Adminuser will be a Windows Domain User. It is recommended that
# a Windows Domain Server is configured and the Windows Domain User is
# created. Setting this variable to false, the local Adminuser will be
# used as ADMINUSER. The install script tries to create this user account
# but we recommend, because it will be safer, to create this user,
# before running the installation.
# (mandatory for qmaster installation, if windows hosts should be installed)
WIN_DOMAIN_ACCESS="false"
# This section is used for csp installation mode.
# CSP_RECREATE recreates the certs on each installation, if true.
# In case of false, the certs will be created, if not existing.
# Existing certs won't be overwritten. (mandatory for csp install)
#CSP_RECREATE="true"
CSP_RECREATE="false"
# The created certs won't be copied, if this option is set to false
# If true, the script tries to copy the generated certs. This
# requires passwordless ssh/rsh access for user root to the
# execution hosts
CSP_COPY_CERTS="false"
# csp information, your country code (only 2 characters)
# (mandatory for csp install)
CSP_COUNTRY_CODE="DE"
# your state (mandatory for csp install)
CSP_STATE="Germany"
# your location, eg. the building (mandatory for csp install)
CSP_LOCATION="Building"
# your organisation (mandatory for csp install)
CSP_ORGA="Organisation"
# your organisation unit (mandatory for csp install)
CSP_ORGA_UNIT="Organisation_unit"
# your email (mandatory for csp install)
CSP_MAIL_ADDRESS="name@yourdomain.com"
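Before running `inst_sge`, it can be worth checking that none of the settings this walkthrough depends on is still empty or holds a `Please ...` placeholder. The sketch below assumes the file keeps the plain `KEY="value"` shell syntax shown above, so it can be sourced in a subshell. The function name is hypothetical, and the key list is an assumption based on this setup, not an official list of mandatory parameters.

```shell
#!/bin/sh
# Hypothetical helper: report keys that are unset, empty, or still hold a
# "Please ..." placeholder in an SGE auto-installation config.
# Returns 0 if all checked keys are set, 1 if any is not, 2 if unreadable.
# Assumes the config is plain KEY="value" shell syntax that can be sourced.
check_inst_conf() {
    conf=$1
    [ -r "$conf" ] || return 2
    (
        # Source in a subshell so the config does not leak into the caller.
        . "$conf"
        rc=0
        # Assumed key list: the settings this walkthrough relies on.
        for key in SGE_ROOT SGE_QMASTER_PORT SGE_EXECD_PORT CELL_NAME \
                   ADMIN_USER QMASTER_SPOOL_DIR EXECD_SPOOL_DIR GID_RANGE \
                   SPOOLING_METHOD DB_SPOOLING_DIR EXEC_HOST_LIST; do
            eval "val=\${$key-}"
            case $val in
                ""|Please*) echo "unset or placeholder: $key" >&2; rc=1 ;;
            esac
        done
        exit "$rc"
    )
}
```

For example, `check_inst_conf /opt/SGE/util/install_modules/sge_inst.conf` prints nothing and returns 0 when the keys above are filled in.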
Installation:
1. Install the qmaster host locally
e.g.
Log on to the future qmaster host (linux64-server)
cd $SGE_ROOT
./inst_sge -m -auto /opt/SGE/util/install_modules/sge_inst.conf
2. Verify the qmaster installation log
cat /opt/SGE/default/common/install_logs/qmaster_install_mt-linux64-server_2010-05-25_06:17:35.log
Starting qmaster installation!
Installing Grid Engine as user >sgeadmin<
Your $SGE_ROOT directory: /opt/SGE
Using SGE_QMASTER_PORT >6444<.
Using SGE_EXECD_PORT >6445<.
Using >default< as CELL_NAME.
Your $SGE_CLUSTER_NAME: my_test
Using >/opt/SGE_LOCAL_SPOOL/default/spool/qmaster< as QMASTER_SPOOL_DIR.
Verifying and setting file permissions and owner in >3rd_party<
Verifying and setting file permissions and owner in >bin<
Verifying and setting file permissions and owner in >ckpt<
Verifying and setting file permissions and owner in >dtrace<
Verifying and setting file permissions and owner in >examples<
Verifying and setting file permissions and owner in >inst_sge<
Verifying and setting file permissions and owner in >install_execd<
Verifying and setting file permissions and owner in >install_qmaster<
Verifying and setting file permissions and owner in >lib<
Verifying and setting file permissions and owner in >mpi<
Verifying and setting file permissions and owner in >pvm<
Verifying and setting file permissions and owner in >qmon<
Verifying and setting file permissions and owner in >util<
Verifying and setting file permissions and owner in >utilbin<
Verifying and setting file permissions and owner in >start_gui_installer<
Verifying and setting file permissions and owner in >catman<
Verifying and setting file permissions and owner in >doc<
Verifying and setting file permissions and owner in >include<
Verifying and setting file permissions and owner in >man<
Your file permissions were set
Using >true< as IGNORE_FQDN_DEFAULT.
If it's >true<, the domain name will be ignored.
Making directories
Setting spooling method to dynamic
Dumping bootstrapping information
Initializing spooling database
Using >20000-20100< as gid range.
Using >/opt/SGE_LOCAL_SPOOL/default/spool< as EXECD_SPOOL_DIR.
Using >none< as ADMIN_MAIL.
Adding default parallel environments (PE)
cp /opt/SGE/default/common/sgemaster /etc/init.d/sgemaster.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgemaster.my_test
starting sge_qmaster
Adding ADMIN_HOST linux64-server
adminhost "linux64-server" already exists
Adding ADMIN_HOST linux64-client1
linux64-client1 added to administrative host list
Adding ADMIN_HOST linux64-client2
linux64-client2 added to administrative host list
Creating the default
root@linux64-server added "@allhosts" to host group list
root@linux64-server added "all.q" to cluster queue list
Setting scheduler configuration to >Normal< setting!
changed scheduler configuration
sge_qmaster successfully installed!
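Checking the installation log can also be scripted. The sketch below uses a hypothetical function name. It picks the newest log in the `install_logs` directory with a given prefix and searches it for a success line, here the `sge_qmaster successfully installed!` line from the log above.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the newest LOGDIR/PREFIX*.log contains
# MARKER, 1 if it does not, and 2 if no matching log file exists.
install_log_ok() {
    logdir=$1
    prefix=$2
    marker=$3
    # Newest matching log first (ls -t sorts by modification time).
    latest=$(ls -t "$logdir/$prefix"*.log 2>/dev/null | head -n 1)
    [ -n "$latest" ] || return 2
    grep -qF "$marker" "$latest"
}
```

For example, run `install_log_ok /opt/SGE/default/common/install_logs qmaster_install_ "sge_qmaster successfully installed!"`. For an execution host log, use the prefix `execd_install_` and a line such as `starting sge_execd`.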
3. Configure the qmaster host
# modify the host group
e.g. the qmaster host needs to be added to @allhosts
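Instead of editing @allhosts interactively with `qconf -mhgrp @allhosts`, you can append a host to the group's `hostlist` with `qconf -aattr`. The wrapper below is a sketch. The `QCONF` variable is a hypothetical hook that lets you dry-run the command, for example with `QCONF=echo`. Jobs only run on a host that also has an `sge_execd` installed.

```shell
#!/bin/sh
# Hypothetical helper: append HOST to the hostlist of a host group
# (default @allhosts). QCONF is an assumed override hook; defaults to qconf.
add_host_to_hostgroup() {
    host=$1
    group=${2:-@allhosts}
    ${QCONF:-qconf} -aattr hostgroup hostlist "$host" "$group"
}
```

For example, on the qmaster host, after sourcing /opt/SGE/default/common/settings.sh, run `add_host_to_hostgroup linux64-server`.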
4. Install an execution host locally
e.g.
Log on to the future execution host (e.g. linux64-client1)
cd $SGE_ROOT
./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf
5. Verify the execution host installation log
e.g.
cat /opt/SGE/default/common/install_logs/execd_install_linux64-client1_2010-05-26_15:48:46.log
Your $SGE_ROOT directory: /opt/SGE
Using cell: >default<
Using local execd spool directory [/opt/SGE_LOCAL_SPOOL/default/spool]
Creating local configuration for host >linux64-client1<
sgeadmin@linux64-client added "linux64-client1" to configuration list
Local configuration for host >linux64-client1< created.
Host >linux64-client1< already in submit host list!
Host >linux64-client2< already in submit host list!
cp /opt/SGE/default/common/sgeexecd /etc/init.d/sgeexecd.my_test
/usr/lib/lsb/install_initd /etc/init.d/sgeexecd.my_test
starting sge_execd
6. (Optimized) Install multiple execution hosts from the qmaster host
Note: The root user must be able to access all hosts through ssh without supplying a password
e.g.
root uses SSH public key authentication on all hosts
Log on to the qmaster host after the SGE qmaster installation has completed
cat /tmp/execution_hosts_list
linux64-client1
linux64-client2
while read -r host; do ssh -n "root@$host" "cd /opt/SGE && ./inst_sge -x -noremote -auto /opt/SGE/util/install_modules/sge_inst.conf"; done < /tmp/execution_hosts_list
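The loop in step 6 installs the hosts one after another. A parallel variant is sketched below. Each `ssh` runs in the background, its output goes to a per-host file, and `wait` collects the exit codes. `install_execds` and the `SSH_CMD` override are hypothetical names. For large host lists you would probably want to cap the number of concurrent installs, similar in spirit to the template's `PAR_EXECD_INST_COUNT`.

```shell
#!/bin/sh
# Hypothetical helper: run the execd auto-installation on every host listed
# in LIST (one host per line) in parallel, writing each host's output to
# OUTDIR/execd_install_<host>.out. Returns 1 if any installation failed.
# SSH_CMD is an assumed override hook (default: ssh), e.g. for a dry run.
install_execds() {
    list=$1
    conf=$2
    outdir=${3:-/tmp}
    pids=""
    while read -r host; do
        [ -n "$host" ] || continue
        # -n keeps ssh from reading stdin; each install runs in the background.
        ${SSH_CMD:-ssh} -n "root@$host" \
            "cd /opt/SGE && ./inst_sge -x -noremote -auto $conf" \
            > "$outdir/execd_install_$host.out" 2>&1 &
        pids="$pids $!"
    done < "$list"
    rc=0
    # Wait for every background install and remember any failure.
    for pid in $pids; do
        wait "$pid" || rc=1
    done
    return "$rc"
}
```

For example, run `install_execds /tmp/execution_hosts_list /opt/SGE/util/install_modules/sge_inst.conf` on the qmaster host, then inspect the `/tmp/execd_install_*.out` files.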
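Comment: Hi! This really helped! But should we copy the config file to each exec node before installing SGE?
Reply: No. sge_inst.conf lives under the NFS-shared $SGE_ROOT (/opt/SGE/util/install_modules), so every execution host can already read it.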