GT 3.9.5 WS GRAM : System Administrator's Guide

Introduction

This guide contains advanced configuration information for system administrators working with WS GRAM. It provides references to information on procedures typically performed by system administrators, including installing, configuring, deploying, and testing the installation. It also describes additional prerequisites and host settings necessary for WS GRAM operation. Readers should be familiar with the Key Concepts and Implementation Approach for WS GRAM to understand the motivation for and interaction between the various deployed components.

This information is in addition to the basic installation instructions in the GT 3.9.5 System Administrator's Guide.

Building and Installing

Local Prerequisites

WS GRAM requires the following:

Host credentials

In order to use WS GRAM, the services running in the WSRF hosting environment require access to an appropriate host certificate.

GRAM service account

WS GRAM requires a dedicated local account within which the WSRF hosting environment and GRAM services will execute. This account will often be a globus account used for all local services, but may also be specialized to only host WS GRAM. User jobs will run in separate accounts as specified in the grid-mapfile or associated authorization policy configuration of the host.

Gridmap authorization of user account

In order to authorize a user to call GRAM services, the security configuration must map the Distinguished Name (DN) of the user to the name of the user in the system where the GRAM services run. Here are the configuration steps:

  1. In order to obtain the DN, which is the subject of the user certificate, run the bin/grid-cert-info command in $GLOBUS_LOCATION on the submission machine:
    % bin/grid-cert-info -identity
    /O=Grid/OU=GlobusTest/OU=simpleCA-foo.bar.com/OU=bar.com/CN=John Doe
      
  2. Create the file /etc/grid-security/grid-mapfile. The syntax is one line per user: the distinguished name, followed by whitespace, followed by the user account name on the GRAM machine. Since the distinguished name usually contains whitespace, it is placed between quotation marks, as in:
    "/O=Grid/OU=GlobusTest/OU=simpleCA-foo.bar.com/OU=bar.com/CN=John Doe" johndoe

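The grid-mapfile syntax described above can be illustrated with a minimal sketch. The function name parse_gridmap_line is hypothetical, and the sketch assumes exactly one quoted DN and one account name per line, as in the example entry; it is not the parser used by the toolkit.

```python
import shlex


def parse_gridmap_line(line):
    """Split one grid-mapfile line into (distinguished_name, local_account)."""
    # shlex honours the double quotes, so whitespace inside the DN is preserved.
    fields = shlex.split(line)
    if len(fields) != 2:
        raise ValueError(f"expected a quoted DN and one account name: {line!r}")
    dn, account = fields
    return dn, account


entry = '"/O=Grid/OU=GlobusTest/OU=simpleCA-foo.bar.com/OU=bar.com/CN=John Doe" johndoe'
dn, account = parse_gridmap_line(entry)
print(dn)
print(account)
```

A line whose DN is not quoted would be split at the space in "John Doe" and rejected, which is why the quotation marks matter.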
Functioning sudo

WS GRAM requires that the sudo command is installed and functioning on the service host where WS GRAM software will execute.

Authorization rules will need to be added to the sudoers file to allow the WS GRAM service account to execute (without a password) local scheduler adapters in the accounts of authorized GRAM users. This topic is covered in detail in the Configuring sudo section.

Local scheduler

WS GRAM depends on a local mechanism for starting and controlling jobs. If the fork-based WS GRAM mode is to be used, no special software is required. For batch scheduling mechanisms, the local scheduler must be installed and configured for local job submission prior to deploying and operating WS GRAM. The supported batch schedulers in the GT 3.9.5 release are PBS, Condor, and LSF.

RFT requires PostgreSQL to be installed and configured. The instructions are here. WS GRAM depends on RFT for file staging (from the client host to the compute host and vice versa) and for file cleanup. Important: Jobs requesting these functions will fail if RFT is not properly set up.

Full GT 3.9.5 Installation including WS GRAM

Please refer to the GT 3.9.5 System Administrator's Guide for instructions on how to install the toolkit.

If you wish to install only the WS GRAM component and its dependencies, do the following instead of the final make step in the above-mentioned instructions:

    globus$ make wsgram
    globus$ gpt-postinstall

Configuring

Configuration settings

Locating configuration files

All the GRAM service configuration files are located in subdirectories of the $GLOBUS_LOCATION/etc directory. The names of the GRAM configuration directories all start with gram-service. For instance, with a default GRAM installation, the command line:

    % ls etc | grep gram-service

gives the following output:

    gram-service
    gram-service-Fork
    gram-service-Multi

Web service deployment configuration

The file $GLOBUS_LOCATION/etc/gram-service/server-config.wsdd contains information necessary to deploy and instantiate the GRAM services in the Globus container.

Three GRAM services are deployed:

  • ManagedExecutableJobService: service invoked when querying or managing an executable job
  • ManagedMultiJobService: service invoked when querying or managing a multijob
  • ManagedJobFactoryService: service invoked when submitting a job
The deployment information for each service includes the name of the Java service implementation class, the path to the WSDL service file, the names of the operation providers that the service reuses for its implementation of WSDL-defined operations, and so on. More information about service deployment configuration can be found here.

JNDI application configuration

The configuration of WSRF resources and application-level service configuration not related to service deployment is contained in JNDI files. The JNDI-based GRAM configuration is of two kinds:

Common job factory configuration

The file $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml contains configuration information that is common to every local resource manager.

More precisely, the configuration data it contains pertains to the implementation of the GRAM WSRF resources (factory resources and job resources), as well as initial values of WSRF resource properties that are always published by any Managed Job Factory WSRF resource.

The data is categorized by service because, according to WSRF, in spite of the service/resource separation of concerns, a given service will use only one XML Schema type of resource. In practice it is therefore clearer to categorize the resource implementation configuration by service, even if, theoretically speaking, a given resource implementation could be used by several services. For more information, refer to the Java WS Core documentation.

Here is the decomposition, in JNDI objects, of the common configuration data, categorized by service. Each XYZHome object contains the same Globus Core-defined information for the implementation of the WSRF resource, such as the Java implementation class for the resource (resourceClass datum), the Java class for the resource key (resourceKeyType datum), etc.

  • ManagedExecutableJobService
    • ManagedExecutableJobHome: configuration of the implementation of resources for the service.
  • ManagedMultiJobService
    • ManagedMultiJobHome: configuration of the implementation of resources for the service
  • ManagedJobFactoryService
    • FactoryServiceConfiguration: this encapsulates configuration information used by the factory service. Currently this identifies the service to associate to a newly created job resource in order to create an endpoint reference and return it.
    • ManagedJobFactoryHome: configuration of the implementation of resources for the service.
    • FactoryHomeConfiguration: this contains GRAM application-level configuration data, i.e. values for resource properties common to all factory resources. Examples include the path to the Globus installation and host information such as CPU type, manufacturer, and operating system name and version.

Local resource manager configuration

When a SOAP call is made to a GRAM factory service in order to submit a job, the call is actually made to a GRAM service-resource pair, where the factory resource represents the local resource manager to be used to execute the job.

There is one directory gram-service-<manager>/ for each local resource manager supported by the GRAM installation.

For instance, let's assume the command line:

    % ls etc | grep gram-service-

gives the following output:

    gram-service-Fork
    gram-service-LSF
    gram-service-Multi

In this example, the Multi, Fork and LSF job factory resources have been installed. Multi is a special kind of local resource manager which enables the GRAM services to support multijobs.
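The directory naming convention above can be used to list the installed job factory resources programmatically. The following is a minimal sketch; the function name installed_managers is hypothetical, and the demonstration runs against a throwaway directory standing in for $GLOBUS_LOCATION/etc rather than a real installation.

```python
import os
import tempfile

PREFIX = "gram-service-"


def installed_managers(etc_dir):
    """Return the <manager> part of every gram-service-<manager> directory."""
    names = []
    for entry in os.listdir(etc_dir):
        path = os.path.join(etc_dir, entry)
        if entry.startswith(PREFIX) and os.path.isdir(path):
            names.append(entry[len(PREFIX):])
    return sorted(names)


# Throwaway directory that mimics the layout of $GLOBUS_LOCATION/etc.
with tempfile.TemporaryDirectory() as etc_dir:
    for name in ("gram-service", "gram-service-Fork", "gram-service-LSF",
                 "gram-service-Multi", "globus_wsrf_core"):
        os.mkdir(os.path.join(etc_dir, name))
    managers = installed_managers(etc_dir)

print(managers)
```

The common gram-service directory is skipped because it lacks the trailing hyphen and manager name.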

The JNDI configuration file located under each manager directory contains configuration information for the GRAM support of the given local resource manager, such as the name that GRAM uses to designate the given resource manager. This is referred to as the GRAM name of the local resource manager.

For instance, $GLOBUS_LOCATION/etc/gram-service-Fork/jndi-config.xml contains the following XML element structure:

    <service name="ManagedJobFactoryService">
        <!-- LRM configuration:  Fork -->
        <resource
            name="ForkResourceConfiguration"
            type="org.globus.exec.service.factory.FactoryResourceConfiguration">
            <resourceParams>
                [...]
                <parameter>
                    <name>
                        localResourceManagerName
                    </name>
                    <value>
                        Fork
                    </value>
                </parameter>           
                <parameter>
                    <name>
                        scratchDirectory
                    </name>
                    <value>
                        ${GLOBUS_USER_HOME}
                    </value>
                </parameter>           
            </resourceParams>
        </resource>        
    </service>

In the example above, the GRAM name of the local resource manager is Fork. This value can be used with the GRAM command line client in order to specify which factory resource to use when submitting a job. Similarly, it is used to construct an endpoint reference to the chosen factory service-resource pair when using the GRAM client API.

In the example above, the scratchDirectory is set to ${GLOBUS_USER_HOME}. This is the default setting. It can be configured to point to an alternate network file system path (e.g. /scratch) that is common to the compute cluster; such a path is typically less reliable (auto-purging) but offers a greater amount of disk space.
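To check which values a manager directory actually configures, the parameters can be read back from the XML. The sketch below works on a copy of the fragment shown above (with the elided [...] part left out); the function name resource_params is hypothetical. A complete jndi-config.xml may declare an XML namespace on its root element, in which case the tag names would need to be namespace-qualified; that case is not handled here.

```python
import xml.etree.ElementTree as ET

# Copy of the Fork fragment above; the elided parameters are omitted.
FRAGMENT = """
<service name="ManagedJobFactoryService">
    <resource
        name="ForkResourceConfiguration"
        type="org.globus.exec.service.factory.FactoryResourceConfiguration">
        <resourceParams>
            <parameter>
                <name>
                    localResourceManagerName
                </name>
                <value>
                    Fork
                </value>
            </parameter>
            <parameter>
                <name>
                    scratchDirectory
                </name>
                <value>
                    ${GLOBUS_USER_HOME}
                </value>
            </parameter>
        </resourceParams>
    </resource>
</service>
"""


def resource_params(xml_text):
    """Return a {name: value} dict of all <parameter> elements."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for parameter in root.iter("parameter"):
        # The file pads names and values with whitespace, so strip both.
        name = parameter.findtext("name", default="").strip()
        value = parameter.findtext("value", default="").strip()
        params[name] = value
    return params


params = resource_params(FRAGMENT)
print(params["localResourceManagerName"])
print(params["scratchDirectory"])
```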

Security descriptor

The file $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml contains the Core security configuration for the GRAM ManagedJobFactory service:
  • default security information for all remote invocations, such as:
    • the authorization method, based on a Gridmap file (in order to resolve user credentials to local user names)
    • limited proxy credentials will be rejected
  • security information for the createManagedJob operation
The file $GLOBUS_LOCATION/etc/gram-service/managed-job-security-config.xml contains the Core security configuration for the GRAM job resources:
  • The default is to only allow the identity that called the createManagedJob operation to access the resource.
Note: GRAM does not override the container security credentials defined in $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml. These are the credentials used to authenticate all service requests.

GRAM and GridFTP file system mapping

The file $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml contains information to associate local resource managers with GridFTP servers. GRAM uses the GridFTP server (via RFT) to perform all file staging directives. Since the GridFTP server and the Globus service container can be run on separate hosts, a mapping is needed between the common file system paths of these 2 hosts. This enables the GRAM services to resolve file:/// staging directives to the local GridFTP URLs.

Below is the default Fork entry. Mapping a jobPath of / to an ftpPath of / allows any file staging directive to be attempted.

    <map>
        <scheduler>Fork</scheduler>
        <ftpServer>
           <protocol>gsiftp</protocol>
           <host>myhost.org</host>
           <port>2811</port>
        </ftpServer>
        <mapping>
           <jobPath>/</jobPath>
           <ftpPath>/</ftpPath>
        </mapping>
    </map>
For a batch scheduler, where jobs will typically run on a compute node, a default entry is not provided. This means staging directives will fail until a mapping is entered. Here is an example for a compute cluster that has PBS installed and two common mount points between the front-end host and the GridFTP server host.
    <map>
        <scheduler>PBS</scheduler>
        <ftpServer>
           <protocol>gsiftp</protocol>
           <host>myhost.org</host>
           <port>2811</port>
        </ftpServer>
        <mapping>
           <jobPath>/pvfs/mount1/users</jobPath>
           <ftpPath>/pvfs/mount2/users</ftpPath>
        </mapping>
        <mapping>
           <jobPath>/pvfs/jobhome</jobPath>
           <ftpPath>/pvfs/ftphome</ftpPath>
        </mapping>
    </map>
The file system mapping schema doc is here.
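The effect of a mapping entry can be illustrated with a short sketch that resolves a local path against the PBS example above. This is only an illustration of the idea: the function name to_gridftp_url and the first-matching-prefix rule are assumptions, not the algorithm used by the GRAM services.

```python
import xml.etree.ElementTree as ET

# Copy of the PBS example entry above.
PBS_MAP = """
<map>
    <scheduler>PBS</scheduler>
    <ftpServer>
       <protocol>gsiftp</protocol>
       <host>myhost.org</host>
       <port>2811</port>
    </ftpServer>
    <mapping>
       <jobPath>/pvfs/mount1/users</jobPath>
       <ftpPath>/pvfs/mount2/users</ftpPath>
    </mapping>
    <mapping>
       <jobPath>/pvfs/jobhome</jobPath>
       <ftpPath>/pvfs/ftphome</ftpPath>
    </mapping>
</map>
"""


def to_gridftp_url(map_xml, local_path):
    """Translate a local path into a GridFTP URL, or None if no mapping applies."""
    entry = ET.fromstring(map_xml.strip())
    server = entry.find("ftpServer")
    base = "{}://{}:{}".format(
        server.findtext("protocol"),
        server.findtext("host"),
        server.findtext("port"),
    )
    for mapping in entry.findall("mapping"):
        job_path = mapping.findtext("jobPath").rstrip("/")
        ftp_path = mapping.findtext("ftpPath").rstrip("/")
        # Match whole path components only, so /pvfs/jobhome2 does not match /pvfs/jobhome.
        if local_path == job_path or local_path.startswith(job_path + "/"):
            return base + ftp_path + local_path[len(job_path):]
    return None


print(to_gridftp_url(PBS_MAP, "/pvfs/jobhome/alice/input.dat"))
print(to_gridftp_url(PBS_MAP, "/tmp/input.dat"))
```

The second call returns None, which corresponds to the situation described above where a staging directive fails because no mapping covers the path.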

Scheduler-Specific Configuration Files

In addition to the service configuration described above, there are scheduler-specific configuration files for the Scheduler Event Generator modules. These files consist of name=value pairs separated by newlines. These files are:

$GLOBUS_LOCATION/etc/globus-fork.conf
  Configuration for the Fork SEG module implementation. The attribute names for this file are:
  log_path
    Path to the SEG Fork log (used by the globus-fork-starter and the SEG). The value should be the path to a world-writable file. The default value created by the Fork setup package is $GLOBUS_LOCATION/var/globus-fork.log. This file must be readable by the account that the SEG is running as.

$GLOBUS_LOCATION/etc/globus-condor.conf
  Configuration for the Condor SEG module implementation. The attribute names for this file are:
  log_path
    Path to the SEG Condor log (used by the Globus::GRAM::JobManager::condor perl module and the Condor SEG module). The value should be the path to a world-readable and world-writable file. The default value created by the Condor setup package is $GLOBUS_LOCATION/var/globus-condor.log.

$GLOBUS_LOCATION/etc/globus-pbs.conf
  Configuration for the PBS SEG module implementation. The attribute names for this file are:
  log_path
    Path to the SEG PBS logs (used by the Globus::GRAM::JobManager::pbs perl module and the PBS SEG module). The value should be the path to the directory containing the server logs generated by PBS. For the SEG to operate, these files must be readable by the user the SEG is running as.

$GLOBUS_LOCATION/etc/globus-lsf.conf
  Configuration for the LSF SEG module implementation. The attribute names for this file are:
  log_path
    Path to the SEG LSF log directory. This is used by the LSF SEG module. The value should be the path to the directory containing the server logs generated by LSF. For the SEG to operate, these files must be readable by the user the SEG is running as.
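The name=value format of these files is simple enough to read with a few lines of code. The following is a minimal sketch; the function name parse_seg_conf is hypothetical, the sample path is illustrative, and skipping lines that contain no "=" is an assumption rather than documented behaviour of the SEG modules.

```python
def parse_seg_conf(text):
    """Parse newline-separated name=value pairs into a dict."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines without "=" carry no setting.
        if not line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


# Illustrative content in the style of globus-fork.conf.
sample = "log_path=/opt/globus/GT3.9.5/var/globus-fork.log\n"
settings = parse_seg_conf(sample)
print(settings["log_path"])
```

Only the first "=" on a line separates the name from the value, so a value that itself contains "=" is kept intact.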

Setting up service credentials

In a default build and install of the Globus Toolkit, the local account is configured to use host credentials at /etc/grid-security/containercert.pem and containerkey.pem. If you already have host certificates, you can copy them to the new names and set the ownership:
	% cd /etc/grid-security
	% cp hostcert.pem containercert.pem
	% cp hostkey.pem containerkey.pem
	% chown globus.globus container*.pem

Replace globus.globus with the user and group the container is installed as.

You should now have something like:

/etc/grid-security$ ls -l *.pem
-rw-r--r--  1 globus globus 1785 Oct 14 14:47 containercert.pem
-r--------  1 globus globus  887 Oct 14 14:47 containerkey.pem
-rw-r--r--  1 root   root   1785 Oct 14 14:42 hostcert.pem
-r--------  1 root   root    887 Sep 29 09:59 hostkey.pem
The result is a copy of the host credentials which are accessible by the container.

If this is not an option, then you can configure an alternate location to point to host credentials -or- configure to use just a user proxy (personal mode).
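The permissions shown in the listing above (key file readable by its owner only) can be checked with a short script. This is a minimal sketch assuming a POSIX file system; the function name key_is_private is hypothetical, and the demonstration uses a placeholder file in a temporary directory rather than a real key.

```python
import os
import stat
import tempfile


def key_is_private(path):
    """Return True when group and other have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


with tempfile.TemporaryDirectory() as workdir:
    key = os.path.join(workdir, "containerkey.pem")
    with open(key, "w") as handle:
        handle.write("placeholder, not a real key\n")

    os.chmod(key, 0o644)          # -rw-r--r--: too permissive for a private key
    loose = key_is_private(key)

    os.chmod(key, 0o400)          # -r--------: matches the listing above
    tight = key_is_private(key)

print(loose, tight)
```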

Enabling Local Scheduler Adapter

The batch scheduler interface implementations included in the release tarball are: PBS, Condor and LSF. To install one of the batch scheduler adapters, follow these steps (shown for pbs):

    % cd $GLOBUS_LOCATION/gt3.9.5-all-source-installer

    % make gt4-gram-pbs

    % gpt-postinstall

Using PBS as the example, make sure the batch scheduler commands are in your path (qsub, qstat, pbsnodes).

For PBS, another setup step is required to configure the remote shell for rsh access:


    % cd $GLOBUS_LOCATION/setup/globus

    % ./setup-globus-job-manager-pbs --remote-shell=rsh

The last thing is to define the GRAM and GridFTP file system mapping for PBS.

Done! You have added the PBS scheduler adapters to your GT installation.

Configuring sudo

When the credentials of the service account and the job submitter are different (multi-user mode), GRAM will prepend a call to sudo to the local adapter callout command. Important: If sudo is not configured properly, the command, and thus the job, will fail.

As root, add the following two entries to the /etc/sudoers file for each GLOBUS_LOCATION installation, replacing /opt/globus/GT3.9.5 with the GLOBUS_LOCATION of your installation and username1,username2 with the local accounts of the authorized GRAM users. Each entry is a single logical line; the trailing backslashes continue it across several lines for readability:

   # Globus GRAM entries
   globus  ALL=(username1,username2) \
           NOPASSWD: /opt/globus/GT3.9.5/libexec/globus-gridmap-and-execute \
           /opt/globus/GT3.9.5/libexec/globus-job-manager-script.pl *
   globus  ALL=(username1,username2) \
           NOPASSWD: /opt/globus/GT3.9.5/libexec/globus-gridmap-and-execute \
           /opt/globus/GT3.9.5/libexec/globus-gram-local-proxy-tool *
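When several installations or many user accounts are involved, the two entries can be generated rather than typed. The sketch below is one possible way to do that; the function name gram_sudoers_entries is hypothetical, and the output follows the layout of the entries above with each entry on one physical line. The result should still be reviewed and installed with the usual sudoers tooling (for example, a syntax check with visudo -c).

```python
def gram_sudoers_entries(globus_location, service_account, job_accounts):
    """Build the two GRAM sudoers entries, one physical line per entry."""
    run_as = ",".join(job_accounts)
    wrapper = f"{globus_location}/libexec/globus-gridmap-and-execute"
    targets = ("globus-job-manager-script.pl", "globus-gram-local-proxy-tool")
    return [
        f"{service_account}  ALL=({run_as}) NOPASSWD: {wrapper} "
        f"{globus_location}/libexec/{target} *"
        for target in targets
    ]


entries = gram_sudoers_entries(
    "/opt/globus/GT3.9.5", "globus", ["username1", "username2"]
)
for entry in entries:
    print(entry)
```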
      

Extra steps for non-default installation

Non-default service credentials

Alternate location for host credentials

If setting up host credentials in the default location of /etc/grid-security/containercert.pem and containerkey.pem is not an option for you, then you can configure an alternate location to point to host credentials.

Security descriptor configuration details are here, but the quick change is to edit this file - $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml - by changing the cert and key paths to point to host credentials that the service account owns.

User proxy
To run the container using just a user proxy, comment out the containerSecDesc parameter in the file $GLOBUS_LOCATION/etc/globus_wsrf_core/server-config.wsdd, as follows:
    <!--
        <parameter 
            name="containerSecDesc" 
            value="etc/globus_wsrf_core/global_security_descriptor.xml"/>
     -->
      

Running in personal mode (user proxy), another GRAM configuration setting is required. For GRAM to authorize the RFT service when performing staging functions, it needs to know the subject DN for verification. Here are the steps:

	% cd $GLOBUS_LOCATION/setup/globus
	% ./setup-gram-service-common --staging-subject="/DC=org/DC=doegrids/OU=People/CN=Stuart Martin 564720"
      

You can get your subject DN by running this command:

	% grid-cert-info -subject

Non-default GridFTP server

By default, the GridFTP server is assumed to run as root on localhost:2811. If this is not true for your site then change it by running this command with the proper GridFTP URL values:

	% cd $GLOBUS_LOCATION/setup/globus
	% ./setup-gram-service-common --gridftp-server="gsiftp://gridftp.host.org:1234"
      

Also, the GridFTP host and/or port must be updated by editing the GRAM and GridFTP file system mapping config file: $GLOBUS_LOCATION/etc/gram-service/globus_gram_fs_map_config.xml.
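Because the same server details appear both in the --gridftp-server URL and in the file system mapping file, it can help to derive the mapping's ftpServer element from the URL. The following is a minimal sketch; the function name ftp_server_element is hypothetical, and falling back to port 2811 when the URL has no port is an assumption based on the default mentioned above.

```python
from urllib.parse import urlsplit
import xml.etree.ElementTree as ET


def ftp_server_element(url, default_port=2811):
    """Build an <ftpServer> element from a GridFTP URL."""
    parts = urlsplit(url)
    server = ET.Element("ftpServer")
    ET.SubElement(server, "protocol").text = parts.scheme
    ET.SubElement(server, "host").text = parts.hostname
    # Assumption: use the default GridFTP port when the URL omits one.
    ET.SubElement(server, "port").text = str(parts.port or default_port)
    return server


element = ftp_server_element("gsiftp://gridftp.host.org:1234")
print(ET.tostring(element, encoding="unicode"))
```

The printed element can then be compared with, or pasted into, the matching map entry in globus_gram_fs_map_config.xml.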

Non-default container port

By default, the Globus services assume that the Globus container is using port 8443. However, the container can be run on a non-standard port, for example:

	% globus-start-container -p 4321
      

When doing this, GRAM needs to be told the port to use to contact the RFT service, like so:

	% cd $GLOBUS_LOCATION/setup/globus
	% ./setup-gram-service-common --staging-port="4321"

Non-default gridmap

If you wish to specify a non-standard gridmap file in a multi-user installation, two basic configurations need to be changed:

  • $GLOBUS_LOCATION/etc/globus_wsrf_core/global_security_descriptor.xml
    • As specified in the gridmap config instructions, add a <gridmap value="..."/> element to the file appropriately.
  • /etc/sudoers
    • Add -g /path/to/grid-mapfile as the first argument to all instances of the globus-gridmap-and-execute command.

Example:

global_security_descriptor.xml

    ...

    <gridmap value="/opt/grid-mapfile"/>

    ...
    
sudoers
   ...

   # Globus GRAM entries
   globus  ALL=(username1,username2) \
           NOPASSWD: /opt/globus/GT3.9.5/libexec/globus-gridmap-and-execute \
           -g /opt/grid-mapfile \
           /opt/globus/GT3.9.5/libexec/globus-job-manager-script.pl *
   globus  ALL=(username1,username2) \
           NOPASSWD: /opt/globus/GT3.9.5/libexec/globus-gridmap-and-execute \
           -g /opt/grid-mapfile \
           /opt/globus/GT3.9.5/libexec/globus-gram-local-proxy-tool *

    ...
    

Non-default job resource limit

The current limit on the number of job resources (both exec and multi) allowed to exist at any one time is 1000. This limit was chosen from scalability tests as an appropriate precaution to avoid out-of-memory errors. To change this value to, say, 150, use the setup-gram-service-common script as follows:

	% cd $GLOBUS_LOCATION/setup/globus
	% ./setup-gram-service-common --max-job-limit="150"

Testing

See the WS GRAM User's Guide for information about submitting a test job.

Security Considerations

No special security considerations exist at this time.

Troubleshooting

The job manager detected an invalid script response

  • Check for a restrictive umask. When the service writes the native scheduler job description to a file, an overly restrictive umask will cause the permissions on the file to be such that the submission script run through sudo as the user cannot read the file (bug #2655).
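The effect of the umask on a newly written file can be reproduced in isolation. The sketch below assumes a POSIX system and is not GRAM code; the function names are hypothetical. It creates a file under a given umask and reports whether accounts other than the owner could read it, which is a simplified stand-in for "the job user can read the job description file".

```python
import os
import stat
import tempfile


def mode_under_umask(umask):
    """Create a file requesting mode 0o666 under the given umask; return its mode."""
    previous = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as workdir:
            path = os.path.join(workdir, "job-description")
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        # Always restore the umask of the calling process.
        os.umask(previous)


def readable_by_others(mode):
    """True when the 'other' read bit is set."""
    return bool(mode & stat.S_IROTH)


for umask in (0o022, 0o077):
    mode = mode_under_umask(umask)
    print(oct(umask), oct(mode), readable_by_others(mode))
```

With a umask of 077 the file ends up with mode 600, so a script running as a different user through sudo cannot read it.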

Usage statistics collection by the Globus Alliance

The following usage statistics are sent by default in a UDP packet (in addition to the GRAM component code, packet version, timestamp, and source IP address) at the end of each job (i.e. when Done or Failed state is entered).

  • job creation timestamp (helps determine the rate at which jobs are submitted)
  • scheduler type (Fork, PBS, LSF, Condor, etc.)
  • jobCredentialEndpoint present in RSL flag (to determine if server-side user proxies are being used)
  • fileStageIn present in RSL flag (to determine if the staging in of files is used)
  • fileStageOut present in RSL flag (to determine if the staging out of files is used)
  • fileCleanUp present in RSL flag (to determine if the cleaning up of files is used)
  • CleanUp-Hold requested flag (to determine if streaming is being used)
  • job type (Single, Multiple, MPI, or Condor)
  • gt2 error code if job failed (to determine common scheduler script errors users experience)
  • fault class name if job failed (to determine general classes of common faults users experience)

If you wish to disable this feature, please see the Java WS Core System Administrator's Guide section on Usage Statistics Configuration for instructions.

Also, please see our policy statement on the collection of usage statistics.