Introduction
This guide is intended to help a developer create GRAM5 clients in C. It provides an overview of the concepts and APIs needed to interact with GRAM services.
Table of Contents
- 1. Before you begin
- 2. GRAM5 Concepts for Developers
- 3. Basic GRAM Client Scenarios
- 4. Advanced GRAM Client Scenarios
- 5. Tutorials
- 6. APIs
- 7. RSL Specification v1.1
- 8. Debugging
- 9. Troubleshooting
- 10. Semantics and syntax of protocols
- 11. Related Documentation
- 12. Internal Components
- Glossary
- Index
New features since 5.2.0:
- Better integration with Linux operating systems with native RPM and Debian packages
- Improved logging and integration with system log rotation tools
- Improved scalability and reliability
Other Standard Supported Features
- Remote job execution and management
- Uniform and flexible interface to local resource managers
- File staging before and after job execution
- File and directory clean up after job termination
- Service auditing for each submitted job
Removed Features
- Condor SEG module is no longer included. Its functionality has been moved into the core of the job manager program.
Tested platforms for GRAM5:
Linux
- CentOS 5, 6 x86_64, i386
- Fedora 15, 16 x86_64, i386
- Red Hat Enterprise Linux 5, 6 x86_64, i386
- Scientific Linux 5, 6 x86_64, i386
- Debian 6, 7 (testing) x86_64, i386
- Ubuntu 10.04LTS, 10.10, 11.04, 11.10, 12.04LTS (testing) x86_64, i386
Mac OS X
- Mac OS X 10.7 (Lion)
Solaris
- Solaris 11
Protocol changes in GRAM since GT4 series:
- The GRAM5 service uses a superset of the GRAM2 protocol for communication between the client and service. The extensions supported in GRAM5 are implemented in such a way that they are ignored by GRAM2 services or clients. These extensions provide improved error messages and version detection.
- GRAM5 does not support task coallocation using DUROC and its related protocols. Jobs submitted using DUROC directives will fail.
- GRAM5 does not support file streaming. The standard output and standard error streams are sent after the job completes instead of during execution. As a special case, support for the Condor grid monitor program implements a small subset of the streaming capabilities of GRAM2 in GT 4.2.x.
GRAM5 runs different parts of itself under different privilege levels. The globus-gatekeeper runs as root, and uses its root privilege to access the host's private key. It uses the grid map file to map Grid certificates to local user IDs and then uses the setuid() function to change to that user and execute the globus-job-manager program.
The globus-job-manager program runs as a local non-root account. It receives a delegated limited proxy certificate from the GRAM5 client which it uses to access Grid storage resources via GridFTP and to authenticate job signals (such as client cancel requests), and send job state callbacks to registered clients. This proxy is generally short-lived, and is automatically removed by the job manager when the job completes.
The globus-job-manager program uses a publicly-writable directory for job state files. This directory has the sticky bit set, so users may not remove other users' files. Each file is named by a UUID, so it should be unique.
In the GRAM Client API, all functions that involve sending messages over the network have both blocking and nonblocking variants. These are useful in different programming situations.
The blocking variants, such as the globus_gram_client_job_request function, require less application code, but prevent subsequent instructions from executing until the request has been sent and the reply parsed. In a non-threaded environment, other callback functions registered with the Globus event driver may be invoked while the blocking function is running. In a threaded environment, other events may occur in other threads while the function is blocking, but the current thread will be blocked until the response is parsed.
The nonblocking variants, such as globus_gram_client_register_job_request, require the application to supply a callback function, which the Globus event driver calls when the reply has been parsed. In a non-threaded environment, applications must poll the event driver using functions from the globus_poll or globus_cond_wait families. In a threaded environment, the callback function may be invoked in a different thread from the one calling the non-blocking function, even before the non-blocking function has returned. Application writers must therefore take care to use synchronization primitives such as globus_mutex_t and globus_cond_t correctly when using non-blocking functions.
An application writer should use the non-blocking variants if the application will be submitting many jobs concurrently or requires custom network or security attributes. Using the non-blocking variants allows the Globus event driver to better schedule network I/O in these cases.
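The synchronization pattern described above can be sketched without any Globus dependency. The following is a minimal illustration that assumes POSIX threads in place of globus_mutex_t and globus_cond_t; the sketch_monitor type and its functions are hypothetical names introduced here, not part of the GRAM Client API. Because the waiter tests the done flag while holding the mutex, it behaves correctly even if the callback has already run before the wait begins.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical monitor: a completion flag guarded by a mutex/condition pair */
struct sketch_monitor
{
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             done;    /* predicate: has the callback run? */
    int             result;  /* value reported by the callback */
};

/* Returns 0 on success, -1 if a primitive could not be initialized */
int
sketch_monitor_init(struct sketch_monitor *m)
{
    assert(m != NULL);
    m->done = 0;
    m->result = 0;
    if (pthread_mutex_init(&m->mutex, NULL) != 0)
    {
        return -1;
    }
    if (pthread_cond_init(&m->cond, NULL) != 0)
    {
        pthread_mutex_destroy(&m->mutex);
        return -1;
    }
    return 0;
}

/* Called from the completion callback, possibly in another thread */
void
sketch_monitor_complete(struct sketch_monitor *m, int result)
{
    pthread_mutex_lock(&m->mutex);
    m->result = result;
    m->done = 1;
    pthread_cond_signal(&m->cond);
    pthread_mutex_unlock(&m->mutex);
}

/* Called by the submitting thread; returns the callback's result */
int
sketch_monitor_wait(struct sketch_monitor *m)
{
    int result;

    pthread_mutex_lock(&m->mutex);
    /* Loop on the predicate, never on the signal alone */
    while (!m->done)
    {
        pthread_cond_wait(&m->cond, &m->mutex);
    }
    result = m->result;
    pthread_mutex_unlock(&m->mutex);
    return result;
}

void
sketch_monitor_destroy(struct sketch_monitor *m)
{
    pthread_cond_destroy(&m->cond);
    pthread_mutex_destroy(&m->mutex);
}
```

In a real client, the callback passed to a non-blocking function would call something like sketch_monitor_complete, and the submitting code would call sketch_monitor_wait after registering the request. Note that in a non-threaded Globus build, globus_cond_wait also drives the event loop, which plain POSIX condition variables do not do.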
GRAM uses three types of contact strings to describe how to contact different services. These service contacts are:
Table 2.1. GRAM Contact String Types
| Type | Meaning |
|---|---|
| Gatekeeper Service Contact | This string describes how to contact a gatekeeper service. It is used to submit jobs, send "ping" requests to determine if a service is properly deployed, and send version requests to determine what version of the software is deployed. Full details of the syntax of this contact are given in the next section. |
| Callback Contact | This string is an HTTPS URL that is an endpoint for GRAM job state callbacks. An HTTPS message is posted to this address when the Job Manager detects a job state change. |
| Job Contact | This string is an HTTPS URL that is an endpoint for contacting an existing GRAM job. An HTTPS message is posted to this address to cancel, signal, or query a GRAM job. |
In GRAM5, a Gatekeeper Service Contact
contains the host, port, service name, and service identity
required to contact a particular GRAM service. For convenience,
default values are used when parts of the contact are omitted.
An example of a full gatekeeper service contact is
grid.example.org:2119/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org.
The various forms of the resource name using default values follow:
- HOST
- HOST:PORT
- HOST:PORT/SERVICE
- HOST/SERVICE
- HOST:/SERVICE
- HOST:PORT:SUBJECT
- HOST/SERVICE:SUBJECT
- HOST:/SERVICE:SUBJECT
- HOST:PORT/SERVICE:SUBJECT
Where the various values have the following meaning:
- HOST: Network name of the machine hosting the service.
- PORT: Network port number that the service is listening on. If not specified, the default of 2119 is used.
- SERVICE: Path of the service entry in $GLOBUS_LOCATION/etc/grid-services. If not specified, the default of jobmanager is used.
- SUBJECT: X.509 identity of the credential used by the service. If not specified, the default of host@HOST is used.
Example 2.1. Gatekeeper Service Contact Examples
The following strings all name the service
grid.example.org:2119/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
using the formats with the various defaults described above.
- grid.example.org
- grid.example.org:2119
- grid.example.org:2119/jobmanager
- grid.example.org/jobmanager
- grid.example.org:/jobmanager
- grid.example.org:2119:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
- grid.example.org/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
- grid.example.org:/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
- grid.example.org:2119/jobmanager:/C=US/O=Example/OU=Grid/CN=host/grid.example.org
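To make the defaulting rules concrete, here is a minimal sketch of how a contact string could be split into its four parts. It is an illustration only, not the parser used by the GRAM client library: the sketch_contact structure and sketch_parse_contact function are hypothetical names, the buffer sizes are arbitrary, and the sketch assumes that HOST contains no ':' or '/' characters (so IPv6 literals are not handled) and that SERVICE contains no ':'.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parts of a gatekeeper service contact */
struct sketch_contact
{
    char host[256];
    int  port;
    char service[256];
    char subject[512];
};

/*
 * Split CONTACT into its parts, filling in the documented defaults
 * (port 2119, service "jobmanager", subject "host@HOST").
 * Returns 0 on success, -1 if the string is malformed or too long.
 */
int
sketch_parse_contact(const char *contact, struct sketch_contact *out)
{
    const char *p;
    size_t len;

    assert(contact != NULL && out != NULL);

    /* HOST runs up to the first ':' or '/' and must not be empty */
    len = strcspn(contact, ":/");
    if (len == 0 || len >= sizeof(out->host))
    {
        return -1;
    }
    memcpy(out->host, contact, len);
    out->host[len] = '\0';
    p = contact + len;

    /* Start from the documented defaults */
    out->port = 2119;
    strcpy(out->service, "jobmanager");
    snprintf(out->subject, sizeof(out->subject), "host@%s", out->host);

    /* Optional ":PORT"; PORT may be empty, as in HOST:/SERVICE */
    if (*p == ':')
    {
        p++;
        if (*p >= '0' && *p <= '9')
        {
            char *end;
            long port = strtol(p, &end, 10);

            if (port < 1 || port > 65535)
            {
                return -1;
            }
            out->port = (int) port;
            p = end;
        }
    }
    /* Optional "/SERVICE", which ends at the next ':' */
    if (*p == '/')
    {
        p++;
        len = strcspn(p, ":");
        if (len == 0 || len >= sizeof(out->service))
        {
            return -1;
        }
        memcpy(out->service, p, len);
        out->service[len] = '\0';
        p += len;
    }
    /* Optional ":SUBJECT", which takes the rest of the string */
    if (*p == ':')
    {
        p++;
        len = strlen(p);
        if (len == 0 || len >= sizeof(out->subject))
        {
            return -1;
        }
        memcpy(out->subject, p, len + 1);
        p += len;
    }
    /* Anything left over means the contact did not match a known form */
    return (*p == '\0') ? 0 : -1;
}
```

Passing any of the nine strings from Example 2.1 to this function yields the host grid.example.org, port 2119, and service jobmanager. The subject is either the explicit distinguished name or, when omitted, the default host@grid.example.org.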
GRAM clients can learn about a job's state in two ways: by registering for job state callbacks and by polling for status. These two methods have different performance characteristics and costs.
In order to receive job state callbacks, a client application must
create an HTTPS listener using the
globus_gram_client_callback_allow or
globus_gram_client_info_callback_allow functions.
A non-threaded application must then periodically call a function from
either the globus_cond_wait or
globus_poll families in order to process the
job state callbacks. Additionally, the network must be configured to
allow the GRAM job manager to send messages to the port that the client
is listening on. This may be difficult if there is a firewall between
the client and service.
The GRAM service initiates the job state callbacks, and thus they are usually sent very shortly after the job state changes, so clients can be notified about the state changes quickly.
In order to poll for job states, a client can call either the blocking
or nonblocking variant of the
globus_gram_client_job_status or
globus_gram_client_job_status_with_info functions.
These functions require that the network be configured to allow
the client to contact the network port that the GRAM service is
listening on (the Job Contact).
The client initiates these polling operations, so the client's view of the job state is only as current as its polling frequency allows. If the client polls very often, it will learn of job state changes more quickly, at the cost of increased computing and network load on both the client and service.
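One common way to balance freshness against load is to poll with a delay that grows up to a cap. The sketch below illustrates that idea under stated assumptions: the sketch_state values, the function names, and the doubling policy are all hypothetical and are not part of GRAM. A real client would call globus_gram_client_job_status inside the query function and compare against the GLOBUS_GRAM_PROTOCOL_JOB_STATE_* values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job states for this sketch only */
enum sketch_state { SKETCH_PENDING, SKETCH_ACTIVE, SKETCH_DONE, SKETCH_FAILED };

typedef enum sketch_state (*sketch_query_fn)(void *arg);
typedef unsigned (*sketch_sleep_fn)(unsigned seconds);

/* Double the delay, never exceeding max; a zero delay becomes 1 */
unsigned
sketch_next_delay(unsigned current, unsigned max)
{
    if (current >= max || current > max - current)
    {
        return max;
    }
    return current == 0 ? 1 : current * 2;
}

/*
 * Poll until the job is DONE or FAILED, or until max_polls queries have
 * been made. Returns the number of queries made, or -1 if the job had
 * not reached a terminal state after max_polls queries.
 */
int
sketch_poll_until_terminal(
    sketch_query_fn query, void *arg, sketch_sleep_fn sleep_fn,
    unsigned first_delay, unsigned max_delay, int max_polls,
    enum sketch_state *final_state)
{
    unsigned delay = first_delay;
    int polls;

    assert(query != NULL && sleep_fn != NULL && final_state != NULL);

    for (polls = 1; polls <= max_polls; polls++)
    {
        *final_state = query(arg);
        if (*final_state == SKETCH_DONE || *final_state == SKETCH_FAILED)
        {
            return polls;
        }
        if (polls < max_polls)
        {
            sleep_fn(delay);
            delay = sketch_next_delay(delay, max_delay);
        }
    }
    return -1;
}

/*
 * Stand-ins for a real status query and for sleep(), so that the sketch
 * can run without a GRAM service. The query reports ACTIVE until the
 * counter passed in arg reaches zero, then DONE.
 */
enum sketch_state
sketch_countdown_query(void *arg)
{
    int *remaining = arg;

    if (*remaining > 0)
    {
        (*remaining)--;
        return SKETCH_ACTIVE;
    }
    return SKETCH_DONE;
}

unsigned sketch_slept_seconds = 0;

unsigned
sketch_record_sleep(unsigned seconds)
{
    sketch_slept_seconds += seconds;
    return 0;
}
```

The sleep function has the same signature as POSIX sleep, so a real client could pass sleep from unistd.h in place of the recording stand-in. The cap bounds the worst-case delay in noticing a state change, while the doubling keeps the request rate low for long-running jobs.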
The GRAM5 protocols all use GSSAPIv2 abstractions to provide authentication and authorization. By default, GRAM uses an SSL-based GSSAPI for its security.
The client delegates a credential to the gatekeeper service after authentication, and the GRAM job manager service uses this delegated credential as both a job-specific credential and for subsequent communication with GRAM clients.
If a client or clients submit multiple jobs to a gatekeeper service, they will eventually all be handled by a single job manager process. This process will use whichever delegated credential will remain valid the longest for accepting new connections and connecting to clients to send job state callbacks. When a client delegates a new credential to a job, this credential may also be used as the job manager's credential for future connections.
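The selection rule described above (prefer whichever delegated credential remains valid the longest) can be pictured with a small sketch. This is not the job manager's code: the sketch_credential structure and sketch_longest_lived function are hypothetical, and the sketch assumes that each credential's expiry is already available as a time_t.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record of a delegated credential and its expiry time */
struct sketch_credential
{
    const char *label;      /* e.g. the job it was delegated with */
    time_t      not_after;  /* time at which the credential expires */
};

/*
 * Return the credential that remains valid the longest, ignoring any
 * that have already expired at time NOW. Returns NULL if none is valid.
 */
const struct sketch_credential *
sketch_longest_lived(
    const struct sketch_credential *creds, size_t count, time_t now)
{
    const struct sketch_credential *best = NULL;
    size_t i;

    assert(creds != NULL || count == 0);

    for (i = 0; i < count; i++)
    {
        if (creds[i].not_after <= now)
        {
            continue;   /* already expired */
        }
        if (best == NULL || creds[i].not_after > best->not_after)
        {
            best = &creds[i];
        }
    }
    return best;
}
```

Under this rule, refreshing any one job's credential with a longer-lived proxy can extend the lifetime of the connection credential used for all jobs handled by the same job manager process.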
This chapter contains a series of examples demonstrating how to use different features of the GRAM APIs to interact with the GRAM service. These examples can be compiled by using GNU make with the makefile from Makefile.examples.
This example shows how to use a gatekeeper "ping" request to determine if a service is running and if the client is authorized to contact it. It takes a gatekeeper service contact as its only command-line option. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
int
main(int argc, char *argv[])
{
int rc;
if (argc != 2)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT\n", argv[0]);
rc = 1;
goto out;
}
printf("Pinging GRAM resource: %s\n", argv[1]);
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/*
* Ping the service passed as our first command-line option. If successful,
* this function will return GLOBUS_SUCCESS, otherwise an integer
* error code.
*/
rc = globus_gram_client_ping(argv[1]);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to ping service at %s because %s (Error %d)\n",
argv[1], globus_gram_client_error_string(rc), rc);
}
else
{
printf("Ping successful\n");
}
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_ping_example.c */
This example shows how to use the "version" command to determine what software version a gatekeeper service is running. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include "globus_gram_protocol.h"
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
int rc;
globus_hashtable_t extensions = NULL;
globus_gram_protocol_extension_t * extension_value;
if (argc != 2)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT\n", argv[0]);
rc = 1;
goto out;
}
printf("Checking version of GRAM resource: %s\n", argv[1]);
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/*
* Contact the service passed as our first command-line option and perform
* a version check. If successful,
* this function will return GLOBUS_SUCCESS, otherwise an integer
* error code. Old versions of the job manager will return
* GLOBUS_GRAM_PROTOCOL_ERROR_HTTP_UNPACK_FAILED as they do not support
* the version operation.
*/
rc = globus_gram_client_get_jobmanager_version(argv[1], &extensions);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to get service version from %s because %s "
"(Error %d)\n",
argv[1], globus_gram_client_error_string(rc), rc);
}
else
{
/* The version information is returned in the extensions hash table */
extension_value = globus_hashtable_lookup(
&extensions,
"toolkit-version");
if (extension_value == NULL)
{
printf("Unknown toolkit version\n");
}
else
{
printf("Toolkit Version: %s\n", extension_value->value);
}
extension_value = globus_hashtable_lookup(
&extensions,
"version");
if (extension_value == NULL)
{
printf("Unknown package version\n");
}
else
{
printf("Package Version: %s\n", extension_value->value);
}
/* Free the extensions hash and its values */
globus_gram_protocol_hash_destroy(&extensions);
}
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_version_example.c */
This example shows how to submit a job to a GRAM service. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
int rc;
char * job_contact = NULL;
if (argc != 3)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT RSL\n", argv[0]);
rc = 1;
goto out;
}
printf("Submitting job to GRAM resource: %s\n", argv[1]);
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/*
* Submit the job request to the service passed as our first command-line
* option. If successful, this function will return GLOBUS_SUCCESS,
* otherwise an integer error code.
*/
rc = globus_gram_client_job_request(
argv[1], argv[2], 0, NULL, &job_contact);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to submit job to %s because %s (Error %d)\n",
argv[1], globus_gram_client_error_string(rc), rc);
if (job_contact != NULL)
{
printf("Job Contact: %s\n", job_contact);
}
}
else
{
/* Display job contact string */
printf("Job submit successful: %s\n", job_contact);
}
if (job_contact != NULL)
{
free(job_contact);
}
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_submit_example.c */
This example shows how to submit a job to a GRAM service and then wait until the job reaches the FAILED or DONE state. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
#include <stdlib.h>
struct monitor_t
{
globus_mutex_t mutex;
globus_cond_t cond;
globus_gram_protocol_job_state_t state;
};
/*
* Job State Callback Function
*
* This function is called when the job manager sends job states.
*/
static
void
example_callback(void * callback_arg, char * job_contact, int state,
int errorcode)
{
struct monitor_t * monitor = callback_arg;
globus_mutex_lock(&monitor->mutex);
printf("Old Job State: %d\nNew Job State: %d\n", monitor->state, state);
monitor->state = state;
if (state == GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED ||
state == GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE)
{
globus_cond_signal(&monitor->cond);
}
globus_mutex_unlock(&monitor->mutex);
}
int
main(int argc, char *argv[])
{
int rc;
char * callback_contact = NULL;
char * job_contact = NULL;
struct monitor_t monitor;
if (argc != 3)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT RSL\n", argv[0]);
rc = 1;
goto out;
}
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
rc = globus_mutex_init(&monitor.mutex, NULL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing mutex\n");
goto deactivate;
}
rc = globus_cond_init(&monitor.cond, NULL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing condition variable\n");
goto destroy_mutex;
}
monitor.state = GLOBUS_GRAM_PROTOCOL_JOB_STATE_UNSUBMITTED;
globus_mutex_lock(&monitor.mutex);
/*
* Allow GRAM state change callbacks
*/
rc = globus_gram_client_callback_allow(
example_callback, &monitor, &callback_contact);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error allowing callbacks because %s (Error %d)\n",
globus_gram_client_error_string(rc), rc);
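/* Release the mutex before it is destroyed in the cleanup path */
globus_mutex_unlock(&monitor.mutex);
goto destroy_cond;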
}
/*
* Submit the job request to the service passed as our first command-line
* option.
*/
rc = globus_gram_client_job_request(
argv[1], argv[2],
GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED|
GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE,
callback_contact, &job_contact);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to submit job to %s because %s (Error %d)\n",
argv[1], globus_gram_client_error_string(rc), rc);
/* Job submit failed. Short circuit the while loop below by setting
* the job state to failed
*/
monitor.state = GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED;
}
else
{
/* Display job contact string */
printf("Job submit successful: %s\n", job_contact);
}
/* Wait for job state callback to let us know the job has completed */
while (monitor.state != GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE &&
monitor.state != GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED)
{
globus_cond_wait(&monitor.cond, &monitor.mutex);
}
rc = globus_gram_client_callback_disallow(callback_contact);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error disabling callbacks because %s (Error %d)\n",
globus_gram_client_error_string(rc), rc);
}
globus_mutex_unlock(&monitor.mutex);
if (job_contact != NULL)
{
free(job_contact);
}
destroy_cond:
globus_cond_destroy(&monitor.cond);
destroy_mutex:
globus_mutex_destroy(&monitor.mutex);
deactivate:
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_submit_and_wait_example.c */
This example shows how to check the status of a GRAM job by polling. It takes a job contact as its only command-line option and prints the job's current state. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
int
main(int argc, char *argv[])
{
int rc;
int status = 0;
int failure_code = 0;
if (argc != 2)
{
fprintf(stderr, "Usage: %s JOB-CONTACT\n", argv[0]);
rc = 1;
goto out;
}
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/*
* Check the job status of the job named by the first argument to
* this program.
*/
rc = globus_gram_client_job_status(argv[1], &status, &failure_code);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to check job status because %s (Error %d)\n",
globus_gram_client_error_string(rc), rc);
}
else
{
switch (status)
{
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_UNSUBMITTED:
printf("Unsubmitted\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_STAGE_IN:
printf("StageIn\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING:
printf("Pending\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_ACTIVE:
printf("Active\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_SUSPENDED:
printf("Suspended\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_STAGE_OUT:
printf("StageOut\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE:
printf("Done\n");
break;
case GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED:
printf("Failed (%d)\n", failure_code);
break;
default:
printf("Unknown job state\n");
break;
}
}
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_poll_example.c */
This example shows how to cancel a job being run by a GRAM service. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
int
main(int argc, char *argv[])
{
int rc;
if (argc != 2)
{
fprintf(stderr, "Usage: %s JOB-CONTACT\n", argv[0]);
rc = 1;
goto out;
}
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/*
* Cancel the job named by the first argument to
* this program.
*/
rc = globus_gram_client_job_cancel(argv[1]);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to cancel job because %s (Error %d)\n",
globus_gram_client_error_string(rc), rc);
}
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_cancel_example.c */
This example shows how to refresh a GRAM job's credential after the job has been submitted by some other means. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
int
main(int argc, char *argv[])
{
int rc;
if (argc != 2)
{
fprintf(stderr, "Usage: %s JOB-CONTACT\n", argv[0]);
rc = 1;
goto out;
}
printf("Refreshing Credential for GRAM Job: %s\n", argv[1]);
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/*
* Refresh the credential of the job running at the contact named
* by the first command-line argument to this program. We'll use the
* process's default credential by passing in GSS_C_NO_CREDENTIAL.
*/
rc = globus_gram_client_job_refresh_credentials(
argv[1], GSS_C_NO_CREDENTIAL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to refresh credential for job %s because %s (Error %d)\n",
argv[1], globus_gram_client_error_string(rc), rc);
}
else
{
printf("Refresh successful\n");
}
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_refresh_example.c */
This example shows how to submit a series of GRAM jobs using the
non-blocking function
globus_gram_client_register_job_request and
wait until all submissions have completed. This example throttles
the number of concurrent job submissions to reduce the load
on the service node. The
source to
this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
struct monitor_t
{
globus_mutex_t mutex;
globus_cond_t cond;
int submit_pending;
int successful_submits;
};
#define CONCURRENT_SUBMITS 5
static
void
example_submit_callback(
void * user_callback_arg,
globus_gram_protocol_error_t operation_failure_code,
const char * job_contact,
globus_gram_protocol_job_state_t job_state,
globus_gram_protocol_error_t job_failure_code)
{
struct monitor_t * monitor = user_callback_arg;
globus_mutex_lock(&monitor->mutex);
monitor->submit_pending--;
if (monitor->submit_pending < CONCURRENT_SUBMITS)
{
globus_cond_signal(&monitor->cond);
}
printf("Submitted job %s\n",
job_contact ? job_contact : "UNKNOWN");
if (operation_failure_code == GLOBUS_SUCCESS)
{
monitor->successful_submits++;
}
else
{
printf("submit failed because %s (Error %d)\n",
globus_gram_client_error_string(operation_failure_code),
operation_failure_code);
}
globus_mutex_unlock(&monitor->mutex);
}
int
main(int argc, char *argv[])
{
int rc;
int i;
struct monitor_t monitor;
if (argc < 3)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT RSL-SPEC...\n",
argv[0]);
rc = 1;
goto out;
}
printf("Submitting %d jobs to %s\n", argc-2, argv[1]);
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
rc = globus_mutex_init(&monitor.mutex, NULL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing mutex %d\n", rc);
goto deactivate;
}
rc = globus_cond_init(&monitor.cond, NULL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing condition variable %d\n", rc);
goto destroy_mutex;
}
monitor.submit_pending = 0;
monitor.successful_submits = 0;
/* Submits jobs from argv[2] until end of the argv array. At most
* CONCURRENT_SUBMITS will be pending at any given time.
*/
globus_mutex_lock(&monitor.mutex);
for (i = 2; i < argc; i++)
{
/* This throttles the number of concurrent job submissions */
while (monitor.submit_pending >= CONCURRENT_SUBMITS)
{
globus_cond_wait(&monitor.cond, &monitor.mutex);
}
/* When the job has been submitted, the example_submit_callback
* will be called, either from another thread or from a
* globus_cond_wait in a nonthreaded build
*/
rc = globus_gram_client_register_job_request(
argv[1], argv[i], 0, NULL, NULL, example_submit_callback,
&monitor);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to submit job %s because %s (Error %d)\n",
argv[i], globus_gram_client_error_string(rc), rc);
}
else
{
monitor.submit_pending++;
}
}
/* Wait until the example_submit_callback function has been called for
* each job submission
*/
while (monitor.submit_pending > 0)
{
globus_cond_wait(&monitor.cond, &monitor.mutex);
}
globus_mutex_unlock(&monitor.mutex);
printf("Submitted %d jobs (%d successfully)\n",
argc-2, monitor.successful_submits);
globus_cond_destroy(&monitor.cond);
destroy_mutex:
globus_mutex_destroy(&monitor.mutex);
deactivate:
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_nonblocking_submit_example.c */
This example shows how to submit a job and delegate a full credential to the job. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module functions
* and GRAM Client API functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include <stdio.h>
struct monitor_t
{
globus_mutex_t mutex;
globus_cond_t cond;
globus_bool_t done;
};
static
void
example_submit_callback(
void * user_callback_arg,
globus_gram_protocol_error_t operation_failure_code,
const char * job_contact,
globus_gram_protocol_job_state_t job_state,
globus_gram_protocol_error_t job_failure_code)
{
struct monitor_t * monitor = user_callback_arg;
globus_mutex_lock(&monitor->mutex);
monitor->done = GLOBUS_TRUE;
globus_cond_signal(&monitor->cond);
if (operation_failure_code == GLOBUS_SUCCESS)
{
printf("Submitted job %s\n",
job_contact ? job_contact : "UNKNOWN");
}
else
{
printf("submit failed because %s (Error %d)\n",
globus_gram_client_error_string(operation_failure_code),
operation_failure_code);
}
globus_mutex_unlock(&monitor->mutex);
}
int
main(int argc, char *argv[])
{
int rc;
globus_gram_client_attr_t attr;
struct monitor_t monitor;
if (argc < 3)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT RSL-SPEC...\n",
argv[0]);
rc = 1;
goto out;
}
printf("Submitting job to %s with full proxy\n", argv[1]);
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
rc = globus_mutex_init(&monitor.mutex, NULL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing mutex %d\n", rc);
goto deactivate;
}
rc = globus_cond_init(&monitor.cond, NULL);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing condition variable %d\n", rc);
goto destroy_mutex;
}
monitor.done = GLOBUS_FALSE;
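/* Initialize attribute so that we can set the delegation attribute */
rc = globus_gram_client_attr_init(&attr);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error initializing attribute because %s (Error %d)\n",
globus_gram_client_error_string(rc), rc);
goto destroy_cond;
}
/* Set the proxy attribute */
rc = globus_gram_client_attr_set_delegation_mode(
attr,
GLOBUS_IO_SECURE_DELEGATION_MODE_FULL_PROXY);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error setting delegation mode because %s (Error %d)\n",
globus_gram_client_error_string(rc), rc);
goto destroy_attr;
}
/* Submit the job rsl from argv[2]
*/
globus_mutex_lock(&monitor.mutex);
/* When the job has been submitted, the example_submit_callback
* will be called, either from another thread or from a
* globus_cond_wait in a nonthreaded build
*/
rc = globus_gram_client_register_job_request(
argv[1], argv[2], 0, NULL, attr, example_submit_callback,
&monitor);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to submit job %s because %s (Error %d)\n",
argv[2], globus_gram_client_error_string(rc), rc);
/* The callback will never be called, so do not wait for it */
monitor.done = GLOBUS_TRUE;
}
/* Wait until the example_submit_callback function has been called for
* the job submission
*/
while (!monitor.done)
{
globus_cond_wait(&monitor.cond, &monitor.mutex);
}
globus_mutex_unlock(&monitor.mutex);
destroy_attr:
/* Free the attribute now that the request has completed */
globus_gram_client_attr_destroy(&attr);
destroy_cond:
globus_cond_destroy(&monitor.cond);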
destroy_mutex:
globus_mutex_destroy(&monitor.mutex);
deactivate:
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
rc = globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_attr_example.c */
This example shows how to programmatically add environment variable definitions to an RSL prior to submitting a job. The source to this example can be downloaded.
/*
* These headers contain declarations for the globus_module,
* the GRAM Client, RSL, and protocol functions
*/
#include "globus_common.h"
#include "globus_gram_client.h"
#include "globus_rsl.h"
#include "globus_gram_protocol.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
static
int
example_rsl_attribute_match(void * datum, void * arg)
{
const char * relation_attribute = globus_rsl_relation_get_attribute(datum);
const char * attribute = arg;
/* RSL attributes are case-insensitive */
return (relation_attribute &&
strcasecmp(relation_attribute, attribute) == 0);
}
int
main(int argc, char *argv[])
{
int rc;
globus_rsl_t *rsl, *environment_relation;
globus_rsl_value_t *new_env_pair = NULL;
globus_list_t *environment_relation_node;
char * rsl_string;
char * job_contact = NULL;
if (argc != 3)
{
fprintf(stderr, "Usage: %s RESOURCE-MANAGER-CONTACT RSL\n", argv[0]);
rc = 1;
goto out;
}
/*
* Always activate the GLOBUS_GRAM_CLIENT_MODULE prior to using any
* functions from the GRAM Client API or behavior is undefined.
*/
rc = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Error activating %s because %s (Error %d)\n",
GLOBUS_GRAM_CLIENT_MODULE->module_name,
globus_gram_client_error_string(rc),
rc);
goto out;
}
/* Parse the RSL string into a syntax tree */
rsl = globus_rsl_parse(argv[2]);
if (rsl == NULL)
{
rc = 1;
fprintf(stderr, "Error parsing RSL string\n");
goto deactivate;
}
/* Create the new environment variable pair that we'll insert
* into the RSL. We'll start by making an empty sequence
*/
new_env_pair = globus_rsl_value_make_sequence(NULL);
if (new_env_pair == NULL)
{
fprintf(stderr, "Error creating value sequence\n");
rc = 1;
goto free_rsl;
}
/* Then insert the name-value pair in reverse order */
rc = globus_list_insert(
globus_rsl_value_sequence_get_list_ref(new_env_pair),
globus_rsl_value_make_literal(
strdup("itsvalue")));
if (rc != GLOBUS_SUCCESS)
{
goto free_env_pair;
}
rc = globus_list_insert(
globus_rsl_value_sequence_get_list_ref(new_env_pair),
globus_rsl_value_make_literal(
strdup("EXAMPLE_ENVIRONMENT_VARIABLE")));
if (rc != GLOBUS_SUCCESS)
{
goto free_env_pair;
}
/* Now, check to see if the RSL already contains an environment
* attribute.
*/
environment_relation_node = globus_list_search_pred(
globus_rsl_boolean_get_operand_list(rsl),
example_rsl_attribute_match,
GLOBUS_GRAM_PROTOCOL_ENVIRONMENT_PARAM);
if (environment_relation_node == NULL)
{
/* Not present yet, create a new relation and insert it into
* the RSL.
*/
environment_relation = globus_rsl_make_relation(
GLOBUS_RSL_EQ,
strdup(GLOBUS_GRAM_PROTOCOL_ENVIRONMENT_PARAM),
globus_rsl_value_make_sequence(NULL));
rc = globus_list_insert(
globus_rsl_boolean_get_operand_list_ref(rsl),
environment_relation);
if (rc != GLOBUS_SUCCESS)
{
globus_rsl_free_recursive(environment_relation);
goto free_env_pair;
}
}
else
{
/* Pull the environment relation out of the node returned from the
* search function
*/
environment_relation = globus_list_first(environment_relation_node);
}
/* Add the new environment binding to the value sequence associated with
* the environment relation
*/
rc = globus_list_insert(
globus_rsl_value_sequence_get_list_ref(
globus_rsl_relation_get_value_sequence(environment_relation)),
new_env_pair);
if (rc != GLOBUS_SUCCESS)
{
goto free_env_pair;
}
new_env_pair = NULL;
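/* Convert the RSL parse tree to a string */
rsl_string = globus_rsl_unparse(rsl);
if (rsl_string == NULL)
{
fprintf(stderr, "Error converting RSL to a string\n");
rc = 1;
goto free_env_pair;
}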
/*
* Submit the augmented RSL to the service passed as our first command-line
* option. If successful, this function will return GLOBUS_SUCCESS,
* otherwise an integer error code.
*/
rc = globus_gram_client_job_request(
argv[1],
rsl_string,
0,
NULL,
&job_contact);
if (rc != GLOBUS_SUCCESS)
{
fprintf(stderr, "Unable to submit job to %s because %s (Error %d)\n",
argv[1], globus_gram_client_error_string(rc), rc);
}
else
{
printf("Job submitted successfully: %s\n", job_contact);
}
free(rsl_string);
if (job_contact)
{
free(job_contact);
}
free_env_pair:
if (new_env_pair != NULL)
{
globus_rsl_value_free_recursive(new_env_pair);
}
free_rsl:
globus_rsl_free_recursive(rsl);
deactivate:
/*
* Deactivating the module allows it to free memory and close network
* connections.
*/
/* Keep rc from the earlier steps so that failures reach the exit status */
globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
out:
return rc;
}
/* End of gram_rsl_example.c */
The following tutorials are available for GRAM5 developers:
Table 6.1. GRAM Client APIs
| Name | Purpose |
|---|---|
| GRAM Protocol | Low-level functions for processing GRAM protocol messages. Symbolic constants for RSL attributes, signals, and job states. |
| GRAM Client | Functions for submitting job requests, sending signals, and listening for job state updates. |
| RSL | Functions for parsing and manipulating job specifications in the RSL language. |
This document specifies the existing RSL v1.0 implementation and interfaces as they are provided in the GT 5.2.1 release. It serves as a reference rather than as introductory text.
The Globus Resource
Specification Language (RSL) provides a common interchange language to
describe resources. The various components of the Globus Resource
Management architecture manipulate RSL strings to perform their management
functions in cooperation with the other components in the system. The RSL
provides the skeletal syntax used to compose complicated resource
descriptions, and the various resource management components introduce
specific <ATTRIBUTE,VALUE>
pairings into this common structure. Each attribute in a resource
description serves as a parameter to control the behavior of one or more
components in the resource management
system.
The core element of the RSL syntax is the relation.
Relations associate an attribute name with a value, e.g., the relation
executable=a.out provides the name of an executable in a
resource request. There are two generative syntactic structures
in the RSL that are used to build more complicated resource descriptions
out of the basic relations: compound requests and
value sequences. In addition, the RSL
syntax includes a facility to both introduce and dereference string
substitution variables.
The simplest form of compound request, utilized by all resource management components, is the conjunct-request. The conjunct-request expresses a conjunction of simple relations or compound requests (like a boolean AND). The most common conjunct-request in Globus RSL strings is the combination of multiple relations such as executable name, node count, executable arguments, and output files for a basic GRAM job request. Similarly, the core RSL syntax includes a disjunct-request form to represent disjunctive relations (like a boolean OR). Currently, however, no resource management component utilizes the disjunct-request form.
The last form of compound request is the multi-request.
The multi-request expresses multiple parallel resources that
make up a resource description. The multi-request form differs
from the conjunction and disjunction in two ways: multi-requests introduce
new variable scope, meaning variables defined in one clause of a
multi-request are not visible to the other clauses, and multi-requests
introduce a non-reducible hierarchy to the resource
description. Whereas relations within a conjunct-request can be thought of
as constraints on the resource being described, the
subclauses of a
multi-request are best thought of as individual resource descriptions that
together constitute an abstract
resource collection; the same attributes may be
constrained in different ways in each
subclause without causing a logical contradiction. An example of a
contradiction would be to constrain the executable
attribute to be two conflicting values within a conjunction.
The simplest form of value in the RSL syntax is the string literal. When explicitly quoted, literals can contain any character, and many common literals that don't contain special characters can appear without quotes. Values can also be variable references, in which case the variable reference is in essence replaced with the string value defined for that variable. RSL descriptions can also express string-concatenation of values, especially useful to construct long strings out of several variable references. String concatenation is supported with both an explicit concatenation operator and implicit concatenation for many idiomatic constructions involving variable references and literals.
In addition to the simple value forms given above, the RSL syntax includes the value sequence to express ordered sets of values. The value sequence syntax is used primarily for defining variables and for providing the argument list for a program.
Each RSL string consists of a sequence of RSL tokens, whitespace, and comments. The RSL tokens are either special syntax or regular unquoted literals, where special syntax contains one or more of the following listed special characters and unquoted literals are made of sequences of characters excluding the special characters.
The complete set of special characters that cannot appear as part of an unquoted literal is:
- + (plus)
- & (ampersand)
- | (pipe)
- ( (left paren)
- ) (right paren)
- = (equal)
- < (left angle)
- > (right angle)
- ! (exclamation)
- " (double quote)
- ' (apostrophe)
- ^ (caret)
- # (pound)
- $ (dollar)
These characters can only be used in the special syntactic forms described in this chapter or within quoted literals.
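The rule above can be illustrated with a short C sketch that decides whether a value may be written as an unquoted literal. The helper name `rsl_can_be_unquoted` is hypothetical and is not part of the Globus RSL API. Rejecting whitespace and the empty string is an assumption drawn from the tokenization description, not a statement about how the parser is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The special characters that cannot appear in an unquoted literal. */
static const char rsl_special_chars[] = "+&|()=<>!\"'^#$";

/*
 * Hypothetical helper (not part of the Globus RSL API): returns 1 if s
 * contains no special character and no whitespace, 0 otherwise.
 */
static int
rsl_can_be_unquoted(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
    {
        /* Assumption: an empty value has to be written as a quoted literal */
        return 0;
    }
    for (; *s != '\0'; s++)
    {
        if (strchr(rsl_special_chars, *s) != NULL ||
            isspace((unsigned char) *s))
        {
            return 0;
        }
    }
    return 1;
}
```

A client that builds RSL strings from user input could call such a check on each value and fall back to a quoted literal whenever it returns 0.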
Quoted literals are introduced with the "
(double quote) or ' (single quote/apostrophe)
and consist of all the characters up to (but not including) the next solo
double or single quote, respectively. To escape a quote character within a
quoted literal, the appearance of the quote character twice in a row is
converted to a single instance of the character and the literal continues
until the next solo quote character. For any quoted literal, there is only
one possible escape sequence; e.g., within a literal delimited by the single
quote character only the single quote character uses the escape notation
and the double quote character can appear without escape.
Quoted literals can also be introduced with an alternate
user delimiter notation. User delimited literals are
introduced with the ^ (caret) character followed
immediately by a user-provided delimiter; the literal consists of all
the characters after the user's delimiter up to (but not including) the
next solo instance of the delimiter. The delimiter itself may be escaped
within the literal by providing two instances in a row, just as the regular
quote delimiters are escaped in regular quoted literals.
RSL string comments use a notation similar to comments in the C programming
language. Comments are introduced by the prefix (*.
Comments continue to the first
terminating suffix *) and cannot be nested. Comments are
stripped from the RSL string during processing and are syntactically
equivalent to whitespace.
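The comment rules can be sketched in C as follows. The function `rsl_strip_comments` is a hypothetical illustration, not the Globus parser: it replaces each comment with a single space and leaves the contents of double-quoted and single-quoted literals untouched. Literals introduced with the ^ user-delimiter notation are not handled in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src to dst, replacing each (* ... *) comment
 * with one space. Text inside "..." or '...' literals is copied unchanged.
 * dst must have room for strlen(src) + 1 bytes.
 */
static void
rsl_strip_comments(const char *src, char *dst)
{
    char quote = '\0'; /* active quote delimiter, '\0' outside literals */

    assert(src != NULL && dst != NULL);
    while (*src != '\0')
    {
        if (quote != '\0')
        {
            /* A doubled quote closes and immediately reopens the literal,
             * so the escape sequence needs no special case here.
             */
            if (*src == quote)
            {
                quote = '\0';
            }
            *dst++ = *src++;
        }
        else if (*src == '"' || *src == '\'')
        {
            quote = *src;
            *dst++ = *src++;
        }
        else if (strncmp(src, "(*", 2) == 0)
        {
            /* Skip to the first terminating suffix; comments do not nest */
            src += 2;
            while (*src != '\0' && strncmp(src, "*)", 2) != 0)
            {
                src++;
            }
            if (*src != '\0')
            {
                src += 2;
            }
            *dst++ = ' ';
        }
        else
        {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Because every comment is at least two characters long and is replaced by one space, the output never exceeds the input length.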
Example 7.1. Quoted Literal Examples
Assign the value Hello. Welcome to "The Grid" to
the attribute arguments, using double-quote as the
delimiter and the escaping sequence.
arguments = "Hello. Welcome to ""The Grid"""
Assign the value Hello. Welcome to "The Grid" to
the attribute arguments using the single-quote delimiter.
arguments = 'Hello. Welcome to "The Grid"'
Assign the value Hello. Welcome to "The Grid" to
the attribute arguments using a user-defined quoting
character !.
arguments = ^!Hello. Welcome to "The Grid"!
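The doubling escape used in the first example can be produced mechanically. The following C sketch shows one way to do so; `rsl_quote_literal` is a hypothetical helper, not a function of the Globus RSL library, and it covers only the double-quote and single-quote delimiters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated RSL quoted literal for
 * value, delimited by quote ('"' or '\''). Each occurrence of the
 * delimiter inside value is doubled. The caller frees the result.
 * Returns NULL if memory cannot be allocated.
 */
static char *
rsl_quote_literal(const char *value, char quote)
{
    char *result;
    char *out;

    assert(value != NULL);
    assert(quote == '"' || quote == '\'');

    /* Worst case: every character is doubled, plus two delimiters and NUL */
    result = malloc(2 * strlen(value) + 3);
    if (result == NULL)
    {
        return NULL;
    }
    out = result;
    *out++ = quote;
    for (; *value != '\0'; value++)
    {
        if (*value == quote)
        {
            *out++ = quote; /* escape the delimiter by doubling it */
        }
        *out++ = *value;
    }
    *out++ = quote;
    *out = '\0';
    return result;
}
```

Quoting the value from the examples above with `'"'` as the delimiter yields the escaped form shown in the first example, while using `'\''` leaves the embedded double quotes unchanged.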
RSL strings can introduce and reference string variables. String
substitution variables are defined in a special relation using the
rsl_substitution attribute, and the definitions affect
variable references made in the same conjunct-request (or
disjunct-request), as well as references made within any multi-request
nested inside one of the clauses of the conjunction (or disjunction). Each
multi-request introduces a new variable scope for each subrequest, and
variable definitions do not escape the closest enclosing scope.
Within any given scope, variable definitions are processed left-to-right in the resource description. Outermost scopes are processed before inner scopes, and the definitions in inner scopes augment the inherited definitions with new and/or updated variable definitions.
Variable definitions and variable references are processed in a single pass, with each definition updating the environment prior to processing the next definition. The value provided in a variable definition may include a reference to a previously-defined variable. References to variables that are not yet provided with definitions in the standard RSL variable processing order are replaced with an empty literal string.
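The single-pass processing order described above can be sketched in C. This is a simplified model, not the Globus implementation: the types `rsl_var` and `rsl_env` and the functions `rsl_lookup`, `rsl_expand`, and `rsl_define` are hypothetical, values are treated as plain strings, and quoting, the explicit concatenation operator, and nested scopes are left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 16
#define MAX_LEN 256

struct rsl_var { char name[MAX_LEN]; char value[MAX_LEN]; };
struct rsl_env { struct rsl_var vars[MAX_VARS]; int count; };

/* Find a variable; the newest definition wins, undefined yields "" */
static const char *
rsl_lookup(const struct rsl_env *env, const char *name, size_t name_len)
{
    int i;

    for (i = env->count - 1; i >= 0; i--)
    {
        if (strlen(env->vars[i].name) == name_len &&
            strncmp(env->vars[i].name, name, name_len) == 0)
        {
            return env->vars[i].value;
        }
    }
    return "";
}

/* Expand each $(NAME) in in; out holds MAX_LEN bytes. 0 on success. */
static int
rsl_expand(const struct rsl_env *env, const char *in, char *out)
{
    size_t used = 0;

    assert(env != NULL && in != NULL && out != NULL);
    while (*in != '\0')
    {
        if (in[0] == '$' && in[1] == '(')
        {
            const char *end = strchr(in + 2, ')');
            const char *value;
            size_t value_len;

            if (end == NULL)
            {
                return -1; /* unterminated reference */
            }
            value = rsl_lookup(env, in + 2, (size_t) (end - (in + 2)));
            value_len = strlen(value);
            if (used + value_len >= MAX_LEN)
            {
                return -1; /* result does not fit */
            }
            memcpy(out + used, value, value_len);
            used += value_len;
            in = end + 1;
        }
        else
        {
            if (used + 1 >= MAX_LEN)
            {
                return -1;
            }
            out[used++] = *in++;
        }
    }
    out[used] = '\0';
    return 0;
}

/* Add a definition; its value may refer to earlier definitions only */
static int
rsl_define(struct rsl_env *env, const char *name, const char *value)
{
    struct rsl_var *var;

    assert(env != NULL && name != NULL && value != NULL);
    if (env->count >= MAX_VARS || strlen(name) >= MAX_LEN)
    {
        return -1;
    }
    var = &env->vars[env->count];
    if (rsl_expand(env, value, var->value) != 0)
    {
        return -1;
    }
    strcpy(var->name, name);
    env->count++;
    return 0;
}
```

Defining TOPDIR and then a DATADIR that refers to `$(TOPDIR)` resolves the reference at definition time, and a reference to a name that has no definition yet expands to the empty string, mirroring the rules above.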
The RSL syntax is extensible because it defines structure without too many keywords. Each Globus resource management component introduces additional attributes to the set recognized by RSL-aware components, so it is difficult to provide a complete listing of attributes which might appear in a resource description. Resource management components are designed to utilize attributes they recognize and pass unrecognized relations through unchanged. This allows powerful compositions of different resource management functions.
The following listing summarizes the attribute names utilized by existing resource management components in the standard Globus release. Please see the individual component documentation for discussion of the attribute semantics.
Name
rsl — GRAM5 RSL Attributes
Description
- arguments: The command line arguments for the executable. Use quotes if a space is required in a single argument.
- count: The number of executions of the executable. [Default: 1]
- directory: Specifies the path of the directory the jobmanager will use as the default directory for the requested job. [Default: $(HOME)]
- dry_run: If dryrun = yes then the jobmanager will not submit the job for execution and will return success. [Default: no]
- environment: The environment variables that will be defined for the executable in addition to the default set that is given to the job by the jobmanager.
- executable: The name of the executable file to run on the remote machine. If the value is a GASS URL, the file is transferred to the remote gass cache before executing the job and removed after the job has terminated.
- expiration: Time (in seconds) after a job fails to receive a two-phase commit end signal before it is cleaned up. [Default: 14400]
- file_clean_up: Specifies a list of files which will be removed after the job is completed.
- file_stage_in: Specifies a list of ("remote URL" "local file") pairs which indicate files to be staged to the nodes which will run the job.
- file_stage_in_shared: Specifies a list of ("remote URL" "local file") pairs which indicate files to be staged into the cache. A symlink from the cache to the "local file" path will be made.
- file_stage_out: Specifies a list of ("local file" "remote URL") pairs which indicate files to be staged from the job to a GASS-compatible file server.
- gass_cache: Specifies location to override the GASS cache location.
- gram_my_job: Obsolete and ignored. [Default: collective]
- host_count: Only applies to clusters of SMP computers, such as newer IBM SP systems. Defines the number of nodes ("pizza boxes") to distribute the "count" processes across.
- job_type: This specifies how the jobmanager should start the job. Possible values are single (even if the count > 1, only start 1 process or thread), multiple (start count processes or threads), mpi (use the appropriate method (e.g. mpirun) to start a program compiled with a vendor-provided MPI library. Program is started with count nodes), and condor (starts condor jobs in the "condor" universe.) [Default: multiple]
- library_path: Specifies a list of paths to be appended to the system-specific library path environment variables. [Default: $(GLOBUS_LOCATION)/lib]
- loglevel: Override the default log level for this job. The value of this attribute consists of a combination of the strings FATAL, ERROR, WARN, INFO, DEBUG, TRACE joined by the | character.
- logpattern: Override the default log path pattern for this job. The value of this attribute is a string (potentially containing RSL substitutions) that is evaluated to the path to write the log to. If the resulting string contains the string $(DATE) (or any other RSL substitution), it will be reevaluated at log time.
- max_cpu_time: Explicitly set the maximum cputime for a single execution of the executable. The value is in minutes. The value will go through an atoi() conversion in order to get an integer. If the GRAM scheduler cannot set cputime, then an error will be returned.
- max_memory: Explicitly set the maximum amount of memory for a single execution of the executable. The value is in megabytes. The value will go through an atoi() conversion in order to get an integer. If the GRAM scheduler cannot set maxMemory, then an error will be returned.
- max_time: The maximum walltime or cputime for a single execution of the executable. Walltime or cputime is selected by the GRAM scheduler being interfaced. The value is in minutes. The value will go through an atoi() conversion in order to get an integer.
- max_wall_time: Explicitly set the maximum walltime for a single execution of the executable. The value is in minutes. The value will go through an atoi() conversion in order to get an integer. If the GRAM scheduler cannot set walltime, then an error will be returned.
- min_memory: Explicitly set the minimum amount of memory for a single execution of the executable. The value is in megabytes. The value will go through an atoi() conversion in order to get an integer. If the GRAM scheduler cannot set minMemory, then an error will be returned.
- project: Target the job to be allocated to a project account as defined by the scheduler at the defined (remote) resource.
- proxy_timeout: Obsolete and ignored. Now a job-manager-wide setting.
- queue: Target the job to a queue (class) name as defined by the scheduler at the defined (remote) resource.
- remote_io_url: Writes the given value (a URL base string) to a file, and adds the path to that file to the environment through the GLOBUS_REMOTE_IO_URL environment variable. If this is specified as part of a job restart RSL, the job manager will update the file's contents. This is intended for jobs that want to access files via GASS, but the URL of the GASS server has changed due to a GASS server restart.
- restart: Start a new job manager, but instead of submitting a new job, start managing an existing job. The job manager will search for the job state file created by the original job manager. If it finds the file and successfully reads it, it will become the new manager of the job, sending callbacks on status and streaming stdout/err if appropriate. It will fail if it detects that the old jobmanager is still alive (via a timestamp in the state file). If stdout or stderr was being streamed over the network, new stdout and stderr attributes can be specified in the restart RSL and the jobmanager will stream to the new locations (useful when output is going to a GASS server started by the client that's listening on a dynamic port, and the client was restarted). The new job manager will return a new contact string that should be used to communicate with it. If a jobmanager is restarted multiple times, any of the previous contact strings can be given for the restart attribute.
- rsl_substitution: Specifies a list of values which can be substituted into other rsl attributes' values through the $(SUBSTITUTION) mechanism.
- save_state: Causes the jobmanager to save its job state information to a persistent file on disk. If the job manager exits or is suspended, the client can later start up a new job manager which can continue monitoring the job.
- savejobdescription: Save a copy of the job description to $HOME. [Default: no]
- scratch_dir: Specifies the location to create a scratch subdirectory in. A SCRATCH_DIRECTORY RSL substitution will be filled with the name of the directory which is created.
- stderr: The name of the remote file to store the standard error from the job. If the value is a GASS URL, the standard error from the job is transferred dynamically during the execution of the job. There are two accepted forms of this value. It can consist of a single destination: stderr = URL, or a sequence of destinations: stderr = (DESTINATION) (DESTINATION). In the latter case, the DESTINATION may itself be a URL or a sequence of an x-gass-cache URL followed by a cache tag. [Default: /dev/null]
- stderr_position: Specifies where in the file remote standard error streaming should be restarted from. Must be 0.
- stdin: The name of the file to be used as standard input for the executable on the remote machine. If the value is a GASS URL, the file is transferred to the remote gass cache before executing the job and removed after the job has terminated. [Default: /dev/null]
- stdout: The name of the remote file to store the standard output from the job. If the value is a GASS URL, the standard output from the job is transferred dynamically during the execution of the job. There are two accepted forms of this value. It can consist of a single destination: stdout = URL, or a sequence of destinations: stdout = (DESTINATION) (DESTINATION). In the latter case, the DESTINATION may itself be a URL or a sequence of an x-gass-cache URL followed by a cache tag. [Default: /dev/null]
- stdout_position: Specifies where in the file remote output streaming should be restarted from. Must be 0.
- two_phase: Use a two-phase commit for job submission and completion. The job manager will respond to the initial job request with a WAITING_FOR_COMMIT error. It will then wait for a signal from the client before doing the actual job submission. The integer supplied is the number of seconds the job manager should wait before timing out. If the job manager times out before receiving the commit signal, or if a client issues a cancel signal, the job manager will clean up the job's files and exit, sending a callback with the job status as GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED. After the job manager sends a DONE or FAILED callback, it will wait for a commit signal from the client. If it receives one, it cleans up and exits as usual. If it times out and save_state was enabled, it will leave all of the job's files in place and exit (assuming the client is down and will attempt a job restart later). The timeout value can be extended via a signal. When one of the following errors occurs, the job manager does not delete the job state file when it exits: GLOBUS_GRAM_PROTOCOL_ERROR_COMMIT_TIMED_OUT, GLOBUS_GRAM_PROTOCOL_ERROR_TTL_EXPIRED, GLOBUS_GRAM_PROTOCOL_ERROR_JM_STOPPED, GLOBUS_GRAM_PROTOCOL_ERROR_USER_PROXY_EXPIRED. In these cases, it cannot be restarted, so the job manager will not wait for the commit signal after sending the FAILED callback.
- username: Verify that the job is running as this user.
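Several of the attributes above state that their value goes through an atoi() conversion. In standard C, atoi() ignores trailing non-digit characters and returns 0 when no digits are found, so a value such as 1h would be read as 1. A client may therefore want to validate such values before building the RSL. The following sketch shows one way to do that with strtol(); the helper `rsl_parse_minutes` is hypothetical and is not part of any Globus API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical client-side check: store the value in *minutes and return 0
 * only if text is a non-negative decimal integer with nothing after it.
 * Returns -1 otherwise. Like atoi(), strtol() skips leading whitespace.
 */
static int
rsl_parse_minutes(const char *text, int *minutes)
{
    char *end = NULL;
    long value;

    assert(text != NULL && minutes != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        value < 0 || value > INT_MAX)
    {
        return -1;
    }
    *minutes = (int) value;
    return 0;
}
```

With this check, a value such as 90 is accepted, while 1h or abc is rejected on the client side instead of being silently converted.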
The following are some simple example RSL strings to illustrate idiomatic usage with existing tools and to make concrete some of the more interesting cases of tokenization, concatenation, and variable semantics. These are meant to illustrate the use of the RSL notation without much regard for the specific details of a particular resource management component.
Typical GRAM5 resource descriptions contain at least a few relations in a conjunction:
Example 7.2. GRAM5 Job Request Examples
This example shows a conjunct request containing values that are unquoted literals and ordered sequences of a mix of quoted and unquoted literals.
(* this is a comment *)
& (executable = a.out (* <-- that is an unquoted literal *))
(directory = /home/nobody )
(arguments = arg1 "arg 2")
(count = 1)
This example demonstrates RSL substitutions, which can be used to make sure a string is used consistently multiple times in a resource description:
& (rsl_substitution = (TOPDIR "/home/nobody")
(DATADIR $(TOPDIR)"/data")
(EXECDIR $(TOPDIR)/bin) )
(executable = $(EXECDIR)/a.out
(* ^-- implicit concatenation *))
(directory = $(TOPDIR) )
(arguments = $(DATADIR)/file1
(* ^-- implicit concatenation *)
$(DATADIR) # /file2
(* ^-- explicit concatenation *)
'$(FOO)' (* <-- a quoted literal *))
(environment = (DATADIR $(DATADIR)))
(count = 1)
Performing all variable substitution and removing comments yields an equivalent RSL string:
& (rsl_substitution = (TOPDIR "/home/nobody")
(DATADIR "/home/nobody/data")
(EXECDIR "/home/nobody/bin") )
(executable = "/home/nobody/bin/a.out" )
(directory = "/home/nobody" )
(arguments = "/home/nobody/data/file1"
"/home/nobody/data/file2"
"$(FOO)" )
(environment = (DATADIR "/home/nobody/data"))
(count = 1)
Note in the above variable-substitution example, the variable
substitution definitions are not automatically made a part of the job's
environment. An explicit environment attribute must be
used to add environment variables for the job. Also note that the third
value in the arguments clause is not a variable reference but
only a quoted literal that happens to contain one of the special characters.
The following is a modified BNF grammar for the Resource
Specification Language. Lexical rules are provided for
the implicit concatenation sequences in the form of conventional regular
expressions; for the implicit-concat non-terminal
rules, whitespace is not allowed between juxtaposed non-terminals. Grammar
comments are provided in square brackets in a column to the right
of the productions, e.g., [comment], to help relate
productions in the grammar to the terminology used in the above discussion.
Regular expressions are provided for the terminal class
string-literal and for RSL comments. These regular
expressions make use of a common inverted character-class notation,
as popularized by the various lex tools.
Comments are syntactically equivalent to whitespace and can only appear
where the comment prefix cannot be mistaken for the trailing part of a
multi-character unquoted literal.
[Table: RSL Grammar]
Log output from GRAM5 is a useful tool for debugging issues. GRAM5 can log to either local files or syslog. See the Admin Guide for information about how to configure logging.
In most cases, logging at the INFO level will
produce enough information to show the progress of most operations. Adding
DEBUG will also include log information from the GRAM
LRM scripts.
The first step when debugging unexpected failures is to determine whether the gatekeeper service is running, reachable from the client, and properly configured.
First, determine that the gatekeeper is running by using a tool such as telnet to connect to the TCP/IP port that the gatekeeper is listening on. From the GRAM service node, using a default configuration, use a command like:
% telnet localhost 2119
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'
An error message like the following indicates that the gatekeeper service is not starting:
telnet: connect to address 127.0.0.1: Connection refused
telnet: Unable to connect to remote host
If the telnet command exits immediately, then the gatekeeper service
is being started but is failing to run. Check the gatekeeper log (by default
$GLOBUS_LOCATION/var/globus-gatekeeper.log)
to see if there is an error message. A common error is having a missing
library path environment variable in the gatekeeper's environment or
having a malformed configuration file. See the globus-gatekeeper documentation for information
on the configuration options.
The next recommended diagnostic is to run the same telnet command from
the machine which is acting as the GRAM client if it is distinct from
the GRAM service node. Be sure to replace localhost
with the actual host name of the GRAM service. Again, check for log
entries in the case of immediate exit or refused connection. If the
connection does not work, then there may be some network connectivity
or firewall issues preventing access.
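The same reachability check can be performed from a C client before attempting a job submission. The sketch below uses only POSIX socket calls; `tcp_probe` is a hypothetical helper, not a GRAM API function, and it tests only that a TCP connection can be opened, not that the peer is actually a gatekeeper. The connect() call is blocking, so a filtered port may take a long time to fail.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 0 if a TCP connection to host:port can be
 * established, -1 otherwise. port must be a numeric string such as "2119".
 */
static int
tcp_probe(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *results = NULL;
    struct addrinfo *ai;
    int connected = -1;

    assert(host != NULL && port != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    hints.ai_flags = AI_NUMERICSERV;  /* port is given as a number */

    if (getaddrinfo(host, port, &hints, &results) != 0)
    {
        return -1; /* name resolution failed or port is not numeric */
    }
    /* Try each resolved address until one accepts the connection */
    for (ai = results; ai != NULL && connected != 0; ai = ai->ai_next)
    {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);

        if (fd < 0)
        {
            continue;
        }
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
        {
            connected = 0;
        }
        close(fd);
    }
    freeaddrinfo(results);
    return connected;
}
```

A call such as `tcp_probe("grid.example.org", "2119")` corresponds to the telnet test above, using the default gatekeeper port from this section.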
Next use a tool like globusrun to diagnose whether
the client is authorized to contact the gatekeeper service. This is
done by using the -a command-line option. For example:
% globusrun -a -r grid.example.org
GRAM Authentication test successful
If you do not get the success message above, then check the gatekeeper log to see if there is a diagnostic message. A common problem is that the identity of the client is not in the grid mapfile used by the gatekeeper.
The next test is to use the -dryrun option to
globusrun to verify that the job manager service
is properly configured. To do so, try the following:
% globusrun -dryrun -r grid.example.org "&(executable=/bin/sh)"
globus_gram_client_callback_allow successful
Dryrun successful
If you do not get the success message above, first check the error
number in the GRAM5 Error codes
table to determine how to proceed. If the result is unclear,
check the job manager log (default
$HOME/gram_DATE.log)
to see if there are any further details of the error.
The final test is to submit a test job to the GRAM5 service and wait for it to terminate, as in this example:
% globus-job-run grid.example.org /bin/sh -c 'echo "hello, grid"'
hello, grid
If the process appears to hang, it might be that the job manager is unable to send state callbacks to the client. Check that there are no firewalls or network issues that would prevent the job manager process from connecting from the GRAM service node to the client node.
The methods described in this section are intended for debugging problems in the GRAM code, not in the user environment.
To debug the GRAM5 job manager, run the command located in
$GLOBUS_LOCATION/etc/grid-services/jobmanager-LRM
(ignoring the first 3 fields). For example:
% $GLOBUS_LOCATION/libexec/globus-job-manager \
-conf $GLOBUS_LOCATION/etc/globus-job-manager.conf -type fork
When the job manager is started in this way, it will log messages to standard error and will terminate 60 seconds after its last job has completed. This only works if there are no job managers running for this particular user. The job manager can be started in a debugger such as gdb or valgrind using a similar command-line.
For a list of error codes generated by GRAM5, see Section 3, “Errors”.
GRAM requires a client certificate and private key in order
to authenticate with the GRAM service. If these are not available, the
GRAM client will fail. In typical use, a user will create a temporary
proxy certificate either derived from their identity certificate issued
by some certificate authority, or from a service such as myproxy. If a
GRAM client command returns any error containing the string
GSS Major Status you've hit a credential problem.
Look at the Troubleshooting Section
of the GSI manual for details about how to diagnose and correct these
errors. The grid-cert-diagnostics tool
with the -p command-line option is especially helpful
for diagnosing some of these types of problems.
There are a few things which can go wrong when trying to contact a GRAM service. These have slightly different error types which can help diagnose which problem is occurring.
If the hostname or TCP port you are using for a GRAM resource name is not correct, then the GRAM client will be unable to access the service. Errors of this type will look like this:
% globus-job-run grid.example.org/jobmanager-fork /bin/hostname
GRAM Job submission failed because the connection to the server failed (check host and port) (error code 12)
When this occurs, check with the resource administrator for correct resource naming so that you can contact the service.
GRAM performs mutual authentication, that is, both the client and service
provide certificates indicating who they are. The service uses the client's
identity to map the user to a local unix account. The client uses the
server's identity to verify that the service is running with a host
credential. The failure of the client to trust the server's certificate
will generate an error message that looks like this:
globus_gsi_gssapi: Authorization denied: The expected name for the remote host (host@alias.example.org) does not match the authenticated name of the remote host (host@grid.example.org). This happens when the name in the host certificate does not match the information obtained from DNS and is often a DNS configuration problem.
This mismatch can happen for a number of reasons: a site administrator has
multiple hosts sharing a certificate; a host has multiple DNS aliases and
the client is not aware of which name the server is using for its
certificate; or a host's name has changed since the certificate was issued.
The remedy for the client, after confirming with the GRAM administrator
that the name after "authenticated name of the remote host" is the correct
certificate name, is to use a form of the GRAM resource name which includes
this name. For example, explicitly add the name to the abbreviated GRAM
contact so that instead of alias.example.org, you would
use alias.example.org::host@grid.example.org.
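The rewrite of the contact string can be sketched in Python as follows. This is an illustration only: it assumes the error text has the wording shown above, that the contact is the abbreviated host-only form, and that the administrator has already confirmed the authenticated name is correct. The helper name contact_with_subject is hypothetical.

```python
import re

# Captures the name inside the parentheses that follow the phrase
# "authenticated name of the remote host" in the GSI error text.
_AUTH_NAME_RE = re.compile(
    r"authenticated name of the remote host \(([^)]+)\)"
)


def contact_with_subject(contact, error_message):
    """Append the authenticated name to an abbreviated (host-only) contact.

    Returns None when the message does not contain the expected phrase.
    """
    match = _AUTH_NAME_RE.search(error_message)
    if match is None:
        return None
    return f"{contact}::{match.group(1)}"


message = (
    "globus_gsi_gssapi: Authorization denied: The expected name for the "
    "remote host (host@alias.example.org) does not match the authenticated "
    "name of the remote host (host@grid.example.org)."
)
print(contact_with_subject("alias.example.org", message))
# prints alias.example.org::host@grid.example.org
```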
Because of the mutual authentication, both GRAM users and services can hit problems if they do not trust their peer's certificate or the Certificate Authority which issued it. A failure of the client to trust the server's certificate is the easier case to diagnose, because the GRAM service sends very little information back to a client that it does not trust. When it is the service that rejects the client, working with the system administrator to get information from the GRAM logs will usually resolve the problem.
If the service's certificate is not trusted, the client will receive a message like this:
% globus-job-run grid.example.org /bin/hostname
GRAM Job submission failed because an authentication operation failed
OpenSSL Error: s3_clnt.c:915: in library: SSL routines, function SSL3_GET_SERVER_CERTIFICATE: certificate verify failed
globus_gsi_callback_module: Could not verify credential
globus_gsi_callback_module: Can't get the local trusted CA certificate: Untrusted self-signed certificate in chain with hash bbfccedf
This error indicates that the certificate chain sent by the service
contained a self-signed certificate (usually an
indication that it's a CA certificate) which the client doesn't trust. The
message includes the hash of the certificate name (bbfccedf in
this case). If you hit this particular type
of error, send the information to the GRAM administrator to
determine which CA should be trusted and what its signing policy is, and
then decide whether to add it to your local set of trust roots.
Note: Different versions of OpenSSL produce different hashes for the same certificate names. If you upgrade a system (or transfer CA certificates between systems) to a different version of OpenSSL, you may hit this problem even if you think you have the CA certificate in your trusted certificate directory. If so, run the globus-update-certificate-dir program to update your hashes.
There are other reasons why a certificate might not be trusted (it's in a revoked list, it has expired or was issued in the future, etc). For more details look at the troubleshooting information in the GSI user's guide.
If for some reason the service does not trust your certificate, you'll get a rather cryptic message from GRAM that looks like this:
% globus-job-run grid.example.org /bin/hostname
GRAM Job submission failed because an authentication operation failed
globus_gsi_gssapi: Unable to verify remote side's credentials
globus_gsi_gssapi: Unable to verify remote side's credentials: Couldn't verify the remote certificate
OpenSSL Error: s3_pkt.c:1086: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate SSL alert number 42 (error code 7)
To remedy this, consult the GRAM administrator to get information from the
/var/log/globus-gatekeeper.log file to determine the
reason why the gatekeeper didn't like your certificate. Again it could be
CA trust issues, clock skew, or a revoked certificate. The error in the
gatekeeper log would typically look like the client-side trust issue above.
Once the GRAM service has authenticated the client, it maps the client's identity to a local user account using a grid-mapfile or other mapping service. If this fails, the client will receive a message that looks like this:
% globus-job-run grid.example.org /bin/hostname
GRAM Job submission failed because authentication with the remote server failed (error code 7)
To remedy this, ask the system administrator of the GRAM resource to add you to the list of authorized users. Be sure to send your credential subject name to make it easier for them. To get that information, run the command grid-cert-info -s.
Recall that a GRAM resource name includes a component called the
service name. The default if not specified is
jobmanager, but some sites may not use that
name, or have a different LRM name than you expect. If you specify
an incorrect service name, or the default is not present, you'll get
an error that looks like this:
% globus-job-run grid.example.org /bin/hostname
GRAM Job submission failed because the gatekeeper failed to find the requested service (error code 93)
If you get this error, you'll need to determine which services are
available on that GRAM resource, either by asking the admin or by looking
at the entries in /etc/grid-services.
The GRAM service is split between a privileged process called the
globus-gatekeeper and a non-privileged process called
the globus-job-manager which runs as a user process. If the
globus-gatekeeper is unable to locate the
globus-job-manager process, then this misconfiguration will
show up like this:
% globus-job-run grid.example.org /bin/hostname
GRAM Job submission failed because the gatekeeper failed to run the job manager (error code 47)
This is an installation mistake, and the administrator of the GRAM resource must fix this.
One problem GRAM users sometimes encounter is that it looks like jobs submitted to GRAM are not making any progress, even though the local resource manager thinks they've run. There are a couple of reasons why this might occur: GRAM is not getting the information it needs from the local resource manager, or the GRAM client is not getting the information it needs. We'll cover diagnosing and handling the latter case in this document, as the former is a system administrator issue.
The way globus-job-run and globusrun determine that jobs have completed is via GRAM job state callbacks. These are messages sent by the GRAM service to the client node indicating that something significant has happened in the lifecycle of the job. If for some reason the GRAM service cannot get those messages to the client, the client will not be able to detect job state changes.
In order to determine if this is the case, submit a job using globus-job-submit, and then use the globus-job-status command to see if the job state changes. If it does not, then consult the GRAM administrator, as there might be some problem with the installation. If it does, then for some reason the callbacks are not happening. This might be caused by firewall issues or host naming issues.
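The polling check described above can be automated. The Python sketch below is illustrative only: it assumes globus-job-status prints a single state name on standard output and that DONE and FAILED are the terminal states, which you should confirm against your installation. The function names, the polling interval, and the poll limit are hypothetical choices.

```python
import subprocess
import time

# Assumption: these are the state names printed for a finished job.
TERMINAL_STATES = {"DONE", "FAILED"}


def job_status(job_contact):
    """Run globus-job-status for one job contact and return its output."""
    result = subprocess.run(
        ["globus-job-status", job_contact],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def watch_job(job_contact, get_status=job_status, interval=30.0, max_polls=20):
    """Poll the job state and return the sequence of distinct states seen."""
    seen = []
    for _ in range(max_polls):
        state = get_status(job_contact)
        # Record a state only when it differs from the previous one.
        if not seen or seen[-1] != state:
            seen.append(state)
        if state in TERMINAL_STATES:
            break
        time.sleep(interval)
    return seen
```

Pass the job contact printed by globus-job-submit to watch_job. If the returned list shows the state changing while globus-job-run appears to hang, the callbacks are the likely problem rather than the installation. The get_status parameter exists so the loop can be exercised without a GRAM service.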
The GRAM client sends a "callback contact" to the GRAM service when it submits
a job, so that it can receive notifications. This contact is a reference
to an HTTPS server embedded in the GRAM client which only handles GRAM state
callbacks. As with all web servers, it has a URL which defines how to contact it,
which in this case consists of the client host name and the service port number.
If the host name that is used is not resolvable (such as for a laptop with a
dynamic address), then the GRAM service will not be able to contact it. If
that's the case, you can set the GLOBUS_HOSTNAME
environment variable to the IP address that your client can be reached at, and
then submit your jobs. This will cause GRAM to publish that address instead of
what it thinks the client's host name is.
Another reason the GRAM service might be unable to send job state updates to
a client is a firewall between the service and the client.
If that's the case, you might need to set the GLOBUS_TCP_PORT_RANGE
environment variable to a comma-separated pair of numbers giving the
minimum and maximum TCP port numbers to listen on. You might have to contact
your site administrator to determine what TCP ports are allowed. If there are none,
you can still use globus-job-submit and
globus-job-status to track your job's state changes, or
use another tool like those mentioned in the section about client tools.
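Both variables can be set for a single client invocation rather than for the whole login session. The following Python sketch builds such an environment; the helper name callback_environment is hypothetical, and the address and port numbers in the usage note are placeholders rather than recommended values.

```python
import os


def callback_environment(hostname=None, port_range=None, base=None):
    """Return a copy of the environment with the GRAM callback settings.

    hostname   -- address to publish in the callback contact
    port_range -- (minimum, maximum) TCP ports the client may listen on
    base       -- environment to start from (defaults to os.environ)
    """
    env = dict(os.environ if base is None else base)
    if hostname is not None:
        env["GLOBUS_HOSTNAME"] = hostname
    if port_range is not None:
        low, high = port_range
        if not (0 < low <= high <= 65535):
            raise ValueError("port range must satisfy 0 < min <= max <= 65535")
        # The variable holds the two bounds separated by a comma.
        env["GLOBUS_TCP_PORT_RANGE"] = f"{low},{high}"
    return env
```

For example, callback_environment("192.0.2.10", (40000, 40100)) returns a dictionary that can be passed as the env argument of subprocess.run when starting globus-job-run.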
The GRAM service has a log file which contains information about the job as
it is processed. These logs are located by default in
/var/log/globus/gram_$USERNAME.log.
There are several logging levels available, as
described in
the GRAM Administrator's Guide. These can be controlled on a per-job
basis by adding the loglevel RSL attribute to your job
description. The default is to log only FATAL and
ERROR messages, but other levels can sometimes help
you understand what is going on.
Sometimes, bugs creep into the LRM adapter scripts. When that occurs, the GRAM job will usually fail with an error like this:
GRAM Job failed because the job manager detected an invalid script status (error code 25)
If this occurs, you may have to work with a GRAM administrator to
help debug this problem. One helpful thing you can do when reporting it is
to save the GRAM internal script data so that it can be used outside of the
GRAM service to see what the low-level error looks like. To do this, add the
RSL fragment (savejobdescription = yes) to your job
request. This will cause GRAM to leave a file called something like
$HOME/gram_[0-9]*.pl in your home directory. You can use this with
the internal tool /usr/share/globus/globus-job-manager-script.pl to
try to submit the job to the LRM without using the GRAM service. The command line
/usr/share/globus/globus-job-manager-script.pl -m LRM -c submit -f GRAM-PL-FILE will attempt to submit the job to the LRM. It will show all the information the LRM
script sends to the GRAM service, which might include a Perl error or badly formatted output from the script (which must only output lines which begin with GRAM_SCRIPT_).
In some extreme cases, the savejobdescription option will not generate a
file. If that's the case, pass /dev/null as the argument
to the -f command-line option. The problem is likely a
Perl syntax error which will be reached before the job description is loaded.
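When the script output is long, the offending lines can be picked out mechanically. The sketch below is a hypothetical helper that applies the rule stated above (every output line must begin with GRAM_SCRIPT_) to captured output. Treating blank lines as harmless is an assumption made here for convenience, and the sample text in the usage note is made up.

```python
def unexpected_script_lines(output):
    """Return (line_number, text) pairs for lines lacking the GRAM_SCRIPT_ prefix.

    Blank lines are skipped; that leniency is an assumption of this sketch.
    """
    bad = []
    for number, line in enumerate(output.splitlines(), start=1):
        if line.strip() and not line.startswith("GRAM_SCRIPT_"):
            bad.append((number, line))
    return bad
```

For example, calling it on the two-line text "GRAM_SCRIPT_JOB_ID:1234" followed by "syntax error near line 3" reports only the second line. Any line it reports is worth including when you describe the problem to the GRAM administrator.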
If all else fails, please send information about your problem to <gram-user@globus.org>.
You'll have to subscribe to a list before you can send an e-mail to it.
See here for
general e-mail lists and information on how to subscribe to a list and
here
for GRAM-specific lists. Depending on the problem, you may be requested to file a bug report to
the Globus project's Issue Tracker.
GRAM requires a host certificate and private key in order for the globus-gatekeeper service to run. These are typically located in
/etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem, but the path is configurable in the
gatekeeper
configuration file. The key must be protected by file permissions allowing
only the root user to read it.
GRAM also (by default) uses a grid-mapfile to authorize Grid
users as local users. This file is typically located in /etc/grid-security/grid-mapfile, but is configurable in the gatekeeper
configuration file.
Problems in either of these configurations will show up in the gatekeeper log
described below. See the GSI documentation for more
detailed information about obtaining and installing host certificates and maintaining a
grid-mapfile.
GRAM relies on the globus-gatekeeper program and (in some cases)
the globus-scheduler-event-generator program to process jobs. If the
former is not running, job requests will fail with a "connection refused" error. If the
latter is not running, GRAM jobs will appear to "hang" in the PENDING
state.
The globus-gatekeeper is typically started via an init script
installed in /etc/init.d/globus-gatekeeper. The command /etc/init.d/globus-gatekeeper status will indicate whether the service is
running. See Section 2, “Starting and Stopping GRAM5 services” for more information about starting and stopping the globus-gatekeeper program.
If the globus-gatekeeper service fails to start, the command globus-gatekeeper -test will print information describing some types of configuration problems.
The globus-scheduler-event-generator is typically started via an
init script installed in /etc/init.d/globus-scheduler-event-generator. It is only needed when the
LRM-specific "setup-seg" package is installed. The command /etc/init.d/globus-scheduler-event-generator status will indicate whether
the service is running. See Section 2, “Starting and Stopping GRAM5 services” for more information about starting
and stopping the globus-scheduler-event-generator program.
The globus-gatekeeper program starts the globus-job-manager service with different command-line parameters depending on the LRM being used. Use the command globus-gatekeeper-admin -l to list which LRMs the gatekeeper is configured to use.
The globus-job-manager-script.pl is the interface between the GRAM job manager process and the LRM adapter. The command /usr/share/globus/globus-job-manager-script.pl -h will print the list of available adapters.
% /usr/share/globus/globus-job-manager-script.pl -h
USAGE: /usr/share/globus/globus-job-manager-script.pl -m MANAGER -f FILE -c COMMAND
Installed managers: condor fork
The globus-scheduler-event-generator also uses an LRM-specific module to generate scheduler events for GRAM to reduce the amount of resources GRAM uses on the machine where it runs. To determine which LRMs are installed and configured, use the command globus-scheduler-event-generator-admin -l.
% globus-scheduler-event-generator-admin -l
fork [DISABLED]
If any of these do not show the LRM you are trying to use, install the relevant packages related to that LRM and restart the GRAM services. See the GRAM Administrator's Guide for more information about starting and stopping the GRAM services.
All GRAM5 LRM adapters have a configuration file for site customizations, such as queue names, paths to executables needed to interface with the LRM, etc. Check that the values in these files are correct. These files are described in Section 4, “LRM Adapter Configuration”.
The /var/log/globus-gatekeeper.log file contains information
about service requests from clients, and will be useful when diagnosing service startup
failures, authentication failures, and authorization failures.
GRAM uses GSI to authenticate client job requests. If there is a problem with the GSI configuration for your host, or a client is trying to connect with a certificate signed by a CA your host does not trust, the job request will fail. This will show up in the log as a "GSS authentication failure". See the GSI Administrator's Guide for information about diagnosing authentication failures.
After authentication is complete, GRAM maps the Grid identity to a local user prior to starting the globus-job-manager process. If this fails, an error will show up in the log as "globus_gss_assist_gridmap() failed authorization". See the GSI Administrator's Guide for information about managing gridmap files.
A per-user job manager log is typically located in
/var/log/globus/gram_$USERNAME.log.
This log contains information from the job manager as it attempts
to execute GRAM jobs via a local resource manager. The logs can be
fairly verbose. Sometimes looking for log entries near those
containing the string level=ERROR will show more information
about what caused a particular failure.
Once you've found an error in the log, it is generally useful to find log entries
related to the job which hit that error. There are two job IDs associated with
each job, one a GRAM-specific ID, and one an LRM-specific ID. To determine the
GRAM ID associated with a job, look for the attribute
gramid in the log message. Then look for all
other log messages which contain that gramid value to
get a better picture of what the job manager is doing. To determine the
LRM-specific ID, look for a message at TRACE level with the
matching GRAM ID found above with the response value matching
GRAM_SCRIPT_JOB_ID:LRM-ID. You
can then follow the state of the LRM-ID as well
as the GRAM ID in the log, and correlate the LRM-ID
information with local resource manager logs and administrative tools.
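The correlation steps above can be sketched in Python. This is an illustration under stated assumptions, not a description of the actual log format: it assumes each log line carries the GRAM ID as a whitespace-delimited gramid=VALUE token and that the LRM ID follows the text GRAM_SCRIPT_JOB_ID: on some line for that job. The sample lines and function names are made up for the example.

```python
import re

# Text after "GRAM_SCRIPT_JOB_ID:" up to the next whitespace.
_LRM_ID_RE = re.compile(r"GRAM_SCRIPT_JOB_ID:(\S+)")


def lines_for_job(log_lines, gramid):
    """Return the log lines that carry exactly this gramid value."""
    needle = f"gramid={gramid}"
    # Compare whole tokens so that /1/2 does not also match /1/23.
    return [line for line in log_lines if needle in line.split()]


def lrm_id_for_job(log_lines, gramid):
    """Return the LRM-specific ID reported for a GRAM ID, or None."""
    for line in lines_for_job(log_lines, gramid):
        match = _LRM_ID_RE.search(line)
        if match:
            return match.group(1).strip('"')
    return None


# Made-up sample lines in the assumed key=value layout.
sample_log = [
    'level=TRACE gramid=/1/2 response="GRAM_SCRIPT_JOB_ID:4242"',
    'level=ERROR gramid=/1/2 msg="submit failed"',
    'level=TRACE gramid=/1/23 response="GRAM_SCRIPT_JOB_ID:9999"',
]
print(lrm_id_for_job(sample_log, "/1/2"))  # prints 4242
```

The LRM ID returned this way can then be used to search the local resource manager's own logs.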
If all else fails, please send information about your problem to
<gram-user@globus.org>. You'll have to subscribe to a list before you
can send an e-mail to it. See here for general e-mail lists and information on how to subscribe to a list
and here for
GRAM-specific lists. Depending on the problem, you may be requested to file a bug report
to the Globus project's Issue Tracker.
Table 9.1. GRAM5 Errors
| Error Code | Reason | Possible Solutions |
|---|---|---|
| 1 | one of the RSL parameters is not supported | Check RSL documentation |
| 2 | the RSL length is greater than the maximum allowed | Use RSL substitutions to reduce length of RSL strings |
| 3 | an I/O operation failed | Enable trace logging and report to gram-dev@globus.org |
| 4 | jobmanager unable to set default to the directory requested | Check that RSL directory attribute refers to a directory that exists on the target system. |
| 5 | the executable does not exist | Check that the RSL executable attribute refers to an executable that exists on the target system. |
| 6 | of an unused INSUFFICIENT_FUNDS | Unimplemented feature. |
| 7 | authentication with the remote server failed | Check that the contact string contains the proper X.509 DN. |
| 8 | the user cancelled the job | Don't cancel jobs you want to complete. |
| 9 | the system cancelled the job | Check RSL requirements such as maximum time and memory are valid for the job. |
| 10 | data transfer to the server failed | Check gatekeeper and/or job manager logs to see why the process failed. |
| 11 | the stdin file does not exist | Check that the RSL stdin attribute refers to a file that exists on the target system or has a valid ftp, gsiftp, http, or https URL. |
| 12 | the connection to the server failed (check host and port) | Check that the service is running on the expected TCP/IP port. Check that no firewall prevents contacting that TCP/IP port. Check for runtime configuration errors. |
| 13 | the provided RSL 'maxtime' value is not an integer | Check that the RSL maxtime value evaluates to an integer. |
| 14 | the provided RSL 'count' value is not an integer | Check that the RSL count value evaluates to an integer. |
| 15 | the job manager received an invalid RSL | Check that the RSL string can be parsed by using globusrun -p RSL. |
| 16 | the job manager failed in allowing others to make contact | Check job manager log. |
| 17 | the job failed when the job manager attempted to run it | Verify that the LRM is configured properly. |
| 18 | an invalid paradyn was specified | OBSOLETE IN GRAM2 |
| 19 | the provided RSL 'jobtype' value is invalid | The RSL jobtype attribute is not indicated as supported by the LRM. Valid jobtype values are single, multiple, mpi, and condor. |
| 20 | the provided RSL 'myjob' value is invalid | OBSOLETE IN GRAM5 |
| 21 | the job manager failed to locate an internal script argument file | Check that the internal script exists and is executable. Check that the LRM-specific Perl module is located in the expected directory and is valid. The command perl -I$GLOBUS_LOCATION/lib/perl $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/LRM.pm can be used to check if there are any syntax errors in the script. |
| 22 | the job manager failed to create an internal script argument file | Check that your home directory is writable and not full. |
| 23 | the job manager detected an invalid job state | Check job manager logs. |
| 24 | the job manager detected an invalid script response | Check job manager logs. This is likely a bug in the LRM script. |
| 25 | the job manager detected an invalid script status | Check job manager logs. This is likely a bug in the LRM script. |
| 26 | the provided RSL 'jobtype' value is not supported by this job manager | Check that the RSL jobtype attribute is implemented by the LRM script. Note that some job types require configuration. |
| 27 | unused ERROR_UNIMPLEMENTED | LRM does not support some feature included in the job request. |
| 28 | the job manager failed to create an internal script submission file | Check that the user's home file system is not full. Check job manager log |
| 29 | the job manager cannot find the user proxy | Check that client is delegating a proxy when authenticating with the gatekeeper. Check that the user's home filesystem and the /tmp file system are not full. |
| 30 | the job manager failed to open the user proxy | Check that the user's home filesystem and the /tmp file system are not full. |
| 31 | the job manager failed to cancel the job as requested | Check that the user's home filesystem and the /tmp file system are not full. |
| 32 | system memory allocation failed | Check job manager log for details. |
| 33 | the interprocess job communication initialization failed | OBSOLETE IN GRAM5 |
| 34 | the interprocess job communication setup failed | OBSOLETE IN GRAM5 |
| 35 | the provided RSL 'host count' value is invalid | Check that the RSL host_count attribute evaluates to an integer. |
| 36 | one of the provided RSL parameters is unsupported | Check job manager log for details about invalid parameter. |
| 37 | the provided RSL 'queue' parameter is invalid | Check that the RSL queue attribute evaluates to a string that corresponds to an LRM-specific queue name. |
| 38 | the provided RSL 'project' parameter is invalid | Check that the RSL project attribute evaluates to a string that corresponds to an LRM-specific project name. |
| 39 | the provided RSL string includes variables that could not be identified | Check that all RSL substitutions are defined before being used in the job description. |
| 40 | the provided RSL 'environment' parameter is invalid | Check that the RSL environment attribute contains a sequence of VARIABLE VALUE pairs. |
| 41 | the provided RSL 'dryrun' parameter is invalid | Remove the RSL dryrun attribute from the job description. |
| 42 | the provided RSL is invalid (an empty string) | Include a non-empty RSL string in your job submission request. |
| 43 | the job manager failed to stage the executable | Check that the file service hosting the executable is reachable from the GRAM5 service node. Check that the executable exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the executable. |
| 44 | the job manager failed to stage the stdin file | Check that the file service hosting the standard input file is reachable from the GRAM5 service node. Check that the standard input file exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the standard input file. |
| 45 | the requested job manager type is invalid | OBSOLETE IN GRAM5 |
| 46 | the provided RSL 'arguments' parameter is invalid | OBSOLETE IN GRAM2 |
| 47 | the gatekeeper failed to run the job manager | Check the gatekeeper or job manager logs for more information. |
| 48 | the provided RSL could not be properly parsed | Check that the RSL string can be parsed by using globusrun -p RSL. |
| 49 | there is a version mismatch between GRAM components | Ask system administrator to upgrade GRAM service to GRAM2 or GRAM5 |
| 50 | the provided RSL 'arguments' parameter is invalid | Check that the RSL arguments attribute evaluates to a sequence of strings. |
| 51 | the provided RSL 'count' parameter is invalid | Check that the RSL count attribute evaluates to a positive integer value. |
| 52 | the provided RSL 'directory' parameter is invalid | Check that the RSL directory attribute evaluates to a string. |
| 53 | the provided RSL 'dryrun' parameter is invalid | Check that the RSL dryrun attribute evaluates to either yes or no. |
| 54 | the provided RSL 'environment' parameter is invalid | Check that the RSL environment attribute evaluates to a sequence of VARIABLE, VALUE pairs. |
| 55 | the provided RSL 'executable' parameter is invalid | Check that the RSL executable attribute evaluates to a string value. |
| 56 | the provided RSL 'host_count' parameter is invalid | Check that the RSL host_count attribute evaluates to a positive integer value. |
| 57 | the provided RSL 'jobtype' parameter is invalid | Check that the RSL jobtype attribute evaluates to one of single, multiple, mpi, or condor |
| 58 | the provided RSL 'maxtime' parameter is invalid | Check that the RSL maxtime attribute evaluates to a positive integer value. |
| 59 | the provided RSL 'myjob' parameter is invalid | OBSOLETE IN GRAM5. |
| 60 | the provided RSL 'paradyn' parameter is invalid | OBSOLETE IN GRAM2. |
| 61 | the provided RSL 'project' parameter is invalid | Check that the RSL project attribute evaluates to a string value. |
| 62 | the provided RSL 'queue' parameter is invalid | Check that the RSL queue attribute evaluates to a string value. |
| 63 | the provided RSL 'stderr' parameter is invalid | Check that the RSL stderr attribute evaluates to a string value or a sequence of DESTINATION URLs with optional CACHE_TAG string parameters. |
| 64 | the provided RSL 'stdin' parameter is invalid | Check that the RSL stdin attribute evaluates to a string value. |
| 65 | the provided RSL 'stdout' parameter is invalid | Check that the RSL stdout attribute evaluates to a string value or a sequence of DESTINATION URLs with optional CACHE_TAG string parameters. |
| 66 | the job manager failed to locate an internal script | Check job manager log for more details. |
| 67 | the job manager failed on the system call pipe() | OBSOLETE IN GRAM5 |
| 68 | the job manager failed on the system call fcntl() | OBSOLETE IN GRAM2 |
| 69 | the job manager failed to create the temporary stdout filename | OBSOLETE IN GRAM5 |
| 70 | the job manager failed to create the temporary stderr filename | OBSOLETE IN GRAM5 |
| 71 | the job manager failed on the system call fork() | OBSOLETE IN GRAM2 |
| 72 | the executable file permissions do not allow execution | Check that the RSL executable attribute refers to an executable program or script. |
| 73 | the job manager failed to open stdout | Check that the RSL stdout attribute refers to one or more valid destination files or URLs. |
| 74 | the job manager failed to open stderr | Check that the RSL stderr attribute refers to one or more valid destination files or URLs. |
| 75 | the cache file could not be opened in order to relocate the user proxy | Check that the user's home directory is writable and not full on the GRAM5 service node. |
| 76 | cannot access cache files in ~/.globus/.gass_cache, check permissions, quota, and disk space | Check that the user's home directory is writable and not full on the GRAM5 service node. |
| 77 | the job manager failed to insert the contact in the client contact list | Check job manager log |
| 78 | the contact was not found in the job manager's client contact list | Don't attempt to unregister callback contacts that are not registered |
| 79 | connecting to the job manager failed. Possible reasons: job terminated, invalid job contact, network problems, ... | Check that the job manager process is running. Check that the job manager credential has not expired. Check that the job manager contact refers to the correct TCP/IP host and port. Check that the job manager contact is not blocked by a firewall. |
| 80 | the syntax of the job contact is invalid | Check the syntax of job contact string. |
| 81 | the executable parameter in the RSL is undefined | Include the RSL executable in all job requests. |
| 82 | the job manager service is misconfigured. condor arch undefined | Add the -condor-arch to the command-line or configuration file for a job manager configured to use the condor LRM. |
| 83 | the job manager service is misconfigured. condor os undefined | Add the -condor-os to the command-line or configuration file for a job manager configured to use the condor LRM. |
| 84 | the provided RSL 'min_memory' parameter is invalid | Check that the RSL min_memory attribute evaluates to a positive integer value. |
| 85 | the provided RSL 'max_memory' parameter is invalid | Check that the RSL max_memory attribute evaluates to a positive integer value. |
| 86 | the RSL 'min_memory' value is not zero or greater | Check that the RSL min_memory attribute evaluates to a positive integer value. |
| 87 | the RSL 'max_memory' value is not zero or greater | Check that the RSL max_memory attribute evaluates to a positive integer value. |
| 88 | the creation of a HTTP message failed | Check job manager log. |
| 89 | parsing incoming HTTP message failed | Check job manager log. |
| 90 | the packing of information into a HTTP message failed | Check job manager log. |
| 91 | an incoming HTTP message did not contain the expected information | Check job manager log. |
| 92 | the job manager does not support the service that the client requested | Check that the client is talking to the correct service. |
| 93 | the gatekeeper failed to find the requested service | OBSOLETE IN GRAM2 |
| 94 | the jobmanager does not accept any new requests (shutting down) | Execute queries before the job has been cleaned up. |
| 95 | the client failed to close the listener associated with the callback URL | Call globus_gram_client_callback_disallow() with a valid callback contact. |
| 96 | the gatekeeper contact cannot be parsed | Check the syntax of the gatekeeper contact string you are attempting to contact. |
| 97 | the job manager could not find the 'poe' command | OBSOLETE IN GRAM2 |
| 98 | the job manager could not find the 'mpirun' command | Configure the LRM script with mpirun in your path. |
| 99 | the provided RSL 'start_time' parameter is invalid | OBSOLETE IN GRAM2 |
| 100 | the provided RSL 'reservation_handle' parameter is invalid | OBSOLETE IN GRAM2 |
| 101 | the provided RSL 'max_wall_time' parameter is invalid | Check that the RSL max_wall_time attribute evaluates to a positive integer. |
| 102 | the RSL 'max_wall_time' value is not zero or greater | Check that the RSL max_wall_time attribute evaluates to a positive integer. |
| 103 | the provided RSL 'max_cpu_time' parameter is invalid | Check that the RSL max_cpu_time attribute evaluates to a positive integer. |
| 104 | the RSL 'max_cpu_time' value is not zero or greater | Check that the RSL max_cpu_time attribute evaluates to a positive integer. |
| 105 | the job manager is misconfigured, a scheduler script is missing | Check that the administrator has configured the LRM by running its setup script. |
| 106 | the job manager is misconfigured, a scheduler script has invalid permissions | Check that the administrator has installed the script. Check that the file system containing that script allows file execution. |
| 107 | the job manager failed to signal the job | OBSOLETE IN GRAM2 |
| 108 | the job manager did not recognize/support the signal type | Check that your signal operation is using the correct signal constant. |
| 109 | the job manager failed to get the job id from the local scheduler | OBSOLETE IN GRAM2 |
| 110 | the job manager is waiting for a commit signal | Send a two-phase commit signal to the job manager to acknowledge receiving the job contact from the job manager. |
| 111 | the job manager timed out while waiting for a commit signal | Send a two-phase commit signal to the job manager to acknowledge receiving the job contact from the job manager. Increase the two-phase commit time out for your job. Check that the job manager contact TCP/IP port is reachable from your client. |
| 112 | the provided RSL 'save_state' parameter is invalid | Check that the RSL save_state attribute is set to yes or no. |
| 113 | the provided RSL 'restart' parameter is invalid | Check that the RSL restart attribute evaluates to a string containing a job contact string. |
| 114 | the provided RSL 'two_phase' parameter is invalid | Check that the RSL two_phase attribute evaluates to a positive integer. |
| 115 | the RSL 'two_phase' value is not zero or greater | Check that the RSL two_phase attribute evaluates to a positive integer. |
| 116 | the provided RSL 'stdout_position' parameter is invalid | OBSOLETE IN GRAM5 |
| 117 | the RSL 'stdout_position' value is not zero or greater | OBSOLETE IN GRAM5 |
| 118 | the provided RSL 'stderr_position' parameter is invalid | OBSOLETE IN GRAM5 |
| 119 | the RSL 'stderr_position' value is not zero or greater | OBSOLETE IN GRAM5 |
| 120 | the job manager restart attempt failed | OBSOLETE IN GRAM2 |
| 121 | the job state file doesn't exist | Check that the job contact you are trying to restart matches one that the job manager returned to you. |
| 122 | could not read the job state file | Check that the state file directory is not full. |
| 123 | could not write the job state file | Check that the state file directory is not full. |
| 124 | old job manager is still alive | Contact the returned job manager contact to manage the job you are trying to restart. |
| 125 | job manager state file TTL expired | OBSOLETE IN GRAM2 |
| 126 | it is unknown if the job was submitted | Check job manager log. |
| 127 | the provided RSL 'remote_io_url' parameter is invalid | Check that the RSL remote_io_url attribute evaluates to a string value. |
| 128 | could not write the remote io url file | Check that the user's home file system on the job manager service node is writable and not full. |
| 129 | the standard output/error size is different | Send a stdio update signal to redirect the job manager output to a new URL. |
| 130 | the job manager was sent a stop signal (job is still running) | Submit a restart request to monitor the job. |
| 131 | the user proxy expired (job is still running) | Generate a new proxy and then submit a restart request to monitor the job. |
| 132 | the job was not submitted by original jobmanager | OBSOLETE IN GRAM2 |
| 133 | the job manager is not waiting for that commit signal | Do not send a commit signal to a job that is not waiting for a commit signal. |
| 134 | the provided RSL scheduler specific parameter is invalid | Check the LRM-specific documentation to determine what values are legal for the RSL extensions implemented by the LRM. |
| 135 | the job manager could not stage in a file | Check that the file service hosting the file to stage is reachable from the GRAM5 service node. Check that the file to stage exists on the file service node. Check that there is sufficient disk space in the user's home directory on the service node to store the file to stage. |
| 136 | the scratch directory could not be created | Check that the directory named by the RSL scratch_dir attribute exists and is writable. Check that the directory named by the RSL scratch_dir attribute is not full. |
| 137 | the provided 'gass_cache' parameter is invalid | Check that the RSL gass_cache attribute evaluates to a string. |
| 138 | the RSL contains attributes which are not valid for job submission | Do not use restart- or signal-only RSL attributes when submitting a job. |
| 139 | the RSL contains attributes which are not valid for stdio update | Do not use submit- or restart-only RSL attributes when sending a stdio update signal to a job. |
| 140 | the RSL contains attributes which are not valid for job restart | Do not use submit- or signal-only RSL attributes when restarting a job. |
| 141 | the provided RSL 'file_stage_in' parameter is invalid | Check that the RSL file_stage_in attribute evaluates to a sequence of SOURCE DESTINATION pairs. |
| 142 | the provided RSL 'file_stage_in_shared' parameter is invalid | Check that the RSL file_stage_in_shared attribute evaluates to a sequence of SOURCE DESTINATION pairs. |
| 143 | the provided RSL 'file_stage_out' parameter is invalid | Check that the RSL file_stage_out attribute evaluates to a sequence of SOURCE DESTINATION pairs. |
| 144 | the provided RSL 'gass_cache' parameter is invalid | Check that the RSL gass_cache attribute evaluates to a string. |
| 145 | the provided RSL 'file_cleanup' parameter is invalid | Check that the RSL file_clean_up attribute evaluates to a sequence of strings. |
| 146 | the provided RSL 'scratch_dir' parameter is invalid | Check that the RSL scratch_dir attribute evaluates to a string. |
| 147 | the provided scheduler-specific RSL parameter is invalid | Check the LRM-specific documentation to determine what values are legal for the RSL extensions implemented by the LRM. |
| 148 | a required RSL attribute was not defined in the RSL spec | Check that the RSL executable attribute is present in your job request RSL. Check that the RSL restart attribute is present in your restart RSL. |
| 149 | the gass_cache attribute points to an invalid cache directory | Check that the RSL gass_cache attribute evaluates to a directory that exists or can be created. Check that the user's home file system is writable and not full. |
| 150 | the provided RSL 'save_state' parameter has an invalid value | Check that the RSL save_state attribute has a value of yes or no. |
| 151 | the job manager could not open the RSL attribute validation file | Check that is present and readable on the job manager service node.
Check that is readable on the job manager service node if present. |
| 152 | the job manager could not read the RSL attribute validation file | Check that is valid.
Check that is valid if present. |
| 153 | the provided RSL 'proxy_timeout' is invalid | Check that the RSL proxy_timeout attribute evaluates to a positive integer. |
| 154 | the RSL 'proxy_timeout' value is not greater than zero | Check that the RSL proxy_timeout attribute evaluates to a positive integer. |
| 155 | the job manager could not stage out a file | Check that the source file being staged exists on the job manager service node. Check that the directory of the destination file being staged exists on the file service node. Check that the directory of the destination file being staged is writable by the user. Check that the destination file service is reachable by the job manager service node. |
| 156 | the job contact string does not match any which the job manager is handling | Check that the job contact string matches one returned from a job request. |
| 157 | proxy delegation failed | Check that the job manager service node trusts the signer of your credential. Check that you trust the signer of the job manager service node's credential. |
| 158 | the job manager could not lock the state lock file | Check that the file system holding the job state directory supports POSIX advisory locking. Check that the job state directory is writable by the user on the service node. Check that the job state directory is not full. |
| 159 | an invalid globus_io_clientattr_t was used. | Check that you have initialized the globus_io_clientattr_t attribute prior to using it with the GRAM client API. |
| 160 | a null parameter was passed to the gram library | Check that you are passing legal values to all GRAM API calls. |
| 161 | the job manager is still streaming output | OBSOLETE IN GRAM5 |
| 162 | the authorization system denied the request | Check with your GRAM system administrator to allow a particular certificate to be authorized. |
| 163 | the authorization system reported a failure | Check with your system administrator to verify that the authorization system is configured properly. |
| 164 | the authorization system denied the request - invalid job id | Check with your system administrator to verify that the authorization system is configured properly. Use a credential which is authorized to interact with a particular GRAM job. |
| 165 | the authorization system denied the request - not authorized to run the specified executable | Check with your system administrator to verify that the authorization system is configured properly. Use a credential which is authorized to interact with a particular GRAM job. |
| 166 | the provided RSL 'user_name' parameter is invalid. | Check that the RSL user_name attribute evaluates to a string. |
| 167 | the job is not running in the account named by the 'user_name' parameter. | Ask the GRAM system administrator to add an authorization entry to allow your credential to run jobs as the specified user account. |
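Two codes in the table above, 130 and 131, are marked "(job is still running)" and share the remedy of submitting a restart request. A client could fold that into a small check. The following is a minimal sketch that uses the bare numeric codes from the table; the enum and function names are hypothetical and are not part of the GRAM client API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the numeric values are copied from the error table. */
enum {
    EX_ERR_JM_STOPPED    = 130, /* job manager was sent a stop signal */
    EX_ERR_PROXY_EXPIRED = 131  /* user proxy expired */
};

/* Returns true when the table says the job is still running, so a restart
 * request can be submitted to monitor it again. */
bool gram_error_job_still_running(int failure_code)
{
    assert(failure_code >= 0); /* codes in the table are non-negative */
    return failure_code == EX_ERR_JM_STOPPED ||
           failure_code == EX_ERR_PROXY_EXPIRED;
}
```

For code 131 the table also asks for a new proxy to be generated before the restart request is submitted.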
The GRAM Protocol is used to handle communication between the Gatekeeper, Job Manager, and GRAM Clients. The protocol is based on a subset of the HTTP/1.1 protocol, with a small set of message types and responses sent as the body of the HTTP requests and responses. This document describes GRAM Protocol version 2 as used by GRAM5. It is compatible with the GRAM Protocol parsers in GRAM2, with extensions.
GRAM messages are framed in HTTP/1.1 messages. However, only a small subset of the HTTP specification is used or understood by the GRAM system. All GRAM requests are HTTP POST messages. Only the following HTTP headers are understood:
- Host
- Content-Type (set to "application/x-globus-gram" in all cases)
- Content-Length
- Connection (set to "close" in all HTTP responses)
Only the following status codes are supported in the HTTP Status-Line of a response:
- 200 OK
- 403 Forbidden
- 404 Not Found
- 500 Internal Server Error
- 400 Bad Request
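Because only these five status codes are supported, a client can treat any other Status-Line as a protocol error. The following is a minimal sketch of such a check; the function name is hypothetical, and the parsing is deliberately loose (it does not validate the reason phrase).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the status code if `line` is an HTTP/1.1
 * Status-Line carrying one of the five codes listed above, or -1 otherwise. */
int gram_parse_status_line(const char *line)
{
    int code = 0;

    assert(line != NULL);
    /* "HTTP/1.1" is matched literally; %3d reads at most three characters. */
    if (sscanf(line, "HTTP/1.1 %3d", &code) != 1) {
        return -1;
    }
    switch (code) {
    case 200:
    case 400:
    case 403:
    case 404:
    case 500:
        return code;
    default:
        return -1;
    }
}
```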
All messages use the carriage return (ASCII value 13) followed by line feed
(ASCII value 10) sequence to delimit lines. In all cases, a blank line
separates the HTTP header from the message body. All
application/x-globus-gram message bodies consist of
attribute names followed by a colon, a space, and then the value of the
attribute. When the value may contain a newline or double-quote character,
a special escaping rule is used to encapsulate the complete string. This
encapsulation consists of surrounding the string with double-quotes, and
escaping all double-quote and backslash characters within the string with a
backslash. All other characters are sent without modification. For example,
the string
rsl: &( executable = "/bin/echo" ) ( arguments = "hello" )
becomes
rsl: "&( executable = \"/bin/echo\" ) ( arguments = \"hello\" )"
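The escaping rule can be sketched in a few lines of C. This is an illustration only: the function name and the buffer-size convention are assumptions, not part of the GRAM protocol library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `value` into `out` surrounded by double quotes,
 * with every '"' and '\' preceded by a backslash. Returns the length of the
 * quoted string (excluding the NUL), or 0 if `out_size` is too small. */
size_t gram_quote_value(const char *value, char *out, size_t out_size)
{
    size_t n = 0;
    const char *p;

    assert(value != NULL && out != NULL);
    /* Worst case: every character escaped, plus two quotes and the NUL. */
    if (out_size < 2 * strlen(value) + 3) {
        return 0;
    }
    out[n++] = '"';
    for (p = value; *p != '\0'; p++) {
        if (*p == '"' || *p == '\\') {
            out[n++] = '\\';
        }
        out[n++] = *p; /* all other characters are copied unchanged */
    }
    out[n++] = '"';
    out[n] = '\0';
    return n;
}
```

Since even an empty value produces two quote characters, a return value of 0 always means that the output buffer was too small.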
In GRAM5, protocol extensions are supported in the status update messages. These extensions are implemented as extra attribute names after all of the attributes defined in the messages below. Older GRAM protocol parsers will ignore those extensions that occur after the attributes in the messages defined below. In GRAM5, the following extensions are used:
exit-code- Job exit code. Sent in job state callbacks and in job status replies when the job completes.
gt3-failure-type- Failure detail type for staging errors. Sent in job state callbacks and in job status replies when a job fails.
gt3-failure-message- Failure detail message for more context for errors. Sent in job state callbacks and in job status replies when a job fails.
gt3-failure-source- Failure detail message for the source of a failed file transfer. Sent in job state callbacks and in job status replies when a job fails.
gt3-failure-destination- Failure detail message for the destination of a failed file transfer. Sent in job state callbacks and in job status replies when a job fails.
version- Job manager package version. Sent in all messages from the job manager.
toolkit-version- Toolkit release that the job manager is running. Sent in all messages from the job manager.
This is the only form of quoting which
application/x-globus-gram messages support. Use of
% HEX HEX escapes (such as seen in URL encodings) is
not meaningful for this protocol.
A ping request is used to verify that the gatekeeper is configured properly to handle a named service. The ping request consists of the following:
    POST ping/job-manager-name HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
The values of the message-specific strings are
job-manager-name- The name of the service to have the gatekeeper check. The service name corresponds to one of the gatekeeper's configured grid-services, and is usually of the form "jobmanager-LRM".
host-name- The name of the host on which the gatekeeper is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
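The ping framing above can be assembled with a single formatted write. The following is a minimal sketch, assuming that the body line is terminated by CRLF like every other line and that Content-Length counts only the body; the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a complete ping request into `out`.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * buffer is too small. */
int gram_format_ping(const char *job_manager_name, const char *host_name,
                     char *out, size_t out_size)
{
    /* The body is fixed for protocol version 2. */
    static const char body[] = "protocol-version: 2\r\n";
    int n;

    assert(job_manager_name != NULL && host_name != NULL && out != NULL);
    n = snprintf(out, out_size,
                 "POST ping/%s HTTP/1.1\r\n"
                 "Host: %s\r\n"
                 "Content-Type: application/x-globus-gram\r\n"
                 "Content-Length: %zu\r\n"
                 "\r\n"           /* blank line separates header and body */
                 "%s",
                 job_manager_name, host_name, strlen(body), body);
    if (n < 0 || (size_t) n >= out_size) {
        return -1;
    }
    return n;
}
```

The resulting buffer still has to be sent over an authenticated and wrapped connection to the gatekeeper; that part is not shown here.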
A job request is used to schedule a job remotely using GRAM. The job request consists of the HTTP framing described above with the request-URI consisting of job-manager-name, where job-manager-name is the name of the service to use to schedule the job. The format of a job request message consists of the following:
    POST job-manager-name[@user-name] HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    job-state-mask: mask
    callback-url: callback-contact
    rsl: rsl-description
The values of the emphasized text items are as below:
job-manager-name- The name of the service to submit the job request to. The service name corresponds to one of the gatekeeper's configured grid-services, and is usually of the form jobmanager-LRM.
user-name- Starting with GT4.0, a client may request that a certain account be used by the gatekeeper to start the job manager. This is done optionally by appending the @ symbol and the local user name that the job should be run as to the job-manager-name. If the @ and username are not present, then the first grid map entry will be used. If the client credential is not authorized in the grid map to use the specified account, an authorization error will occur in the gatekeeper.
host-name- The name of the host on which the gatekeeper is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
mask- An integer representation of the job state mask. This value is obtained from a bitwise-OR of the job state values which the client wishes to receive job status callbacks about. The meanings of the various job state values are defined in the GRAM Protocol API documentation.
callback-contact- An https URL which defines a GRAM protocol listener which will receive job state updates. The job status update messages are defined below.
rsl-description- A quoted string containing the RSL description of the job request.
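The mask and the body of a job request might be put together as sketched below. The state bits in the enum are illustrative placeholders only; the real values must be taken from the GRAM Protocol API documentation. The function name is also hypothetical, and the RSL is assumed to have been quoted already according to the escaping rule described earlier in this chapter.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative state bits only (assumption): take the real values from the
 * GRAM Protocol API documentation. */
enum {
    EX_STATE_ACTIVE = 1 << 1,
    EX_STATE_FAILED = 1 << 2,
    EX_STATE_DONE   = 1 << 3
};

/* Hypothetical helper: formats the body of a job request. `quoted_rsl` must
 * already be quoted. Returns the body length, or -1 if `out` is too small. */
int gram_format_job_request_body(int mask, const char *callback_contact,
                                 const char *quoted_rsl,
                                 char *out, size_t out_size)
{
    int n;

    assert(callback_contact != NULL && quoted_rsl != NULL && out != NULL);
    n = snprintf(out, out_size,
                 "protocol-version: 2\r\n"
                 "job-state-mask: %d\r\n"
                 "callback-url: %s\r\n"
                 "rsl: %s\r\n",
                 mask, callback_contact, quoted_rsl);
    return (n < 0 || (size_t) n >= out_size) ? -1 : n;
}
```

A caller would combine the states of interest with a bitwise OR, for example `EX_STATE_ACTIVE | EX_STATE_FAILED | EX_STATE_DONE`, and use the returned length as the message-size in the Content-Length header.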
A status request is used by a GRAM client to get the current job state of a running job. This type of message can only be sent to a job manager's job-contact (as returned in the reply to a job request message). The format of a status request message consists of the following:
    POST job-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    "status"
The values of the emphasized text items are as below:
job-contact- The job contact string returned in a response to a job request message, or determined by querying the MDS system.
host-name- The name of the host on which the job manager is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
A callback register request is used by a GRAM client to register a new callback contact to receive GRAM job state updates. This type of message can only be sent to a job manager's job-contact (as returned in the reply to a job request message). The format of a callback register request message consists of the following:
    POST job-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    "register mask callback-contact"
The values of the emphasized text items are as below:
job-contact- The job contact string returned in a response to a job request message, or determined by querying the MDS system.
host-name- The name of the host on which the job manager is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
mask- An integer representation of the job state mask. This value is obtained from a bitwise-OR of the job state values which the client wishes to receive job status callbacks about. The meanings of the various job state values are defined in the GRAM Protocol API documentation.
callback-contact- An https URL which defines a GRAM protocol listener which will receive job state updates. The job status update messages are defined below.
A callback unregister request is used by a GRAM client to request that the job manager no longer send job state updates to the specified callback contact. This type of message can only be sent to a job manager's job-contact (as returned in the reply to a job request message). The format of a callback unregister request message consists of the following:
    POST job-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    "unregister callback-contact"
The values of the emphasized text items are as below:
job-contact- The job contact string returned in a response to a job request message, or determined by querying the MDS system.
host-name- The name of the host on which the job manager is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
callback-contact- An https URL which defines a GRAM protocol listener which should no longer receive job state updates. The job status update messages are defined below.
A job cancel request is used by a GRAM client to request that the job manager terminate a job. This type of message can only be sent to a job manager's job-contact (as returned in the reply to a job request message). The format of a job cancel request message consists of the following:
    POST job-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    "cancel"
The values of the emphasized text items are as below:
job-contact- The job contact string returned in a response to a job request message, or determined by querying the MDS system.
host-name- The name of the host on which the job manager is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
A job signal request is used by a GRAM client to request that the job manager process a signal for a job. The arguments to the various signals are discussed in the protocol library documentation. The format of a job signal request message consists of the following:
    POST job-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    "signal"
The values of the emphasized text items are as below:
job-contact- The job contact string returned in a response to a job request message, or determined by querying the MDS system.
host-name- The name of the host on which the job manager is running. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
signal- A quoted string containing the signal number and its parameters.
A job status update message is sent by the job manager to all registered callback contacts when the job's status changes. The format of the job status update messages is as follows:
    POST callback-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    job-manager-url: job-contact
    status: status-code
    failure-code: failure-code
The values of the emphasized text items are as below:
callback-contact- The callback contact string registered with the job manager either by being passed as the callback-contact in a job request message or in a callback register message.
host-name- The host part of the callback-contact URL. This exists only for compatibility with the HTTP/1.1 protocol.
message-size- The length of the content of the message, not including the HTTP/1.1 header.
version- The version of the GRAM protocol which is being used. For the protocol defined in this document, the value must be the string "2".
job-contact- The job contact of the job which has changed states.
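A callback listener has to pull individual attributes out of the body of a status update and skip any it does not know, which is also how extension attributes such as exit-code remain harmless to older parsers. The following is a minimal sketch of such a lookup for integer-valued attributes. The function name is hypothetical, and the sketch does not handle quoted values that span several lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds the line "name: value" in a CRLF-delimited
 * message body and parses the value as a decimal integer. Lines with other
 * attribute names are skipped. Returns 0 on success, or -1 if the attribute
 * is absent or its value is not an integer. */
int gram_get_int_attribute(const char *body, const char *name, long *value)
{
    size_t name_len;
    const char *line = body;

    assert(body != NULL && name != NULL && value != NULL);
    name_len = strlen(name);
    while (*line != '\0') {
        const char *eol = strstr(line, "\r\n");

        if (strncmp(line, name, name_len) == 0 &&
            line[name_len] == ':' && line[name_len + 1] == ' ') {
            const char *start = line + name_len + 2;
            char *end;
            long parsed = strtol(start, &end, 10);

            if (end == start || (*end != '\r' && *end != '\0')) {
                return -1; /* no digits, or trailing junk after the number */
            }
            *value = parsed;
            return 0;
        }
        if (eol == NULL) {
            break;         /* last line had no terminator */
        }
        line = eol + 2;    /* step past the CRLF */
    }
    return -1;
}
```

A listener would typically call this once each for status and failure-code, and then once more for exit-code, treating a return value of -1 for the extension as "not sent by this job manager".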
A proxy delegation message is sent by the client to the job manager to initiate a delegation handshake to generate a new proxy credential for the job manager. This credential is used by the job manager or the job when making further secured connections. The format of the delegation message is as follows:
    POST callback-contact HTTP/1.1
    Host: host-name
    Content-Type: application/x-globus-gram
    Content-Length: message-size

    protocol-version: version
    "renew"
If a successful (200) reply is sent in response to this message, then the client will proceed with a GSI delegation handshake. The tokens in this handshake will be framed with a 4-byte big-endian token length header. The framed tokens will then be wrapped using the GLOBUS_IO_SECURE_CHANNEL_MODE_SSL_WRAP wrapping mode. The job manager will frame response tokens in the same manner. After the job manager receives its final delegation token, it will respond with another response message that indicates whether the delegation was processed or not. This response message is a standard GRAM response message.
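The token framing step can be illustrated in isolation. The following sketch covers only the 4-byte big-endian length header; producing the delegation tokens and wrapping the framed result are left to the security libraries and are not shown. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a 4-byte big-endian length header followed by
 * the token bytes. Returns the framed size, or 0 if `out` is too small. */
size_t gram_frame_token(const unsigned char *token, uint32_t token_len,
                        unsigned char *out, size_t out_size)
{
    assert(token != NULL && out != NULL);
    if (out_size < 4 || out_size - 4 < token_len) {
        return 0;
    }
    out[0] = (unsigned char) (token_len >> 24); /* most significant byte first */
    out[1] = (unsigned char) (token_len >> 16);
    out[2] = (unsigned char) (token_len >> 8);
    out[3] = (unsigned char) token_len;
    memcpy(out + 4, token, token_len);
    return (size_t) token_len + 4;
}

/* Reads the length header back from a framed buffer of at least 4 bytes. */
uint32_t gram_read_token_length(const unsigned char *framed)
{
    assert(framed != NULL);
    return ((uint32_t) framed[0] << 24) | ((uint32_t) framed[1] << 16) |
           ((uint32_t) framed[2] << 8)  |  (uint32_t) framed[3];
}
```

Writing the header byte by byte keeps the result independent of the byte order of the host.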
The following security attributes are needed to communicate with the Gatekeeper:
- Authentication must be done using GSSAPI mutual authentication
- Messages must be wrapped with support for the delegation message. When using Globus I/O, this is accomplished by using the GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP wrapping mode.
As the GRAM service processes a job, the job undergoes a series of state transitions. These states and their meanings follow:
Table 10.1. GRAM Job States
| State | Meaning |
|---|---|
| | Initial job state |
| | Job staging in progress |
| | Job submitted to LRM, awaiting execution |
| | Job executing |
| | Job made progress executing but is now suspended |
| | Job staging in progress after job completed |
| | Job completed successfully |
| | Job was canceled or failed |
C
G
- grid map file
A file containing entries mapping certificate subjects to local user names. This file can also serve as an access control list for GSI-enabled services and is typically found in /etc/grid-security/grid-mapfile. For more information see the Gridmap section here.
P
- proxy certificate
A short-lived certificate issued using an EEC. A proxy certificate typically has the same effective subject as the EEC that issued it and can thus be used in its place. GSI uses proxy certificates for single sign-on and delegation of rights to other entities.
For more information about types of proxy certificates and their compatibility in different versions of GT, see http://dev.globus.org/wiki/Security/ProxyCertTypes.
A
- apis, APIs
- overview, Programming Model Overview
C
- compatibility, Backward compatibility summary
D
- debugging, Debugging
- dependencies, Technology dependencies
E
- errors, Errors
F
- features, Feature summary
P
- platforms, tested, Tested platforms
T
- troubleshooting, Troubleshooting, GRAM Client Troubleshooting
