The Grid Resource Allocation and Management (GRAM5) component is used to locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM5 is not a Local Resource Manager, but rather a set of services and clients for communicating with a range of different batch/cluster job schedulers using a common protocol. GRAM5 is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important.
New Features since 5.2.0:
- Better integration with Linux operating systems with native RPM and Debian packages
- Improved logging and integration with system log rotation tools
- Improved scalability and reliability
Other Standard Supported Features
- Remote job execution and management
- Uniform and flexible interface to local resource managers
- File staging before and after job execution
- File and directory clean up after job termination
- Service auditing for each submitted job
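The staging and clean-up features listed above are requested through the job description (RSL) that a client submits. The following is a minimal sketch of composing such a description, not part of GRAM5 itself. It assumes the RSL attribute names `executable`, `arguments`, `file_stage_in`, `file_stage_out`, and `file_clean_up`, and assumes that quotes inside a value are escaped by doubling them. The helper names `rsl_quote` and `build_rsl` and the host `data.example.org` are hypothetical.

```python
def rsl_quote(value):
    """Quote an RSL value; embedded double quotes are doubled (assumed rule)."""
    return '"' + value.replace('"', '""') + '"'


def build_rsl(executable, arguments=(), stage_in=(), stage_out=(), clean_up=()):
    """Compose a single-job RSL string (hypothetical helper).

    stage_in and stage_out are sequences of (source, destination) pairs.
    clean_up is a sequence of paths to remove after the job terminates.
    """
    relations = ["(executable = " + rsl_quote(executable) + ")"]
    if arguments:
        values = " ".join(rsl_quote(a) for a in arguments)
        relations.append("(arguments = " + values + ")")
    # Staging attributes take a list of (source destination) pairs.
    for name, pairs in (("file_stage_in", stage_in), ("file_stage_out", stage_out)):
        if pairs:
            body = " ".join(
                "(" + rsl_quote(src) + " " + rsl_quote(dst) + ")" for src, dst in pairs
            )
            relations.append("(" + name + " = " + body + ")")
    if clean_up:
        values = " ".join(rsl_quote(p) for p in clean_up)
        relations.append("(file_clean_up = " + values + ")")
    # A leading "&" joins the relations into one conjunctive request.
    return "&" + "".join(relations)


# Example: stage one input in, stage one output back, remove the input copy.
rsl = build_rsl(
    "/bin/sort",
    arguments=["input.txt"],
    stage_in=[("gsiftp://data.example.org/input.txt", "input.txt")],
    stage_out=[("output.txt", "gsiftp://data.example.org/output.txt")],
    clean_up=["input.txt"],
)
print(rsl)
```

The resulting string could be saved to a file and handed to a GRAM client for submission. The attribute names and quoting rule should be checked against the RSL documentation for the installed version.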
Removed Features
- Condor SEG module is no longer included. Its functionality has been moved into the core of the job manager program.
Fixed Bugs
- GRAM-321: globus-job-manager emits warning about all jobs on restart
- GRAM-230: globus-gatekeeper does not reap children in threaded mode
- GRAM-232: Incorrect directory permissions cause an infinite loop
- RIC-205: Missing directories $GLOBUS_LOCATION/var/lock and $GLOBUS_LOCATION/var/run
- GRAM-296: Compile Failure on Solaris
- GRAM-297: job manager service definitions contain unresolved variables
- GRAM-299: Not all job log messages obey loglevel RSL attribute
- GRAM-300: GRAM job manager doxygen refers to obsolete command-line options
- GRAM-301: GRAM validation file parser doesn't handle empty quoted values correctly
- GRAM-302: Incorrect error when state file write fails
- GRAM-303: Gatekeeper's syslog output cannot be controlled
- GRAM-305: Jobmanager reporting DONE status when stage-out failed
- RIC-226: Some dependencies are missing in GPT metadata
- GRAM-306: Job Manager stdio_size query logging crash
- GRAM-309: GRAM5 doesn't work with IPv4 only gatekeepers
- GRAM-310: sge configure script error
- GRAM-311: Undefined variable defaults in shell scripts
- GRAM-312: Make crontab not fail if the package is uninstalled
- GRAM-315: Job locking doesn't handle ENOENT gracefully
- RIC-239: GSSAPI Token inspection fails when using TLS 1.2
- GRAM-317: job manager fails transferring job between processes if the proxy is larger than the socket buffer
- GRAM-318: Periodic lockup of SEG
- GRAM-323: RVF parser leaks file descriptors
- GRAM-325: job manager crashes when reading empty condor log
- GRAM-326: Can't renew job proxy after GLOBUS_GRAM_PROTOCOL_ERROR_COMMIT_TIMED_OUT error
- GRAM-328: job manager waits for two-phase delay when stopping
- GRAM-330: Buffer overflow in globus_gram_job_manager_seg_parse_condor_id
- GRAM-334: job manager doesn't work if unix socket path is too long
- GRAM-335: init scripts fail on solaris because of stop alias
- GRAM-336: Job manager can't guess osname on some operating systems
- GRAM-337: GRAM job manager config file has unresolved paths
- GRAM-338: GRAM job manager mishandles peer name when proxying messages through the gatekeeper
- GRAM-339: globus-job-run and globus-job-submit can't always handle "-e" as an argument
- GRAM-340: job manager crashes during stdio size query
- GRAM-341: globusrun ignores state callbacks that occur too early
- GRAM-342: intra-job manager protocol doesn't keep do signal-safe reads
- GRAM-343: lrm packages grid-service files aren't in CLEANFILES
- GRAM-320: globus-gatekeeper leaks logfile to globus-job-manager
- GRAM-105: Held Condor jobs should be reported as SUSPENDED
- GRAM-138: GRAM5 job manager uses a lot of memory when SEG is pointed to incorrect log path
- GRAM-231: audit not working when proxy expires
- GRAM-237: Fork LRM doesn't include softenv RSL attribute in rvf file
- GRAM-238: GRAM Fork LRM's softenv implementation doesn't work without SEG
- GRAM-291: RSL eval doesn't indicate what symbol was not found
Tested platforms for GRAM5:
Linux
- CentOS 5, 6 x86_64, i386
- Fedora 15, 16 x86_64, i386
- Red Hat Enterprise Linux 5, 6 x86_64, i386
- Scientific Linux 5, 6 x86_64, i386
- Debian 6, 7 (testing) x86_64, i386
- Ubuntu 10.04LTS, 10.10, 11.04, 11.10, 12.04LTS (testing) x86_64, i386
Mac OS X
- Mac OS X 10.7 (Lion)
Solaris
- Solaris 11
Protocol changes in GRAM since the GT4 series:
- The GRAM5 service uses a superset of the GRAM2 protocol for communication between the client and service. The extensions supported in GRAM5 are implemented in such a way that they are ignored by GRAM2 services or clients. These extensions provide improved error messages and version detection.
- GRAM5 does not support task coallocation using DUROC and its related protocols. Jobs submitted using DUROC directives will fail.
- GRAM5 does not support file streaming. The standard output and standard error streams are sent after the job completes instead of during execution. As a special case, support for the Condor grid monitor program implements a small subset of the streaming capabilities of GRAM2 in GT 4.2.x.
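The first point above, a protocol superset whose extensions are ignored by older peers, can be illustrated with a small sketch. This is not the GRAM wire format or any GRAM library API. It assumes a simple "name: value" line-based message, and every attribute name in it (including `toolkit-version`) is an illustrative placeholder. The idea is that a legacy reader keeps only the attributes it knows about, so extra attributes do not break it.

```python
# Attributes understood by the hypothetical legacy (GRAM2-style) reader.
KNOWN_ATTRIBUTES = {"protocol-version", "job-manager-url", "status", "failure-code"}


def parse_message(text):
    """Parse CRLF-separated 'name: value' lines into a dict."""
    attrs = {}
    for line in text.split("\r\n"):
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        attrs[name.strip().lower()] = value.strip()
    return attrs


def legacy_view(attrs):
    """Return only the attributes a legacy reader understands."""
    return {k: v for k, v in attrs.items() if k in KNOWN_ATTRIBUTES}


# A message carrying one extension attribute (placeholder name and values).
message = (
    "protocol-version: 2\r\n"
    "job-manager-url: https://gram.example.org:40000/1234/\r\n"
    "status: 2\r\n"
    "failure-code: 0\r\n"
    "toolkit-version: 5.2.1\r\n"
)

full = parse_message(message)      # newer reader: sees the extension
legacy = legacy_view(full)         # older reader: extension is dropped
print(sorted(set(full) - set(legacy)))
```

A newer peer can use the extra attribute for version detection, while an older peer processes the same message unchanged because it never looks at names outside its known set.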
See GRAM5 for more information about this component.
- Local Resource Manager (LRM)
A system that controls access to a compute resource, such as a compute cluster or parallel computer. Such systems provide batch execution interfaces, which GRAM uses to execute jobs. Condor, PBS, and GridEngine are examples of local resource managers.