GT 3.9.5 Component Fact Sheet: Web Services Grid Resource Allocation and Management (WS GRAM)
- Brief component overview
- Summary of features
- Usability summary
- Backward compatibility summary
- Technology dependencies
- Tested platforms
- For more information
Brief component overview
The Web Services Grid Resource Allocation and Management (WS GRAM) component comprises a set of WSRF-compliant Web services to locate, submit, monitor, and cancel jobs on Grid computing resources. WS GRAM is not a job scheduler, but rather a set of services and clients for communicating with a range of different local job schedulers using a common protocol. WS GRAM is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important.
Summary of features
Features new in release 3.9.5
- Improved service performance:
- Job concurrency
- Throughput
- Latency
- Improved service reliability/recovery
- Support for mpich-g2 jobs:
- multi-job submission capabilities
- ability to coordinate processes in a job
- ability to coordinate subjobs in a multi-job
- Publishing of the job's exit code
- The ability to select the account under which the remote job will be run. If a user's grid credential is mapped to multiple accounts, then the user can specify, in the RSL, under which account the job should be run.
- Optional client-specified hold on a job state. The hold is released with the new "release" operation.
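The hold-and-release feature can be pictured as a small state machine. The sketch below is a toy model in Python, not WS GRAM code: the `Job` class, the list of states, and the `advance`/`release` method names are assumptions made for illustration. Only the idea of a client-specified hold state that is cleared by a "release" operation comes from the feature list above.

```python
from typing import Optional

# Hypothetical, simplified job lifecycle; the real WS GRAM state set may differ.
STATES = ["Unsubmitted", "StageIn", "Pending", "Active", "StageOut", "Done"]


class Job:
    """Toy job that pauses when it reaches a client-chosen hold state."""

    def __init__(self, hold_state: Optional[str] = None) -> None:
        if hold_state is not None and hold_state not in STATES:
            raise ValueError(f"unknown hold state: {hold_state}")
        self.hold_state = hold_state
        self.state = STATES[0]
        self.held = False

    def advance(self) -> str:
        """Move to the next state unless the job is held or already finished."""
        if self.held or self.state == STATES[-1]:
            return self.state
        self.state = STATES[STATES.index(self.state) + 1]
        # Entering the requested hold state pauses the job until release().
        if self.state == self.hold_state:
            self.held = True
        return self.state

    def release(self) -> None:
        """Clear the hold so that advance() can make progress again."""
        self.held = False
        self.hold_state = None


job = Job(hold_state="StageIn")
job.advance()                    # Unsubmitted -> StageIn, hold takes effect
job.advance()                    # no progress while the job is held
state_while_held = job.state
job.release()
job.advance()                    # StageIn -> Pending
state_after_release = job.state
```

In this model the client decides the hold point before submission, and the job stays in that state until the client explicitly releases it.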
Other Supported Features
- Remote job execution and management
- Uniform and flexible interface to batch scheduling systems
- File staging before and after job execution
- File / directory clean up after job execution (after file stage out)
Deprecated Features
- Service managed data streaming of job's stdout/err during execution.
- File staging using the GASS protocol
- File caching of staged files, e.g. GASS Cache
Usability summary
Usability improvements for WS GRAM:
- improvement #1
- ...
- improvement #n
Backward compatibility summary
Protocol changes since GT version 3.2:
- The protocol has been changed to be WSRF compliant. There is no backward compatibility between this version and any previous versions.
API changes since GT version 3.2:
- The MJFS create operation has become createManagedJob and now provides the option to send a uuid. A client can use this uuid to recover a job EPR in the event that the reply message is not received. Given this new scheme, the start operation was removed. The createManagedJob() operation also allows a notification subscription request to be specified. This is the only way to reliably get all job state notifications.
- The MJS start operation has been removed. Its purpose was to ensure that the client had received the job EPR prior to the job being executed (and thus consuming resources), and it is redundant with the uuid functionality.
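The uuid-based recovery described above amounts to making job creation idempotent. The following Python sketch illustrates that idea with an in-memory stand-in. `ToyJobFactory`, `create_managed_job`, and the `job-epr-N` strings are hypothetical names invented for this example; they are not the MJFS API or a real EPR format.

```python
import uuid
from typing import Dict, Optional


class ToyJobFactory:
    """In-memory stand-in for a job factory; not the real MJFS."""

    def __init__(self) -> None:
        self._jobs: Dict[str, str] = {}  # submission id -> job reference

    def create_managed_job(self, description: str,
                           submission_id: Optional[str] = None) -> str:
        # Without a client-supplied id, every call creates a new job.
        if submission_id is None:
            submission_id = str(uuid.uuid4())
        # A repeated create with a known id returns the existing reference
        # instead of starting a second job.
        if submission_id not in self._jobs:
            self._jobs[submission_id] = f"job-epr-{len(self._jobs) + 1}"
        return self._jobs[submission_id]


factory = ToyJobFactory()
submission_id = str(uuid.uuid4())   # chosen by the client before the first call

first = factory.create_managed_job("/bin/date", submission_id)
# Suppose the reply was lost: the client retries with the same id.
retry = factory.create_managed_job("/bin/date", submission_id)
other = factory.create_managed_job("/bin/date")   # no id: a distinct job
```

Because the client picks the identifier before the first request, a retry after a lost reply cannot start a duplicate job, which is why a separate start step is no longer needed in this scheme.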
Fault changes since GT version 3.2:
- CacheFaultType was removed since there is no longer a GASS cache.
- RepeatedlyStartedFaultType was removed since there is no longer a start operation. Repeat creates with the same submission ID simply return the job EPR.
- SLAFaultType was changed to ServiceLevelAgreementFaultType for clarification.
- StreamServiceCreationFaultType was removed since there is no longer a stream service.
- UnresolvedSubstitutionReferencesFaultType was removed since there is no longer support for substitution definitions and references in the RSL.
- DatabaseAccessFaultType was removed since a database is no longer used to save job data.
RSL schema changes since GT version 3.2. See the 3.9.5 User's Guide for more information about the new RSL syntax:
- executable is now a single local file path. Remote URLs are no longer allowed. If executable staging is desired, it should be added to the fileStageIn directive.
- stdin is now a single local file path. Remote URLs are no longer allowed. If stdin staging is desired, it should be added to the fileStageIn directive.
- stdout is now a single local file path, instead of a list of remote URLs. If stdout staging is desired, it should be added to the fileStageOut directive.
- stderr is now a single local file path, instead of a list of remote URLs. If stderr staging is desired, it should be added to the fileStageOut directive.
- scratchDirectory has been removed.
- gramMyJobType has been removed. "Collective" functionality is always available if a job chooses to use it.
- dryRun has been removed. This is obsolete given the addition of the holdState attribute. Setting holdState to "StageIn" should prevent the job from being submitted to the local scheduler. The job can then be canceled once the StageIn-Hold state notification is received.
- remoteIoUrl has been removed. This was a hack for pre-WS GRAM involved with staging via GASS, and has no relevance in the current implementation.
- File staging related RSL attributes have been replaced with RFT file transfer attributes/syntax.
- RSL substitution definitions and substitution references have been removed in order to be able to use standard XML parsing/serialization tools.
- RSL variables have been added. These are keywords denoted in the form of ${variable name} that can be found in certain RSL attributes.
- Explicit credential references have been added, which, along with use of the new DelegationFactory service, replace the old implicit delegation model.
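To make the ${variable name} notation for RSL variables concrete, here is a minimal Python sketch of how such references could be expanded in an attribute value. This is not how WS GRAM itself resolves variables; the `expand_variables` function and the variable names `EXAMPLE_HOME` and `EXAMPLE_SCRATCH` are made up for illustration, and the actual set of supported variables is described in the User's Guide.

```python
import re
from typing import Mapping

# Matches ${NAME}; the name may contain anything except a closing brace.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def expand_variables(value: str, variables: Mapping[str, str]) -> str:
    """Replace each ${NAME} in value; unknown names are left untouched."""

    def lookup(match: "re.Match[str]") -> str:
        return variables.get(match.group(1), match.group(0))

    return _VARIABLE.sub(lookup, value)


# EXAMPLE_HOME and EXAMPLE_SCRATCH are made-up names, not documented RSL variables.
variables = {"EXAMPLE_HOME": "/home/alice"}
expanded = expand_variables("${EXAMPLE_HOME}/bin/run.sh", variables)
unknown = expand_variables("${EXAMPLE_SCRATCH}/tmp", variables)
```

Leaving unknown references untouched, rather than replacing them with an empty string, is a choice made for this sketch so that a misspelled name stays visible in the result.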
Technology dependencies
GRAM depends on the following GT components:
- Java WS Core
- Transport-Level Security
- Delegation Service
- RFT
- GridFTP
- MDS - internal libraries
GRAM depends on the following third-party software. A dependency exists only for the batch schedulers that are actually configured, since that is what makes job submission to the corresponding batch scheduling service possible:
- PBS
- Condor
- LSF
- other batch schedulers... (where the GRAM scheduler interface has been implemented)
Tested platforms
Tested platforms for GRAM:
- Linux
For More Information
Click here for more information about this component.