GT 4.2.1 Release Notes: GRAM4


1. Component Overview

The Web Services Grid Resource Allocation and Management (GRAM4) component comprises a set of WSRF-compliant Web services to locate, submit, monitor, and cancel jobs on Grid computing resources. GRAM4 is not a job scheduler, but rather a set of services and clients for communicating with a range of different batch/cluster job schedulers using a common protocol. GRAM4 is meant to address a range of jobs where reliable operation, stateful monitoring, credential management, and file staging are important.

Note

The GRAM server is typically deployed in conjunction with the Delegation and RFT services to address data staging, delegation of proxy credentials, and computation monitoring and management in an integrated manner.

2. Feature summary

New Features since 4.2.0

  • New terminate method in the client-side GramJob API
  • Improved job lifetime management for users and admins
  • Added configuration for "default" Local Resource Managers

Other Standard Supported Features

  • Remote job execution and management
  • Uniform and flexible interface to batch scheduling systems
  • File staging before and after job execution
  • File / directory clean up after job execution (after file stage out)
  • Service auditing for each submitted job

Deprecated Features

  • With the new terminate method in the GramJob API, the destroy method is no longer necessary. It remains in the GramJob API for backward compatibility, but it simply calls the terminate method. Clients that use destroy should switch to terminate during the 4.2.x series; the plan is to remove the destroy method in GT 4.4.
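
The deprecation pattern described above can be illustrated with a minimal Java sketch. The class SketchJob and its no-argument methods are hypothetical stand-ins chosen for illustration; they are not the real GramJob class, whose method signatures may differ.

```java
/**
 * Hypothetical stand-in for a job handle (NOT the real GramJob class).
 * It only illustrates a deprecated method forwarding to its replacement.
 */
class SketchJob {
    private boolean terminated = false;
    private int terminateCalls = 0;

    /** Preferred call: stop the job and release its resources. */
    public void terminate() {
        terminateCalls++;
        terminated = true;
    }

    /** Kept for backward compatibility; simply forwards to terminate(). */
    @Deprecated
    public void destroy() {
        terminate();
    }

    public boolean isTerminated() {
        return terminated;
    }

    public int getTerminateCalls() {
        return terminateCalls;
    }
}
```

Under this assumption, existing callers of destroy keep working, while the compiler's deprecation warning points them at terminate before the old method is removed.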

3. Summary of Changes in GRAM4

This release significantly improves the reliability of GRAM4 audit logging. Database errors are handled more gracefully: if the audit database is unavailable, job records are written to local disk and inserted later. The release also includes a number of other bug fixes.
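
As a rough illustration of the spool-and-replay behavior described above, the following Java sketch writes a record to a sink and falls back to a local file when the sink fails. All names here (AuditSink, SpoolingAuditWriter, the spool file) are assumptions made for this example and do not reflect the actual GRAM4 audit implementation; records are assumed to be single-line strings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sink standing in for the audit database. */
interface AuditSink {
    void insert(String record) throws IOException;
}

/** Tries the sink first; on failure, spools the record to local disk. */
class SpoolingAuditWriter {
    private final AuditSink sink;
    private final Path spoolFile;

    SpoolingAuditWriter(AuditSink sink, Path spoolFile) {
        this.sink = sink;
        this.spoolFile = spoolFile;
    }

    /** Insert the record, or append it to the spool file if the sink is down. */
    void write(String record) throws IOException {
        try {
            sink.insert(record);
        } catch (IOException sinkUnavailable) {
            Files.write(spoolFile, Collections.singletonList(record),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                    StandardOpenOption.APPEND);
        }
    }

    /** Replay spooled records; keep those that still fail. Returns the count inserted. */
    int replay() throws IOException {
        if (!Files.exists(spoolFile)) {
            return 0;
        }
        List<String> pending = Files.readAllLines(spoolFile, StandardCharsets.UTF_8);
        List<String> stillPending = new ArrayList<>();
        int inserted = 0;
        for (String record : pending) {
            try {
                sink.insert(record);
                inserted++;
            } catch (IOException stillUnavailable) {
                stillPending.add(record);
            }
        }
        // Rewrite the spool file so it holds only the records not yet inserted.
        Files.write(spoolFile, stillPending, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
        return inserted;
    }
}
```

A caller would invoke write for each job record and run replay periodically, for example from a timer, so that spooled records reach the database once it accepts inserts again.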

4. Bug Fixes

  • Bug 5617: GRAM4 seg hangs with fork jobs
  • Bug 5831: Changes in GramJob in Gram 4.2
  • Bug 5977: Add user's DN to INFO logging statement for each job submission
  • Bug 5982: GRAM handles lack of local username badly (java.lang.NullPointerException)
  • Bug 6102: GRAM4 throughput tester for 4.2
  • Bug 6172: Bad error message for "file not found"
  • Bug 6272: Proxy cleanup doesn't check for authz callouts, uses grid-mapfile check
  • Bug 6283: globusrun-ws doesn't resolve IP addresses to host names
  • Bug 6320: Audit logging with PostgreSQL fails in 4.2
  • Bug 6327: Add support for Derby
  • Bug 6333: Provide support for a common user home for testing purposes
  • Bug 6349: Infinite loop and blocking in org.globus.exec.service.utils.DelegatedCredential
  • Bug 6357: Changes in current Gram4 audit logging

5. Known Problems

The following problems and limitations are known to exist for GRAM4 at the time of the 4.2.1 release:

5.1. Limitations

  • [list limitations]

5.2. Outstanding bugs

  • Bug 3384: Inconsistent jobType/count parameter semantics
  • Bug 3529: setup/postinstall fatal errors should be warnings
  • Bug 3575: SEG dependent on GLOBUS_LOCATION env var
  • Bug 3748: WS-GRAM Plugable Resource Manager Backend
  • Bug 3803: Default scratchDirectory doesn't exist
  • Bug 3892: Out of date performance data?
  • Bug 3948: Service must release all of its resources on deactivation
  • Bug 4181: Allow File Staging To/From globusrun-ws application without an external server
  • Bug 4182: Improve Condor/Fork Job Monitoring for reliability and security
  • Bug 4513: LD_LIBRARY_PATH should not be set if no library_path is specified
  • Bug 4550: Multijob code not checking for existence of job credential
  • Bug 4684: Loading persisted jobs with expired delegation resources causes stacktraces
  • Bug 4719: globus runs /usr/bin/env without checking for \u
  • Bug 4734: Missing wsa:Action for GRAM4 rendezvous register operations
  • Bug 4761: Scheduler Tutorial is missing WS-GRAM setup package
  • Bug 4778: WS-Fork job manager doesn't set environment up for mpi jobs
  • Bug 4787: no lifetime management for WS Rendezvous
  • Bug 4817: Condor OS and ARCH do not have dynamic defaults
  • Bug 4864: environment variables containing '=' get escaped
  • Bug 4918: user account details are cached even for unknown users
  • Bug 4944: Multijob resources never yield to memory pressure and can't be destroyed
  • Bug 5012: Container in livelock state for an incorrectly mapped DN
  • Bug 5017: gram[24] tests that need to be updated
  • Bug 5397: GRAM4 recovery of persisted job resources needs to be reviewed
  • Bug 5402: "invalid password" error messages
  • Bug 5433: public interface doc lists non-public/internal APIs
  • Bug 5471: GRAM Jobs Hang in Unsubmitted State
  • Bug 5484: Review and Update 4.2 GRAM doc
  • Bug 5515: Destruction of subscription resources in a container shutdown/restart
  • Bug 5516: Destruction of subscription resources in a container shutdown/restart
  • Bug 5611: GramJob API changes to improve performance and efficiency
  • Bug 5698: Allow a prologue/epilogue script for 'mpi' and 'multiple' jobs
  • Bug 5712: Gram auditing: local_job_id format variations
  • Bug 5713: GRAM auditing: Failed database connection loses audit records
  • Bug 5714: GRAM Auditing: additional data in audit records
  • Bug 5725: Gram auditing: housekeeping for the auditRecords database
  • Bug 5770: GRAM4 auditing: inconsistent data in job_description column of DB
  • Bug 5776: GRAM4 auditing: Need for an INFO log message
  • Bug 5777: GRAM2 auditing: database connection times out
  • Bug 5778: GRAM2 auditing: no error message on db update failure
  • Bug 5805: Change threadpools from Gram custom implementation to java.util.concurrent
  • Bug 5820: Improve Condor Logfile Processing in GRAM
  • Bug 5843: Swap custom threadpool with ExecutorServices from java.util.concurrent
  • Bug 5853: create automated tests for globus-job-*-ws programs
  • Bug 5859: Java GramJob getExitCode() always returns 0
  • Bug 5969: GRAM Job submission failed!!!
  • Bug 6002: globusrun-ws hangs indefinitely
  • Bug 6019: CEDPS: Add executable path to log statement
  • Bug 6043: Confusing JobID in timeout message
  • Bug 6065: Deleting delegated credential does not work
  • Bug 6069: JDD documentation issues
  • Bug 6072: specification of RAM per process in parallel jobs
  • Bug 6091: GRAM4 JDD substitution detection is too broad
  • Bug 6192: Apply relevant VDT patches
  • Bug 6204: Drain Mode for the GRAM service
  • Bug 6279: document streaming overhead in globusrun-ws
  • Bug 6289: ws-gram multiJob submissions fail with extensions element
  • Bug 6351: Infinite loop in org.globus.exec.service.utils.UserProxyCreator
  • Bug 6387: add service security descriptor checks in job submission

6. Technology dependencies

GRAM depends on the following GT components:

Other scheduler adapters available for GT 4.2.1 release:

7. Tested platforms

Tested platforms for GRAM4:

  • Linux

    • Fedora Core 1 i686
    • Fedora Core 3 i686
    • Fedora Core 3 yup xeon
    • RedHat 7.3 i686
    • RedHat 9 x86
    • Debian Sarge x86
    • Debian 3.1 i686

Tested containers for GRAM4:

  • Java WS Core container

8. Backward compatibility summary

Protocol changes since GRAM4 in the GT4.0 series:

  • The Java WS Core Framework has been updated from the draft versions of the WSRF/WSN and WS Addressing specifications to the final versions WSRF 1.2, WSN 1.3 and WS Addressing 1.0. There is no backward compatibility between this version and any previous versions.

9. Associated Standards

See the Java WS Core related standards.

10. For More Information

See GRAM4 for more information about this component.

Glossary

J

job scheduler

See the term scheduler.