Information Services (MDS): Key Concepts

Overview

Note: If you haven't done so already, we recommend reading the GT4 Key Concepts guide before continuing. Also keep in mind that this documentation describes the concepts of MDS as they apply to the latest version, WS MDS (also known as MDS4). The GT4 release includes the pre-WS component, MDS2, for legacy purposes only; it will be deprecated once more experience has been gained with the WS implementation. (For information about MDS2, see the MDS2 documentation.)

The Monitoring and Discovery System (MDS) component of Globus Toolkit version 4 (GT4) can streamline the tasks of monitoring and discovering services and resources in a distributed system or Grid:

  • Monitoring is the process of observing Grid resources (e.g., computers and schedulers), for such purposes as fixing problems and tracking usage. For example, a user might use a monitoring system to identify resources that are running low on disk space, in order to take corrective action.
  • Discovery is the process of finding a suitable Grid resource to perform a task: for example, finding a compute host on which to run a job. This process may involve both finding which Grid resources are suitable (e.g., have the correct CPU architecture) and choosing a suitable member from that set (e.g., the one with the shortest submission queue).
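The two processes can be made concrete with a minimal Python sketch over a hypothetical in-memory list of resource records. The `Resource` fields (`host`, `arch`, `free_disk_gb`, `queue_length`) and the 10 GB threshold are illustrative assumptions, not part of any MDS schema. Monitoring flags resources that need attention. Discovery first filters for suitable resources and then chooses one member of that set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    # Hypothetical fields; real MDS data arrives as XML resource properties.
    host: str
    arch: str
    free_disk_gb: float
    queue_length: int


def low_disk(resources: List[Resource], threshold_gb: float = 10.0) -> List[str]:
    """Monitoring: report hosts whose free disk space is below a threshold."""
    return [r.host for r in resources if r.free_disk_gb < threshold_gb]


def discover(resources: List[Resource], arch: str) -> Optional[Resource]:
    """Discovery: keep hosts with the required CPU architecture, then pick
    the one with the shortest submission queue (None if nothing matches)."""
    suitable = [r for r in resources if r.arch == arch]
    return min(suitable, key=lambda r: r.queue_length, default=None)


if __name__ == "__main__":
    pool = [
        Resource("node1.example.org", "x86_64", 4.0, 12),
        Resource("node2.example.org", "x86_64", 250.0, 3),
        Resource("node3.example.org", "ppc64", 80.0, 0),
    ]
    print(low_disk(pool))                 # ['node1.example.org']
    print(discover(pool, "x86_64").host)  # node2.example.org
```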

Both monitoring and discovery applications require the ability to collect information about Grid resources from multiple, perhaps distributed, sources (referred to as information sources). To meet this need, WS MDS provides:

  • aggregator services (primarily the MDS-Index service) that collect recent state information from registered Grid resources and then act on it, for example by publishing it or by triggering an action.
  • browser-based interfaces, command line tools, and Web service interfaces that allow users and client programs to query and access the collected information.

WS MDS is built on the GT4 implementation of WSRF (the Web Services Resource Framework). It makes heavy use of XML and Web service interfaces to simplify registering information sources and locating and accessing information of interest. In particular, all information collected by aggregator services is maintained as XML and can be queried with XPath (as well as through other Web services mechanisms).
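Because the aggregated data is plain XML, any XPath-capable client can query it. The following minimal Python sketch uses the standard-library `xml.etree.ElementTree` module against a hypothetical aggregated document. The `Index`, `Entry`, `Type` and `FreeCPUs` element names and the `service` attribute are illustrative assumptions, not the actual MDS-Index schema. ElementTree implements only a subset of XPath 1.0. A full XPath engine would accept `//Entry[Type='GRAM' and FreeCPUs > 0]` as a single expression, but here the numeric test has to be done in Python.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical aggregated document; element names are illustrative only.
AGGREGATED = """
<Index>
  <Entry service="host1.example.org"><Type>GRAM</Type><FreeCPUs>16</FreeCPUs></Entry>
  <Entry service="host2.example.org"><Type>RFT</Type></Entry>
  <Entry service="host3.example.org"><Type>GRAM</Type><FreeCPUs>0</FreeCPUs></Entry>
</Index>
""".strip()


def gram_entries(doc: str) -> List[str]:
    """XPath subset: every Entry element whose <Type> child equals 'GRAM'."""
    root = ET.fromstring(doc)
    return [e.get("service") for e in root.findall(".//Entry[Type='GRAM']")]


def gram_with_free_cpus(doc: str) -> List[str]:
    """Same selection, plus a numeric filter that ElementTree cannot express."""
    root = ET.fromstring(doc)
    hits = []
    for e in root.findall(".//Entry[Type='GRAM']"):
        if int(e.findtext("FreeCPUs", default="0")) > 0:
            hits.append(e.get("service"))
    return hits


if __name__ == "__main__":
    print(gram_entries(AGGREGATED))         # ['host1.example.org', 'host3.example.org']
    print(gram_with_free_cpus(AGGREGATED))  # ['host1.example.org']
```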

WS MDS has features similar to those of MDS2 but does not interoperate with it. The main differences are:

  • More powerful query language: XPath instead of LDAP.
  • Simpler, and therefore more robust, implementation, due to fewer components.
  • Simpler configuration (in simple cases, totally automated), intended for GT 3.9.5 Final, due to fewer components and tight integration with the GT4 implementation.
  • Convenient interface to arbitrary information sources, due to an extensible architecture.
  • No requirement for a predefined schema in information providers.
  • Different performance characteristics: in some cases performance is lower than MDS2 (for example, raw query rate) due to the immaturity of the underlying technologies, while in others it is higher due to architectural changes (for example, the removal of cascading queries).


Conceptual details

This section defines the main components of WS MDS and then describes how they fit together in a basic framework.

Grid resources

A Grid resource is essentially any entity in a virtual organization (VO) about which a user wants to obtain information: for example, a file, a program, a Web service, or another network-enabled service.

Information sources

An information source supplies information about a Grid resource that you want to monitor. It collects the information from the Grid resource and formats it so that it is compatible with WS MDS. Information sources can be either Java classes in the aggregator framework or executables.

Java classes are supported for WSRF-compliant Web services. Such services simply need to make their status and state information available as WSRF resource properties. WS MDS provides the following Java information sources (custom ones can also be written):

  • the WSRF polling source, a Java class which polls WSRF services for resource properties
  • the WSRF subscription source, a Java class which receives resource properties from WSRF services through subscription/notification

Executables are user-supplied programs that allow information to be obtained from an arbitrary Grid resource (whether or not it's a WSRF-compliant Web service). The program runs periodically to obtain up-to-date data and can either generate the information locally or use a source-specific protocol to access the information remotely. The program must convert non-XML data into an appropriate XML representation.
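As an illustration, here is a minimal Python sketch of such an executable. It gathers non-XML data (local disk usage, via `shutil.disk_usage`) and emits it as one well-formed XML document on standard output. The choice of data, the `DiskInfo`/`Path`/`TotalBytes`/`FreeBytes` element names, and the assumption that the program's output is read from stdout are hypothetical. The exact contract an executable source must follow is defined by the MDS configuration documentation, not by this sketch.

```python
import shutil
import socket
import time
import xml.etree.ElementTree as ET


def collect(path: str = "/") -> ET.Element:
    """Gather non-XML data (here, disk usage) and wrap it in XML elements."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    root = ET.Element("DiskInfo", host=socket.gethostname(),
                      timestamp=str(int(time.time())))
    ET.SubElement(root, "Path").text = path
    ET.SubElement(root, "TotalBytes").text = str(usage.total)
    ET.SubElement(root, "FreeBytes").text = str(usage.free)
    return root


if __name__ == "__main__":
    # Building the output with ElementTree guarantees well-formed, escaped XML.
    print(ET.tostring(collect(), encoding="unicode"))
```

Building the document with an XML library, rather than formatting strings by hand, keeps the output well formed even when values contain characters such as `<` or `&`.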

GT4's built-in information sources

GT4 is configured to use WS MDS components for discovery and monitoring of GT4 services and provides built-in information sources, as follows:

  • Two GT4 services, GRAM and RFT, publish an additional, larger set of service-specific information. These values are documented in the GRAM and RFT service descriptions and in the GT4 resource properties catalog.
  • The following are intended to be implemented for GT 3.9.5 Final:
    • Every GT4 service supports a minimal set of resource properties (an informal service name and a service startup time) and thus can easily be registered into one or more aggregator services for monitoring and discovery.
    • The GT4 distribution also includes information source executables that enable GridFTP and RLS (which are not WSRF-compliant Web services) to be registered into aggregator services.

Aggregator services

An aggregator service collects information using the WS MDS aggregator framework and then acts on it. WS MDS includes two aggregator services:

  • MDS-Index, which is the main component for collecting structured data from information sources and making the information available via a Web services interface.
  • MDS-Trigger, which passes the collected information to an executable configured by the administrator; the executable may then take an action such as sending email.

A third aggregator service is planned: an MDS-Archive, which will maintain an archive of historical information.

WS MDS aggregator services are distinguished from a traditional static registry such as UDDI by their soft-state registration of information sources and periodic refresh of the structured data that they store. This dynamic behavior enables scalable discovery by allowing users to access “recent” information without accessing the information sources directly.

However, note that the information obtained may not be the absolute latest. Also, as aggregator services do not interpret policy information, there is no guarantee that a user will be allowed to access a service discovered in this way.

MDS-Index service

The MDS-Index service makes the collected data available as XML documents. More specifically, the data is maintained as WSRF resource properties, which means that:

  • Users can write their own applications that collect information using Web services interfaces, namely the WSRF get-property and WS-Notification subscribe operations, for which GT4 provides C, Java, and Python APIs.
  • The command line tool wsrf-get-property can be used to retrieve resource properties, with the desired resource property specified via an XPath expression.
  • A tool called WebMDS presents MDS-Index information in a standard web browser. WebMDS is highly configurable, using XSLT transformations to describe how MDS-Index resource properties are converted to HTML. Standard transformations included in GT4 provide an interface that displays overview information, with hyperlinks giving the ability to drill down and view more detailed information about each monitored resource.

For more information, see the MDS-Index service documentation.

MDS-Trigger service

The MDS-Trigger service performs user-specified actions (such as sending email or generating a log-file entry) whenever collected information matches criteria specified by the user. MDS-Trigger defines:

  • a Web service interface that allows a client to register an XPath query; and
  • a program to be executed whenever a new value matches a user-supplied matching rule.
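The trigger idea (select a value, test it against a matching rule, act on a match) can be sketched in a few lines of Python. This is a hypothetical model, not the MDS-Trigger interface. The `Trigger` class is an assumption. The selector is an ElementTree path (an XPath subset). The action is a Python callable standing in for the external program that MDS-Trigger would run.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


class Trigger:
    """Hypothetical trigger: when a newly collected value, selected by a path,
    satisfies the matching rule, run the action. MDS-Trigger runs an external
    program instead of calling a Python function."""

    def __init__(self, path: str, rule: Callable[[str], bool],
                 action: Callable[[str], None]) -> None:
        self.path = path      # ElementTree path expression (XPath subset)
        self.rule = rule      # user-supplied matching rule
        self.action = action  # stand-in for the configured executable

    def on_new_data(self, xml_text: str) -> bool:
        """Evaluate one freshly collected document; return True if fired."""
        value = ET.fromstring(xml_text).findtext(self.path)
        if value is not None and self.rule(value):
            self.action(value)
            return True
        return False


if __name__ == "__main__":
    alerts: List[str] = []
    low_disk = Trigger("FreeBytes",
                       rule=lambda v: int(v) < 1_000_000_000,
                       action=lambda v: alerts.append(f"low disk: {v} bytes free"))
    low_disk.on_new_data("<DiskInfo><FreeBytes>5000000000</FreeBytes></DiskInfo>")
    low_disk.on_new_data("<DiskInfo><FreeBytes>200000000</FreeBytes></DiskInfo>")
    print(alerts)  # ['low disk: 200000000 bytes free']
```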

For more information, see the MDS-Trigger service documentation.

Basic framework

The key to understanding WS MDS is the aggregator service/information source framework. The basic process is as follows:

  1. Grid resources are explicitly registered with an aggregator service.
  2. The aggregator service periodically collects up-to-date state or status information from all registered Grid resources using specific information sources.
  3. The aggregator service then makes this information available to the user.
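The three steps (register, collect periodically, publish) can be sketched as a small Python class. This is a conceptual model under stated assumptions, not the WS MDS aggregator framework. The `Aggregator` class is hypothetical. Each information source is modelled as a zero-argument callable returning an XML element, and the published document's `Aggregate`/`Entry` element names are illustrative.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical model: an information source is a callable returning XML.
Source = Callable[[], ET.Element]


class Aggregator:
    def __init__(self) -> None:
        self.sources: Dict[str, Source] = {}
        self.latest: Dict[str, ET.Element] = {}

    def register(self, name: str, source: Source) -> None:
        """Step 1: register a Grid resource together with its source."""
        self.sources[name] = source

    def collect(self) -> None:
        """Step 2: refresh the cached state of every registered resource."""
        for name, source in self.sources.items():
            self.latest[name] = source()

    def publish(self) -> str:
        """Step 3: expose the cached data to users as one XML document."""
        root = ET.Element("Aggregate")
        for name, element in self.latest.items():
            ET.SubElement(root, "Entry", name=name).append(element)
        return ET.tostring(root, encoding="unicode")

    def run(self, rounds: int, interval_s: float) -> None:
        """Collect periodically; a real service would loop until shut down."""
        for _ in range(rounds):
            self.collect()
            time.sleep(interval_s)
```

For example, registering a source named `rft@host1` that returns `<Status>up</Status>`, then calling `run(1, 0)` and `publish()`, yields `<Aggregate><Entry name="rft@host1"><Status>up</Status></Entry></Aggregate>`. Users read the cached copy, so queries do not touch the information sources directly. This also means the data can be as old as one collection interval.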

Registering Grid resources

A Grid resource is registered with an aggregator service via the Web service (WS-ServiceGroup) Add operation. Registrations can be configured either at an MDS-Index service or at a Grid resource.

Registrations have a lifetime: if not renewed periodically, they expire. Thus, an aggregator service is self-cleaning: outdated entries disappear automatically when they do not renew their registrations.
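Soft-state registration can be illustrated with a minimal Python sketch. The `SoftStateRegistry` class and its method names are hypothetical, not a GT4 API. Every registration records an absolute expiry time. Renewing simply pushes that time forward, and expired entries are dropped the next time the registry is read. The clock is injectable so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, List


class SoftStateRegistry:
    """Hypothetical soft-state registry: each registration has a lifetime
    and disappears unless it is renewed before that lifetime runs out."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # key -> absolute expiry time

    def register(self, key: str, lifetime_s: float) -> None:
        # Registering an existing key again simply extends its expiry time.
        self._expiry[key] = self._clock() + lifetime_s

    renew = register  # renewal is just a repeated registration

    def live_entries(self) -> List[str]:
        """Drop expired registrations (self-cleaning), then list the rest."""
        now = self._clock()
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        return sorted(self._expiry)


if __name__ == "__main__":
    t = [0.0]  # fake clock for the demonstration
    reg = SoftStateRegistry(clock=lambda: t[0])
    reg.register("gridftp@host1", 60)
    reg.register("rft@host2", 60)
    t[0] = 50
    reg.renew("rft@host2", 60)  # only rft@host2 renews
    t[0] = 100
    print(reg.live_entries())   # ['rft@host2']
```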

Two registration modes are supported; each also defines the access mechanism for the associated Grid resource.

The more general registration mode allows information to be obtained from an arbitrary source (whether or not it's a WSRF-compliant Web service). In this mode, a Grid resource is registered by providing a user-supplied program that is run periodically to obtain up-to-date data. This user program can either generate the information locally or use a source-specific protocol to access the information remotely. The program must convert non-XML data into an appropriate XML representation.

A more streamlined form of registration is supported for WSRF-compliant Web services. Such services simply need to make status and state information available as WSRF resource properties. At registration time, the user specifies whether the aggregator service should pull resource properties using the WSRF “get resource property” interface, or subscribe to resource property changes so that values are pushed via WS-Notification subscription methods.
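The difference between the two WSRF modes can be sketched in Python. Both classes below are hypothetical models, not the GT4 polling or subscription sources, and both update a shared cache. In pull mode the aggregator calls a getter each cycle, standing in for the get-resource-property operation. In push mode the service calls `notify` when a value changes, standing in for a WS-Notification message.

```python
from typing import Callable, Dict

Cache = Dict[str, str]  # resource name -> latest known value


class PollingSource:
    """Pull mode: the aggregator asks the service for a value each cycle."""

    def __init__(self, name: str, get_property: Callable[[], str], cache: Cache):
        self.name = name
        self.get_property = get_property  # stand-in for get-resource-property
        self.cache = cache

    def poll(self) -> None:
        self.cache[self.name] = self.get_property()


class SubscriptionSource:
    """Push mode: the service calls back whenever the value changes."""

    def __init__(self, name: str, cache: Cache):
        self.name = name
        self.cache = cache

    def notify(self, new_value: str) -> None:  # stand-in for a notification
        self.cache[self.name] = new_value
```

In general, pull mode costs one request per cycle even when nothing has changed, and the cache can be stale between polls. Push mode delivers changes as they happen but requires the service to support subscriptions.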


Collecting the information

The information source is the 'compatibility' link between the Grid resource and the aggregator service; its purpose is to ensure the information is formatted in a way the aggregator service understands (namely, well-formed XML). In the following diagram, RFT and WS GRAM, both WSRF services, simply pass their information using WSRF protocols to the appropriate Java class, whereas GridFTP and RLS use an executable that "talks" to them and converts the information. All information is sent as XML to the MDS-Index service.


Publishing the information

Aggregator services publish the collected information for the user in several different formats. The following diagram shows the user receiving information via WebMDS (in the form of a Web browser interface) and via the Trigger service (in the form of an email triggered by conditions set by the user).

Basic WS MDS Deployment

Every standard GT4 Web services hosting environment includes a default MDS-Index service. Any GT4 services running in that hosting environment (e.g., GRAM, CAS, RFT) are automatically registered with it. Thus, every installation has a local MDS-Index that allows you to discover which services are available.

Since virtual organizations (VOs) often need to keep track of all available Grid resources, GT4 also provides a simple method for registering one or more default MDS-Index services with a VO-wide MDS-Index. In this setup, each Grid resource registered with a default MDS-Index is also registered in the VO MDS-Index.
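The resulting two-level hierarchy can be modelled in a few lines of Python. This sketch only mirrors the behavior described above: a resource registered with a default index also appears in the VO-wide index. The `Index` class and its `upstream` link are hypothetical, and the actual GT4 registration mechanism may differ.

```python
from typing import List, Optional


class Index:
    """Hypothetical index node: registrations are recorded locally and, if an
    upstream (VO-wide) index is configured, forwarded to it as well."""

    def __init__(self, name: str, upstream: Optional["Index"] = None) -> None:
        self.name = name
        self.upstream = upstream
        self.entries: List[str] = []

    def register(self, resource: str) -> None:
        self.entries.append(resource)
        if self.upstream is not None:
            self.upstream.register(resource)


if __name__ == "__main__":
    vo = Index("vo-index")
    site_a = Index("siteA-default", upstream=vo)
    site_b = Index("siteB-default", upstream=vo)
    site_a.register("GRAM@siteA")
    site_b.register("RFT@siteB")
    print(vo.entries)  # ['GRAM@siteA', 'RFT@siteB']
```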

For a more detailed description of a basic deployment within a VO, see Deploying WS MDS in a Virtual Organization.

Performance Characteristics

We have only very preliminary performance data at this time. These data suggest that WS MDS aggregator services can support query rates on the order of tens of queries per second (depending on data sizes) and a few hundred information sources (depending on registration and information update rates). We emphasize that these figures are rough estimates, not precise measurements. We welcome feedback on application requirements.

Note: As stated above, in some cases WS MDS has lower performance than MDS2 (for example, raw query rate) due to the immaturity of the underlying technologies, while in other cases it has higher performance due to architectural changes (for example, the removal of cascading queries).

Related Documents

The following links include internal or external documents that expand on some of these key concepts: