RLS: Key Concepts
The Replica Location Service (RLS) maintains and provides access to mapping information from logical names for data items to target names.
RLS was co-developed by the Globus team and Work Package 2 of the DataGrid project. The RLS prototype is currently available as an alpha release for testing and evaluation. The distributed RLS is intended to replace the centralized Globus replica catalog available in earlier releases of GT2.x. The distributed RLS provides higher performance, reliability and scalability.
Replication of data items can reduce access latency, improve data locality, and increase robustness, scalability and performance for distributed applications. An RLS typically does not operate in isolation, but functions as one component of a data grid architecture. (Other components include services that provide reliable file transfers, metadata management, reliable replication and workflow management.)
Our prototype RLS implementation is based on the following mechanisms:
- Consistent local state maintained in Local Replica
Catalogs (LRCs). Local
catalogs maintain mappings between arbitrary logical file names (LFNs) and
the physical file names (PFNs) associated with those LFNs on its storage system(s).
- Collective state with relaxed consistency maintained
in Replica Location Indices (RLIs). Each RLI contains a set of mappings from LFNs to LRCs. A variety of
index structures can be defined with different performance characteristics,
simply by varying the number of RLIs and amount of redundancy and partitioning
among the RLIs.
- Soft state maintenance of RLI state. LRCs send information about their
state to RLIs using soft state protocols. State information in RLIs times
out and must be periodically refreshed by soft state updates.
- Compression of state updates. This optional compression uses Bloom Filters
to summarize the content of a Local Replica Catalog before sending a soft
state update to a Replica Location Index Node.
- Membership and partitioning information maintenance. The current RLS implementation maintains static information about the LRCs and RLIs participating in the distributed system. As new implementations of the RLS are developed, they will use OGSA mechanisms for registration of services and for service lifetime management.
Relationship to Earlier Globus Replica Management Software
RLS is intended to replace replica management tools available in earlier versions of the Globus Toolkit, incuding the Replica Catalog API and the Replica Management API. The RLS differs from these earlier components in several important ways.
- As a distributed system, the RLS is designed to
provide greater reliability by avoiding single points of failure as well
as providing better load balancing, performance and scalability.
- The RLS implementation
is based on open source relational database technology.
- Finally, the RLS strictly separates replication information from other types of metadata; unlike the original replica catalog, the RLS does not include information about logical collections, but assumes such information is stored in a separate metadata service.