GT 4.1.0 RLS: User's Guide

1. Introduction

The Replica Location Service (RLS) maintains and provides access to mapping information from logical names for data items to target names.

RLS was co-developed by the Globus team and Work Package 2 of the DataGrid project. The distributed RLS is intended to replace the centralized Globus replica catalog available in earlier GT 2.x releases, and it provides higher performance, reliability and scalability than that centralized catalog.

Replication of data items can reduce access latency, improve data locality, and increase robustness, scalability and performance for distributed applications. An RLS typically does not operate in isolation but functions as one component of a data grid architecture (other components include services that provide reliable file transfers, metadata management, reliable replication and workflow management).

The RLS implementation is based on the following mechanisms:

  • Consistent local state maintained in Local Replica Catalogs (LRCs). Each local catalog maintains mappings between arbitrary logical file names (LFNs) and the physical file names (PFNs) associated with those LFNs on its storage system(s).
  • Collective state with relaxed consistency maintained in Replica Location Indices (RLIs). Each RLI contains a set of mappings from LFNs to LRCs. A variety of index structures can be defined with different performance characteristics simply by varying the number of RLIs and the amount of redundancy and partitioning among the RLIs.
  • Soft state maintenance of RLI state. LRCs send information about their state to RLIs using soft state protocols. State information in RLIs times out and must be periodically refreshed by soft state updates.
  • Compression of state updates. This optional compression uses Bloom filters to summarize the content of a Local Replica Catalog before sending a soft state update to a Replica Location Index Node.
  • Membership and partitioning information maintenance. The current RLS implementation maintains static information about the LRCs and RLIs participating in the distributed system. As new implementations of the RLS are developed, they will use OGSA mechanisms for registration of services and for service lifetime management.
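
The optional Bloom filter compression named in the list above can be illustrated with a toy filter. This is a minimal sketch in POSIX shell and not the RLS implementation: the filter size, the three salted cksum hashes, and all function and variable names are assumptions chosen for readability.

```shell
#!/bin/sh
# Toy Bloom filter illustrating how an LRC could summarize its LFNs.
# Illustrative only: size, hash choice and names are assumptions.

BLOOM_BITS=1024     # number of bit positions in the filter (assumed)
bloom_set=" "       # space-delimited list of positions whose bit is 1

# Print a bit position in [0, BLOOM_BITS) for salt $1 and name $2.
bloom_hash() {
    bh_sum=$(printf '%s:%s' "$1" "$2" | cksum)
    bh_sum=${bh_sum%% *}                 # keep the checksum, drop the byte count
    echo $(( bh_sum % BLOOM_BITS ))
}

# Record a logical name by setting one bit per salted hash.
bloom_add() {
    for ba_salt in 1 2 3; do
        ba_pos=$(bloom_hash "$ba_salt" "$1")
        case "$bloom_set" in
            *" $ba_pos "*) ;;                        # bit already set
            *) bloom_set="$bloom_set$ba_pos " ;;     # set the bit
        esac
    done
}

# Succeed if the name may be present; fail if it is definitely absent.
bloom_maybe_contains() {
    for bc_salt in 1 2 3; do
        bc_pos=$(bloom_hash "$bc_salt" "$1")
        case "$bloom_set" in
            *" $bc_pos "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

After bloom_add my-logical-name-1, the call bloom_maybe_contains my-logical-name-1 succeeds. A success only means "possibly present": Bloom filters admit false positives but never false negatives, which is what makes them a compact summary rather than an exact copy of the catalog.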

2. Command line tools

Please see the RLS Command Reference.

3. Usage scenarios

This section describes a few key usage scenarios and provides examples of using the RLS command-line tools.

3.1. Generate a valid proxy

Before using any of the tools, generate a valid user proxy with grid-proxy-init.

% $GLOBUS_LOCATION/bin/grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA.mymachine/OU=mymachine/CN=John Doe
Enter GRID pass phrase for this identity:
        

3.2. Ping the server

To check whether your server is active you may use the globus-rls-admin(1) ping command.

% $GLOBUS_LOCATION/bin/globus-rls-admin -p rls://localhost
ping rls://localhost: 0 seconds
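
In scripts it can be convenient to wait until the server answers the ping before issuing further commands. The following is a minimal sketch, assuming that globus-rls-admin exits with a non-zero status when the ping fails; the function name wait_for_rls and the RLS_ADMIN override variable are hypothetical helpers, not part of RLS.

```shell
#!/bin/sh
# Retry the documented ping until the server answers or attempts run out.
# Assumption: globus-rls-admin exits non-zero when the ping fails.

# Path to the admin tool; overridable, e.g. for testing (hypothetical variable).
RLS_ADMIN=${RLS_ADMIN:-$GLOBUS_LOCATION/bin/globus-rls-admin}

# usage: wait_for_rls URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_rls() {
    wf_url=$1
    wf_attempts=${2:-5}
    wf_delay=${3:-2}
    wf_try=1
    while [ "$wf_try" -le "$wf_attempts" ]; do
        if "$RLS_ADMIN" -p "$wf_url"; then
            return 0                      # server answered
        fi
        echo "ping $wf_try of $wf_attempts to $wf_url failed" >&2
        wf_try=$((wf_try + 1))
        # Pause only if another attempt follows.
        [ "$wf_try" -le "$wf_attempts" ] && sleep "$wf_delay"
    done
    return 1                              # all attempts failed
}
```

For example, wait_for_rls rls://localhost 10 3 tries up to ten times with three seconds between attempts.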
        

3.3. Creating replica location mappings

When the RLS server is first installed its database of replica location information will be empty, as expected. To create a replica location mapping, use the globus-rls-cli(1) create command. Replica information in RLS is represented as mappings from logical names to target names. Typically, the logical name will be a unique identifier for a given replicated data set and the target name will be a URL identifying a particular replica of the data set.

% $GLOBUS_LOCATION/bin/globus-rls-cli create my-logical-name-1 url-for-target-name-1 rls://localhost
        
Note

The create command is intended for creating the initial replica mapping entry for a given logical name. If the user attempts to create another entry using an existing logical name, RLS will report a user error. To map additional target names to an existing logical name, see Section 3.4, “Adding replica location mappings”.

3.4. Adding replica location mappings

To map additional target names to a logical name created by the previously described create command, use the globus-rls-cli(1) add command.

% $GLOBUS_LOCATION/bin/globus-rls-cli add my-logical-name-1 url-for-target-name-2 rls://localhost
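
Because create is for the first mapping of a logical name and add is for later ones, a script that registers replicas has to choose between them. The sketch below queries first and then picks the command. It assumes that query lrc lfn exits with a non-zero status when the logical name is unknown; rls_map and RLS_CLI are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Register a replica without knowing whether the logical name already exists.
# Assumption: "query lrc lfn" exits non-zero when the LFN is unknown.

# Path to the client; overridable, e.g. for testing (hypothetical variable).
RLS_CLI=${RLS_CLI:-$GLOBUS_LOCATION/bin/globus-rls-cli}

# usage: rls_map LFN TARGET SERVER_URL
rls_map() {
    if "$RLS_CLI" query lrc lfn "$1" "$3" >/dev/null 2>&1; then
        "$RLS_CLI" add "$1" "$2" "$3"       # LFN exists: attach another target
    else
        "$RLS_CLI" create "$1" "$2" "$3"    # first mapping for this LFN
    fi
}
```

Note that a query can also fail for other reasons, such as a connection problem. In that case the sketch falls through to create, which would be expected to fail for the same reason and report the underlying error.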
        

3.5. Querying replica location mappings

Once your RLS server is populated with replica location mappings, you can query the server for useful information using the globus-rls-cli(1) query command.

% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
  my-logical-name-1: url-for-target-name-1
  my-logical-name-1: url-for-target-name-2
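
When the query result feeds another tool, only the target names are usually needed. The following sketch strips the logical name prefix from each output line. It relies on the "LFN: TARGET" output format shown above; rls_targets and RLS_CLI are hypothetical names, not part of RLS.

```shell
#!/bin/sh
# List only the target names mapped to a logical name.
# Relies on the "  LFN: TARGET" output format of "query lrc lfn".

# Path to the client; overridable, e.g. for testing (hypothetical variable).
RLS_CLI=${RLS_CLI:-$GLOBUS_LOCATION/bin/globus-rls-cli}

# usage: rls_targets LFN SERVER_URL
rls_targets() {
    "$RLS_CLI" query lrc lfn "$1" "$2" |
        awk -v lfn="$1" '
            {
                sub(/^[ \t]+/, "")               # drop leading indentation
                prefix = lfn ": "
                if (index($0, prefix) == 1)      # line starts with "LFN: "
                    print substr($0, length(prefix) + 1)
            }'
}
```

Only the leading "LFN: " prefix is removed, so target names that themselves contain colons (as URLs do) are printed intact.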
        

3.6. Deleting replica location mappings

To remove unwanted replica location mappings from your RLS server, use the globus-rls-cli(1) delete command. The delete operation works directly on the mapping and only indirectly on the logical and target names: it eliminates the association between the specified logical name and the specified target name. Other target names may still be associated with the logical name and, less commonly, other logical names may still be associated with the target name. A logical name is deleted from the RLS server only when no target names remain associated with it; likewise, a target name is deleted only when no logical names remain associated with it.

% $GLOBUS_LOCATION/bin/globus-rls-cli delete my-logical-name-1 url-for-target-name-1 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
  my-logical-name-1: url-for-target-name-2
% $GLOBUS_LOCATION/bin/globus-rls-cli delete my-logical-name-1 url-for-target-name-2 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
globus_rls_client: LFN doesn't exist: my-logical-name-1
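
Since a logical name disappears only after its last mapping is deleted, removing a logical name completely means deleting each of its mappings. The sketch below does this by querying and then deleting one target at a time. It assumes the "LFN: TARGET" query output format shown in this guide; rls_delete_all and RLS_CLI are hypothetical names.

```shell
#!/bin/sh
# Remove every mapping of a logical name, one target at a time.
# Assumes the "  LFN: TARGET" output format of "query lrc lfn".

# Path to the client; overridable, e.g. for testing (hypothetical variable).
RLS_CLI=${RLS_CLI:-$GLOBUS_LOCATION/bin/globus-rls-cli}

# usage: rls_delete_all LFN SERVER_URL
rls_delete_all() {
    "$RLS_CLI" query lrc lfn "$1" "$2" |
    while IFS= read -r da_line; do
        # Strip everything up to and including "LFN: " to get the target.
        da_target=${da_line#*"$1: "}
        [ "$da_target" != "$da_line" ] || continue   # not a mapping line
        # Keep the delete call from reading the query output on stdin.
        "$RLS_CLI" delete "$1" "$da_target" "$2" </dev/null || exit 1
    done
}
```

The function returns a non-zero status as soon as one delete fails, leaving the remaining mappings in place.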
        

3.7. Using bulk operations

The globus-rls-cli(1) client supports a variety of bulk operations, which save typing and avoid the network connection overhead of multiple, separate invocations of the client. A bulk operation takes a parameter list of the form bulk command-name [command-modifiers] param-1 param-2 param-N, such as bulk query lrc lfn my-logical-name-1 my-logical-name-2 my-logical-name-3.

% $GLOBUS_LOCATION/bin/globus-rls-cli bulk create my-logical-name-1 url-for-target-name-1-1 my-logical-name-2 url-for-target-name-2-1 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk add my-logical-name-1 url-for-target-name-1-2 my-logical-name-2 url-for-target-name-2-2 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk query lrc lfn my-logical-name-1 my-logical-name-2 my-logical-name-3 rls://localhost
  my-logical-name-3: LFN doesn't exist
  my-logical-name-2: url-for-target-name-2-1
  my-logical-name-2: url-for-target-name-2-2
  my-logical-name-1: url-for-target-name-1-1
  my-logical-name-1: url-for-target-name-1-2
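
For larger batches, the parameter list of a bulk create can be assembled from a file. The following sketch assumes a hypothetical input file with one "LFN TARGET" pair per line, separated by whitespace; the file format, rls_bulk_create and RLS_CLI are illustrative assumptions, not RLS features.

```shell
#!/bin/sh
# Build one "bulk create" invocation from a file of "LFN TARGET" lines.
# The input file format is an assumption made for this sketch.

# Path to the client; overridable, e.g. for testing (hypothetical variable).
RLS_CLI=${RLS_CLI:-$GLOBUS_LOCATION/bin/globus-rls-cli}

# usage: rls_bulk_create PAIRS_FILE SERVER_URL
rls_bulk_create() {
    bc_file=$1
    bc_url=$2
    set --                                   # reuse "$@" as the parameter list
    while read -r bc_lfn bc_target; do
        # Skip blank or incomplete lines.
        [ -n "$bc_lfn" ] && [ -n "$bc_target" ] || continue
        set -- "$@" "$bc_lfn" "$bc_target"
    done < "$bc_file"
    [ "$#" -gt 0 ] || return 0               # nothing to create
    "$RLS_CLI" bulk create "$@" "$bc_url"
}
```

Very large files can exceed the operating system's limit on command-line length, so in practice the input may need to be split into several invocations.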
        

3.8. Using interactive mode

The globus-rls-cli(1) client supports an interactive mode in addition to the general command-line mode. To enter interactive mode, invoke the client with the server URL but without a command.

% $GLOBUS_LOCATION/bin/globus-rls-cli rls://localhost
rls> query lrc lfn my-logical-name-2
  my-logical-name-2: url-for-target-name-2-1
  my-logical-name-2: url-for-target-name-2-2
rls> query lrc lfn my-logical-name-1
  my-logical-name-1: url-for-target-name-1-1
  my-logical-name-1: url-for-target-name-1-2
rls> bulk delete my-logical-name-1 url-for-target-name-1-1 my-logical-name-1 
 url-for-target-name-1-2 my-logical-name-2 url-for-target-name-2-1 
 my-logical-name-2 url-for-target-name-2-2
rls> bulk query lrc lfn my-logical-name-2 my-logical-name-1
  my-logical-name-1: LFN doesn't exist
  my-logical-name-2: LFN doesn't exist
rls> exit
        

4. Graphical user interfaces

RLS does not provide a graphical user interface.

5. Troubleshooting

This section describes a few troubleshooting tips. Additional information on troubleshooting can be found in the FAQ.

5.1. Verbose error messages

When troubleshooting problems encountered during usage of the RLS client or server, verbose error messages may be enabled by setting the GLOBUS_ERROR_VERBOSE environment variable. Verbose error messages are particularly helpful when communicating on the rls-user@globus.org or gt-user@globus.org list or when reporting problems on the bugzilla.globus.org site.
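
To enable verbose messages for a single command without changing the rest of the shell session, the variable can be set in a subshell. This is a minimal sketch: the text above only says that the variable must be set, so the value 1 is an assumption, and with_verbose_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Run one command with verbose Globus error messages enabled.
# Assumption: the value 1 is used; the guide only says the variable must be set.

# usage: with_verbose_errors COMMAND [ARGS...]
with_verbose_errors() (
    # The function body is a subshell, so the caller's environment is untouched.
    GLOBUS_ERROR_VERBOSE=1
    export GLOBUS_ERROR_VERBOSE
    "$@"
)
```

For example, with_verbose_errors $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name rls://localhost 2> rls-error.log captures the verbose messages in a file that can be attached to a mailing list post or bug report.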

5.2. Expired proxy

A security context failure may be experienced when using the client to connect to the RLS server. This is frequently caused by an expired credential.

% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name rls://localhost
connect(rls://localhost): globus_rls_client: Globus I/O error: globus_xio_gsi: gss_init_sec_context failed.
globus_gsi_gssapi: Error with GSI credential
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Error with credential: The proxy credential: /tmp/x509up_u4191
      with subject: /C=US/O=My Org/OU=User/CN=Me/CN=1234
      expired 350 minutes ago.
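
An expired proxy can be caught before contacting the server by checking its remaining lifetime. The sketch below assumes that grid-proxy-info -timeleft prints the remaining lifetime in seconds; proxy_is_fresh and GRID_PROXY_INFO are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Check the remaining proxy lifetime before contacting the RLS server.
# Assumption: "grid-proxy-info -timeleft" prints the remaining seconds.

# Path to the tool; overridable, e.g. for testing (hypothetical variable).
GRID_PROXY_INFO=${GRID_PROXY_INFO:-$GLOBUS_LOCATION/bin/grid-proxy-info}

# usage: proxy_is_fresh [MIN_SECONDS]
proxy_is_fresh() {
    pf_min=${1:-300}
    pf_left=$("$GRID_PROXY_INFO" -timeleft 2>/dev/null) || return 1
    case "$pf_left" in
        ''|*[!0-9]*) return 1 ;;     # empty, negative or otherwise non-numeric
    esac
    [ "$pf_left" -ge "$pf_min" ]
}
```

A script could then run proxy_is_fresh 600 || $GLOBUS_LOCATION/bin/grid-proxy-init to renew the proxy when fewer than ten minutes remain.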
        

5.3. Unable to connect

Several problems can prevent the client from establishing a connection with the RLS server. The most obvious is a wrong address or port number in the RLS connection URL. A less obvious one is a firewall configuration that blocks connections to the target host on a particular port; in that case, you may need to consult the system administrator.

% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name rls://localhost:1234
connect(rls://localhost:1234): globus_rls_client: Globus I/O error: globus_xio:
Unable to connect to localhost:1234
globus_xio: System error in connect: Connection refused
globus_xio: A system call failed: Connection refused
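
When checking for a wrong address or port, it helps to see exactly which host and port a connection URL refers to. The following sketch splits an rls:// URL using only shell parameter expansion; rls_url_hostport is a hypothetical helper, and no default port is assumed when the URL omits one.

```shell
#!/bin/sh
# Split an rls:// connection URL into its host and (optional) port.

# usage: rls_url_hostport URL   -> prints "HOST PORT" or "HOST"
rls_url_hostport() {
    uh_rest=${1#rls://}
    if [ "$uh_rest" = "$1" ]; then
        echo "not an rls:// URL: $1" >&2
        return 1
    fi
    uh_rest=${uh_rest%%/*}               # drop any trailing path
    case "$uh_rest" in
        *:*) echo "${uh_rest%%:*} ${uh_rest##*:}" ;;   # host and port
        *)   echo "$uh_rest" ;;                        # host only
    esac
}
```

The resulting host and port can be handed to whatever generic network diagnostics are available on the system, or passed on to the system administrator when a firewall is suspected.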
        

5.4. Operation timeouts

A client may occasionally experience a timeout when interacting with the RLS server. One cause is wide-area network latency or congestion. Another, which users eventually encounter, is growth of the system: as the RLS server's database of replica location mappings grows, some operations, such as bulk queries involving many mappings or wildcard queries that match a large subset of mappings, take longer both to process and to return the large result set to the client over the network. If timeouts occur with increasing frequency, increase the RLS server's timeout configuration parameter in the $GLOBUS_LOCATION/var/globus-rls-server.conf file. You may also use the -t timeout option of globus-rls-cli(1).
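
On the client side, a slow query can be retried with progressively larger -t values. This is a minimal sketch under stated assumptions: that -t takes a value in seconds, that it is given before the command, and that the client exits with a non-zero status on a timeout. Check the RLS Command Reference for the exact syntax. The names rls_retry_timeout and RLS_CLI are hypothetical.

```shell
#!/bin/sh
# Retry a client command with a progressively larger -t timeout.
# Assumptions: -t takes seconds and precedes the command, and the client
# exits non-zero on a timeout.

# Path to the client; overridable, e.g. for testing (hypothetical variable).
RLS_CLI=${RLS_CLI:-$GLOBUS_LOCATION/bin/globus-rls-cli}

# usage: rls_retry_timeout "T1 T2 ..." COMMAND-ARGS...
rls_retry_timeout() {
    rt_timeouts=$1
    shift
    for rt_t in $rt_timeouts; do          # intentionally unquoted: split the list
        if "$RLS_CLI" -t "$rt_t" "$@"; then
            return 0
        fi
        echo "attempt with -t $rt_t failed" >&2
    done
    return 1
}
```

For example, rls_retry_timeout "30 120 300" query lrc lfn my-logical-name rls://localhost tries three times. The sketch retries on any failure, not only on timeouts, so errors such as an expired proxy should be ruled out first.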

6. Usage statistics collection by the Globus Alliance

By default, the RLS server sends the following usage statistics in a UDP packet:

  • Component identifier
  • Usage data format identifier
  • Time stamp
  • Source IP address
  • Source hostname (to differentiate between hosts with identical private IP addresses)
  • Version number
  • Uptime
  • LRC service indicator
  • RLI service indicator
  • Number of LFNs
  • Number of PFNs
  • Number of Mappings
  • Number of RLI LFNs
  • Number of RLI LRCs
  • Number of RLI Senders
  • Number of RLI Mappings
  • Number of threads
  • Number of connections

The RLS sends the usage statistics at server startup, server shutdown, and once every 24 hours when the service is running.

If you wish to disable this feature, you can set the following environment variable before running the RLS:

export GLOBUS_USAGE_OPTOUT=1

By default, these usage statistics UDP packets are sent to usage-stats.globus.org:4180 but can be redirected to another host/port or multiple host/ports with the following environment variable:

export GLOBUS_USAGE_TARGETS="myhost.mydomain:12345 myhost2.mydomain:54321"

You can also dump the usage stats packets to stderr as they are sent (although most of the content is non-ASCII). Use the following environment variable for that:

export GLOBUS_USAGE_DEBUG=MESSAGES

Also, please see our policy statement on the collection of usage statistics.