Table of Contents
- 1. Introduction
- 2. Command line tools
- 3. Usage scenarios
- 4. Graphical user interfaces
- 5. Troubleshooting
- 6. Usage statistics collection by the Globus Alliance
RLS was co-developed by the Globus team and Work Package 2 of the DataGrid project. The distributed RLS is intended to replace the centralized Globus replica catalog available in earlier releases of GT2.x. The distributed RLS provides higher performance, reliability and scalability.
Replication of data items can reduce access latency, improve data locality, and increase robustness, scalability and performance for distributed applications. An RLS typically does not operate in isolation but functions as one component of a data grid architecture (other components include services that provide reliable file transfers, metadata management, reliable replication and workflow management).
The RLS implementation is based on the following mechanisms:
- Consistent local state maintained in Local Replica Catalogs (LRCs). Each local catalog maintains mappings between arbitrary logical file names (LFNs) and the physical file names (PFNs) associated with those LFNs on its storage system(s).
- Collective state with relaxed consistency maintained in Replica Location Indices (RLIs). Each RLI contains a set of mappings from LFNs to LRCs. A variety of index structures can be defined with different performance characteristics simply by varying the number of RLIs and the amount of redundancy and partitioning among the RLIs.
- Soft state maintenance of RLI state. LRCs send information about their state to RLIs using soft state protocols. State information in RLIs times out and must be periodically refreshed by soft state updates.
- Compression of state updates. This optional compression uses Bloom filters to summarize the content of a Local Replica Catalog before sending a soft state update to a Replica Location Index Node.
- Membership and partitioning information maintenance. The current RLS implementation maintains static information about the LRCs and RLIs participating in the distributed system. As new implementations of the RLS are developed, they will use OGSA mechanisms for registration of services and for service lifetime management.
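The mechanisms listed above can be illustrated with a small in-memory model. This is a minimal sketch, not the RLS implementation: the class names, the Bloom filter parameters (1024 bits, 3 hashes), the use of SHA-256, the 60-second timeout, and the `rls://site-a.example.org` URL are all assumptions chosen for illustration.

```python
import hashlib
import time


class BloomFilter:
    """Fixed-size bit set summarising LFNs; false positives are possible."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, name):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{name}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits |= 1 << pos

    def __contains__(self, name):
        return all((self.bits >> pos) & 1 for pos in self._positions(name))


class LocalReplicaCatalog:
    """Consistent local state: LFN -> set of PFNs for one site."""

    def __init__(self, url):
        self.url = url
        self.mappings = {}

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)

    def summary(self):
        # Compressed soft state update: a Bloom filter over the LFNs.
        bloom = BloomFilter()
        for lfn in self.mappings:
            bloom.add(lfn)
        return bloom


class ReplicaLocationIndex:
    """Relaxed-consistency index: LFN -> candidate LRCs, with expiry."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.updates = {}  # LRC URL -> (Bloom filter, time received)

    def receive_update(self, lrc_url, bloom, now=None):
        received = time.time() if now is None else now
        self.updates[lrc_url] = (bloom, received)

    def lookup(self, lfn, now=None):
        now = time.time() if now is None else now
        # Ignore updates that have timed out without being refreshed.
        return sorted(
            url
            for url, (bloom, received) in self.updates.items()
            if now - received <= self.timeout_seconds and lfn in bloom
        )


lrc = LocalReplicaCatalog("rls://site-a.example.org")
lrc.add("my-logical-name-1", "url-for-target-name-1")

rli = ReplicaLocationIndex(timeout_seconds=60)
rli.receive_update(lrc.url, lrc.summary(), now=0)

fresh = rli.lookup("my-logical-name-1", now=30)   # within the timeout
stale = rli.lookup("my-logical-name-1", now=120)  # update has expired
```

Because the index only holds a summary, a positive answer from `lookup` means "this LRC may hold the LFN"; a client would still query the LRC itself for the authoritative mappings.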
Please see the RLS Commands.
This section describes a few key usage scenarios and provides examples of using the RLS command-line tools.
Before using any of the tools, a user must generate a valid user proxy. Use grid-proxy-init.
% $GLOBUS_LOCATION/bin/grid-proxy-init
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA.mymachine/OU=mymachine/CN=John Doe
Enter GRID pass phrase for this identity:
To check whether your server is active you may use the globus-rls-admin(1) ping command.
% $GLOBUS_LOCATION/bin/globus-rls-admin -p rls://localhost
ping rls://localhost: 0 seconds
When the RLS server is first installed its database of replica location information will be empty, as expected. To create a replica location mapping, use the globus-rls-cli(1) create command. Replica information in RLS is represented as mappings from logical names to target names. Typically, the logical name will be a unique identifier for a given replicated data set and the target name will be a URL identifying a particular replica of the data set.
% $GLOBUS_LOCATION/bin/globus-rls-cli create my-logical-name-1 url-for-target-name-1 rls://localhost
The create command is intended for creating the initial replica mapping entry for a given logical name. If the user attempts to create another entry using an existing logical name, RLS will report a user error. To map additional target names to an existing logical name, see Section 3.4, “Adding replica location mappings”.
To map additional target names to a logical name created by the previously described create command, use the globus-rls-cli(1) add command.
% $GLOBUS_LOCATION/bin/globus-rls-cli add my-logical-name-1 url-for-target-name-2 rls://localhost
Once your RLS server is populated with replica location mappings, you can query the server for useful information using the globus-rls-cli(1) query command.
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
my-logical-name-1: url-for-target-name-1
my-logical-name-1: url-for-target-name-2
To remove unwanted replica location mappings from your RLS server, use the globus-rls-cli(1) delete command. The delete operation works directly on the mapping and only indirectly on the logical and target names. When the RLS server performs the delete operation, it removes the association between the specified logical name and the specified target name. Other target names may still be associated with the logical name, and other logical names may still be associated with the target name, though the latter is less likely. A logical (or target) name is deleted from the RLS server only when all of its mapping associations have been removed (for example, when the specified logical name has no target names left).
% $GLOBUS_LOCATION/bin/globus-rls-cli delete my-logical-name-1 url-for-target-name-1 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
my-logical-name-1: url-for-target-name-2
% $GLOBUS_LOCATION/bin/globus-rls-cli delete my-logical-name-1 url-for-target-name-2 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name-1 rls://localhost
globus_rls_client: LFN doesn't exist: my-logical-name-1
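The create, add, query and delete semantics described in this section can be modelled with a small in-memory catalog. This is a sketch under stated assumptions rather than the server's actual logic: the `ReplicaCatalog` class, its exception types, and most of the error strings are hypothetical, and it assumes that add requires an existing logical name. Only the logical-name side is tracked; the symmetric clean-up of target names is omitted for brevity.

```python
class ReplicaCatalog:
    """Toy model of logical-name -> target-name mappings (illustrative only)."""

    def __init__(self):
        self.targets_by_logical = {}  # logical name -> set of target names

    def create(self, logical, target):
        # create is only for the first mapping of a logical name.
        if logical in self.targets_by_logical:
            raise ValueError(f"LFN already exists: {logical}")
        self.targets_by_logical[logical] = {target}

    def add(self, logical, target):
        # Assumption: add requires a logical name made earlier by create.
        if logical not in self.targets_by_logical:
            raise KeyError(f"LFN doesn't exist: {logical}")
        self.targets_by_logical[logical].add(target)

    def query(self, logical):
        if logical not in self.targets_by_logical:
            raise KeyError(f"LFN doesn't exist: {logical}")
        return sorted(self.targets_by_logical[logical])

    def delete(self, logical, target):
        targets = self.targets_by_logical.get(logical, set())
        if target not in targets:
            raise KeyError(f"Mapping doesn't exist: {logical} -> {target}")
        targets.remove(target)
        # The logical name disappears only with its last mapping.
        if not targets:
            del self.targets_by_logical[logical]


catalog = ReplicaCatalog()
catalog.create("my-logical-name-1", "url-for-target-name-1")
catalog.add("my-logical-name-1", "url-for-target-name-2")
after_add = catalog.query("my-logical-name-1")

catalog.delete("my-logical-name-1", "url-for-target-name-1")
after_first_delete = catalog.query("my-logical-name-1")

catalog.delete("my-logical-name-1", "url-for-target-name-2")
name_still_known = "my-logical-name-1" in catalog.targets_by_logical
```

The sequence at the bottom mirrors the command-line transcript: after the second delete, the logical name itself is gone, so a further query would fail with "LFN doesn't exist".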
The globus-rls-cli(1) client supports a variety of bulk operations, which save users effort and avoid the network connection overhead of multiple, separate invocations of the client. The general pattern for a bulk operation is a parameter list consisting of
bulk command-name [command-modifiers] param-1 param-2 param-N, such as
bulk query lrc lfn my-logical-name-1 my-logical-name-2 my-logical-name-3.
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk create my-logical-name-1 url-for-target-name-1-1 my-logical-name-2 url-for-target-name-2-1 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk add my-logical-name-1 url-for-target-name-1-2 my-logical-name-2 url-for-target-name-2-2 rls://localhost
% $GLOBUS_LOCATION/bin/globus-rls-cli bulk query lrc lfn my-logical-name-1 my-logical-name-2 my-logical-name-3 rls://localhost
my-logical-name-3: LFN doesn't exist
my-logical-name-2: url-for-target-name-2-1
my-logical-name-2: url-for-target-name-2-2
my-logical-name-1: url-for-target-name-1-1
my-logical-name-1: url-for-target-name-1-2
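When bulk operations are driven from a script, the parameter list can be assembled programmatically. The helper below is a hypothetical sketch (the name `bulk_args` is not part of RLS); it only reproduces the argument order shown in the examples above and does not invoke the client.

```python
def bulk_args(command, params, server, modifiers=()):
    """Build the argument list that follows the globus-rls-cli executable.

    params may contain plain names (for query) or (logical, target)
    pairs (for create, add and delete); pairs are flattened in order.
    """
    flat = []
    for param in params:
        if isinstance(param, (tuple, list)):
            flat.extend(param)
        else:
            flat.append(param)
    return ["bulk", command, *modifiers, *flat, server]


create_args = bulk_args(
    "create",
    [
        ("my-logical-name-1", "url-for-target-name-1-1"),
        ("my-logical-name-2", "url-for-target-name-2-1"),
    ],
    "rls://localhost",
)

query_args = bulk_args(
    "query",
    ["my-logical-name-1", "my-logical-name-2", "my-logical-name-3"],
    "rls://localhost",
    modifiers=("lrc", "lfn"),
)
```

The resulting list could be appended to the path of the client executable and passed to `subprocess.run`, which avoids shell quoting problems when names contain unusual characters.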
The globus-rls-cli(1) supports an interactive mode in addition to the general command-line mode. To enter the interactive mode, simply invoke the client without any command.
% $GLOBUS_LOCATION/bin/globus-rls-cli rls://localhost
rls> query lrc lfn my-logical-name-2
my-logical-name-2: url-for-target-name-2-1
my-logical-name-2: url-for-target-name-2-2
rls> query lrc lfn my-logical-name-1
my-logical-name-1: url-for-target-name-1-1
my-logical-name-1: url-for-target-name-1-2
rls> bulk delete my-logical-name-1 url-for-target-name-1-1 my-logical-name-1 url-for-target-name-1-2 my-logical-name-2 url-for-target-name-2-1 my-logical-name-2 url-for-target-name-2-2
rls> bulk query lrc lfn my-logical-name-2 my-logical-name-1
my-logical-name-1: LFN doesn't exist
my-logical-name-2: LFN doesn't exist
rls> exit
This section describes a few troubleshooting tips. Additional information on troubleshooting can be found in the FAQ.
When troubleshooting problems encountered during usage of the RLS client or server, verbose error messages may be enabled by setting the GLOBUS_ERROR_VERBOSE environment variable. Verbose error messages are particularly helpful when communicating on the email@example.com or firstname.lastname@example.org list or when reporting problems on the bugzilla.globus.org site.
A security context failure may be experienced when using the client to connect to the RLS server. This is frequently caused by an expired credential.
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name rls://localhost
connect(rls://localhost): globus_rls_client: Globus I/O error: globus_xio_gsi: gss_init_sec_context failed.
globus_gsi_gssapi: Error with GSI credential
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Error with credential: The proxy credential: /tmp/x509up_u4191
with subject: /C=US/O=My Org/OU=User/CN=Me/CN=1234
expired 350 minutes ago.
Several things can prevent the client from establishing a connection with the RLS server. One of the more obvious causes is a wrong address or port number in the RLS connection URL. A less obvious cause is a firewall configuration that blocks connections to the target host on a particular port. In the latter case, you may need to consult the system administrator.
% $GLOBUS_LOCATION/bin/globus-rls-cli query lrc lfn my-logical-name rls://localhost:1234
connect(rls://localhost:1234): globus_rls_client: Globus I/O error: globus_xio: Unable to connect to localhost:1234
globus_xio: System error in connect: Connection refused
globus_xio: A system call failed: Connection refused
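A wrapper script can tell the two failures above apart by inspecting the client's error text. The function below is a rough sketch: the name `classify_rls_error`, the category labels, and the substring heuristics are assumptions based only on the sample messages shown in this section, and real messages may vary between versions.

```python
def classify_rls_error(error_text):
    """Map client error output to a coarse category and a suggested action."""
    text = error_text.lower()
    if "credential" in text and "expired" in text:
        # Matches the expired-proxy message shown above.
        return ("expired-credential",
                "Generate a new user proxy with grid-proxy-init.")
    if "connection refused" in text:
        return ("connection-refused",
                "Check the host and port in the RLS URL, then any firewall.")
    return ("unknown", "Enable verbose errors and consult the FAQ.")


expired_sample = (
    "globus_credential: Error with credential: The proxy credential: "
    "/tmp/x509up_u4191 with subject: /C=US/O=My Org/OU=User/CN=Me/CN=1234 "
    "expired 350 minutes ago."
)
refused_sample = "globus_xio: System error in connect: Connection refused"

expired_kind, expired_hint = classify_rls_error(expired_sample)
refused_kind, refused_hint = classify_rls_error(refused_sample)
```

Substring matching is fragile by design here; it is meant as a first triage step, not a replacement for reading the full error output.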
A client may experience a connection timeout when interacting with the RLS server for several reasons. One is simply wide-area network latency or congestion. Another, which users eventually encounter, is the growth of the system. As the RLS server's database of replica location mappings grows, some query operations, such as bulk queries involving large numbers of mappings or wildcard queries that match a large subset of mappings, take longer both to process and to return the large result set to the client over the network. If timeouts occur with increasing frequency, increase the RLS server's timeout configuration parameter found in the $GLOBUS_LOCATION/var/globus-rls-server.conf file. You may also use the -t timeout option of the globus-rls-cli(1).
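For scripted clients, the client-side timeout can be raised per invocation. The sketch below only builds an argument list; it assumes that the `-t` option is placed before the command, that its value is in seconds, and that doubling is a sensible retry policy. None of these are confirmed by this guide, so check the globus-rls-cli(1) reference before relying on them. The helper names are hypothetical.

```python
def cli_args_with_timeout(timeout, command_args, server):
    """Arguments that follow the globus-rls-cli executable.

    Assumption: options such as -t precede the command, and the
    server URL comes last, as in the examples in this guide.
    """
    return ["-t", str(timeout), *command_args, server]


def escalating_timeouts(start, attempts):
    """Yield a doubling series of timeout values for successive retries."""
    value = start
    for _ in range(attempts):
        yield value
        value *= 2


args = cli_args_with_timeout(
    120, ["query", "lrc", "lfn", "my-logical-name-1"], "rls://localhost"
)
retry_plan = list(escalating_timeouts(30, 3))
```

Raising the client timeout helps only if the server is still working on the request; if the server's own timeout is shorter, the configuration parameter mentioned above needs to be increased as well.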
The following usage statistics are sent by the RLS server by default in a UDP packet:
- Component identifier
- Usage data format identifier
- Time stamp
- Source IP address
- Source hostname (to differentiate between hosts with identical private IP addresses)
- Version number
- LRC service indicator
- RLI service indicator
- Number of LFNs
- Number of PFNs
- Number of Mappings
- Number of RLI LFNs
- Number of RLI LRCs
- Number of RLI Senders
- Number of RLI Mappings
- Number of threads
- Number of connections
The RLS sends the usage statistics at server startup, server shutdown, and once every 24 hours when the service is running.
If you wish to disable this feature, you can set the following environment variable before running the RLS:
By default, these usage statistics UDP packets are sent to
but can be redirected to another host/port or multiple host/ports
with the following environment variable:
export GLOBUS_USAGE_TARGETS="myhost.mydomain:12345 myhost2.mydomain:54321"
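The value of GLOBUS_USAGE_TARGETS in the example above is a space-separated list of host:port entries. A script that validates this setting before starting the server might parse it as follows. This is a hypothetical helper; how the RLS server itself parses the variable is not described here, and the rule that the port must be all digits is an assumption.

```python
def parse_usage_targets(value):
    """Split a GLOBUS_USAGE_TARGETS string into (host, port) tuples."""
    targets = []
    for entry in value.split():
        # Split on the last colon so the port is always the final field.
        host, _, port = entry.rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"Expected host:port, got: {entry!r}")
        targets.append((host, int(port)))
    return targets


targets = parse_usage_targets("myhost.mydomain:12345 myhost2.mydomain:54321")
```

Here `targets` holds one tuple per destination, which makes it easy to check that each host resolves and each port is in range before the setting is exported.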
You can also dump the usage statistics packets to stderr as they are sent (although most of the content is non-ASCII). Use the following environment variable for that:
Also, please see our policy statement on the collection of usage statistics.