GT 3.9.5 RLS : User's Guide
Introduction
The Replica Location Service (RLS) maintains and provides access to mapping information from logical names for data items to target names.
RLS was co-developed by the Globus team and Work Package 2 of the DataGrid project. The RLS prototype is currently available as an alpha release for testing and evaluation. The distributed RLS is intended to replace the centralized Globus replica catalog available in earlier releases of GT2.x. The distributed RLS provides higher performance, reliability and scalability.
Replication of data items can reduce access latency, improve data locality, and increase robustness, scalability and performance for distributed applications. An RLS typically does not operate in isolation, but functions as one component of a data grid architecture. (Other components include services that provide reliable file transfers, metadata management, reliable replication and workflow management.)
The RLS implementation is based on the following mechanisms:
- Consistent local state maintained in Local Replica Catalogs (LRCs).
Local catalogs maintain mappings between arbitrary logical file names (LFNs)
and the physical file names (PFNs) associated with those LFNs on their storage
system(s).
- Collective state with relaxed consistency maintained in Replica
Location Indices (RLIs). Each RLI contains a set of mappings from LFNs to
LRCs. A variety of index structures can be defined with different performance
characteristics, simply by varying the number of RLIs and amount of redundancy
and partitioning among the RLIs.
- Soft state maintenance of RLI state. LRCs send information about
their state to RLIs using soft state protocols. State information in RLIs
times out and must be periodically refreshed by soft state updates.
- Compression of state updates. This optional compression uses
Bloom Filters to summarize the content of a Local Replica Catalog before
sending a soft state update to a Replica Location Index Node.
- Membership and partitioning information maintenance. The current RLS implementation maintains static information about the LRCs and RLIs participating in the distributed system. As new implementations of the RLS are developed, they will use OGSA mechanisms for registration of services and for service lifetime management.
Command-line tools
Command line tools include the Server, Admin Tool, and Client Tool.
RLS Server (globus-rls-server)
The RLS server (globus-rls-server) can be configured as either
or both of the following:
- Local Replica Catalog (LRC) server, which
manages Logical FileName (LFN) to Physical
FileName (PFN) mappings in a database.
Note: If globus-rls-server is configured as an LRC server, the RLI servers that it sends updates to should be added to the database using globus-rls-admin.
- Replica Location Index (RLI) server, which manages mappings of LFNs to LRC servers.
Clients wishing to locate one or more physical filenames associated with a logical filename should first contact an RLI server, which will return a list of LRCs that may know about the LFN. The LRC servers are then contacted in turn to find the physical filenames.
Note: RLI information may be out of date, so clients should be prepared to get a negative response when contacting an LRC (or no response at all if the LRC server is unavailable).
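The two-step lookup described above can be sketched as follows. This is a minimal illustration, not the RLS client API: the in-memory dictionaries stand in for real RLI and LRC servers, and every name and URL in it (lrc-1, lfn-a, host1.example.org, the helper functions) is hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for an RLI: LFN -> names of LRCs that may know it.
RLI_INDEX: Dict[str, Set[str]] = {
    "lfn-a": {"lrc-1", "lrc-2", "lrc-3"},
}

# Hypothetical stand-ins for LRCs: LFN -> PFNs. None models a server that is down.
LRC_CATALOGS: Dict[str, Optional[Dict[str, List[str]]]] = {
    "lrc-1": {"lfn-a": ["gsiftp://host1.example.org/data/a"]},
    "lrc-2": {},     # mapping deleted since the last update: the RLI entry is stale
    "lrc-3": None,   # server unavailable
}


def query_rli(lfn: str) -> List[str]:
    """Step 1: ask the index which LRCs may hold mappings for this LFN."""
    return sorted(RLI_INDEX.get(lfn, set()))


def query_lrc(lrc: str, lfn: str) -> List[str]:
    """Step 2: ask one LRC for the PFNs; may be empty or fail entirely."""
    catalog = LRC_CATALOGS.get(lrc)
    if catalog is None:
        raise ConnectionError(f"{lrc} is unavailable")
    return catalog.get(lfn, [])


def locate(lfn: str) -> List[str]:
    """Contact each candidate LRC in turn, tolerating stale or dead entries."""
    pfns: List[str] = []
    for lrc in query_rli(lfn):
        try:
            pfns.extend(query_lrc(lrc, lfn))  # an empty list is a negative response
        except ConnectionError:
            continue  # no response at all: skip this LRC
    return pfns
```

A caller treats both an empty answer and a failed connection as normal outcomes, which is the behaviour the note above asks clients to be prepared for.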
Synopsis
[ -B lrc_update_bf ] [ -b maxbackoff ] [ -C rlscertfile ] [ -c conffile ] [ -d ] [ -e rli_expire_int ] [ -F lrc_update_factor ] [ -f maxfreethreads ] [ -I true|false ] [ -i idletimeout ] [ -K rlskeyfile ] [ -L loglevel ] [ -l true|false ] [ -M maxconnections ] [ -m maxthreads ] [ -N ] [ -o lrc_buffer_time ] [ -p pidfiledir ] [ -r true|false ] [ -S rli_expire_stale ] [ -s startthreads ] [ -t timeout ] [ -U myurl ] [ -u lrc_update_ll ] [ -v ]
LRC to RLI Updates
Two methods exist for LRC servers to inform RLI servers of their LFNs.
- By default, the LFNs are sent from the LRC to the RLI.
This can be time consuming if the number of LFNs is large, but does give the RLI an exact list of the LFNs known to the LRC, and it allows wildcard searching of the RLI.
- Alternatively, Bloom filters may be sent, which are highly compressed summaries of the LFNs; however they do not allow wildcard searching, and they will generate more "false positives" when querying an RLI.
Please see below for more on Bloom filters.
globus-rls-admin can be used to manage the list of RLIs that an LRC server updates.
This includes partitioning LFNs among multiple RLI servers.
A softstate algorithm is used in both update modes: periodically the LRC server sends its state (LFN information) to the RLI servers it updates.
The RLI servers add these LFNs to their index, or update a timestamp if the LFNs were already known. RLI servers expire information about LFN,LRC mappings if they haven't been updated for a period longer than the softstate update interval.
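The soft state behaviour described above can be sketched as a timestamped index. This is an illustrative model only: the class and method names are hypothetical, the stale_after parameter simply plays the role of the expiry threshold, and the real RLI keeps this state in a database rather than a Python dictionary.

```python
from typing import Dict, Iterable, List, Tuple


class SoftStateIndex:
    """Toy RLI: (LFN, LRC) mappings that vanish unless they are refreshed."""

    def __init__(self, stale_after: float) -> None:
        self.stale_after = stale_after                 # seconds without a refresh
        self._seen: Dict[Tuple[str, str], float] = {}  # mapping -> last update time

    def update(self, lrc: str, lfns: Iterable[str], now: float) -> None:
        """Apply one soft state update: add new LFNs, re-stamp known ones."""
        for lfn in lfns:
            self._seen[(lfn, lrc)] = now

    def expire(self, now: float) -> int:
        """Drop mappings that were not refreshed in time; return how many."""
        stale = [key for key, stamp in self._seen.items()
                 if now - stamp > self.stale_after]
        for key in stale:
            del self._seen[key]
        return len(stale)

    def lookup(self, lfn: str) -> List[str]:
        """Return the LRCs currently believed to know this LFN."""
        return sorted(lrc for (name, lrc) in self._seen if name == lfn)
```

An LFN that an LRC stops reporting is never deleted explicitly; it simply ages out of the index the next time expire runs.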
The following options in the Configuration file control the softstate algorithm when an LRC updates an RLI by sending LFNs:
Updates to an LRC (new LFNs or deleted LFNs) normally don't propagate
to RLI servers until the next softstate update (controlled by the options lrc_update_ll and lrc_update_bf).
However, when "immediate update" mode is enabled (lrc_update_immediate set to true),
an LRC will send updates to an RLI within lrc_buffer_time seconds.
If updates are done with LFN lists, then only the LFNs that have been added to or deleted from the LRC are sent. If Bloom filters are used, then the entire Bloom filter is sent.
When immediate updates are enabled, the interval between softstate
updates is multiplied by lrc_update_factor as long
as no updates have failed (LRC and RLI are considered to be in sync).
This can greatly reduce the number of softstate updates an LRC needs
to send to an RLI.
Incremental updates are buffered by the LRC server
until either 200 updates have accumulated (when LFN lists are used),
or lrc_buffer_time seconds have passed since
the last update.
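The buffering and interval rules above can be sketched as follows. This is a simplified model under stated assumptions: the class and function names are hypothetical, "the last update" is interpreted as the last time the buffer was flushed, and the numeric values used in any call are illustrative rather than server defaults.

```python
from typing import List

MAX_BUFFERED_UPDATES = 200  # threshold quoted in the text for LFN-list updates


class UpdateBuffer:
    """Collects incremental LRC changes until a flush condition is met."""

    def __init__(self, lrc_buffer_time: float) -> None:
        self.lrc_buffer_time = lrc_buffer_time
        self.pending: List[str] = []
        self.last_flush = 0.0

    def add(self, change: str, now: float) -> List[str]:
        """Queue one change; return the batch to send if a flush is due, else []."""
        self.pending.append(change)
        return self.flush_if_due(now)

    def flush_if_due(self, now: float) -> List[str]:
        full = len(self.pending) >= MAX_BUFFERED_UPDATES
        timed_out = now - self.last_flush >= self.lrc_buffer_time
        if self.pending and (full or timed_out):
            batch, self.pending = self.pending, []
            self.last_flush = now
            return batch
        return []


def softstate_interval(lrc_update_ll: float, lrc_update_factor: float,
                       immediate: bool, in_sync: bool) -> float:
    """Full updates are spaced further apart only in immediate mode while in sync."""
    if immediate and in_sync:
        return lrc_update_ll * lrc_update_factor
    return lrc_update_ll
```

For example, with an illustrative lrc_update_ll of 3600 seconds and an lrc_update_factor of 10, an in-sync LRC in immediate mode would send full LFN-list updates every 36000 seconds and fall back to 3600 seconds after a failed update.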
Bloom filter updates
A Bloom filter is an array of bits. Each LFN is hashed multiple times and the corresponding bits in the Bloom filter are set.
Querying an RLI to verify if an LFN exists is done by performing the same hashes, and checking if the bits in the filter are on. If not, then the LFN is known not to exist. If they're all on, then all that's known is that the LFN probably exists.
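The mechanism above fits in a few lines of code. The sketch below is a generic Bloom filter, not the RLS implementation: deriving the hash functions from SHA-256 with a numeric prefix is an assumption made for illustration, and the class and method names are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Bit array in which each LFN sets num_hashes bits."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, lfn: str) -> Iterator[int]:
        # Assumption: derive independent hashes by prefixing a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{lfn}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, lfn: str) -> None:
        """Hash the LFN several times and set the corresponding bits."""
        for pos in self._positions(lfn):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, lfn: str) -> bool:
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(lfn))
```

Because bits are only ever set, an LFN that was added can never be reported as absent; the only possible error is a false positive.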
The size of the Bloom filter (as a multiple of the number of LFNs) and the number of hash functions control the false positive rate. The default values of 10 and 3 give a false positive rate of approximately 1%.
The advantage of Bloom filters is their efficiency. For example, if the LRC has 1,000,000 LFNs in its database, with an average length of 20 bytes, then 20,000,000 bytes must be sent to an RLI during a softstate update (assuming no partitioning). The RLI server must perform 1,000,000 updates to its database to create new LFN,LRC mappings, or update timestamps on existing entries. With Bloom filters, only 1,250,000 bytes are sent (10 x 1,000,000 bits / 8), and there are no database operations on the RLI (Bloom filters are maintained entirely in memory). In one comparison, a 1,000,000 LFN update took 20 minutes when all the LFNs were sent, and less than 1 second when a Bloom filter was used. However, as noted before, Bloom filters do not support wildcard searches of an RLI.
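The figures above can be checked with a short calculation. The sketch uses the standard textbook approximation for a Bloom filter's false positive rate, (1 - e^(-k/r))^k for k hash functions and r bits per entry; this formula is not taken from the RLS documentation, and the function names are hypothetical. For a ratio of 10 and 3 hash functions it evaluates to about 1.7%, the same order of magnitude as the approximate figure quoted above.

```python
import math


def false_positive_rate(ratio: float, num_hashes: int) -> float:
    """Standard approximation: (1 - e^(-k/r))^k for k hashes, r bits per LFN."""
    return (1.0 - math.exp(-num_hashes / ratio)) ** num_hashes


def bloom_filter_bytes(num_lfns: int, ratio: int) -> int:
    """Size of the filter sent in one update: ratio bits per LFN, 8 bits per byte."""
    return math.ceil(num_lfns * ratio / 8)


def lfn_list_bytes(num_lfns: int, average_length: int) -> int:
    """Size of a full LFN-list update, ignoring protocol overhead."""
    return num_lfns * average_length
```

With the numbers from the example, bloom_filter_bytes(1_000_000, 10) gives 1,250,000 bytes against 20,000,000 bytes from lfn_list_bytes(1_000_000, 20), a sixteenfold reduction before the saved database work is even counted.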
Note: An LRC server can update some RLIs with Bloom filters, and others with LFNs. However an RLI server can only be updated using one method.
The following options in the Configuration file control Bloom filter updates:
- rli_bloomfilter true|false
- rli_bloomfilter_dir none|default|pathname
- lrc_bloomfilter_numhash N
- lrc_bloomfilter_ratio N
- lrc_update_bf seconds
Log messages
globus-rls-server uses syslog to log errors and other
information (facility LOG_DAEMON) when it's running in normal (daemon)
mode.
If the -d option (debug) is specified, then log messages
are written to stdout.
Signals
The server will reread its configuration file if it receives a HUP signal.
It will wait for all current requests to complete and shut down cleanly
if sent any of the following signals: INT, QUIT or TERM.
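The signal behaviour above follows a common daemon pattern, sketched here for illustration. This is not the server's code: the class and attribute names are hypothetical, the handlers only record what was requested, and SIGHUP and SIGQUIT are available on Unix-like systems only.

```python
import signal


class ServerState:
    """Records which action the main loop should take after a signal."""

    def __init__(self) -> None:
        self.reload_requested = False    # set by HUP: reread the configuration
        self.shutdown_requested = False  # set by INT, QUIT or TERM: drain and exit

    def handle_signal(self, signum: int, frame: object = None) -> None:
        if signum == signal.SIGHUP:
            self.reload_requested = True
        elif signum in (signal.SIGINT, signal.SIGQUIT, signal.SIGTERM):
            self.shutdown_requested = True

    def install(self) -> None:
        """Register the handler for all four signals (main thread only)."""
        for sig in (signal.SIGHUP, signal.SIGINT, signal.SIGQUIT, signal.SIGTERM):
            signal.signal(sig, self.handle_signal)
```

Keeping the handler down to setting a flag lets the main loop finish the requests in progress before it rereads the configuration or shuts down.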
Options (globus-rls-server)
The following table describes the command line options available for globus-rls-server:
-b maxbackoff |
Maximum time (in seconds) that globus-rls-server will
attempt to reopen the socket it listens on after an I/O error. |
-C rlscertfile |
Name of X.509 certificate file that identifies the
server; sets environment variable X509_USER_CERT. |
-c conffile |
Name of configuration file for server. The default is |
-d |
Enable debugging. Server will not detach from controlling terminal
and log messages will be written to stdout rather than syslog.
For additional logging verbosity set loglevel (see |
-e rli_expire_int |
Interval (seconds) at which an RLI server should expire stale entries. |
-F lrc_update_factor |
If lrc_update_immediate mode is on, and the LRC server
is in sync with an RLI server (an LRC and RLI are synced if there
have been no failed updates since the last full softstate update),
then the interval between RLI updates for this server ( lrc_update_ll )
is multiplied by lrc_update_factor. |
-f maxfreethreads |
Maximum number of idle threads server will leave running. Excess threads are terminated. |
-I true|false |
Turns LRC to RLI immediate update mode on ( The default value is
|
-i idletimeout |
Seconds after which idle client connections are timed out. |
-K rlskeyfile |
Name of X.509 key file. Sets environment variable X509_USER_KEY. |
-L loglevel |
Sets log level. By default this is 0, which means only errors
will be logged. Higher values mean more verbose logging. |
-l true|false |
Configure whether server is an LRC server. Default is |
-M maxconnections |
Maximum number of active connections. Should be small enough to prevent server from running out of open file descriptors. The default
value is |
-m maxthreads |
Maximum number of threads server will start up to support simultaneous requests. |
-N |
Disable authentication checking. This option is intended for debugging. Clients
should use the URL |
-o lrc_buffer_time |
LRC to RLI updates are buffered until either the buffer is full or this much time (in seconds) has elapsed since the last update. The default value is
|
-p pidfiledir |
Directory where PID files should be written. |
-r true|false |
Configure whether server is an RLI server. The default value is |
-S rli_expire_stale |
Interval (in seconds) after which entries in the RLI database are considered stale (presumably because they were deleted in the LRC). Stale entries are not returned in queries. |
-s startthreads |
Number of threads to start up initially. |
-t timeout |
Timeout (in seconds) for calls to other RLS servers (in other words, for LRC calls to send an update to an RLI.) A
value of The default value is |
-U myurl |
URL for this server. |
-u lrc_update_ll |
Interval (in seconds) between lfn-list LRC to RLI updates. |
-v |
Shows version and exits. |
Admin Tool (globus-rls-admin) for RLS
The RLS Administration Tool (globus-rls-admin) performs administrative operations on an RLS server.
Synopsis
-A|-a|-C option value|-c option|-d|-e|-p|-q|-s|-t timeout|-u|-v [ rli ] [ pattern ] [ server ]
Options
-A |
Adds Note: Partitions are not supported with Bloom filters; the LRC server maintains one Bloom filter for all LFNs in its database, which is sent to all RLI servers configured to receive Bloom filter updates with this option. |
-a |
Adds If If |
-C option value |
Sets server Important: This does not update the configuration file. The next time the server is restarted, the configuration change will be lost. |
-c option |
Retrieves configuration value for specified option from server. If |
-d |
Removes If Note: If
all patterns are removed separately, then |
-e |
Clears LRC database. Removes all lfn,pfn mappings. |
-p |
Verifies that the server is responding. |
-q |
Causes RLS server to exit. |
-S |
Shows statistics and other information gathered by RLS server. Intended to be input into GRIS. |
-s |
Shows list of RLI servers and patterns being sent updates by the LRC server. If |
-t timeout |
Sets timeout (in seconds) for RLS server requests. The default
value is |
-u |
Causes LRC server to immediately start full softstate updates to any RLI servers previously added with the -a option. |
-v |
Shows version and exits. |
Client Tool (globus-rls-cli) for RLS
The RLS Client Tool (globus-rls-cli) provides a command line interface to some of the functions supported by RLS. It also supports an interactive interface (if command is not specified). In interactive mode, double quotes may be used to enclose an argument that contains white space.
Synopsis
command [ -c ] [ -h ] [ -l reslimit ] [ -s ] [ -t timeout ] [ -u ] [ command ] rls-server
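When the client tool is driven from a script, the argument order in the synopsis can be captured in a small helper. The sketch below only builds an argument list and does not run anything; it assumes the executable is named globus-rls-cli as in the heading of this section, the helper name is hypothetical, and the server URL in the usage note is a placeholder rather than a documented address.

```python
from typing import List, Optional, Sequence


def build_cli_argv(server: str, command: Sequence[str],
                   timeout: Optional[int] = None,
                   reslimit: Optional[int] = None,
                   sql_wildcards: bool = False) -> List[str]:
    """Assemble options, then the command words, then the server, as in the synopsis."""
    argv = ["globus-rls-cli"]
    if reslimit is not None:
        argv += ["-l", str(reslimit)]  # limit on wildcard query results
    if sql_wildcards:
        argv.append("-s")              # SQL style wildcards
    if timeout is not None:
        argv += ["-t", str(timeout)]   # request timeout in seconds
    argv += list(command)
    argv.append(server)
    return argv
```

For instance, build_cli_argv("rls://rls.example.org", ["query", "lrc", "lfn", "lfn-a"], timeout=60) yields a list that could be handed to subprocess.run; passing a list instead of a shell string avoids quoting problems with names that contain white space.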
Options
The client command tool uses getopt for command line parsing.
Note: Some versions of getopt() will continue scanning for options (words that begin with a hyphen) for the entire command line, which makes it impossible to specify a negative integer or floating point value for an attribute. The workaround for this problem is to tell getopt() that there are no more options by including two hyphens. For example, to specify the value -2 you must enter -- -2.
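The effect described in the note can be reproduced with Python's getopt module, whose gnu_getopt function scans the whole command line in the same way. This is an analogy, not the client's own parser: the option string is assembled from the option letters listed in this section, and the attribute name offset and the LFN lfn-a are hypothetical.

```python
import getopt
from typing import List, Tuple

# Option letters from the Options list: -l and -t take a value.
CLI_SHORTOPTS = "chl:st:uv"


def parse(argv: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Parse like a getopt that keeps scanning after the first non-option word.

    Note: gnu_getopt stops at the first non-option word instead if the
    POSIXLY_CORRECT environment variable is set.
    """
    return getopt.gnu_getopt(argv, CLI_SHORTOPTS)


def negative_value_is_rejected() -> bool:
    """Without '--', the trailing -2 is mistaken for an unknown option."""
    try:
        parse(["attribute", "add", "lfn-a", "offset", "lfn", "int", "-2"])
    except getopt.GetoptError:
        return True
    return False
```

Calling parse with "--" placed before "-2" returns the value untouched as the last positional argument, which is the workaround the note describes.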
-c |
Sets "clearvalues" flag when deleting an attribute (will remove any attribute value records when an attribute is deleted). |
-h |
Shows usage. |
-l reslimit |
Sets limit on number of results returned by wildcard query at a time. Zero means no limit. |
-s |
Uses SQL style wildcards (% and _). |
-t timeout |
Sets timeout (in seconds) for RLS server requests. Default is 30 seconds. |
-u |
Uses Unix style wildcards (* and ?). |
-v |
Shows version. |
Commands
add <lfn> <pfn> |
Adds pfn to mappings of lfn in an LRC catalog. |
attribute add <object> <attr> <obj-type> <attr-type> <value> |
Adds an attribute to an object. object should be the lfn or pfn name. obj-type should be one of lfn or pfn. attr-type should be one of date, float, int, or string. If <value> is of type date then it should be in the form "YYYY-MM-DD HH:MM:SS". |
attribute bulk add <object> <attr> <obj-type> |
Bulk adds attribute values. |
attribute bulk delete <object> <attr> <obj-type> |
Bulk deletes attributes. |
attribute bulk query <attr> <obj-type> <object> |
Bulk queries attributes. |
attribute define <attr> <obj-type> <attr-type> |
Defines new attribute. |
attribute delete <object> <attr> <obj-type> |
Removes attribute from object. |
attribute modify <object> <attr> <obj-type> <attr-type> |
Modifies the value of an attribute. |
attribute query <object> <attr> <obj-type> |
Retrieves value of specified attribute for object. |
attribute search <attr> <obj-type> <operator> <attr-type> |
Searches for objects which have the specified attribute matching operator and value. operator should be one of =, !=, >, >=, <, <=, like. |
attribute show <attr> <obj-type> |
Shows attribute definition. If attr is a hyphen (-) then all attributes are shown. |
attribute undefine <attr> <obj-type> |
Deletes an attribute definition. Will return an error if any objects possess this attribute. |
bulk add <lfn> <pfn> [<lfn> <pfn> ...] |
Bulk adds lfn, pfn mappings. |
bulk create <lfn> <pfn> [<lfn> <pfn> ...] |
Bulk creates lfn, pfn mappings. |
bulk delete <lfn> <pfn> [<lfn> <pfn> ...] |
Bulk deletes lfn, pfn mappings. |
bulk query lrc lfn [<lfn> ...] |
Bulk queries lrc for lfns. |
bulk query lrc pfn [<pfn> ...] |
Bulk queries lrc for pfns. |
bulk query rli lfn [<lfn> ...] |
Bulk queries rli for lfns. |
create <lfn> <pfn> |
Creates a new lfn, pfn mapping in an LRC catalog. |
delete <lfn> <pfn> |
Deletes an lfn, pfn mapping from an LRC catalog. |
exit |
Exits interactive session. |
help |
Prints help message. |
query lrc lfn <lfn> |
Queries an LRC server for mappings of lfn. |
query lrc pfn <pfn> |
Queries an LRC server for mappings to pfn. |
query rli lfn <lfn> |
Queries an RLI server for mappings of lfn. |
query wildcard lrc lfn <lfn-pattern> |
Performs a wildcard query of an LRC server for mappings of lfn-pattern. Patterns use the standard Unix wildcard characters: an asterisk (*) matches 0 or more characters, and a question mark (?) matches any single character. |
query wildcard lrc pfn <pfn-pattern> |
Queries an LRC server for mappings to pfn-pattern. Patterns use the standard Unix wildcard characters: an asterisk (*) matches 0 or more characters, and a question mark (?) matches any single character. |
query wildcard rli lfn <lfn-pattern> |
Queries an RLI server for mappings of lfn-pattern. Patterns use the standard Unix wildcard characters: an asterisk (*) matches 0 or more characters, and a question mark (?) matches any single character. |
set reslimit <limit> |
Sets limit on number of results returned at a time by a wildcard query. Zero means no limit. |
set timeout <timeout> |
Sets timeout (in seconds) on calls to the RLS server. The
default value is |
version |
Shows version and exits. |
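The two wildcard styles accepted by the wildcard queries (Unix style with an asterisk and a question mark, SQL style with % and _) correspond one to one. The sketch below illustrates the matching rules with Python's fnmatch module and a naive translation between the styles. It is not how the server evaluates patterns: the function names and file names are hypothetical, fnmatch additionally treats square brackets as character sets, and the translation does not escape literal % or _ characters.

```python
import fnmatch
from typing import Iterable, List


def match_unix(pattern: str, names: Iterable[str]) -> List[str]:
    """Return the names matched by a Unix style pattern (case-sensitive)."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def unix_to_sql(pattern: str) -> str:
    """Map * to % (0 or more characters) and ? to _ (exactly one character)."""
    return pattern.replace("*", "%").replace("?", "_")
```

With the names run1.dat, run12.dat and run1.log, the pattern run?.dat selects only run1.dat, while run*.dat also selects run12.dat; the SQL style equivalents are run_.dat and run%.dat.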
Graphical user interfaces
There is no support for this type of interface for RLS.
Troubleshooting
Information on troubleshooting can be found in the FAQ.