GT 4.1.1 RLS : Developer's Guide

1. Introduction

This guide contains information for developers working with the Replica Location Service (RLS). It provides reference information for application developers, including APIs, architecture, procedures for using the APIs, and code samples.

2. Before you begin

2.1. Feature summary

Features New in GT 4.1.1

  • A Web service interface for RLS has been developed and is distributed separately as WS-RLS (see GT 4.1.1 WS RLS for more information).

Other Supported Features

  • Comprehensive C and Java libraries for replica registration, replica lookup, replica attributes, index queries, and administrative tasks.
  • Command line (globus-rls-cli) tool for client operations on catalogs and indexes.
  • Command line (globus-rls-admin) tool for administrative tasks.

Deprecated Features

  • None

2.2. Tested platforms

Tested platforms for RLS include most 32-bit flavors of Linux and UNIX, including RedHat, Solaris, and others.

2.3. Backward compatibility summary

Protocol changes since GT 4.0.4

  • None

API changes since GT 4.0.4

  • None

Exception changes since GT 4.0.4

  • None

Schema changes since GT 4.0.4

  • None

2.4. Technology dependencies

RLS depends on the following GT components:

  • globus_core
  • globus_common
  • globus_io
  • globus_gssapi_gsi
  • globus_usage

RLS depends on the following third-party software:

  • RDBMS: MySQL, PostgreSQL, or Oracle
  • ODBC manager: unixODBC, iODBC
  • ODBC driver: MyODBC, psqlODBC, or Oracle

2.5. Security considerations

Security recommendations include:

  • Dedicated User Account: It is recommended that users create a dedicated user account for installing and running the RLS service (e.g., globus as recommended in the general GT installation instructions). This account may be used to install and run other services from the Globus Toolkit.
  • Key and Certificate: It is recommended that the RLS service not use the host key and host certificate (hostkey and hostcert). Instead, create a containerkey and containercert owned by the globus user, with permissions 400 and 644 respectively. Change the rlskeyfile and rlscertfile settings in the RLS configuration file ($GLOBUS_LOCATION/etc/globus-rls-server.conf) to reflect the appropriate filenames.
  • LRC and RLI Databases: Users must ensure the security of the RLS data as maintained by their chosen database management system. Appropriate precautions should be taken to protect the data and access to the database, such as creating a database user account specifically for RLS and encrypting database users' passwords.
  • RLS Configuration: It is recommended that the RLS configuration file ($GLOBUS_LOCATION/etc/globus-rls-server.conf) be owned by and accessible only by the dedicated user account for RLS (e.g., globus account per above recommendations). The file contains the database user account and password used to access the LRC and RLI databases along with important settings which, if tampered with, could adversely affect the RLS service.

3. Architecture and design overview

The Replica Location Service design consists of two components. Local Replica Catalogs (LRCs) maintain consistent information about logical-to-physical mappings on a site or storage system. Replica Location Indexes (RLIs) aggregate state information contained in one or more LRCs and build a global, hierarchical, distributed index to support discovery of replicas at multiple sites. LRCs send summaries of their state to RLIs using soft state update protocols. The server consists of a multi-threaded front-end server and a back-end relational database, such as MySQL or PostgreSQL. The front-end server can be configured to act as an LRC server, an RLI server, or both. Clients access the server via a simple string-based RPC protocol. The client APIs support C, Java, and Python. The APIs contain operations to create and delete mappings, associate attributes with mappings, and perform queries.
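To make the two-level lookup concrete, the following is a minimal in-memory sketch of what each component stores. The types and functions (lrc_mapping, rli_entry, rli_lookup, lrc_lookup) are hypothetical and are not part of the RLS client API; a real client issues these queries to RLS servers over the network.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model of the two RLS components (not the RLS API). */
struct lrc_mapping { const char *lfn; const char *pfn; };     /* LRC: logical -> physical */
struct rli_entry   { const char *lfn; const char *lrc_url; }; /* RLI: logical -> LRC URL  */

/* RLI query: collect the URLs of LRCs that reported `lfn`. Returns the count. */
size_t rli_lookup(const struct rli_entry *rli, size_t n, const char *lfn,
                  const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++)
        if (strcmp(rli[i].lfn, lfn) == 0)
            out[found++] = rli[i].lrc_url;
    return found;
}

/* LRC query: collect the physical names mapped to `lfn` at one site. */
size_t lrc_lookup(const struct lrc_mapping *lrc, size_t n, const char *lfn,
                  const char **out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < max; i++)
        if (strcmp(lrc[i].lfn, lfn) == 0)
            out[found++] = lrc[i].pfn;
    return found;
}
```

A client first asks an RLI which LRCs know a logical name, then asks each of those LRCs for the physical names; the RLI never stores physical names itself.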

Detailed information on the architecture and design can be found in A Framework for Constructing Scalable Replica Location Services and Performance and Scalability of a Replica Location Service.

4. Public interface

4.1. Semantics and syntax of APIs

4.1.1. Programming Model Overview

The RLS provides a client API for C and Java based clients. The RLS client C API is provided in the form of a library (e.g., a .so file). Every RLS installation places the client header files in $GLOBUS_LOCATION/include and the shared library in $GLOBUS_LOCATION/lib. The RLS client Java API depends on the shared library, which it links to via the Java Native Interface (JNI).

4.2. Semantics and syntax of the WSDL

There is no support for this type of interface for RLS.

4.3. Semantics and syntax of non-WSDL protocols

Clients communicate with the RLS server using a simple string-based RPC protocol. See the RPC Protocol Description for details.

4.4. Command line tools

Please see the RLS Command Reference.

4.5. Overview of Graphical User Interface

There is no support for this type of interface for RLS.

4.6. Semantics and syntax of domain-specific interface

There is no support for this type of interface for RLS.

4.7. Configuration interface

4.7.1. Configuration overview

RLS configuration involves three kinds of settings: statically defined system settings in the RLS configuration file ($GLOBUS_LOCATION/etc/globus-rls-server.conf); settings changed temporarily at run time using the RLS admin tool (see the globus-rls-admin(1) -C option value command); and LRC-to-RLI and RLI-to-RLI updates, also configured using the RLS admin tool (see the globus-rls-admin(1) -a, -A, and -d commands).

4.7.2. Server configuration file (globus-rls-server.conf)

Configuration settings for the RLS are specified in the globus-rls-server.conf file. If the configuration file is not specified on the command line (see the -c option), the server looks for it in:

  • $GLOBUS_LOCATION/etc/globus-rls-server.conf if GLOBUS_LOCATION is set
  • /usr/local/etc/globus-rls-server.conf if GLOBUS_LOCATION is not set
Note

Command line options always override items found in the configuration file.
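The lookup order above can be sketched as a small helper. The function name rls_conf_path is hypothetical; the paths and the precedence of the -c option come from this section.

```c
#include <stdio.h>
#include <stdlib.h>

/* Resolve the server configuration file path (hypothetical helper).
   An explicit -c path wins; otherwise $GLOBUS_LOCATION/etc/globus-rls-server.conf,
   or /usr/local/etc/globus-rls-server.conf when GLOBUS_LOCATION is unset.
   Returns 0 on success, -1 if buf is too small. */
int rls_conf_path(const char *cmdline_path, char *buf, size_t len)
{
    const char *gl = getenv("GLOBUS_LOCATION");
    int n;

    if (cmdline_path != NULL)
        n = snprintf(buf, len, "%s", cmdline_path);
    else if (gl != NULL)
        n = snprintf(buf, len, "%s/etc/globus-rls-server.conf", gl);
    else
        n = snprintf(buf, len, "/usr/local/etc/globus-rls-server.conf");

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```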

The configuration file is a sequence of lines consisting of a keyword, whitespace, and a value. Comments begin with # and end with a newline.
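A minimal sketch of splitting one line in this format into a keyword and a value follows. The function name conf_parse_line is hypothetical, and the sketch assumes that a # anywhere on a line starts a comment, so it would truncate a value that itself contains # (for example, a password); how the actual server handles that case is not described here.

```c
#include <ctype.h>
#include <string.h>

/* Split one configuration line into keyword and value, in place.
   Returns 1 if a keyword was found, 0 for blank or comment-only lines.
   The value is the rest of the line after the keyword and whitespace,
   so "acl .*: all" yields key "acl" and value ".*: all". */
int conf_parse_line(char *line, char **key, char **value)
{
    char *p = strchr(line, '#');
    size_t n;

    if (p != NULL)
        *p = '\0';                                   /* drop the comment */
    n = strlen(line);
    while (n > 0 && isspace((unsigned char)line[n - 1]))
        line[--n] = '\0';                            /* trim trailing space/newline */
    p = line;
    while (isspace((unsigned char)*p))
        p++;                                         /* skip leading whitespace */
    if (*p == '\0')
        return 0;

    *key = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                                         /* end of keyword */
    if (*p != '\0') {
        *p++ = '\0';
        while (isspace((unsigned char)*p))
            p++;                                     /* whitespace before value */
    }
    *value = p;                                      /* may be "" */
    return 1;
}
```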

4.7.3. Basic configuration

Review the server configuration file $GLOBUS_LOCATION/etc/globus-rls-server.conf and change any options you want. The server man page globus-rls-server(8) has complete details on all the options. The complete details are also provided later in this section.

A minimal configuration file for both an LRC and RLI server would be:

# Configure the database connection info
db_user       dbuser
db_pwd        dbpassword

# If the server is an LRC server
lrc_server    true
lrc_dbname    lrc1000

# If the server is an RLI server
rli_server    true
rli_dbname    rli1000    # Not needed if updated by Bloom filters

# Configure who can make requests of the server
acl .*: all              # RE matching grid-mapfile users or DNs from x509 certs
...

4.7.4. Host key and certificate configuration

The server uses a host certificate to identify itself to clients. By default this certificate is located in the files /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem. Host certificates have a distinguished name of the form /CN=host/FQDN. If the host you plan to run the RLS server on does not have a host certificate, you must obtain one from your Certificate Authority. The RLS server must be run as the same user who owns the host certificate files (typically root). The location of the host certificate files may be specified in $GLOBUS_LOCATION/etc/globus-rls-server.conf:

rlscertfile     path-to-cert-file   # default /etc/grid-security/hostcert.pem
rlskeyfile      path-to-key-file    # default /etc/grid-security/hostkey.pem
    

It is possible to run the RLS server without authentication by starting it with the -N option and using URLs of the form rlsn://server to connect to it. Note that the URL scheme is rlsn rather than rls.
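The choice between the two schemes, together with the default port (39281, see the port setting), can be sketched as a small helper. The function rls_url is hypothetical and is not part of the RLS client API.

```c
#include <stdio.h>

/* Build the URL a client would use: "rls://" when the server runs with GSI
   authentication (the default), "rlsn://" when it was started with -N.
   A port <= 0 selects the default port 39281. Returns 0 on success. */
int rls_url(char *buf, size_t len, const char *host, int port, int authenticated)
{
    int n = snprintf(buf, len, "%s://%s:%d",
                     authenticated ? "rls" : "rlsn",
                     host,
                     port > 0 ? port : 39281);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```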

For added security, it is generally recommended to run the server under a user account other than root. To do so, you will need to create copies of the host key and certificate files owned by a designated user account, such as globus.

  1. Begin by copying /etc/grid-security/hostcert.pem and /etc/grid-security/hostkey.pem to /etc/grid-security/containercert.pem and /etc/grid-security/containerkey.pem. We use the prefix "container" to conform with the recommended naming scheme for other services distributed with the Globus Toolkit.

    % cp /etc/grid-security/hostcert.pem /etc/grid-security/containercert.pem
    % cp /etc/grid-security/hostkey.pem /etc/grid-security/containerkey.pem
                
  2. Then change ownership of the files to the designated user account, globus in our example.

    % chown globus /etc/grid-security/containercert.pem
    % chown globus /etc/grid-security/containerkey.pem
                
  3. Change the rlskeyfile and rlscertfile settings in the RLS configuration file ($GLOBUS_LOCATION/etc/globus-rls-server.conf) to reflect the appropriate filenames.

    rlscertfile     /etc/grid-security/containercert.pem
    rlskeyfile      /etc/grid-security/containerkey.pem
                
  4. Finally, bear in mind that your certificate and key files must always have file permissions 644 and 400 respectively.

    % ls -l /etc/grid-security/*.pem
    -rw-r--r--    1 globus  gridstaff      818 Dec  8  2005 /etc/grid-security/containercert.pem
    -r--------    1 globus  gridstaff      887 Dec  8  2005 /etc/grid-security/containerkey.pem
    -rw-r--r--    1 root     root          818 Dec  8  2005 /etc/grid-security/hostcert.pem
    -r--------    1 root     root          887 Dec  8  2005 /etc/grid-security/hostkey.pem
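
Permissions and ownership can also be verified programmatically before starting the server. The following is a minimal POSIX sketch using stat(); the function name check_cred_file is hypothetical, and it checks for an exact mode match, which is stricter than strictly necessary.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Check that a credential file has exactly the expected permission bits
   (0644 for the certificate, 0400 for the key) and the expected owner.
   Returns 0 if both match, -1 otherwise (with a message on stderr). */
int check_cred_file(const char *path, mode_t want_mode, uid_t want_owner)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }
    if ((st.st_mode & 07777) != want_mode) {
        fprintf(stderr, "%s: mode %04o, expected %04o\n", path,
                (unsigned)(st.st_mode & 07777), (unsigned)want_mode);
        return -1;
    }
    if (st.st_uid != want_owner) {
        fprintf(stderr, "%s: owned by uid %lu, expected %lu\n", path,
                (unsigned long)st.st_uid, (unsigned long)want_owner);
        return -1;
    }
    return 0;
}
```

For example, call check_cred_file("/etc/grid-security/containerkey.pem", 0400, uid), where uid is the globus account's user ID as returned by getpwnam("globus") from <pwd.h>.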
                

If authentication is enabled, RLI servers must include acl configuration options that match the identities of the LRC servers that update them and that grant those LRCs the rli_update permission.

4.7.5. Configuring LRC to RLI updates

One of the key benefits of using the RLS to manage replica location information is its distributed architecture. In a distributed deployment, one or more Local Replica Catalog (LRC) services send updates of their contents to one or more Replica Location Index (RLI) services.

By default, the installed LRC is not configured to send updates to any RLI, not even the RLI co-located with it. Use the globus-rls-admin(1) tool to configure the LRC to send updates to one or more RLI services.

  • To configure the LRC to send uncompressed lists of its logical names to an RLI, use the following command:

    % $GLOBUS_LOCATION/bin/globus-rls-admin -a rls://rli_host rls://lrc_host
                
  • To configure the LRC to send compressed bitmaps (using Bloom filters) of its logical names to an RLI, use the following command:

    % $GLOBUS_LOCATION/bin/globus-rls-admin -A rls://rli_host rls://lrc_host
                
  • To configure the LRC to stop sending updates to an RLI, use the following command:

    % $GLOBUS_LOCATION/bin/globus-rls-admin -d rls://rli_host rls://lrc_host
                
Note

While any LRC can send either uncompressed or compressed updates to any RLI, each RLI service must be configured to accept either uncompressed or compressed updates, but not both. See the rli_bloomfilter setting of the RLS configuration file for more details.

There are tradeoffs between using uncompressed and compressed updates. The advantage of compressed updates is a significant reduction in network overhead and memory usage; as the number of replica location mappings grows into the tens of millions or more, these savings become important. On the other hand, because the Bloom filter bitmap records only hashed membership information about the logical names in the LRC, wildcard queries cannot be supported at an RLI that receives compressed updates.
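To illustrate why, here is a minimal Bloom filter in C. The hash scheme (64-bit FNV-1a combined by double hashing) and all names are assumptions of this sketch; this guide does not describe the hash functions the RLS itself uses. The filter size follows the idea behind lrc_bloomfilter_ratio (bits per LFN), and k plays the role of lrc_bloomfilter_numhash.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy Bloom filter (hypothetical names): m bits, k bit positions per LFN. */
struct bloom {
    uint8_t *bits;
    size_t   m;     /* filter size in bits */
    unsigned k;     /* number of hash functions */
};

/* 64-bit FNV-1a; the seed is mixed into the offset basis to get two variants. */
static uint64_t fnv1a(const char *s, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* i-th bit position by double hashing: (h1 + i * h2) mod m. */
static size_t bloom_pos(const struct bloom *b, const char *lfn, unsigned i)
{
    uint64_t h1 = fnv1a(lfn, 0);
    uint64_t h2 = fnv1a(lfn, 0x9e3779b97f4a7c15ULL) | 1;
    return (size_t)((h1 + (uint64_t)i * h2) % b->m);
}

/* Size the filter as ratio x num_lfns bits. Returns 0 on success. */
int bloom_init(struct bloom *b, size_t num_lfns, size_t ratio, unsigned k)
{
    b->m = num_lfns * ratio;
    if (b->m < 8)
        b->m = 8;
    b->k = k;
    b->bits = calloc((b->m + 7) / 8, 1);
    return b->bits != NULL ? 0 : -1;
}

void bloom_add(struct bloom *b, const char *lfn)
{
    for (unsigned i = 0; i < b->k; i++) {
        size_t pos = bloom_pos(b, lfn, i);
        b->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* 1 = lfn MAY be present (possibly a false positive); 0 = definitely absent. */
int bloom_maybe_contains(const struct bloom *b, const char *lfn)
{
    for (unsigned i = 0; i < b->k; i++) {
        size_t pos = bloom_pos(b, lfn, i);
        if ((b->bits[pos / 8] & (1u << (pos % 8))) == 0)
            return 0;
    }
    return 1;
}

void bloom_free(struct bloom *b)
{
    free(b->bits);
    b->bits = NULL;
}
```

The only question the filter can answer is whether one exact name may be present. A wildcard query would require enumerating the names that were inserted, and the bit array does not retain them.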

4.7.6. Configuring the RLS Server for the MDS4 Index Service

The server package includes a script, $GLOBUS_LOCATION/libexec/aggrexec/globus-rls-aggregatorsource.pl, that may be used as an Execution Aggregator Source by MDS4. See GT 4.1.1 Index Services for more information on setting up and using Execution Aggregator Source scripts in MDS4. The script may be invoked as follows, and it generates output in the format shown.

% $GLOBUS_LOCATION/libexec/aggrexec/globus-rls-aggregatorsource.pl rls://mysite
<?xml version="1.0" encoding="UTF-8"?>
<rlsStats>
  <site>rls://mysite</site>
  <version>4.0</version>
  <uptime>03:08:15</uptime>
  <serviceList>
    <service>lrc</service>
    <service>rli</service>
  </serviceList>
  <lrc>
    <updateMethodList>
      <updateMethod>lfnlist</updateMethod>
      <updateMethod>bloomfilter</updateMethod>
    </updateMethodList>
    <updatesList>
      <updates>
        <site>rls://myothersite:39281</site>
        <method>bloomfilter</method>
        <date>08/01/05</date>
        <time>16:16:38</time>
      </updates>
    </updatesList>
    <numlfn>283902</numlfn>
    <numpfn>593022</numpfn>
    <nummap>593022</nummap>
  </lrc>
  <rli>
    <updatedViaList>
      <updatedVia>bloomfilters</updatedVia>
    </updatedViaList>
    <updatedByList>
      <updatedBy>
        <site>rls://myothersite:39281</site>
        <date>08/01/05</date>
        <time>10:03:21</time>
      </updatedBy>
    </updatedByList>
  </rli>
</rlsStats>
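
A monitoring tool that consumes this output might pull out individual counters such as numlfn or numpfn. The following minimal sketch (the function name rls_stat_value is hypothetical) does so with a plain substring search; it is not an XML parser, and a robust consumer would use one, such as libxml2.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the integer inside the first <tag>...</tag> in `xml`, or -1 if the
   tag is absent. Adequate only for flat, machine-generated output like the
   example above; attributes, namespaces, and nesting are not handled. */
long rls_stat_value(const char *xml, const char *tag)
{
    char open[64];
    int n = snprintf(open, sizeof open, "<%s>", tag);
    const char *p;

    if (n < 0 || (size_t)n >= sizeof open)
        return -1;
    p = strstr(xml, open);
    if (p == NULL)
        return -1;
    return strtol(p + n, NULL, 10);
}
```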
    

4.7.7. Configuring the RLS Server for the MDS2 GRIS

The server package includes a program called globus-rls-reporter that will report information about an RLS server to the MDS2 GRIS. Use this procedure to enable this program:

  1. To enable Index Service reporting, add the contents of the file $GLOBUS_LOCATION/setup/globus/rls-ldif.conf to the MDS2 GRIS configuration file $GLOBUS_LOCATION/etc/grid-info-resource-ldif.conf.
  2. If necessary, set your virtual organization (VO) name in $GLOBUS_LOCATION/setup/globus/rls-ldif.conf. The default value is local. The VO name is referenced twice, on the lines beginning with dn: and args:.
  3. You must restart your MDS (GRIS) server after modifying $GLOBUS_LOCATION/etc/grid-info-resource-ldif.conf. You can use the following commands to do so:

    $GLOBUS_LOCATION/sbin/SXXgris stop
    $GLOBUS_LOCATION/sbin/SXXgris start
    

4.7.8. Complete RLS Server settings (globus-rls-server.conf)

This section describes the complete details of the RLS Server configuration settings.

Table 1. Complete RLS Server settings (globus-rls-server.conf)

acl user: permission [permission]

acl entries may be a combination of DNs and local usernames. If a DN is not found in the gridmap file, the DN itself is used to search the acl list.

A gridmap file may also be used to map DNs to local usernames, which in turn are matched against the regular expressions in the acl list to determine the user's permissions.

user is a regular expression matching distinguished names (or local usernames if a gridmap file is used) of users allowed to make calls to the server.

There may be multiple acl entries; the first matching entry determines the user's privileges.

[permission] is one or more of the following values:

  • lrc_read Allows client to read an LRC.
  • lrc_update Allows client to update an LRC.
  • rli_read Allows client to read an RLI.
  • rli_update Allows client to update an RLI.
  • admin Allows client to update an LRC's list of RLIs to send updates to.
  • stats Allows client to read performance statistics.
  • all Allows client to do all of the above.
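
The first-match rule can be sketched with POSIX regular expressions. The permission bit values and all names are this sketch's own; in addition, anchoring each pattern with ^( )$ and compiling it as an extended regular expression are assumptions, since this guide does not state how the server compiles acl patterns.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Permission bits named after the acl permissions above (values are arbitrary). */
enum {
    PERM_LRC_READ   = 1 << 0,
    PERM_LRC_UPDATE = 1 << 1,
    PERM_RLI_READ   = 1 << 2,
    PERM_RLI_UPDATE = 1 << 3,
    PERM_ADMIN      = 1 << 4,
    PERM_STATS      = 1 << 5,
    PERM_ALL        = (1 << 6) - 1
};

struct acl_entry {
    const char *user_re;   /* the "user" regular expression of one acl line */
    unsigned    perms;     /* OR of PERM_* bits */
};

/* Return the permissions of the FIRST entry whose pattern matches `user`
   (a DN, or a local username after gridmap mapping), or 0 if none matches. */
unsigned acl_lookup(const struct acl_entry *acl, size_t n, const char *user)
{
    for (size_t i = 0; i < n; i++) {
        char anchored[512];
        regex_t re;
        int rc;
        int len = snprintf(anchored, sizeof anchored, "^(%s)$", acl[i].user_re);

        if (len < 0 || (size_t)len >= sizeof anchored)
            continue;                          /* pattern too long: skip */
        if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
            continue;                          /* malformed pattern: skip */
        rc = regexec(&re, user, 0, NULL, 0);
        regfree(&re);
        if (rc == 0)
            return acl[i].perms;               /* first match wins */
    }
    return 0;
}
```

Because the first match wins, more specific entries (for example, one granting rli_update to an LRC's DN) must be listed before a catch-all such as acl .*: all.
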
authentication true|false

Enable or disable GSI authentication.

The default value is true.

If authentication is enabled (true), clients should use the URL scheme rls:// to connect to the server.

If authentication is not enabled (false), clients should use the URL scheme rlsn://.

db_pwd password

Password to use to connect to the database server.

The default value is changethis.

db_user databaseuser

Username to use to connect to database server.

The default value is dbperson.

idletimeout seconds

Seconds after which idle connections close.

The default value is 900.

loglevel N

Sets the log level to N. Higher levels mean more verbosity.

The default value is 0.

lrc_bloomfilter_numhash N

Number of hash functions to use in Bloom filters.

The default value is 3.

Possible values are 1 through 8.

This value, in conjunction with lrc_bloomfilter_ratio, determines the rate of false positives that may be expected when querying an RLI that is updated via Bloom filters.

Note: The default values of 3 and 10 give a false positive rate of approximately 1%.

lrc_bloomfilter_ratio N

Sets the ratio of the Bloom filter size (in bits) to the number of LFNs in the LRC catalog; in other words, the size of the Bloom filter as a multiple of the number of LFNs in the LRC database. This is only meaningful if Bloom filters are used to update an RLI. Too small a value generates too many false positives, while too large a value wastes memory and network bandwidth.

The default value is 10.

Note: The default values of 3 and 10 give a false positive rate of approximately 1%.
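
The false-positive rate can be estimated with the standard Bloom filter approximation, which assumes independent, uniformly distributed hash functions (an idealization; this guide does not describe the RLS hash functions). With m bits, n LFNs, r = m/n = lrc_bloomfilter_ratio, and k = lrc_bloomfilter_numhash:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k} = \left(1 - e^{-k/r}\right)^{k}

k = 3,\; r = 10:\qquad p \approx \left(1 - e^{-0.3}\right)^{3} \approx (0.2592)^{3} \approx 0.0174
```

This estimate is about 1.7% for the defaults, the same order of magnitude as the figure quoted above. Under the same approximation, raising k toward r ln 2 (about 6.9) lowers the estimate for r = 10 to roughly 0.8% at k = 7, at the cost of more hash computations per LFN.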

lrc_buffer_time N

LRC to RLI updates are buffered until either the buffer is full or this much time in seconds has elapsed since the last update.

The default value is 30.
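
The buffering rule can be sketched as a flush decision. The function name and the capacity parameter are hypothetical (this guide does not state the buffer size), as is the choice to skip sending when nothing is pending.

```c
#include <stddef.h>
#include <time.h>

/* Decide whether pending LRC-to-RLI updates should be sent now: when the
   buffer is full, or when lrc_buffer_time seconds have passed since the
   last update. Returns 1 to send, 0 to keep buffering. */
int should_flush_updates(size_t pending, size_t capacity,
                         time_t now, time_t last_update, long lrc_buffer_time)
{
    if (pending == 0)
        return 0;                                  /* nothing to send */
    if (pending >= capacity)
        return 1;                                  /* buffer full */
    return difftime(now, last_update) >= (double)lrc_buffer_time;
}
```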

lrc_dbname

Name of LRC database.

The default value is lrcdb.

lrc_server true|false

Set to true if the server acts as an LRC server.

The default value is false.

lrc_update_bf seconds

Interval in seconds between LRC to RLI updates when the RLI is updated by Bloom filters. In other words, how often an LRC server does a Bloom filter soft state update.

This can be much smaller than the interval between updates without using Bloom filters (lrc_update_ll).

The default value is 300.

lrc_update_factor N

If lrc_update_immediate mode is on, and the LRC server is in sync with an RLI server (an LRC and RLI are in sync if there have been no failed updates since the last full soft state update), then the interval between RLI updates for this server (lrc_update_ll) is multiplied by the value of this option.

lrc_update_immediate true|false

Turns LRC to RLI immediate mode updates on (true) or off (false).

The default value is false.

lrc_update_ll seconds

Number of seconds before an LRC server does an LFN list soft state update.

The default value is 86400.
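
The interaction between lrc_update_ll, lrc_update_immediate, and lrc_update_factor (described under lrc_update_factor above) can be sketched as follows; the function name is hypothetical.

```c
/* Seconds until the next LFN list soft state update to one RLI.
   in_sync: no updates to that RLI have failed since the last full
   soft state update. */
long lfn_list_update_interval(int immediate_mode, int in_sync,
                              long lrc_update_ll, long lrc_update_factor)
{
    if (immediate_mode && in_sync)
        return lrc_update_ll * lrc_update_factor;  /* stretched interval */
    return lrc_update_ll;
}
```

Presumably the rationale is that in immediate mode a synced RLI already receives changes incrementally, so full soft state updates can be spaced further apart.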

lrc_update_retry seconds

Seconds to wait before an LRC server will retry to connect to an RLI server that it needs to update.

The default value is 300.

maxbackoff seconds

Maximum number of seconds to wait before retrying listen in the event of an I/O error.

The default value is 300.

maxfreethreads N

Maximum number of idle threads. Excess threads are killed.

The default value is 5.

maxconnections N

Maximum number of simultaneous connections.

The default value is 100.

maxthreads N

Maximum number of threads running at one time.

The default value is 30.

myurl URL

URL of server.

The default value is rls://<hostname>:port

odbcini filename

Sets environment variable ODBCINI.

If not specified, and ODBCINI is not already set, then the default value is $GLOBUS_LOCATION/var/odbc.ini.

pidfile filename

Filename where pid file should be written.

The default value is $GLOBUS_LOCATION/var/<programname>.pid.

port N

Port the server listens on.

The default value is 39281.

result_limit limit

Sets the maximum number of results returned by a query.

The default value is 0 (zero), which means no limit.

If a query request includes a limit greater than this value, an error (GLOBUS_RLS_BADARG) is returned.

If the query request has no limit specified, then at most result_limit records are returned by a query.
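
These rules can be sketched as a function that computes the effective limit for one query. All names are hypothetical: LIMIT_BADARG stands in for the RLS error GLOBUS_RLS_BADARG, and its value here is made up. The sketch also assumes that a request limit of 0 means "no limit specified" and that no error is raised when result_limit is 0 (server unlimited).

```c
/* Status codes for this sketch only. */
enum { LIMIT_OK = 0, LIMIT_BADARG = 1 };

/* result_limit:  server setting; 0 means no server-side limit.
   request_limit: limit sent with the query; 0 means none specified.
   On success stores the effective limit in *out (0 = unlimited). */
int effective_result_limit(long result_limit, long request_limit, long *out)
{
    if (result_limit == 0) {            /* server imposes no limit */
        *out = request_limit;
        return LIMIT_OK;
    }
    if (request_limit == 0) {           /* no limit given: cap at result_limit */
        *out = result_limit;
        return LIMIT_OK;
    }
    if (request_limit > result_limit)   /* more than the server allows */
        return LIMIT_BADARG;
    *out = request_limit;
    return LIMIT_OK;
}
```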

rli_bloomfilter true|false

RLI servers must have this set to accept Bloom filter updates.

If true, then only Bloom filter updates are accepted from LRCs.

If false, full LFN lists are accepted.

Note: If Bloom filters are enabled, then the RLI does not support wildcarded queries.

rli_bloomfilter_dir none|default|pathname

If an RLI is configured to accept Bloom filters (rli_bloomfilter true), Bloom filters may be saved to this directory after updates.

This directory is scanned when an RLI server starts up and is used to initialize Bloom filters for each LRC that updated the RLI.

This option is useful when you want the RLI to recover its data immediately after a restart rather than wait for LRCs to send another update.

If the LRCs are updating frequently, this option is unnecessary and may be wasteful in that each Bloom filter is written to disk after each update.

  • none

    Bloom filters are not saved to disk.

    This is the default.

  • default

    Bloom filters are saved to the default directory:

    • $GLOBUS_LOCATION/var/rls-bloomfilters if GLOBUS_LOCATION is set
    • else, /tmp/rls-bloomfilters
  • pathname

    Bloom filters are saved to the named directory.

    Any other string is used as the directory name unchanged.

    The Bloom filter files in this directory are named after the URL of the LRC that sent the Bloom filter, with slashes (/) changed to percent signs (%) and ".bf" appended.
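
The directory selection and file naming rules can be combined into one path-building sketch. The function name bf_path is hypothetical, and joining the directory and file name with a single "/" is this sketch's choice.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the saved Bloom filter path for one LRC URL from the
   rli_bloomfilter_dir setting. Returns 0 on success, 1 if filters are not
   saved ("none"), -1 if buf is too small. */
int bf_path(const char *dir_setting, const char *lrc_url, char *buf, size_t len)
{
    size_t pos;
    int n;

    if (strcmp(dir_setting, "none") == 0)
        return 1;                                   /* filters are not saved */
    if (strcmp(dir_setting, "default") == 0) {
        const char *gl = getenv("GLOBUS_LOCATION");
        n = gl != NULL ? snprintf(buf, len, "%s/var/rls-bloomfilters/", gl)
                       : snprintf(buf, len, "/tmp/rls-bloomfilters/");
    } else {
        n = snprintf(buf, len, "%s/", dir_setting); /* any other string, unchanged */
    }
    if (n < 0 || (size_t)n >= len)
        return -1;

    pos = (size_t)n;
    for (const char *p = lrc_url; *p != '\0'; p++) {
        if (pos + 1 >= len)
            return -1;
        buf[pos++] = (*p == '/') ? '%' : *p;        /* slashes become '%' */
    }
    if (pos + 4 > len)
        return -1;                                  /* room for ".bf" and NUL */
    memcpy(buf + pos, ".bf", 4);
    return 0;
}
```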

rli_dbname database

Name of the RLI database.

The default value is rlidb.

rli_expire_int seconds

Interval (in seconds) between RLI expirations of stale entries. In other words, how often an RLI server will check for stale entries in its database.

The default value is 28800.

rli_expire_stale seconds

Interval (in seconds) after which entries in the RLI database are considered stale (presumably because they were deleted in the LRC).

The default value is 86400.

This value should be no smaller than lrc_update_ll.

Stale RLI entries are not returned in queries.

Note: If the LRC server is responding, this value is not used. Instead the value of lrc_update_ll or lrc_update_bf is retrieved from the LRC server, multiplied by 1.2, and used as the value for this option.
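
The rule in this note can be sketched as follows; the function name is hypothetical. The 1.2 factor presumably leaves slack so that an entry is not expired just before the LRC's next scheduled update arrives.

```c
/* Staleness interval (seconds) an RLI applies to entries from one LRC.
   If the LRC responds, its update interval (lrc_update_bf for Bloom filter
   updates, lrc_update_ll otherwise) times 1.2 is used; otherwise the
   configured rli_expire_stale. interval + interval / 5 equals 1.2 x interval
   exactly when interval is a multiple of 5, as the defaults are. */
long effective_expire_stale(int lrc_responding, int bloomfilter_updates,
                            long lrc_update_ll, long lrc_update_bf,
                            long rli_expire_stale)
{
    long interval;

    if (!lrc_responding)
        return rli_expire_stale;
    interval = bloomfilter_updates ? lrc_update_bf : lrc_update_ll;
    return interval + interval / 5;
}
```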

rli_server true|false

Set to true if the server acts as an RLI server.

The default value is false.

rlscertfile filename

Name of the X.509 certificate file identifying the server.

This value is set by setting environment variable X509_USER_CERT.

rlskeyfile filename

Name of the X.509 key file for the server.

This value is set by setting environment variable X509_USER_KEY.

startthreads N

Number of threads to start initially.

The default value is 3.

timeout seconds

Timeout (in seconds) for calls to other RLS servers (e.g., for LRC calls to send an update to an RLI).

4.8. Environment variable interface

There is no support for this type of interface for RLS.

5. Usage scenarios

This section provides examples illustrating the basic usage of the client interfaces supported by the RLS. Using the client API, developers may create client applications that interact with the RLS server to perform replica location operations.

Developing in C

Client applications developed in C must do both of the following:

  1. Include the client header file at $GLOBUS_LOCATION/include/globus_rls_client.h.
  2. Link to the client shared library at $GLOBUS_LOCATION/lib/libglobus_rls_client_gcc32dbgpthr.

For C language example code, see the accompanying C example.

Developing in Java

Client applications developed in Java must do all of the following:

  1. Include the RLS Jar, $GLOBUS_LOCATION/lib/rls.jar, in the CLASSPATH.
  2. Import the RLS Package org.globus.replica.rls.*.
  3. Depend on the client shared library via the Java Native Interface (JNI).

For Java language example code, see the accompanying Java example.

6. Tutorials

There are no tutorials available at this time.

7. Debugging

To run the RLS server in debug mode, use the -d option together with the -L num option (e.g., $GLOBUS_LOCATION/bin/globus-rls-server -d -L 3). The -d option directs log output to stdout, and the -L num option sets the log level; a higher num produces more detailed output.

8. Troubleshooting

Information on troubleshooting can be found in the FAQ.

9. Related Documentation

For additional details, see the RPC Protocol Description.