Table of Contents
- 1. Introduction
- 2. Building and Installing
- 3. Configuring
- 4. Using MySQL
- 5. Deploying
- 6. Testing
- 7. Security Considerations
- 8. Troubleshooting
- 9. Usage statistics collection by the Globus Alliance
This guide contains advanced configuration information for system administrators working with RFT. It provides references to information on procedures typically performed by system administrators, including installing, configuring, deploying, and testing the installation.
This information is in addition to the basic Globus Toolkit prerequisite, overview, installation, and security configuration instructions in the GT 4.1.2 System Administrator's Guide. Read through that guide before continuing!
RFT is used to perform third-party transfers across GridFTP servers. It periodically stores its state in a database so that transfers can be recovered after a failure. RFT uses standard grid security mechanisms for authorization and authentication of users. To use RFT effectively, you should have installed and configured a database with the RFT database schemas and have the necessary security infrastructure in place to perform a third-party transfer.
RFT is built and installed as part of a default GT 4.1.2 installation. No extra installation steps are required for this component.
The following are specialized instructions for advanced developers who want to deploy the latest code from CVS:
Build RFT from CVS:
Configure your CVSROOT to point to the Globus CVS location.
cvs co ws-transfer
Set GLOBUS_LOCATION to point to your Globus installation.
RFT has the following prerequisites:
- Java WS Core - This is built and installed by following the GT 4.1.2 System Administrator's Guide.
- A host certificate (see the GT 4.1.2 System Administrator's Guide).
- GridFTP - GridFTP performs the actual file transfer and is built and installed by following the GT 4.1.2 System Administrator's Guide.
- PostgreSQL - PostgreSQL is used to store the state of the transfer to allow for restart after failures. The interface to PostgreSQL is JDBC, so any DBMS that supports JDBC can be used, although no others have been tested. For instructions on configuring the PostgreSQL database for RFT, see below.
The security of the service can be configured by modifying the security descriptor. It allows for configuring the credentials that will be used by the service, type of authentication and authorization that needs to be enforced. By default, the following security configuration is installed:
- Credentials set for use by the container are used. If they are not specified, default credentials are used.
- GSI Secure conversation authentication is enforced for all methods.
Note: Changing the required authentication and authorization method will require suitable changes to the clients that contact this service.
To alter the security descriptor configuration, refer to security descriptor. The file to be altered is
PostgreSQL (version 7.1 or greater) needs to be installed and configured for RFT to work. You can either use the packages that came with your operating system (RPMs, DEBs, ...) or build from source. We used PostgreSQL version 7.3.2 for our testing, and the following instructions apply to that version.
Install PostgreSQL. Instructions on how to install/configure PostgreSQL can be found here.
Configure the postmaster daemon so that it accepts TCP connections. This can be done by adding the "-o -i" switch to the postmaster script (this is either the init.d script found in /etc/init.d/postgresql or /var/lib/, depending on how you installed PostgreSQL). Follow the instructions here to start the postmaster with the -i option.
You will now need to create a PostgreSQL user that will connect to the database. This is usually the account under which the container is running. You can create a PostgreSQL user by running the following command:
su postgres; createuser globus
If you get the following error:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
this generally means that either your postmaster was not started with the -i option or you didn't restart the postmaster after the step above.
Now you need to set security on the database you are about to create. You can do it by following the steps below:
sudo vi /var/lib/pgsql/data/pg_hba.conf
and append the following line to the file:
host rftDatabase "username" "host-ip" 255.255.255.255 md5
Note: use crypt instead of md5 if you are using PostgreSQL 7.3 or earlier.
sudo /etc/init.d/postgresql restart
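As a rough illustration of the pg_hba.conf entry described above, the following Python sketch assembles the line from its fields. The function name `pg_hba_host_line` and the sample user and address are hypothetical; the field order simply mirrors the entry shown in this guide.

```python
def pg_hba_host_line(database, user, host_ip, method="md5"):
    """Build a 'host' entry that admits a single client address.

    The 255.255.255.255 netmask restricts the rule to exactly host_ip,
    as in the entry shown in this guide. Only the two authentication
    methods this guide mentions are accepted here.
    """
    if method not in ("md5", "crypt"):
        raise ValueError("method must be 'md5' or 'crypt'")
    fields = ["host", database, user, host_ip, "255.255.255.255", method]
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical values: user 'globus' connecting from 192.0.2.10.
    print(pg_hba_host_line("rftDatabase", "globus", "192.0.2.10"))
```

Generating the line this way only helps avoid field-order mistakes; the file still has to be edited and PostgreSQL restarted as described.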
To create the database that is used for RFT run (as user globus):
To populate the RFT database with the appropriate schemas run:
psql -d rftDatabase -f $GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema.sql
Now that you have created a database to store RFT's state, the following steps configure RFT to find the database:
dbConfiguration section under the
Set connectionString to point to the machine on which you installed PostgreSQL and to the name of the database you used in step 2. If you installed PostgreSQL on the same machine as your Globus install, the default should work fine for you.
Set userName to the name of the user who owns/created the database and do the same for the password (it also depends on how you configured your database).
Don't worry about the other parameters in the section. The defaults should work fine for now.
Edit the configuration section under ReliableFileTransferService. There are two values that can be edited in this section:
backOff: Time in seconds that RFT waits before retrying a failed transfer. The default should work fine for now.
maxActiveAllowed: The number of transfers the container can perform at any given time. The default should be fine for now.
With a default GT 4.1.2 installation, the RFT service is automatically registered with the default WS MDS Index Service running in the same container for monitoring and discovery purposes.
There is a JNDI resource defined in $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml as follows:
<resource name="mdsConfiguration"
          type="org.globus.wsrf.impl.servicegroup.client.MDSConfiguration">
    <resourceParams>
        <parameter>
            <name>reg</name>
            <value>true</value>
        </parameter>
        <parameter>
            <name>factory</name>
            <value>org.globus.wsrf.jndi.BeanFactory</value>
        </parameter>
    </resourceParams>
</resource>
To configure the automatic registration of RFT to the default WS MDS Index Service, change the value of the parameter <reg> as follows:
- true turns on auto-registration; this is the default in GT 4.1.2.
- false turns off auto-registration.
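A minimal Python sketch of toggling this parameter is given here, assuming the resource looks exactly like the fragment above. The helper name `set_auto_registration` is hypothetical, and the real jndi-config.xml may declare XML namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The mdsConfiguration resource as a fragment, not the whole file.
RESOURCE_XML = """
<resource name="mdsConfiguration"
          type="org.globus.wsrf.impl.servicegroup.client.MDSConfiguration">
  <resourceParams>
    <parameter><name>reg</name><value>true</value></parameter>
    <parameter><name>factory</name><value>org.globus.wsrf.jndi.BeanFactory</value></parameter>
  </resourceParams>
</resource>
"""


def set_auto_registration(resource, enabled):
    """Set the 'reg' parameter of an mdsConfiguration element in place."""
    for param in resource.iter("parameter"):
        if param.findtext("name") == "reg":
            param.find("value").text = "true" if enabled else "false"
            return
    raise KeyError("no 'reg' parameter found")


if __name__ == "__main__":
    resource = ET.fromstring(RESOURCE_XML)
    set_auto_registration(resource, False)  # turn auto-registration off
    print(ET.tostring(resource, encoding="unicode"))
```

For a one-off change, editing the value by hand is just as effective; a script is mainly useful when the same setting has to be applied to several installations.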
By default, the following resource properties (from the RFT Factory Resource) are sent to the default Index Service:
- ActiveResourceInstances: A dynamic resource property of the total number of active RFT resources in the container at a given point of time.
- TotalNumberOfTransfers: A dynamic resource property of the total number of transfers/deletes performed since the RFT service was deployed in this container.
- TotalNumberOfActiveTransfers: A dynamic resource property of the number of active transfers across all RFT resources in a container at a given point of time.
- TotalNumberOfBytesTransferred: A dynamic resource property of the total number of bytes transferred by all RFT resources created since the deployment of the service.
- RFTFactoryStartTime: Time when the service was deployed in the container. Used to calculate uptime.
- DelegationServiceEPR: The end point reference of the Delegation resource that holds the delegated credential used in executing the resource.
You can configure which resource properties are sent in RFT's registration.xml file,
The following is the relevant section of the file:
<Content xsi:type="agg:AggregatorContent"
         xmlns:agg="http://mds.globus.org/aggregator/types">
    <agg:AggregatorConfig xsi:type="agg:AggregatorConfig">
        <agg:GetMultipleResourcePropertiesPollType
            xmlns:rft="http://www.globus.org/namespaces/2004/10/rft">
            <!-- Specifies that the index should refresh information
                 every 60000 milliseconds (once per minute) -->
            <agg:PollIntervalMillis>60000</agg:PollIntervalMillis>
            <!-- specifies that all Resource Properties should be
                 collected from the RFT factory -->
            <agg:ResourcePropertyNames>rft:TotalNumberOfBytesTransferred</agg:ResourcePropertyNames>
            <agg:ResourcePropertyNames>rft:TotalNumberOfActiveTransfers</agg:ResourcePropertyNames>
            <agg:ResourcePropertyNames>rft:RFTFactoryStartTime</agg:ResourcePropertyNames>
            <agg:ResourcePropertyNames>rft:ActiveResourceInstances</agg:ResourcePropertyNames>
            <agg:ResourcePropertyNames>rft:TotalNumberOfTransfers</agg:ResourcePropertyNames>
        </agg:GetMultipleResourcePropertiesPollType>
    </agg:AggregatorConfig>
    <agg:AggregatorData/>
</Content>
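To check which properties a registration section would publish, something like the following Python sketch could be used. It works on an abbreviated copy of the section above (the xsi attributes and some property names are left out for brevity), and the function name `registered_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

AGG_NS = "http://mds.globus.org/aggregator/types"

# Abbreviated copy of the registration section; xsi attributes are omitted.
CONTENT_XML = """
<Content xmlns:agg="http://mds.globus.org/aggregator/types">
  <agg:AggregatorConfig>
    <agg:GetMultipleResourcePropertiesPollType
        xmlns:rft="http://www.globus.org/namespaces/2004/10/rft">
      <agg:PollIntervalMillis>60000</agg:PollIntervalMillis>
      <agg:ResourcePropertyNames>rft:TotalNumberOfBytesTransferred</agg:ResourcePropertyNames>
      <agg:ResourcePropertyNames>rft:TotalNumberOfTransfers</agg:ResourcePropertyNames>
    </agg:GetMultipleResourcePropertiesPollType>
  </agg:AggregatorConfig>
</Content>
"""


def registered_properties(content):
    """Return (poll interval in ms, list of resource property names)."""
    interval = int(content.findtext(f".//{{{AGG_NS}}}PollIntervalMillis"))
    names = [el.text for el in content.iter(f"{{{AGG_NS}}}ResourcePropertyNames")]
    return interval, names


if __name__ == "__main__":
    interval, names = registered_properties(ET.fromstring(CONTENT_XML))
    print(interval, names)
```

Removing an `agg:ResourcePropertyNames` element from the section removes that property from what the index polls.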
If a third party needs to register an RFT service manually, see Registering with mds-servicegroup-add in the WS MDS Aggregator Framework documentation.
RFT in GT 4.1.2 also works with a MySQL database. A MySQL schema file is provided at $GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema_mysql.sql. You will need to download the MySQL drivers (MySQL Connector/J 3.1, not 3.0) from here and copy the driver jar to $GLOBUS_LOCATION/lib. You will also need to make the following changes:
Create an RFT database and populate it with the MySQL schema.
mysqladmin -h hostname create rftDatabase -p
mysql -h hostname -D rftDatabase
source share/globus_wsrf_rft/rft_schema_mysql.sql
If you are using a version of MySQL earlier than 4.1, you will need to use the schema $GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema_mysql_pre4.0.sql to make RFT work. See Bug 3633 for more details.
Edit $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml and change the following values:
for connectionString, change
for driverName, change
and change the userName and password to whatever was set when users were created for MySQL.
RFT is deployed as part of a standard toolkit installation.
RFT has been tested to work without any additional setup when deployed into Tomcat. Please follow these basic instructions to deploy GT4 services into Tomcat.
You need to configure the GT4 install with the needed RFT configuration (like database configuration, etc) before you deploy into Tomcat.
Set $GLOBUS_LOCATION to point to your Globus install.
Start a GridFTP server on the default port on the machine where you are running the tests. This can be done by running:
$GLOBUS_LOCATION/sbin/globus-gridftp-server -p 2811 &
Start the container with RFT deployed in it.
Edit $GLOBUS_LOCATION/share/globus_wsrf_rft_test/test.properties. Put in appropriate values for properties like:
authzValue (self or host),
HOST (host IP of container),
PORT (port on which the container is listening),
sourceHost and destinationsHost (hostnames of GridFTP servers).
The default values will work fine if you are running the tests with a standard stand-alone container started with user credentials (self authorization is done in this case).
If the container is started using host credentials, change authzValue to host.
If the GridFTP servers you are using for your testing are started as user, you need to supply subject names of the users in sourceSubject and destinationSubject for authorization with GridFTP servers.
If both the source and destination servers are started as one user, you can just fill in the user's subject in the subject field of test.properties.
If you are getting authentication/authorization failures because of mismatched subject names, then your authzValue and authType (which uses transport security by default) need to be changed, depending on how you started the container. If you started the container with the -nosec option, then you need to change authType to GSI_MESSAGE, PROTOCOL to http, and PORT to 8080.
The *.xfr files in $GLOBUS_LOCATION/share/globus_wsrf_rft_test/ are the transfer files that will be used in the tests. Again, the default values work fine if you followed the instructions so far.
Run the following command, which will run all the RFT unit tests:
ant -Dtests.jar=$GLOBUS_LOCATION/lib/globus_wsrf_rft_test.jar -f share/globus_wsrf_rft_test/runtests.xml
Run the following command to generate the test reports in html form:
ant -f share/globus_wsrf_rft_test/runtests.xml generateTestReport
The service configuration files such as
etc/<gar>/ directory) contain private information such as database passwords and usernames. Ensure that these configuration files are only readable by the user that is running the container.
The deployment process automatically sets the permissions of
server-config.wsdd as user readable only. However, this might not work correctly on all platforms and this does not apply to any other configuration files.
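Because the automatic permission setting may not work on every platform, it can be worth verifying the result. The following Python sketch, for POSIX systems, checks whether a file grants any access to group or others and, if needed, restricts it to the owner. Both function names are hypothetical, and the demo runs on a temporary file rather than a real configuration file.

```python
import os
import stat
import tempfile


def readable_only_by_owner(path):
    """Return True if group and others have no permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_to_owner(path):
    """Set path to owner read/write only (the equivalent of chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Demonstrate on a temporary file standing in for a config file.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    os.chmod(path, 0o644)
    print(readable_only_by_owner(path))
    restrict_to_owner(path)
    print(readable_only_by_owner(path))
    os.remove(path)
```

The check should be run as the user that runs the container, against each configuration file that holds database credentials.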
RFT stores the transfer requests in a database. Proper security measures need to be taken to protect the access of the data by granting/revoking appropriate permissions on tables that are created for RFT use and other steps that are appropriate and consistent with site specific security measures.
Problem: If RFT is not configured properly to talk to a PostgreSQL database, you will see this message displayed on the console when you start the container:
"Error creating RFT Home: Failed to connect to database ... Until this is corrected all RFT request will fail and all GRAM jobs that require staging will fail".
Solution: The usual cause is that Postmaster is not accepting TCP connections, which means that you must restart Postmaster with the -i option (see Configuring RFT).
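One quick way to confirm whether the database host accepts TCP connections at all is a plain socket check, sketched here in Python. The function name `accepts_tcp` is hypothetical, and port 5432 is assumed because it is the PostgreSQL default; a successful connection only shows that something is listening, not that RFT's credentials are valid.

```python
import socket


def accepts_tcp(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(accepts_tcp("localhost"))
```

If this returns False for the host named in connectionString, restarting the postmaster with -i is the first thing to try.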
Problem: Make RFT print more verbose error messages
Solution: Edit $GLOBUS_LOCATION/container-log4j.properties and add the following line to it:
log4j.category.org.globus.transfer=DEBUG
For more verbosity add
log4j.category.org.globus.ftp=DEBUG, which will print out GridFTP
RFT uses PostgreSQL to checkpoint transfer state in the form of restart markers and to recover from transient failures during a transfer, using a retry mechanism with exponential backoff. RFT has been tested to recover from source and/or destination server crashes during a transfer, network failures, container failures (when the machine running the container goes down), file system failures, etc. The RFT Resource is implemented as a PersistentResource, so ReliableFileTransferHome gets initialized every time a container is restarted. A more detailed description of fault tolerance and recovery in RFT follows:
- Source and/or destination GridFTP failures: In this case RFT retries the transfer for a configurable number of maximum attempts with exponential backoff for each retry (the backoff time period is configurable also). If a failure happens in the midst of a transfer, RFT uses the last restart marker that is stored in the database for that transfer and uses it to resume the transfer from the point where it failed, instead of restarting the whole file. This failure is treated as a container-wide backoff for the server in question. What this means is that all other transfers going to/from that server, across all the requests in a container, will be backed off and retried. This is done in order to prevent further failures of the transfers by using knowledge available in the database.
- Network failures: These occur when packets are lost or connections time out, for example because of heavy load on the network. This failure is considered transient, and RFT retries that particular transfer with exponential backoff (and not the whole container, as with source and/or destination GridFTP failures).
- Container failures: This type of failure occurs when the machine running the container goes down or when the container is restarted with active transfers. When the container is restarted, it restarts ReliableFileTransferHome, which looks in the database for any active RFT resources and restarts them.
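The retry behaviour described in the list above can be pictured with a small Python sketch. This is not RFT's implementation: the doubling factor, the function names, and the shape of the `transfer` callback are all assumptions made for illustration. The source only states that retries use exponential backoff with a configurable base time and maximum number of attempts, and that a retry resumes from the last stored restart marker.

```python
def backoff_delays(base_seconds, max_attempts, factor=2):
    """Return the wait before each retry: base, base*factor, base*factor**2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(max_attempts)]


def retry_with_backoff(transfer, base_seconds, max_attempts, sleep):
    """Call transfer(marker) until it succeeds or the retries are used up.

    transfer returns (done, marker). The marker stands in for the last
    restart marker, so a retry resumes from that point instead of
    restarting the whole file.
    """
    marker = None
    done, marker = transfer(marker)  # initial attempt
    for delay in backoff_delays(base_seconds, max_attempts):
        if done:
            break
        sleep(delay)  # wait longer before each successive retry
        done, marker = transfer(marker)
    return done


if __name__ == "__main__":
    import time

    attempts = []

    def flaky_transfer(marker):
        # Hypothetical transfer that fails twice, then succeeds.
        attempts.append(marker)
        return len(attempts) >= 3, len(attempts) * 100

    print(retry_with_backoff(flaky_transfer, 0.01, 5, time.sleep))
```

Passing `sleep` in as a parameter keeps the sketch easy to exercise without real waiting; a container-wide backoff would apply the same delay to every transfer that uses the failing server.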
The following usage statistics are sent by default in a UDP packet at the end of the lifetime of each RFT Resource (or when an RFT resource is destroyed).
- Total number of files transferred by RFT since RFT was installed
- Total number of bytes transferred by RFT since RFT was installed
- Total number of files transferred in this RFT Resource
- Total number of bytes transferred in this RFT Resource
- Creation time of this RFT Resource
- Factory Start Time
We have made a concerted effort to collect only data that is not too intrusive or private, and yet still provides us with information that will help improve the RFT component. Nevertheless, if you wish to disable this feature, please see the "Usage Statistics Configuration" section of Configuring Java WS Core for instructions.
Also, please see our policy statement on the collection of usage statistics.