GT 3.9.5 GridFTP: User's Guide
- Introduction
- Command-line tools
- Graphical user interfaces
- Troubleshooting
- Usage statistics collection by the Globus Alliance
Introduction
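This guide describes how to use GridFTP from the client side in GT 3.9.5, chiefly with the globus-url-copy command-line tool, and points to interactive clients developed outside the Globus Toolkit. For general end-user information about the Toolkit, see the Toolkit-level User's Guide.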
Command-line tools
globus-url-copy for GridFTP
Tool description
globus-url-copy is a scriptable command-line tool that performs multi-protocol data movement. It supports the gsiftp:// (GridFTP), ftp://, http://, https://, and file:/// protocol specifiers in the URL. For GridFTP, globus-url-copy supports all implemented functionality. Versions from GT 3.2 onward support file globbing and directory moves.
Before you begin
YOU MUST HAVE A CERTIFICATE TO USE globus-url-copy!
| 1 | First, as with all things Grid, you must have a valid proxy certificate to run globus-url-copy. If you do not have a certificate, you must obtain one. If you are doing this for testing in your own environment, the Simple CA provided with the Globus Toolkit should suffice. If not, contact the Virtual Organization (VO) with which you are associated to find out from whom you should request a certificate. One common source is the DOE Science Grid CA, although you must confirm whether the resources you wish to access will accept its certificates. Instructions for installing the certificate properly should be provided by the source of the certificate. |
| 2 | Now that you have a certificate, you must generate a temporary proxy. Do this by running grid-proxy-init. Further documentation for grid-proxy-init can be found here. |
| 3 | You are now ready to use globus-url-copy! See the following sections for syntax and command line options. |
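Before running globus-url-copy from a script, it can help to confirm that grid-proxy-init actually left a proxy behind. The sketch below assumes the common GSI convention that the proxy is read from the path in the X509_USER_PROXY environment variable if it is set, and otherwise from /tmp/x509up_u<uid>; the helper names proxy_path and have_proxy are hypothetical, and your installation may use a different location.

```python
import os
from typing import Mapping, Optional


def proxy_path(env: Mapping[str, str], uid: int) -> str:
    """Return where a GSI proxy is expected (hypothetical helper).

    Assumes the usual convention: X509_USER_PROXY wins if it is set,
    otherwise the proxy lives at /tmp/x509up_u<uid>.
    """
    explicit = env.get("X509_USER_PROXY")
    if explicit:
        return explicit
    return f"/tmp/x509up_u{uid}"


def have_proxy(env: Optional[Mapping[str, str]] = None) -> bool:
    """True if a proxy file exists at the expected location (POSIX only)."""
    env = os.environ if env is None else env
    return os.path.isfile(proxy_path(env, os.getuid()))
```

A wrapper script can call have_proxy() first and ask the user to run grid-proxy-init if it returns False, instead of letting the transfer fail with an authentication error.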
Command syntax
The basic syntax for globus-url-copy is:
globus-url-copy [optional command line switches] <sourceURL> <destURL>
where:
| [optional command line switches] | See Command line options below for a list of available options. |
| <sourceURL> | Specifies the original URL of the file(s) to be copied. If this is a directory, all files within that directory will be copied. |
| <destURL> | Specifies the URL where you want to copy the files. If you want to copy multiple files, this must be a directory. |
Note: Any URL specifying a directory must end with /.
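When scripting globus-url-copy, it can be convenient to build the argument list programmatically and enforce the directory rule above before launching the transfer. The following is a minimal sketch; build_copy_command is a hypothetical helper, not part of the Toolkit. It passes through whatever switches you supply unchanged and only checks that, when the source is a directory URL, the destination is a directory URL too.

```python
from typing import List, Sequence


def build_copy_command(source_url: str, dest_url: str,
                       switches: Sequence[str] = ()) -> List[str]:
    """Build an argv list for globus-url-copy (hypothetical helper).

    A source ending in "/" names a directory, so several files may be
    copied; the destination must then also be a directory URL ending in "/".
    """
    if source_url.endswith("/") and not dest_url.endswith("/"):
        raise ValueError(
            "source is a directory, so the destination must be a "
            "directory URL ending with '/': " + dest_url)
    # Optional switches come first, followed by <sourceURL> <destURL>.
    return ["globus-url-copy", *switches, source_url, dest_url]
```

The returned list can be passed directly to subprocess.run(..., check=True); building a list rather than a shell string avoids quoting problems with unusual file names.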
URL prefixes
As of GT 3.2, we support the following URL prefixes:
- file:// (on a local machine only)
- ftp://
- gsiftp://
- http://
- https://
By default, globus-url-copy expects the same kind of host certificates that globusrun expects from gatekeepers.
Note: We do not provide an interactive client similar to the generic FTP client provided with Linux. See Interactive clients for GridFTP for information on an interactive client developed by NCSA under NMI and TeraGrid.
URL formats
A URL can be any valid URL, as defined by RFC 1738, that uses a supported protocol. In general, URLs have the following format:
protocol://[host][:port]/path
For example:
| gsiftp://myhost.mydomain.com:2812/data/foo.dat | Fully specified. |
| http://myhost.mydomain.com/mywebpage/default.html | Port not specified, so the protocol default is used (80 for HTTP). |
| file:///foo.dat | Host not specified, so your local host is used; port not specified, as above. |
| file:/foo.dat | This is also valid, but is not recommended. |
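The way host and port defaults are filled in can be illustrated with Python's standard urllib.parse.urlsplit. This is a sketch, not how globus-url-copy parses URLs internally. The port table is an assumption based on the well-known ports for each protocol (2811 is the standard GridFTP port), and describe_url is a hypothetical helper name.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed well-known default ports for the schemes listed in this guide.
# file:// URLs have no port.
DEFAULT_PORTS = {"gsiftp": 2811, "ftp": 21, "http": 80, "https": 443, "file": None}


def describe_url(url: str) -> Tuple[str, str, Optional[int], str]:
    """Split a URL into (scheme, host, port, path), filling in defaults."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol: {scheme!r}")
    # An empty host (as in file:///foo.dat) means the local machine.
    host = parts.hostname or "localhost"
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return scheme, host, port, parts.path
```

With this helper, file:///foo.dat and file:/foo.dat both resolve to the local path /foo.dat, matching the table above, while the gsiftp:// example keeps its explicit port 2812.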
Note: For FTP URLs, it is legal to specify a user name and password in the URL as follows:
ftp://myname:mypassword@myhost.mydomain.com/foo.dat
This is strongly discouraged, because your username and password are sent in plain text over the network. The servers provided in the Globus Toolkit do not permit username/password authentication, so this format will result in an error. The exception is anonymous FTP access.
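A script that accepts URLs from users may want to reject embedded passwords before they are ever sent, and to avoid writing them to log files. A minimal sketch using urllib.parse; the helper names has_embedded_password and redact are hypothetical.

```python
from urllib.parse import urlsplit


def has_embedded_password(url: str) -> bool:
    """True if the URL carries a password in its user-info part."""
    return urlsplit(url).password is not None


def redact(url: str) -> str:
    """Return the URL with any password replaced by '***' (e.g. for logs)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    # netloc looks like "user:password@host[:port]"; keep user and host only.
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    user = userinfo.partition(":")[0]
    return parts._replace(netloc=f"{user}:***@{hostinfo}").geturl()
```

A URL with only a user name (for example ftp://anonymous@host/file) is not flagged, since it carries no password.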
Command line options
Notes about globus-url-copy
- A globus-url-copy transfer using the gsiftp protocol with no options (all defaults) performs a binary, stream mode transfer (which implies no parallelism) using the host's default TCP buffer size, an encrypted and checksummed control channel, and an authenticated data channel.
- GridFTP (as well as normal FTP) defines multiple wire protocols, or MODES, for the data channel.
Most normal FTP servers only implement stream mode, i.e. the bytes flow in order over a single TCP connection. GridFTP defaults to this mode so that it is compatible with normal FTP servers.
However, GridFTP has another MODE, called Extended Block Mode, or MODE E. This mode sends the data over the data channel in blocks. Each block consists of 8 bits of flags, a 64-bit integer indicating the offset from the start of the transfer, and a 64-bit integer indicating the length of the block in bytes, followed by a payload of that many bytes. Because the offset and length are provided, out-of-order arrival is acceptable, i.e., the 10th block could arrive before the 9th, because you know explicitly where it belongs. This allows us to use multiple TCP channels. If you use the -p | -parallelism option, globus-url-copy automatically puts the servers into MODE E.
Note: Specifying -p 1 is not the same as omitting -p entirely. Both use a single stream, but the default uses stream mode, while -p 1 uses MODE E.
- For more information on TCP buffer sizes and related topics, try <here>.
- If you run a GridFTP server by hand, you will need to explicitly specify the subject name to expect. You can use the -ss flag to set the sourceURL subject, and -ds to set the destURL subject. If you use -s alone, it sets both to be the same. You can see an example of this usage under the Verification section of this guide. Please note: this is an unusual way to use this client. Most of the time you only need to specify the two URLs.
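To make the MODE E description above concrete, here is a sketch of framing data into blocks and reassembling them in arbitrary arrival order. It is an illustration only, not Globus code. The guide lists the header fields (8-bit flags, 64-bit offset, 64-bit length) but not their order on the wire; this sketch assumes descriptor, byte count, offset, all in network byte order, which is our reading of the GridFTP protocol specification and should be checked there before relying on it. The flags byte is carried through without interpretation, and the function names are hypothetical.

```python
import struct
from typing import Iterator, Iterable, Tuple

# Assumed MODE E header layout: 8-bit descriptor (flags), 64-bit byte
# count, 64-bit offset, big-endian ("network") byte order: 17 bytes total.
HEADER = struct.Struct("!BQQ")


def encode_block(flags: int, offset: int, payload: bytes) -> bytes:
    """Frame one payload as a MODE E block: header followed by data."""
    return HEADER.pack(flags, len(payload), offset) + payload


def decode_blocks(stream: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (flags, offset, payload) for each block in a byte string."""
    pos = 0
    while pos < len(stream):
        flags, count, offset = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        yield flags, offset, stream[pos:pos + count]
        pos += count


def reassemble(blocks: Iterable[Tuple[int, int, bytes]], size: int) -> bytes:
    """Place each payload at its offset, so arrival order does not matter."""
    buf = bytearray(size)
    for _flags, offset, payload in blocks:
        buf[offset:offset + len(payload)] = payload
    return bytes(buf)
```

Because every block states its own offset, the receiver never has to wait for a missing earlier block before storing a later one. That is what lets several TCP connections (for example with -p 4) deliver blocks independently, whereas stream mode depends on the single connection's byte order.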
Limitations
There are no known limitations for globus-url-copy in GT 3.9.5.
Interactive clients for GridFTP
The Globus Project does not provide an interactive client for GridFTP. Any normal FTP client will work with a GridFTP server, but it cannot take advantage of the advanced features of GridFTP. The interactive clients listed below take advantage of the advanced features of GridFTP.
No endorsement is implied by their presence here. We make no assertion as to the quality or appropriateness of these tools; we simply list them for your convenience. We will not answer questions, accept bug reports, or in any way, shape, or form be responsible for these tools, although they should have mechanisms of their own for such things.
UberFTP was developed at NCSA under the auspices of NMI and TeraGrid. It is available through NMI (which is also a convenient source for Globus and other tools) or directly from NCSA:
- NMI Download: http://nsf-middleware.org/
- NCSA Uberftp only download: http://dims.ncsa.uiuc.edu/set/uberftp/download/index.html
- UberFTP User's Guide: http://teragrid.ncsa.uiuc.edu/Doc/Data/uberftp.html
Graphical user interfaces
Globus does not provide any interactive client for GridFTP, either GUI or text-based. However, NCSA, as part of its TeraGrid activity, produces a text-based interactive client called UberFTP, which you may want to try. See Interactive clients for GridFTP for more information.
Troubleshooting
[user-friendly help on common problems they may encounter]
Usage statistics collection by the Globus Alliance
The following GridFTP-specific usage statistics are sent in a UDP packet at the end of each transfer, in addition to the standard header information described in the Usage Stats section.
- Start time of the transfer
- End time of the transfer
- Version string of the server
- TCP buffer size used for the transfer
- Block size used for the transfer
- Total number of bytes transferred
- Number of parallel streams used for the transfer
- Number of stripes used for the transfer
- Type of transfer (STOR, RETR, LIST)
- FTP response code (indicating success or failure of the transfer)
We have made a concerted effort to collect only data that is not too intrusive or private, and yet still provides us with information that will help improve and gauge the usage of the GridFTP server. Nevertheless, if you wish to disable this feature for GridFTP only, see the Logging section of the GridFTP configuration and command line options. Note that you can disable transmission of usage statistics globally for all C components by setting "GLOBUS_USAGE_OPTOUT=1" in your environment.
Also, please see our policy statement on the collection of usage statistics.
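If you launch GridFTP components from a script, one way to apply the global opt-out described above is to set GLOBUS_USAGE_OPTOUT=1 only in the environment of the child process rather than in your whole login session. A minimal sketch; optout_env is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def optout_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with usage statistics disabled.

    Sets GLOBUS_USAGE_OPTOUT=1, which disables usage statistics for all
    C components started with this environment. The input is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["GLOBUS_USAGE_OPTOUT"] = "1"
    return env
```

Pass the result as the env= argument to subprocess.run when starting the GridFTP server process; other processes in the session keep their default behavior.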