GT 3.9.5 Component Guide to Public Interfaces: GridFTP
Semantics and syntax of APIs
Programming Model Overview
The Globus FTP Client library provides a convenient way of accessing files
on remote FTP servers. In addition to supporting the basic FTP protocol, the
FTP Client library supports several security and performance extensions to
make FTP more suitable for Grid applications. These extensions are described
in the Grid FTP Protocol document.
In addition to protocol support for grid applications, the FTP Client library
provides a plugin architecture for installing application or grid-specific
fault recovery and performance tuning algorithms within the library. Application
writers may then target their code toward the FTP Client library, and by simply
enabling the appropriate plugins, easily tune their application to run it on
a different grid.
All applications which use the Globus FTP Client API must include the header
file "globus_ftp_client.h" and activate the GLOBUS_FTP_CLIENT_MODULE.
To use the Globus FTP Client API, one must create an FTP Client handle. This
structure contains context information about FTP operations which are being
executed, a cache of FTP control and data connections, and information about
plugins which are being used. The specifics of the connection caching and plugins
are found in the "Handle Attributes" section of the API documentation.
Once the handle is created, one may begin transferring files or doing other
FTP operations by calling the functions in the "FTP Operations" section of the API documentation. In addition to whole-file transfers, the API supports partial file transfers, restarting transfers from a known point, and various FTP directory management commands.
All FTP operations may have a set of attributes, defined in the "operationattr" section, associated with them to tune various FTP parameters. The data structures and functions needed to restart a file transfer are described in the "Restart Markers" section of the API documentation.
For operations which require the user to send data to or receive data from an FTP server, the user must call the functions in the "globus_ftp_client_data" section
of the manual.
The globus_ftp_control library provides low-level services needed to implement
FTP clients and servers. The API provided is protocol specific. The data transfer
portion of this API provides support for the standard data methods described
in the FTP Specification as well as extensions for parallel, striped, and
partial data transfer.
Component API
For information on the internationalization API, see the C Common Libraries Public Interface.
Semantics and syntax of the WSDL
GridFTP has no WSDL as it is not Web Service based at this time.
Command-line tools
globus-url-copy for GridFTP
Tool description
globus-url-copy is a scriptable command line tool that can do multi-protocol data movement. It supports gsiftp:// (GridFTP), ftp://, http://, https://, and file:/// protocol specifiers in the URL. For GridFTP, globus-url-copy supports all implemented functionality. GT 3.2 and later versions support file globbing and directory moves.
Before you begin
YOU MUST HAVE A CERTIFICATE TO USE globus-url-copy!
1. First, as with all things Grid, you must have
   a valid proxy certificate to run globus-url-copy.
   If you do not have a certificate, you must obtain one.
   If you are doing this for testing in your own environment, the Simple
   CA provided with the Globus Toolkit should suffice.
   If not, you must contact the Virtual Organization (VO) with which
   you are associated to see from whom you should request a certificate.
   One common source is the DOE Science Grid CA, although you must confirm
   whether or not the resources you wish to access will accept their certificates.
   Instructions for proper installation of the certificate should be
   provided by the source of the certificate.
2. Now that you have a certificate, you must generate a
   temporary proxy. Do this by running:
   grid-proxy-init
   Further documentation for grid-proxy-init can be found here.
3. You are now ready to use globus-url-copy! See the following
   sections for syntax and command line options.
Command syntax
The basic syntax for globus-url-copy is:
globus-url-copy [optional command line switches] <sourceURL> <destURL>
where:
[optional command line switches] |
See Command line options below for
a list of available options. |
<sourceURL> |
Specifies the original URL of the file(s) to be copied.
If this is a directory, all files within that directory will be copied. |
<destURL> |
Specifies the URL where you want to copy the files.
If you want to copy multiple files, this must be a directory. |
Note: Any URL specifying a directory must end with /
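The syntax rules above can be sketched as a small helper for scripts that invoke globus-url-copy. This is a minimal sketch, not part of the toolkit: the function name build_copy_command is hypothetical, and the only rules it checks are the ones stated above (a directory source needs a directory destination, and directory URLs end with /).

```python
def build_copy_command(source_url, dest_url, switches=()):
    """Return an argv list for globus-url-copy (hypothetical helper)."""
    # A directory source copies every file in it, so the destination
    # must be a directory as well, and directory URLs end with "/".
    if source_url.endswith("/") and not dest_url.endswith("/"):
        raise ValueError("destination must be a directory URL ending in '/'")
    # Switches come first, then the source URL, then the destination URL.
    return ["globus-url-copy", *switches, source_url, dest_url]


argv = build_copy_command(
    "gsiftp://myhost.mydomain.com/data/",
    "file:///tmp/data/",
)
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run() once a valid proxy exists; the host name and paths shown are placeholders.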
URL prefixes
As of GT 3.2, we support the following URL prefixes:
file:// (on a local machine only)
ftp://
gsiftp://
http://
https://
By default, globus-url-copy is expecting the same kind of host
certificates that globusrun expects from gatekeepers.
Note: We do not provide an interactive client
similar to the generic FTP client provided with Linux. See Interactive
Client for information on an interactive client developed by NCSA / NMI
/ TeraGrid.
URL formats
URLs can be any valid URL, as defined by RFC 1738, that uses a protocol we
support. In general, they have the following format:
protocol://[host]:[port]/path
For example:
gsiftp://myhost.mydomain.com:2812/data/foo.dat |
Fully specified. |
http://myhost.mydomain.com/mywebpage/default.html |
Port not specified so uses protocol default, 80 in this case. |
file:///foo.dat |
Host not specified so it uses your local host, port not specified as
before. |
file:/foo.dat |
This is also valid, but is not recommended. |
Note: For FTP URLs, it is legal to specify a user name and
password in the URL as follows:
ftp://myname:mypassword@myhost.mydomain.com/foo.dat
This is highly discouraged, as you will be sending your user name
and password in plain text over the network. For servers provided in
the Globus Toolkit, user name and password is not a permitted authentication
method, so this format will result in an error. The
exception to this is anonymous FTP access.
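The URL forms above can be checked with Python's standard urllib.parse module. This is a minimal sketch of how the defaults described above apply, not how globus-url-copy itself parses URLs. The port 80 default for http comes from the example above; the other entries in DEFAULT_PORTS are conventional values and are assumptions here, as is the helper name describe_url.

```python
from urllib.parse import urlsplit

# http -> 80 is taken from the example above. The other defaults are
# conventional port numbers and are assumptions of this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "gsiftp": 2811}


def describe_url(url):
    """Split a URL into (protocol, host, port, path), applying defaults."""
    parts = urlsplit(url)
    host = parts.hostname or "localhost"  # host omitted: use the local host
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)  # None for file
    return parts.scheme, host, port, parts.path


for url in (
    "gsiftp://myhost.mydomain.com:2812/data/foo.dat",
    "http://myhost.mydomain.com/mywebpage/default.html",
    "file:///foo.dat",
    "file:/foo.dat",
):
    print(describe_url(url))
```

Both file forms resolve to the same local path, /foo.dat, which is why the shorter one is accepted.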
Command line options
| Informational Options |
-help | -usage |
Prints help. |
-version |
Prints the version of this program. |
-versions |
Prints the versions of all modules that this program
uses. |
-q | -quiet |
Suppresses all output for successful operation. |
-vb | -verbose |
During the transfer, displays:
- number of bytes transferred
- performance since the last update (currently every 5 seconds)
- average performance for the whole transfer.
|
-dbg | -debugftp |
Debugs FTP connections and prints the entire control
channel protocol exchange to STDERR.
Very useful for debugging. Please provide this any time you
are requesting assistance with a globus-url-copy problem. |
| Utility / Ease of Use Options |
-a | -ascii |
Converts the file between ASCII format and the local
file format. |
-b | -binary |
Does not apply any conversion to the files. This option
is turned on by default. |
-f <filename> |
Reads a list of URL pairs from the named file.
Each line should contain:
<sourceURL> <destURL>
Enclose URLs with spaces in double quotes ("). Blank lines and lines
beginning with # will be ignored. |
-r | -recurse |
Copies files in subdirectories. |
-notpt | -no-third-party-transfers |
Turns third-party transfers off (on by default).
Site firewall and/or software configuration may prevent a connection
between the two servers (a third party transfer). If this is
the case, globus-url-copy will "relay" the data. It will do a
GET from the source and a PUT to the destination.
This obviously causes a performance penalty, but will allow you to
complete a transfer you otherwise could not do. |
| Reliability Options |
-rst | -restart |
Restarts failed FTP operations. |
-rst-retries <retries> |
Specifies the maximum number of times to retry the operation
before giving up on the transfer.
Use 0 for infinite.
The default value is 5. |
-rst-interval <seconds> |
Specifies the interval in seconds to wait after a failure
before retrying the transfer.
Use 0 for an exponential backoff.
The default value is 0. |
-rst-timeout <seconds> |
Specifies the maximum time after a failure to keep retrying.
Use 0 for no timeout.
The default value is 0. |
| Performance Options |
-tcp-bs <size> | -tcp-buffer-size <size> |
Specifies the size (in bytes) of the TCP buffer to be
used by the underlying ftp data channels.
This is critical to good performance over the WAN. Use the bandwidth-delay
product as your buffer size. |
-p <parallelism> | -parallel <parallelism> |
Specifies the number of parallel data connections that
should be used.
This is one of the most commonly used options. |
-bs <block size> | -block-size <block size> |
Specifies the size (in bytes) of the buffer to be used
by the underlying transfer methods. |
| Security Related Options |
-s <subject> | -subject <subject> |
Specifies a subject to match with both the source and
destination servers. |
-ss <subject> | -source-subject <subject> |
Specifies a subject to match with the source server. |
-ds <subject> | -dest-subject <subject> |
Specifies a subject to match with the destination server. |
-nodcau | -no-data-channel-authentication |
Turns off data channel authentication for FTP transfers
(the default is to authenticate the data channel).
We do not recommend this option as it is a security
risk. |
-dcsafe | -data-channel-safe |
Sets data channel protection mode to SAFE.
Otherwise known as integrity or checksumming.
Guarantees that the data channel has not been altered, though a malicious
party may have observed the data.
Rarely used as there is a substantial performance penalty. |
-dcpriv | -data-channel-private |
Sets data channel protection mode to PRIVATE.
The data channel is encrypted and checksummed.
Guarantees that the data channel has not been altered and, if observed,
it won't be understandable.
VERY rarely used due to the VERY substantial performance penalty. |
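The -tcp-bs entry above recommends using the bandwidth-delay product as the buffer size. As a rough illustration, that value can be computed from the link bandwidth and the round-trip time. The helper name and the example figures (a 1 Gbit/s path with a 50 ms round trip, and 4 parallel streams) are assumptions chosen for the sketch, not recommendations.

```python
def bandwidth_delay_product(bandwidth_bits_per_sec, round_trip_ms):
    """Return the bandwidth-delay product in bytes.

    bits/s * ms / 1000 gives bits in flight; dividing by 8 gives bytes.
    Integer arithmetic is used so the result is exact.
    """
    return bandwidth_bits_per_sec * round_trip_ms // 8000


# Example figures only: 1 Gbit/s bottleneck, 50 ms round-trip time.
size = bandwidth_delay_product(1_000_000_000, 50)
print(size)  # 6250000

# Switches as they would be passed to globus-url-copy; "-p 4" is an
# arbitrary illustration of the parallelism option.
switches = ["-tcp-bs", str(size), "-p", "4"]
print(" ".join(switches))
```

The round-trip time can be estimated with ping between the two hosts; the operating system may also cap the buffer size it will actually grant.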
Notes about globus-url-copy
- A globus-url-copy using the gsiftp protocol
with no options (using all the defaults) will do a binary, stream mode (which
implies no parallelism) transfer, with whatever the host default TCP buffer
size is, an encrypted and checksummed
control channel, and an authenticated data channel.
- GridFTP (as well as normal FTP) defines multiple wire protocols, or MODES,
for the data channel.
Most normal FTP servers only implement stream mode, i.e. the bytes
flow in order over a single TCP connection. GridFTP defaults to this
mode so that it is compatible with normal FTP servers.
However, GridFTP has another MODE, called Extended Block Mode, or MODE
E. This mode sends the data over the data channel in blocks. Each
block consists of 8 bits of flags, a 64 bit integer indicating the offset
from the start of the transfer, and a 64 bit integer indicating the length
of the block in bytes, followed by a payload of that many bytes. Because
the offset and length are provided, out-of-order arrival is acceptable, i.e.,
the 10th block could arrive before the 9th because you know explicitly where
it belongs. This allows us to use multiple TCP channels. If you
use the -p | -parallel option, globus-url-copy automatically puts the
servers into MODE E.
Note: Putting -p 1 is not the same as no -p at
all. Both will use a single stream, but the default will use stream
mode and -p 1 will use MODE E.
- For more information on TCP buffer sizes and related information, try <here>.
- If you run a GridFTP server by hand, you will need to
explicitly specify the subject name to expect. You can use the
-ss flag
to set the sourceURL subject, and -ds to set the destURL subject. If
you use -s alone, it will set both to be the same. You
can see an example of this usage under the Verification section of this guide. Please
note: This is the unusual case of using this
client. Most times you only need to specify both URLs.
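The MODE E description above explains why blocks may arrive in any order. The following minimal sketch illustrates the idea with a header of 8 bits of flags, a 64-bit offset, and a 64-bit length, in the order the text lists them. The byte order and the exact field order on the wire are assumptions here; the GridFTP protocol specification is the authority for the real layout, and the function names are hypothetical.

```python
import struct

# Header fields as listed in the text: flags (8 bits), offset (64 bits),
# length (64 bits). Big-endian byte order is an assumption of this sketch.
HEADER = struct.Struct(">BQQ")


def pack_block(offset, payload, flags=0):
    """Prefix a payload with its header."""
    return HEADER.pack(flags, offset, len(payload)) + payload


def reassemble(blocks, total_size):
    """Place each block at its stated offset, whatever order it arrived in."""
    out = bytearray(total_size)
    for block in blocks:
        _flags, offset, length = HEADER.unpack_from(block)
        payload = block[HEADER.size:HEADER.size + length]
        out[offset:offset + length] = payload
    return bytes(out)


data = b"0123456789"
# The second half arrives first, as it might over parallel connections.
arrived = [pack_block(5, data[5:]), pack_block(0, data[:5])]
print(reassemble(arrived, len(data)) == data)  # True
```

Because every block carries its own offset, the receiver never has to know which TCP connection delivered it.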
Limitations
There are no limitations for globus-url-copy in GT 3.9.5.
Interactive clients for GridFTP
The Globus Project does not provide an interactive client
for GridFTP. Any normal FTP client will work with a GridFTP server, but
it cannot take advantage of the advanced features of GridFTP. The interactive
clients listed below take advantage of the advanced features of GridFTP.
There is no endorsement implied by their presence here. We make no
assertion as to the quality or appropriateness of these tools; we simply provide
this list for your convenience. We will not answer questions,
accept bugs, or in any way, shape, or form be responsible for these tools, although
they should have mechanisms of their own for such things.
UberFTP was developed at the NCSA under the auspices of NMI
and TeraGrid. It is available through NMI (a convenient place to get
Globus and other tools as well), or directly from NCSA:
Overview of Graphical User Interface
Globus does not provide any interactive client for GridFTP, either GUI or text
based. However, NCSA, as part of their TeraGrid activity, produces a text-based
interactive client called UberFTP, which you may want to check out. See Interactive
clients for more information.
Semantics and syntax of domain-specific interface
Interface introduction
The Globus implementation of the GridFTP server draws on:
- three IETF RFCs:
- RFC 959
- RFC 2228
- RFC 2389
- an IETF Draft: MLST-16
- the GridFTP protocol
specification, which is a Global Grid Forum (GGF) Standard: GFD.020
Syntax of the interface
The command line tools and the client library completely hide the details of the protocol from the user and developer. Unless you choose to use the control library, it is not necessary to have a detailed knowledge of the protocol.
Configuration interface
GridFTP server configuration overview
Note: Command line options and configuration file options
may both be used but the command line overrides the config
file.
The configuration file is read from the following locations,
in the given order. Only the first found will be loaded.
- Path specified with the
-c <configfile> command line option.
- $GLOBUS_LOCATION/etc/gridftp.conf
- /etc/grid-security/gridftp.conf
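The lookup order above can be sketched as follows. This is an illustration of "first found wins", not the server's actual code: the function name find_config_file and its parameters are hypothetical, and whether the server falls through to the later locations when a -c path does not exist is an assumption of the sketch.

```python
import os


def find_config_file(cli_path=None, environ=None, exists=os.path.isfile):
    """Return the first configuration file found, or None.

    Candidates are tried in the documented order: the -c path, then
    $GLOBUS_LOCATION/etc/gridftp.conf, then the system-wide file.
    """
    environ = os.environ if environ is None else environ
    candidates = []
    if cli_path:
        candidates.append(cli_path)
    globus_location = environ.get("GLOBUS_LOCATION")
    if globus_location:
        candidates.append(globus_location + "/etc/gridftp.conf")
    candidates.append("/etc/grid-security/gridftp.conf")
    for path in candidates:
        if exists(path):
            return path  # only the first file found is loaded
    return None


print(find_config_file())
```

The environ and exists parameters exist only so the search can be exercised without touching the real file system.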
Options are allowed one per line, with the format:
<option> <value>
If the value contains spaces, it should be enclosed in double quotes (").
Flags or boolean options should only have a value of 0 or 1.
Blank lines and lines beginning with # are ignored.
For example:
port 5000
allow_anonymous 1
anonymous_user bob
banner "Welcome!"
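The file format described above is simple enough to read with a few lines of Python, which can be handy for checking a configuration before deploying it. This is a minimal sketch under the stated rules only (one option per line, double quotes around values with spaces, blank lines and # lines ignored); the function name is hypothetical, values are kept as strings, and the server's own parser may differ in details such as escape handling.

```python
import shlex


def parse_gridftp_conf(text):
    """Parse '<option> <value>' lines into a dict of strings."""
    options = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines and lines beginning with # are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        # shlex honours double quotes, so "Welcome to the grid" stays whole.
        tokens = shlex.split(stripped)
        options[tokens[0]] = tokens[1] if len(tokens) > 1 else ""
    return options


sample = '''
# sample configuration
port 5000
allow_anonymous 1
anonymous_user bob
banner "Welcome!"
'''
print(parse_gridftp_conf(sample))
```

A caller that needs numbers or booleans can convert the string values itself, for example int(options["port"]).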
GridFTP server configuration options
The table below lists config file options, associated command line options (if available) and descriptions. Note that any boolean option can be negated on the command line by preceding the specified option with '-no-' or '-n'. For example: -no-cas or -nf.
| Informational Options |
|
|
Show usage information and exit. Default value: FALSE |
longhelp <0|1> |
-hh
-longhelp |
|
Show more usage information and exit. Default value: FALSE |
version <0|1> |
-v
-version |
|
Show version information for the server and exit. Default value: FALSE |
versions <0|1> |
-V
-versions |
|
Show version information for all loaded globus libraries and exit. Default value: FALSE |
| Modes of Operation |
|
|
Run under an inetd service. Default value: FALSE |
|
|
Run as a daemon. All connections will fork off a new process and setuid if allowed. Default value: TRUE |
|
|
Run as a background daemon detached from any controlling terminals. Default value: FALSE |
exec <string> |
-exec <string> |
|
For statically compiled binaries, or binaries installed outside the standard GLOBUS_LOCATION locations, specify the full path of the server binary here. Only needed when run in daemon mode. Default value: not set |
|
|
Change directory when the server starts. This will change directory to the dir specified by the chdir_to option. Default value: TRUE |
chdir_to <string> |
-chdir-to <string> |
|
Directory to chdir to after starting. Will use / if not set. Default value: not set |
|
|
Server will fork for each new connection. Disabling this option is only recommended when debugging. Default value: TRUE |
|
|
Exit after a single connection. Default value: FALSE |
| Authentication, Authorization, and Security Options |
auth_level <number> |
-auth-level <number> |
|
0 = Disables all authorization checks. 1 = Authorize identity only. 2 = Authorize all file/resource accesses. If not set uses level 2 for front ends and level 1 for data nodes. Default value: not set |
allow_from <string> |
-allow-from <string> |
|
Only allow connections from these source IP addresses. Specify a comma-separated list of IP address fragments. A match is any IP address that starts with the specified fragment. Example: '192.168.1.' will match and allow a connection from 192.168.1.45. Note that if this option is used, any address not specifically allowed will be denied. Default value: not set |
deny_from <string> |
-deny-from <string> |
|
Deny connections from these source IP addresses. Specify a comma-separated list of IP address fragments. A match is any IP address that starts with the specified fragment. Example: '192.168.2.' will match and deny a connection from 192.168.2.45. Default value: not set |
|
|
Enable CAS authorization. Default value: TRUE |
secure_ipc <0|1> |
-si
-secure-ipc |
|
Use GSI security on ipc channel. Default value: TRUE |
ipc_auth_mode <string> |
-ia <string>
-ipc-auth-mode <string> |
|
Set GSI authorization mode for the ipc connection. Options are: none, host, self or subject: Default value: host |
allow_anonymous <0|1> |
-aa
-allow-anonymous |
|
Allow cleartext anonymous access. If server is running as root anonymous_user must also be set. Disables ipc security. Default value: FALSE |
anonymous_names_allowed <string> |
-anonymous-names-allowed <string> |
|
Comma-separated list of names to treat as anonymous users when allowing anonymous access. If not set, the default names of 'anonymous' and 'ftp' will be allowed. Use '*' to allow any username. Default value: not set |
anonymous_user <string> |
-anonymous-user <string> |
|
User to setuid to for an anonymous connection. Only applies when running as root. Default value: not set |
anonymous_group <string> |
-anonymous-group <string> |
|
Group to setgid to for an anonymous connection. If unset, the default group of anonymous_user will be used. Default value: not set |
pw_file <string> |
-password-file <string> |
|
Enable cleartext access and authenticate users against this /etc/passwd formatted file. Default value: not set |
connections_max <number> |
-connections-max <number> |
|
Maximum concurrent connections allowed. Only applies when running in daemon mode. Unlimited if not set. Default value: not set |
connections_disabled <0|1> |
-connections-disabled |
|
Disable all new connections. Does not affect ongoing connections. This would have to be set in the configuration file, and the server then sent a SIGHUP so that it reloads that config. Default value: FALSE |
| Logging Options |
log_level <string> |
-d <string>
-log-level <string> |
|
Log level. A comma-separated list of levels from: 'ERROR, WARN, INFO, DUMP, ALL'. Example: error,warn,info. You may also specify a numeric level of 1-255. Default value: ERROR |
log_module <string> |
-log-module <string> |
|
globus_logging module that will be loaded. If not set, logfile options apply. Default value: not set |
log_single <string> |
-l <string>
-logfile <string> |
|
Path of a single file to log all activity to. If neither this option nor log_unique is set, logs will be written to stderr unless the execution mode is detached or inetd, in which case logging will be disabled. Default value: not set |
log_unique <string> |
-L <string>
-logdir <string> |
|
Partial path to which 'gridftp.(pid).log' will be appended to construct the log filename. Example: -L /var/log/gridftp/ will create a separate log ( /var/log/gridftp/gridftp.xxxx.log ) for each process (which is normally each new client session). If neither this option nor log_single is set, logs will be written to stderr unless the execution mode is detached or inetd, in which case logging will be disabled. Default value: not set |
log_transfer <string> |
-Z <string>
-log-transfer <string> |
|
Log netlogger style info for each transfer into this file. Default value: not set |
log_filemode <number> |
-log-filemode <number> |
|
File access permissions of log files. Should be an octal number such as 0644 (the leading 0 is required). Default value: not set |
disable_usage_stats <0|1> |
-disable-usage-stats |
|
Disable transmission of per-transfer usage statistics. See the Usage Statistics section in the online documentation for more information. Default value: FALSE |
usage_stats_target <string> |
-usage-stats-target <string> |
|
Comma-separated list of contact strings for usage statistics listeners. Default value: not set |
| Single and Striped Remote Data Node Options |
remote_nodes <string> |
-r <string>
-remote-nodes <string> |
|
Comma-separated list of remote node contact strings. Default value: not set |
data_node <0|1> |
-dn
-data-node |
|
This server is a backend data node. Default value: FALSE |
stripe_blocksize <number> |
-sbs <number>
-stripe-blocksize <number> |
|
Size in bytes of sequential data that each stripe will transfer. Default value: 1048576 |
stripe_layout <number> |
-sl <number>
-stripe-layout <number> |
|
Stripe layout. 1 = Partitioned, 2 = Blocked. Default value: 2 |
stripe_blocksize_locked <0|1> |
-stripe-blocksize-locked |
|
Do not allow client to override stripe blocksize with the OPTS RETR command. Default value: FALSE |
stripe_layout_locked <0|1> |
-stripe-layout-locked |
|
Do not allow client to override stripe layout with the OPTS RETR command. Default value: FALSE |
| Disk Options |
blocksize <number> |
-bs <number>
-blocksize <number> |
|
Size in bytes of data blocks to read from disk before posting to the network. Default value: 262144 |
sync_writes <0|1> |
-sync-writes |
|
Flush disk writes before sending a restart marker. This attempts to ensure that the range specified in the restart marker has actually been committed to disk. This option will probably impact performance, and may result in different behavior on different storage systems. See the manpage for sync() for more information. Default value: FALSE |
| Network Options |
port <number> |
-p <number>
-port <number> |
|
Port on which a frontend will listen for client control channel connections, or on which a data node will listen for connections from a frontend. If not set, a random port will be chosen and printed via the logging mechanism. Default value: not set |
control_interface <string> |
-control-interface <string> |
|
Hostname or IP address of the interface to listen for control connections on. If not set will listen on all interfaces. Default value: not set |
data_interface <string> |
-data-interface <string> |
|
Hostname or IP address of the interface to use for data connections. If not set will use the current control interface. Default value: not set |
ipc_interface <string> |
-ipc-interface <string> |
|
Hostname or IP address of the interface to use for ipc connections. If not set will listen on all interfaces. Default value: not set |
hostname <string> |
-hostname <string> |
|
Effectively sets the above control_interface, data_interface and ipc_interface options. Default value: not set |
ipc_port <number> |
-ipc-port <number> |
|
Port on which the frontend will listen for data node connections. Default value: not set |
| Timeouts |
control_preauth_timeout <number> |
-control-preauth-timeout <number> |
|
Time in seconds to allow a client to remain connected to the control channel without activity before authenticating. Default value: 120 |
control_idle_timeout <number> |
-control-idle-timeout <number> |
|
Time in seconds to allow a client to remain connected to the control channel without activity. Default value: 600 |
ipc_idle_timeout <number> |
-ipc-idle-timeout <number> |
|
Idle time in seconds before an unused ipc connection will close. Default value: 600 |
ipc_connect_timeout <number> |
-ipc-connect-timeout <number> |
|
Time in seconds before cancelling an attempted ipc connection. Default value: 60 |
| User Messages |
banner <string> |
-banner <string> |
|
Message to display to the client before authentication. Default value: not set |
banner_file <string> |
-banner-file <string> |
|
File to read banner message from. Default value: not set |
banner_terse <0|1> |
-banner-terse |
|
When this is set, the minimum allowed banner message will be displayed to unauthenticated clients. Default value: FALSE |
login_msg <string> |
-login-msg <string> |
|
Message to display to the client after authentication. Default value: not set |
login_msg_file <string> |
-login-msg-file <string> |
|
File to read login message from. Default value: not set |
| Module Options |
load_dsi_module <string> |
-dsi <string> |
|
Data Storage Interface module to load. file and remote modules are defined by the server. Defaults to file unless the 'remote' option is specified, in which case the remote DSI is loaded. Default value: file |
allowed_modules <string> |
-allowed-modules <string> |
|
Comma-separated list of ERET/ESTO modules to allow, and optionally specify an alias for. Example: module1,alias2:module2,module3 (module2 will be loaded when a client asks for alias2). Default value: not set |
| Other |
configfile <string> |
-c <string> |
|
Path to configuration file that should be loaded. Otherwise will attempt to load $GLOBUS_LOCATION/etc/gridftp.conf and /etc/grid-security/gridftp.conf. Default value: not set |
use_home_dirs <0|1> |
-use-home-dirs |
|
Set the startup directory to the authenticated user's home directory. Default value: TRUE |
|
|
Sets options that make server easier to debug. Not recommended for production servers. Default value: FALSE |
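The allow_from and deny_from entries in the table above describe matching by address prefix. The sketch below illustrates that rule so a fragment list can be tested before it goes into a configuration file. The function names are hypothetical, and how the server combines the two options when both are set is not stated in the table; this sketch assumes a deny match always rejects.

```python
def matches_fragment_list(address, fragments):
    """True if address starts with any fragment in a comma-separated list."""
    return any(
        address.startswith(fragment.strip())
        for fragment in fragments.split(",")
        if fragment.strip()
    )


def connection_permitted(address, allow_from=None, deny_from=None):
    """Apply the documented prefix rules to one source address."""
    # Assumption: a deny_from match rejects even if allow_from also matches.
    if deny_from and matches_fragment_list(address, deny_from):
        return False
    # When allow_from is used, anything not specifically allowed is denied.
    if allow_from:
        return matches_fragment_list(address, allow_from)
    return True


print(connection_permitted("192.168.1.45", allow_from="192.168.1."))  # True
print(connection_permitted("192.168.2.45", deny_from="192.168.2."))   # False
```

Because matching is a plain prefix test, a fragment written without the trailing dot, such as '192.168.1', would also match 192.168.10.7.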
Configuring the GridFTP server to run under xinetd/inetd
Note: The service name used (gsiftp in this case) should
be defined in /etc/services with the desired port.
Here is a sample gridftp server xinetd config entry:
service gsiftp
{
    instances       = 100
    socket_type     = stream
    wait            = no
    user            = root
    env             += GLOBUS_LOCATION=(globus_location)
    env             += LD_LIBRARY_PATH=(globus_location)/lib
    server          = (globus_location)/sbin/globus-gridftp-server
    server_args     = -i
    log_on_success  += DURATION
    nice            = 10
    disable         = no
}
Here is a sample gridftp server inetd config entry: (read as a single line)
gsiftp stream tcp nowait root /usr/bin/env env \
GLOBUS_LOCATION=(globus_location) \
LD_LIBRARY_PATH=(globus_location)/lib \
(globus_location)/sbin/globus-gridftp-server -i
Environment variable interface
The GridFTP server or client libraries do not read any environment variable
directly, but the security and networking related variables described below
may be useful.