The Lonely API: You Can Help!

I remain surprised that more grid developers and architects who rely on GridFTP for moving bits around have not tightly integrated data movement into their applications and infrastructure using the GridFTP Client API/library. A lot of folks spend a great deal of time wrapping scripts around the default GridFTP client globus-url-copy (guc) and begging the GridFTP developers to add features and functionality to guc.
Having a strong default client like guc is great, but it cannot be the perfect tool for every scenario. I am convinced that many workflows on the grid would benefit from tighter integration between applications and data placement. One slick way to get there is to write custom GridFTP clients, or to add the functionality directly to existing applications and workflows.
I think grid developers and architects don't leverage the GridFTP client API/library because they think it is difficult. It is also hard to find comprehensive but straightforward examples (a GridFTP client API cookbook sure would be nice...). The code for the guc client is too hard for me to really get my head around because it is so entangled with some of the older Globus legacy code that isn't used much anymore (think GASS).
So below is a simple but useful client in C. This is all you need if you want to move a file from one GridFTP server to another (a third-party transfer) with some parallel data streams and some tuned TCP buffers.
The structure is simple. You need a function that the library will use as a "callback" when the transfer is complete. It must have the signature shown in the code below: a user argument, a pointer to the handle, and an error object.
You need to create a "handle" and probably set some attributes on it (I like to set the cache_all attribute so that when moving multiple files from site to site the control connection stays open between files).
You need to create an "operation" attribute and decorate it with the details for data stream parallelism and TCP buffer sizes.
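If you are wondering what TCP buffer size to start from, the usual yardstick is the bandwidth-delay product of the path: link speed multiplied by round-trip time. Here is a minimal sketch of that arithmetic in plain C. The function name bdp_bytes and its parameters are my own invention for illustration and are not part of any Globus library.

```c
/* Hypothetical helper, not part of the Globus API.
   Bandwidth-delay product in bytes:
   (bits per second * round-trip time in seconds) / 8 bits per byte.
   Integer arithmetic keeps the result exact for typical inputs. */
unsigned long long bdp_bytes(unsigned long long bits_per_sec,
                             unsigned long long rtt_ms)
{
    unsigned long long bits_in_flight = bits_per_sec * rtt_ms / 1000ULL;
    return bits_in_flight / 8ULL;
}
```

For example, a 100 Mbit/s path with an 80 ms round trip gives bdp_bytes(100000000ULL, 80ULL), which is 1000000 bytes, close to the 1 MB buffer used in the client in this post. How the buffer interacts with several parallel streams depends on the servers and the network, so treat the number as a starting point for experiments rather than a final answer.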
Then there is some boilerplate to initialize the library, wait for the callback to be called, and let the transfer finish, but it is all pretty straightforward.
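The "wait for the callback" part is the classic condition-variable pattern: a flag, a mutex, and a condition. If you have not seen it before, here is a sketch of the same idea using plain POSIX threads instead of the Globus wrappers, so you can try it without a Globus installation. The names fake_transfer, run_and_wait, and the xfer_ variables are made up for this illustration; the worker thread merely stands in for the library calling your callback.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared state: a flag protected by a mutex, plus a condition to wake the waiter. */
static pthread_mutex_t xfer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t xfer_cond = PTHREAD_COND_INITIALIZER;
static bool xfer_done = false;

/* Stand-in for the completion callback: runs on another thread,
   sets the flag under the lock, and signals the waiter. */
static void *fake_transfer(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&xfer_lock);
    xfer_done = true;
    pthread_cond_signal(&xfer_cond);
    pthread_mutex_unlock(&xfer_lock);
    return NULL;
}

/* Start the "transfer", then block until the flag is set.
   Returns 0 on success, -1 if the worker thread could not be started. */
int run_and_wait(void)
{
    pthread_t worker;

    pthread_mutex_lock(&xfer_lock);
    xfer_done = false;
    pthread_mutex_unlock(&xfer_lock);

    if (pthread_create(&worker, NULL, fake_transfer, NULL) != 0) {
        return -1;
    }

    pthread_mutex_lock(&xfer_lock);
    while (!xfer_done) {   /* the loop guards against spurious wakeups */
        pthread_cond_wait(&xfer_cond, &xfer_lock);
    }
    pthread_mutex_unlock(&xfer_lock);

    pthread_join(worker, NULL);
    return 0;
}
```

The important detail is the while loop around the wait: the flag is checked while holding the lock, so it does not matter whether the worker finishes before or after the main thread starts waiting.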
So here is a simple but useful client that you could easily extend for yourself:
#include <stdio.h>
#include "globus_ftp_client.h"

static globus_mutex_t lock;
static globus_cond_t cond;
static globus_bool_t done;
static globus_bool_t error = GLOBUS_FALSE;

/* this function will be called by the library
   when the transfer completes */
static void fileTransferCompleteCallback(
    void *user_arg,
    globus_ftp_client_handle_t *handle,
    globus_object_t *err)
{
    char *tmpstr;
    if(err){
        /* the library allocates this string; free it after printing */
        tmpstr = globus_object_printable_to_string(err);
        fprintf(stderr,
            "Transfer completed with error: %s\n",
            tmpstr);
        globus_libc_free(tmpstr);
        error = GLOBUS_TRUE;
    }
    globus_mutex_lock(&lock);
    done = GLOBUS_TRUE;
    globus_cond_signal(&cond);
    globus_mutex_unlock(&lock);
}

int main(int argc, char **argv)
{
    globus_ftp_client_handle_t handle;
    globus_ftp_client_handleattr_t handle_attr;
    globus_ftp_client_operationattr_t attr;
    globus_result_t result;
    globus_ftp_control_parallelism_t parallelism;
    globus_ftp_control_tcpbuffer_t tcpbuffer;
    /* source and destination URLs for the 3rd
       party transfer */
    char *src = "gsiftp://server1.com/file1";
    char *dst = "gsiftp://server2.com/file2";

    globus_module_activate(GLOBUS_FTP_CLIENT_MODULE);
    globus_ftp_client_handleattr_init(&handle_attr);
    globus_ftp_client_operationattr_init(&attr);
    globus_mutex_init(&lock, GLOBUS_NULL);
    globus_cond_init(&cond, GLOBUS_NULL);
    /* cache control channel connections for efficiency */
    globus_ftp_client_handleattr_set_cache_all(
        &handle_attr, GLOBUS_TRUE);
    globus_ftp_client_handle_init(&handle, &handle_attr);
    globus_ftp_client_operationattr_set_mode(
        &attr, GLOBUS_FTP_CONTROL_MODE_EXTENDED_BLOCK);
    /* use 4 parallel data streams for high throughput */
    parallelism.mode = GLOBUS_FTP_CONTROL_PARALLELISM_FIXED;
    parallelism.fixed.size = 4;
    globus_ftp_client_operationattr_set_parallelism(
        &attr, &parallelism);
    /* use large TCP windows for WANs with latency */
    tcpbuffer.mode = GLOBUS_FTP_CONTROL_TCPBUFFER_FIXED;
    tcpbuffer.fixed.size = 1024 * 1024 * 1;
    globus_ftp_client_operationattr_set_tcp_buffer(
        &attr, &tcpbuffer);
    done = GLOBUS_FALSE;
    /* tell the servers to transfer the file */
    result = globus_ftp_client_third_party_transfer(
        &handle,
        src,
        &attr,
        dst,
        &attr,
        GLOBUS_NULL,
        fileTransferCompleteCallback,
        GLOBUS_NULL);
    if(result != GLOBUS_SUCCESS){
        /* the transfer never started, so the callback will not fire */
        fprintf(stderr, "Could not start the transfer\n");
        error = GLOBUS_TRUE;
        done = GLOBUS_TRUE;
    }
    /* wait until notified the transfer is complete */
    globus_mutex_lock(&lock);
    while(!done){
        globus_cond_wait(&cond, &lock);
    }
    globus_mutex_unlock(&lock);
    /* shut it all down and clean up nicely */
    globus_ftp_client_handle_destroy(&handle);
    globus_ftp_client_handleattr_destroy(&handle_attr);
    globus_ftp_client_operationattr_destroy(&attr);
    globus_module_deactivate_all();
    return error ? 1 : 0;
}
Before compiling this, you need to generate a file for your Makefile to include. Run the following, changing the flavor if your Globus installation was built with a different one:
export GLOBUS_LOCATION=/path/to/globus/installation
source $GLOBUS_LOCATION/etc/globus-user-env.sh
globus-makefile-header --flavor=gcc32pthr globus_ftp_client globus_ftp_control > makefile.include
And finally here is a simple Makefile for compiling your custom client:
include makefile.include
myClient: myClient.o
	$(GLOBUS_CC) -o myClient myClient.o \
	$(GLOBUS_LDFLAGS) $(GLOBUS_LIBS) $(GLOBUS_PKG_LIBS)

myClient.o: myClient.c
	$(GLOBUS_CC) $(GLOBUS_CFLAGS) $(GLOBUS_INCLUDES) \
	-c -o myClient.o myClient.c
That's all it takes to create a custom, high-performance GridFTP client: fewer than 100 lines of C code (even fewer in Python or Java).
So if you have to move bits around your grid and you ever thought 'it sure would be nice if globus-url-copy worked exactly how I need it to work' then try creating your own customized client. It's easier than you might have thought.
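A natural first customization is to take the source and destination URLs from the command line instead of hard-coding them. Below is a small sketch of a sanity check you might run on each argument before handing it to the library. The function name looks_like_gsiftp_url is hypothetical, and it only accepts the gsiftp:// scheme used in this post; that restriction is my assumption for the example, not a statement about which schemes the library accepts.

```c
#include <string.h>

/* Hypothetical helper, not part of the Globus API.
   Returns 1 if url starts with "gsiftp://" and has at least one
   character after the scheme, 0 otherwise (including for NULL). */
int looks_like_gsiftp_url(const char *url)
{
    static const char prefix[] = "gsiftp://";
    size_t n = sizeof(prefix) - 1;   /* length without the trailing NUL */

    if (url == NULL) {
        return 0;
    }
    return strncmp(url, prefix, n) == 0 && url[n] != '\0';
}
```

In main you would check that argc is 3, run both argv[1] and argv[2] through the helper, and print a usage message instead of starting the transfer if either check fails.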

Thanks to
This entry was written by guest blogger Jeff Squyres from 





Checkpointing is one of the most useful features that