
October 2007

October 30, 2007

How to Decipher Grid Engine Statuses – Part I


In all likelihood, most Grid Engine (GE) end users and administrators have at some point invoked the qstat command and found themselves wondering what some of the resulting queue and job status letters mean. While some of those letters are pretty intuitive (e.g., ‘E’ stands for error), others are not entirely trivial to decipher. Unfortunately, it does not seem to be very easy to find an explanation for these statuses. One usually has to resort to digging through the qstat man pages or through the various GE software manuals that one can find on the web. So, I’ve compiled below information about possible queue statuses:

• a (alarm) – At least one of the load thresholds defined in the load_thresholds list of the queue configuration is currently exceeded. This state prevents GE from scheduling further jobs to that queue. You can find the reason for the alarm state using the qstat command with the “-explain a” option.

• A (Alarm) – At least one of the suspend thresholds of the queue is currently exceeded. This state causes jobs running in that queue to be successively suspended until no threshold is violated. You can see the reason for this state using the qstat command with the “-explain A” option.

• c (configuration ambiguous) – The queue instance configuration (specified in GE configuration files) is ambiguous. The state resolves when the configuration becomes unambiguous again. This state prevents you from scheduling further jobs to that queue instance. You can find detailed reasons why a queue instance entered this state in the sge_qmaster messages file, or by using the qstat command with the “-explain c” option. For queue instances in this state, the cluster queue's default settings are used for the ambiguous attribute.

• C (Calendar suspended) – The queue has been suspended automatically using the GE calendar facility.

• d (disabled) – Queues are disabled and released using the qmod command. Disabling a queue prevents new jobs from being scheduled for execution in that queue, but it does not affect jobs that are already running there.

• D (Disabled) – The queue has been disabled automatically using the GE calendar facility.

• E (Error) – The queue is in the error state. You can find the reason for this state using the qstat command with the “-explain E” option. Check the error log of the execution daemon (sge_execd) on that host for information on how to resolve the problem, and clear the queue state afterwards using the qmod command with the -cq option.

• o (orphaned) – The current cluster queue's configuration and host group configuration no longer need this queue instance. The queue instance is kept because unfinished jobs are still associated with it. The orphaned state prevents you from scheduling further jobs to that queue instance. It disappears from qstat output when these jobs finish. To help resolve an orphaned queue instance associated with a job, you can delete the job with the qdel command. You can revive an orphaned queue instance by changing the cluster queue configuration so that the configuration covers that queue instance.

• s (suspended) – Queues are suspended and un-suspended using the qmod command. Suspending a queue suspends all jobs executing in that queue.

• S (Subordinate) – The queue has been suspended due to subordination to another queue. When a queue is suspended, regardless of the cause, all jobs executing in that queue are suspended too.

• u (unknown) – The corresponding GE execution daemon (sge_execd) cannot be contacted.
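
As a quick reference, the letters above can be turned into a small lookup helper. The following is only a sketch: decode_queue_state is a made-up function name (not a Grid Engine command), and it assumes that a queue's state is shown as a string of concatenated letters such as "au".

```shell
#!/bin/sh
# Map each queue state letter described above to a short description.
# decode_queue_state is a hypothetical helper, not part of Grid Engine.
decode_queue_state() {
  states=$1
  result=""
  while [ -n "$states" ]; do
    # Split off the first letter of the remaining state string.
    rest=${states#?}
    letter=${states%"$rest"}
    states=$rest
    case $letter in
      a) name="alarm (load threshold)" ;;
      A) name="Alarm (suspend threshold)" ;;
      c) name="configuration ambiguous" ;;
      C) name="Calendar suspended" ;;
      d) name="disabled" ;;
      D) name="Disabled (calendar)" ;;
      E) name="Error" ;;
      o) name="orphaned" ;;
      s) name="suspended" ;;
      S) name="Subordinate" ;;
      u) name="unknown" ;;
      *) name="unrecognized state '$letter'" ;;
    esac
    # Join the descriptions with ", ".
    result="${result:+$result, }$name"
  done
  echo "$result"
}
```

For example, decode_queue_state au prints "alarm (load threshold), unknown".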

I hope that those who are new to Grid Engine find the above descriptions useful. In Part II of this article I will cover possible job statuses.

October 29, 2007

Avoiding YAGMF (Yet Another Grid-Mapfile)


The grid infrastructure I work with daily is deploying more and more services based on the Globus Toolkit, and in particular on the Globus Toolkit Java Web services. Each of these services requires users to be authorized to invoke the service operations.

Most often the authorization is managed using our old Globus friend the static grid-mapfile. These grid-mapfiles work fine during development but as we scale out during production we hear the moans from the site administrators of "not another grid-mapfile!"

You can easily Google and find an entire zoo of projects aimed at helping production grids manage authorization for services. Each community seems to have its own effort and we can only hope at some point for a clear winner (I didn't say standard...and yes interoperability is nice but I still would like just a few "best of breed" tools that interoperate. I am naive in that way.)

What if, however, you are a grid architect or developer and you need to tie authorization to grid services into an existing authorization infrastructure? Does the solution necessarily have to involve pulling out authorization details from the legacy infrastructure, creating grid-mapfiles, and then having to manage all those grid-mapfiles?

No. A better approach might be to write your own authorization plugin for your Globus Toolkit Java Web services. It is surprisingly simple to do. Your approach might be as simple as writing one or two Java classes representing a Policy Decision Point (PDP) and/or a Policy Information Point (PIP).

Tim Freeman and Rachana Ananthakrishnan have written a great tutorial on how to do just that. If you are wondering how you can tie together Globus grid services and a legacy authorization infrastructure, do give it a read before you add one more grid-mapfile to your grid fabric.

October 26, 2007

Dream Big, Dream Grid


Last time we talked about two similar yet different benefits of using grids. Today we will expand on that list with other benefits you might not have yet thought about. Just to be clear, we’re purely talking about technical benefits here; the business benefits are left for a whole other column.

Let’s first review what we found last time. The obvious benefits revolve around speedup of your parallel applications and higher throughput of your batch jobs. A typical example of the former is a crash simulation with PAM-CRASH and MPI; a typical example of the latter is virtual high-throughput screening with applications such as LigandFit from Accelrys, where many potential drug targets are screened against a single protein target. But there are other, less obvious use-cases for grid that can benefit you.

Imagine running a simulation that has many tweakable parameters that you’ve always set to a pre-set value. When you now move your computations to a grid, you might not need to get your results back any faster, so you could now opt to increase the accuracy of your computation by running the same simulation with different parameter sweeps on different nodes. Further expansion of your grid will suddenly increase the validity and accuracy of your results, rather than decrease runtime. An example of such computation can be found in the Oil and Gas industry where a more refined and accurate computational model of an oil-field can prevent costly dry holes.

One could assert that Monte Carlo simulations are in fact also "accuracy-increasing" applications of grid, but there are two subtle differences. First, Monte Carlo simulations usually run on a much more massive scale, with thousands of very short simulations, whereas parameter sweep modeling typically uses larger models over a limited number (fewer than a hundred) of iterations. Second, typical Monte Carlo simulations only end once a certain pre-set resolution has been achieved, regardless of the number of grid nodes at your disposal. As such, it is better to categorize Monte Carlo simulations in the "throughput" category.

Once you understand these three basic benefits (speed-up, throughput and accuracy), there’s really no limit to what your imagination can come up with in terms of new applications of grid. Take the LigandFit example that I mentioned earlier. United Devices' recently retired grid.org looked at the throughput use-case and took it to the extreme by simply taking a protein crucial to the internal workings of cancer cells and running every single possible potential drug target in the library against that protein. It took a leap of imagination to dream up six years of running billions of drug targets against multiple proteins.

The most rewarding moment during a consulting engagement is when I see that users "get" the basic use-cases and start dreaming big. Can you dream big?  What can the grid do for you?

October 25, 2007

Why Model Scheduling Policies?


Modeling is a very effective means of accurately measuring the advantages of one scheduling policy over another in specific environments. High-level abstraction models can be developed rapidly in order to observe efficiency benefits. In this type of environment, the most meaningful measurements are the queue wait times of the jobs that have been submitted to the system, as well as expansion factors, which are partially derived from queue wait times. Although utilization is another measurement to observe, in a fully loaded system high utilization is already a known fact, and squeezing efficiency out of the system is more important; this is done by reducing queue wait times.

A prerequisite to accurate modeling is retrieving accurate job accounting data for the past year or more. This data is valuable for a number of reasons, but the following two are most important. First, the modeler does not have to develop a synthetic distribution dataset approximating what is thought to be an accurate job data flow. Second, the data is accurate as to job submission and run times, priority, and resources utilized. Expansion factor data can also be derived from part of this accounting data. All jobs are bounded in this environment, which eliminates any reservation slipping. In this modeling environment, you are attempting to improve on numbers that have already been produced in order to implement more efficient policies for the future.
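
To make the two measurements concrete, here is a rough sketch of deriving them from accounting records. Both the record layout (job id, submit time, start time, end time, all in seconds) and the function name expansion_factors are assumptions made for illustration, as is the definition used here, which is one commonly used form: expansion factor = (queue wait time + run time) / run time.

```shell
#!/bin/sh
# expansion_factors reads records of the (hypothetical) form
#   job_id submit_time start_time end_time
# from standard input, with all times in seconds, and prints the queue
# wait time and the expansion factor of each job.
expansion_factors() {
  awk '{
    qwait   = $3 - $2    # time spent waiting in the queue
    runtime = $4 - $3    # time spent running
    if (runtime > 0)     # skip records with no usable run time
      printf "%s wait=%d xfactor=%.2f\n", $1, qwait, (qwait + runtime) / runtime
  }'
}
```

A job that waits as long as it runs gets an expansion factor of 2.00; a policy that shortens queue wait times pushes the value toward 1.00.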

Future segment: Developing an architecture for modeling a scheduling process that utilizes a priority queue policy with normal backfill algorithms.

October 22, 2007

No CPU Left Behind


For some time now, I've been really interested in the potential applications of grid computing in higher education and, possibly, in secondary education. So, I was really intrigued when I read about Google and IBM's computing cloud for students. Just looking at the headline, my first impression was that students anywhere would be able to have their own computing cloud to use as a playground for learning and experimentation. As it turns out, Google and IBM's computing cloud will be initially used by only five universities, with the goal of giving students a platform in which to learn about parallel programming and Internet-scale applications. Although still a very cool project, I thought this would be a good opportunity to share some ideas of how grid computing could end up benefiting education. Like fellow gridguru Tim Freeman, I'm a part of the Globus Virtual Workspaces project, so my ideas are biased towards how grid computing and workspaces could benefit education.

I have talked with many Computer Science and Engineering lecturers and professors at small colleges and universities who cannot teach certain courses for lack of computing resources. For example, while teaching an introductory programming course requires minimal computing resources (such as a computer lab), teaching a course on parallel programming or distributed systems may require more expensive resources. To get students to practice parallel programming in a somewhat realistic setting, you would like them to have access to a properly configured and maintained cluster. If, furthermore, you wanted to teach students how to set up a cluster, you would need a couple of clusters (ideally, one cluster per student) that the students could have unfettered access to.

There are two main issues with the above scenario. First of all, clusters aren't generally cheap, and some institutions can't afford one. Of course, you can easily build a cluster out of commodity hardware, but you also need someone to actually set it up and jiggle the handle whenever something goes awry. In one specific case, a department built a cluster with off-the-shelf PCs, and used it successfully... until the grad student charged with keeping the cluster running graduated. Apparently, that cluster has been sitting idle in a room for years now. Second, even if the institution can afford a cluster and a sysadmin, no sysadmin in his right mind is going to give root access to that cluster to undergrads, especially if that cluster is also used by researchers.

Enter virtual workspaces. In a nutshell, a virtual workspace is an execution environment that you can dynamically and securely deploy on the grid with exactly the hardware and software you need. You need a 32-node dual CPU Linux cluster for a couple of hours to teach a parallel programming lab, with a very specific version of libfoobar installed on it? Just request a workspace for it, and that hardware will be allocated somewhere on the grid for you, and the software will be set up thanks to software contextualization, which Tim will discuss in his posts. There's no need for the institution to keep a cluster running 24/7, or even spend any time configuring a cluster (requiring a sysadmin, or burdening the lecturer or a grad student with this task). From a repository of ready-made workspaces, simply choose the one you want (or pay a one-time fee to have someone configure a workspace exactly the way you want it), deploy it on the grid every Monday from 2pm to 4pm, and start teaching.

Unfortunately, we're not quite there yet, but virtual workspaces are being actively researched (yes, right now, even as you read this blog post!). Currently, virtual machines are the most promising vehicle to automagically stand up these custom execution environments on a grid. The Globus Virtual Workspaces Service, which uses the Xen VMM to instantiate workspaces, is still in a Technology Preview phase so, although you can still do a number of very cool things with it, you can't deploy arbitrary workspaces on arbitrary grids... yet. However, we're getting much closer, and in future blog posts I'll explain what progress we're making towards that goal.

When we do get there, I believe that workspaces stand to make really exciting contributions to Computer Science and Engineering education. Not only can they facilitate access to computational resources by underprivileged institutions, they can also enhance existing curriculums by enabling students to gain more practical experience than before (e.g., by giving each student their own cluster). In fact, workspaces will enable the creation of more complex "playgrounds", from virtual clusters to virtual grids, that students can use to learn and experiment.

October 19, 2007

Building Software Against Binary Globus Toolkit Releases


Today I read about GridWay winning the “Best Demo Prize” at the EGEE 2007 Conference in Budapest (Congratulations to the GridWay Team!), and this reminded me about the problem of building applications against the binary Globus Toolkit (GT) releases. Namely, building software like GridWay against the binary GT install usually fails with link errors. The problem is that the .la files in the $GLOBUS_LOCATION/lib directory have hardcoded the original build path for the dependency libraries. This issue has been known for some time (see, e.g. GT bug #174), and it persists in the 4.0.x releases of the Toolkit. The easiest solution is to build and install your GT from sources. However, if this is not an option, one can use a script that modifies the hardcoded paths in the binary GT install (do not worry, the script does not modify binary files :-)):

#!/bin/sh

# fix_paths.sh
# Script for modifying hardcoded library dependency paths in the binary
# Globus Toolkit installation.

# Print usage information.
usage() {
  echo "Usage: $0 [oldPath] [newPath]"
}

if [ $# -ne 2 ]; then
  usage
  exit 1
fi

oldPath=$1
newPath=$2

if [ -z "$GLOBUS_LOCATION" ]; then
  echo "\$GLOBUS_LOCATION is not defined."
  exit 1
fi

echo "Replacing $oldPath by $newPath in various ASCII files."
cd "$GLOBUS_LOCATION" || exit 1
# Try to avoid header files, *.gar and *.jar files, config xml files, etc.
# Note: the loop below assumes that file names contain no whitespace.
fileList=`find . -type f ! -name '*.h' -a ! -name '*.gar' -a ! -name '*.xml' -a ! -name '*.jar' -a ! -name '*LICENSE*'`
cnt=0
for f in $fileList; do
  # Only touch files that file(1) reports as ASCII text.
  if file "$f" | grep ASCII > /dev/null; then
    # '?' is the sed delimiter, so the paths must not contain '?'.
    sed "s?$oldPath?$newPath?g" "$f" > "$f.tmp"
    if cmp -s "$f.tmp" "$f"; then
      # Nothing changed; discard the temporary copy.
      rm -f "$f.tmp"
    else
      echo "Fixing: $f"
      # Copy the contents back instead of using mv, so that the original
      # file keeps its permissions (e.g., the executable bit on scripts).
      cat "$f.tmp" > "$f"
      rm -f "$f.tmp"
      cnt=`expr $cnt + 1`
    fi
  fi
done
echo "Fixed $cnt files."
exit 0

In order to use the above script, one has to determine the hardcoded paths by looking into one of the .la files in the $GLOBUS_LOCATION/lib directory. For example:

$ export GLOBUS_LOCATION=/scratch/veseli/devel/lib/globus-4.0.5/
$ cd $GLOBUS_LOCATION/lib
$ pwd
/scratch/veseli/devel/lib/globus-4.0.5/lib
$ grep dependency_libs libxmlsec1_openssl_gcc32.la
dependency_libs=' -L/home/condor/execute/dir_22100/userdir/install/lib'
$ ~/fix_paths.sh /home/condor/execute/dir_22100/userdir/install/lib /scratch/veseli/devel/lib/globus-4.0.5/lib
Replacing /home/condor/execute/dir_22100/userdir/install/lib by /scratch/veseli/devel/lib/globus-4.0.5/lib in various ASCII files.
…
Fixed 330 files.
$ grep dependency_libs libxmlsec1_openssl_gcc32.la
dependency_libs=' -L/scratch/veseli/devel/lib/globus-4.0.5/lib'

Once you correct the library dependency paths using this script, you should be able to compile and link external software packages against your binary GT installation.
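
If you would rather not inspect the .la files by hand, the hardcoded directories can be collected with a few standard tools. This is only a sketch: list_la_paths is a made-up helper name, and it assumes each .la file carries a single-quoted dependency_libs line like the one shown in the session above.

```shell
#!/bin/sh
# list_la_paths prints the unique -L directories found in the
# dependency_libs line of the given libtool .la files.
# (list_la_paths is a hypothetical helper, not part of the Globus Toolkit.)
list_la_paths() {
  # 1. Keep only the text between the quotes of dependency_libs='...'.
  # 2. Put each whitespace-separated entry on its own line.
  # 3. Keep the -L entries and strip the -L prefix.
  # 4. Remove duplicates.
  sed -n "s/^dependency_libs='\(.*\)'.*/\1/p" "$@" |
    tr ' ' '\n' |
    sed -n 's/^-L//p' |
    sort -u
}
```

Running list_la_paths "$GLOBUS_LOCATION"/lib/*.la before and after fix_paths.sh gives a quick way to check which build paths are present and whether any old ones remain.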

October 17, 2007

Does your grid make Fords or Volvos?

Ask a user why they use a grid, a cluster, or any other type of distributed system and you’ll hear, “Why, to get my work done faster, of course.” But that’s an ambiguous statement at best, since it can mean two things: faster runtimes or higher throughput. And although they might seem similar, they’re really not.

Runtime is defined as the wallclock time it takes to complete one task. If you parallelize a task, for instance with MPI, or by taking advantage of the data splitting capabilities of Grid MP, you can get your job back in less time. If you can parallelize your job into 10 parallel sub-jobs and run it on 10 nodes, you can expect that job to complete on average in 1/10th of the time. Plus a bit of overhead of course, but let’s keep it simple for now.  In Volvo’s innovative Uddevalla plant, groups of workers assemble entire automobiles in less time than it takes for one worker to complete a whole car. So with 10 workers in a group, you could potentially make a car in 1/10th of the time.

However, sometimes your task cannot be parallelized any further, but you might have lots of them pending. Grids can still help since they can increase the throughput of your jobs. With 10 nodes and 10 jobs, each job still takes its full runtime, but you can expect a job to complete, on average, every 1/10th of the runtime of a single job, without using any parallelism. In a traditional American automotive plant, the car advances on the assembly line and at no point is more than one operator working on one car, so there’s no parallelism involved. It might take up to a day before one car is completed from start to finish, but a new car rolls off the end of the line every few minutes.
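
The difference is easy to see with a little arithmetic. The sketch below uses a made-up helper, compare_modes, and illustrative numbers; it ignores overhead and assumes the work divides evenly across the nodes.

```shell
#!/bin/sh
# compare_modes RUNTIME NODES
# RUNTIME is the number of minutes one job takes on a single node.
# (compare_modes is a hypothetical helper used only for illustration.)
compare_modes() {
  runtime=$1
  nodes=$2
  # "Volvo": every job is split across all nodes, one job at a time.
  echo "parallel: each job takes $(( runtime / nodes )) min, one job done every $(( runtime / nodes )) min"
  # "Ford": every node runs a whole, unsplit job on its own.
  echo "throughput: each job takes $runtime min, one job done every $(( runtime / nodes )) min on average"
}
```

For a 60-minute job on 10 nodes, compare_modes 60 10 reports the same completion rate in both modes (one job every 6 minutes), but the wait for any single job is 6 minutes in the first case and 60 minutes in the second.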

So next time when a user brags about his fancy new cluster, ask him whether he’s producing Fords or Volvos.

October 11, 2007

Virtual Grid Nodes: The Tension


Lately I have been putting a lot of thought into the challenges that grid managers face in building an enterprise grid.  Primarily they must support the various stakeholders throughout the enterprise, each of whom has their own sets of application workflows used to meet their business needs. 

The software packages that each interested group uses may have a significant overlap with one another, but the similarity stops there.  Because each group ostensibly has a different goal, the usage patterns are almost guaranteed to be unique.  This implies that the community as a whole will demand any of the following:

  • A wide range of operating systems including Linux, Microsoft Windows, or any of the varied flavors of Unix;
  • Support for multiple versions of the same software package; and
  • A wide range of operating environments particularly with respect to memory, CPU performance, network usage, and storage.

When you consider users’ needs in more detail, you will recognize that a number of implications further complicate things:

  • The set of applications that users wish to run will likely run under two or more different major OS revisions (e.g. Linux kernel 2.4 versus 2.6 or Windows XP versus Vista);
  • Similarly, there are applications that steadfastly refuse to run under a specific patch level.  For example, a minor revision of the Linux kernel that is lacking a specific security patch might be required.  You might be able to force the software to install but then the software is likely to no longer be supported;
  • Off-the-shelf installations which seek to upgrade rather than coexist with a previous version;
  • Custom software that expects a very specific behavior from a package that has changed in its most recent update;
  • Software which requires particular kernel tuning which is not appropriate for general operation; and
  • Software packages which have 32/64-bit library compatibility issues.

Meanwhile, grid managers will most likely be focused on providing a stable, secure, and easy to maintain infrastructure that is both cost-effective and capable of meeting the users’ core requirements.  Clearly the priorities between the individual groups and the support team will be at odds much of the time.

The most elegant solution to these issues is to build a grid whose execution environments are all virtualized.  In this situation, each usage pattern would have its own environment tailored to its own unique needs while the core OS would be under the complete control of the infrastructure staff.  Clearly there would be a stakeholder driven set of virtual servers available for use on each node in the grid. 

It seems simple enough: rather than creating a complicated infrastructure that will not accommodate all of the situations your users will require, you simply will give them their own isolated operating environments.  As you might expect, nothing is that straightforward.  The standard tools that you use for grid and virtualization management do not work well in this architecture.

In future posts, we will explore the challenges and possible solutions in detail. In particular we will focus on:

-    Networking
-    Virtual Server Management
-    Job Scheduling
-    Performance Monitoring
-    Security
-    Data Lifecycle

Globus in Seattle Next Week

Next week is OGF21, where grid gurus from around the world assemble to discuss technologies, applications, standards, and how gray the weather is in Seattle.

We have organized a full day of Globus material on Wednesday October 17. We'll have overviews of old favorites such as GridFTP, RLS, OGSA-DAI and the GT4 distribution, as well as introductions to some of our many new Incubator projects: Shannon Hastings, OSU, discussing the service authoring tool Introduce, Steve Tuecke of UnivaUD discussing Data Catalyst, their open source higher level data solution, and Stephan Erberich, who will give an overview of the Internet2 IDEA Award-winning MEDICUS medical data tool, among others. Come hear about the latest updates and where Globus is going next, and/or talk to Globus architects and developers about things like:

  • Your applications and how you can apply Globus technologies
  • Problems or questions with Globus technologies
  • Your wish list for future Globus features
  • How to contribute your software to the dev.globus community

If you'd like to meet with someone from the Globus team in Seattle, please email us: we'll see you there!

October 08, 2007

Better Know a VM: Part 1 of 435


Every day we wake up to a new barrage of virtualization articles.  I can't even read them all anymore, instead scanning headlines guided by statistical sampling (or is that stochastic?).

The hype is thick in the air, but it's not entirely unfounded. Somewhere in there we can see grid computing's going to be affected long term by OS virtualization in one way or another.

In this series we'll look at what's happening with various grid-VM efforts, often through a Globus lens (I work on the Globus Virtual Workspaces project so it's almost going to be impossible to avoid that).

There's a tradeoff between application performance improvements and developer time.  Developers are expensive, development is time consuming.  Perhaps it's worth waiting an extra few hours for results if it means you can start right now and stop paying those fine people.  Obviously any particular calculation is going to be more nuanced than this, but I just wanted to set up an analogy.

In a similar vein, with virtualization you can take your prepared application+environment and get going on a new platform in minutes, not months. Cycles can be acquired and the exact compute environments can be provisioned out to the provider site's nodes. Resource consumption can be quantified well by the site (and even enforced at a fine grain). Less of the client's and site's administrators' time (someone's money) needs to be spent on setup, environment conflicts, etc.

For all this you may take a small performance hit, but sometimes that's just worth it.

It sounds perfect, maybe. It's not quite, and we will look at a few problems, many of which only look temporary. A lot of progress is being made to get rid of the complexity, encapsulate it better, or factor it in such a way that the person/role who should be handling that complexity actually does (instead of it being unnecessarily multiplied or divided across many people/roles).

Part 2?  I'd like to talk about coordinating many VMs to work together, something being called contextualization.  The fightin' Contextualization!

(Apologies to Stephen Colbert)