
Using Nebraska Grid Interfaces

by admin last modified 2007-02-01 13:50

This documentation explains how to use the grid interfaces here at Nebraska.

The idea behind a grid is that you submit jobs to a cluster of clusters.  To do this, you must learn some new paradigms for authentication and for moving applications and data.  It is difficult at first, but well worth it once you realize how many idle processors you gain access to.

Authentication


For authentication, we use Public Key Infrastructure with Globus.  To use this, you must first have a user certificate.  The application process is explained here:

Applying for a UNL grid certificate


Once you have received your certificate, you will need to gain access to a submission machine.  On the grid, instead of submitting jobs from a cluster headnode, one submits from a "User Interface" (UI), which has the necessary grid software installed but is not attached to any particular cluster on the grid.  The local UI is osg-test2.unl.edu.  To use it, you must get a shell account and install your certificate.  Instructions are here:

Installing your certificate on the Grid UI


Once you have your grid certificate installed, you must be added to our local Virtual Organization.  Right now, this must be done manually by a site administrator, but will be done through a separate website at a later date.

Simple Jobs and Data Movement

Here are some simple Globus-related commands.

Initializing the Globus proxy:

[brian@red ~]$ grid-proxy-init 
Your identity: /DC=org/DC=doegrids/OU=People/CN=Brian Bockelman 504307
Enter GRID pass phrase for this identity:
Creating proxy ........................................................... Done
Your proxy is valid until: Fri Feb  2 07:40:29 2007
[brian@red ~]$
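
Before submitting anything, it can help to confirm that a proxy exists and still has enough lifetime left.  The following is a minimal sketch, not part of the Globus tools: ensure_proxy and the GRID_PROXY_INFO override variable are hypothetical names introduced here for illustration, and the one-hour default is an arbitrary assumption.  It relies on grid-proxy-info's -exists and -valid options.

```shell
#!/bin/sh
# ensure_proxy: hypothetical helper (not part of Globus).
# Succeeds only if a proxy exists with at least the requested lifetime left.
ensure_proxy() {
    min_valid="${1:-1:00}"                            # minimum lifetime, H:M (assumed default)
    info_cmd="${GRID_PROXY_INFO:-grid-proxy-info}"    # override is for testing only

    if ! command -v "$info_cmd" >/dev/null 2>&1; then
        echo "$info_cmd not found; is the grid software in your PATH?" >&2
        return 2
    fi

    # -exists exits 0 only when a proxy is present; -valid adds a lifetime requirement.
    if "$info_cmd" -exists -valid "$min_valid"; then
        echo "proxy OK"
    else
        echo "no usable proxy; run grid-proxy-init" >&2
        return 1
    fi
}

# Usage: ensure_proxy 2:00 && echo "safe to submit"
```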

Testing authentication:

[brian@osg-test1 ~]$ globusrun -a -r red.unl.edu

GRAM Authentication test successful
[brian@osg-test1 ~]$

Initializing the VOMS proxy:

First try, which errors out:

[brian@gpn-husker ~]$ voms-proxy-init --voms gpn:/gpn
VOMS Server for gpn not known!
[brian@gpn-husker ~]$

Add GPN line to /opt/glite/etc/vomses file:

"gpn" "t2.unl.edu" "15002" "/DC=org/DC=doegrids/OU=Services/CN=voms/t2.unl.edu" "gpn"
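
If you script this step, it is worth making it safe to run more than once.  The sketch below appends the line only when it is not already in the file.  add_vomses_entry is a hypothetical helper name; the file path and the entry itself are the ones shown above.  Editing /opt/glite/etc/vomses will usually require administrator rights.

```shell
#!/bin/sh
# add_vomses_entry: hypothetical helper that appends a vomses line only once.
add_vomses_entry() {
    vomses_file="$1"
    entry="$2"
    # -F: treat the entry as a fixed string; -x: match the whole line.
    if grep -qxF -- "$entry" "$vomses_file" 2>/dev/null; then
        echo "entry already present"
    else
        printf '%s\n' "$entry" >> "$vomses_file" && echo "entry added"
    fi
}

# Usage:
# add_vomses_entry /opt/glite/etc/vomses \
#   '"gpn" "t2.unl.edu" "15002" "/DC=org/DC=doegrids/OU=Services/CN=voms/t2.unl.edu" "gpn"'
```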

Second try, which works:

[brian@gpn-husker ~]$ voms-proxy-init --voms gpn:/gpn
Your identity: /DC=org/DC=doegrids/OU=People/CN=Brian Bockelman 504307
Enter GRID pass phrase:
Your proxy is valid until Fri Feb  2 07:47:14 2007

Creating temporary proxy ................................................................................ Done
Contacting  t2.unl.edu:15002 [/DC=org/DC=doegrids/OU=Services/CN=voms/t2.unl.edu] "gpn"
 Done
Creating proxy ................................................ Done
Your proxy is valid until Fri Feb  2 07:47:14 2007

[brian@gpn-husker ~]$

Running a simple command on the host machine:

globus-job-run red.unl.edu:/jobmanager-fork /bin/hostname

Some things to note with globus-job-run:

  • The full path to the command must be specified.  Just putting 'hostname' will return an error.
  • jobmanager-fork runs commands as a process on the headnode of the cluster, NOT in the batch queue.  In most circumstances this is highly undesirable.  To launch a job into the batch queue, use jobmanager-pbs.
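
Both notes can be folded into a small wrapper.  This is only a sketch: run_on_grid and the GLOBUS_JOB_RUN override variable are hypothetical names, not part of Globus.  The wrapper rejects a command that is not given as a full path, then hands everything to globus-job-run.

```shell
#!/bin/sh
# run_on_grid: hypothetical wrapper around globus-job-run.
# Usage: run_on_grid CONTACT /full/path/to/command [args...]
run_on_grid() {
    contact="$1"
    shift
    case "${1:-}" in
        /*) ;;   # absolute path, fine
        *)  echo "error: give the full path to the command, e.g. /bin/hostname" >&2
            return 1 ;;
    esac
    runner="${GLOBUS_JOB_RUN:-globus-job-run}"   # override is for testing only
    "$runner" "$contact" "$@"
}

# Send the job to the batch queue rather than the headnode:
# run_on_grid red.unl.edu:/jobmanager-pbs /bin/hostname
```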

Transferring files to the data area on red:

globus-url-copy file:////home/USERNAME/filename.txt  gsiftp://red.unl.edu/opt/data/remote_file.txt

It's important to specify the filename on the remote machine.  Just leaving a trailing directory will result in an error.
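
One way to avoid forgetting the remote filename is to build the destination URL from the local file's name.  The sketch below does that; gsiftp_dest is a hypothetical helper name, and reusing the local filename on the remote side is an assumption made here for convenience.

```shell
#!/bin/sh
# gsiftp_dest: hypothetical helper that builds a gsiftp URL ending in a filename.
# Usage: gsiftp_dest LOCAL_FILE HOST REMOTE_DIR
gsiftp_dest() {
    local_file="$1"
    host="$2"
    remote_dir="$3"
    # Drop one trailing slash from the directory, then append the local file's base name.
    printf 'gsiftp://%s%s/%s\n' "$host" "${remote_dir%/}" "$(basename "$local_file")"
}

# Usage:
# src="$HOME/filename.txt"
# globus-url-copy "file://$src" "$(gsiftp_dest "$src" red.unl.edu /opt/data/)"
```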

Example:

A full example of running a grid job can be found here.

Manage Jobs with Condor-G

While the Globus tools are excellent for putting together and submitting one job at a time, managing large numbers of jobs with them can be a significant headache.

To solve this problem, we turn back to an old friend: the batch scheduler.  In this case, an extension of Condor, Condor-G, handles most of the grunt work of moving data and executables for us.  Condor-G is one of the most extensively used tools on the Open Science Grid.
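
To give a rough idea of what a Condor-G job looks like, the sketch below writes a small submit description file.  It assumes a Condor version that supports the grid universe with a gt2 (pre-WS GRAM) resource, and it assumes red.unl.edu/jobmanager-pbs as the contact string.  write_submit_file and the hostname.* filenames are hypothetical, and the tutorial's own files may differ.

```shell
#!/bin/sh
# write_submit_file: hypothetical helper that writes a minimal Condor-G submit file.
write_submit_file() {
    submit_file="$1"
    cat > "$submit_file" <<'EOF'
# Run /bin/hostname on red.unl.edu through its PBS jobmanager (assumed contact string).
universe            = grid
grid_resource       = gt2 red.unl.edu/jobmanager-pbs
executable          = /bin/hostname
transfer_executable = false
output              = hostname.out
error               = hostname.err
log                 = hostname.log
queue
EOF
}

# Usage (requires a valid proxy and a running Condor installation):
# write_submit_file hostname.sub
# condor_submit hostname.sub
# condor_q
```

Setting transfer_executable to false tells Condor-G to use the /bin/hostname that already exists on the remote machine instead of copying a local binary over.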

We have adapted a tutorial that originated at UW-Madison and updated it for our use.  Part I covers simple uses of Condor-G.  Part II covers recovering from common errors.


Metascheduling: Managing Complex Jobs with DAGMan

We will examine this topic further in future workshops.

