cmsSync

by admin last modified 2008-06-10 10:13

How to use the cmsSync software to check that the files in dCache at your site are consistent with what PhEDEx has registered

The cmsSync software reports the differences between the files on disk at your site and the files PhEDEx has registered for your site.


Installation

  1. Install the Nebraska YUM repository as documented here.
  2. Install the cmsSync RPM with the following command:
yum install cmsSync

Running the client

The client does the following things:
  • Gets a list of all blocks registered for your SE.
  • Retrieves your site's TFC.
  • Builds a list of file names from these blocks (long process).
  • Spiders a given base directory (long process).
  • Compares the contents of dCache and the list of files which should be at your site.
This requires PNFS to be mounted at your site, and you need to know the SE name your site uses with PhEDEx.
The command-line usage is:
cmsSync --se sename.unl.edu /pnfs/path/to/cms/store
This takes around 10 minutes to run at Nebraska.
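
The comparison step at the end of the list above amounts to two set differences. The following is only a rough sketch of that idea, not the actual cmsSync code: the function names are made up for illustration, and pfn_to_lfn simply swaps the base directory for /store, whereas a real site would apply the rules from its TFC.

```python
import os


def spider(base_dir):
    """Yield the path of every file under base_dir (the 'spider' step)."""
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for name in filenames:
            yield os.path.join(dirpath, name)


def pfn_to_lfn(path, base_dir, lfn_prefix="/store"):
    """Hypothetical mapping: replace the local base directory with /store.

    This is an assumption for illustration; a real site would use its TFC.
    """
    rel = os.path.relpath(path, base_dir)
    return lfn_prefix + "/" + rel.replace(os.sep, "/")


def compare(registered_lfns, base_dir):
    """Return (missing, extra) as sorted lists of LFNs."""
    on_disk = {pfn_to_lfn(p, base_dir) for p in spider(base_dir)}
    registered = set(registered_lfns)
    missing = sorted(registered - on_disk)  # registered but not on disk
    extra = sorted(on_disk - registered)    # on disk but not registered
    return missing, extra
```

Using sets keeps the comparison itself cheap; the slow parts remain building the registered list and walking the directory tree.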

Known Bug

There is a known bug in some versions of dCacheNebraska/CherryPy where a file fails on the line "import cherrypy._cpengine".  Remove this line from the referenced file; doing so will not adversely affect the running of the cmsSync script (or any other dCacheNebraska application).

This is fixed in later versions of dCacheNebraska.

Output

The client writes several files (as documented in the output of the utility).  They are:
  • not_lfns.txt: A list of all files in the base directory which are not in the CMS namespace at all.
  • user_lfns.txt: A list of all user files at your site.
  • registered_lfns.txt: A list of all LFNs which should be at your site.
  • blocks.txt: A list of all blocks which should be at your site.
  • missing_lfns.txt: A list of all files which are registered to be at your site, but are not in dCache.
  • extra_lfns.txt: A list of all files which are at your site, but are not registered in PhEDEx.
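
A quick way to get an overview of a run is to count the entries in each of these files. The sketch below assumes that each file holds one entry per line and sits in the directory you pass in; neither is stated above, so check against your own output first. The summarize function is a made-up helper, not part of cmsSync.

```python
import os

# File names as listed above.
OUTPUT_FILES = [
    "not_lfns.txt",
    "user_lfns.txt",
    "registered_lfns.txt",
    "blocks.txt",
    "missing_lfns.txt",
    "extra_lfns.txt",
]


def summarize(output_dir="."):
    """Return {file name: number of non-blank lines, or None if absent}."""
    counts = {}
    for name in OUTPUT_FILES:
        path = os.path.join(output_dir, name)
        if not os.path.exists(path):
            counts[name] = None
            continue
        with open(path) as handle:
            # Assumption: one entry per line; blank lines are ignored.
            counts[name] = sum(1 for line in handle if line.strip())
    return counts


if __name__ == "__main__":
    for name, count in summarize().items():
        print(name, "not found" if count is None else count)
```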

What to do with the output

  • The missing_lfns.txt file can be attached to a Savannah ticket.  Ask the dataops folks to remove the replicas at your site, but not the subscriptions, so that any datasets your site is still subscribed to are re-downloaded.
  • The extra_lfns.txt file should be examined carefully.  Delete only files which meet all of these conditions:
    • Have not been recently transferred (the synchronization is not immediate, so files that were in transit when you ran the script might be falsely marked as extra).
    • Are not unmerged files.
    • Are not load test files.
The extra files are essentially all files which are located at your site but are not grid-accessible.  It is often easier to delete them (or to re-subscribe to the dataset and keep them) than it is to hand-register the files in the global DBS.
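
The checks on extra_lfns.txt can be partly automated before anything is deleted. The sketch below is one possible filter, built on several assumptions that are not from this page: unmerged and load test files are recognized by the path prefixes in SKIP_PREFIXES (adjust these for your site), "recently transferred" is approximated by the file modification time, and lfn_to_path is a function you supply to turn an LFN into a local PNFS path. It only returns a list of candidates and deletes nothing.

```python
import os
import time

# Assumed prefixes for unmerged and load test files; adjust for your site.
SKIP_PREFIXES = ("/store/unmerged/", "/store/PhEDEx_LoadTest07/")


def deletion_candidates(extra_lfns, lfn_to_path, min_age_days=7, now=None):
    """Return the extra LFNs that pass all three checks.

    lfn_to_path is a caller-supplied (hypothetical) LFN -> local path mapping.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    candidates = []
    for lfn in extra_lfns:
        if lfn.startswith(SKIP_PREFIXES):
            continue  # unmerged or load test file: keep
        try:
            mtime = os.path.getmtime(lfn_to_path(lfn))
        except OSError:
            continue  # cannot stat the file: leave it for manual review
        if mtime <= cutoff:
            candidates.append(lfn)  # old enough that it is not in transit
    return candidates
```

Review the returned list by hand before removing anything; the age threshold of 7 days is an arbitrary default, not a recommendation from the cmsSync authors.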
