dCacheNebraska Scripts
Nebraska uses several scripts to help analyze the state of its cluster. We provide them in a package named dCacheNebraska, which is available for wider usage.
Installation - Current Method
To install dCacheNebraska, follow the directions on this page. This will guide you through adding the Nebraska YUM repository to your computer, then installing the commands referenced below via yum.
Installation - OLD
dCacheNebraska is available solely through the Nebraska subversion repository. That said, installation requires only downloading the files from subversion, a modern version of Python, and a working ssh. To check out the files, run this command:
svn co svn://t2.unl.edu/brian/dCacheNebraska
If you do not have the svn binary and use SL4, you may install it with yum install svn. If you use SL3, you may have to install subversion from source.
You will find the desired scripts in the scripts/ subdirectory.
Configuration
The dCacheNebraska scripts utilize the PhEDEx DBParam format to connect to your site's dCache install. Create a file called DBParam and add an entry which looks like this:
Section dCache/NEBRASKA
Interface dCache
Username admin
Password ****
Port 22223
AdminHost dcache-head.unl.edu
SRMHost srm.unl.edu
Cipher blowfish
You may omit the Password line if ssh keys have been configured for your site. If you have multiple dCache instances, you may add multiple sections to the same DBParam file.
All dCacheNebraska scripts depend on being able to find this configuration file. They look for the file DBParam in $CWD, $HOME, then /etc. When found, they automatically look to the first section unless told otherwise.
Alternately, they all can take a command line argument of this format:
-config <file>:<section>
For the DBParam example above, assuming it is located in the $HOME directory, the command line argument would be:
-config $HOME/DBParam:dCache/NEBRASKA
Note how <section> was replaced with the value of Section from the DBParam file.
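The lookup rules above can be sketched in Python. This is a minimal illustration, not the code used by the dCacheNebraska scripts: the function names are hypothetical, and the parser assumes each DBParam line is a key followed by whitespace and a value, with lines starting with # treated as comments.

```python
import os


def parse_config_arg(arg):
    """Split '<file>:<section>' into (file, section); section is None if absent."""
    path, sep, section = arg.partition(":")
    return path, (section if sep else None)


def find_dbparam(search_dirs=None):
    """Return the first DBParam found in $CWD, $HOME, then /etc, or None."""
    if search_dirs is None:
        search_dirs = [os.getcwd(), os.path.expanduser("~"), "/etc"]
    for directory in search_dirs:
        candidate = os.path.join(directory, "DBParam")
        if os.path.isfile(candidate):
            return candidate
    return None


def parse_dbparam(text):
    """Parse DBParam text into an ordered list of (section_name, {key: value})."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "Section":
            sections.append((value, {}))
        elif sections:
            sections[-1][1][key] = value
    return sections


def select_section(sections, name=None):
    """Pick the named section, or the first one when no name is given."""
    if not sections:
        raise ValueError("no sections found in DBParam")
    if name is None:
        return sections[0][1]
    for section_name, values in sections:
        if section_name == name:
            return values
    raise KeyError(name)
```

Calling select_section without a name mirrors the default behaviour described above, where the first section is used unless told otherwise.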
Pool Cleaner
The pool_cleaner.py script analyzes all the pools in the dCache system, looking for physical files on pools which are not in PNFS. It does not look for logical files in PNFS which are not physically present in dCache.
The pool cleaner script makes all its queries through the dCache admin interface. While this is beneficial in its simplicity, it can be very slow to verify the logical existence of files. If you have a large dCache install, you might be interested in altering it to take advantage of a mounted /pnfs directory. This will result in a large speedup.
Running the script is simple:
pool_cleaner.py -config DBParam:dCache/NEBRASKA
The script accepts the following options:
- -quiet: This suppresses most of the output
- -delete: This triggers the actual deletion of the files on the pool nodes. Potentially dangerous for obvious reasons. We are not responsible if this script eats your install!
- -safefile: A file which will never be deleted. If the script thinks this file is not present, it decides that the system is in a bad state and immediately aborts.
- -maxdelete: A float between 0 and 1; the maximum fraction of files which may be deleted on any given run. This is to protect it from mistakenly deleting all files.
- -checktrash: If the script is running on the PNFS server, it can check the trash files of the PNFS daemon to see if the file is already scheduled for deletion.
- -badlist: Instead of deleting files, create a file with a list of all the "bad" files and their locations.
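To illustrate how the -safefile and -maxdelete safeguards might interact, here is a minimal Python sketch. It is not taken from pool_cleaner.py: the function name and data structures are hypothetical, and it assumes "not present" means the safe file is missing from the namespace listing the script obtained.

```python
def plan_deletions(pool_files, namespace_ids, safefile=None, maxdelete=1.0):
    """Return the sorted orphaned files, or raise if a safety check fails.

    pool_files:    set of file IDs physically present on the pools
    namespace_ids: set of file IDs known to PNFS
    safefile:      an ID that must always be found in PNFS (assumed semantics)
    maxdelete:     largest fraction of pool files that may be deleted in one run
    """
    if not 0.0 <= maxdelete <= 1.0:
        raise ValueError("maxdelete must be between 0 and 1")

    # If the known-good file looks missing, the namespace view is suspect.
    if safefile is not None and safefile not in namespace_ids:
        raise RuntimeError("safe file not found; system appears to be in a bad state")

    # Physical files with no namespace entry are deletion candidates.
    orphans = pool_files - namespace_ids

    # Refuse to delete more than the allowed fraction of files.
    if pool_files and len(orphans) / len(pool_files) > maxdelete:
        raise RuntimeError("too many files marked for deletion; aborting")

    return sorted(orphans)
```

Both checks run before any deletion list is returned, so a broken namespace query fails closed rather than producing a large list of files to remove.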
SRM Transfer Rate Query
The query_rate.py script looks up the rate of data transfer for a single specific srmCopy transfer. This script does not work for sites with pools behind NATs or with srmPut or srmGet transfers.
To run the script, use this command:
query_rate -config DBParam:dCache/NEBRASKA -url <dest URL>
Alternately, in lieu of passing SRM URLs one-by-one, you may also pass an entire copyjob (although this option has not been heavily tested).
The query_rate.py script can be used programmatically to ensure that a single transfer is making sufficient progress. Here are the relevant options:
- -quiet: Suppress unnecessary output.
- -timeout: If the rate does not meet the desired criteria, return nonzero exit code.
- -grace <minutes>: The grace period in minutes during which the transfer will not time out.
- -max <minutes>: Maximum amount of time a transfer should take.
- -rate <KB/s>: Minimum rate to be tolerated after grace period.
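One plausible reading of how -grace, -max, and -rate combine under -timeout is sketched below in Python. The function names and the exact ordering of the checks are assumptions made for illustration; the real script may evaluate them differently.

```python
def transfer_ok(elapsed_min, rate_kbs, grace_min, max_min, min_rate_kbs):
    """Return True if the transfer is making acceptable progress.

    Assumed semantics:
      - a transfer older than max_min always fails;
      - within the grace period, any rate is tolerated;
      - after the grace period, the rate must be at least min_rate_kbs.
    """
    if elapsed_min > max_min:
        return False
    if elapsed_min <= grace_min:
        return True
    return rate_kbs >= min_rate_kbs


def exit_code(ok):
    """Map the check result to a shell-style exit code (nonzero on failure)."""
    return 0 if ok else 1
```

A wrapper script could then decide whether to cancel a transfer based on the exit code alone, which is the kind of programmatic use described above.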
Query All GridFTP Transfer Rates
The query_all_rates script looks up the rates of all GridFTP-based transfers it can find. This should work for sites with pools behind NATs.
To run the script, use this command from the scripts/ directory:
query_all_rates
Restore Cleaner
The restore_cleaner script analyzes and potentially retries files which are stuck at the PoolManager. There are many different reasons that this may happen; for tapeless sites, it is most often because there is no on-disk replica for a logical file.
To run the script, use this command:
restore_cleaner -config DBParam:dCache/NEBRASKA -analyze
The -analyze flag prints out the results of various analyses of error messages. This organizes the problems at your site better than the raw list of problem files does.
In addition, you may pass the following options:
- -retry_all: This retries all stuck transfers
- (In the future, we will add the capability to retry transfers which fail specific analyses).
Space Usage Analyzer
The space_usage script is an external way to measure usage of pools in dCache. It prints out the total space used, including replicas, and the distribution of disk space utilized throughout the namespace.
Here's an example usage:
space_usage -config DBParam:dCache/NEBRASKA -base /pnfs/unl.edu/data4 -count_replicas
Required Arguments:
- -config <filename>:<section>: Config file listing parameters needed to connect to dCache.
- -base <directory>: The base directory to start in.
Optional Arguments:
- -threshold <%>: Minimum percentage of the used disk space that a directory must account for in order to be displayed. Defaults to 5%.
- -count_replicas: Account for the space used by multiple replicas of the same file (uses the admin interface).
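The effect of -threshold can be illustrated with a small Python sketch. It is a stand-in for the idea, not the space_usage implementation: the function names are hypothetical, and file sizes are supplied as a plain dictionary rather than gathered from PNFS.

```python
import posixpath
from collections import defaultdict


def _within(directory, base):
    """True if directory is base itself or lies underneath it."""
    return directory == base or directory.startswith(base.rstrip("/") + "/")


def usage_by_directory(file_sizes, base):
    """Sum file sizes into every directory from each file's parent up to base."""
    base = posixpath.normpath(base)
    totals = defaultdict(int)
    for path, size in file_sizes.items():
        directory = posixpath.dirname(path)
        while _within(directory, base):
            totals[directory] += size
            if directory == base:
                break
            directory = posixpath.dirname(directory)
    return dict(totals)


def directories_over_threshold(totals, base, threshold_pct=5.0):
    """Return directories holding at least threshold_pct of the space under base."""
    base = posixpath.normpath(base)
    total = totals.get(base, 0)
    if total == 0:
        return []
    return sorted(d for d, used in totals.items()
                  if 100.0 * used / total >= threshold_pct)
```

With -count_replicas, the sizes fed into such a calculation would be multiplied by the number of replicas of each file, which is why that option needs the admin interface.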
Pool Retire Script
The retire_pool script is an external way to safely remove a pool from dCache. It takes an inventory of all the files in all pools and determines which files exist only on the pool to be retired. It then starts P2P transfers of just those unique files. Any in-process P2P transfers that do not appear to be valid (too slow, or lacking a valid source or destination) will be cancelled. Valid in-process P2P transfers will be accounted for, so multiple P2P transfers for the same file will not be started.
dCache admin interface access and a PNFS mount are required.
One of the design goals of this script is that the last line of output will inform the admin whether or not it is safe to turn the pool off. If the pool is not safe to turn off, the approximate number of in-progress plus started transfers will be printed out.
Here's an example usage:
retire_pool <poolname> -dryrun
With the -dryrun flag, no files will be copied. By default, this script will use the pfm_config file (if it exists) as a configuration file; the one which comes with the dCacheNebraska distribution contains sane defaults.
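The core selection step of retiring a pool, finding files that exist nowhere else and skipping those already being copied, can be sketched with set operations in Python. The function names and the inventory structure (pool name mapped to a set of PNFS IDs) are hypothetical and chosen only for illustration.

```python
def files_unique_to_pool(inventory, retiring):
    """Return the IDs that exist on the retiring pool and on no other pool.

    inventory: dict mapping pool name -> set of PNFS IDs held by that pool
    """
    elsewhere = set()
    for pool, ids in inventory.items():
        if pool != retiring:
            elsewhere |= ids
    return inventory[retiring] - elsewhere


def transfers_to_start(unique_ids, valid_in_progress):
    """Skip files that already have a valid P2P transfer under way."""
    return sorted(unique_ids - valid_in_progress)


def safe_to_turn_off(unique_ids):
    """The pool is safe to turn off once no unique files remain on it."""
    return not unique_ids
```

Running the selection again after the transfers finish should yield an empty set, which corresponds to the final "safe to turn off" line of output described above.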
PNFS ID Lookup
The dcache_pnfs_idfinder script uses the admin interface to look up the PNFS ID corresponding to a particular file. This does not require PNFS to be mounted locally, but it does require dCache admin interface access.
Here is an example usage:
dcache_pnfs_idfinder /pnfs/unl.edu/data4/test/testfile.unl.3
The script must be able to find the DBParam config file or have it passed via command line.
Path Lookup
The dcache_pnfs_pathfinder script uses the admin interface to look up the filename associated with a particular PNFS ID. This does not require PNFS to be mounted locally.
Here is an example usage:
dcache_pnfs_pathfinder 0004000000000000000AA5C8
The script must be able to find the DBParam config file or have it passed via command line.
Pool Lookup
The dcache_path2server script uses the admin interface to find pools containing replicas of a specified file. This does not require PNFS to be mounted locally. The PnfsManager will be queried for cache info, then the presence of the file will be confirmed with the individual pools.
Here are some example use cases:
dcache_path2server 0004000000000000000AA5C8
dcache_path2server /pnfs/unl.edu/data4/test/testfile.unl.3
dcache_path2server 0004000000000000000AA5C8 -n
Note that either the PNFS ID or the path is an acceptable input.
The -n flag instructs the script to not confirm with the pools. Often the -n flag can be used to see where a file has disappeared from.
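Since the script accepts either form of input, a wrapper might want to tell the two apart. The Python sketch below shows one simple heuristic; it is an assumption for illustration, not how dcache_path2server itself decides, and the function name is hypothetical.

```python
import string


def classify_input(arg):
    """Guess whether arg is a namespace path or a PNFS ID.

    Heuristic (an assumption): paths are absolute and start with '/',
    while PNFS IDs consist solely of hexadecimal digits.
    """
    if arg.startswith("/"):
        return "path"
    if arg and all(ch in string.hexdigits for ch in arg):
        return "pnfsid"
    raise ValueError("not a path or PNFS ID: %r" % arg)
```

For example, classify_input("/pnfs/unl.edu/data4/test/testfile.unl.3") is classified as a path, while a string of hexadecimal digits is classified as a PNFS ID.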
Authentication Checker
The auth_check utility is designed to let site admins check authentication at their site. It requires dCache 1.8 and must be run on the SRM node. auth_check will search through the credentials which have recently used the SRM server and try to use these credentials to determine if any GridFTP nodes fail authentication. This allows the site admin to test the GridFTP server with the user's proxy certificate.
If auth_check is given an argument, it will only use certificates matching that argument; if it is not passed an argument, it will check all certificates.
Here is an example usage:
$ auth_check Xin
Checking authentication for /DC=org/DC=doegrids/OU=People/CN=Xin Zhao 102397/CN=1151626830/CN=1381624632
Door failed: srm.unl.edu:2811
If given the -v flag, additional information will be printed out. The auth_check script requires access to the admin interface, meaning the DBParam file must be configured.