desitransfer API

desitransfer

DESI data transfer infrastructure.

desitransfer.common

Code needed by all scripts.

desitransfer.common.empty_rsync(out)[source]

Scan rsync output for files to be transferred.

Parameters:

out (str) – Output from rsync.

Returns:

True if there are no files to transfer.

Return type:

bool
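A minimal sketch of the kind of scan empty_rsync() performs. It assumes rsync was run with --verbose, so that each file to transfer appears on its own line between the file-list header and the transfer summary. empty_rsync_sketch() is a hypothetical name, and the header and summary prefixes it skips are assumptions about rsync output, not taken from desitransfer.

```python
def empty_rsync_sketch(out):
    """Hypothetical: return True if verbose rsync output lists no files."""
    # Assumption: lines rsync prints whether or not anything is transferred.
    boilerplate = ('receiving incremental file list',
                   'sending incremental file list',
                   'sent ', 'total size is', 'created directory')
    for line in out.splitlines():
        line = line.strip()
        if not line or line.startswith(boilerplate):
            continue
        if line.endswith('/'):
            # A directory entry on its own is not a file to transfer.
            continue
        return False
    return True
```

For example, output containing only the header and the "sent ... / total size is ..." summary is treated as empty.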

desitransfer.common.ensure_scratch(directories)[source]

Try an alternate temporary directory if the primary temporary directory is unavailable.

Parameters:

directories (list) – A list of candidate directories.

Returns:

The first available temporary directory found.

Return type:

str

desitransfer.common.exclude_years(start_year)[source]

Generate rsync --exclude statements of the form --exclude 2020*.

Parameters:

start_year (int) – First year to exclude.

Returns:

A list suitable for appending to a command.

Return type:

list
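A sketch of the list exclude_years() might return, assuming each year from start_year through the current year becomes a separate '--exclude', 'YYYY*' pair. exclude_years_sketch() and its end_year parameter are hypothetical; end_year exists only to make the output reproducible.

```python
import datetime


def exclude_years_sketch(start_year, end_year=None):
    """Hypothetical: rsync --exclude arguments for start_year..end_year."""
    if end_year is None:
        # Assumption: exclude through the current year.
        end_year = datetime.date.today().year
    arguments = []
    for year in range(start_year, end_year + 1):
        arguments += ['--exclude', f'{year}*']
    return arguments
```

The result can be spliced into a command list, e.g. ['rsync', '-rlvt'] + exclude_years_sketch(2020, 2022) + [source, destination].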

desitransfer.common.idle_time(start=8, end=12, tz=None)[source]

Determine whether we are in an idle time during the day.

Parameters:
  • start (int, optional) – Start time in hours.

  • end (int, optional) – End time in hours.

  • tz (str, optional) – Time zone to use.

Returns:

Number of seconds to wait until the end of the idle period. If outside the idle period, this number will be negative.

Return type:

int
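A sketch of the sign convention described above. It assumes the idle period is the half-open interval [start, end) in local time and that, outside it, the magnitude of the negative value is the time until the next idle period starts. idle_time_sketch() and its now parameter are hypothetical, and falling back to UTC when tz is None is also an assumption.

```python
import datetime
from zoneinfo import ZoneInfo


def idle_time_sketch(start=8, end=12, tz=None, now=None):
    """Hypothetical: seconds until the end of the idle period [start, end).

    Positive inside the idle period; negative outside it.
    """
    # Assumption: fall back to UTC when no time zone is given.
    zone = ZoneInfo(tz) if tz else datetime.timezone.utc
    now = datetime.datetime.now(zone) if now is None else now.astimezone(zone)
    period_start = now.replace(hour=start, minute=0, second=0, microsecond=0)
    period_end = now.replace(hour=end, minute=0, second=0, microsecond=0)
    if period_start <= now < period_end:
        return int((period_end - now).total_seconds())
    # Outside the idle period: count down, as a negative number, to the
    # next start, which is either later today or tomorrow.
    next_start = period_start
    if now >= period_start:
        next_start += datetime.timedelta(days=1)
    return -int((next_start - now).total_seconds())
```

With the defaults, 11:00 gives 3600 (one hour of idle time left) and 07:00 gives -3600 (one hour until the idle period begins).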

desitransfer.common.new_exposures(out)[source]

Scan rsync output for exposures to be transferred.

Parameters:

out (str) – Output from rsync.

Returns:

The unique exposure numbers detected in out.

Return type:

set
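A sketch of how exposure numbers could be pulled out of rsync output. It assumes rsync lists paths as NIGHT/EXPID/filename, with an eight-digit night and an eight-digit, zero-padded exposure number; new_exposures_sketch() is a hypothetical name.

```python
import re

# Assumption: paths look like 20200124/00012345/desi-00012345.fits.fz.
_exposure_path = re.compile(r'^(\d{8})/(\d{8})/')


def new_exposures_sketch(out):
    """Hypothetical: return the unique exposure numbers in rsync output."""
    exposures = set()
    for line in out.splitlines():
        match = _exposure_path.match(line.strip())
        if match is not None:
            exposures.add(match.group(2))
    return exposures
```

Because the result is a set, several files from the same exposure collapse to a single entry.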

desitransfer.common.rsync(s, d, test=False, config='dts', reverse=False)[source]

Set up rsync command.

Parameters:
  • s (str) – Source directory.

  • d (str) – Destination directory.

  • test (bool, optional) – If True, add --dry-run to the command.

  • config (str, optional) – Pass this configuration to the ssh command.

  • reverse (bool, optional) – If True, attach config to d instead of s.

Returns:

A list suitable for passing to subprocess.Popen.

Return type:

list
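A sketch of the command list rsync() might build. It assumes config is an ssh host alias that is prefixed to the remote path as host:path, as in the dts:/exposures/... catch-up command in desitransfer.nightwatch, and borrows the -rlvt flags from that same command. rsync_sketch() is hypothetical; the real function may use different flags.

```python
def rsync_sketch(s, d, test=False, config='dts', reverse=False):
    """Hypothetical: build an rsync command list for subprocess.Popen."""
    # Flags borrowed from the nightwatch catch-up example.
    command = ['rsync', '-rlvt']
    if test:
        command.append('--dry-run')
    if reverse:
        # Push: the remote host alias is attached to the destination.
        return command + [s, f'{config}:{d}']
    # Pull: the remote host alias is attached to the source.
    return command + [f'{config}:{s}', d]
```

The returned list can be passed unchanged to subprocess.Popen, which avoids any shell quoting issues.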

desitransfer.common.stamp(zone='US/Pacific')[source]

Simple timestamp.

Parameters:

zone (str, optional) – Operational timezone.

Returns:

A nicely-formatted timestamp.

Return type:

str

desitransfer.common.today()[source]

Today’s date in DESI “NIGHT” format, YYYYMMDD.

This formulation, with the offset 7/24+0.5, is inherited from previous nightwatch transfer scripts.

desitransfer.common.yesterday()[source]

Yesterday’s date in DESI “NIGHT” format, YYYYMMDD.
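One way to read the 7/24+0.5 offset, assuming it is a number of days subtracted from the current UTC time: 7/24 of a day is the 7-hour MST offset from UTC, and the extra half day moves the date boundary from midnight to noon MST, so a whole observing night shares one NIGHT value. today_sketch() and yesterday_sketch() are hypothetical and accept an optional UTC time so the rollover can be checked.

```python
import datetime


def today_sketch(now_utc=None):
    """Hypothetical: DESI NIGHT (YYYYMMDD) for a UTC time."""
    if now_utc is None:
        now_utc = datetime.datetime.now(datetime.timezone.utc)
    # Subtract 7 hours (UTC to MST) plus half a day, so that the
    # NIGHT changes at 12:00 MST rather than at midnight.
    return (now_utc - datetime.timedelta(days=7/24 + 0.5)).strftime('%Y%m%d')


def yesterday_sketch(now_utc=None):
    """Hypothetical: the NIGHT before today_sketch(now_utc)."""
    if now_utc is None:
        now_utc = datetime.datetime.now(datetime.timezone.utc)
    return (now_utc - datetime.timedelta(days=1 + 7/24 + 0.5)).strftime('%Y%m%d')
```

Under this reading, 2020-01-24 18:59 UTC (11:59 MST) still belongs to NIGHT 20200123, while 19:00 UTC (12:00 MST) starts NIGHT 20200124.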

desitransfer.daemon

Entry point for desi_transfer_daemon.

class desitransfer.daemon.TransferDaemon(options)[source]

Manage data transfer configuration, options, and operations.

Parameters:

options (argparse.Namespace) – The parsed command-line options.

_configure_log(debug)[source]

Re-configure the default logger returned by desiutil.log.

Parameters:

debug (bool) – If True set the log level to DEBUG.

backup(d, night, status)[source]

Final sync and backup for a specific night.

Parameters:
Returns:

True indicates the backup ran to completion and the transfer status should be updated to reflect that.

Return type:

bool

Notes

  • 12:00 MST = 19:00 UTC, plus one hour just to be safe, so after 20:00 UTC.

catchup(d, night, status, backup=False)[source]

Do a “catch-up” transfer to catch delayed files in the morning, rather than at noon.

Parameters:

Notes

  • 07:00 MST = 14:00 UTC.

  • This script can do nothing about exposures that were never linked into the DTS area at KPNO in the first place.
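The UTC times in the backup() and catchup() notes follow from treating MST, as the notes do, as a fixed offset of seven hours behind UTC. The helpers below are hypothetical and only check that arithmetic; they are not the daemon's actual scheduling logic.

```python
import datetime

# Assumption, as in the notes: MST is a fixed UTC-7 offset.
MST = datetime.timezone(datetime.timedelta(hours=-7), 'MST')


def mst_to_utc_hour(hour):
    """Return the UTC hour corresponding to hour:00 MST."""
    local = datetime.datetime(2024, 1, 1, hour, tzinfo=MST)
    return local.astimezone(datetime.timezone.utc).hour


def backup_window_open(now_utc):
    """Hypothetical: True after 12:00 MST plus a one-hour safety margin."""
    return now_utc.hour >= mst_to_utc_hour(12) + 1


def catchup_window_open(now_utc):
    """Hypothetical: True after 07:00 MST."""
    return now_utc.hour >= mst_to_utc_hour(7)
```

mst_to_utc_hour(12) is 19 and mst_to_utc_hour(7) is 14, matching the 19:00 UTC and 14:00 UTC figures above.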

checksum(checksum_file, status)[source]

Verify checksum associated with checksum_file and report status.

The status is reported via log messages and messages passed to the status object, not via a return value.

Parameters:

checksum_lock()[source]

See if checksums are being computed at KPNO.

Returns:

True if checksums are being computed.

Return type:

bool

directory(d)[source]

Data transfer operations for a single destination directory.

Parameters:

d (collections.namedtuple()) – Configuration for the destination directory.

exposure(d, link, status)[source]

Data transfer operations for a single exposure.

This method will unconditionally install an exposure directory in the destination, regardless of any transfer or checksum errors.

Parameters:

hpss_status()[source]

Check HPSS availability.

Returns:

True if HPSS is available.

Return type:

bool

transfer()[source]

Loop over and transfer all configured directories.

desitransfer.daemon._options()[source]

Parse command-line options for desi_transfer_daemon.

Returns:

The parsed command-line options.

Return type:

argparse.Namespace

desitransfer.daemon._popen(command)[source]

Simple wrapper for subprocess.Popen to avoid repeated code.

Parameters:

command (list) – Command to pass to subprocess.Popen.

Returns:

The returncode, standard output and standard error.

Return type:

tuple
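A minimal sketch of a wrapper with the documented return value, using only the standard subprocess.Popen and communicate() calls. popen_sketch() is hypothetical; decoding the output as text is an assumption.

```python
import subprocess


def popen_sketch(command):
    """Hypothetical: run command; return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    # communicate() waits for the process and drains both pipes, which
    # avoids deadlocks when the command writes a lot of output.
    out, err = proc.communicate()
    return (proc.returncode, out, err)
```

A typical call would be status, out, err = popen_sketch(rsync_command), after which out can be scanned for new files.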

desitransfer.daemon.lock_directory(directory, test=False)[source]

Set a directory and its contents read-only.

Parameters:
  • directory (str) – Directory to lock.

  • test (bool, optional) – If True, only print the commands.

desitransfer.daemon.main()[source]

Entry point for desi_transfer_daemon.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

desitransfer.daemon.rsync_night(source, destination, night, test=False)[source]

Run an rsync command on an entire night, for example, to pick up delayed files.

Parameters:
  • source (str) – Source directory.

  • destination (str) – Destination directory.

  • night (str) – Night directory.

  • test (bool, optional) – If True, only print the commands.

desitransfer.daemon.unlock_directory(directory, test=False)[source]

Set a directory and its contents user-writeable.

Parameters:
  • directory (str) – Directory to unlock.

  • test (bool, optional) – If True, only print the commands.
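A pure-Python sketch of lock_directory() and unlock_directory(). The documented test flag ("only print the commands") suggests the real functions run shell commands; this sketch calls os.chmod instead and prints an equivalent chmod command in test mode. Locking removes every write bit; unlocking adds only the owner write bit, which is one reading of "user-writeable". All names here are hypothetical.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def _walk_paths(directory):
    """Yield directory and every file and subdirectory below it."""
    yield directory
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            yield os.path.join(root, name)


def _chmod_tree(directory, change, test=False):
    for path in _walk_paths(directory):
        if os.path.islink(path):
            continue  # os.chmod would follow the link; leave links alone
        new_mode = change(stat.S_IMODE(os.stat(path).st_mode))
        if test:
            print(f'chmod {new_mode:o} {path}')
        else:
            os.chmod(path, new_mode)


def lock_directory_sketch(directory, test=False):
    """Hypothetical: make directory and its contents read-only."""
    _chmod_tree(directory, lambda mode: mode & ~WRITE_BITS, test)


def unlock_directory_sketch(directory, test=False):
    """Hypothetical: make directory and its contents user-writable."""
    _chmod_tree(directory, lambda mode: mode | stat.S_IWUSR, test)
```

Walking a locked tree still works, because listing a directory needs read and execute permission, not write permission.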

desitransfer.daemon.verify_checksum(checksum_file)[source]

Verify checksums supplied with the raw data.

Parameters:

checksum_file (str) – The checksum file.

Returns:

A string that describes the errors encountered while verifying the checksum. In addition to mismatches, there can be missing files, extraneous files, etc. An empty string indicates no errors.

Return type:

str
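A sketch of the three kinds of error listed above (mismatches, missing files, extraneous files). It assumes the checksum file uses the sha256sum format ("<hex digest>  <file name>" per line) and sits in the same directory as the files it describes. verify_checksum_sketch() and its error wording are hypothetical.

```python
import hashlib
import os


def verify_checksum_sketch(checksum_file):
    """Hypothetical: compare a sha256sum-style file with its directory.

    Returns an empty string on success, else a description of the errors.
    """
    directory = os.path.dirname(os.path.abspath(checksum_file))
    expected = {}
    with open(checksum_file) as fh:
        for line in fh:
            if line.strip():
                digest, name = line.split(maxsplit=1)
                # A leading '*' marks binary mode in sha256sum output.
                expected[name.strip().lstrip('*')] = digest.lower()
    present = set(os.listdir(directory)) - {os.path.basename(checksum_file)}
    errors = []
    for name in sorted(expected):
        if name not in present:
            errors.append(f'{name} is missing.')
            continue
        h = hashlib.sha256()
        with open(os.path.join(directory, name), 'rb') as fh:
            for block in iter(lambda: fh.read(1 << 20), b''):
                h.update(block)
        if h.hexdigest() != expected[name]:
            errors.append(f'Checksum mismatch for {name}.')
    for name in sorted(present - set(expected)):
        errors.append(f'{name} is not listed in the checksum file.')
    return ' '.join(errors)
```

Reading files in 1 MiB blocks keeps memory use flat even for large raw data files.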

desitransfer.daily

Entry point for desi_daily_transfer.

class desitransfer.daily.DailyDirectory(source, destination, extra=[], dirlinks=False)[source]

Simple object to hold daily transfer configuration.

Parameters:
  • source (str) – Source directory.

  • destination (str) – Destination directory.

  • extra (list, optional) – Extra rsync arguments to splice into command.

  • dirlinks (bool, optional) – If True, convert source links into linked directory.

lock()[source]

Make a directory read-only.

permission()[source]

Set permissions for DESI collaboration access.

In theory this should not change any permissions set by lock().

Returns:

The status returned by fix_permissions.sh.

Return type:

int

transfer(permission=True)[source]

Data transfer operations for a single destination directory.

Parameters:

permission (bool, optional) – If True, set permissions for DESI collaboration access.

Returns:

The status returned by rsync.

Return type:

int

desitransfer.daily._config(timeframe)[source]

Wrap configuration so that module can be imported without environment variables set.

Parameters:

timeframe (str) – Return the set of directories associated with timeframe.

Returns:

A list of directories to transfer.

Return type:

list

desitransfer.daily._options()[source]

Parse command-line options for desi_daily_transfer.

Returns:

The parsed command-line options.

Return type:

argparse.Namespace

desitransfer.daily.main()[source]

Entry point for desi_daily_transfer.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

desitransfer.nightwatch

Sync KPNO nightwatch. Due to differences in timing and directory structure, this is kept separate from the raw data transfer daemon.

A cronjob running as desi@dtn01.nersc.gov ensures that this daemon is running.

Catchup on a specific night:

NIGHT=20200124 && rsync -rlvt --exclude-from ${DESITRANSFER}/py/desitransfer/data/desi_nightwatch_transfer_exclude.txt \
    dts:/exposures/nightwatch/${NIGHT}/ /global/cfs/cdirs/desi/spectro/nightwatch/kpno/${NIGHT}/

By-hand startup sequence (bash shell):

source /global/common/software/desi/desi_environment.sh datatran
module load desitransfer
nohup nice -19 ${DESITRANSFER}/bin/desi_nightwatch_transfer &> /dev/null &
tail -f ${DESI_ROOT}/spectro/nightwatch/desi_nightwatch_transfer.log

desitransfer.nightwatch._configure_log(debug)[source]

Re-configure the default logger returned by desiutil.log.

Parameters:

debug (bool) – If True set the log level to DEBUG.

desitransfer.nightwatch._options()[source]

Parse command-line options for desi_nightwatch_transfer.

Returns:

The parsed command-line options.

Return type:

argparse.Namespace

desitransfer.nightwatch.main()[source]

Entry point for desi_nightwatch_transfer.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

desitransfer.spacewatch

Download Spacewatch data from a server at KPNO.

Notes

  • Spacewatch data rolls over at 00:00 UTC = 17:00 MST.

  • The data relevant to the previous night, say 20231030, would be downloaded on the morning of 20231031.

  • Therefore, to obtain all data of interest, download the files that have already appeared in 2023/10/31/ (Spacewatch directory structure) on the morning after DESI night 20231030.
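The mapping in these notes, from a DESI night to the Spacewatch directory for the following UTC date, can be written as a small helper. spacewatch_directory() is a hypothetical name; the YYYY/MM/DD/ layout is taken from the 2023/10/31/ example above.

```python
import datetime


def spacewatch_directory(night):
    """Hypothetical: Spacewatch YYYY/MM/DD/ directory for a DESI NIGHT.

    Spacewatch rolls over at 00:00 UTC (17:00 MST), so data for DESI
    night YYYYMMDD appear under the following calendar date.
    """
    date = datetime.datetime.strptime(night, '%Y%m%d').date()
    return (date + datetime.timedelta(days=1)).strftime('%Y/%m/%d/')
```

Using a date object rather than string arithmetic handles month and year boundaries, e.g. night 20231231 maps to 2024/01/01/.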

class desitransfer.spacewatch.SpacewatchHTMLParser(*args, **kwargs)[source]

Extract JPG files from an HTML index.

handle_starttag(tag, attrs)[source]

Process HTML tags, in this case targeting anchor tags.

desitransfer.spacewatch._options()[source]

Parse command-line options for desi_spacewatch_transfer.

Returns:

The parsed command-line options.

Return type:

argparse.Namespace

desitransfer.spacewatch.download_jpg(files, destination, overwrite=False, test=False)[source]

Download files to destination.

Parameters:
  • files (list) – A list of URLs to download.

  • destination (str) – A local directory to hold the files.

  • overwrite (bool, optional) – If True, overwrite any existing files.

  • test (bool, optional) – If True, do not download any files.

Returns:

The number of files downloaded.

Return type:

int
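A minimal sketch of the download loop, using urllib.request.urlopen and shutil.copyfileobj from the standard library. It assumes each local file is named after the last component of its URL and that files skipped because they already exist, or because test is set, are not counted. download_jpg_sketch() is hypothetical.

```python
import os
import shutil
import urllib.request


def download_jpg_sketch(files, destination, overwrite=False, test=False):
    """Hypothetical: fetch each URL in files into destination.

    Returns the number of files actually written.
    """
    downloaded = 0
    for url in files:
        # Name the local copy after the last component of the URL.
        local = os.path.join(destination, url.rsplit('/', 1)[-1])
        if os.path.exists(local) and not overwrite:
            continue
        if test:
            print(f'Would download {url} to {local}.')
            continue
        with urllib.request.urlopen(url) as response, open(local, 'wb') as fh:
            shutil.copyfileobj(response, fh)
        downloaded += 1
    return downloaded
```

Because urlopen also accepts file:// URLs, the loop can be exercised without network access.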

desitransfer.spacewatch.jpg_list(index)[source]

Obtain a list of JPEG files from an HTML index.

Parameters:

index (str) – The URL of an HTML index.

Returns:

A list of JPEG files found in index. The index URL is attached to the file names.

Return type:

list
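A sketch of how an html.parser.HTMLParser subclass can collect JPEG links from anchor tags, in the spirit of SpacewatchHTMLParser and jpg_list(). JPGLinkParser and jpg_list_from_html() are hypothetical: the latter takes the HTML text directly instead of fetching index, and it attaches the index URL with urllib.parse.urljoin, which is an assumption about how the real code joins the two.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class JPGLinkParser(HTMLParser):
    """Hypothetical stand-in for SpacewatchHTMLParser."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.jpg_files = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links of interest.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and value.lower().endswith('.jpg'):
                self.jpg_files.append(value)


def jpg_list_from_html(index, html):
    """Hypothetical: like jpg_list(), but given the HTML text."""
    parser = JPGLinkParser()
    parser.feed(html)
    parser.close()
    return [urljoin(index, f) for f in parser.jpg_files]
```

urljoin only appends a relative name when index ends with '/', so directory URLs such as .../2023/10/31/ should keep their trailing slash.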

desitransfer.spacewatch.main()[source]

Entry point for desi_spacewatch_transfer.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

desitransfer.status

Entry point for desi_transfer_status.

class desitransfer.status.TransferStatus(directory, install=False, year=None)[source]

Simple object for interacting with desitransfer status reports.

Parameters:
  • directory (str) – Retrieve and store JSON-encoded transfer status data in directory.

  • install (bool, optional) – If True, install HTML and JS files.

  • year (str or int, optional) – Update records belonging to year. If not set, the current year is assumed.

_handle_malformed()[source]

Handle malformed JSON files.

This function will save the malformed file to a .bad file for later analysis, and write an empty array to a new status file.

find(night, exposure=None, stage=None)[source]

Find status entries that match night, etc.

Parameters:
  • night (str) – Night of observation.

  • exposure (str, optional) – Exposure number.

  • stage (str, optional) – Stage of data transfer (‘rsync’, ‘checksum’, ‘backup’, …).

Returns:

dict or list – If only night is set, return a dict containing information on all exposures for that night. If stage is set but exposure is not, return a dict keyed by exposure containing all data matching stage for that night. If exposure is set but stage is not, return a list of indexes pointing to all data about that exposure. If both exposure and stage are set, return a list of indexes pointing to the data for exposure, filtered on stage.

Return type:

dict or list

Raises:

KeyError – If night is not yet defined.

update(night, exposure, stage, failure=False)[source]

Update the transfer status.

Parameters:
  • night (str) – Night of observation.

  • exposure (str) – Exposure number.

  • stage (str) – Stage of data transfer (‘rsync’, ‘checksum’, ‘backup’, …).

  • failure (bool, optional) – Indicate failure.

Returns:

The number of updates performed.

Return type:

int

desitransfer.status._options()[source]

Parse command-line options for desi_transfer_status.

Returns:

The parsed command-line options.

Return type:

argparse.Namespace

desitransfer.status.main()[source]

Entry point for desi_transfer_status.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

desitransfer.tucson

Entry point for desi_tucson_transfer.

desitransfer.tucson._configure_log(debug)[source]

Re-configure the default logger returned by desiutil.log.

Parameters:

debug (bool) – If True set the log level to DEBUG.

desitransfer.tucson._get_proc(directories, exclude, src, dst, options, nice=5)[source]

Prepare the next download directory for processing.

Parameters:
  • directories (list) – A list of directories to process.

  • exclude (set) – Do not process directories in this set.

  • src (str) – Root source directory.

  • dst (str) – Root destination directory.

  • options (argparse.Namespace) – The parsed command-line options.

  • nice (int, optional) – Lower-priority transfers will be run with this value passed to os.nice(); default 5.

Returns:

A tuple containing information about the process.

Return type:

tuple

desitransfer.tucson._options()[source]

Parse command-line options for desi_tucson_transfer.

Returns:

The parsed command-line options.

Return type:

argparse.Namespace

desitransfer.tucson._rsync(src, dst, d, checksum=False)[source]

Construct an rsync command to transfer d.

Parameters:
  • src (str) – Root source directory.

  • dst (str) – Root destination directory.

  • d (str) – Directory to transfer relative to src, dst.

  • checksum (bool, optional) – If True, pass the --checksum option to rsync.

desitransfer.tucson.main()[source]

Entry point for desi_tucson_transfer.

Returns:

An integer suitable for passing to sys.exit().

Return type:

int

desitransfer.tucson.running(pid_file)[source]

Test for a duplicate process already running.

Parameters:

pid_file (str) – Name of file containing a process id.

Returns:

True if a duplicate process is detected.

Return type:

bool
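A common way to detect a duplicate process from a PID file, sketched with os.kill(pid, 0), which checks that a process exists without sending it a signal. The behavior of writing the current process id when no live duplicate is found, and of treating the current process's own id as not a duplicate, is an assumption; running_sketch() is hypothetical.

```python
import os


def running_sketch(pid_file):
    """Hypothetical: True if pid_file names another live process.

    Otherwise record the current process id in pid_file and return False.
    """
    if os.path.exists(pid_file):
        with open(pid_file) as fh:
            try:
                pid = int(fh.read().strip())
            except ValueError:
                pid = None  # unreadable contents: treat as stale
        if pid is not None and pid != os.getpid():
            try:
                os.kill(pid, 0)  # signal 0: existence check only
            except ProcessLookupError:
                pass  # stale PID file, the process is gone
            except PermissionError:
                return True  # process exists but belongs to another user
            else:
                return True
    with open(pid_file, 'w') as fh:
        fh.write(str(os.getpid()))
    return False
```

A stale PID can in principle be reused by an unrelated process, so this check can report a false duplicate; it is a heuristic rather than a lock.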