Base library for the science archive and ingester of an observatory control system
OCS Archive Library
A base library for the Science Archive and Ingester library that supports generalized input file types, generalized data stores, and shared configuration items. The library is configurable via environment variables; further customization is possible by subclassing the DataFile class for a specific file type, or the FileStore class for a specific file storage scheme.
Prerequisites
Optional prerequisites may be skipped for reduced functionality.
- Python >= 3.6
Installation
It is highly recommended that you install and run your Python code inside a dedicated Python virtual environment.
Add the ocs_archive package to your Python environment:
(venv) $ pip install ocs_archive
Configuration
Environment Variables
Group | Variable | Description | Default |
---|---|---|---|
FileStore | FILESTORE_TYPE | Type of file storage to use. Options are dummy, local, or s3. | dummy |
FileStore | FILESYSTEM_STORAGE_ROOT_DIR | If using local file storage, the directory on the local filesystem to use as the root of the storage directories | empty string |
FileStore | FILESYSTEM_STORAGE_BASE_URL | If using local file storage, the base URL at which those files will be hosted | http://0.0.0.0/ |
Observation Portal | OBSERVATION_PORTAL_BASE_URL | Base URL for the Observation Portal | empty string |
Observation Portal | OBSERVATION_PORTAL_API_TOKEN | API token used to authenticate with the Observation Portal | empty string |
AWS | BUCKET | If using s3 file storage, the AWS S3 bucket name | testbucket |
AWS | AWS_ACCESS_KEY_ID | If using s3 file storage, the AWS Access Key with write access to the S3 bucket | empty string |
AWS | AWS_SECRET_ACCESS_KEY | If using s3 file storage, the AWS Secret Access Key | empty string |
AWS | AWS_DEFAULT_REGION | If using s3 file storage, the AWS S3 default region | empty string |
AWS | S3_ENDPOINT_URL | If using s3 file storage, the endpoint URL for connecting to s3. This can be modified to connect to a local instance of s3. | "http://s3.us-west-2.amazonaws.com" |
AWS | S3_DAYS_TO_IA_STORAGE | If using s3 file storage, the age in days after which data will be ingested directly to Infrequent Access (IA) storage instead of normal storage | 60 |
DataFile | FILETYPE_MAPPING_OVERRIDES | A string literal representation of a Python dictionary mapping file extensions to dotpaths of Python classes which subclass the DataFile class. This appends to and overrides the default list in the FileFactory class. | "{}" |
DataFile | HEADER_BLACKLIST | Comma delimited string list of header values that should be removed from the data before storage in the archive. This can be overridden when instantiating a DataFile as well as via environment variable. | HISTORY,COMMENT |
DataFile | REQUIRED_HEADERS | Comma delimited string list of header values that must be present in the DataFile. This can be overridden when instantiating a DataFile as well as via environment variable. | |
DataFile | NULL_HEADER_VALUES | Comma delimited string list of header values that should be turned into None or empty keys. This only applies to the FitsFile class. | N/A,UNSPECIFIED,UNKNOWN |
DataFile | CALIBRATION_TYPES | Comma delimited string list of configuration types which represent calibration images. This is used to automatically set a calibration image's public date to the observation date if it is not present. | BIAS,DARK,SKYFLAT,EXPERIMENTAL |
DataFile | PUBLIC_PROPOSAL_TAGS | Comma delimited string list of Observation Portal proposal tags that denote data from a proposal as public. If public, the public date will be set to the observation date. The ocs_archive will fall back to the list of PUBLIC_PROPOSALS if any of a proposal's tags are not found in this list. | public |
DataFile | PRIVATE_PROPOSAL_TAGS | Comma delimited string list of Observation Portal proposal tags that denote data from a proposal as private. If private, the public date will be set to 999 years in the future. The ocs_archive will fall back to the list of PRIVATE_PROPOSALS if any of a proposal's tags are not found in this list. | private,internal |
DataFile | PUBLIC_PROPOSALS | Comma delimited string list of proposal IDs which represent public proposals. This is used to set the public date of observations under those proposals to the observation date if it is not present. A proposal matches if any character group appears anywhere within the proposal ID. | EPO,calib,standard,pointing |
DataFile | PRIVATE_PROPOSALS | Comma delimited string list of proposal IDs which represent private proposals. This is used to set the public date of observations under those proposals to 999 years in the future. A proposal matches if any character group appears anywhere within the proposal ID. | LCOEngineering |
DataFile | DAYS_UNTIL_PUBLIC | The number of days until user data becomes public by default. This is added to the observation date to get the public date if one is not specified with the data. | 365 |
DataFile | PRIVATE_FILE_TYPES | Comma delimited string list of file name fragments which denote a private data file. If any of the fragments are found within the filename, the public date will be set 999 years in the future for this file. | -t00,-x00 |
Header Mapping | OBSERVATION_DATE_KEY | The key in which to find an ISO formatted observation date within the header data | DATE-OBS |
Header Mapping | OBSERVATION_DAY_KEY | The key in which to find an ISO formatted observation day within the header data | DAY-OBS |
Header Mapping | OBSERVATION_END_TIME_KEY | The key in which to find an ISO formatted observation end date within the header data | UTSTOP |
Header Mapping | REDUCTION_LEVEL_KEY | The key in which to find a numeric reduction level within the header data. Raw is 0, while anything non-zero is some form of processing. | RLEVEL |
Header Mapping | EXPOSURE_TIME_KEY | The key in which to find the exposure time in fractional seconds in the header data | EXPTIME |
Header Mapping | INSTRUMENT_ID_KEY | The key in which to find the instrument ID in the header data | INSTRUME |
Header Mapping | SITE_ID_KEY | The key in which to find the site ID in the header data | SITEID |
Header Mapping | TELESCOPE_ID_KEY | The key in which to find the telescope ID in the header data | TELID |
Header Mapping | OBSERVATION_ID_KEY | The key in which to find the observation ID in the header data | BLKUID |
Header Mapping | CONFIGURATION_ID_KEY | The key in which to find the configuration ID in the header data | MOLUID |
Header Mapping | PRIMARY_FILTER_KEY | The key in which to find the primary filter value in the header data | FILTER |
Header Mapping | TARGET_NAME_KEY | The key in which to find the target object's name in the header data | OBJECT |
Header Mapping | REQUEST_ID_KEY | The key in which to find the request ID in the header data | REQNUM |
Header Mapping | REQUESTGROUP_ID_KEY | The key in which to find the request group ID in the header data | TRACKNUM |
Header Mapping | CONFIGURATION_TYPE_KEY | The key in which to find the configuration type in the header data | OBSTYPE |
Header Mapping | PROPOSAL_ID_KEY | The key in which to find the proposal ID in the header data | PROPID |
Header Mapping | CATALOG_TARGET_FRAME_KEY | The key in which to find the base filename of the catalog file for the target of this observation in the header data | L1IDCAT |
Header Mapping | PUBLIC_DATE_KEY | The key in which to find the ISO formatted date on which this data should become available to the public in the header data | L1PUBDAT |
Header Mapping | RELATED_FRAME_KEYS | Comma delimited list of keys in the header data to look for related frame base filenames for this observation | L1IDBIAS,L1IDDARK,L1IDFLAT,L1IDSHUT,L1IDMASK,L1IDFRNG,L1IDCAT,L1IDARC,L1ID1D,L1ID2D,L1IDSUM,TARFILE,ORIGNAME,ARCFILE,FLATFILE,GUIDETAR |
Header Mapping | RADIUS_KEY | The key in which to find FOV radius for a circular FOV, used to calculate WCS polygon if specified. Unit of arcseconds. | RADIUS |
Header Mapping | RA_KEY | The key in which to find FOV center RA for a circular FOV, used to calculate WCS polygon if specified. Unit of hour angle. | RA |
Header Mapping | DEC_KEY | The key in which to find FOV center DEC for a circular FOV, used to calculate WCS polygon if specified. Unit of decimal degrees. | DEC |
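The public-date rules described in the DataFile group of the table can be illustrated with a minimal sketch. This is not the library's implementation: the function name sketch_public_date, the order in which the rules are checked, and the approximation of "999 years" as 999 * 365 days counted from the observation date are all assumptions made for illustration. Proposal tags are left out because they require a lookup against the Observation Portal.

```python
from datetime import datetime, timedelta

# Default values copied from the environment variable table.
PUBLIC_PROPOSALS = ["EPO", "calib", "standard", "pointing"]
PRIVATE_PROPOSALS = ["LCOEngineering"]
PRIVATE_FILE_TYPES = ["-t00", "-x00"]
CALIBRATION_TYPES = ["BIAS", "DARK", "SKYFLAT", "EXPERIMENTAL"]
DAYS_UNTIL_PUBLIC = 365

# Assumption: "999 years in the future" approximated as 999 * 365 days.
FAR_FUTURE = timedelta(days=999 * 365)


def sketch_public_date(observation_date, proposal_id, configuration_type, filename):
    """Hypothetical helper showing one possible precedence of the rules."""
    # Private file name fragments match anywhere within the filename.
    if any(fragment in filename for fragment in PRIVATE_FILE_TYPES):
        return observation_date + FAR_FUTURE
    # Proposal matching is by substring of the proposal ID.
    if any(group in proposal_id for group in PRIVATE_PROPOSALS):
        return observation_date + FAR_FUTURE
    # Calibration images and public proposals become public immediately.
    if configuration_type in CALIBRATION_TYPES:
        return observation_date
    if any(group in proposal_id for group in PUBLIC_PROPOSALS):
        return observation_date
    # Everything else becomes public after the default proprietary period.
    return observation_date + timedelta(days=DAYS_UNTIL_PUBLIC)


obs = datetime(2020, 1, 1)
science_date = sketch_public_date(obs, "LCO2020A-001", "EXPOSE", "frame-e00.fits")
bias_date = sketch_public_date(obs, "LCO2020A-001", "BIAS", "frame-b00.fits")
```

In this sketch a public date already present in the header (see PUBLIC_DATE_KEY) would take priority over all of these rules, since the table describes them as applying only when the date is not present.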
Input File Format Configuration
The library is designed to be configured mostly through environment variables, but custom DataFile subclasses can be included and specified via an environment variable in order to support new and more complicated data formats. All data files must contain a minimum set of metadata in order to be ingested into the archive. This metadata is used to provide filtering and querying support within the archive. The mappings for the pieces of file metadata that should be specified are defined in the Header Mapping group of the environment variables above. The provided FitsFile class will work for normal or funpacked FITS files, provided you set up the Header Mapping environment variables with the correct mapping of observation concepts to header keys in your data format.
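As a minimal sketch of the configuration described above, the snippet below sets a Header Mapping variable and builds a FILETYPE_MAPPING_OVERRIDES value from Python. The extension ".myext", the dotpath "my_package.data_files.MyDataFile", and the header key "PROGID" are hypothetical placeholders, and the sketch assumes the variables are set before the library reads its settings. The round-trip through ast.literal_eval only demonstrates that the value is a valid Python dictionary literal; how the library itself parses the string is not shown here.

```python
import ast
import os

# Hypothetical mapping of a file extension to the dotpath of a DataFile subclass.
overrides = {".myext": "my_package.data_files.MyDataFile"}

# repr() of a dict of plain strings is a valid Python dictionary literal.
os.environ["FILETYPE_MAPPING_OVERRIDES"] = repr(overrides)

# Header Mapping: point an observation concept at the key used in your headers.
# "PROGID" is a made-up key standing in for your own data format.
os.environ["PROPOSAL_ID_KEY"] = "PROGID"

# Sanity check that the string parses back into the same dictionary.
parsed = ast.literal_eval(os.environ["FILETYPE_MAPPING_OVERRIDES"])
```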
File Storage Format Configuration
The library supports three types of file storage by default, selected via the FILESTORE_TYPE environment variable. The dummy type is used only for testing and development and doesn't actually store any files. The local type saves files into a locally mounted directory. It requires you to run a separate file server on that directory so that links to download the files can be resolved. This can be as simple as running python -m http.server --directory=/my/root/dir. The directory could alternatively be served using any other file server, such as node's http-server. The third option is s3, which expects to connect to Amazon's S3 or to something with the same interface, such as minio. S3 file storage requires the BUCKET, AWS_*, and S3_* environment variables to be set. More storage types can be added by forking the library and subclassing the FileStore class.
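The relationship between the storage type and its environment variables can be checked before starting an ingest with a small helper like the one below. This is a sketch, not part of ocs_archive: the function name missing_filestore_settings is hypothetical, the AWS_* and S3_* groups are expanded using the names from the environment variable table, and the helper treats every listed variable as needing an explicit value even though some have defaults.

```python
import os

# Variables each storage type relies on, per the table and the text above.
REQUIRED_BY_TYPE = {
    "dummy": [],
    "local": ["FILESYSTEM_STORAGE_ROOT_DIR", "FILESYSTEM_STORAGE_BASE_URL"],
    "s3": [
        "BUCKET",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_DEFAULT_REGION",
        "S3_ENDPOINT_URL",
    ],
}


def missing_filestore_settings(env=None):
    """Return the variables that are unset or empty for the chosen FILESTORE_TYPE."""
    env = os.environ if env is None else env
    store_type = env.get("FILESTORE_TYPE", "dummy")  # dummy is the documented default
    if store_type not in REQUIRED_BY_TYPE:
        raise ValueError(f"Unknown FILESTORE_TYPE: {store_type!r}")
    return [name for name in REQUIRED_BY_TYPE[store_type] if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {
    "FILESTORE_TYPE": "local",
    "FILESYSTEM_STORAGE_ROOT_DIR": "/my/root/dir",
}
missing = missing_filestore_settings(example_env)
```

Passing a plain dictionary keeps the check easy to test; calling it with no argument inspects the process environment instead.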
Development
Running the Tests
After cloning this project, run the following from the project root inside your virtual environment:
(venv) $ pip install -e .[tests]
(venv) $ pytest