CLI and API to allow users to search, download from and upload to HEPData

HEPData-CLI

About

Command line interface (CLI) and application program interface (API) to allow users to search, download from and upload to HEPData.

The code is compatible with Python 3.7 or greater. It was inspired by arxiv-cli (unmaintained since 2018).

Installation (for users)

Install from PyPI using pip:

$ pip install --user hepdata-cli
$ hepdata-cli --help

Installation (for developers)

Install from GitHub in a virtual environment:

$ git clone https://github.com/HEPData/hepdata-cli.git
$ cd hepdata-cli
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ pip install -e '.[tests]'
(venv) $ hepdata-cli --help
(venv) $ pytest --cov=hepdata_cli

Usage

You can use HEPData-CLI either as a command-line interface (CLI) to search, download and upload records from or to the HEPData database, or as a Python library that performs the same operations through its application program interface (API).

CLI

$ hepdata-cli [-v/--version, --help]
$ hepdata-cli [--verbose] find [QUERY] [-kw/--keyword KEYWORD] [-i/--ids IDTYPE]
$ hepdata-cli [--verbose] download [IDS] [-f/--file-format FORMAT] [-i/--ids IDTYPE] [-t/--table-name TABLE-NAME] [-d/--download-dir DOWNLOAD-DIR]
$ hepdata-cli [--verbose] fetch-names [IDS] [-i/--ids IDTYPE]
$ hepdata-cli [--verbose] upload [PATH-TO-FILE-ARCHIVE] [-e/--email YOUR-EMAIL] [-r/--recid RECORD-ID] [-i/--invitation-cookie COOKIE] [-s/--sandbox TRUE/FALSE] [-p/--password PASSWORD]

The command find searches the HEPData database for matches of QUERY. The advanced search syntax from the website can be used.

The command download downloads records from the database (see options below).

The command fetch-names returns the names of the data tables in the records whose ids are supplied.

The command upload uploads a file to the HEPData web site as either a sandbox or normal record.

The argument [-kw/--keyword KEYWORD] filters the search result dictionary for specific keywords. An exact match of the keyword is first attempted, otherwise partial matches are accepted.
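The exact-then-partial matching described above can be pictured with a small stand-alone sketch. This is not the code that HEPData-CLI runs: `filter_by_keyword` and the sample record are hypothetical, and "partial match" is assumed here to mean that the keyword is a substring of a key name.

```python
def filter_by_keyword(record, keyword):
    """Return the entries of `record` whose key matches `keyword`.

    An exact key match wins; only if there is none are keys that merely
    contain `keyword` accepted (the assumed meaning of "partial match").
    """
    if keyword in record:
        return {keyword: record[keyword]}
    return {key: value for key, value in record.items() if keyword in key}


# Hypothetical metadata record, invented purely for illustration.
sample = {"year": "2012", "collaborations": ["CMS"], "arxiv_id": "arXiv:0000.00000"}

print(filter_by_keyword(sample, "year"))  # exact match on the key "year"
print(filter_by_keyword(sample, "id"))    # no exact match, falls back to "arxiv_id"
```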

The argument [-i/--ids IDTYPE] accepts IDTYPE equal to arxiv, hepdata or inspire.

The argument [-f/--file-format FORMAT] accepts FORMAT equal to csv, root, yaml, yoda, yoda1, yoda.h5, or json. In the first six cases a .tar.gz archive is downloaded and unpacked as a directory, whereas in the last case a .json file is downloaded.

The argument [-t/--table-name TABLE-NAME] accepts a string giving the table name as input. In this case only the specified table is downloaded as a .csv, .root, .yaml, .yoda, .yoda1, .yoda.h5, or .json file.

The argument [-d/--download-dir DOWNLOAD-DIR] specifies the directory into which the files are downloaded. If not specified, the default download directory is ./hepdata-downloads.
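When the download options are driven from Python, it can help to check them before any request is made. The following is a minimal sketch under stated assumptions: `build_download_kwargs` is a hypothetical helper (not part of HEPData-CLI), the accepted values are copied from the option descriptions above, and the keyword names `file_format`, `ids`, `table_name` and `download_dir` are assumed to match the parameters of `client.download()`.

```python
ID_TYPES = ("arxiv", "hepdata", "inspire")
FILE_FORMATS = ("csv", "root", "yaml", "yoda", "yoda1", "yoda.h5", "json")


def build_download_kwargs(file_format, ids, table_name=None, download_dir=None):
    """Validate the download options and return them as keyword arguments."""
    if ids not in ID_TYPES:
        raise ValueError(f"ids must be one of {ID_TYPES}, got {ids!r}")
    if file_format not in FILE_FORMATS:
        raise ValueError(f"file_format must be one of {FILE_FORMATS}, got {file_format!r}")
    kwargs = {"file_format": file_format, "ids": ids}
    # Optional settings are only passed on when the caller supplied them,
    # so the library's own defaults stay in effect otherwise.
    if table_name is not None:
        kwargs["table_name"] = table_name
    if download_dir is not None:
        kwargs["download_dir"] = download_dir
    return kwargs


# "Table 1" is a made-up table name used only for this example.
kwargs = build_download_kwargs("csv", "inspire", table_name="Table 1")
print(kwargs)
# With a real client this would be used as: client.download(id_list, **kwargs)
```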

The argument [-e/--email YOUR-EMAIL] is the uploader's email, needed to associate the submission with their HEPData account.

The argument [-i/--invitation-cookie COOKIE] must be supplied for non-sandbox submissions. This can be found in the Uploader invitation email received at the beginning of the submission process.

The argument [-s/--sandbox TRUE/FALSE] is a boolean to decide whether to upload to the sandbox or not.

The argument [-p/--password PASSWORD] is the password for the uploader's HEPData account (you will be prompted for it if it is not specified). Warning: do not store your password unencrypted in any code intended for shared use.
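The rule that a non-sandbox upload needs an invitation cookie can be checked locally before anything is sent. This is a minimal sketch, not part of HEPData-CLI: `build_upload_kwargs` is a hypothetical helper, and the keyword names follow the upload calls shown in the examples of this README.

```python
def build_upload_kwargs(email, sandbox, recid=None, invitation_cookie=None):
    """Collect upload options, rejecting a non-sandbox upload without a cookie."""
    if not sandbox and invitation_cookie is None:
        raise ValueError("an invitation cookie is required for non-sandbox uploads")
    kwargs = {"email": email, "sandbox": sandbox}
    # Only forward the optional values that were actually given.
    if recid is not None:
        kwargs["recid"] = recid
    if invitation_cookie is not None:
        kwargs["invitation_cookie"] = invitation_cookie
    return kwargs


sandbox_kwargs = build_upload_kwargs("my@email.com", sandbox=True)
print(sandbox_kwargs)
# With a real client: client.upload('/path/to/TestHEPSubmission.tar.gz', **sandbox_kwargs)
```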

The output of hepdata-cli find can be passed to hepdata-cli download or hepdata-cli fetch-names, provided that an IDTYPE is specified for find (see Examples 4 and 5).

API

The same commands can be invoked through the API (in fact, the CLI is just a wrapper around the API).

from hepdata_cli.api import Client
client = Client(verbose=True)
client.find(query, keyword, ids)
client.download(id_list, file_format, ids, table_name, download_dir)
client.fetch_names(id_list, ids)
client.upload(path_to_file, email, recid, invitation_cookie, sandbox, password)

client.find() takes the keyword argument format, which selects the type of the returned value: str, list, set, or tuple. The default is str.
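As an illustration of the format argument, the sketch below asks for a set and turns it into a sorted list without duplicates. `find_unique_ids` is a hypothetical helper, not part of HEPData-CLI; it only assumes that the client passed in behaves like the Client described above and that the returned ids can be sorted.

```python
def find_unique_ids(client, query, id_type):
    """Return the ids matching `query` as a sorted list without duplicates.

    `client` is expected to behave like hepdata_cli.api.Client; requesting
    format=set leaves the de-duplication to the set type.
    """
    ids = client.find(query, ids=id_type, format=set)
    return sorted(ids)


# Typical use (requires hepdata-cli and network access, so it is not run here):
#   from hepdata_cli.api import Client
#   print(find_unique_ids(Client(), 'reactions:"P P --> LQ LQ"', "inspire"))
```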

Examples

Example 1 - a plain search:

$ hepdata-cli --verbose find 'reactions:"P P --> LQ LQ X"'

or equivalently

client.find('reactions:"P P --> LQ LQ X"')

matches a single entry and returns the full metadata dictionary.

Example 2 - search with keyword:

$ hepdata-cli --verbose find 'reactions:"P P --> LQ LQ"' -kw year

or equivalently

client.find('reactions:"P P --> LQ LQ"', keyword='year')

matches four entries and returns their publication years, as a dictionary.

Example 3 - search for ids of records:

$ hepdata-cli --verbose find 'reactions:"P P --> LQ LQ"' -i hepdata

or equivalently

client.find('reactions:"P P --> LQ LQ"', ids='hepdata')

matches four entries and returns their hepdata ids, as a plain list.

Example 4 - concatenate search with download using inspire ids:

$ hepdata-cli --verbose download $(hepdata-cli find 'reactions:"P P --> LQ LQ"' -i inspire) -i inspire -f csv

or equivalently

id_list = client.find('reactions:"P P --> LQ LQ"', ids='inspire')
downloads = client.download(id_list, ids='inspire', file_format='csv')
print(downloads)  # {'1222326': ['./hepdata-downloads/HEPData-ins1222326-v1-csv/Table1.csv', ...], ...}

downloads four .tar.gz archives containing csv files and unpacks them in the default ./hepdata-downloads directory. Using the API, a dictionary mapping ids to the downloaded files is returned.
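The returned dictionary can be post-processed with ordinary Python. Below is a minimal sketch that keeps only the paths with a given suffix for each record. `files_by_suffix` is a hypothetical helper, and the sample dictionary is a stand-in shaped like the printed value in the example above (the second file name is invented).

```python
from pathlib import Path


def files_by_suffix(downloads, suffix):
    """For each record id, keep only the downloaded paths ending in `suffix`."""
    return {
        record_id: [path for path in paths if Path(path).suffix == suffix]
        for record_id, paths in downloads.items()
    }


# Stand-in for the value returned by client.download(...); no files are read.
downloads = {
    "1222326": [
        "./hepdata-downloads/HEPData-ins1222326-v1-csv/Table1.csv",
        "./hepdata-downloads/HEPData-ins1222326-v1-csv/notes.txt",  # invented entry
    ],
}

csv_files = files_by_suffix(downloads, ".csv")
for record_id, paths in csv_files.items():
    print(record_id, len(paths), "csv file(s)")
```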

Example 5 - find table names in records:

$ hepdata-cli fetch-names $(hepdata-cli find 'reactions:"P P --> LQ LQ"' -i hepdata) -i hepdata

or equivalently

id_list = client.find('reactions:"P P --> LQ LQ"', ids='hepdata')
client.fetch_names(id_list, ids='hepdata')

returns all table names in the four matching records.

Example 6 - combine search with download from arxiv:

This example requires arxiv.py to be installed, which is easily done via:

$ pip install --user arxiv

Then,

import hepdata_cli
hepdata_client = hepdata_cli.Client()
id_list = hepdata_client.find('reactions:"P P --> LQ LQ X"', ids='arxiv', format=list)
print(id_list)  # ['1605.06035', '2101.11582', ...]

import arxiv
papers = arxiv.Client().results(arxiv.Search(id_list=id_list))
for paper in papers:
    paper.download_pdf()

downloads the PDF files from the arXiv.

Example 7 - upload record to the sandbox:

$ hepdata-cli upload /path/to/TestHEPSubmission.tar.gz -e my@email.com -s True

or equivalently

client.upload('/path/to/TestHEPSubmission.tar.gz', email='my@email.com', sandbox=True)

The uploaded submission can then be found from your sandbox. You will be prompted for the password associated with your HEPData account. If your account was created with CERN or ORCID authentication, you will first need to set a password.

Example 8 - replace a record in the sandbox:

$ hepdata-cli upload /path/to/TestHEPSubmission.tar.gz -e my@email.com -r 1234567890 -s True

or equivalently

client.upload('/path/to/TestHEPSubmission.tar.gz', email='my@email.com', recid='1234567890', sandbox=True)

Note that you must have uploaded the original sandbox record yourself and that you will be prompted for a password.

Example 9 - upload a non-sandbox record:

$ hepdata-cli upload /path/to/TestHEPSubmission.tar.gz -e my@email.com -r 123456 -i 8232e07f-d1d8-4883-bb1d-77fd9994ce4f -s False 

or equivalently

client.upload('/path/to/TestHEPSubmission.tar.gz', email='my@email.com', recid='123456', invitation_cookie='8232e07f-d1d8-4883-bb1d-77fd9994ce4f', sandbox=False)

The uploaded submission can then be found from your Dashboard. The invitation cookie is sent in your original invitation email. You must have already claimed permissions by clicking the link in that email or from your Dashboard. Again, you will be prompted for a password, which must be set if using CERN/ORCID login. The password can alternatively be passed as an argument to the CLI (-p PASSWORD) or API (password=PASSWORD). However, please be careful to keep your password secure, for example, by defining an encrypted environment variable for a CI/CD workflow.
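One way to keep the password out of source code is to read it from an environment variable and fall back to an interactive prompt. This is only a sketch: `get_hepdata_password` and the variable name `HEPDATA_PASSWORD` are assumptions made for illustration and are not defined by HEPData-CLI.

```python
import getpass
import os


def get_hepdata_password(env_var="HEPDATA_PASSWORD"):
    """Return the password from the environment, or prompt for it if unset."""
    password = os.environ.get(env_var)
    if password:
        return password
    # No value in the environment: ask on the terminal without echoing input.
    return getpass.getpass("HEPData password: ")


# Typical use (requires hepdata-cli and network access, so it is not run here):
#   client.upload('/path/to/TestHEPSubmission.tar.gz', email='my@email.com',
#                 sandbox=True, password=get_hepdata_password())
```

In a CI/CD workflow the variable would be supplied through the platform's secret store, so the prompt is never reached.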

Download files

Source Distribution

hepdata_cli-0.3.1.tar.gz (28.5 kB view details)

Uploaded Source

Built Distribution

hepdata_cli-0.3.1-py3-none-any.whl (23.5 kB view details)

Uploaded Python 3

File details

Details for the file hepdata_cli-0.3.1.tar.gz.

File metadata

  • Download URL: hepdata_cli-0.3.1.tar.gz
  • Size: 28.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hepdata_cli-0.3.1.tar.gz
Algorithm Hash digest
SHA256 ab7fd9bfa428609a42169505ad81d89ddce0a759c1c92d268d76de56578ae9a5
MD5 ed97fbfd45a8db8e527b69a0e61e7a4e
BLAKE2b-256 ab2c2c7d94289f603f22ed4d013530b7f15305cebda7ab024bcd2449184af99a

Provenance

The following attestation bundles were made for hepdata_cli-0.3.1.tar.gz:

Publisher: ci.yml on HEPData/hepdata-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hepdata_cli-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: hepdata_cli-0.3.1-py3-none-any.whl
  • Size: 23.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hepdata_cli-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ca1ebbbec10f361b09f178decf0241d15694a297b43917ef2d264045b8a7b347
MD5 3fab64d961e47b5a10501a9a343638bb
BLAKE2b-256 8f11bf68f9139ec54f92a9d871eb2df08d3dec869c34f4907fd2a0531815f3c5

Provenance

The following attestation bundles were made for hepdata_cli-0.3.1-py3-none-any.whl:

Publisher: ci.yml on HEPData/hepdata-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
