
Tools to organize and query astronomical catalogs


extcats

Organize and query astronomical catalogs

This module provides classes to import astronomical catalogs into a MongoDB database, and to efficiently query this database for positional matches.

Description:

The two main classes of this module are:

  • CatalogPusher: processes the raw files containing the catalog sources and creates a database. See the insert_example notebook for more details and usage instructions.

  • CatalogQuery: performs queries on the catalogs. See query_example for examples and benchmarking.

Supported queries include:

  • all the sources within a certain distance of a given position.
  • closest source to a given position.
  • binary search: return yes/no depending on whether anything is around the position.
  • user-defined queries.

The first item on the above list (cone search around a target) provides the basic building block for the other two types of position-based queries. The code supports three types of basic cone-search queries, depending on the indexing strategy of the database:

  • using HEALPix: if the catalog sources have been assigned a HEALPix index (using healpy, https://healpy.readthedocs.io/en/latest/).
  • using GeoJSON (or 'legacy coordinates'): if the catalog documents have the position arranged in one of these two formats (see the example at https://docs.mongodb.com/manual/geospatial-queries/), the query is based on the $geoWithin and $centerSphere MongoDB operators.
  • raw: this method uses the $where keyword to evaluate, on each document, a JavaScript function that computes the angular distance between the source and the target. This method does not require any additional field to be added to the catalog but generally performs worse than the methods above.
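As a rough illustration of how a cone search can serve as the building block for the closest-source and binary queries, here is a minimal in-memory sketch. It is not the extcats implementation: the function names, the `ra`/`dec` keys and the toy catalog are all hypothetical, and the per-source distance evaluation only mirrors the idea behind the raw method, in Python rather than in a JavaScript $where function.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Clamp to 1.0 to guard against rounding errors for antipodal points.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(sources, ra, dec, radius_arcsec):
    """Return (separation_arcsec, source) pairs within the radius, nearest first."""
    matches = []
    for src in sources:
        sep = angular_separation_deg(ra, dec, src["ra"], src["dec"]) * 3600.0
        if sep <= radius_arcsec:
            matches.append((sep, src))
    matches.sort(key=lambda pair: pair[0])
    return matches


def closest_source(sources, ra, dec, radius_arcsec):
    """Closest match inside the cone, or None if the cone is empty."""
    matches = cone_search(sources, ra, dec, radius_arcsec)
    return matches[0] if matches else None


def has_match(sources, ra, dec, radius_arcsec):
    """Binary query: is there anything inside the cone?"""
    return bool(cone_search(sources, ra, dec, radius_arcsec))


# Hypothetical toy catalog, coordinates in degrees.
toy_catalog = [
    {"name": "A", "ra": 10.0, "dec": 20.0},
    {"name": "B", "ra": 10.001, "dec": 20.0},
    {"name": "C", "ra": 50.0, "dec": -30.0},
]
```

In this sketch both derived queries simply post-process the cone-search result, so any faster cone search (HEALPix or GeoJSON based) could be swapped in without changing them.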

All the core functions are defined in the catquery_utils module. In all cases the results of the queries are returned as astropy.table.Table objects.

Notes on indexing and query performance:

The recommended method to index and query catalogs is based on the GeoJSON coordinate type. See the example_insert notebook for how this can be implemented.
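To make the GeoJSON approach more concrete, the following sketch builds a GeoJSON Point for a source and the matching $geoWithin / $centerSphere filter. The helper names and the `pos` field name are hypothetical, and shifting right ascension by 180 degrees (so that it falls in the GeoJSON longitude range of -180 to 180) is an assumed convention for this example, not necessarily the one used by extcats.

```python
import math


def to_geojson_point(ra_deg, dec_deg):
    """Build a GeoJSON Point from RA/Dec in degrees.

    GeoJSON longitudes span [-180, 180], so RA in [0, 360] is shifted
    by 180 degrees (assumed convention for this sketch).
    """
    return {"type": "Point", "coordinates": [ra_deg - 180.0, dec_deg]}


def cone_query(ra_deg, dec_deg, radius_arcsec, field="pos"):
    """Build a MongoDB filter selecting documents within a cone.

    $centerSphere expects [[longitude, latitude], radius], with the
    radius expressed in radians.
    """
    radius_rad = math.radians(radius_arcsec / 3600.0)
    center = [ra_deg - 180.0, dec_deg]
    return {field: {"$geoWithin": {"$centerSphere": [center, radius_rad]}}}


# With pymongo and a running MongoDB server, these could be used roughly as:
#   collection.insert_one({"name": "src1", "pos": to_geojson_point(190.0, -5.0)})
#   collection.create_index([("pos", "2dsphere")])
#   cursor = collection.find(cone_query(190.0, -5.0, radius_arcsec=5.0))
```

The same shift has to be applied when inserting and when querying, otherwise the cone is centered on the wrong longitude.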

Performant queries require the database indexes to reside in RAM. The indexes are efficiently compressed by MongoDB's default storage engine (WiredTiger); however, there is little redundant (and hence compressible) information in accurately measured coordinate pairs. As a consequence, GeoJSON-type indexes tend to require a fair amount of free memory (of the order of 40 MB for 2M entries). For large catalogs (and/or small RAM), indexing on coordinates might not be feasible. In this case, HEALPix-based indexing should be used: since (possibly) many sources share the same HEALPix index, compression is more effective at moderating RAM usage.
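The trade-off above can be explored with some back-of-the-envelope arithmetic. The sketch below uses only the standard HEALPix relation npix = 12 * nside^2 and the figure quoted above (about 40 MB per 2M entries); scaling that figure linearly with catalog size and assuming sources spread uniformly over the whole sky are simplifying assumptions, and the function names are hypothetical.

```python
import math

# Full-sky area in square degrees (about 41253).
SKY_AREA_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2


def healpix_npix(nside):
    """Number of equal-area HEALPix pixels for a given nside."""
    return 12 * nside**2


def pixel_size_arcsec(nside):
    """Approximate pixel side, taken as the square root of the pixel area."""
    return math.sqrt(SKY_AREA_DEG2 / healpix_npix(nside)) * 3600.0


def mean_sources_per_pixel(n_sources, nside):
    """Average number of sources sharing a HEALPix index (uniform all-sky case)."""
    return n_sources / healpix_npix(nside)


def geojson_index_mb(n_entries, mb_per_2m=40.0):
    """Linear extrapolation of the ~40 MB per 2M entries figure (an assumption)."""
    return mb_per_2m * n_entries / 2.0e6
```

For example, nside = 1024 gives 12,582,912 pixels, so a uniform all-sky catalog with a billion sources would have roughly 80 sources per HEALPix index, while the linear estimate for a GeoJSON index on the same catalog is about 20 GB.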

Installation:

The easiest way to install the Python library is with pip:

pip install extcats

If you want to modify extcats itself, you'll need an editable installation. After cloning the Git repository, run:

poetry install

Useful links:

mongodb installation

healpy

astropy

