Retrieve data and analysis package for Gaia Data Release 3

Author: Behrouz Safari
Website: AstroDataScience.Net

gaiadr3

Installation

Install the latest version of gaiadr3 from PyPI:

pip install gaiadr3

The only requirement is pandas.
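
To confirm which version ended up in your environment, a small standard-library check along these lines may help. The helper name installed_version is hypothetical and is not part of gaiadr3.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("gaiadr3")
print(found if found else "gaiadr3 is not installed")
```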

Background

Gaia Data Release 3 presents two main types of data: tabular and ancillary. The tabular data can be retrieved using ADQL TAP queries and are described in the Data Model. The ancillary data are those that cannot be accessed via ADQL TAP queries; epoch photometry and spectroscopic data are of this type. You need to use DataLink to retrieve them.

Tap queries

The standard way to retrieve tabular data is with TAP queries. Pass your ADQL query as a string to the sql2df function. It returns two pandas dataframes: data (the result rows) and meta (the description and unit of each returned column).

>>> from gaiadr3 import sql2df
>>> data, meta = sql2df('SELECT TOP 5 source_id, ra, dec FROM gaiadr3.gaia_source')
>>> print(meta)
                                                 description  unit
name                                                              
source_id  Unique source identifier (unique within a part...  None
ra                                           Right ascension   deg
dec                                              Declination   deg
>>> print(data)
             source_id          ra        dec
0  4116903625596296576  266.323047 -22.651077
1  4116903625596299136  266.321568 -22.651833
2  4116903625596302976  266.320308 -22.652672
3  4116903625596305408  266.321332 -22.651379
4  4116903625596305536  266.321399 -22.651430
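
Query strings can also be assembled programmatically before they are passed to sql2df. The sketch below builds a standard ADQL cone search. The helper cone_search_query and its parameters are hypothetical, not part of gaiadr3, and the coordinates are only illustrative.

```python
def cone_search_query(ra_deg, dec_deg, radius_deg, top=5,
                      table="gaiadr3.gaia_source"):
    """Build an ADQL query selecting sources within a circle on the sky."""
    return (
        f"SELECT TOP {int(top)} source_id, ra, dec "
        f"FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


query = cone_search_query(266.32, -22.65, 0.01)
print(query)
# The string could then be passed on, e.g. data, meta = sql2df(query)
```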

For ease of use, I have created some shortcut keywords which begin with '@'. Currently, they are:

  • @MT : Main Table (gaiadr3.gaia_source)
  • @LT : Lite Table (gaiadr3.gaia_source_lite)
  • @COLS : A selection of the most important columns

>>> data, meta = sql2df('SELECT TOP 3 @COLS FROM @MT')
>>> data
             source_id          ra  ...  has_mcmc_gspphot  has_mcmc_msc
0  4116903625596296576  266.323047  ...             False         False
1  4116903625596299136  266.321568  ...             False         False
2  4116903625596302976  266.320308  ...             False         False

[3 rows x 24 columns]
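
Conceptually, such shortcuts amount to a text substitution applied before the query is sent. The following is only a sketch of that idea, not the package's actual implementation: the function expand_shortcuts is hypothetical, and the @COLS entry is a short stand-in rather than the real 24-column selection.

```python
# Hypothetical mapping; the @COLS list is a short stand-in, not the
# package's real column selection.
SHORTCUTS = {
    "@MT": "gaiadr3.gaia_source",
    "@LT": "gaiadr3.gaia_source_lite",
    "@COLS": "source_id, ra, dec, parallax, pmra, pmdec, phot_g_mean_mag",
}


def expand_shortcuts(query, shortcuts=SHORTCUTS):
    """Replace each '@' keyword in the query with its full text."""
    # Longest keys first, so a shortcut that is a prefix of another
    # cannot clobber it.
    for key in sorted(shortcuts, key=len, reverse=True):
        query = query.replace(key, shortcuts[key])
    return query


expanded = expand_shortcuts("SELECT TOP 3 @COLS FROM @MT")
print(expanded)
```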

Get single source

The simplest way to get data for a single source is to use the GaiaObject class. Create an instance of this class by passing a source_id:

>>> from gaiadr3 import GaiaObject
>>> source_id = 30343944744320
>>> obj = GaiaObject(source_id=source_id)

Now you can use the download method to retrieve both tabular and ancillary data. This method accepts two boolean arguments, key_param and ancillary, both True by default. After the call, a set of key parameters is available as a Python dictionary in the key_param attribute.

>>> obj.download()
>>> print(obj.key_param['data'])
{'solution_id': 1636148068921376768,
 'ra': 45.09499151004629,
 'dec': 0.4768361311353548,
 'parallax': 1.120139133994462,
 'distance_gspphot': 913.4706,
 'pm': 19.76517,
 'pmra': 19.35330019571839,
 'pmdec': 4.013937591116442,
 'radial_velocity': 40.224552,
 'teff_gspphot': 12291.837,
 'logg_gspphot': 4.0962,
 'phot_g_mean_mag': 9.899,
 'phot_bp_mean_mag': 9.873377,
 'phot_rp_mean_mag': 9.918395,
 'phot_g_mean_flux': 2067028.6966188122,
 'phot_bp_mean_flux': 1534850.5584509764,
 'phot_rp_mean_flux': 854673.6713712276,
 'has_epoch_photometry': True,
 'has_epoch_rv': False,
 'has_rvs': True,
 'has_xp_continuous': True,
 'has_xp_sampled': True,
 'has_mcmc_gspphot': True,
 'has_mcmc_msc': True}

If you don't know what these parameters mean, look at obj.key_param['meta'].
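
Because the key parameters are a plain dictionary, derived quantities are easy to compute. The sketch below copies a few values from the output printed above and assumes the usual Gaia units (parallax in mas, proper motions in mas/yr). The inverse-parallax distance is a naive estimate and need not agree with distance_gspphot.

```python
import math

# A few values copied from the key_param['data'] output shown above.
params = {
    "parallax": 1.120139133994462,   # mas (assumed unit)
    "pm": 19.76517,                  # mas/yr (assumed unit)
    "pmra": 19.35330019571839,       # mas/yr
    "pmdec": 4.013937591116442,      # mas/yr
    "phot_bp_mean_mag": 9.873377,
    "phot_rp_mean_mag": 9.918395,
}

# Naive distance estimate: d [pc] = 1000 / parallax [mas].
distance_pc = 1000.0 / params["parallax"]

# Total proper motion from its two components.
pm_total = math.hypot(params["pmra"], params["pmdec"])

# BP - RP colour index.
bp_rp = params["phot_bp_mean_mag"] - params["phot_rp_mean_mag"]

print(f"distance ~ {distance_pc:.1f} pc")
print(f"pm_total = {pm_total:.5f} mas/yr (catalogue pm = {params['pm']})")
print(f"BP - RP  = {bp_rp:.6f}")
```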

The ancillary data are downloaded as CSV files into the 'data' folder in the working directory. If they exist, they can be accessed as attributes:

>>> print(obj.xp_samp)
     wavelength          flux    flux_error
0         336.0  7.519030e-15  8.592673e-16
1         338.0  6.699424e-15  7.309379e-16
2         340.0  5.937778e-15  6.685453e-16
3         342.0  5.614390e-15  6.325656e-16
4         344.0  5.726218e-15  6.402061e-16
..          ...           ...           ...
338      1012.0  5.449369e-16  6.181877e-17
339      1014.0  5.333555e-16  6.771072e-17
340      1016.0  5.501445e-16  7.086527e-17
341      1018.0  5.647527e-16  6.731273e-17
342      1020.0  6.096352e-16  6.407823e-17

[343 rows x 3 columns]
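
As one example of what can be done with a sampled spectrum, the flux can be integrated over a wavelength range with the trapezoidal rule. This sketch uses only the first five rows printed above as plain lists. It assumes the flux is given per unit of the wavelength column, which the output above does not state.

```python
# First five rows of the xp_samp table printed above.
wavelength = [336.0, 338.0, 340.0, 342.0, 344.0]
flux = [7.519030e-15, 6.699424e-15, 5.937778e-15, 5.614390e-15, 5.726218e-15]


def trapezoid(x, y):
    """Integrate y over x with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(x)):
        total += 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1])
    return total


band_flux = trapezoid(wavelength, flux)
print(f"integrated flux over 336-344: {band_flux:.4e}")
```

With a real object, the same function could be applied to the wavelength and flux columns of obj.xp_samp converted to lists.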

Attributes corresponding to the ancillary data are: ep_phot, rvs, xp_samp, xp_cont. Use the has attribute to see which of these are available.

>>> print(obj.has)
{'EPOCH_PHOTOMETRY': True,
 'RVS': True,
 'XP_CONTINUOUS': True,
 'XP_SAMPLED': True,
 'MCMC_GSPPHOT': True,
 'MCMC_MSC': True}
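
The has dictionary can be used to decide which attributes are worth reading. In the sketch below, the pairing between product names and attribute names is an assumption inferred from the names listed above, and the dictionary is copied from the printed output rather than fetched.

```python
# The dictionary printed by obj.has above.
has = {
    "EPOCH_PHOTOMETRY": True,
    "RVS": True,
    "XP_CONTINUOUS": True,
    "XP_SAMPLED": True,
    "MCMC_GSPPHOT": True,
    "MCMC_MSC": True,
}

# Assumed pairing of product names with the attributes listed above.
ATTRIBUTE_FOR = {
    "EPOCH_PHOTOMETRY": "ep_phot",
    "RVS": "rvs",
    "XP_SAMPLED": "xp_samp",
    "XP_CONTINUOUS": "xp_cont",
}

# Keep only the attributes whose product is flagged as available.
available = [attr for key, attr in ATTRIBUTE_FOR.items() if has.get(key)]
print(available)
```

With a downloaded object, each name in available could then be read with getattr(obj, name).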

Get multiple sources

If you want to get ancillary data for multiple sources, use the DataLink class.

>>> from gaiadr3 import DataLink
>>> sources = [30343944744320, 6196457933368101888]
>>> dl = DataLink(source_id=sources, retrieval_type='ALL')
>>> dl.download()
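
If a long list of sources needs to be split into smaller requests, a generic batching helper is one option. This is a sketch: the function batches is hypothetical, and a suitable batch size is left as an assumption to check against the archive's limits.

```python
def batches(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


sources = [30343944744320, 6196457933368101888]
for chunk in batches(sources, size=1):
    print(chunk)
    # Each chunk could then be handed to DataLink(source_id=chunk, ...)
```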

By default, the data are downloaded into the 'data' folder in the working directory, with one folder created per object. Using the get_objects method, you can access each source as a GaiaObject, as explained above. Let's get the epoch photometry in the G, BP (blue) and RP (red) bands for the first source:

>>> objects = dl.get_objects()
>>> g, b, r = objects[0].ep_phot
>>> print(g)
                                 mag          flux   flux_error
TCB                                                            
2014-09-06 08:55:04.077123  9.910611  2.045042e+06  3983.560844
2014-09-06 10:41:40.701409  9.909272  2.047565e+06  3652.792295
2014-12-27 00:49:44.365184  9.890911  2.082487e+06  3175.264209
2014-12-27 02:36:15.656166  9.874771  2.113675e+06  4644.732420
2015-01-16 06:49:25.778803  9.883712  2.096341e+06  1828.438678
2015-01-16 08:35:59.407773  9.888541  2.087037e+06  2039.618901
2015-02-15 00:23:35.148389  9.864696  2.133381e+06  2924.895816
2015-07-07 19:46:09.417232  9.886747  2.090489e+06  2817.581457
2015-08-01 03:31:05.147557  9.922456  2.022854e+06  3171.964369
2015-08-30 03:21:11.667936  9.880055  2.103414e+06   977.945250
2016-01-05 13:57:07.739350  9.880296  2.102947e+06  1754.446362
2016-01-05 15:43:39.101625  9.878165  2.107079e+06  1487.112158
2016-01-31 01:44:44.422883  9.919168  2.028988e+06  1516.905039
2016-02-25 03:16:43.072088  9.923205  2.021458e+06  2363.931131
2016-07-17 08:55:16.873392  9.905461  2.054766e+06  3400.421158
2016-07-17 10:41:51.854195  9.909443  2.047244e+06  1766.953020
2016-08-15 14:44:12.432337  9.904904  2.055821e+06  1543.565592
2016-09-08 14:45:34.479689  9.897831  2.069257e+06  2292.732039
2016-09-08 16:32:08.406143  9.886640  2.090695e+06  4654.839112
2017-01-14 21:05:28.352674  9.926262  2.015774e+06  1613.007848
2017-01-14 22:52:01.924915  9.909325  2.047467e+06  3051.287780
2017-02-13 10:01:27.449079  9.920222  2.027019e+06  1790.127633
2017-02-13 11:48:01.024177  9.926633  2.015085e+06  3115.379630
2017-03-05 21:57:47.804485  9.886628  2.090718e+06  1692.944591
2017-03-05 23:44:23.399045  9.877953  2.107490e+06  4360.998681

See more at astrodatascience.net
