WEAVE-IO query interface for accessing WEAVE data
Project description
Tutorial
Basics
WEAVE objects
-
An OB holds all the information that purtains to making an observation: the targets, the conditions, the instrument configuration. You can locate specific OBs with their obid
data.obs[obid]
-
An Exposure is one integration of both arms of the spectrograph. You can locate Exposures like so:
data.exposures[mjd]
-
A Run is a weave term for an exposure taken in one arm of the spectrograph (so there are always 2 Runs per Exposure). You can locate runs using their runid
data.runs[runid]
. -
A
spectrum
in weaveio refers to a single spectrum of a single target (not the block ofspectr*a*
) -
An L1 product refers to the spectra recorded in an L1File. All L1 data is separated by camera colour (red and blue).
- A single spectrum is the processed spectrum from the raw data
- A stack spectrum is the spectrum resulting from stacking two or more single spectra in a single ob
- A superstack spectrum results from stacking between OBs but with the same instrument configuration
- A supertarget spectrum results from stacking every single spectrum of a single WeaveTarget cname.
-
There are three types of
Target
WeaveTarget
is the unified target based on ra, dec. They have a unique CNAMESurveyTarget
is a target specified by a survey in a surveycatalogue (they reference a single WeaveTarget). These are unique to a catalogue.FibreTarget
is a result of assigning a spectrographFibre
to aSurveyTarget
. These are unique to an OBSpec.
What is an attribute? What is an object? What is a product?
- weave.io is an object orientated query language using a neo4j database hosted at lofar.herts.ac.uk.
- weave.io stores 4 (+1) types of object:
- File - A reference to a physical fits file on the herts system (it also stores references to individual fits.HDUs and their headers as separate objects, but the user doesn't need to know of their existence to use weave.io)
- Object - A object that references a concept in the WEAVE universe that has attributes (an OB object has an obid attribute, an exposure object has an attribute expmjd
- Attribute - A piece of data that belongs to some object
- Product - A special type of attribute which references binary data not stored in the database itself (e.g. spectrum flux). You cannot perform arithmetic/indexing on product attributes.
If confused, ignore...
object/attribute
weave.io uses Python syntax to traverse a hierarchy of objects and their attributes. It is important to note that objects are linked to other objects (e.g. a run belongs to an OB and also to an exposure, which itself belongs to an OB).
You can stitch together these objects to form a hierarchy of objects:
run <-- exposure <-- ob <--obspec(xml)
Every OB is a parent of multiple Exposures which in turn are parents exactly 2 runs each (one red, one blue).
Because of this chain of parentage/relation, every object has access to all attributes where there is a chain, as if they were its own attributes.
Traversal syntax
-
You can request any directly owned attribute of an object
- An OB has an obid:
ob.obid
- An obstemp has a maxseeing
obstemp.maxseeing
- An OB has an obid:
-
You can request any attribute of objects that are further away in the hierarchy as if it were its own. This is useful because a priori you wouldn't be expected to know where any particular piece of data is stored, just that it exists.
run.maxseeing
is identical torun.exposure.ob.obspec.obstemp.maxseeing
-
Traversal works in any direction
- You can go down a hierarchy:
exposure.run.rawspectrum
- You can go up as well:
rawspectrum.run.exposure
- You can go down a hierarchy:
-
Traversal can be implicit like with the indirectly accessed attributes above
- You can skip out stages:
run.obspec
is identical torun.ob.obspec
- You can skip out stages:
-
Traversal requires you to be aware of plurality/multiplicity (neo4j calls this cardinality):
- A run only ever has a single ob, so you query it using a singular name:
run.ob
- But an ob will typically have multiple runs, so you must use plural names:
ob.runs
- weave.io is aware of plurality of the whole hierarchy, so it will shout at you if you are obviously wrong:
ob.run
will fail before you even execute the query.
- A run only ever has a single ob, so you query it using a singular name:
-
Traversal name plurality is relative
- A run has a single ob, which in turn has multiple runs:
run.ob.runs
will return all runs of the ob (including the one that was explicitly referenced at the start. ob.runs.weavetarget.obs
can be read as "For each of the runs, get its weavetarget, and then for each weavetarget get all OBs which assign a fibre to that target.
- A run has a single ob, which in turn has multiple runs:
-
Traversal using dot syntax always increases/maintains the total number of rows returned at the end
- A consequence of traversal is the building up of rows. This is useful to get properly aligned labels/indexes for things.
ob.runs.ob.runs.ob.runs
will return not simply return the runs of this ob, but rather a huge duplicated list because each time you use the '.' syntax, we are saying "for each"
Identifiers
- You can request a specific object if you know its id
one_ob = data.obs[obid]
a_list_of_obs = data.obs[[obid1, obid2, obid3]]
- Plurality still applies here:
data.weavetargets.obs[obid]
will return one ob for each weavetargetdata.obs[obid].weavetargets
returns all weavetargets for this particular obdata.weavetargets.obs[obid]
returns the ob with obid for each weavetarget (sometimes will be None)
Exploration
You can use the explain(obj)
function to see what information is available for a given object:
>>> explain(data.obs)
===================ob===================
An OB is an "observing block" which is essentially a realisation of
an OBSpec. Many OBs can share the same xml OBSpec which describes how
to do the observations.
A ob has a unique id called 'obid'
one ob is linked to:
- many exposures
- many l1stackfiles
- many l1stackspectra
- many l2stackfiles
- many l2stacks
- 1 obspec
a ob directly owns these attributes:
- obid
- obstartmjd
========================================
Likewise, use the explain(attr)
function to see what information is available for an attribute:
>>> explain(data.runs.runid)
runid is the unique id name of a run
>>> explain('snr', data=data)
snrs are owned by multiple different objects (['l1singlespectrum', 'l1stackspectrum', 'l1supertargetspectrum', 'l1superstackspectrum']).
They could be entirely different things.
You will need to specify the parent object for snr when querying.
snr is an attribute belonging to l1singlespectrum
snr is an attribute belonging to l1stackspectrum
snr is an attribute belonging to l1supertargetspectrum
snr is an attribute belonging to l1superstackspectrum
========================================
Also, you can use the explain(obj1, obj2)
function to see how two objects are related:
>>> explain(data.obs, data.runs')
===================ob===================
An OB is an "observing block" which is essentially a realisation of
an OBSpec. Many OBs can share the same xml OBSpec which describes how
to do the observations.
A ob has a unique id called 'obid'
one ob is linked to:
- many exposures
- many l1stackfiles
- many l1stackspectra
- many l2stackfiles
- many l2stacks
- 1 obspec
a ob directly owns these attributes:
- obid
- obstartmjd
==================run===================
A run is one observation of a set of targets for a given
configuration in a specific arm (red or blue). A run belongs to an
exposure, which always consists of one or two runs (per arm).
A run has a unique id called 'runid'
one run is linked to:
- 1 armconfig
- 1 exposure
- 1 observation
a run directly owns these attributes:
- runid
========================================
- A ob has many runs
- A run has only one ob
========================================
When you make a mistake in plurality or a typo, weave-io will offer suggestions:
>>> data.singlespectra
AttributeError: `single_spectra` not understood, did you mean one of:
1. l1singlespectra
2. l1singlespectrum
3. galaxy_spectra.
You can learn more about an object or attribute by using `explain(obj/attribute, ...)`
Running a query
A query finds the locations of all the L1/L2/Raw products that you want. It is analagous to an SQL query except that it is written in Python.
-
A query is constructed using python like so:
from weaveio import * data = Data(username, password) runs = data.obs[obid].runs reds = runs[runs.colour == 'red'] spectra = reds.l1singlespectra
runs
,reds
,spectra
are all queries -
Each part of this query can be run independently, using the parentheses:
runs.runids
is still a queryruns.runids()
returns a actual list of numbersreds()
will return a list of Run objects (containing all attributes of a run)spectra()
will return a spectra object (read from the fits file on herts.ac.uk) which contains the flux, wvls etc
Examples of use:
1. I want to return the number of sky spectra in a given run (runid=1002850)
from weaveio import *
data = Data()
runid = 1003453
nsky = sum(data.runs[runid].targuses == 'S')
print("number of sky targets = {}".format(nsky()))
output: [100]
1b. I want to see how many sky targets each run has
from weaveio import *
data = Data()
nsky = sum(data.runs.targuses == 'S', wrt=data.runs) # sum the number of skytargets with respect to their runs
print(nsky())
output: [100 299 299 100 100 200 160 ...]
1c. Put the above result into a table where I can see the runid
from weaveio import *
data = Data()
nsky = sum(data.runs.targuses == 'S', wrt=data.runs) # sum the number of skytargets with respect to their runs
query_table = data.runs[['id', nsky]] # design a table by using the square brackets
concrete_table = query_table() # make it "real" by executing the query
print(concrete_table)
print(type(concrete_table))
output:
id sum0
------- ----
1003453 100
1003440 299
... ...
<class 'weaveio.readquery.results.Table'> # although this is an astropy table really
2. I want to plot all single sky spectra from last night in the red arm
from weaveio import *
data = Data()
yesterday = 57811
runs = data.runs
is_red = runs.camera == 'red'
is_yesterday = floor(runs.exposure.mjd) == yesterday # round to integer, which is the day
runs = runs[is_red & is_yesterday] # filter the runs first
spectra = runs.l1single_spectra
sky_spectra = spectra[spectra.targuse == 'S']
table = sky_spectra[['wvl', 'flux']]
import matplotlib.pyplot as plt
for row in table: # this may take a while to plot, there is a lot of data
plt.plot(row.wvl, row.flux, 'k-', alpha=0.4)
plt.savefig('sky_spectra.png')
output:
3. I want to plot the H-alpha flux vs. L2 redshift distribution from all WL or W-QSO targets that were observed from all OBs observed in the past month. Use the stacked data
import matplotlib.pyplot as plt
data = Data()
l2s = data.l2stacks
l2s = l2s[(l2s.ob.mjd >= 57780) & any(l2s.fibre_target.surveys == '/WL.*/', wrt=l2s.fibre_target)]
l2s = l2s[l2s['ha_6562.80_flux'] > 0]
table = l2s[['ha_6562.80_flux', 'z']]()
plt.scatter(table['z'], table['ha_6562.80_flux'], s=1)
plt.yscale('log')
plt.savefig('ha-z.png')
4. I want to identify the WL spectrum with the brightest continuum at 5000AA and plot the spectrum from both red and blue arms, together with the error (variance) spectrum.
data = Data()
stackedspectra = data.l1stackedspectra # lots of different stacked spectra from many different OBs
wl_stackedspectra = stackedspectra[any(stackedspectra.surveys == 'WL')]
reds = wl_stackedspectra[wl_stackedspectra.camera == 'red']
blues = wl_stackedspectra[wl_stackedspectra.camera == 'blue']
continuum = []
for red, blue in reds(), blues(): # this loop is offline
continuum.append(my_special_module.median_flux(red, blue, 4950, 5050)) # do some fancy function you have written
index = np.argmax(continuum)
red = reds[index]()
blue = blues[index]()
import matplotlib.pyplot as plt
# uncomment the next line if you are using ipython so that you can see the plots interactively (don't forget to do ssh -XY lofar)
# %matplotlib
plt.plot(red.wvls, red.flux)
plt.plot(blue.wvls, blue.flux)
plt.plot(red.wvls, 1 / red.ivar, label='variance')
plt.plot(blue.wvls, 1 / blue.ivar, label='variance')
5. Get the brightest g-band target in an OB and plot some spectra
a. I would like to identify the brightest (g band) WL spectrum observed in an OB with obid=1234
(using the g band magnitude in the stacked spectrum). Plot the stack, in both red and blue arms (on same plot)
b. Plot the individual spectra that went into making the stack.
d. I would like to search for any other OB that contains the same astronomical object. Overplot the single spectra that were observed in those OBs
import matplotlib.pyplot as plt
# uncomment the next line if you are using ipython so that you can see the plots interactively (don't forget to do ssh -XY lofar)
# % matplotlib
data = Data()
####### < Part A
ob = data.obs[1234] # get the ob
# all L2 data that used stackedspectra.
l2stack = ob.l2stack
# return rows in the L2 dataset that correspond to a lofar target
lofar_l2 = l2stack[any(l2stack.surveys == 'WL')] # each target can belong to more than one survey
l2row = lofar_l2[lofar_l2.mag_gs == max(lofar_l2.mag_gs)] # get the one row that corresponds to the brightest lofar target
# now we jump from L2 rows to the stack spectra
brightest_red = l2row.l1stackspectra[l2row.l1stackspectra.camera == 'red']
brightest_blue = l2row.l1stackspectra[l2row.l1stackspectra.camera == 'blue']
# Now plot the actual data
fig, (redax, blueax) = plt.subplots()
redax.plot(brightest_red.wvls(), brightest_red.flux(), 'r-', label='brightest')
blueax.plot(brightest_blue.wvls(), brightest_blue.flux(), 'b-', label='brightest')
####### Part A />
####### < Part B
# now locate the indivdual single spectra that were stacked
red_spectra = brightest_red.l1singlespectra
blue_spectra = brightest_blue.l1singlespectra
# matplotlib allows you to plot multiple lines with 2d arrays
redax.plot(red_spectra.wvls(), red_spectra.flux(), 'r-', alpha=0.4, label='single for brightest')
blueax.plot(blue_spectra.wvls(), blue_spectra.flux(), 'r-', alpha=0.4, label='single for brightest')
####### Part B />
####### < Part C
# Now get all other stacked spectra that were observed for this target, no matter the OB
brightest_target = l2row.weavetarget
other_reds = brightest_target.l1stackspectra[brightest_target.l1stackspectra.camera == 'red']
other_blues = brightest_target.l1stackspectra[brightest_target.l1stackspectra.camera == 'blue']
# overplot the other observations
for wvl, flux in other_reds.wvls(), other_reds.flux():
redax.plot(wvl, flux, 'k:', alpha=0.2, label='from all obs')
####### Part C />
Demo
6. I would like to identify all of the sky spectra in an OB used to create the sky model spectrum for that OB. I would then like to compare that list to the list of spectra denoted to be used as sky in the input OB to determine how many extra or fewer sky spectra were used in the combination.
Rohit's question
I'm interested in getting the spectra for the AGN sample (specifically the LERGs) from the SV input, looking for sources between a redshift range, say 0.5 < z < 1 (or 1.4, whichever is the upper limit), and analysing their continuum.
I'm also interested in using emission line properties to perform source classifications
- select redshift 0.5 - 1
- get LERGs:
- $\log ([O III]/Hβ) - 1/3 \log ([N II]/Hα) + \log ([S II]/Hα) + \log ([O I]/Hα)$
- get stacked spectra
- Need
FLUX_EL_W
andERR_FLUX_EL_W
columns from L2 data, where EL is the emission line and W its wavelength
def excitation_index(oiii, hb, nii, sii, oi, ha):
return log(oiii/hb) - (log(nii/ha) / 3) + log(sii/ha) + log(oi/ha)
data = Data()
l2stack = data.l2stack
redshift = l2stack.z
EI = excitation_index(l2stack.flux_oiii_5007, l2stack.flux_hbeta, l2stack.flux_nii_6583, l2stack.flux_oi_6300,
l2stack.flux_sii_6716, l2stack.flux_halpha)
in_redshift_range = (l2stack.zbest > 0.5) & (l2stack.zbest < 1.)
is_lerg = EI < 0.95
spectra = l2stack[in_redshift_range & is_lerg].l1stackspectra
reds = spectra[spectra.camera == 'red']
blues = spectra[spectra.camera == 'blue']
import matplotlib.pyplot as plt
# uncomment the next line if you are using ipython so that you can see the plots interactively (don't forget to do ssh -XY lofar)
# % matplotlib
plt.plot(reds.wvls(), reds.flux())
plt.plot(blues.wvls(), blues.flux())
Anniek's fluxes and other catalogues
We can add any catalogue or file we like to the database, all that is required is a precise knowledge of what data was used to create the catalogue/file (so we can associate it correctly).
Find all reprocessed Ha flux done by Anniek for a given OB
data = Data()
data.ob.anniek.flux_halpha
Join your own catalogue based on cname
data = Data()
catalogue = read_my_nice_catalogue('...') # has a cname column
data.join(catalogue, 'cname', weavetarget.cname, name='my_catalogue') # only exists for this session
Join your own file that reprocessed data from a specific file (like Anniek's)
data = Data()
line_fluxes = read_line_fluxes('...') # from the weave single spectra associated with run 100423
data.join(line_fluxes, 'index', data.runs[100423].singlespectra, name='line_fluxes')
machine-readable-change-log ########################### 2022.0.2: include data
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for weaveio-2022.0.2-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8e81a261035f86a7183ba49f9568358eaa37140c687c9b1050d8005526dec2f5 |
|
MD5 | 481fc9cbdd4f98e72f92b8d3e76c73c6 |
|
BLAKE2b-256 | 6fec7e91e864b6b976d7b866dfde3230f164b04ff4237bfce45c4f8bf845385e |