
User-defined science module for the Fink broker.


Fink Science

This repository contains science modules used to add value to the alerts collected by the Fink broker.

Step 0: Fork this repository

Fork and clone the repository, then create a new folder in fink_science/. The name of the folder does not matter much, but try to make it as meaningful as possible! Let's call it xmatch for the sake of this example.

Step 1: Define your science module

A module contains the routines and classes needed to process the data and add value: typically, you receive alerts as input and output the same alerts with additional information. Alert data includes position, flux, telescope properties, and more. You can find what's in an alert here [link to be added].
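For illustration only, here is a hedged sketch of the kind of fields an alert might contain once loaded in Python. The field names below (objectId, ra, dec, magpsf, jd) follow typical ZTF-like packets; the authoritative list is the alert schema linked above.

# Illustrative only: field names are an assumption based on ZTF-like alerts;
# refer to the alert schema for the actual layout.
alert = {
    "objectId": "ZTF18aaaaaaa",    # unique object identifier
    "candidate": {
        "ra": 26.8566983,          # right ascension (degrees)
        "dec": -26.9677112,        # declination (degrees)
        "magpsf": 17.5,            # PSF-fit magnitude
        "jd": 2458789.5,           # Julian date of the observation
    },
}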

In this example, let's imagine you want to know whether alerts have a counterpart (cross-match) in the SIMBAD database based on their localisation on the sky. We wrote a small library containing all the routines (see the fink_science/xmatch folder), and we now write the processor in processor.py (the file must be named processor.py):

from typing import Any

import numpy as np
import pandas as pd

from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import StringType

# Cross-match routines from the small library in fink_science/xmatch
# (adjust the import path to wherever cross_match_alerts_raw lives).
from fink_science.xmatch.classification import cross_match_alerts_raw


@pandas_udf(StringType(), PandasUDFType.SCALAR)  # <- mandatory
def cdsxmatch(objectid: Any, ra: Any, dec: Any) -> pd.Series:
    """ Query the CDSXmatch service to find identified objects
    in alerts. The catalog queried is the SIMBAD bibliographical database.

    Parameters
    ----------
    objectid: list of str or Spark DataFrame Column of str
        List containing object ids (custom)
    ra: list of float or Spark DataFrame Column of float
        List containing object ra coordinates
    dec: list of float or Spark DataFrame Column of float
        List containing object dec coordinates

    Returns
    ----------
    out: pandas.Series of string
        Return a Pandas Series with the type of object found in Simbad.
        If the object is not found in Simbad, the type is
        marked as Unknown. In the case several objects match
        the centroid of the alert, only the closest is returned.
        If the request failed (no match at all), return a column of Fail.

    Examples
    -----------
    Simulate fake data
    >>> ra = [26.8566983, 26.24497]
    >>> dec = [-26.9677112, -26.7569436]
    >>> id = ["1", "2"]

    Wrap data into a Spark DataFrame
    >>> rdd = spark.sparkContext.parallelize(zip(id, ra, dec))
    >>> df = rdd.toDF(['id', 'ra', 'dec'])
    >>> df.show() # doctest: +NORMALIZE_WHITESPACE
    +---+----------+-----------+
    | id|        ra|        dec|
    +---+----------+-----------+
    |  1|26.8566983|-26.9677112|
    |  2|  26.24497|-26.7569436|
    +---+----------+-----------+
    <BLANKLINE>

    Test the processor by adding a new column with the result of the xmatch
    >>> df = df.withColumn(
    ... 	'cdsxmatch', cdsxmatch(df['id'], df['ra'], df['dec']))
    >>> df.show() # doctest: +NORMALIZE_WHITESPACE
    +---+----------+-----------+---------+
    | id|        ra|        dec|cdsxmatch|
    +---+----------+-----------+---------+
    |  1|26.8566983|-26.9677112|     Star|
    |  2|  26.24497|-26.7569436|  Unknown|
    +---+----------+-----------+---------+
    <BLANKLINE>
    """
    # your logic goes here
    matches = cross_match_alerts_raw(
        objectid.values, ra.values, dec.values)

    # For regular alerts, the number of matches is always non-zero as
    # alerts with no counterpart will be labeled as Unknown.
    # If cross_match_alerts_raw returns a zero-length list of matches, it is
    # a sign of a CDS problem (logged).
    if len(matches) > 0:
        # (objectid, ra, dec, name, type)
        # return only the type.
        names = np.transpose(matches)[-1]
    else:
        # Tag as Fail if the request failed.
        names = ["Fail"] * len(objectid)

    # Return a column with added value after processing
    return pd.Series(names)

Remarks:

  • Note that the use of the decorator is mandatory. It is an Apache Spark decorator which specifies the type of operation (a scalar pandas UDF here) as well as the output type, and you must declare the output type explicitly (string in this example).
  • The name of the routine will be used as the name of the new column, and once the processor is loaded you cannot change it, so choose a meaningful name!
  • The name(s) of the input argument(s) must match the name(s) of alert fields.
  • You can return only one new column (i.e. add one piece of information per alert). A minimal template following these rules is sketched below.
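As a rough template (not an actual Fink module: the column name newcolumn and the input field magpsf are placeholders), a minimal processor respecting these rules could look like this:

from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import DoubleType
import pandas as pd

@pandas_udf(DoubleType(), PandasUDFType.SCALAR)  # output type of the new column
def newcolumn(magpsf: pd.Series) -> pd.Series:
    """ The routine name (newcolumn) becomes the name of the new column,
    and the argument name (magpsf) must match an alert field. """
    # your logic goes here: return one value per input alert
    return magpsf - 1.0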

Step 3: Open a pull request

Once your science module is ready and you have opened the pull request, we will review it. The criteria for acceptance are:

  • The science module works ;-)
  • The execution time is not too long.

We want to process data as fast as possible, and long processing times delay further follow-up observations. What execution time is acceptable? It depends, but in any case communicate the extra time overhead early, and we can look together at how to speed up the process if needed.
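To get a rough idea of the overhead before opening the pull request, one simple (unofficial) way is to time the processor on a test DataFrame and force its evaluation. Here df and cdsxmatch are assumed to be the ones defined in the doctest above:

import time

t0 = time.time()

# Add the new column and force the evaluation of the whole DataFrame
df_processed = df.withColumn(
    'cdsxmatch', cdsxmatch(df['id'], df['ra'], df['dec']))
n_alerts = df_processed.count()

print("Processed {} alerts in {:.2f} seconds".format(n_alerts, time.time() - t0))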

Step 4: Play!

If your module is accepted, it will be plugged into the broker, and outgoing alerts will contain the new information! Define your filter using fink-filters, and you will then be able to receive these alerts in (near) real-time using the fink-client. Note that we do not keep alerts available in the broker forever. While the retention period is not yet defined, you can expect emitted alerts to be available for no longer than one week.
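The exact filter conventions are documented in the fink-filters repository; as a hedged sketch (the function name star_candidates is illustrative, not an existing filter), a filter keeping only alerts flagged as Star by the module above might look like:

from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql.types import BooleanType
import pandas as pd

@pandas_udf(BooleanType(), PandasUDFType.SCALAR)
def star_candidates(cdsxmatch: pd.Series) -> pd.Series:
    """ Keep only alerts whose SIMBAD cross-match type is Star.
    The argument name matches the column added by the science module. """
    return cdsxmatch == "Star"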

