Neptune Fetcher

Note: This package is experimental.

Neptune Fetcher is a Python package that separates data retrieval from the regular neptune package. Because it does not need to initialize the heavy structures of the regular package, fetching data is more efficient.

Installation

pip install neptune-fetcher

Example usage

Fetching a data frame containing run fields

from neptune_fetcher import ReadOnlyProject

project = ReadOnlyProject("workspace/project")
# Fetch all runs with specific columns
runs_df = project.fetch_runs_df(
    columns=["sys/name", "sys/modification_time", "training/lr"],
)

Fetching data from multiple runs

from neptune_fetcher import ReadOnlyProject

project = ReadOnlyProject("workspace/project")

for run in project.fetch_read_only_runs(with_ids=["RUN-1", "RUN-2"]):
    run.prefetch(["parameters/optimizer", "parameters/init_lr"])

    print(run["parameters/optimizer"].fetch())
    print(run["parameters/init_lr"].fetch())

Listing run identifiers

from neptune_fetcher import ReadOnlyProject

project = ReadOnlyProject("workspace/project")

for run in project.list_runs():
    print(run)

Fetching data from a single run

from neptune_fetcher import ReadOnlyProject, ReadOnlyRun

project = ReadOnlyProject("workspace/project")
run = ReadOnlyRun(project, with_id="TES-1")
run.prefetch(["parameters/optimizer", "parameters/init_lr"])

print(run["parameters/optimizer"].fetch())
print(run["parameters/init_lr"].fetch())

API reference

`ReadOnlyProject`

Representation of a Neptune project in a limited read-only mode.

Initialization

Initialize with the ReadOnlyProject class constructor.

Parameters:

Name	Type	Default	Description
`project`	`str`, optional	`None`	Name of a project in the form `workspace-name/project-name`. If `None`, the value of the `NEPTUNE_PROJECT` environment variable is used.
`api_token`	`str`, optional	`None`	Your Neptune API token (or a service account's API token). If `None`, the value of the `NEPTUNE_API_TOKEN` environment variable is used. To keep your token secure, avoid placing it in source code. Instead, save it as an environment variable.
`proxies`	`dict`, optional	`None`	Argument passed to HTTP calls made via the Requests library. For details on proxies, see the Requests documentation.

Example:

project = ReadOnlyProject("workspace/project", api_token="...")
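The fallback described in the parameter table can be pictured with a small sketch. The helper `resolve_setting` below is hypothetical and not part of neptune-fetcher; it only mirrors the documented behavior that an explicit argument wins and that `None` falls back to the environment variable. A plain dictionary stands in for the real environment so that the example has no side effects.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: prefer the explicit value, else read env_var."""
    source = os.environ if env is None else env
    if explicit is not None:
        return explicit
    return source.get(env_var)


# Stand-in environment used for illustration only.
fake_env = {"NEPTUNE_PROJECT": "workspace/project"}

from_env = resolve_setting(None, "NEPTUNE_PROJECT", fake_env)
overridden = resolve_setting("workspace/other-project", "NEPTUNE_PROJECT", fake_env)
missing_token = resolve_setting(None, "NEPTUNE_API_TOKEN", fake_env)

print(from_env, overridden, missing_token)
```

Keeping the token in the environment, as the table recommends, means the constructor call in source code never has to contain it.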

`list_runs()`

Lists minimal information, like identifier and name, for every run in a project.

Example:

for run in project.list_runs():
    print(run)

Returns: Iterator of dictionaries with run identifiers and names.
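Because the return value is an iterator, it can be consumed only once. The sketch below shows one way to turn it into a lookup table. The generator `fake_list_runs` is a hypothetical stand-in for `project.list_runs()`, and the dictionary keys `sys/id` and `sys/name` are an assumption based on the field names used elsewhere in this document.

```python
from typing import Dict, Iterator


def fake_list_runs() -> Iterator[Dict[str, str]]:
    """Hypothetical stand-in for project.list_runs(); yields one dict per run."""
    yield {"sys/id": "RUN-1", "sys/name": "baseline"}
    yield {"sys/id": "RUN-2", "sys/name": "tuned"}


# Materialize the iterator into a mapping so it can be queried repeatedly.
names_by_id = {run["sys/id"]: run["sys/name"] for run in fake_list_runs()}

print(names_by_id)
```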

`fetch_runs()`

Fetches a table containing IDs and names of runs in the project.

Example:

df = project.fetch_runs()

Returns: pandas.DataFrame with two columns (sys/id and sys/name) and rows corresponding to project runs.

`fetch_runs_df()`

Fetches the runs' metadata and returns it as a pandas DataFrame.

Parameters:

Name	Type	Default	Description
`columns`	`List[str]`, optional	`None`	Names of columns to include in the table, as a list of field names. The Neptune ID (`"sys/id"`) is included automatically. If `None`, all the columns of the experiments table are included.
`columns_regex`	`str`, optional	`None`	A regex pattern to filter columns by name. Use this parameter to include columns in addition to the ones specified by the `columns` parameter.
`names_regex`	`str`, optional	`None`	A regex pattern to filter the runs by name. When applied, the pattern must narrow the result to 100 or fewer runs.
`with_ids`	`List[str]`, optional	`None`	List of multiple Neptune IDs. Example: `["NLU-1", "NLU-2"]`. Matching any element of the list is sufficient to pass the criterion.
`states`	`List[str]`, optional	`None`	List of states. Possible values: `"inactive"`, `"active"`. "Active" means that at least one process is connected to the run. Matching any element of the list is sufficient to pass the criterion.
`owners`	`List[str]`, optional	`None`	List of multiple owners. Example: `["frederic", "josh"]`. The owner is the user who created the run. Matching any element of the list is sufficient to pass the criterion.
`tags`	`List[str]`, optional	`None`	A list of tags. Example: `"lightGBM"` or `["pytorch", "cycleLR"]`. Note: Only runs that have all specified tags will pass this criterion.
`trashed`	`bool`, optional	`False`	Whether to retrieve trashed runs. If `True`, only trashed runs are retrieved. If `False`, only non-trashed runs are retrieved. If `None`, all run objects are retrieved, including trashed ones.
`limit`	`int`, optional	`None`	Maximum number of runs to fetch. If `None`, all runs are fetched.
`sort_by`	`str`, optional	`sys/creation_time`	Name of the field to sort the results by. The field must represent a simple type (string, float, integer).
`ascending`	`bool`, optional	`False`	Whether to sort the entries in ascending order of the sorting column values.
`progress_bar`	`bool`, `Type[ProgressBarCallback]`, optional	`None`	Set to `False` to disable the download progress bar, or pass a type of ProgressBarCallback to use your own progress bar. If set to `None` or `True`, the default tqdm-based progress bar will be used.
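The matching rules in the table (any listed value is enough for `with_ids` and `owners`, while every listed tag is required for `tags`) can be illustrated on local data. This is a sketch only: `filter_runs` is a hypothetical function, the run dictionaries are made up, and the field names `sys/owner` and `sys/tags` are assumptions. The real filtering is done through the Neptune API.

```python
from typing import Any, Dict, List, Optional

Run = Dict[str, Any]


def filter_runs(
    runs: List[Run],
    with_ids: Optional[List[str]] = None,
    owners: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
    limit: Optional[int] = None,
    sort_by: str = "sys/creation_time",
    ascending: bool = False,
) -> List[Run]:
    """Hypothetical local filter mirroring the documented matching rules."""
    selected = []
    for run in runs:
        # with_ids and owners: matching any listed value is enough.
        if with_ids is not None and run["sys/id"] not in with_ids:
            continue
        if owners is not None and run["sys/owner"] not in owners:
            continue
        # tags: the run must carry every listed tag.
        if tags is not None and not set(tags).issubset(run["sys/tags"]):
            continue
        selected.append(run)
    # Default order is newest first, matching ascending=False.
    selected.sort(key=lambda run: run[sort_by], reverse=not ascending)
    return selected if limit is None else selected[:limit]


# Made-up runs for illustration.
runs = [
    {"sys/id": "NLU-1", "sys/owner": "frederic",
     "sys/tags": ["pytorch", "cycleLR"], "sys/creation_time": "2024-04-12"},
    {"sys/id": "NLU-2", "sys/owner": "josh",
     "sys/tags": ["pytorch"], "sys/creation_time": "2024-04-19"},
    {"sys/id": "NLU-3", "sys/owner": "anna",
     "sys/tags": ["lightGBM"], "sys/creation_time": "2024-04-23"},
]

by_owner = [r["sys/id"] for r in filter_runs(runs, owners=["frederic", "josh"])]
by_tags = [r["sys/id"] for r in filter_runs(runs, tags=["pytorch", "cycleLR"])]
newest = [r["sys/id"] for r in filter_runs(runs, limit=1)]

print(by_owner, by_tags, newest)
```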

Example:

# Fetch all runs with specific columns
runs_df = project.fetch_runs_df(
    columns=["sys/name", "sys/modification_time", "training/lr"],
)

# Fetch all runs with specific columns and extra columns that match a regex pattern
runs_df = project.fetch_runs_df(
    columns=["sys/name", "sys/modification_time"],
    columns_regex="tree/.*",
)

# Fetch runs by specific IDs
specific_runs_df = project.fetch_runs_df(
    with_ids=["RUN-123", "RUN-456"],
)

# Filter by name regex
specific_runs_df = project.fetch_runs_df(
    names_regex="tree_3[2-4]+",
)

Returns: pandas.DataFrame: A pandas DataFrame containing metadata of the fetched runs.
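To get a feel for what a pattern such as `tree_3[2-4]+` selects before sending it to Neptune, it can be tried against a few candidate names locally. The sketch assumes the pattern behaves like Python's `re.search`; the regex flavor and anchoring used on the server side may differ, and the run names below are made up.

```python
import re

# Same pattern as in the names_regex example above.
pattern = re.compile(r"tree_3[2-4]+")

# Made-up run names for illustration.
candidate_names = ["tree_32", "tree_334", "tree_35", "tree_3", "forest_1"]

# Keep the names in which the pattern is found (assumed search semantics).
matching = [name for name in candidate_names if pattern.search(name)]

# The parameter table states that the pattern must select 100 or fewer runs.
MAX_RUNS = 100
within_limit = len(matching) <= MAX_RUNS

print(matching, within_limit)
```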

`fetch_read_only_runs()`

Lists the runs of the project as ReadOnlyRun objects.

Parameters:

Name	Type	Default	Description
`with_ids`	`List[str]`	-	List of Neptune run IDs to fetch.

Example:

for run in project.fetch_read_only_runs(with_ids=["RUN-1", "RUN-2"]):
    ...

Returns: Iterator of ReadOnlyRun objects.

`ReadOnlyRun`

Representation of a Neptune run in a limited read-only mode.

Initialization

Can be created with the class constructor, or as a result of the fetch_read_only_runs() method of the ReadOnlyProject class.

Parameters:

Name	Type	Default	Description
`read_only_project`	`ReadOnlyProject`	-	Source project from which the run will be fetched.
`with_id`	`str`	-	Neptune run ID to fetch. Example: `RUN-1`.

Example:

from neptune_fetcher import ReadOnlyProject, ReadOnlyRun

project = ReadOnlyProject("workspace/project", api_token="...")
run = ReadOnlyRun(project, with_id="TES-1")

`.field_names`

List of run field names.

Example:

for run in project.fetch_read_only_runs(with_ids=["TES-1", "TES-2"]):
    print(list(run.field_names))

Returns: Iterator of run fields as strings.

Field lookup: `run[field_name]`

Used to access a specific field of a run. See Available types.

Returns: An internal object used to operate on a specific field.

Example:

run_id = run["sys/id"].fetch()

`prefetch()`

Pre-fetches a batch of fields to the internal cache.

Improves the performance of access to consecutive field values. Only simple field types are supported (int, float, str).

Parameters:

Name	Type	Default	Description
`paths`	`List[str]`	-	List of paths to fetch to the cache.

Example:

run.prefetch(["parameters/optimizer", "parameters/init_lr"])
# No more calls to the API
print(run["parameters/optimizer"].fetch())
print(run["parameters/init_lr"].fetch())
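The effect of pre-fetching can be pictured with a toy model. The class `CachedFields` below is hypothetical and is not the library's implementation; it assumes that one batched request fills a cache and that later reads of cached paths need no further requests. A dictionary stands in for the Neptune API, and a counter tracks the simulated round trips.

```python
from typing import Any, Dict, List


class CachedFields:
    """Toy model of a read-only run with a prefetch cache."""

    def __init__(self, remote: Dict[str, Any]) -> None:
        self._remote = remote  # stands in for the Neptune API
        self._cache: Dict[str, Any] = {}
        self.api_calls = 0  # number of simulated round trips

    def prefetch(self, paths: List[str]) -> None:
        # One simulated request loads every requested path into the cache.
        self.api_calls += 1
        for path in paths:
            self._cache[path] = self._remote[path]

    def fetch(self, path: str) -> Any:
        # Cached paths are served locally; anything else costs one request.
        if path in self._cache:
            return self._cache[path]
        self.api_calls += 1
        return self._remote[path]


remote = {"parameters/optimizer": "Adam", "parameters/init_lr": 0.001}

warm = CachedFields(remote)
warm.prefetch(["parameters/optimizer", "parameters/init_lr"])
warm_values = [warm.fetch(path) for path in remote]

cold = CachedFields(remote)
cold_values = [cold.fetch(path) for path in remote]

print(warm.api_calls, cold.api_calls)
```

In this model the pre-fetched run needs a single round trip for both fields, while the run without pre-fetching needs one per field.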

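Available types

The simple field types (Integer, Float, String, Datetime, Object state, Boolean) each provide a `fetch()` method.

Float series

`fetch()` or `fetch_last()`

Example:

loss = run["loss"].fetch_last()

Returns: Optional[float]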

`fetch_values()`

Retrieves all series values from the API.

Parameters:

Name	Type	Default	Description
`include_timestamp`	`bool`	`True`	Whether the fetched data should include the timestamp field.

Example:

values = run["loss"].fetch_values()

Returns: pandas.DataFrame

License

This project is licensed under the Apache License Version 2.0. For more details, see Apache License Version 2.0.

Release history

Version	Release date
0.3.0 (this version)	Apr 23, 2024
0.2.0	Apr 19, 2024
0.1.0	Apr 12, 2024

Download files

Distribution	File	Size	Uploaded
Source	neptune_fetcher-0.3.0.tar.gz	17.5 kB	Apr 23, 2024
Built (Python 3)	neptune_fetcher-0.3.0-py3-none-any.whl	18.8 kB	Apr 23, 2024

Hashes for neptune_fetcher-0.3.0.tar.gz

Algorithm	Hash digest
SHA256	`8c437fc0d741bf806e8125d07ffa67e7db60ab1c2c587b4f8a6789a4f9a1bfbd`
MD5	`fcde736c3f2d56883c6d88428c1f8f37`
BLAKE2b-256	`2ca6cf6018fa7ea1da8b014c2db78ada522bf6d99350bb7a6b8ccbda20841ad5`

Hashes for neptune_fetcher-0.3.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`66e42e92260528105297fc01792a69d7370b09bd67f8ec82dc78dba54464ae7b`
MD5	`7a03553d6dd7efc80e83eb4d400624e9`
BLAKE2b-256	`bf08e4a6ab568288ed170f95052d268d753092144939f0d294870acb0f12fe57`
