Auctus Search

Discover and Load Datasets in your Notebook

with ease-of-use

Python 3.9+ | Skrub | Jupyter | Version: Unstable

Search for datasets using Auctus and integrate them seamlessly into your Notebook exploration!

[!IMPORTANT]

We highly recommend exploring the examples/ folder for Jupyter Notebook-based tutorials 🎉

This library is under active development and is not yet stable. Expect bugs and frequent changes!

Marimo is not yet supported or tested, but support is under discussion for future releases.

🌆 Auctus Search –– In a Nutshell

Auctus Search is a lightweight library that connects to the Auctus API, allowing easy search, filtering, and loading of datasets.

It offers an easy way to find datasets with .search_datasets(search_query="Taxis"), preview them interactively with .display(), optionally filter them with .with_types(["spatial"]) or .with_score_greater_than(20), to name a few, and integrate them into your notebook workflow as pandas.DataFrame or geopandas.GeoDataFrame objects with .load_selected_dataset().

For more advanced usage, you can also call .profile_selected_dataset(), which uses Data Profile Vis under the hood. See the API section below.

For developers, Auctus Search can also be integrated into your own project. Have a look at the Auctus Search Mixin in OSMNxMapping: it is fully integrated so that users benefit from the Auctus Search capabilities and, most importantly, from the great Auctus API as a whole.

See further notebook-based examples in the examples/ directory. 📓

🥐 Installation

We highly recommend using uv for installation from source to avoid the hassle of Conda or other package managers. It is also, to our knowledge, among the fastest package managers available in the open-source ecosystem, and it manages dependencies seamlessly without manual environment activation (Biggest flex!). If you would rather not use uv, that is no problem; alternatives are not covered in this section but will be in the upcoming documentation.

First, ensure uv is installed on your machine by following these instructions.

Prerequisites

Install uv as described above.
Clone Auctus Search (required for alpha development) into your desired directory. Use:
```
git clone git@github.com:VIDA-NYU/auctus_search.git
```
This step ensures pyproject.toml builds auctus_search from source during installation, though we plan for auctus_search to become a PyPI package (uv add auctus_search or pip install auctus_search) in future releases.
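Before cloning, it can help to confirm that the basics are in place. The following is a minimal sketch, not part of Auctus Search: `check_prerequisites` is a hypothetical helper, and the minimum Python version is taken from the "Python 3.9+" badge at the top of this page. It only uses the standard library.

```python
import shutil
import sys

# Assumption: the minimum version comes from the "Python 3.9+" badge.
MIN_PYTHON = (3, 9)


def check_prerequisites():
    """Report whether Python is recent enough and whether uv and git are on PATH."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "uv_on_path": shutil.which("uv") is not None,
        "git_on_path": shutil.which("git") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

If `uv_on_path` comes back as "no", install uv first and open a new terminal so that your PATH is refreshed.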

Steps

Jump into the Auctus Search repository:
```
cd auctus_search
```
Lock and sync dependencies with uv:
```
uv lock
uv sync
```
(Recommended) Install Jupyter extensions for interactive features requiring Jupyter widgets:
```
uv run jupyter labextension install @jupyter-widgets/jupyterlab-manager
```
Launch Jupyter Lab to explore Auctus Search (Way faster than running Jupyter without uv):
```
uv run --with jupyter jupyter lab
```

[!NOTE]
Future versions will simplify this process: auctus_search will move to PyPI, removing the need for manual cloning, and Jupyter extensions will auto-install via pyproject.toml configuration.

Voilà 🥐! You’re all set to explore Auctus Search in Jupyter Lab.

Getting Started!

Below is a concise, step-by-step example of how to use the Auctus Search library in a Jupyter notebook.

Cell 1: Import the Library

```
# This imports the main `AuctusSearch` class, which provides all the functionality we'll use.
from auctus_search import AuctusSearch
```

Cell 2: Initialise An AuctusSearch Instance

```
# Create an instance of `AuctusSearch` to start searching for datasets.
# This object handles all interactions with the Auctus API and dataset management.
search = AuctusSearch()
```

Cell 3: Search for Datasets

```
collection = search.search_datasets(search_query="Taxis", display_initial_results=True)

# Search for datasets related to "Taxis" (very broad, right?). The `search_datasets` method queries the Auctus API and
# returns an `AuctusDatasetCollection`. Setting `display_initial_results=True` shows the initial results interactively
# in the notebook, allowing you to see available datasets right away.

# More parameters, such as `page` and `size` for pagination, are available, but we'll stick to the defaults for now.
# See the API section below for more details.
```

Cell 4: Filter the Dataset Collection

```
filtered_collection = (
    collection
    # Keep only datasets that have at least a spatial component.
    .with_types(["spatial"])
    # Then, among those, keep only datasets with more than 100,000 rows.
    .with_number_of_rows_greater_than(100000)
)
```

Cell 5: Display Filtered Datasets Interactively

```
filtered_collection.display()

# Display the filtered datasets in an interactive grid. Each dataset is shown as a card with details like name, source,
# and size. You can click "Select This Dataset" on any card to choose one for further use.
```

Cell 6: Load the Selected Dataset

```
dataset = search.load_selected_dataset()

# After selecting a dataset in the previous step, this loads it into memory as a `pandas.DataFrame` (or
# `geopandas.GeoDataFrame` if spatial). By default, it also displays an interactive table preview of the dataset.
```
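If you repeat the search-and-filter cells often, they can be folded into a small helper. This is a sketch only: `find_spatial_candidates` is a hypothetical name, not part of the library, and it relies solely on the methods described on this page (`search_datasets`, `with_types`, `with_number_of_rows_greater_than`, `preview`). The `search` argument is assumed to be an `AuctusSearch` instance that you create yourself.

```python
def find_spatial_candidates(search, query, min_rows=100_000):
    """Search for `query`, keep large spatial datasets, and print a summary.

    `search` is expected to be an `AuctusSearch` instance (assumption: it
    exposes `search_datasets` as documented in the API section).
    """
    collection = search.search_datasets(search_query=query)
    filtered = (
        collection
        .with_types(["spatial"])
        .with_number_of_rows_greater_than(min_rows)
    )
    filtered.preview()  # prints the search query, the filters, and the count
    return filtered
```

With `search = AuctusSearch()`, calling `find_spatial_candidates(search, "Taxis").display()` shows the cards, and `search.load_selected_dataset()` in a later cell loads the one you picked.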

Comfortable with the idea of Auctus Search as a lightweight, Jupyter-focussed wrapper around the Auctus API?

Want more filtering actions or more advanced usage? Check the API section below for details on how to filter datasets.

Enjoy! 🥐

🗺️ Roadmap / Future Work

[!NOTE]
For more about future work, explore the issues tab above!

From labs to more general communities, we want to advance Auctus Search by attaining large unit-test coverage, integrating routines via GitHub Actions, and producing thorough documentation for users all around.
It would also be very interesting to abstract the management of the Auctus API behind a common interface, so that alternatives to Auctus could be plugged in and the library could target multiple dataset collection APIs, such as: https://lil.law.harvard.edu/blog/2025/02/06/announcing-data-gov-archive/

We are also looking forward to seeing more examples in the examples/ directory. In the meantime, we are happy to welcome your contributions to the library 🎄

🌁 API

[!IMPORTANT] This project is fully type-annotated and uses the great @beartype for runtime type checking! This should reduce side effects and improve library usability on the user's end.

The Auctus Search API is split into two main parts: the AuctusSearch class for searching, profiling, and loading datasets, and the AuctusDatasetCollection class for filtering and displaying results. Here's the rundown:

AuctusSearch

Your main entry point for searching, profiling, and loading datasets.

search_datasets(search_query, page=1, size=10, display_initial_results=False)

Purpose: Searches the Auctus API for datasets matching your query.
Parameters:
- search_query (str or list): Search term(s) (e.g., "Taxis" or ["Taxis", "NYC"] – could also be "Taxis NYC").
- page (int, default=1): Page number of results for pagination. Works with size; a higher size means fewer pages, while a lower size increases the number of pages.
- size (int, default=10): Number of results per page.
- display_initial_results (bool, default=False): If True, displays initial results in a Jupyter notebook cell.
Returns: An AuctusDatasetCollection object containing the search results.
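The relationship between page and size is plain arithmetic. The sketch below is illustrative only: `pages_needed` and `page_bounds` are hypothetical helpers, and the total number of matching results is an assumed input, since this page does not say whether the Auctus API reports it.

```python
import math


def pages_needed(total_results, size=10):
    """Number of pages required to cover `total_results` at `size` per page."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return math.ceil(total_results / size)


def page_bounds(page, size=10):
    """1-based positions of the first and last result slot on a given page."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be at least 1")
    first = (page - 1) * size + 1
    return first, first + size - 1


print(pages_needed(95, size=10))   # 10
print(pages_needed(95, size=100))  # 1
print(page_bounds(3, size=10))     # (21, 30)
```

For a hypothetical 95 matches, size=10 spreads them over 10 pages, while size=100 fits them all on page 1.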

Example:

```
from auctus_search import AuctusSearch
search = AuctusSearch()
# Requests up to 100 "Taxis" results on a single page (may take longer and require scrolling).
# Adjust `size` and `page` as needed.
collection = search.search_datasets(search_query="Taxis", page=1, size=100)
```
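To walk through several pages rather than raising size, one option is to call `search_datasets` once per page. This is a sketch under assumptions: `collect_pages` is a hypothetical helper, the collections are kept in a plain list because this page does not describe a way to merge them, and the behaviour when `page` goes past the last available page is not documented here.

```python
def collect_pages(search, query, pages=3, size=10):
    """Return one collection per page, in page order.

    `search` is expected to be an `AuctusSearch` instance.
    """
    collections = []
    for page in range(1, pages + 1):
        collections.append(
            search.search_datasets(search_query=query, page=page, size=size)
        )
    return collections
```

Each element can then be filtered or displayed on its own, for example `collect_pages(search, "Taxis")[0].display()`.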

profile_selected_dataset()

Purpose: Displays an interactive data profile summary of the selected dataset using the Data Profile Vis library. Requires a dataset to be selected (search with search_datasets(), then pick one from display()) and its metadata to be available.
Parameters: None
Returns: None (displays the profile interactively in the notebook)
Raises:
- ValueError if no dataset is selected or if metadata is missing.

Example:

```
from auctus_search import AuctusSearch
search = AuctusSearch()
collection = search.search_datasets(search_query="Taxis")
collection.display()  # Displays dataset cards; select one by clicking "Select This Dataset"
search.profile_selected_dataset()  # Shows the interactive profile
```

Note that a profile_edit_selected_dataset() method may well see the light of day soon. See https://github.com/soniacq/DataProfileVis for more.

load_selected_dataset(display_table=True)

Purpose: Downloads and loads the dataset you selected from the collection (after clicking Select This Dataset).
Parameters:
- display_table (bool, default=True): If True, shows a preview table using Skrub.
Returns: A pandas.DataFrame or geopandas.GeoDataFrame (currently supports CSV; more formats coming soon!).
Raises: ValueError if no dataset is selected.

Example:

```
dataset = search.load_selected_dataset()  # Ensure a dataset is selected first, or it raises a ValueError.
```
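In a notebook it is easy to run the load cell before clicking a card. A small guard, sketched here with a hypothetical helper named `load_or_none`, turns the documented ValueError into a message and a None return value so the rest of the notebook can check for it.

```python
def load_or_none(search, display_table=False):
    """Load the selected dataset, or return None if nothing is selected.

    `search` is expected to be an `AuctusSearch` instance; the ValueError
    handled here is the one documented for `load_selected_dataset`.
    """
    try:
        return search.load_selected_dataset(display_table=display_table)
    except ValueError as error:
        print(f"Nothing loaded: {error}")
        return None
```

A later cell can then start with `if dataset is None:` and remind you to select a card first.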

interactive_table_display(dataframe, n_rows=10, order_by=None, title="Table Report", column_filters=None, verbose=1)

Purpose: Displays an interactive table of your loaded dataset in Jupyter.
Parameters:
- dataframe (pandas.DataFrame or geopandas.GeoDataFrame): The dataset to display.
- n_rows (int, default=10): Number of rows to show.
- order_by (str or list, optional): Column(s) to sort by.
- title (str, optional): Table title.
- column_filters (dict, optional): Filters for columns (e.g., {"city": {"eq": "NYC"}}).
- verbose (int, default=1): Verbosity level.
Returns: None (displays the table in the notebook).

Example:

```
search.interactive_table_display(dataset, n_rows=5, title="Taxis Data")
```

AuctusDatasetCollection

A helper class to filter and explore datasets returned from a search. It supports chaining filter methods, making it ideal for interactive use in Jupyter notebooks compared to parameter-heavy alternatives.
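To see why chaining works, here is a toy version of the pattern. `MiniCollection` is a hypothetical stand-in written for this illustration, not the real AuctusDatasetCollection, and the dictionary keys ("name", "rows", "score") are invented. Each filter returns a new object, which is what makes calls composable and leaves the original results untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniCollection:
    """Hypothetical stand-in used only to illustrate chainable filters."""

    datasets: tuple
    filters: tuple = ()

    def _keep(self, label, predicate):
        # Build a new collection instead of mutating this one.
        kept = tuple(d for d in self.datasets if predicate(d))
        return MiniCollection(kept, self.filters + (label,))

    def with_number_of_rows_greater_than(self, min_rows):
        return self._keep(f"rows > {min_rows}", lambda d: d["rows"] > min_rows)

    def with_score_greater_than(self, min_score):
        return self._keep(f"score > {min_score}", lambda d: d["score"] > min_score)


original = MiniCollection((
    {"name": "a", "rows": 50, "score": 30.0},
    {"name": "b", "rows": 5000, "score": 12.5},
    {"name": "c", "rows": 9000, "score": 41.0},
))
filtered = original.with_number_of_rows_greater_than(100).with_score_greater_than(20)

print([d["name"] for d in filtered.datasets])  # ['c']
print(filtered.filters)                        # ('rows > 100', 'score > 20')
print(len(original.datasets))                  # 3, the original is untouched
```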

Filtering Methods

with_types(types)
- Purpose: Filters datasets by dataset types (e.g., "spatial", "temporal", "numerical", "categorical").
- Parameters:
  - types (list): List of desired types, e.g., ["spatial", "temporal"].
- Returns: A new AuctusDatasetCollection.
- Example:
```
filtered = collection.with_types(["spatial"])
```
with_number_of_rows_greater_than(min_rows)
- Purpose: Keeps datasets with more than min_rows rows.
- Parameters:
  - min_rows (int): Minimum number of rows.
- Returns: A new AuctusDatasetCollection.
- Example:
```
filtered = collection.with_number_of_rows_greater_than(500)
```
with_number_of_rows_less_than(max_rows)
- Purpose: Keeps datasets with fewer than max_rows rows.
- Parameters:
  - max_rows (int): Maximum number of rows.
- Returns: A new AuctusDatasetCollection.
with_number_of_rows_between(min_rows, max_rows)
- Purpose: Filters datasets with rows between min_rows and max_rows.
- Parameters:
  - min_rows (int): Minimum number of rows.
  - max_rows (int): Maximum number of rows.
- Returns: A new AuctusDatasetCollection.
with_number_of_columns_greater_than(min_columns)
- Purpose: Keeps datasets with more than min_columns columns.
- Parameters:
  - min_columns (int): Minimum number of columns.
- Returns: A new AuctusDatasetCollection.
with_number_of_columns_less_than(max_columns)
- Purpose: Keeps datasets with fewer than max_columns columns.
- Parameters:
  - max_columns (int): Maximum number of columns.
- Returns: A new AuctusDatasetCollection.
with_number_of_columns_between(min_columns, max_columns)
- Purpose: Filters datasets with columns between min_columns and max_columns.
- Parameters:
  - min_columns (int): Minimum number of columns.
  - max_columns (int): Maximum number of columns.
- Returns: A new AuctusDatasetCollection.
with_score_greater_than(min_score)
- Purpose: Keeps datasets with a relevancy score above min_score.
- Parameters:
  - min_score (int or float): Minimum score.
- Returns: A new AuctusDatasetCollection.
- Example:
```
filtered = collection.with_score_greater_than(20)
```
with_score_less_than(max_score)
- Purpose: Keeps datasets with a score below max_score. (Less useful since higher scores indicate better relevancy, but included for flexibility.)
- Parameters:
  - max_score (int or float): Maximum score.
- Returns: A new AuctusDatasetCollection.
with_score_between(min_score, max_score)
- Purpose: Filters datasets with scores between min_score and max_score.
- Parameters:
  - min_score (int or float): Minimum score.
  - max_score (int or float): Maximum score.
- Returns: A new AuctusDatasetCollection.
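When the filter values come from variables (for example notebook widgets), the chain can be built conditionally. The sketch below uses only the method names listed above. `apply_filters` itself is a hypothetical helper, and the choice to prefer the "between" variant when both row bounds are given is a design decision of this example.

```python
def apply_filters(collection, types=None, min_rows=None, max_rows=None, min_score=None):
    """Apply only the filters whose arguments were provided.

    `collection` is expected to be an `AuctusDatasetCollection`.
    """
    if types:
        collection = collection.with_types(list(types))
    if min_rows is not None and max_rows is not None:
        collection = collection.with_number_of_rows_between(min_rows, max_rows)
    elif min_rows is not None:
        collection = collection.with_number_of_rows_greater_than(min_rows)
    elif max_rows is not None:
        collection = collection.with_number_of_rows_less_than(max_rows)
    if min_score is not None:
        collection = collection.with_score_greater_than(min_score)
    return collection
```

For instance, `apply_filters(collection, types=["spatial"], min_rows=500, min_score=20)` is equivalent to chaining `with_types`, `with_number_of_rows_greater_than`, and `with_score_greater_than` by hand.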

preview()

Purpose: Prints a summary of the dataset collection (search query, filters, and count).
Returns: None (prints to console).
Example:
```
filtered.preview()
```

display()

Purpose: Shows an interactive grid of dataset cards in Jupyter for you to select one.
Returns: None (displays in notebook).
Example:
```
filtered.display()
```

📓 Examples

Check out the examples/ directory in the Auctus Search repo for more detailed Jupyter notebook examples.

Licence

Auctus Search is released under the MIT Licence.

Project details


This version

0.1.0

May 20, 2025

Download files

Source Distribution

auctus_search-0.1.0.tar.gz (27.3 kB view details)

Uploaded May 20, 2025 Source

Built Distribution

auctus_search-0.1.0-py3-none-any.whl (20.8 kB view details)

Uploaded May 20, 2025 Python 3

File details

Details for the file auctus_search-0.1.0.tar.gz.

File metadata

Download URL: auctus_search-0.1.0.tar.gz
Upload date: May 20, 2025
Size: 27.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.11

File hashes

Hashes for auctus_search-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`976047a12e0ca53e2f673b3918ed47acff5c7de798665c5f52872e43a80309a0`
MD5	`ffd90e61434cf9a372487ab2ca4e9527`
BLAKE2b-256	`c26129e79c310cc47eda4a0fdf31a99d9c75e9f3da24cd479d080ca29e1fca00`

File details

Details for the file auctus_search-0.1.0-py3-none-any.whl.

File metadata

Download URL: auctus_search-0.1.0-py3-none-any.whl
Upload date: May 20, 2025
Size: 20.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.11

File hashes

Hashes for auctus_search-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e0f81d44faee23f749d90e89cd5918acd526b8774df0340033371c262be609d0`
MD5	`e8903f8d40d3067cbb2e3c8e7ff52ec9`
BLAKE2b-256	`c5b1f037b09a504447ebd24972041a489faf140c72094e116fb481e06ce75439`
