
DataVoyager Core with Asta Integration - AI agent for data analysis

Project description

This is the asta-flavor of DataVoyager. The standalone and original DataVoyager is datavoyager-core. Documentation there may still be relevant for unchanged parts of the core.

Development Setup

If you only need to work on the agent itself (this repo) without the Asta webapp or ecosystem, follow this guide. For end-to-end development of the DV-Asta integration, see https://github.com/allenai/nora/wiki/DataVoyager-development, which covers all the processes you need to control.

This project uses uv for dependency management and assumes Python 3.12. Install uv, then run:

uv sync

Go to 1Password and get the "datavoyager agent secrets" entry. Export these variables to your environment. If you use something like direnv, you can put the exports directly in your .envrc file. Don't forget to run direnv allow.

Copy .env.template to .env. If the secrets are already exported to your environment, no further changes are needed. Otherwise, edit .env with all the secret values so that they reach the docker containers.

Check that you can start the agent in local mode. You should see something like this:
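As a quick sanity check before starting the agent, the following minimal sketch lists which variables named in .env.template are not yet set in your environment. It assumes the template consists of plain KEY=VALUE lines (optionally prefixed with export); that format and the helper names template_keys and missing_keys are illustrative assumptions, not part of this repo.

```python
import os
from pathlib import Path


def template_keys(template_text: str) -> list[str]:
    """Return variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = []
    for raw in template_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate an optional leading "export " (assumed, as in an .envrc file).
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_keys(template_text: str, environ: dict[str, str]) -> list[str]:
    """Names listed in the template that are unset or empty in environ."""
    return [k for k in template_keys(template_text) if not environ.get(k)]


if __name__ == "__main__":
    template = Path(".env.template")
    if template.exists():
        for name in missing_keys(template.read_text(), dict(os.environ)):
            print(f"not set in environment: {name}")
```

Any name it prints must be either exported in your shell or filled in by hand in .env.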

$> uv run python local.py
* config_file=config/datavoyager_modal_deployment_magentic_one_config_20250617.yaml
* json_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.jsonl
* md_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.md
* asta_logs.log_file=threads/local-2025-08-05T10-28-36.981698/log.istore.json
* console_log=threads/local-2025-08-05T10-28-36.981698/console.log 
===============================================================================================
You: (enter two blank lines to submit)

Here you can interact with the agent as you would in the Asta webapp. A number of Asta integrations have been swapped for plaintext versions. DataVoyager's reasoning streams to the console. As it runs, you will see evidence of various Asta-specific features, rendered as plaintext. A few are called out in the excerpts from an example run below.
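The startup banner shows each local run writing a log.jsonl file under threads/local-<timestamp>/. The sketch below finds the newest such directory and reads that file back. It assumes only that the file is JSON Lines (one JSON value per line); the record fields are not documented here, so the records are printed as-is. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Iterator, Optional


def parse_jsonl(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one decoded JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def latest_local_thread(names: Iterable[str]) -> Optional[str]:
    """Pick the newest 'local-<timestamp>' name; these timestamps sort lexicographically."""
    local = sorted(n for n in names if n.startswith("local-"))
    return local[-1] if local else None


if __name__ == "__main__":
    root = Path("threads")
    if root.is_dir():
        newest = latest_local_thread(p.name for p in root.iterdir() if p.is_dir())
        if newest and (root / newest / "log.jsonl").is_file():
            with (root / newest / "log.jsonl").open(encoding="utf-8") as fh:
                for record in parse_jsonl(fh):
                    print(record)
```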

Here I pasted one of the stock queries from https://datavoyager.allen.ai into the terminal.

Out of age and gender, which factor affects the survival of titanic passengers the most? Use 
the following dataset.

• s3://ai2-asta-workspaces/sampledata/titanic.csv


-----------------------------------------------------------------------------------------------

Step progress events have a unique treatment in the Asta UX, though they are simply mentioned in plaintext here.

>>> entering step >>> Checking for datasets
<<< exited step <<< Loaded 1 new dataset(s)
>>> entering step >>> Pre-processing request
<<< exited step <<< Pre-processing request
>>> entering step >>> Investigating
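If you want to follow step progress in a saved console.log, the plaintext markers above are easy to pick out. This is a minimal sketch that relies only on the two line prefixes shown in the excerpt. It is not how the Asta UX consumes these events, and the step_events helper is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the two plaintext markers shown in the excerpt above.
_STEP = re.compile(r"^(>>> entering step >>>|<<< exited step <<<)\s*(.*)$")


def step_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield ("enter" | "exit", message) for each step marker line."""
    for line in lines:
        match = _STEP.match(line.strip())
        if match:
            kind = "enter" if match.group(1).startswith(">>>") else "exit"
            yield kind, match.group(2).strip()


if __name__ == "__main__":
    sample = [
        ">>> entering step >>> Checking for datasets",
        "<<< exited step <<< Loaded 1 new dataset(s)",
        "some unrelated console output",
    ]
    for kind, message in step_events(sample):
        print(kind, "|", message)
```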

<dvtext> marks spans that are not forwarded to the Asta UX but are important to DV's internal reasoning.

==================================================
MagenticOneOrchestrator
<dvtext>
We are working to address the following REQUEST:

Question: Out of age and gender, which factor affects the survival of titanic passengers the 
most? Use the following dataset.

• s3://ai2-asta-workspaces/sampledata/titanic.csv

Python variable name for dataset: data_0
Dataset preview:
Dataset head:
PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S

Eventually the agent finishes with what we consider the answer to the user query, which is annotated with <dvoutput>. The Asta UX picks up these tags to power specific UX behaviors.

<<< exited step <<< Investigation complete.

Asta: <dvoutput cell_id="10">
Based on the analysis of the Titanic dataset, the factor that has the most significant impact 
on passenger survival is gender. The logistic regression results indicate that being female, 
compared to being male, increases the odds of survival by approximately 1124.74%. In contrast, 
age has a negligible effect on the odds of survival, with the regression suggesting only a 
minor change of -0.47% per additional year. Thus, gender is the more impactful factor affecting
survival rates among Titanic passengers.
</dvoutput>

===============================================================================================
You: (enter two blank lines to submit)
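When post-processing a local transcript, it can help to separate internal reasoning from the final answer. The sketch below drops <dvtext> spans and extracts <dvoutput> blocks with regular expressions. It assumes each <dvtext> is closed by a matching </dvtext> (the excerpt above is truncated before any closing tag) and that cell_id is the only attribute. This is an illustration, not the parser the Asta UX uses.

```python
import re
from typing import List, Optional, Tuple

# Assumed tag shapes: <dvtext>...</dvtext> and <dvoutput cell_id="N">...</dvoutput>.
_DVTEXT = re.compile(r"<dvtext>.*?</dvtext>", re.DOTALL)
_DVOUTPUT = re.compile(
    r'<dvoutput(?:\s+cell_id="(?P<cell_id>[^"]*)")?\s*>(?P<body>.*?)</dvoutput>',
    re.DOTALL,
)


def strip_dvtext(text: str) -> str:
    """Remove every <dvtext>...</dvtext> span, tags included."""
    return _DVTEXT.sub("", text)


def extract_dvoutputs(text: str) -> List[Tuple[Optional[str], str]]:
    """Return (cell_id, answer_text) pairs; cell_id is None when the attribute is absent."""
    return [
        (m.group("cell_id"), m.group("body").strip())
        for m in _DVOUTPUT.finditer(text)
    ]


if __name__ == "__main__":
    transcript = (
        "<dvtext>internal plan</dvtext>\n"
        'Asta: <dvoutput cell_id="10">\nGender is the more impactful factor.\n</dvoutput>'
    )
    print(strip_dvtext(transcript))
    print(extract_dvoutputs(transcript))
```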

Deployment

Make sure you have set up the necessary secrets in your environment, as described above. Set up the .env file so that underlying docker containers will have those secrets.

The modal app name is computed as f"dv-core.{ENV}". The modal.com dashboard lists all existing environments. The dv-core.rc environment is automatically updated with the code in the main branch via a GitHub action, as long as the tests pass. Other environments must be updated manually. For example, to deploy to my own personal development deployment, called dv-core.jasond:

ENV=jasond make deploy
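To make the naming rule concrete, here is a minimal sketch of how the app name follows from ENV. Only the f"dv-core.{ENV}" pattern comes from this README. The modal_app_name helper, and the idea that ENV is read from the process environment, are assumptions for illustration.

```python
import os


def modal_app_name(env: str) -> str:
    """Build the modal app name from an environment name, e.g. 'rc' -> 'dv-core.rc'."""
    if not env:
        raise ValueError("ENV must be set, e.g. ENV=jasond")
    return f"dv-core.{env}"


if __name__ == "__main__":
    # Assumption: ENV is supplied via the shell, as in `ENV=jasond make deploy`.
    if "ENV" in os.environ:
        print(modal_app_name(os.environ["ENV"]))
```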

The prod modal environment, dv-core.prod, holds the last stable version of this codebase, and the asta.allen.ai Asta environment depends on it for stability. Before releasing main to the prod modal environment, run whatever automated and manual tests you need to be confident that the current code is good. To deploy to prod, you can run the deploy command above locally (changing the ENV value) or use the deploy-prod GitHub workflow (under the Actions tab). The latter is preferred. Please note: your changes will not be reflected in the asta.allen.ai environment until they are in the prod modal environment.

PyPI Installation and Publishing

DV can be run as a standalone CLI, installed from PyPI:

uv pip install dv-core-asta
dv --help

To cut a new release:

  1. Bump the version:
     • On your PR branch:
       • make show-version
       • make set-version VERSION=x.y.z
  2. Just before merging to main:
     • make push-version-tag
  3. On GitHub, in the Actions tab, run the Publish to PyPI workflow
     • Enter the version tag (e.g., v1.1.7) as the version input
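The release steps use two spellings of the same version: a bare x.y.z for make set-version and a v-prefixed tag (e.g., v1.1.7) for the workflow input. The sketch below converts one to the other and rejects malformed input. The strict three-part numeric check and the version_tag helper are assumptions for illustration; the Makefile may accept other forms.

```python
import re

# Assumption: versions are exactly three dot-separated integers (x.y.z).
_VERSION = re.compile(r"\d+\.\d+\.\d+")


def version_tag(version: str) -> str:
    """Turn a bare version such as '1.1.7' into the tag form 'v1.1.7'."""
    if not _VERSION.fullmatch(version):
        raise ValueError(f"expected x.y.z, got {version!r}")
    return f"v{version}"


if __name__ == "__main__":
    print(version_tag("1.1.7"))
```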

Logs

Logs from the main DV process in modal are forwarded to the ai2-reviz project in GCP, where they can be correlated with other activity. See https://github.com/allenai/nora/wiki/Log-Filtering-and-Navigation

Remote File Sharing

Remote executions (Docker or Modal) use a shared FileShareSpec abstraction to declare which paths should exist inside the runtime. See docs/remote_file_sharing.md for the full guide.
