DataVoyager Core with Asta Integration - AI agent for data analysis
This is the asta-flavor of DataVoyager. The standalone and original DataVoyager is datavoyager-core. Documentation there may still be relevant for unchanged parts of the core.
Development Setup
If you only need to work on the agent itself (this repo) without the Asta webapp or ecosystem, follow this guide. For end-to-end development of the DV-Asta integration, see https://github.com/allenai/nora/wiki/DataVoyager-development, which details all the processes you need to control.
This project uses uv for dependency management and assumes Python 3.12. Install uv, then run:
uv sync
Go to 1Password and get "datavoyager agent secrets." You will need to export
these vars to your environment. If you use something like direnv, you can
put those exports directly in your .envrc file. Don't forget to run direnv allow.
Copy .env.template to .env. If the secrets are exported to your
environment, no further changes are needed. Otherwise, to get those secrets
into docker containers, edit .env and fill in all the secret values.
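Before starting the agent, it can help to confirm that every variable named in the template is actually set. The following is a minimal sketch, assuming .env.template uses dotenv-style KEY=value lines (the file's real contents are not shown here); the helper names and the EXAMPLE_* variables are hypothetical.

```python
import os


def template_keys(template_text: str) -> list[str]:
    """Return the variable names declared in a dotenv-style template."""
    keys = []
    for raw in template_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        keys.append(line.split("=", 1)[0].strip())
    return keys


def missing_vars(template_text: str, environ=None) -> list[str]:
    """List template variables that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [key for key in template_keys(template_text) if not env.get(key)]


if __name__ == "__main__":
    # EXAMPLE_API_KEY and EXAMPLE_TOKEN are placeholders, not real secret names.
    sample = "# secrets\nEXAMPLE_API_KEY=\nexport EXAMPLE_TOKEN=\n"
    print(missing_vars(sample, {"EXAMPLE_API_KEY": "abc"}))
```

In practice you would pass the text of your own .env.template and let the function read os.environ by default.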
Check that you can start the agent in local mode. You should see something like this.
$> uv run python local.py
* config_file=config/datavoyager_modal_deployment_magentic_one_config_20250617.yaml
* json_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.jsonl
* md_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.md
* asta_logs.log_file=threads/local-2025-08-05T10-28-36.981698/log.istore.json
* console_log=threads/local-2025-08-05T10-28-36.981698/console.log
===============================================================================================
You: (enter two blank lines to submit)
Here you can interact with the agent as you would in the Asta webapp. A number of Asta integrations have been swapped with plaintext versions. DataVoyager's reasoning will spray into the console. As it does, you will see evidence of various Asta-specific features, rendered as plaintext. A few are called out in the excerpts from an example run below.
Here I pasted one of the stock queries from https://datavoyager.allen.ai into the terminal.
Out of age and gender, which factor affects the survival of titanic passengers the most? Use
the following dataset.
• s3://ai2-asta-workspaces/sampledata/titanic.csv
-----------------------------------------------------------------------------------------------
Step progress events have a unique treatment in the Asta UX, though they are simply mentioned in plaintext here.
>>> entering step >>> Checking for datasets
<<< exited step <<< Loaded 1 new dataset(s)
>>> entering step >>> Pre-processing request
<<< exited step <<< Pre-processing request
>>> entering step >>> Investigating
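If you want to follow step progress programmatically when reading a saved console log, the markers above are easy to pick out. This is a rough sketch that assumes the exact plaintext markers shown in this excerpt; parse_step_events is a hypothetical helper, not part of the project.

```python
import re

# Matches the two plaintext step markers seen in the local-mode console output.
STEP_RE = re.compile(r"^(>>> entering step >>>|<<< exited step <<<)\s*(.*)$")


def parse_step_events(console_text: str) -> list[tuple[str, str]]:
    """Return (kind, message) pairs, where kind is 'enter' or 'exit'."""
    events = []
    for line in console_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            kind = "enter" if match.group(1).startswith(">>>") else "exit"
            events.append((kind, match.group(2).strip()))
    return events


if __name__ == "__main__":
    sample = (
        ">>> entering step >>> Checking for datasets\n"
        "<<< exited step <<< Loaded 1 new dataset(s)\n"
    )
    for kind, message in parse_step_events(sample):
        print(kind, message)
```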
<dvtext> marks spans that will not be forwarded to the Asta UX
but are important to DV's internal reasoning.
==================================================
MagenticOneOrchestrator
<dvtext>
We are working to address the following REQUEST:
Question: Out of age and gender, which factor affects the survival of titanic passengers the
most? Use the following dataset.
• s3://ai2-asta-workspaces/sampledata/titanic.csv
Python variable name for dataset: data_0
Dataset preview:
Dataset head:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
Eventually the agent finishes with what we consider the answer to the user query, which
is annotated with <dvoutput>. The Asta UX picks up these tags to power specific UX behaviors.
<<< exited step <<< Investigation complete.
Asta: <dvoutput cell_id="10">
Based on the analysis of the Titanic dataset, the factor that has the most significant impact
on passenger survival is gender. The logistic regression results indicate that being female,
compared to being male, increases the odds of survival by approximately 1124.74%. In contrast,
age has a negligible effect on the odds of survival, with the regression suggesting only a
minor change of -0.47% per additional year. Thus, gender is the more impactful factor affecting
survival rates among Titanic passengers.
</dvoutput>
===============================================================================================
You: (enter two blank lines to submit)
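To make the two tags concrete, here is a small sketch of how a consumer of the console text might drop <dvtext> spans and pull out <dvoutput> answers. It is not the Asta UX's actual handling. It assumes <dvtext> is closed by a matching </dvtext> (the excerpt above is truncated before any closing tag) and that cell_id is the only attribute; both function names are hypothetical.

```python
import re

# Non-greedy matches so that multiple tagged spans in one log are handled separately.
DVTEXT_RE = re.compile(r"<dvtext>.*?</dvtext>", re.DOTALL)
DVOUTPUT_RE = re.compile(r'<dvoutput\s+cell_id="([^"]*)">(.*?)</dvoutput>', re.DOTALL)


def strip_dvtext(text: str) -> str:
    """Remove spans meant only for DV's internal reasoning."""
    return DVTEXT_RE.sub("", text)


def extract_outputs(text: str) -> list[tuple[str, str]]:
    """Return (cell_id, answer) pairs for every <dvoutput> span."""
    return [(cell_id, body.strip()) for cell_id, body in DVOUTPUT_RE.findall(text)]


if __name__ == "__main__":
    log = (
        "<dvtext>\nWe are working to address the following REQUEST\n</dvtext>\n"
        'Asta: <dvoutput cell_id="10">\nGender is the more impactful factor.\n</dvoutput>\n'
    )
    print(strip_dvtext(log))
    print(extract_outputs(log))
```

Regular expressions are enough for a sketch like this because the tags do not nest in the excerpt; a real consumer may need something sturdier.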
Deployment
Make sure you have set up the necessary secrets in your environment, as described above.
Set up the .env file so that underlying docker containers will have those secrets.
The modal app name is figured as f"dv-core.{ENV}". Here is the
modal.com dashboard with all existing environments.
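As a quick illustration of the naming rule, the sketch below derives the app name from an ENV value. The f-string mirrors the one quoted above; the function name, the empty-value check, and reading ENV from the process environment are assumptions made for the example.

```python
import os


def modal_app_name(env: str) -> str:
    """Build the Modal app name for a given environment, e.g. 'rc' or 'prod'."""
    if not env:
        raise ValueError("ENV must be a non-empty string")
    return f"dv-core.{env}"


if __name__ == "__main__":
    # Falls back to a hypothetical 'dev' value when ENV is not exported.
    print(modal_app_name(os.environ.get("ENV", "dev")))
```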
The dv-core.rc environment is automatically updated with the code in the main branch via a GitHub action,
so long as it passes tests. Other environments must be updated manually. For example, to deploy to my own personal
development deployment called dv-core.jasond:
ENV=jasond make deploy
The prod modal environment dv-core.prod holds the last stable version of this codebase, and the asta.allen.ai Asta environment depends on it for code stability. When it is time to release this codebase's main to the prod modal environment, run whatever automated and manual tests you need to gain confidence that the current version of the code is good before deploying it.
To deploy to the prod modal environment, you can run the deploy command above locally (changing the ENV value), or you can use the deploy-prod GitHub workflow (under the Actions tab). The latter is preferred. Please note: your changes will not be reflected in the asta.allen.ai environment unless they are in the prod modal environment.
PyPI Installation and Publishing
DV can be run as a standalone CLI, installed from pypi:
uv pip install dv-core-asta
dv --help
To cut a new release:
- Bump the version:
  - On your PR branch:
    make show-version
    make set-version VERSION=x.y.z
  - Just before merging to main:
    make push-version-tag
- On GitHub, in the Actions tab, run the Publish to PyPI workflow
  - Enter the version tag (e.g., v1.1.7) as the version input
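The relationship between the x.y.z version and the tag you enter in the workflow can be sketched as follows. This is only an illustration inferred from the v1.1.7 example; what the make targets do internally is not shown in this document, and version_tag is a hypothetical helper.

```python
import re


def version_tag(version: str) -> str:
    """Turn an x.y.z version string into the 'v'-prefixed tag form."""
    # fullmatch rejects partial matches such as '1.1' or '1.1.7rc1'.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"expected x.y.z, got {version!r}")
    return f"v{version}"


if __name__ == "__main__":
    print(version_tag("1.1.7"))
```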
Logs
Logs from the main DV process in modal are forwarded to the ai2-reviz project in GCP, where they can be correlated with other activity. See https://github.com/allenai/nora/wiki/Log-Filtering-and-Navigation.
Remote File Sharing
Remote executions (Docker or Modal) use a shared FileShareSpec abstraction to declare which
paths should exist inside the runtime. See docs/remote_file_sharing.md for the full guide.