DataVoyager Core with Asta Integration - AI agent for data analysis

Project description

This is the Asta flavor of DataVoyager. The original, standalone DataVoyager is datavoyager-core. Its documentation may still be relevant for the parts of the core that are unchanged here.

Development Setup

If you only need to work on the agent itself (this repo) without the Asta webapp or ecosystem, follow this guide. For end-to-end development of the DV-Asta integration, see https://github.com/allenai/nora/wiki/DataVoyager-development, which details all the processes you need to control.

This project uses uv for dependency management and assumes Python 3.12. Install uv, then run:

uv sync

Go to 1Password and get "datavoyager agent secrets." You will need to export these variables to your environment. If you use something like direnv, you can put those exports directly in your .envrc file. Don't forget to run direnv allow.

Copy .env.template to .env. If the secrets are exported to your environment, no further changes are needed. Otherwise, to get those secrets into Docker containers, edit .env and fill in all the secret values.
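Before starting the agent, it can help to confirm that every variable named in .env.template is actually exported. The following is a minimal sketch, not part of this repo. It assumes .env.template consists of KEY=VALUE lines with optional # comments, and the helper names template_keys and missing_secrets are hypothetical.

```python
import os
from pathlib import Path


def template_keys(template_text: str) -> list[str]:
    """Return variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = []
    for line in template_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate an optional leading "export " (assumed, as in an .envrc file).
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.append(name)
    return keys


def missing_secrets(template_text: str, environ=os.environ) -> list[str]:
    """List template variables that are unset or empty in the given environment."""
    return [key for key in template_keys(template_text) if not environ.get(key)]


if __name__ == "__main__":
    template = Path(".env.template")
    if template.exists():
        for name in missing_secrets(template.read_text()):
            print(f"missing: {name}")
```

Run it from the repo root. Any name it prints is either not exported or empty, and in that case the value has to go into .env by hand.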

Check that you can start the agent in local mode. You should see something like this:

$> uv run python local.py
* config_file=config/datavoyager_modal_deployment_magentic_one_config_20250617.yaml
* json_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.jsonl
* md_logger.log_filename=threads/local-2025-08-05T10-28-36.981698/log.md
* asta_logs.log_file=threads/local-2025-08-05T10-28-36.981698/log.istore.json
* console_log=threads/local-2025-08-05T10-28-36.981698/console.log 
===============================================================================================
You: (enter two blank lines to submit)

Here you can interact with the agent as you would in the Asta webapp. A number of Asta integrations have been swapped for plaintext versions, and DataVoyager's reasoning is printed to the console as it runs. Along the way you will see evidence of various Asta-specific features, rendered as plaintext. A few are called out in the excerpts from an example run below.

In this example run, one of the stock queries from https://datavoyager.allen.ai was pasted into the terminal.
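The "enter two blank lines to submit" convention shown in the prompt can be illustrated with a short reader loop. This is a sketch of the convention only, not the code in local.py, and the function name read_message is hypothetical.

```python
from typing import Iterable


def read_message(lines: Iterable[str]) -> str:
    """Collect lines until two consecutive blank lines, then return the message."""
    collected = []
    blank_run = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "":
            blank_run += 1
            if blank_run == 2:
                # The second blank line in a row ends the message.
                break
        else:
            blank_run = 0
        collected.append(line)
    # Drop trailing blank lines that only served as the terminator.
    while collected and collected[-1].strip() == "":
        collected.pop()
    return "\n".join(collected)
```

Passing sys.stdin as the argument reads interactively. A single blank line stays part of the message, which is why a multi-paragraph query can be pasted in one go.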

Out of age and gender, which factor affects the survival of titanic passengers the most? Use 
the following dataset.

• s3://ai2-asta-workspaces/sampledata/titanic.csv


-----------------------------------------------------------------------------------------------

Step progress events get their own treatment in the Asta UX, but here they are simply printed as plaintext.

>>> entering step >>> Checking for datasets
<<< exited step <<< Loaded 1 new dataset(s)
>>> entering step >>> Pre-processing request
<<< exited step <<< Pre-processing request
>>> entering step >>> Investigating
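If you want to follow step progress programmatically, for example when scanning console.log, the plaintext markers above are regular enough to parse. The sketch below assumes the markers always appear exactly as in this excerpt. The function name parse_step_event is hypothetical and is not part of the agent.

```python
import re
from typing import Optional, Tuple

# Matches the two plaintext step markers shown in the excerpt above.
STEP_LINE = re.compile(r"^(>>> entering step >>>|<<< exited step <<<)\s*(.*)$")


def parse_step_event(line: str) -> Optional[Tuple[str, str]]:
    """Return ("enter" | "exit", description) for a step line, else None."""
    match = STEP_LINE.match(line.strip())
    if match is None:
        return None
    kind = "enter" if match.group(1).startswith(">>>") else "exit"
    return kind, match.group(2)
```

Lines that are not step markers return None, so the function can be applied to every line of a console log without filtering first.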

<dvtext> marks spans that are not forwarded to the Asta UX but are important to DV's internal reasoning.

==================================================
MagenticOneOrchestrator
<dvtext>
We are working to address the following REQUEST:

Question: Out of age and gender, which factor affects the survival of titanic passengers the 
most? Use the following dataset.

• s3://ai2-asta-workspaces/sampledata/titanic.csv

Python variable name for dataset: data_0
Dataset preview:
Dataset head:
PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
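To see what a message looks like once the <dvtext> spans are dropped, a filter along these lines can be used. It is an illustrative sketch, not the Asta implementation. It assumes each span is closed with </dvtext> (the excerpt above is cut off before the closing tag), and the name strip_dvtext is hypothetical.

```python
import re

# Non-greedy match so that several separate spans are removed independently.
# Assumption: spans are closed with </dvtext> and are not nested.
DVTEXT_SPAN = re.compile(r"<dvtext>.*?</dvtext>", re.DOTALL)


def strip_dvtext(message: str) -> str:
    """Remove <dvtext>...</dvtext> spans and trim surrounding whitespace."""
    return DVTEXT_SPAN.sub("", message).strip()
```

An unclosed <dvtext> tag is left untouched by this pattern, so truncated output is not silently discarded.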

Eventually the agent finishes with what we consider the answer to the user query, which is annotated with <dvoutput>. The Asta UX picks up these tags to power specific UX behaviors.

<<< exited step <<< Investigation complete.

Asta: <dvoutput cell_id="10">
Based on the analysis of the Titanic dataset, the factor that has the most significant impact 
on passenger survival is gender. The logistic regression results indicate that being female, 
compared to being male, increases the odds of survival by approximately 1124.74%. In contrast, 
age has a negligible effect on the odds of survival, with the regression suggesting only a 
minor change of -0.47% per additional year. Thus, gender is the more impactful factor affecting
survival rates among Titanic passengers.
</dvoutput>

===============================================================================================
You: (enter two blank lines to submit)
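When testing locally, the final answer can be pulled out of the console text by matching the <dvoutput> tags. The sketch below assumes the tag carries a single cell_id attribute written exactly as in the excerpt above. The name extract_outputs is hypothetical, and this is not how the Asta UX itself parses the tags.

```python
import re
from typing import List, Tuple

# Assumption: the opening tag has exactly one attribute, cell_id="...".
DVOUTPUT = re.compile(
    r'<dvoutput cell_id="(?P<cell_id>[^"]*)">\s*(?P<body>.*?)\s*</dvoutput>',
    re.DOTALL,
)


def extract_outputs(text: str) -> List[Tuple[str, str]]:
    """Return (cell_id, body) pairs for every <dvoutput> block in the text."""
    return [(m.group("cell_id"), m.group("body")) for m in DVOUTPUT.finditer(text)]
```

For the run above this would yield one pair whose cell_id is "10" and whose body is the answer paragraph.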

Deployment

Make sure you have set up the necessary secrets in your environment, as described above. Set up the .env file so that the underlying Docker containers will have those secrets.

The Modal app name is computed as f"dv-core.{ENV}". The modal.com dashboard lists all existing environments. The dv-core.rc environment is updated automatically by a GitHub Action with the code on the main branch, as long as the tests pass. Other environments must be updated manually. For example, to deploy to my personal development deployment called dv-core.jasond:

ENV=jasond make deploy
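The naming rule is small enough to state as code. The sketch below only restates the f"dv-core.{ENV}" rule from the text. The function name modal_app_name is hypothetical, and the empty-value check is an added assumption rather than documented behavior of make deploy.

```python
def modal_app_name(env: str) -> str:
    """Build the Modal app name from the ENV value, as in f"dv-core.{ENV}"."""
    if not env:
        # Assumption: an empty ENV is treated as an error in this sketch.
        raise ValueError("ENV must be set, for example ENV=jasond")
    return f"dv-core.{env}"
```

With ENV=jasond this gives dv-core.jasond, and the rc and prod values give the dv-core.rc and dv-core.prod names used in this section.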

The prod Modal environment, dv-core.prod, holds the last stable version of this codebase, and the asta.allen.ai Asta environment depends on it for stability. When it is time to release main to the prod Modal environment, first run whatever automated and manual tests you need to be confident that the current code is good. To deploy to prod, you can either run the deploy command above locally (changing the ENV value) or use the deploy-prod GitHub workflow (under the Actions tab). The latter is preferred. Please note: your changes will not be reflected in the asta.allen.ai environment until they are in the prod Modal environment.

Logs

Logs from the main DV process in Modal are forwarded to the ai2-reviz project in GCP, where they can be correlated with other activity. See https://github.com/allenai/nora/wiki/Log-Filtering-and-Navigation

Remote File Sharing

Remote executions (Docker or Modal) use a shared FileShareSpec abstraction to declare which paths should exist inside the runtime. See docs/remote_file_sharing.md for the full guide.

Download files

Source Distribution

dv_core_asta-0.1.0.tar.gz (3.2 MB)

Uploaded Source

Built Distribution

dv_core_asta-0.1.0-py3-none-any.whl (128.0 kB)

Uploaded Python 3

File details

Details for the file dv_core_asta-0.1.0.tar.gz.

File metadata

  • Download URL: dv_core_asta-0.1.0.tar.gz
  • Upload date:
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dv_core_asta-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5339a9604241d1de6875fd127c2a9defaff9bff3e98a4eb7bc554b9cf4bebe58
MD5 c1b55b83c34e5937d6acfcd77a8258d7
BLAKE2b-256 4a38a54bba7d15b1d939649c36c50578aaa459d980ea0c98bf25b19489d5f660
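A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch. The function names sha256_of and verify are hypothetical, and the file is assumed to have been downloaded to a local path of your choosing.

```python
import hashlib

# SHA256 digest for dv_core_asta-0.1.0.tar.gz, copied from the listing above.
EXPECTED_SHA256 = "5339a9604241d1de6875fd127c2a9defaff9bff3e98a4eb7bc554b9cf4bebe58"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling verify with the path of the downloaded tarball returns False if the file was corrupted or altered in transit.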

Provenance

The following attestation bundles were made for dv_core_asta-0.1.0.tar.gz:

Publisher: publish-to-pypi.yml on allenai/dv-core-asta-integration

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dv_core_asta-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dv_core_asta-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 128.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dv_core_asta-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 80091a2588debc8165e65bd10e9b0209e79dae845cd193e998f302ecd06f0979
MD5 c00e7afa15c27cff9a2e2c6a28061a7e
BLAKE2b-256 76495037bbe4c82711c6c01be7caa1620702a8d8c3a29af40b5488709d965df5

Provenance

The following attestation bundles were made for dv_core_asta-0.1.0-py3-none-any.whl:

Publisher: publish-to-pypi.yml on allenai/dv-core-asta-integration
