
A CLI to map the flow of data in your project

Project description

Overview

getDataDeps is a command-line script that maps data dependencies across the R, Python, and Stata files in your project. It currently recognizes a set of common import and export commands in each of these languages.

Getting Started

  1. Install with: pip install getDataDeps
  2. On the command line, run getDataDeps . (for the current directory) or getDataDeps ./path/to/project

If the run succeeds, the script prints its results to the terminal and writes two files to a dataDepsOutput folder inside the project directory.

One of these files is a PNG image showing how data flows through your project, like the one below.

Example Graph
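Conceptually, the graph is a directed network: a data file points to each script that reads it, and a script points to each file it writes. The sketch below shows one way import/export records could become graph edges. It assumes a hypothetical record layout of {script: {"imports": [...], "exports": [...]}}; getDataDeps' actual JSON format and drawing code may differ, and build_edges and to_dot are illustrative names only. DOT (the Graphviz text format) is used here simply as a convenient way to describe the result.

```python
def build_edges(deps):
    """Turn per-script import/export lists into sorted (source, target) edges.

    Assumed (hypothetical) input layout:
        {script: {"imports": [paths], "exports": [paths]}}
    A data file points to each script that reads it, and a script points to
    each data file it writes.
    """
    edges = set()
    for script, io in deps.items():
        for path in io.get("imports", []):
            edges.add((path, script))
        for path in io.get("exports", []):
            edges.add((script, path))
    return sorted(edges)


def to_dot(edges):
    """Render edges as a Graphviz DOT digraph string."""
    lines = ["digraph dataDeps {"]
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example project: clean.R feeds model.py through clean.rds.
    deps = {
        "clean.R": {"imports": ["./raw/survey.csv"], "exports": ["./data/clean.rds"]},
        "model.py": {"imports": ["./data/clean.rds"], "exports": ["./out/results.csv"]},
    }
    print(to_dot(build_edges(deps)))
```

Because clean.rds appears as an export of one script and an import of another, it becomes the node that links the two scripts in the graph.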

Helpful tips

  1. Keep each import/export command within two lines.
    • The script finds each import/export command and reads at most one line below it.
  2. Leave a blank line between each import/export command and the code before or after it.
    • For example, if you save your data and put a print statement on the next line, the script will also read the print line, extract the text between its quotes, and add it to the JSON object.
  3. Write the path as a string literal inside the import/export command.
    • Example: readRDS("./path/to/data.rds") works, but readRDS(variableWithPathToData) does not. The script relies on finding the quotes and extracting what sits between them.
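To illustrate why these tips matter, here is a rough sketch of a quote-based, two-line heuristic of the kind described above. It is not getDataDeps' source: scan_lines is a hypothetical name, and the short command lists are made-up stand-ins for the larger set the real tool recognizes.

```python
import re

# Hypothetical, abbreviated command lists (R, Python, Stata); the real tool's
# lists are different and longer. Plain substring matching is deliberately crude.
IMPORT_CMDS = ("readRDS(", "read.csv(", "pd.read_csv(", "use ")
EXPORT_CMDS = ("saveRDS(", "write.csv(", ".to_csv(", "save ")

# Text between a pair of single or double quotes.
QUOTED = re.compile(r"""["']([^"']+)["']""")


def scan_lines(lines):
    """Return (imports, exports) using a quote-based, two-line heuristic."""
    imports, exports = [], []
    for i, line in enumerate(lines):
        if any(cmd in line for cmd in IMPORT_CMDS):
            target = imports
        elif any(cmd in line for cmd in EXPORT_CMDS):
            target = exports
        else:
            continue
        # Read the command line plus at most one line below it, and keep
        # every quoted string found there.
        for text in lines[i:i + 2]:
            target.extend(QUOTED.findall(text))
    return imports, exports


if __name__ == "__main__":
    sample = [
        'df <- readRDS("./data/raw.rds")',
        "",
        'saveRDS(df, "./data/clean.rds")',
        'print("done cleaning")',
    ]
    print(scan_lines(sample))
    # (['./data/raw.rds'], ['./data/clean.rds', 'done cleaning'])
    print(scan_lines(["df <- readRDS(pathVar)"]))
    # ([], [])
```

In the sample, the blank line after readRDS keeps the import clean, while the print statement directly under saveRDS leaks "done cleaning" into the exports (tip 2). The second call finds no path at all because the path is held in a variable rather than a quoted string (tip 3).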

These tips mostly reflect limitations in how getDataDeps parses files. Feedback is greatly appreciated! If you structure your imports/exports in a way that isn't covered, let me know.

How it works

The script walks through your entire project folder, finds files ending in ".R", ".py", or ".do", and collects their data imports and exports. The results are saved as a JSON object named 'dataDeps.json' in the 'dataDepsOutput' folder, and the graph is saved there as 'dataDepsGraph.png'.
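The folder walk and JSON output described above could look roughly like the following. This is a sketch under stated assumptions, not the tool's implementation: find_scripts and write_deps are hypothetical names, the JSON layout is invented for illustration, and scan is a placeholder for whatever line parser performs the actual import/export extraction.

```python
import json
from pathlib import Path

# Suffixes named in the description; matching is case-sensitive here,
# so a lowercase "script.r" would be skipped.
EXTENSIONS = {".R", ".py", ".do"}


def find_scripts(project_dir):
    """Return every .R, .py and .do file under project_dir, sorted."""
    root = Path(project_dir)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in EXTENSIONS)


def write_deps(project_dir, scan):
    """Run scan(lines) -> (imports, exports) on each script and save the results.

    The output layout {relative_path: {"imports": [...], "exports": [...]}}
    is a hypothetical stand-in for the real dataDeps.json format.
    """
    root = Path(project_dir)
    deps = {}
    for path in find_scripts(root):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        imports, exports = scan(lines)
        deps[path.relative_to(root).as_posix()] = {"imports": imports, "exports": exports}

    out_dir = root / "dataDepsOutput"
    out_dir.mkdir(exist_ok=True)
    out_file = out_dir / "dataDeps.json"
    out_file.write_text(json.dumps(deps, indent=2), encoding="utf-8")
    return out_file
```

For instance, write_deps(".", scan=lambda lines: ([], [])) would write a skeleton dataDeps.json listing every matching script with empty import and export lists.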


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


getDataDeps-0.0.22-py2.py3-none-any.whl (6.9 kB)


File details

Details for the file getDataDeps-0.0.22-py2.py3-none-any.whl.

File metadata

  • Download URL: getDataDeps-0.0.22-py2.py3-none-any.whl
  • Size: 6.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.2

File hashes

Hashes for getDataDeps-0.0.22-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        492c5533c3d2e4a507fc3ef57d73b7f009750beff20e33b58c1a8cb8ed232810
MD5           5f849880646bedd47e729822679f6c19
BLAKE2b-256   93957c4e57db6e0428958938f60b6e2a862625497757a623ffcf1a32e5d74fad
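To check a downloaded wheel against the SHA256 digest listed for it, Python's standard hashlib module is enough. The function names below are illustrative, and the sketch assumes the wheel sits in the current directory unless a path is passed on the command line.

```python
import hashlib

# SHA256 digest listed above for getDataDeps-0.0.22-py2.py3-none-any.whl.
EXPECTED_SHA256 = "492c5533c3d2e4a507fc3ef57d73b7f009750beff20e33b58c1a8cb8ed232810"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    import sys

    wheel = sys.argv[1] if len(sys.argv) > 1 else "getDataDeps-0.0.22-py2.py3-none-any.whl"
    print("OK" if verify_wheel(wheel) else "HASH MISMATCH")
```

pip's hash-checking mode offers the same guarantee at install time through a requirements-file line such as getDataDeps==0.0.22 --hash=sha256:<digest>.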

