Project-oriented workflow in Python

Finding project directories in Python (data science) projects.

This library aims to provide both the programmatic functionality from the R rprojroot package and the interactive functionality from the R here package.

Motivation

Problem: I have a project that has a specific folder structure, for example, one mentioned in Noble 2009 or something similar to this project template, and I want to be able to:

  1. Run my Python scripts without having to specify a series of ../ to get to the data folder.
  2. cd into the directory of my Python script, instead of calling it from the root project directory and specifying all the folders leading to the script.
  3. Reference datasets from a root directory when using a Jupyter notebook, because every time I use a Jupyter notebook, the working directory changes to the location of the notebook, not where I launched the notebook server.

Solution: pyprojroot finds the root directory of your project and returns it as a pathlib.Path object. You can then pass the here function a path relative to the project root, and you will get back the full path to that file, no matter which directory inside the project is your current working directory. That is, in a Jupyter notebook, you can write something like pandas.read_csv(here('data/my_data.csv')) instead of pandas.read_csv('../data/my_data.csv'). This allows you to restructure the scripts and notebooks in your project without having to worry about changing file paths.
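The idea can be sketched with the standard library alone. This is not pyprojroot's implementation: find_project_root and here_sketch are hypothetical names, and treating a .git entry as the only root marker is an assumption made for illustration. The sketch builds a throwaway project and shows that the same relative path resolves to the same file from two different working directories.

```python
import os
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise RuntimeError(f"no directory containing {marker!r} at or above {start}")


def here_sketch(relative_path: str = "", marker: str = ".git") -> Path:
    """Hypothetical stand-in for here(): join a path onto the detected root."""
    return find_project_root(Path.cwd(), marker) / relative_path


# Throwaway layout: <root>/.git, <root>/data/my_data.csv, <root>/notebooks
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    for folder in (".git", "data", "notebooks"):
        (root / folder).mkdir()
    (root / "data" / "my_data.csv").write_text("x\n1\n")

    original_cwd = Path.cwd()
    try:
        os.chdir(root / "notebooks")
        from_notebooks = here_sketch("data/my_data.csv")
        os.chdir(root / "data")
        from_data = here_sketch("data/my_data.csv")
    finally:
        os.chdir(original_cwd)  # restore before the temp directory is removed

    same_path = from_notebooks == from_data
    file_found = from_notebooks.is_file()

print(same_path, file_found)
```

Because the search starts from the current working directory and moves upward, moving a notebook to another subfolder of the project does not change the result.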

Great for reading and writing datasets!

Installation

pip

python -m pip install pyprojroot

conda

https://anaconda.org/conda-forge/pyprojroot

conda install -c conda-forge pyprojroot

Example Usage

Interactive

This is based on the R here library.

from pyprojroot.here import here

here()

Programmatic

This is based on the R rprojroot library.

import pyprojroot

base_path = pyprojroot.find_root(pyprojroot.has_dir(".git"))
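To illustrate the pattern in the call above, where a criterion describes what a root directory looks like and a search function applies it, here is a minimal standard-library sketch. The names has_dir_sketch, has_file_sketch and find_root_sketch are hypothetical and are not pyprojroot's code; the explicit start argument and the pyproject.toml marker are assumptions for the example.

```python
import tempfile
from pathlib import Path
from typing import Callable

# A criterion is any function that says whether a directory looks like a root.
Criterion = Callable[[Path], bool]


def has_dir_sketch(name: str) -> Criterion:
    """Criterion: the directory contains a subdirectory called `name`."""
    return lambda directory: (directory / name).is_dir()


def has_file_sketch(name: str) -> Criterion:
    """Criterion: the directory contains a regular file called `name`."""
    return lambda directory: (directory / name).is_file()


def find_root_sketch(criterion: Criterion, start: Path) -> Path:
    """Return the nearest directory at or above `start` that satisfies `criterion`."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if criterion(directory):
            return directory
    raise RuntimeError(f"no directory at or above {start} matched the criterion")


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp).resolve()
    (project / ".git").mkdir()
    (project / "pyproject.toml").write_text("")
    nested = project / "src" / "pkg"
    nested.mkdir(parents=True)

    # Two different criteria locate the same root from a nested folder.
    found_by_dir = find_root_sketch(has_dir_sketch(".git"), nested) == project
    found_by_file = find_root_sketch(has_file_sketch("pyproject.toml"), nested) == project
```

Keeping the criterion separate from the search means the marker can be swapped for another one without touching the search logic.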

Demonstration

Load the packages

In [1]: from pyprojroot.here import here
In [2]: import pandas as pd

The current working directory is the "notebooks" folder

In [3]: !pwd
/home/dchen/git/hub/scipy-2019-pandas/notebooks

In the notebooks folder, I have all my notebooks

In [4]: !ls
01-intro.ipynb  02-tidy.ipynb  03-apply.ipynb  04-plots.ipynb  05-model.ipynb  Untitled.ipynb

If I wanted to access data in my notebooks I'd have to use ../data

In [5]: !ls ../data
billboard.csv  country_timeseries.csv  gapminder.tsv  pew.csv  table1.csv  table2.csv  table3.csv  table4a.csv  table4b.csv  weather.csv

However, with the here function, I can access all my data from the project root. This means that if I move the notebook to another folder or subfolder, I don't have to change the path to my data. Only if I move the data to another folder would I need to change the path in my notebook (or script).

In [6]: pd.read_csv(here('data/gapminder.tsv'), sep='\t').head()
Out[6]:
       country continent  year  lifeExp       pop   gdpPercap
0  Afghanistan      Asia  1952   28.801   8425333  779.445314
1  Afghanistan      Asia  1957   30.332   9240934  820.853030
2  Afghanistan      Asia  1962   31.997  10267083  853.100710
3  Afghanistan      Asia  1967   34.020  11537966  836.197138
4  Afghanistan      Asia  1972   36.088  13079460  739.981106

By the way, you get a pathlib.Path object back!

In [7]: here('data/gapminder.tsv')
Out[7]: PosixPath('/home/dchen/git/hub/scipy-2019-pandas/data/gapminder.tsv')
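Since the result is a pathlib path, the usual pathlib operations apply to it. The sketch below uses a PurePosixPath with the same value as Out[7] as a stand-in, so it runs on any platform without the file existing; the variable names are only illustrative.

```python
from pathlib import PurePosixPath

# Stand-in for the path returned by here('data/gapminder.tsv') in the demo above.
data_file = PurePosixPath("/home/dchen/git/hub/scipy-2019-pandas/data/gapminder.tsv")

file_name = data_file.name                 # final path component
extension = data_file.suffix               # file extension, including the dot
data_dir = data_file.parent                # the data folder
sibling = data_dir / "pew.csv"             # another dataset in the same folder
csv_copy = data_file.with_suffix(".csv")   # same name with a different extension

print(file_name, extension)
print(sibling)
print(csv_copy)
```

The / operator joins path components, which is convenient for building output paths next to the input data.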
