Skip to main content

Pathlib functionality for pandas.

Project description

pandas_path - Path style access for pandas

PyPI

Love pathlib.Path*? Love pandas? Wish it were easy to use pathlib methods on pandas Series?

This package is for you. Just one import adds a .path accessor to any pandas Series or Index so that you can use all of the methods on a Path object.

* If not, you should.

Here's an example:

from pathlib import Path
import pandas as pd

# This is the only line you need to register `.path` as an accessor
# on any Series or Index in pandas.
import pandas_path

# we'll make an example series from the py files in this repo;
# note that every element here is just a string--no need to make Path objects yourself
file_paths = pd.Series(str(s) for s in Path().glob('**/*.py'))

# 0                   setup.py
# 1    pandas_path/accessor.py
# 2        pandas_path/test.py
# dtype: object

Use the .path accessor to get just the filename rather than the full path:

file_paths.path.name

# 0       setup.py
# 1    accessor.py
# 2        test.py
# dtype: object

Use the .path accessor to get just the parent folder of each file:

file_paths.path.parent

# 0              .
# 1    pandas_path
# 2    pandas_path
# dtype: object

Use calculated methods like exists to filter for what exists on the filesystem:

file_paths.loc[3] = 'fake_file.txt'

# 0                   setup.py
# 1    pandas_path/accessor.py
# 2        pandas_path/test.py
# 3              fake_file.txt
# dtype: object

file_paths.path.exists()

# 0     True
# 1     True
# 2     True
# 3    False
# dtype: bool

Use path methods like with_suffix to dynamically create new filenames:

file_paths.path.with_suffix('.png')

# 0                   setup.png
# 1    pandas_path/accessor.png
# 2        pandas_path/test.png
# 3               fake_file.png
# dtype: object

Use the / operators just as you would in pathlib (with the .path accessor on either side of the operator.)

"different_root_folder" / file_paths.path

# 0                   different_root_folder/setup.py
# 1    different_root_folder/pandas_path/accessor.py
# 2        different_root_folder/pandas_path/test.py
# dtype: object

We'll even do element wise operations with lists/arrays/series of the same length.

file_paths.path.parent.path / ["other_file1.txt", "other_file2.txt", "other_file3.txt"]

# 0                other_file1.txt
# 1    pandas_path/other_file2.txt
# 2    pandas_path/other_file3.txt
# dtype: object

Limitations

  1. While most operations work out of the box, operator chaining with / will not work as expected since we always return the series itself, not the accessor.
file_paths.path.parent.path / "subfolder" / "other_file1.txt"

# ----> 1 file_paths.path.parent.path / "subfolder" / "other_file1.txt"
# ...
# TypeError: unsupported operand type(s) for /: 'str' and 'str'

Instead, either use the .path accessor on the result or re-write without chaining:

(file_paths.path.parent.path / "subfolder").path / "other_file1.txt"

# 0                subfolder/other_file1.txt
# 1    pandas_path/subfolder/other_file1.txt
# 2    pandas_path/subfolder/other_file1.txt
# dtype: object

file_paths.path.parent.path / "subfolder/other_file1.txt"

# 0                subfolder/other_file1.txt
# 1    pandas_path/subfolder/other_file1.txt
# 2    pandas_path/subfolder/other_file1.txt
# dtype: object
  1. A numpy array or pandas series on the left hand side of / will not work properly.
pd.Series(['a', 'b', 'c']) / pd.Series(['1', '2', '3']).path

## IMPROPERLY BROADCASTS :'(

# 0    0    a/1
# 1    a/2
# 2    a/3
# dtype: object
# 1    0    b/1
# 1    b/2
# 2    b/3
# dtype: object
# 2    0    c/1
# 1    c/2
# 2    c/3
# dtype: object
# dtype: object

Instead, use the path accessor on the right-hand side as well.

pd.Series(['a', 'b', 'c']).path / pd.Series(['1', '2', '3']).path

# 0    a/1
# 1    b/2
# 2    c/3
# dtype: object

That's all folks, enjoy!

Developed and maintained by your friends at DrivenData! ml competitions | ai consulting

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas_path-0.1.0.tar.gz (5.4 kB view details)

Uploaded Source

File details

Details for the file pandas_path-0.1.0.tar.gz.

File metadata

  • Download URL: pandas_path-0.1.0.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191203 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.5

File hashes

Hashes for pandas_path-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b3f3e36163c157c71b68bf35a6a3ea6322b973664a34866e30794e9e1f8becef
MD5 8129f2c9ec440a720817e5f428b8dab2
BLAKE2b-256 1e05e4c3349fe3d9f21bbacb082c6ec0d9efbd3a0ab1e597fc6b5ea878e46066

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page