
Python interface for World of Code


python-woc


python-woc is the Python interface to the World of Code (WoC) data. It succeeds the oscar.py project and is hundreds of times faster than invoking the lookup scripts via subprocess.

What mappings and objects are supported?

Note that python-woc does not support all data types in WoC. It has built-in readers for:

  • Tokyo Cabinet hash databases (.tch files)
  • Stacked Binary files (.bin files)

Gzipped files (.s/.gz, e.g. PYthruMaps/c2bPtaPkgOPY.0.gz) are not supported yet, because it currently makes little sense to manipulate them natively in Python. Instead, refer to the WoC tutorial: decompress them into a pipe and process them with command-line utilities.

Mappings below are supported by both woc.get_values and woc.objects:

['A2P', 'A2a', 'A2b', 'A2c', 'A2f', 'A2fb', 'P2A', 'P2a', 'P2c', 'P2p', 'a2A', 'a2P', 'a2b', 'a2c', 'a2f', 'a2p', 'b2P', 'b2c', 'b2f', 'b2fa', 'b2tac', 'bb2cf', 'c2P', 'c2b', 'c2cc', 'c2dat', 'c2f', 'c2fbb', 'c2h', 'c2p', 'c2pc', 'c2r', 'c2ta', 'f2a', 'f2b', 'f2c', 'obb2cf', 'p2P', 'p2a', 'p2c']

And objects:

['commit', 'tree', 'blob', 'tag']

If you are still unsure what the characters in the mapping names mean, check out the WoC Tutorial.
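To get an overview of the list above, the mapping names can be grouped by their key type. This is a minimal sketch that assumes each name reads as `<key type>2<value type>`, following the WoC naming convention; `group_by_source` is a hypothetical helper and not part of python-woc.

```python
from collections import defaultdict

# The mapping names listed above.
SUPPORTED_MAPS = [
    'A2P', 'A2a', 'A2b', 'A2c', 'A2f', 'A2fb', 'P2A', 'P2a', 'P2c', 'P2p',
    'a2A', 'a2P', 'a2b', 'a2c', 'a2f', 'a2p', 'b2P', 'b2c', 'b2f', 'b2fa',
    'b2tac', 'bb2cf', 'c2P', 'c2b', 'c2cc', 'c2dat', 'c2f', 'c2fbb', 'c2h',
    'c2p', 'c2pc', 'c2r', 'c2ta', 'f2a', 'f2b', 'f2c', 'obb2cf', 'p2P',
    'p2a', 'p2c',
]


def group_by_source(names):
    """Group mapping names by the part before '2' (assumed to be the key type)."""
    groups = defaultdict(list)
    for name in names:
        source, _, target = name.partition("2")
        groups[source].append(target)
    return dict(groups)


groups = group_by_source(SUPPORTED_MAPS)
# groups["c"] now lists every value type reachable from a commit key.
```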

Requirements

  • Linux with a GNU toolchain (only tested on x86_64, Ubuntu / CentOS)

  • Python 3.8 or later

Install python-woc

From PyPI

The latest version of python-woc is available on PyPI and can be installed using pip:

pip3 install python-woc

From Source

You can also install python-woc from source. First, clone the repository:

git clone https://github.com/ssc-oscar/python-woc.git
cd python-woc

We use poetry as the package manager. Setting up using poetry:

python3 -m pip install poetry
python3 -m poetry install

Tip: On some UTK servers, installing poetry yields the following error: urllib3 v2 only supports OpenSSL 1.1.1+. A workaround is to run python3 -m pip install 'urllib3<2.0' before installing poetry.

Generate Profiles

Note: If you are on UTK/PKU WoC servers, you can skip this step. Profiles are already generated and available at /home/wocprofile.json or /etc/wocprofile.json.

One of the major improvements in python-woc is the profile. Profiles tell the driver which versions of which maps are available, decoupling the driver from the folder structure of the data. This lets the driver work with multiple versions of WoC, on a different machine, or even in the cloud.

Profiles are generated using the woc.detect script. The script takes a list of directories, scans for matched filenames, and generates a profile:

python3 -m woc.detect /path/to/woc/1 /path/to/woc/2 ... > wocprofile.json

By default, python-woc looks for wocprofile.json, ~/.wocprofile.json, /home/wocprofile.json and /etc/wocprofile.json for the profile.
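The lookup described above can be pictured as a first-match search over candidate paths. The following is only a sketch of that idea, not python-woc's actual implementation; `find_profile` is a hypothetical helper, and treating the listed order as the search order is an assumption.

```python
from pathlib import Path
from typing import Iterable, Optional

# Candidate locations in the order listed above (the ordering is an assumption).
DEFAULT_PROFILE_PATHS = (
    "wocprofile.json",
    "~/.wocprofile.json",
    "/home/wocprofile.json",
    "/etc/wocprofile.json",
)


def find_profile(candidates: Iterable[str] = DEFAULT_PROFILE_PATHS) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for candidate in candidates:
        path = Path(candidate).expanduser()  # resolve a leading "~"
        if path.is_file():
            return path
    return None
```

Calling `find_profile()` before constructing a driver is a quick way to see which profile file, if any, is present on the current machine.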

Use CLI

python-woc's CLI is a drop-in replacement for the getValues and showCnt Perl scripts. We expect existing scripts to work just as well with the following aliases:

alias getValues='python3 -m woc.get_values'
alias showCnt='python3 -m woc.show_content'

The usage is the same as the original scripts, and the output should be identical:

# echo some_key | python3 -m woc.get_values some_map
> echo e4af89166a17785c1d741b8b1d5775f3223f510f | showCnt commit 3
tree f1b66dcca490b5c4455af319bc961a34f69c72c2
parent c19ff598808b181f1ab2383ff0214520cb3ec659
author Audris Mockus <audris@utk.edu> 1410029988 -0400
committer Audris Mockus <audris@utk.edu> 1410029988 -0400

News for Sep 5

You may find more examples in the lookup repository. If you find any incompatibilities, please submit an issue report.

Use Python API

The python API is designed to get rid of the overhead of invoking the perl scripts via subprocess. It is also more native to python and provides a more intuitive interface.

With a wocprofile.json, you can create a WocMapsLocal object and access the maps in the file system:

>>> from woc.local import WocMapsLocal
>>> woc = WocMapsLocal()  # or use only the version R: woc = WocMapsLocal(version="R")
>>> woc.maps
{'p2c', 'a2b', 'c2ta', 'a2c', 'c2h', 'b2tac', 'a2p', 'a2f', 'c2pc', 'c2dat', 'b2c', 'P2p', 'P2c', 'c2b', 'f2b', 'b2f', 'c2p', 'P2A', 'b2fa', 'c2f', 'p2P', 'f2a', 'p2a', 'c2cc', 'f2c', 'c2r', 'b2P'}

To query the maps, you can use the get_values method:

>>> woc.get_values("b2fa", "05fe634ca4c8386349ac519f899145c75fff4169")
('1410029988', 'Audris Mockus <audris@utk.edu>', 'e4af89166a17785c1d741b8b1d5775f3223f510f')
>>> woc.get_values("c2b", "e4af89166a17785c1d741b8b1d5775f3223f510f")
['05fe634ca4c8386349ac519f899145c75fff4169']
>>> woc.get_values("b2tac", "05fe634ca4c8386349ac519f899145c75fff4169")
[('1410029988', 'Audris Mockus <audris@utk.edu>', 'e4af89166a17785c1d741b8b1d5775f3223f510f')]
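Lookups can be chained, feeding the result of one map into another. The sketch below combines the `c2b` and `b2fa` calls shown above; `first_authors_of_commit` is a hypothetical helper, and it assumes `c2b` returns a list of blob hashes as in the example output.

```python
def first_authors_of_commit(woc, commit_sha):
    """Map each blob of a commit to its b2fa result (time, author, commit).

    `woc` is any object exposing get_values(map_name, key), such as the
    WocMapsLocal instance created above.
    """
    result = {}
    for blob_sha in woc.get_values("c2b", commit_sha):
        result[blob_sha] = woc.get_values("b2fa", blob_sha)
    return result
```

With the `woc` object from above, `first_authors_of_commit(woc, "e4af89166a17785c1d741b8b1d5775f3223f510f")` would issue one `c2b` query followed by one `b2fa` query per blob.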

Use show_content to get the content of a blob, a commit, or a tree:

>>> woc.show_content("tree", "f1b66dcca490b5c4455af319bc961a34f69c72c2")
[('100644', 'README.md', '05fe634ca4c8386349ac519f899145c75fff4169'), ('100644', 'course.pdf', 'dfcd0359bfb5140b096f69d5fad3c7066f101389')]
>>> woc.show_content("commit", "e4af89166a17785c1d741b8b1d5775f3223f510f")
('f1b66dcca490b5c4455af319bc961a34f69c72c2', ('c19ff598808b181f1ab2383ff0214520cb3ec659',), ('Audris Mockus <audris@utk.edu>', '1410029988', '-0400'), ('Audris Mockus <audris@utk.edu>', '1410029988', '-0400'), 'News for Sep 5')
>>> woc.show_content("blob", "05fe634ca4c8386349ac519f899145c75fff4169")
'# Syllabus for "Fundamentals of Digital Archeology"\n\n## News\n\n* ...'

Note that the function returns different types for different maps and object types. Please refer to the documentation for details.
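Because the commit result is a nested tuple, it can be convenient to unpack it into named fields. This is a minimal sketch that assumes the tuple layout shown in the `show_content("commit", ...)` output above (tree, parents, author triple, committer triple, message); `CommitInfo`, `parse_tz`, and `commit_info` are hypothetical names, not part of python-woc.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Tuple


class CommitInfo(NamedTuple):
    tree: str
    parents: Tuple[str, ...]
    author: str
    authored_at: datetime
    message: str


def parse_tz(offset: str) -> timezone:
    """Convert a git-style offset such as '-0400' into a timezone."""
    sign = -1 if offset.startswith("-") else 1
    digits = offset.lstrip("+-")
    return timezone(sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:4])))


def commit_info(raw) -> CommitInfo:
    """Unpack the tuple layout assumed above into named fields."""
    tree, parents, (author, timestamp, tz), _committer, message = raw
    authored_at = datetime.fromtimestamp(int(timestamp), tz=parse_tz(tz))
    return CommitInfo(tree, tuple(parents), author, authored_at, message)
```

For example, `commit_info(woc.show_content("commit", sha)).authored_at` gives a timezone-aware datetime instead of a string timestamp and offset.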

Sometimes you may want to know the exact size of WoC. Doing so is easy and quick with count:

>>> woc.count("blob")  # count the number of blobs
17334020520
>>> woc.count("A2P")  # count the number of unique authors
44613280

👉🏻 More examples can be found in the guide.

Use Python Objects API

The objects API provides a more intuitive way to access the WoC data. Note that the objects API is not a drop-in replacement for oscar.py, even though it looks much the same: many of the methods have had their signatures changed and have been refactored to be more consistent, intuitive, and performant. Query results are cached, so you can access the same object multiple times without additional overhead.

Call init_woc_objects to initialize the objects API with a WoC instance:

from woc.local import WocMapsLocal
from woc.objects import init_woc_objects
woc = WocMapsLocal()
init_woc_objects(woc)

To get the tree of a commit:

>>> from woc.objects import Commit
>>> c1 = Commit("91f4da4c173e41ffbf0d9ecbe2f07f3a3296933c")
>>> c1.tree
Tree(836f04d5b374033b1608269e2f3aaabae263a0db)
>>> c1.projects[0].url
'https://github.com/woc-hack/thebridge'
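The caching behaviour mentioned earlier in this section can be illustrated with `functools.cached_property`. This is a sketch of the general idea only, not how woc.objects is implemented; `CachedCommit` is a hypothetical class, and it assumes the commit tuple layout returned by `show_content` in the previous section.

```python
from functools import cached_property


class CachedCommit:
    """Hypothetical wrapper illustrating per-object caching; not part of woc.objects."""

    def __init__(self, woc, sha):
        self._woc = woc
        self.sha = sha

    @cached_property
    def _raw(self):
        # Runs only on first access; the result is then stored on the instance.
        return self._woc.show_content("commit", self.sha)

    @property
    def tree_sha(self):
        return self._raw[0]

    @property
    def parent_shas(self):
        return self._raw[1]
```

Reading `tree_sha` and `parent_shas` repeatedly on the same instance triggers a single underlying query, which is the effect the objects API aims for.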

For more, check woc.objects in the documentation.

Remote Access

>>> from woc.remote import WocMapsRemote
>>> woc = WocMapsRemote(base_url="https://woc.osslab-pku.org/api", api_key="woc-api-key")
>>> woc.get_values("b2fa", "05fe634ca4c8386349ac519f899145c75fff4169")
('1410029988', 'Audris Mockus <audris@utk.edu>', 'e4af89166a17785c1d741b8b1d5775f3223f510f')
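Since the local and remote objects are queried through the same `get_values` call in the examples above, helper code can be written against that method alone and reused with either backend. The sketch below uses a structural `Protocol` for this; `SupportsGetValues` and `first_author` are hypothetical names, and the three-field `b2fa` result is assumed from the output shown above.

```python
from typing import Any, Protocol


class SupportsGetValues(Protocol):
    """Anything with a get_values(map_name, key) method."""

    def get_values(self, map_name: str, key: str) -> Any: ...


def first_author(woc: SupportsGetValues, blob_sha: str) -> str:
    """Return the author field of the b2fa result for a blob."""
    _time, author, _commit = woc.get_values("b2fa", blob_sha)
    return author
```

The same `first_author(woc, blob_sha)` call then works whether `woc` was built from `WocMapsLocal` or `WocMapsRemote`.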

👉🏻 Read the guide for more details.

Contributing

We welcome awesome contributions from the community. If you are motivated to add new features or fix bugs, please refer to the contributing guide.
