patent-parsing-tools is a library that provides tools for generating training and test sets from Google's USPTO data, useful for testing machine learning algorithms.

patent-parsing-tools

USPTO patents dataset generator.

Documentation

Read the docs

System requirements

sudo yum install python-devel libxslt-devel libxml2-devel

Installation:

pip install patent-parsing-tools

Examples:

Downloading dataset:

python -m patent_parsing_tools.downloader \
  --directory dataset \
  --year-from 2010 \
  --year-to 2010

Collecting and serializing data:

python -m patent_parsing_tools.supervisor \
  --working-directory patents/working_directory \
  --train-destination patents/train_destination \
  --test-destination patents/test_destination \
  --year-from 2014 \
  --year-to 2015
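
The two commands above can also be driven from a script. The following is a minimal sketch, not part of patent-parsing-tools: `module_command` and `run_step` are hypothetical helpers that only assemble the `python -m ...` invocations and flags shown above.

```python
import subprocess
import sys


def module_command(module, **options):
    """Build an argv list for `python -m <module>` with `--flag value` pairs.

    Keyword names use underscores and are turned into dashed flags,
    e.g. year_from=2010 becomes `--year-from 2010`.
    """
    argv = [sys.executable, "-m", module]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def run_step(module, **options):
    """Run one step; raises CalledProcessError if the step exits non-zero."""
    subprocess.run(module_command(module, **options), check=True)


# Only build and print the command here; call run_step(...) to execute it.
download = module_command(
    "patent_parsing_tools.downloader",
    directory="dataset",
    year_from=2010,
    year_to=2010,
)
print(" ".join(download[1:]))
```

Calling `run_step("patent_parsing_tools.supervisor", working_directory=..., train_destination=..., test_destination=..., year_from=2014, year_to=2015)` would run the second command the same way, stopping the script if the step fails.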

Generating dictionary with train set:

python -m patent_parsing_tools.bow.dictionary_maker \
  --train-directory patents/train_destination \
  --max-patents 1000000000 \
  --dictionary dictionary.txt \
  --dict-max-size 4096
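
To illustrate what a size-capped dictionary means, here is a minimal sketch that keeps the most frequent words of a corpus. It is not the library's implementation: the tokenizer, the tie-breaking rule, and the function names are assumptions made for this example, and `max_size` only mirrors the idea behind `--dict-max-size`.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of ASCII letters.
TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def build_dictionary(texts, max_size):
    """Return up to max_size words, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:max_size]]


texts = [
    "A method for parsing patent data",
    "A system for patent classification",
    "Patent data storage system",
]
dictionary = build_dictionary(texts, max_size=4)
print(dictionary)
```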

Generating bag of words for train set and test set:

python -m patent_parsing_tools.bow.bag_of_words \
  --serialized-patents patents/train_destination \
  --destination-directory patents/final_dataset_train \
  --dictionary dictionary.txt \
  --batch-size 1048576
python -m patent_parsing_tools.bow.bag_of_words \
  --serialized-patents patents/test_destination \
  --destination-directory patents/final_dataset_test \
  --dictionary dictionary.txt \
  --batch-size 1048576
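
As a rough illustration of the bag-of-words step, the sketch below turns a text into a vector of word counts over a fixed dictionary and groups the results into batches. It is not the library's code: the tokenizer and function names are assumptions, and treating the batch size as "number of items per chunk" is only one plausible reading of `--batch-size`.

```python
import re


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs in text.

    Words that are not in the dictionary are ignored.
    """
    index = {word: position for position, word in enumerate(dictionary)}
    vector = [0] * len(dictionary)
    # Assumed tokenizer: lowercase runs of ASCII letters.
    for token in re.findall(r"[a-z]+", text.lower()):
        position = index.get(token)
        if position is not None:
            vector[position] += 1
    return vector


def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


dictionary = ["patent", "data", "system"]
texts = [
    "Patent data and more patent DATA systems",
    "A storage system",
    "Unrelated text",
]
vectors = [bag_of_words(text, dictionary) for text in texts]
for batch in batches(vectors, batch_size=2):
    print(batch)
```

Using the same dictionary for the train and test sets, as the commands above do, keeps the vector positions comparable between the two.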

Testing

pytest

Contributing and development

$ mkvirtualenv ppt
$ workon ppt
(ppt) $ pip install -r requirements.txt

Publish new release

$ git tag v1.0
$ git push origin v1.0

Building documentation

(ppt) $ sphinx-build -M html docs docs_build

References

Usage:

License

The MIT License (MIT). Copyright (c) 2014 Michał Dul, Piotr Przetacznik, Krzysztof Strojny. See the LICENSE file for more information.
