patent-parsing-tools is a library providing tools for generating training and test sets from Google's USPTO data, useful for testing machine learning algorithms.
[![Build Status](https://travis-ci.org/pprzetacznik/patent-parsing-tools.svg?branch=master)](https://travis-ci.org/pprzetacznik/patent-parsing-tools) [![Documentation Status](https://readthedocs.org/projects/patent-parsing-tools/badge/?version=latest)](https://patent-parsing-tools.readthedocs.io/en/latest/?badge=latest) ![patent-parsing-tools CI](https://github.com/pprzetacznik/patent-parsing-tools/workflows/patent-parsing-tools%20CI/badge.svg)
## Documentation
[Read the docs](https://patent-parsing-tools.readthedocs.io/en/latest/)
## System requirements:
`sudo yum install python-devel libxslt-devel libxml2-devel`
## Python requirements:
`pip install -r requirements.txt`
## Running:
Collecting and serializing data: `python -m patent_parsing_tools.supervisor [working_directory] [train_destination] [test_destination] [year_from] [year_to]`

E.g. `python -m patent_parsing_tools.supervisor patents/working_directory patents/train_destination patents/test_destination 2014 2015`
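When the collection step is driven from another script, the same command can be assembled programmatically. The following is a minimal sketch that mirrors only the positional arguments documented above. The helper name `build_supervisor_command` and the year-order check are assumptions made for illustration; neither is part of patent-parsing-tools.

```python
import sys


def build_supervisor_command(working_dir, train_dest, test_dest, year_from, year_to):
    """Return the argv list for the documented supervisor invocation.

    Hypothetical helper: it only mirrors the positional arguments shown
    in this README and is not part of patent-parsing-tools.
    """
    # Assumption: year_from is expected to be no later than year_to.
    if int(year_from) > int(year_to):
        raise ValueError("year_from must not be greater than year_to")
    return [
        sys.executable, "-m", "patent_parsing_tools.supervisor",
        str(working_dir), str(train_dest), str(test_dest),
        str(year_from), str(year_to),
    ]


if __name__ == "__main__":
    command = build_supervisor_command(
        "patents/working_directory",
        "patents/train_destination",
        "patents/test_destination",
        2014,
        2015,
    )
    # Only print the command here; pass the list to
    # subprocess.run(command, check=True) to actually run the step.
    print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when directory names contain spaces.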
Generating a dictionary from the train set: `python -m patent_parsing_tools.bow.dictionary_maker [train_directory] [max_parsed_patents] [dict_max_size] [dictionary_name]`

E.g. `python -m patent_parsing_tools.bow.dictionary_maker patents/train_destination 1000000000 4096 dictionary.txt`
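To illustrate what a size-capped dictionary means in general, here is a small sketch that keeps the most frequent words of a corpus up to a limit comparable to `dict_max_size`. It is not the implementation used by `dictionary_maker`; the function name `build_dictionary`, the regular-expression tokenizer, and the tie-breaking rule are all assumptions chosen for the example.

```python
import re
from collections import Counter


def build_dictionary(texts, dict_max_size):
    """Return at most dict_max_size words, most frequent first.

    Illustrative only: the tokenizer (runs of ASCII letters, lowercased)
    is an assumption, not the library's behaviour.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:dict_max_size]]


if __name__ == "__main__":
    sample = [
        "A method for parsing patent data",
        "Patent data parsing method",
        "A patent claim",
    ]
    print(build_dictionary(sample, 3))  # ['patent', 'a', 'data']
```

A real dictionary would usually also drop stop words such as "a"; that step is left out to keep the sketch short.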
Generating bags of words from the train set and the test set: `python -m patent_parsing_tools.bow.bag_of_words [directory_with_serialized_patents] [destination_directory] [dictionary.txt] [package_size > 1024]`

E.g. for the train set: `python -m patent_parsing_tools.bow.bag_of_words patents/train_destination patents/final_dataset_train dictionary.txt 1048576`

and for the test set: `python -m patent_parsing_tools.bow.bag_of_words patents/test_destination patents/final_dataset_test dictionary.txt 1048576`
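As a reminder of what a bag-of-words representation looks like, the sketch below turns a text into a vector of word counts ordered by a fixed dictionary. It shows the general technique only: the function name `bag_of_words`, the tokenizer, and the plain-list output are assumptions, and the files written by the command above may be structured differently.

```python
import re
from collections import Counter


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs in text.

    Illustrative only: words are runs of ASCII letters, lowercased.
    Words that are not in the dictionary are ignored.
    """
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in dictionary]


if __name__ == "__main__":
    dictionary = ["patent", "data", "claim"]
    text = "Patent data and more patent data for one claim"
    print(bag_of_words(text, dictionary))  # [2, 2, 1]
```

Because the vector length equals the dictionary size, the train and test sets must be encoded with the same dictionary file, as in the two commands above.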
## Running tests
`python -m unittest discover .`
Hashes for patent-parsing-tools-0.9.2.tar.gz

Algorithm | Hash digest |
---|---|
SHA256 | 14b0d6e629492beeac1ea03ac966a14adff2f2835d28c439c8e9bb88cec2544c |
MD5 | 4cc138f3ab7f891ae88c37d1d008cd48 |
BLAKE2b-256 | a2a33297a0836d3385545bd4728af1d496e1537c553505561699f153e54c8dca |

Hashes for patent_parsing_tools-0.9.2-py3-none-any.whl

Algorithm | Hash digest |
---|---|
SHA256 | 9aeb8166be8471a19914d39730761f9ef95758b7f6d07ba4748eb1009f43b8fc |
MD5 | b578ec4cd6508b883d070feec70bfceb |
BLAKE2b-256 | cb61cee95b06e9c00e4d77841e6fe9e63f6bb2822b14aca6791e975359304c56 |