Python library and webapp for searching standard industry and product classifiers

Perdu

Python library and webapp for matching against standard industry and product classifiers. Comes with NAICS, GS1, and USEEIO built-in.

Installation

Install using pip or conda:

conda install -c conda-forge -c cmutel perdu

-or-

pip install perdu

Depends on:

  • appdirs
  • docopt
  • flask
  • peewee
  • rdflib
  • rdflib-jsonld
  • whoosh

Usage

As a webapp:

conda_webapp

As a library:

import perdu
perdu.search_useeio("plastic toy")

Search basics

Perdu uses whoosh as its search engine. The first time you import Perdu, it imports the three built-in catalogues, which takes around one minute.

Built-in catalogues

Uploading data

Currently, the only way to upload data to the web interface is as a CSV file. The first column is the item name or title, and the optional second column is the item description. See perdu.test.fixtures for examples.
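The two-column layout described above can be sketched with the standard library alone. This is a minimal illustration, not Perdu's own upload code: the item rows and the helper names write_upload_csv and read_upload_csv are hypothetical, and the sketch assumes the file has no header row, which the text does not specify.

```python
import csv
import io

# Hypothetical items; the second column (description) is optional.
rows = [
    ("Plastic toy car", "Injection-moulded toy vehicle for children"),
    ("Steel bolts",),  # no description
]


def write_upload_csv(rows, handle):
    """Write one item per row: name/title first, optional description second."""
    writer = csv.writer(handle)
    for row in rows:
        writer.writerow(row)


def read_upload_csv(handle):
    """Parse rows into (name, description) pairs, using '' for a missing description."""
    items = []
    for row in csv.reader(handle):
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a name
        description = row[1] if len(row) > 1 else ""
        items.append((row[0], description))
    return items


# Round-trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_upload_csv(rows, buffer)
buffer.seek(0)
items = read_upload_csv(buffer)
print(items)
```

Using the csv module rather than joining strings by hand keeps names that contain commas or quotes intact.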

Adding other catalogues

See the files in perdu.extraction for examples of how to extract data from PDFs (NAICS), XML (GS1), and JSON (USEEIO). Each search catalogue has its own schema, but Perdu expects every schema to include at least the columns name, description, and code (see examples in perdu.searching). New catalogues also need suitable functions in perdu.webapp.search_mapping.
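As a rough sketch of the extraction step, the snippet below maps one extracted item onto the three expected columns. It does not use Perdu's API: the function to_catalogue_record, the field_map argument, and the sample item are all hypothetical, and treating name and code as mandatory while allowing an empty description is an assumption.

```python
REQUIRED_FIELDS = ("name", "description", "code")


def to_catalogue_record(raw, field_map):
    """Map one extracted item onto the name/description/code columns.

    `field_map` says which key of the raw item feeds each required column.
    """
    record = {
        field: str(raw.get(field_map[field], "")).strip()
        for field in REQUIRED_FIELDS
    }
    # Assumption: an item without a name or a code is not usable.
    if not record["name"] or not record["code"]:
        raise ValueError(f"item is missing a name or code: {raw!r}")
    return record


# Hypothetical raw item, shaped like something parsed from a JSON source.
raw_item = {"Code": "X123", "Title": " Toy manufacturing ", "Notes": "Example sector"}
field_map = {"name": "Title", "description": "Notes", "code": "Code"}

record = to_catalogue_record(raw_item, field_map)
print(record)
```

Normalizing every source into the same three columns first means the indexing and web-interface code does not need to know whether the data came from a PDF, XML, or JSON file.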

Advanced searching

In addition to the default search method used in the web interface, Perdu also offers search corrections (search_corrector_gs1, search_corrector_naics, and search_corrector_useeio) and disjunction maximization (search_gs1_disjoint, search_useeio_disjoint, and search_naics_disjoint).
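To illustrate what disjunction maximization generally means, the sketch below scores a document by its best-matching field plus a small fraction of the remaining field scores, instead of summing all fields. This is the common formulation of the technique, not a description of how the *_disjoint functions are implemented; the function names, the tie_break parameter, and the scores are hypothetical.

```python
def dis_max_score(field_scores, tie_break=0.0):
    """Best field score plus `tie_break` times the sum of the other field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie_break * (sum(field_scores) - best)


def sum_score(field_scores):
    """Plain additive scoring, shown for comparison."""
    return sum(field_scores)


# Hypothetical per-field relevance scores for one query against two documents.
doc_a = {"name": 2.0, "description": 0.0}  # strong match in a single field
doc_b = {"name": 1.1, "description": 1.1}  # weak matches in both fields

for label, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    scores = list(doc.values())
    print(label, "sum:", sum_score(scores), "dis-max:", dis_max_score(scores, tie_break=0.1))
```

With additive scoring, doc_b ranks above doc_a because two weak matches add up. With dis-max scoring, doc_a ranks first because a single strong field match dominates.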
