Python binding for nativeextractor
NativeExtractor module for Python
This is the official Python binding for the NativeExtractor project.
Installation
Requirements
- Python >=2.7 (Python 3 is highly recommended)
- pip
- build-essential (gcc, make)
- libglib2.0, libglib2.0-dev, libpythonX-dev
We recommend using a virtual environment:
virtualenv myproject
source myproject/bin/activate
or
python -m venv myproject
source myproject/bin/activate
Instant PyPI solution
pip install pynativeextractor
Manual
- Clone the repo:
git clone --recurse-submodules https://github.com/SpongeData-cz/pynativeextractor.git
- Install via pip or pip3:
pip install -e ./pynativeextractor/
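To confirm that either installation route worked, a quick check like the one below can be run inside the activated environment. It is a minimal sketch using only the standard library; the helper name is_importable is illustrative and not part of pynativeextractor.

```python
import importlib


def is_importable(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Prints True once the package is installed in the current environment.
    print(is_importable("pynativeextractor"))
```

If this prints False, the usual cause is that pip installed the package into a different interpreter than the one running the check.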
Typical usage
import os
from pynativeextractor.extractor import BufferStream, Extractor, DEFAULT_MINERS_PATH

# Construct a new Extractor instance
ex = Extractor()
# Add an example miner named match_url from web_entities.so; it matches all URLs
ex.add_miner_so(os.path.join(DEFAULT_MINERS_PATH, 'web_entities.so'), 'match_url')
text = '{}'.format("https://spongedata.cz")
# Create a stream from the text (for files, use FileStream, which uses mmap internally)
with BufferStream(text) as bf:
    # Initialize the occurrences list as an empty list
    occurrences = []
    # Attach the stream to the extractor
    with ex.set_stream(bf):
        # Mine all occurrences of URLs
        while not ex.eof():
            # Accumulate the occurrences found in this step
            occurrences += ex.next()

print(occurrences) # Prints [{'label': 'URL', 'value': 'https://spongedata.cz', 'pos': 0, 'len': 13, 'prob': 1.0}]
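The mining loop above can be wrapped in a reusable helper. The following is a sketch that assumes only the calls shown in the example: set_stream() used as a context manager, eof(), and next() returning a list of dicts. The names mine_all and min_prob are hypothetical and not part of pynativeextractor.

```python
def mine_all(extractor, stream, min_prob=0.0):
    """Collect every occurrence the extractor reports for the given stream.

    min_prob is a hypothetical convenience filter on the 'prob' key seen in
    the example output; occurrences without that key are kept.
    """
    occurrences = []
    # Attach the stream for the duration of the mining loop
    with extractor.set_stream(stream):
        while not extractor.eof():
            for occurrence in extractor.next():
                if occurrence.get('prob', 1.0) >= min_prob:
                    occurrences.append(occurrence)
    return occurrences
```

With the objects from the example, calling mine_all(ex, bf) inside the "with BufferStream(text) as bf:" block should yield the same list as the explicit loop, provided the extractor behaves as shown there.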
Source distribution: pynativeextractor-1.0.15.tar.gz (43.5 kB)