
th2_grpc_crawler_data_processor

Project description

th2 gRPC crawler data processor library (0.2.0)

This project contains the gRPC interface to implement if you want to create your own crawler data processor.

The crawler data processor works together with the crawler.

How to transform template

  1. Create a directory with the same name as the project name (use underscores instead of dashes) under the src/main/proto directory (remove other files and directories if they exist).
  2. Place your custom .proto files in the created directory. Pay attention to the package specifier and the import statements.
  3. Edit the release_version and vcs_url properties in the gradle.properties file.
  4. Edit the rootProject.name variable in the settings.gradle file. This will be the name of the Java package.
  5. Edit the package_info.json file in order to specify the name and version of the Python package (create the file if it is absent); see the example below.
  6. Edit the parameters of the setup function invocation in setup.py, such as author, author_email and url. Do not edit the others.
  7. Edit the README.md file according to the new project.

Note that the name of the created directory under the src/main/proto directory is used in Python (it becomes the Python package name).
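
For reference, a minimal package_info.json could look like the sketch below. The key names follow the convention used in other th2 gRPC templates and are an assumption here, so check them against the file shipped with this template:

    {
      "package_name": "th2-grpc-crawler-data-processor",
      "package_version": "0.2.0"
    }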

How to maintain project

  1. Make your changes.
  2. Bump the version of the Java package in the gradle.properties file (see the sketch after this list).
  3. Bump the version of the Python package in the package_info.json file.
  4. Commit everything.
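
As an illustration, a release that bumps the packages to 0.2.1 (a made-up version number) would update the release_version property in gradle.properties and keep the version in package_info.json in sync with it:

    # gradle.properties
    release_version=0.2.1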

How to run project

Java

If you wish to manually create and publish the package for Java, run this command:

gradle --no-daemon clean build publish artifactoryPublish \
       -Pbintray_user=${BINTRAY_USER} \
       -Pbintray_key=${BINTRAY_KEY}

BINTRAY_USER and BINTRAY_KEY are parameters for publishing.

Python

If you wish to manually create and publish the package for Python:

  1. Generate the services with Gradle:
       gradle --no-daemon clean generateProto

    You can find the generated files at the following path: src/gen/main/services/python
  2. Generate code from .proto files and publish everything:
    pip install -r requirements.txt
    python setup.py generate
    python setup.py sdist
    twine upload --repository-url ${PYPI_REPOSITORY_URL} --username ${PYPI_USER} --password ${PYPI_PASSWORD} dist/*
    
    PYPI_REPOSITORY_URL, PYPI_USER and PYPI_PASSWORD are parameters for publishing.
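
Once the package is published and installed (for example with pip install th2-grpc-crawler-data-processor), the snippet below can serve as a quick sanity check of the generated gRPC modules. It is not part of the project itself and only assumes that the top-level Python package is named th2_grpc_crawler_data_processor, matching the directory created under src/main/proto:

    import pkgutil

    # The top-level package name matches the proto directory name.
    import th2_grpc_crawler_data_processor

    # List the *_pb2 / *_pb2_grpc modules generated from the .proto files.
    for module_info in pkgutil.iter_modules(th2_grpc_crawler_data_processor.__path__):
        print(module_info.name)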

Changes:

v0.2.0 (Breaking changes)

Breaking:

  • Use a list of MessageID instead of a mapping between session and MessageID. Users now have to specify a MessageID for both directions in the response if they need to set a checkpoint. The list should contain a single MessageID for each alias + direction pair. If more than one is found, the last one (according to their sequences) will be taken (see the sketch below the change list).
  • The rpc methods were renamed according to the Protobuf naming convention (PascalCase).
  • The event and message IDs were removed from the response to the connect method because this functionality requires additional improvements on the Crawler's side.

Added:

  • A new method that the crawler invokes each time a new interval is started.
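
To illustrate the first breaking change, below is a minimal sketch of building a checkpoint list with one MessageID per alias + direction pair. It assumes the common types can be imported from th2_grpc_common.common_pb2 (the th2-grpc-common Python package); the name of the response field that finally carries the list is deliberately omitted, since it should be taken from the actual .proto files:

    # Hypothetical sketch, not the actual API: one MessageID per alias + direction pair.
    from th2_grpc_common.common_pb2 import ConnectionID, Direction, MessageID

    checkpoint = [
        MessageID(
            connection_id=ConnectionID(session_alias="fix-client"),
            direction=Direction.FIRST,
            sequence=1001,  # last processed sequence for this direction
        ),
        MessageID(
            connection_id=ConnectionID(session_alias="fix-client"),
            direction=Direction.SECOND,
            sequence=2002,
        ),
    ]
    # If several IDs were supplied for the same alias + direction pair, only the one
    # with the highest sequence would be taken, so a single ID per pair is enough.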


Download files


Source Distribution

File details

Details for the file th2_grpc_crawler_data_processor-0.2.0.dev2058388652.tar.gz.

File metadata

  • Download URL: th2_grpc_crawler_data_processor-0.2.0.dev2058388652.tar.gz
  • Upload date:
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.1 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.12

File hashes

Hashes for th2_grpc_crawler_data_processor-0.2.0.dev2058388652.tar.gz:

  • SHA256: cd0d3661abada66add1a4ea3e7345904be40c19831ed83e0fa877beafd6d7dc4
  • MD5: 432491a4480d0b835358264f6b55bb65
  • BLAKE2b-256: 65b4fd88ffeba329ba368bb0161e23053d3eb517e9d1fbc4cceeceebd4ec581a

