th2_grpc_crawler_data_processor
Project description
th2 gRPC crawler data processor library (0.2.1)
This project contains the gRPC interface to implement if you want to create your own crawler data processor.
The crawler data processor works together with the crawler.
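On the Python side, an implementation is a gRPC servicer built from the stubs this package generates. The sketch below is illustrative only: the module names (`crawler_data_processor_pb2`, `crawler_data_processor_pb2_grpc`), the `DataProcessorServicer` base class, and the `DataProcessorInfo` fields are assumptions based on this project's naming; check the generated sources for the exact names.

```python
# A minimal sketch, assuming the generated stubs below exist under
# th2_grpc_crawler_data_processor (verify against the generated sources).
from concurrent import futures

import grpc

from th2_grpc_crawler_data_processor import (
    crawler_data_processor_pb2 as processor_pb2,        # assumed module name
    crawler_data_processor_pb2_grpc as processor_grpc,  # assumed module name
)


class MyDataProcessor(processor_grpc.DataProcessorServicer):
    """Hypothetical processor that answers the crawler's handshake."""

    def CrawlerConnect(self, request, context):
        # Identify this processor to the crawler (field names are assumed).
        return processor_pb2.DataProcessorInfo(name='my-processor', version='1.0.0')


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    processor_grpc.add_DataProcessorServicer_to_server(MyDataProcessor(), server)
    server.add_insecure_port('[::]:8080')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    serve()
```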
How to transform template
- Create a directory with the same name as the project name (use underscores instead of dashes) under the `src/main/proto` directory (remove other files and directories if they exist).
- Place your custom `.proto` files in the created directory. Pay attention to the `package` specifier and `import` statements.
- Edit the `release_version` and `vcs_url` properties in the `gradle.properties` file (see the sketch after this list).
- Edit the `rootProject.name` variable in the `settings.gradle` file. This will be the name of the Java package.
- Edit the `package_info.json` file in order to specify the name and version for the Python package (create the file if it is absent).
- Edit the parameters of `setup.py` in the `setup` function invocation, such as `author`, `author_email`, `url`. Do not edit the others.
- Edit the `README.md` file according to the new project.

Note that the name of the created directory under `src/main/proto` is used in Python (it is a package name).
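As an illustration, for a hypothetical new project named `my-data-processor` the edited files might look roughly like this (all values are placeholders, and the exact `package_info.json` keys should be checked against an existing th2 gRPC project):

```
# gradle.properties
release_version=0.0.1
vcs_url=https://github.com/my-org/my-data-processor
```

```
// settings.gradle
rootProject.name = 'my-data-processor'
```

```
{
  "package_name": "my-data-processor",
  "package_version": "0.0.1"
}
```

The proto directory would then be `src/main/proto/my_data_processor` (underscores), which also becomes the Python package name.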
How to maintain project
- Make your changes.
- Bump the version of the Java package in the `gradle.properties` file.
- Bump the version of the Python package in the `package_info.json` file (see the example below this list).
- Commit everything.
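For example, a release might bump both versions in the same commit: `release_version=0.2.2` in `gradle.properties` and the matching `package_version` in `package_info.json` (key name assumed, as in the sketch above); the two should stay in step.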
How to run project
Java
If you wish to manually create and publish the package for Java, run this command:

```
gradle --no-daemon clean build publish artifactoryPublish \
    -Pbintray_user=${BINTRAY_USER} \
    -Pbintray_key=${BINTRAY_KEY}
```
`BINTRAY_USER` and `BINTRAY_KEY` are parameters for publishing.
Python
If you wish to manually create and publish the package for Python:

- Generate services with Gradle:

  ```
  gradle --no-daemon clean generateProto
  ```

  You can find the generated files under the `src/gen/main/services/python` path.
- Generate code from the `.proto` files and publish everything:

  ```
  pip install -r requirements.txt
  python setup.py generate
  python setup.py sdist
  twine upload --repository-url ${PYPI_REPOSITORY_URL} --username ${PYPI_USER} --password ${PYPI_PASSWORD} dist/*
  ```
`PYPI_REPOSITORY_URL`, `PYPI_USER` and `PYPI_PASSWORD` are parameters for publishing.
Changes:
v0.2.1
- `maxMessageSize` in `DataProcessorInfo` allows telling the Crawler to send messages no greater than `maxMessageSize` (see the sketch below).
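As a hedged illustration, reusing the assumed names from the sketch earlier in this page and assuming the generated Python field is `max_message_size`:

```python
# Sketch only: advertise the largest message this processor accepts,
# assuming DataProcessorInfo exposes a max_message_size field.
def CrawlerConnect(self, request, context):
    return processor_pb2.DataProcessorInfo(
        name='my-processor',
        version='1.0.0',
        max_message_size=4 * 1024 * 1024,  # the crawler should stay under 4 MiB
    )
```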
v0.2.0 (Breaking changes)
Breaking:
- Use a list of `MessageID` instead of a mapping between session and `MessageID`. Users now have to specify a `MessageID` for both directions in the response if they need to set a checkpoint. The list should contain a single `MessageID` for each `alias + direction` pair. If more than one is found, the last one (according to their sequences) will be taken (illustrated in the sketch after this changelog).
- The RPC methods were renamed according to the Protobuf naming convention (PascalCase).
- The event and message IDs were removed from the response to the connect method because this functionality requires additional improvements on the Crawler's side.
Added:
- New method that will be invoked by the crawler each time a new interval is started.
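To illustrate the checkpoint contract above, here is a sketch of building such a response (the `MessageResponse` message and its `ids` field are assumptions; `MessageID` and its fields come from th2's common protobuf package):

```python
# Sketch: one MessageID per alias + direction pair; if duplicates were
# sent, the crawler would keep the one with the highest sequence.
from th2_grpc_common.common_pb2 import ConnectionID, MessageID

def checkpoint_response(last_seen):
    """last_seen maps (session_alias, direction) -> last processed sequence."""
    ids = [
        MessageID(
            connection_id=ConnectionID(session_alias=alias),
            direction=direction,
            sequence=seq,
        )
        for (alias, direction), seq in last_seen.items()
    ]
    return processor_pb2.MessageResponse(ids=ids)  # assumed response type
```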
Project details
Download files
Download the file for your platform.
Source Distribution
File details
Details for the file `th2_grpc_crawler_data_processor-0.2.0.dev1768927127.tar.gz`.
File metadata
- Download URL: th2_grpc_crawler_data_processor-0.2.0.dev1768927127.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/4.10.1 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `26965da6c21842448d08be53f58d3b766e53b36b51dac7be2825c651139246d3` |
| MD5 | `b3f67f053b2eeb3d6fcf507401923a81` |
| BLAKE2b-256 | `af83ddfb1f180b7a2adca006b23b597272f86ca6acd7b3868ec4b007d07f5ffa` |