Project description

Distributed task framework, faster than any other task queue

Functional description

  • A distributed crawler framework that is more flexible than scrapy and easier to use than Celery: do the most with the least code, in the simplest way

  • You can start crawling data with the framework within 1 minute, without reading complex documentation, and easily extend it with various middleware
    Feature description:

    Supported middleware:
       Four kinds of middleware are supported: redis, kafka, sqlite, and memory (redis is the first choice; it supports batch task publishing, and distributed consumption is lightning fast)
    
    Concurrency support:
       Three concurrent consumption modes are supported: process, threading, and gevent (they can be mixed)
    
    Frequency control and rate limiting:
       Precisely control how many times the function runs per second
    
    Task deduplication:
       If you push a task that has already been consumed successfully, it is automatically filtered out (see the sketch after this feature list)
    
    Consumption confirmation:
       When consumption confirmation (ack) is enabled, tasks are not lost even if the consumer is stopped manually mid-task
    
    Number of retries:
       When the function fails, it is retried immediately up to the specified number of times; once the maximum number of retries is reached, the task is acknowledged and dropped
    
    Task visualization:
       You can view current task consumption in real time through a web-based redis management tool
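
A minimal sketch of the deduplication behavior described above, assuming the default redis middleware and the task_deco API shown in the demos below (the queue name 'dedup_demo' is just an example):

from leek import task_deco

@task_deco('dedup_demo')
def handle(x):
    print(f"got:{x}")

# Per the feature description above, repeated pushes of arguments
# that were already consumed successfully are filtered out automatically.
handle.pub(1)
handle.pub(1)

handle.start()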
    

Version description

  • Supported version: Python 3.0+

Pip installation

pip install leek

DEMO description

0. Publish and consume tasks (decorator version)
from leek import task_deco

@task_deco('test0')  # Add the task queue decorator to the consuming function
def f0(a, b):
    print(f"t_demo0,a:{a},b:{b}")

# Publish tasks
for i in range(1, 51):
    f0.pub(i, i)

# Consume tasks
f0.start()
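In a real distributed deployment the publishing side and the consuming side usually run as separate scripts or processes; a minimal split of the demo above, assuming both sides can reach the same redis instance:

# publisher.py -- publishes 50 tasks to the 'test0' queue, then exits
from leek import task_deco

@task_deco('test0')
def f0(a, b):
    print(f"t_demo0,a:{a},b:{b}")

for i in range(1, 51):
    f0.pub(i, i)

# consumer.py -- run on one or more worker machines to drain 'test0'
from leek import task_deco

@task_deco('test0')
def f0(a, b):
    print(f"t_demo0,a:{a},b:{b}")

f0.start()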
1. Publish and consume tasks (with extra parameters)
from leek import task_deco

@task_deco('test1', qps=30, threads_num=30, max_retry_times=3, ack=True)
def f1(a, b):
    print(f"t_demo1,a:{a},b:{b}")

# Publish tasks
for i in range(1, 31):
    f1.pub(i, i + 1) # or f1.publish_redispy(i,i+1)

# Consume tasks
f1.start()
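To see max_retry_times and ack working together, a sketch with a deliberately failing function (the queue name 'retry_demo' is just an example):

from leek import task_deco

@task_deco('retry_demo', max_retry_times=3, ack=True)
def flaky(n):
    # A raised exception triggers an immediate retry, up to
    # max_retry_times; after the last attempt the task is
    # acknowledged and dropped instead of retrying forever.
    if n % 2 == 0:
        raise ValueError(f"simulated failure for {n}")
    print(f"ok:{n}")

for i in range(1, 6):
    flaky.pub(i)

flaky.start()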
2. Publish and consume tasks (non-decorator version)
from leek import RedisPublish, RedisCustomer


for zz in range(1, 501):
    param = {"a": zz, "b": zz, "c": zz}
    RedisPublish(queue_name='test2').publish_redispy(param)


def print_msg_dict(a, b, c):
    print(f"msg_dict:{a},{b},{c}")


# Consume tasks with multiple parameters. queue_name: the queue to consume from; qps: the number of tasks consumed per second (no limit by default)
RedisCustomer(queue_name='test2', consuming_function=print_msg_dict,
              qps=50).start_consuming_message()
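Because the queue lives in redis, the same consumer can be started on several machines at once to share one queue; a sketch of a standalone worker script, assuming every machine points at the same redis instance:

from leek import RedisCustomer

def print_msg_dict(a, b, c):
    print(f"msg_dict:{a},{b},{c}")

# Run this script on any number of machines; the workers
# cooperatively drain the 'test2' queue.
RedisCustomer(queue_name='test2', consuming_function=print_msg_dict,
              qps=50).start_consuming_message()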
3. Submit tasks in batches (consume using gevent coroutines)
from gevent import monkey
monkey.patch_all()
from leek import task_deco

# Build a batch of task parameters to submit
result = [{'a': i,'b': i,'c': i} for i in range(1, 51)]

# customer_type: consumer type (default 'thread'); max_push_size: the number of records submitted per batch (default 50)
# If you use gevent, add at the very beginning of the code: from gevent import monkey; monkey.patch_all()
@task_deco('test3', qps=50, customer_type='gevent', max_push_size=100)  # Add the task queue decorator to the consuming function
def f3(a, b, c):
    print(f"t_demo3:{a},{b},{c}")

# Publish the batch of tasks
f3.pub_list(result)

# Consume tasks
f3.start()
4. Switch the task queue middleware to sqlite (the default is redis)
from leek import task_deco, MiddlewareEum

@task_deco('test4', middleware=MiddlewareEum.SQLITE, qps=10)
def f4(a, b, c):
    print(f"t_demo4:{a},{b},{c}")

for zz in range(1, 51):
    f4.pub(zz, zz, zz)

f4.start()
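
The feature list above also mentions kafka and memory middleware. Assuming MiddlewareEum exposes matching members (MiddlewareEum.MEMORY below is an assumption; check the enum members in your installed version), switching works the same way:

from leek import task_deco, MiddlewareEum

# MiddlewareEum.MEMORY is assumed from the feature list; verify the
# member name against the MiddlewareEum enum you have installed.
@task_deco('test5', middleware=MiddlewareEum.MEMORY, qps=10)
def f5(a):
    print(f"t_demo5:{a}")

for zz in range(1, 11):
    f5.pub(zz)

f5.start()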

Redis install

Redis docker install:

docker run  -d -p 6379:6379 redis
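
To confirm the container is reachable before running the demos, a quick check with the redis Python client (pip install redis; localhost:6379 matches the docker command above):

import redis

# Connect to the redis container started above and verify it responds.
r = redis.Redis(host='localhost', port=6379)
print(r.ping())  # prints True when the server is reachable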

Redis web management tool: redisweb

Usage scenarios and features

1. Highly concurrent distributed crawling (verified in production on tens of millions of crawled records)

2. Distributed data cleaning (automatic deduplication; cleaning can resume at any time after an interruption)

3. Short video processing (video download and upload; with sufficient bandwidth there is no waiting)

4. Asynchronous real-time online query interfaces (millisecond-level response)

5. More usage scenarios are being added

Release Notes

2020-06-11 Version 4.1.5: Added support for gevent coroutine consumption via the parameter customer_type='gevent'

2020-05-20 Added a timeout parameter for the consuming function

2020-05-10 Added sqlite middleware support

2020-04-13 Added automatic control of the number of threads for the consuming function

2020-04-10 Added a frequency-limiting parameter for the consuming function

2020-01-08 The consuming function supports multiple parameter types

2019-12-06 Simplified the multi-threaded consumer queue class

2019-10-14 Added an error retry mechanism for the consuming function; 3 retries by default

2019-10-12 Task deduplication now ignores parameter order

2019-09-27 Fixed a bug when submitting list tasks

2019-05-25 Added dynamic parameter passing when adding tasks

2019-04-06 Added automatic deduplication for crawling tasks

2019-03-23 Added a single-threaded asynchronous batch submission function
