145+ extra higher-level functional tools beyond standard and third-party libraries.
Featured on GitHub's Trending Python repos on May 25, 2018. Thank you so much for your support!
145+ extra higher-level functional tools that go beyond standard library's itertools, functools, etc. and popular third-party libraries like toolz, funcy, and more-itertools.
- Like toolz and others, most of the tools are designed to be efficient, pure, and lazy. Several useful yet non-functional tools are also included.
- While toolz and others target basic scenarios, this library targets more advanced and higher-level scenarios.
- A few useful CLI tools for respective functions are also installed. They are available as extratools-[func].
Full documentation is available here.
Why this library?
Typical pseudocode has fewer than 20 lines, where each line is a higher-level description. However, when implementing it, many lower-level details have to be filled in.
This library reduces the burden of writing and refining the lower-level details again and again, by including an extensive set of carefully designed general purpose higher-level tools.
Current status and future plans?
There are currently 140+ functions among 17 categories, 3 data structures, and 3 CLI tools.
- Currently adopted by TopSim and PrefixSpan-py.
This library is under active development, and new tools are added on a weekly basis.
- Any idea or contribution is highly welcome.
Besides many other interesting ideas, I am planning to make the following updates in the coming days/weeks/months.
- Add dicttools.unflatten and jsontools.unflatten.
- Add trie and suffixtree (according to generalized suffix tree).
- Update seqtools.align to support more than two sequences.
No plan to implement tools that are well covered by other popular libraries.
Which tools are available?
- Function Categories:
  - debugtools
  - dicttools
  - gittools
  - graphtools
  - htmltools
  - jsontools
  - mathtools
  - misctools
  - printtools
  - rangetools
  - recttools
  - seqtools
  - settools
  - sortedtools
  - stattools
  - strtools
  - tabletools
- Data Structures:
  - defaultlist
  - disjointsets
  - segmenttree
- CLI Tools:
  - dicttools.remap
  - jsontools.flatten
  - stattools.teststats
Any example?
Here are ten examples out of our 145+ tools.
jsontools.flatten(data, force=False)
flattens a JSON object by returning all the tuples, each with a path and the respective value.
import json
from extratools.jsontools import flatten
flatten(json.loads("""{
"name": "John",
"address": {
"streetAddress": "21 2nd Street",
"city": "New York"
},
"phoneNumbers": [
{
"type": "home",
"number": "212 555-1234"
},
{
"type": "office",
"number": "646 555-4567"
}
],
"children": [],
"spouse": null
}"""))
# {'name': 'John',
# 'address.streetAddress': '21 2nd Street',
# 'address.city': 'New York',
# 'phoneNumbers[0].type': 'home',
# 'phoneNumbers[0].number': '212 555-1234',
# 'phoneNumbers[1].type': 'office',
# 'phoneNumbers[1].number': '646 555-4567',
# 'children': [],
# 'spouse': None}
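To make the path format above concrete, here is a minimal pure-Python sketch of the flattening idea. The name flatten_sketch is hypothetical and this is not the library's implementation. It assumes that empty containers and scalars are kept as leaf values, and it ignores the force parameter.

```python
from typing import Any, Dict


def flatten_sketch(data: Any, prefix: str = "") -> Dict[str, Any]:
    """Map each leaf of a parsed JSON value to a dotted/bracketed path.

    Hypothetical helper for illustration only; not part of extratools.
    """
    # Non-empty dicts are expanded, joining keys with ".".
    if isinstance(data, dict) and data:
        result: Dict[str, Any] = {}
        for key, value in data.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            result.update(flatten_sketch(value, path))
        return result

    # Non-empty lists are expanded, appending "[index]" to the path.
    if isinstance(data, list) and data:
        result = {}
        for index, value in enumerate(data):
            result.update(flatten_sketch(value, f"{prefix}[{index}]"))
        return result

    # Scalars, None, and empty containers are treated as leaves (an assumption).
    return {prefix: data}
```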
rangetools.gaps(covered, whole=(-inf, inf))
computes the uncovered ranges of the whole range whole, given the covered ranges covered.
from math import inf
from extratools.rangetools import gaps
list(gaps(
[(-inf, 0), (0.1, 0.2), (0.5, 0.7), (0.6, 0.9)],
(0, 1)
))
# [(0, 0.1), (0.2, 0.5), (0.9, 1)]
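For readers curious how such gaps can be computed, the following is a rough sketch that sweeps over the covered ranges in sorted order. The name gaps_sketch is hypothetical, and sorting the input first is an assumption made here. The library's own implementation may differ.

```python
from math import inf
from typing import Iterable, Iterator, Tuple

Range = Tuple[float, float]


def gaps_sketch(covered: Iterable[Range], whole: Range = (-inf, inf)) -> Iterator[Range]:
    """Yield the parts of `whole` not covered by any range in `covered`.

    Hypothetical helper for illustration only; not part of extratools.
    """
    lo, hi = whole
    cursor = lo  # everything in [lo, cursor) has already been handled

    for start, end in sorted(covered):
        if start >= hi:
            break  # the remaining ranges lie entirely beyond `whole`
        if start > cursor:
            yield (cursor, start)  # uncovered stretch before this range
        cursor = max(cursor, end)

    if cursor < hi:
        yield (cursor, hi)  # uncovered tail of `whole`
```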
recttools.heatmap(rect, rows, cols, points, usepos=False)
computes the heatmap within rectangle rect by a grid of rows rows and cols columns.
from extratools.recttools import heatmap
heatmap(
((1, 1), (3, 4)),
3, 4,
[(1.5, 1.25), (1.5, 1.75), (2.75, 2.75), (2.75, 3.5), (3.5, 2.5)]
)
# {1: 2, 7: 1, 11: 1, None: 1}
heatmap(
((1, 1), (3, 4)),
3, 4,
[(1.5, 1.25), (1.5, 1.75), (2.75, 2.75), (2.75, 3.5), (3.5, 2.5)],
usepos=True
)
# {(0, 1): 2, (1, 3): 1, (2, 3): 1, None: 1}
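The outputs above are consistent with a simple binning scheme: the first coordinate selects the column, the second selects the row, and cells are numbered row by row. Below is a sketch under those assumptions. The name heatmap_sketch is hypothetical, and the half-open treatment of the rectangle's boundary is a guess rather than documented behavior.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]


def heatmap_sketch(
    rect: Rect, rows: int, cols: int, points: Iterable[Point], usepos: bool = False
) -> Dict[Hashable, int]:
    """Count points per grid cell; points outside `rect` are counted under None.

    Hypothetical helper for illustration only; not part of extratools.
    """
    (x1, y1), (x2, y2) = rect
    cell_w = (x2 - x1) / cols
    cell_h = (y2 - y1) / rows

    counts: Counter = Counter()
    for x, y in points:
        # Assumption: the rectangle is half-open, [x1, x2) by [y1, y2).
        if not (x1 <= x < x2 and y1 <= y < y2):
            counts[None] += 1
            continue
        # Clamp to guard against floating-point rounding at the far edge.
        row = min(int((y - y1) / cell_h), rows - 1)
        col = min(int((x - x1) / cell_w), cols - 1)
        counts[(row, col) if usepos else row * cols + col] += 1

    return dict(counts)
```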
settools.setcover(whole, covered, key=len)
solves the set cover problem by covering the universe set whole as best as possible, using a subset of the covering sets covered.
from extratools.settools import setcover
list(setcover(
{ 1, 2, 3, 4, 5},
[{1, 2, 3}, {2, 3, 4}, {2, 4, 5}]
))
# [{1, 2, 3}, {2, 4, 5}]
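Set cover is commonly approximated with a greedy heuristic that repeatedly picks the set covering the most still-uncovered elements. The sketch below shows that heuristic. Whether the library uses exactly this strategy is an assumption, the name greedy_setcover_sketch is hypothetical, and the key parameter is omitted.

```python
from typing import Iterable, Iterator, Set, TypeVar

T = TypeVar("T")


def greedy_setcover_sketch(whole: Set[T], covered: Iterable[Set[T]]) -> Iterator[Set[T]]:
    """Greedily yield sets from `covered` until `whole` is covered as far as possible.

    Hypothetical helper for illustration only; not part of extratools.
    """
    remaining = set(whole)
    candidates = list(covered)

    while remaining and candidates:
        # Pick the candidate that covers the most uncovered elements
        # (ties go to the earliest candidate).
        best = max(candidates, key=lambda s: len(s & remaining))
        if not best & remaining:
            break  # no candidate adds anything; the rest cannot be covered
        yield best
        remaining -= best
        candidates.remove(best)
```

The greedy choice is an approximation and does not always find the smallest possible cover.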
seqtools.compress(data, key=None)
compresses the sequence data by encoding consecutive identical items as a tuple of item and count, according to run-length encoding.
from extratools.seqtools import compress
list(compress([1, 2, 2, 3, 3, 3, 4, 4, 4, 4]))
# [(1, 1), (2, 2), (3, 3), (4, 4)]
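Run-length encoding of this kind can be expressed with the standard library's itertools.groupby. The sketch below is a hypothetical stand-in (compress_sketch is not part of the library), and it assumes that key decides which neighboring items count as identical.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple


def compress_sketch(
    data: Iterable[Any], key: Optional[Callable[[Any], Any]] = None
) -> Iterator[Tuple[Any, int]]:
    """Yield (item, count) for each run of consecutive equal items.

    Hypothetical helper for illustration only; not part of extratools.
    """
    for _, group in groupby(data, key=key):
        items = list(group)
        # Report the first item of the run together with the run length.
        yield (items[0], len(items))
```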
seqtools.mergeseqs(seqs, default=None, key=None)
merges the sequences of equal length in seqs into a single sequence. Returns None if there is a conflict in any position.
from extratools.seqtools import mergeseqs
seqs = [
    (0, 0, None, 0),
    (None, 1, 1, None),
    (2, None, None, None),
    (None, None, None, None)
]
list(mergeseqs(seqs[1:]))
# [2,
# 1,
# 1,
# None]
mergeseqs(seqs)
# None
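The merging rule can be read as: in each position, take the single non-default value if there is one, and give up if two different non-default values compete. Here is a sketch of that reading. The name mergeseqs_sketch is hypothetical, the key parameter is omitted, and the result is returned as a list rather than a lazy iterable.

```python
from typing import Any, List, Optional, Sequence


def mergeseqs_sketch(seqs: Sequence[Sequence[Any]], default: Any = None) -> Optional[List[Any]]:
    """Merge equal-length sequences position by position; None signals a conflict.

    Hypothetical helper for illustration only; not part of extratools.
    """
    merged = []
    for column in zip(*seqs):
        # Collect the distinct non-default values seen at this position.
        distinct: List[Any] = []
        for value in column:
            if value != default and value not in distinct:
                distinct.append(value)
        if len(distinct) > 1:
            return None  # two sequences disagree here
        merged.append(distinct[0] if distinct else default)
    return merged
```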
strtools.smartsplit(s)
finds the best delimiter to automatically split string s. Returns a tuple of the delimiter and the split substrings.
from extratools.strtools import smartsplit
smartsplit("abcde")
# (None,
# ['abcde'])
smartsplit("a b c d e")
# (' ',
# ['a', 'b', 'c', 'd', 'e'])
smartsplit("/usr/local/lib/")
# ('/',
# ['', 'usr', 'local', 'lib', ''])
smartsplit("a ::b:: c :: d")
# ('::',
# ['a ', 'b', ' c ', ' d'])
smartsplit("{1, 2, 3, 4, 5}")
# (', ',
# ['{1', '2', '3', '4', '5}'])
strtools.learnrewrite(src, dst, minlen=3)
learns the respective regular expression and template to rewrite src to dst.
from extratools.strtools import learnrewrite
learnrewrite(
"Elisa likes Apple.",
"Apple is Elisa's favorite."
)
# ('(.*) likes (.*).',
# "{1} is {0}'s favorite.")
tabletools.parsebymarkdown(text)
parses a text of multiple lines to a table, according to Markdown format.
from extratools.tabletools import parsebymarkdown
list(parsebymarkdown("""
| foo | bar |
| --- | --- |
| baz | bim |
"""))
# [['foo', 'bar'],
# ['baz', 'bim']]
tabletools.hasheader(data)
returns the confidence (between 0 and 1) of whether the first row of the table data is a header.
from extratools.tabletools import hasheader
t = [
    ['Los Angeles', '34°03′', '118°15′'],
    ['New York City', '40°42′46″', '74°00′21″'],
    ['Paris', '48°51′24″', '2°21′03″']
]
hasheader(t)
# 0.0
hasheader([
['City', 'Latitude', 'Longitude']
] + t)
# 0.6666666666666666
hasheader([
['C1', 'C2', 'C3']
] + t)
# 1.0
How to install?
This package is available on PyPI. Just use pip3 install -U extratools to install it.
To enable all the features, please install extra dependencies by pip3 install -U sh RegexOrder TagStats.
How to cite?
When using this library for research purposes, please cite it as follows.
@misc{extratools,
author = {Chuancong Gao},
title = {{extratools}},
howpublished = "\url{https://github.com/chuanconggao/extratools}",
year = {2018}
}
Any recommended library?
There are several great libraries recommended to use together with extratools:
- regex
- sortedcontainers
- toolz
- sh