A Python package that eases the process of retrieving, organizing and altering data.
Project description
Fluxify
A Python package that eases the process of retrieving, organizing and altering data.
Required packages
- pandas
- imperium
- ijson
Installation
pip install fluxify
Main classes
fluxify.mapper.Mapper
This class is used read and processing fast files with small amounts of data that can be loaded into memory.
fluxify.lazy_mapper.LazyMapper
You've probable guessed it, this class is used to iterate on large files of data wether it is of format CSV, JSON or XML.
Usage
Retrieve data from a simple CSV file
id,brand,price,state,published_at
938,Xaomi,390.90,used,2020-01-03 12:32:29
04593,iPhone,1299.90,new,2020-01-02 09:48:12
Mapper implementation
from fluxify.mapper import Mapper
import yaml
# Could also be loaded from a file
yamlmapping = """
brand:
col: 1
price:
col: 2
state:
col: 3
publish_date:
col: 4
transformations:
- { transformer: 'date', in_format: '%Y-%m-%d %H:%M:%S', out_format: '%H:%M %d/%m/%Y' }
is_new:
conditions:
-
condition: "subject['state'] == 'new'"
returnOnSuccess: True
returnOnFail: False
"""
Map = yaml.load(yamlmapping, Loader=yaml.FullLoader)
mapper = Mapper(_type='csv')
data = mapper.map('path/to/csvfile.csv', Map)
print(data)
Output
[
{
'brand': 'Xaomi',
'price': '390.90',
'state': 'used',
'published_date': '12:32 03/01/2020'
'is_new': False
},
{
'brand': 'iPhone',
'price': '1299.90',
'state': 'new',
'published_date': '09:48 02/01/2020'
'is_new': True
}
]
LazyMapper implementation
The LazyMapper does not return all the mapped data at the end,
instead it maps the data in small sizes that you can specify in order to not max out the memory.
from fluxify.lazy_mapper import LazyMapper
import yaml
# Could also be loaded from a file
yamlmapping = """
brand:
col: 1
price:
col: 2
state:
col: 3
publish_date:
col: 4
transformations:
- { transformer: 'date', in_format: '%Y-%m-%d %H:%M:%S', out_format: '%H:%M %d/%m/%Y' }
is_new:
conditions:
-
condition: "subject['state'] == 'new'"
returnOnSuccess: True
returnOnFail: False
"""
Map = yaml.load(yamlmapping, Loader=yaml.FullLoader)
mapper = LazyMapper(_type='csv', error_tolerance=True, bulksize=500)
mapper.map('path/to/csvfile.csv', Map)
def some_callback(results):
for item in results:
pass # Perform some action
mapper.set_callback(some_callback)
mapper.map('path/to/csvfile.csv', Map)
As you can see, in this example the mapper will call the callback function every time it accumulates 500 mapped items.
Mapping settings
col key is used to specify the column number or attribute name from where the value must be retrieved.
If you want to specify the input data as the retrieved value use _all_ as the value of col
transformations key is used to apply transformations to the retrieved value. Available transformers are listed below.
conditions key is used to apply conditions and alter the retrieved value.
These conditions are in Python syntax, but you may not use all of Python's native functions.
Available functions are listed below.
default is used to define a default value for when a retrieved value is null.
Warning: If the default key is defined with a value, it will be applied before applying transformations
and conditions.
Special cases for JSON and XML
XML
Set the multiple to true if you want to retrieve data from multiple XML tags with the same name.
Use the index key with multiple: true if you wish to retrieve only one value from a number of XML tags.
When retrieving a XML value, the default behaviour is to retrieve the .text value of the tag.
If you wish to change this, to retrieve a tag containing many other tags, use raw key and set it to false.
This will return you an object of type xml.etree.Element, you could later apply transformations on this object to alter,
organize and retrieve the data.
JSON
Use index key to retrieve a specific value from an array.
Of course, it only works if the retrieved value is of type array.
Supported formats
| Format | CSV | JSON | XML | TXT |
|---|---|---|---|---|
| Supported | YES | YES | YES | NO |
Transformers
Fluxify has built-in transformers that can alter/modify the data.
| Function | Arguments | Description |
|---|---|---|
| number | stringvalue | Parses a string to an integer or float value |
| split | delimiter, index | Splits a string into parts with a delimiter and returns the splitted result if the index argument is not defined. |
| date | in_format, out_format | Let's you format a date string to the desired format. |
| replace | search, new | Replaces the search value with new value from string |
| boolean | No arguments | Parses a string to Boolean if the string contains [true |
| equipments_from_string | delimiter | Custom usage |
| options_from_string | delimiter | Custom usage |
Exceptions
Fluxify has different Exception classes for different reasons
They reside in the exceptions sub-package fluxify.exceptions
| Class | Arguments | Description |
|---|---|---|
| ArgumentNotFoundException | message | This exception is raised whenever a argument is not found. |
| InvalidArgumentException | message | This exception is raised when a passed parameter/argument is invalid. |
| ConditionNotFoundException | message | This exception is raised when the "condition" key is not defined in the mapping. |
| UnsupportedTransformerException | message | This exception is raised when a transformer other than the ones defined above, is used. |
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file fluxify-0.0.14-py3-none-any.whl.
File metadata
- Download URL: fluxify-0.0.14-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4416b9aa9f9454fa4ebee699f1dab0381b4c118b0729f3f3bf65554232ac4663
|
|
| MD5 |
7e4dff2bd8921842f7ce7b491cc8c25a
|
|
| BLAKE2b-256 |
8d3a898b3508b23133892061cb0dfce085b6304a21025b3e12808778f851181a
|