The Google Refine Python Client Library provides an interface for communicating with a Google Refine server.
Currently, the following API is supported:
- project creation/import, deletion, export
- facet computation
  - text
  - text filter
  - numeric
  - blank
  - starred & flagged
  - … extensible class
- ‘engine’: managing multiple facets and their computation results
- sorting & reordering
- clustering
- transforms
- transposes
- single and mass edits
- annotation (star/flag)
- column
  - move
  - add
  - split
  - rename
  - reorder
  - remove
- reconciliation
  - reconciliation judgment facet
  - guessing column type
  - querying reconciliation services preferences
  - perform reconciliation
Configuration
By default, the Google Refine server URL is http://127.0.0.1:3333. The environment variables GOOGLE_REFINE_HOST and GOOGLE_REFINE_PORT override the host and port.
In order to run all tests, a live Refine server is needed. No existing projects are affected.
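As a rough illustration of the override behaviour described above, the following sketch builds a server URL from the two environment variables and falls back to the documented defaults. The function name `refine_server_url` and its `environ` parameter are hypothetical and are not taken from the library; the code uses only the Python 3 standard library.

```python
import os

# Defaults taken from the Configuration section above.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = "3333"


def refine_server_url(environ=None):
    """Build the server URL, letting environment variables override defaults.

    `environ` is a hypothetical hook for passing a plain dict (handy in
    tests); when omitted, the real process environment is used.
    """
    env = os.environ if environ is None else environ
    host = env.get("GOOGLE_REFINE_HOST", DEFAULT_HOST)
    port = env.get("GOOGLE_REFINE_PORT", DEFAULT_PORT)
    return "http://{0}:{1}".format(host, port)


if __name__ == "__main__":
    # Prints the default URL unless the variables are set in the shell.
    print(refine_server_url())
```

Setting, for example, GOOGLE_REFINE_PORT=8080 in the shell before running the script would change only the port part of the printed URL.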
Installation
(Someone with more familiarity with Python’s byzantine collection of installation frameworks is very welcome to improve/“best practice” all this.)
Install the dependencies (currently just urllib2_file):
sudo pip install -r requirements.txt
Ensure you have a Refine server running somewhere and, if necessary, set the environment variables as described above.
Run tests, build, and install:
python setup.py test # to do a subset, e.g., --test-suite tests.test_facet
python setup.py build
python setup.py install
There is a Makefile that will do this too, and more.
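Because the tests need a live server, it can help to confirm that something is listening on the configured host and port before running them. The sketch below is one possible check, not part of the library: the function name `is_server_reachable` is hypothetical, and the check only confirms that a TCP connection can be opened, not that the listener is actually Refine. It is written for Python 3 and uses only the standard library.

```python
import socket


def is_server_reachable(host="127.0.0.1", port=3333, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that some process accepts connections on the port;
    it does not verify that the process is a Refine server.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_server_reachable():
        print("Something is listening on 127.0.0.1:3333; the tests can be run.")
    else:
        print("Nothing is listening on 127.0.0.1:3333; start Refine first.")
```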
TODO
The API so far has been filled out by building a test suite that carries out the actions in David Huynh’s Refine tutorial. While the tutorial shows off a wide range of Refine features, it doesn’t cover the entire suite. Notable gaps currently include:
- reconciliation support is useful but not complete
- undo/redo
- Freebase
- join columns
- columns from URL
Contribute
Patches welcome! Source is at https://github.com/PaulMakepeace/refine-client-py
Useful Tools
One aspect of development is watching HTTP transactions. To that end, I found Fiddler on Windows and HTTPScoop invaluable. HTTPScoop neither URL-decodes nor nicely formats JSON, but the Online JavaScript Beautifier does.
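The same URL-decoding and JSON formatting can also be done locally. The sketch below is one way to do it with the Python 3 standard library; the function name `decode_and_format` is hypothetical, and the sample payload is made up for illustration rather than captured from a real Refine session.

```python
import json
from urllib.parse import unquote_plus


def decode_and_format(encoded):
    """URL-decode a captured parameter value and pretty-print it as JSON."""
    decoded = unquote_plus(encoded)   # turns %7B into {, + into space, etc.
    parsed = json.loads(decoded)      # raises ValueError if it is not JSON
    return json.dumps(parsed, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative payload only; paste a value copied from your HTTP tool here.
    sample = "%7B%22facets%22%3A%5B%5D%2C%22mode%22%3A%22row-based%22%7D"
    print(decode_and_format(sample))
```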
Credits
Paul Makepeace, author, <paulm@paulm.com>
David Huynh, initial cut
Artfinder, inspiration
Some data used in the test suite comes from publicly available sources:
louisiana-elected-officials.csv: from http://www.sos.louisiana.gov/tabid/136/Default.aspx
us_economic_assistance.csv: “The Green Book”
eli-lilly.csv: ProPublica’s “Docs for Dollars” leading to a Lilly Faculty PDF processed by David Huynh’s ScraperWiki script
Download files
Source Distribution
File details
Details for the file refine-client-0.2.1.tar.gz.
File metadata
- Download URL: refine-client-0.2.1.tar.gz
- Upload date:
- Size: 550.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 05f82b559b060cc00634423fb5767153aa803bdec3f3e4f769f0442b305e6280
MD5 | 901820ddec5afb06959029bfd680b97a
BLAKE2b-256 | 1eccb5df9928f76fdf13371b4a8ba89c6d44d9ea9c9af96add379971bed8a912