Data retrieval from remote archives
Overview
Advarchs is a simple tool for retrieving data from web archives. It is especially useful if you are working with remote data stored in compressed spreadsheets or in a similar format.
Getting Started
Say you need to perform some data analytics on an Excel spreadsheet that gets refreshed every month and is stored in RAR format. You can target that file and load it into pandas with the following procedure:
import os
import tempfile

import pandas as pd

from advarchs import webfilename, extract_web_archive

TEMP_DIR = tempfile.gettempdir()

url = "http://www.site.com/archive.rar"
# Work out the archive's file name from the URL and build a local path for it
arch_file_name = webfilename(url)
arch_path = os.path.join(TEMP_DIR, arch_file_name)

# Retrieve and extract the archive, keeping only files that match the 'xlsx' filter
xlsx_files = extract_web_archive(url, arch_path, ffilter=['xlsx'])
for xlsx_f in xlsx_files:
    xlsx = pd.ExcelFile(xlsx_f)
    ...
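To make the path-building step above more concrete, here is a minimal standard-library sketch of how a file name could be derived from a URL. The helper `filename_from_url` and its `default` parameter are hypothetical names used only for illustration; this is not advarchs's implementation of `webfilename`, which may work differently (for example by inspecting response headers).

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="archive"):
    """Return the last path segment of `url`, or `default` if there is none.

    Hypothetical helper for illustration only; not part of advarchs.
    """
    # Keep only the path component, dropping the query string and fragment
    path = urlparse(url).path
    # Decode percent-escapes such as %20, then take the final segment
    name = posixpath.basename(unquote(path))
    return name or default
```

With the URL from the example above, `filename_from_url("http://www.site.com/archive.rar")` yields `archive.rar`, which can then be joined with a temporary directory as shown earlier.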
Requirements
Python 3.5+
p7zip
Special note
On CentOS and Ubuntu <= 16.04, the following package is also needed:
unrar
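Before running the example, it can help to confirm that the external tools are reachable. The sketch below uses `shutil.which` to look for executables on `PATH`. The binary names `7z` and `unrar` are assumptions about what the p7zip and unrar packages install on your system, and `find_missing_tools` is a hypothetical helper, not part of advarchs.

```python
import shutil


def find_missing_tools(candidates):
    """Return the names in `candidates` that cannot be found on PATH."""
    return [name for name in candidates if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to match your distribution
    missing = find_missing_tools(["7z", "unrar"])
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```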
Installation
pip install advarchs
Contributing
See CONTRIBUTING
Code of Conduct
This project adheres to the Contributor Covenant 1.2. By participating, you are advised to adhere to this Code of Conduct in all your interactions with this project.
License
File details
Details for the file advarchs-0.1.7.tar.gz.
File metadata
- Download URL: advarchs-0.1.7.tar.gz
- Upload date:
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.16 CPython/3.7.3 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5d3e4e805663d82d275f441febc42a7955477aec0d702c1f70fe8e37cd32c4c6
MD5 | e93476e7dc019753ec19676041be17c3
BLAKE2b-256 | 0f866c0fa4e56faed31f1cc2b5232d6919cc374bf9b5d0109f3ea64f83e6de18
File details
Details for the file advarchs-0.1.7-py3-none-any.whl.
File metadata
- Download URL: advarchs-0.1.7-py3-none-any.whl
- Upload date:
- Size: 10.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.16 CPython/3.7.3 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | e804502cb8e94b95790df27ae6f59109ac9e2f7e1c8acbfeef840ad4204ef75b
MD5 | edd3cba43525cfc4e1d33c10513eec79
BLAKE2b-256 | 3b895c909cc8c079a90b09b8f0ff361a442e349930332b46cfbf5adf7b470c16
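If you download one of these files by hand, its SHA256 digest can be compared with the value listed above. The following is a minimal sketch using `hashlib`; the helper names `sha256_of_file` and `matches_sha256` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Return True if the file's digest equals `expected_hex` (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (the path is an assumed download location):
# matches_sha256(
#     "advarchs-0.1.7.tar.gz",
#     "5d3e4e805663d82d275f441febc42a7955477aec0d702c1f70fe8e37cd32c4c6",
# )
```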