
Data retrieval from remote archives

Project description


Overview

Advarchs is a simple tool for retrieving data from web archives. It is especially useful if you are working with remote data stored in compressed spreadsheets or similar formats.

Getting Started

Say you need to perform some data analytics on an Excel spreadsheet that gets refreshed every month and is stored in RAR format. You can target that file and load it into pandas with the following procedure:

import os
import tempfile

import pandas as pd
from advarchs import webfilename, extract_web_archive

TEMP_DIR = tempfile.gettempdir()

url = "http://www.site.com/archive.rar"
# Work out a local file name for the remote archive
arch_file_name = webfilename(url)
arch_path = os.path.join(TEMP_DIR, arch_file_name)
# Retrieve the archive and keep only the extracted .xlsx files
xlsx_files = extract_web_archive(url, arch_path, ffilter=['xlsx'])
for xlsx_f in xlsx_files:
    xlsx = pd.ExcelFile(xlsx_f)

...
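
To make the first step of the procedure above more concrete, here is a minimal standard-library sketch of how a local file name could be derived from an archive URL. The helper `filename_from_url` and its `default` parameter are hypothetical and are not part of advarchs; the library's own `webfilename` may work differently (for example, by inspecting the server response), so treat this only as an illustration of the idea.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="archive"):
    """Hypothetical helper (not part of advarchs): use the last path
    segment of a URL as a local file name."""
    # urlsplit separates the path from the scheme, host, query and fragment
    path = urlsplit(url).path
    # Decode percent-escapes, then take the final path component
    name = posixpath.basename(unquote(path))
    # Fall back to a default when the URL ends with a slash or has no path
    return name or default


print(filename_from_url("http://www.site.com/archive.rar"))  # archive.rar
```

The resulting name can then be joined with a temporary directory, as done with `os.path.join(TEMP_DIR, arch_file_name)` in the example above.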

Requirements

  • Python 3.5+

  • p7zip

Special note

On CentOS and Ubuntu <= 16.04, the following packages are needed:

  • unrar
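
Since the external tools listed above are not installed by pip, it can help to check for them before running an extraction. The following is a rough sketch using only the standard library. The executable names (`7z`, `7za`, `7zr` for p7zip, and `unrar`) are assumptions based on what these packages commonly install; whether advarchs looks for exactly these names is not confirmed here, and `find_tool` and `missing_tools` are hypothetical helpers.

```python
import shutil

# Assumed executable names for each requirement (not taken from advarchs)
REQUIRED = {
    "p7zip": ("7z", "7za", "7zr"),
    "unrar": ("unrar",),
}


def find_tool(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def missing_tools(required):
    """Return the labels of requirements with no executable on PATH."""
    return [label for label, names in required.items()
            if find_tool(names) is None]


missing = missing_tools(REQUIRED)
if missing:
    print("Missing external tools:", ", ".join(missing))
else:
    print("All external tools found.")
```

On systems where `unrar` is not needed, the corresponding entry can simply be dropped from `REQUIRED`.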

Installation

pip install advarchs

Contributing

See CONTRIBUTING

Code of Conduct

This project adheres to the Contributor Covenant 1.2. By participating, you are advised to adhere to this Code of Conduct in all your interactions with this project.

License

Apache-2.0
