Airtable Download CSV helper
Airscraper
A simple scraper that downloads a CSV from any Airtable shared view programmatically, without clicking through the Airtable UI. Use it if:
- You want to download a shared view periodically
- You don't mind that the shared view can be accessed without authorization
Requirements
Because it's a simple scraper, it needs very little beyond BeautifulSoup:
- BeautifulSoup4
- Pandas
Installation
Using pip (Recommended)
pip install airscraper
Build From Source
- Install build dependencies:
pip install --upgrade pip setuptools wheel
pip install tqdm
pip install --user --upgrade twine
- Build the Package
python setup.py bdist_wheel
- Install the built Package
pip install --upgrade dist/airscraper-0.1-py3-none-any.whl
- Run it directly, without prefixing the command with python
airscraper [url]
Direct Execution (Testing Purpose)
- Clone this project
- Install the requirements
pip install -r requirements.txt
- Run the code
python airscraper/airscraper.py [url]
Usage
Create a shared view link in Airtable and use that link to download the shared view as CSV. Every [url]
mentioned in the examples refers to the shared view link you get from this step.
As CLI
# Print Result to Terminal
python airscraper/airscraper.py [url]
# Pipe the result to csv file
python airscraper/airscraper.py [url] > [filename].csv
As Python Package
from airscraper import AirScraper
# replace [url] with your shared view link
client = AirScraper('[url]')
data = client.get_table().text
# print the result
print(data)
# save as file
with open('data.csv', 'w') as f:
    f.write(data)
# use it with pandas
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(data), sep=',')
df.head()
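For the periodic-download use case, the package calls above can be wrapped in a small helper that writes each download to a timestamped file. This is only a sketch: `fetch_csv`, `save_snapshot`, the `fetch` and `now` parameters, and the `view-<timestamp>.csv` naming scheme are hypothetical names chosen for illustration, not part of airscraper. Only `AirScraper(...).get_table().text` comes from the usage shown above.

```python
from datetime import datetime, timezone
from pathlib import Path


def fetch_csv(url):
    """Download the shared view as CSV text, using the calls shown above."""
    # Imported lazily so save_snapshot can be tried out with a fake fetcher.
    from airscraper import AirScraper
    return AirScraper(url).get_table().text


def save_snapshot(url, out_dir, fetch=fetch_csv, now=None):
    """Write one CSV snapshot to out_dir and return the path of the new file.

    fetch and now are hypothetical hooks added for this sketch so the helper
    can be exercised without network access.
    """
    now = now or datetime.now(timezone.utc)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # File name pattern is an assumption, e.g. view-20240102T030405Z.csv
    path = out_dir / f"view-{now:%Y%m%dT%H%M%SZ}.csv"
    path.write_text(fetch(url), encoding="utf-8")
    return path
```

Calling `save_snapshot('[url]', 'snapshots')` from a cron job or any other scheduler would then leave one file per run in the `snapshots` directory.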
Help
usage: airscraper [-h] [-l LOCALE] [-tz TIMEZONE] view_url

Download CSV from Airtable Shared View Link, You can pass the result to file using
'> name.csv'

positional arguments:
  view_url              url generated from sharing view using link in airtable

optional arguments:
  -h, --help            show this help message and exit
  -l LOCALE, --locale LOCALE
                        Your locale, default to 'en'
  -tz TIMEZONE, --timezone TIMEZONE
                        Your timezone, use URL encoded string, default to
                        'Asia/Jakarta'
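Since the timezone option expects a URL-encoded string, it may be convenient to build the command line from Python. The sketch below uses only the flags listed in the help text; `build_command` is a hypothetical helper, and encoding the `/` in the timezone name (giving `Asia%2FJakarta`) is an assumption about what "URL encoded" means here, so check it against the tool's actual behavior.

```python
from urllib.parse import quote


def build_command(view_url, locale=None, timezone=None):
    """Return the airscraper command as an argument list (hypothetical helper)."""
    cmd = ["airscraper"]
    if locale is not None:
        cmd += ["--locale", locale]
    if timezone is not None:
        # safe="" makes quote() percent-encode "/" as well (assumed to be wanted)
        cmd += ["--timezone", quote(timezone, safe="")]
    cmd.append(view_url)
    return cmd


# Possible use with the standard library, once airscraper is installed:
#   import subprocess
#   result = subprocess.run(build_command(url, timezone="Asia/Jakarta"),
#                           capture_output=True, text=True, check=True)
#   csv_text = result.stdout
```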
What's next
Currently I have several things in mind:
- ✅ Make this an installable package
- Make it usable on FaaS platforms (most use cases I can think of are related to this)
- ✅ Create a proper package that can be imported (so I can use it in my ETL script)
- ✅ Fill in LICENSE and setup.py (to be honest, I had no idea what to put into them)
- It turns out there are a lot of resources out there if you know what to look for :)
Contributing
If you have a similar problem or any ideas for improving this package, please let me know in the issues or just hit me up on Twitter @BanditelolRP
Development
If you're going to develop it yourself, here's my overall workflow
1. Create a virtual environment
I usually use venv on Python 3.8 to create a new virtual environment
python -m venv venv
# and activate the environment
source venv/bin/activate
2. Install the requirements
Install the necessary requirements, then install the package in editable mode for development
pip install wheel pytest -q
pip install -r requirements.txt
pip install -e .
3. Play around with the code
You can browse the notebook for an explanation of how it works and some example use cases. I would really appreciate help with documentation and testing. Have fun!
Download files
Hashes for airscraper-0.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 6f839ec17073939f5ce782bdad2e1ee2096b39ef936fbafb2c69315b9cbf895c
MD5 | 0d02f8c96ab9075941d7439ce9bd60b4
BLAKE2b-256 | 18d6e9e7b3fa43088bb4fac813ecf6404f100473c7527c239df3c50f32fbf923