Download small ad listings from OLX marketplaces
Project description
Download data from OLX listings
This is a Python script that can download listings from the small ads platform OLX in the following countries:
- Argentina
- Bulgaria
- Bosnia
- Brazil
- Colombia
- Costa Rica
- Ecuador
- Egypt
- Guatemala
- India
- Indonesia
- Kazakhstan
- Lebanon
- Oman
- Pakistan
- Panama
- Peru
- Poland
- Portugal
- Romania
- San Salvador
- South Africa
- Ukraine
- Uzbekistan
If you use olxsearch for scientific research, please cite it in your publication:
Fink, C. (2020): olxsearch: Python script to download OLX small ads data. doi:10.5281/zenodo.3906038.
Dependencies
The script is written in Python 3 and depends on the Python modules BeautifulSoup, dateparser, geocoder, pandas and Requests.
To install the prerequisites (Python 3 development headers, pip and virtualenv) on a Debian-based system, run:
apt-get update -y &&
apt-get install -y python3-dev python3-pip python3-virtualenv
(There is also an Arch Linux AUR package that pulls in all dependencies; see below.)
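To confirm that the dependencies are importable after installation, a short standard-library check can help. This is a sketch, not something olxsearch ships; it assumes the usual import names, `bs4` for BeautifulSoup and `requests` for Requests, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names (not PyPI package names) of the dependencies listed above.
# BeautifulSoup is imported as "bs4"; the other import names match the package names.
REQUIRED_MODULES = ["bs4", "dateparser", "geocoder", "pandas", "requests"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names of top-level modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```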
Installation
- using pip or similar:
  pip3 install olxsearch
- OR: manually:
  - Clone this repository:
    git clone https://gitlab.com/christoph.fink/olxsearch.git
  - Change to the cloned directory
  - Use the Python setuptools to install the package:
    cd olxsearch
    python3 ./setup.py install
- OR: (Arch Linux only) from AUR:
  # e.g. using yay
  yay python-olxsearch
Usage
Import the olxsearch module. Then instantiate an olxsearch.OlxSearch object, supplying a country and a search_term as arguments. The object’s listings property is a generator providing access to each ad listed on the platform that matches the supplied search term. Its listings_as_dataframe property is a pandas.DataFrame containing all these ads.
import olxsearch
olx_search_argentina = olxsearch.OlxSearch("Argentina", "Yerba mate")
print(next(olx_search_argentina.listings))
# {'id': '1102114778', 'title': 'YERBA MATE SECADERO X 500 GRS.', 'description': 'YERBA MATE SECADERO \nPAQUETE X 500 GRS. $70\nPACK X 10 UNIDADES VENTA MÍNIMA\nCALIDAD DE EXPORTACIÓN \nEXCELENTE RELACIÓN PRECIO * CALIDAD \nAPROVECHE ANTES QUE SE TERMINEN\nCOMUNÍQUESE A NUESTRO WHATSAPP', 'created_at': '2020-02-18T16:57:38-03:00', 'created_at_first': '2020-02-18T16:57:02-03:00', 'republish_date': None, 'images': ['https://apollo-virginia.akamaized.net:443/v1/files/ns52s6zc369y2-AR/image'], 'price': (70.0, 'ARS'), 'lat': -34.626, 'lon': -58.4}
# pandas DataFrame
olx_search_southafrica = olxsearch.OlxSearch("South Africa", "Biltong")
listings = olx_search_southafrica.listings_as_dataframe
print(listings)
# id title ... lat lon
# 0 1061464181 Biltong slicer ... -25.703179 28.178248
# 1 1061707900 Claasen Biltong Slicer excellent condition ... -28.549999 25.233299
# 2 1061884723 Biltong maker ... -26.701476 27.092649
# ...
# 38 1061429395 Biltong snyer ... -29.082081 26.148292
# 39 1059714562 Biltongkas / biltong box / biltong dryers / me... ... -25.712152 28.002048
#
# [40 rows x 10 columns]
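Because listings is a generator, a caller can stop after the first few ads instead of collecting everything. The sketch below assumes the generator yields dicts shaped like the Argentinian example above, notably with price as an (amount, currency) tuple. `fake_listings`, `flatten` and `first_n_as_csv` are hypothetical helpers, and `fake_listings` only stands in for `OlxSearch(...).listings` so the example runs offline. Whether stopping early also saves network requests depends on how olxsearch pages through results, which is not documented here.

```python
import csv
import io
import itertools


def fake_listings():
    """Hypothetical offline stand-in for OlxSearch(...).listings."""
    for i in range(1000):
        yield {
            "id": str(i),
            "title": f"Listing {i}",
            "price": (70.0 + i, "ARS"),
            "lat": -34.626,
            "lon": -58.4,
        }


def flatten(listing):
    """Split the (amount, currency) price tuple into two scalar columns."""
    flat = dict(listing)
    price = flat.pop("price", None)
    # Assumption: some ads may have no price; keep both columns as None then.
    flat["price_amount"], flat["price_currency"] = price if price else (None, None)
    return flat


def first_n_as_csv(listings, n):
    """Consume at most n listings from an iterator and return them as CSV text."""
    rows = [flatten(listing) for listing in itertools.islice(listings, n)]
    if not rows:
        return ""
    buffer = io.StringIO()
    # Columns are taken from the first row; unexpected extra keys are ignored.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(first_n_as_csv(fake_listings(), 3))
```

In real use, pass olx_search_argentina.listings instead of fake_listings(); only the first n items are pulled from the generator.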
Data privacy
By default, olxsearch pseudonymises downloaded metadata, i.e. it replaces direct identifiers with randomised identifiers generated using one-way hash functions. This is one step of a responsible data processing workflow. However, other (meta-)data might still qualify as indirect identifiers: on their own or in combination, they might allow re-identification of the seller. If you want to use data downloaded with olxsearch in a GDPR-compliant fashion, you must follow the data collection stage with data minimisation and further pseudonymisation or anonymisation efforts.
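olxsearch’s exact hashing scheme is not described here, so the following is only an illustration of hash-based pseudonymisation in general. It uses a keyed hash (HMAC-SHA256): the same identifier always maps to the same pseudonym within one dataset, but someone who knows candidate seller IDs cannot recompute the pseudonyms without the secret key. `SECRET_KEY` and `pseudonymise` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical project-specific secret. Store it separately from the data;
# with a plain, unkeyed hash anyone could hash known IDs and match them.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(identifier, key=SECRET_KEY):
    """Map an identifier to a stable 64-character hex pseudonym using HMAC-SHA256."""
    digest = hmac.new(key, str(identifier).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Because the key is generated at random here, pseudonyms differ between runs; persist the key if you need stable pseudonyms across several downloads.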
olxsearch can also keep the original identifiers (i.e. skip pseudonymisation). To do so, instantiate an OlxSearch with the parameter pseudonymise_identifiers=False. Ensure that you fulfil all legal and organisational requirements for handling personal information before you decide to collect non-pseudonymised data.
import olxsearch
downloader = olxsearch.OlxSearch(
    "Ecuador",
    "bolones verdes",
    pseudonymise_identifiers=False  # get legal/ethics advice before doing this
)
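As a sketch of the data minimisation step mentioned above, the following drops free-text and image fields and rounds coordinates to two decimal places (roughly 1 km in latitude). Which fields count as indirect identifiers, and how coarse the coordinates must be, are assumptions to revisit for your own study; `minimise` and `DROP_FIELDS` are hypothetical names, not part of olxsearch.

```python
# Assumption: free-text descriptions and images can reveal phone numbers,
# names or faces, so they are treated as indirect identifiers and dropped.
DROP_FIELDS = ("description", "images")


def minimise(listing, decimals=2):
    """Return a copy of a listing with risky fields removed and coordinates coarsened."""
    reduced = {k: v for k, v in listing.items() if k not in DROP_FIELDS}
    for key in ("lat", "lon"):
        if reduced.get(key) is not None:
            # Two decimal places correspond to roughly 1.1 km of latitude.
            reduced[key] = round(reduced[key], decimals)
    return reduced
```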
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file olxsearch-1.0.6.tar.gz.
File metadata
- Download URL: olxsearch-1.0.6.tar.gz
- Upload date:
- Size: 16.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 76499c10a8eabb1475c34ace0839e42007d853816d88e13e5dce8586adeafaab
MD5 | 792b50f25a9a91df52580423fce5b5ea
BLAKE2b-256 | 1de8981b3e84b3103928d30eb40f6bd2cfd0d618c0606cce44155808ef4876cb
File details
Details for the file olxsearch-1.0.6-py3-none-any.whl.
File metadata
- Download URL: olxsearch-1.0.6-py3-none-any.whl
- Upload date:
- Size: 53.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | f4762640bf984b01b0e5a7f60f6d938a2301f58dc9de0897014e2d0101e96607
MD5 | 46b48a13885b757468628065a5b520d9
BLAKE2b-256 | b8208ad9aba217f803960f5cd74c9d6d5832e5bb1a2f5bc448ad8d24a8398a6c
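To check a downloaded archive against the published SHA256 digest, a short standard-library sketch follows. The expected value is the SHA256 digest listed for olxsearch-1.0.6.tar.gz; `sha256_of_file` and `verify` are hypothetical helper names.

```python
import hashlib

# SHA256 digest published for olxsearch-1.0.6.tar.gz.
EXPECTED_SHA256 = "76499c10a8eabb1475c34ace0839e42007d853816d88e13e5dce8586adeafaab"


def sha256_of_file(path, chunk_size=65536):
    """Compute a file's SHA256 hex digest, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected one (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

Usage: verify("olxsearch-1.0.6.tar.gz") returns True for an intact download. pip can also enforce digests itself through its hash-checking mode (--require-hashes).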