
Download small ad listings from OLX marketplaces


Download data from OLX listings

This is a Python script that can download listings from the small ads platform OLX in the following countries:

  • Argentina
  • Bulgaria
  • Bosnia
  • Brazil
  • Colombia
  • Costa Rica
  • Ecuador
  • Egypt
  • Guatemala
  • India
  • Indonesia
  • Kazakhstan
  • Lebanon
  • Oman
  • Pakistan
  • Panama
  • Peru
  • Poland
  • Portugal
  • Romania
  • San Salvador
  • South Africa
  • Ukraine
  • Uzbekistan

If you use olxsearch for scientific research, please cite it in your publication:
Fink, C. (2020): olxsearch: Python script to download OLX small ads data. doi:10.5281/zenodo.3906038.

Dependencies

The script is written in Python 3 and depends on the Python modules BeautifulSoup, dateparser, geocoder, pandas and Requests.

To install the Python 3 development headers, pip and virtualenv on a Debian-based system, run:

apt-get update -y &&
apt-get install -y python3-dev python3-pip python3-virtualenv

(On Arch Linux, an AUR package pulls in all dependencies; see further down.)
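Before installing, you can check which of the Python dependencies named above are already importable. The sketch below is a hypothetical helper, not part of olxsearch; it assumes the usual import names, `bs4` for BeautifulSoup and `requests` for Requests.

```python
import importlib.util

# Import names of the dependencies listed above (assumed: BeautifulSoup 4
# is imported as "bs4", Requests as "requests").
DEPENDENCIES = ["bs4", "dateparser", "geocoder", "pandas", "requests"]


def missing_dependencies(modules):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies(DEPENDENCIES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

`find_spec()` only locates a module without importing it, so the check is quick and has no side effects.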

Installation

  • using pip or similar:
pip3 install olxsearch
  • OR: manually:

    • Clone this repository:
    git clone https://gitlab.com/christoph.fink/olxsearch.git
    
    • Change to the cloned directory:
    cd olxsearch
    
    • Use the Python setuptools to install the package:
    python3 ./setup.py install
    
  • OR: (Arch Linux only) from AUR:

# e.g. using yay
yay -S python-olxsearch

Usage

Import the olxsearch module, then instantiate an olxsearch.OlxSearch object, supplying a country and a search_term as arguments. The object’s listings property is a generator that yields each ad on the platform matching the search term, as a dictionary. Its listings_as_dataframe property is a pandas.DataFrame containing all of these ads.
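Because listings is a generator, you can stop after a few ads instead of collecting every match. A minimal sketch using `itertools.islice`; the helper name `first_listings` is hypothetical, and whether olxsearch fetches further result pages lazily is an assumption the description does not confirm.

```python
from itertools import islice


def first_listings(listings, n):
    """Return at most the first `n` items of an iterable of listings.

    Works with any iterable, e.g. the `listings` generator of an
    olxsearch.OlxSearch object; items after the n-th are never requested.
    """
    return list(islice(listings, n))


# Usage (requires network access):
# import olxsearch
# search = olxsearch.OlxSearch("Argentina", "Yerba mate")
# for listing in first_listings(search.listings, 5):
#     print(listing["id"], listing["title"])
```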

import olxsearch

olx_search_argentina = olxsearch.OlxSearch("Argentina", "Yerba mate")
print(next(olx_search_argentina.listings))
# {'id': '1102114778', 'title': 'YERBA MATE SECADERO X 500 GRS.', 'description': 'YERBA MATE SECADERO \nPAQUETE X 500 GRS. $70\nPACK X 10 UNIDADES VENTA MÍNIMA\nCALIDAD DE EXPORTACIÓN \nEXCELENTE RELACIÓN PRECIO * CALIDAD \nAPROVECHE ANTES QUE SE TERMINEN\nCOMUNÍQUESE A NUESTRO WHATSAPP', 'created_at': '2020-02-18T16:57:38-03:00', 'created_at_first': '2020-02-18T16:57:02-03:00', 'republish_date': None, 'images': ['https://apollo-virginia.akamaized.net:443/v1/files/ns52s6zc369y2-AR/image'], 'price': (70.0, 'ARS'), 'lat': -34.626, 'lon': -58.4}


# pandas DataFrame
olx_search_southafrica = olxsearch.OlxSearch("South Africa", "Biltong")
listings = olx_search_southafrica.listings_as_dataframe
print(listings)
#             id                                              title  ...        lat        lon
# 0   1061464181                                     Biltong slicer  ... -25.703179  28.178248
# 1   1061707900         Claasen Biltong Slicer excellent condition  ... -28.549999  25.233299
# 2   1061884723                                      Biltong maker  ... -26.701476  27.092649
# ...
# 38  1061429395                                      Biltong snyer  ... -29.082081  26.148292
# 39  1059714562  Biltongkas / biltong box / biltong dryers / me...  ... -25.712152  28.002048
# 
# [40 rows x 10 columns]
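Each listing is a plain dictionary. Going by the example output above, price appears to be an (amount, currency) tuple and the timestamps are ISO 8601 strings with a UTC offset. The hypothetical helper below flattens such a dictionary into scalar values, which is convenient before writing to CSV; it also assumes price may be None for ads without a price, which the description does not confirm.

```python
from datetime import datetime

# Timestamp fields as they appear in the example listing above
TIMESTAMP_KEYS = ("created_at", "created_at_first")


def flatten_listing(listing):
    """Return a copy of a listing dictionary with scalar-only values.

    Assumes the layout of the example above: `price` is an
    (amount, currency) tuple or None, timestamps are ISO 8601 strings,
    and `images` is a list of URLs.
    """
    flat = dict(listing)  # shallow copy; the input is left unchanged
    price = flat.pop("price", None)
    flat["price_amount"], flat["price_currency"] = price if price else (None, None)
    for key in TIMESTAMP_KEYS:
        if flat.get(key):
            # fromisoformat() accepts UTC offsets such as "-03:00" (Python 3.7+)
            flat[key] = datetime.fromisoformat(flat[key])
    # Join image URLs into one string so each record fits a single CSV cell
    flat["images"] = " ".join(flat.get("images") or [])
    return flat


if __name__ == "__main__":
    example = {"id": "1102114778", "price": (70.0, "ARS"),
               "created_at": "2020-02-18T16:57:38-03:00", "images": []}
    print(flatten_listing(example))
```

A list of flattened listings can then be passed to pandas.DataFrame or written with csv.DictWriter.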

Data privacy

By default, olxsearch pseudonymises downloaded metadata, i.e. it replaces direct identifiers with randomised identifiers (generated using one-way hash functions). This is one step of a responsible data-processing workflow. However, other (meta)data may still act as indirect identifiers: on their own or in combination, they might allow the seller to be re-identified. If you want to use data downloaded with olxsearch in a GDPR-compliant fashion, you have to follow up the data collection stage with data minimisation and further pseudonymisation or anonymisation efforts.
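The description does not say exactly how olxsearch derives its pseudonyms. As an illustration of hash-based pseudonymisation in general, the sketch below uses a keyed hash (HMAC-SHA256) from Python's standard library; the function name and key handling are hypothetical, not olxsearch's implementation. A secret key matters because ad and user IDs come from a relatively small, guessable space: a plain, unkeyed hash of an ID can be reversed by hashing candidate IDs until one matches.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier, key):
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    can still be linked; without the key, the mapping cannot be recomputed.
    """
    return hmac.new(key, str(identifier).encode("utf-8"), hashlib.sha256).hexdigest()


# Generate the key once per project, store it separately from the data,
# and delete it once re-linking new downloads is no longer needed.
KEY = secrets.token_bytes(32)

if __name__ == "__main__":
    print(pseudonymise("1102114778", KEY))
```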

Olxsearch can also keep the original identifiers (i.e. skip pseudonymisation). To do so, instantiate OlxSearch with the parameter pseudonymise_identifiers=False. Ensure that you fulfil all legal and organisational requirements for handling personal information before you decide to collect non-pseudonymised data.

import olxsearch

downloader = olxsearch.OlxSearch(
    "Ecuador",
    "bolones verdes",
    pseudonymise_identifiers=False  # get legal/ethics advice before doing this
)
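The data minimisation step recommended above could, for example, drop fields that an analysis does not need and coarsen coordinates. The helper below is a hypothetical sketch, not part of olxsearch: the field names follow the example listing, and the default fields and precision (two decimal places, roughly 1.1 km in latitude) are assumptions to adapt to your own research question.

```python
# Fields kept by default; an assumption, adjust to your research question
DEFAULT_FIELDS = ("id", "title", "price", "created_at", "lat", "lon")


def minimise_listing(listing, keep=DEFAULT_FIELDS, coordinate_decimals=2):
    """Return a reduced copy of a listing for analysis.

    Keeps only the fields named in `keep` and rounds `lat`/`lon` to
    `coordinate_decimals` decimal places to lower spatial precision.
    """
    reduced = {key: listing[key] for key in keep if key in listing}
    for key in ("lat", "lon"):
        if reduced.get(key) is not None:
            reduced[key] = round(reduced[key], coordinate_decimals)
    return reduced
```

Free-text fields such as description can contain phone numbers or names, which is why the default drops them.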



Download files


Source Distribution

olxsearch-1.0.6.tar.gz (16.1 kB)

Built Distribution

olxsearch-1.0.6-py3-none-any.whl (53.0 kB)

File details

Details for the file olxsearch-1.0.6.tar.gz.

File metadata

  • Download URL: olxsearch-1.0.6.tar.gz
  • Upload date:
  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3

File hashes

Hashes for olxsearch-1.0.6.tar.gz
Algorithm     Hash digest
SHA256        76499c10a8eabb1475c34ace0839e42007d853816d88e13e5dce8586adeafaab
MD5           792b50f25a9a91df52580423fce5b5ea
BLAKE2b-256   1de8981b3e84b3103928d30eb40f6bd2cfd0d618c0606cce44155808ef4876cb


File details

Details for the file olxsearch-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: olxsearch-1.0.6-py3-none-any.whl
  • Upload date:
  • Size: 53.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.3

File hashes

Hashes for olxsearch-1.0.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        f4762640bf984b01b0e5a7f60f6d938a2301f58dc9de0897014e2d0101e96607
MD5           46b48a13885b757468628065a5b520d9
BLAKE2b-256   b8208ad9aba217f803960f5cd74c9d6d5832e5bb1a2f5bc448ad8d24a8398a6c
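To check that a downloaded distribution file matches the SHA256 digest listed for it, you can hash the file with Python's standard library. A minimal sketch; the function names are illustrative, and the file is read in chunks so it never has to fit in memory at once.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex` (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, with the published digest of the source distribution:
# matches_digest("olxsearch-1.0.6.tar.gz",
#                "76499c10a8eabb1475c34ace0839e42007d853816d88e13e5dce8586adeafaab")
```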

