
Scrape housing listings on Bien'Ici from any Bien'Ici search URL 💛


bieniciscraper

bieniciscraper is a Python package that allows you to scrape all real estate listings from any Bien'Ici URL 💛

Table of Contents

  1. Features
  2. Installation
  3. Usage
  4. Command-line Arguments
  5. Important Notes
  6. Disclaimer
  7. License

Features

  • URL Support: Accepts any search URL.
  • Comprehensive Data:
    • Retrieves all listings.
    • Extracts 13 attributes per listing:
      • city
      • postal_code
      • ad_type
      • property_type
      • reference
      • title
      • publication_date
      • modification_date
      • new_property
      • rooms_quantity
      • bedrooms_quantity
      • price
      • photos
  • Flexibility:
    • Limit the number of listings scraped with the -l argument.
    • Pass any search URL with the -u argument.
    • Choose the name of the exported file with the -o argument.
  • Reliability:
    • Built-in retry logic for resilience.
    • Exports data in a structured .csv file format.
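
Once a run has produced its .csv file, the export can be loaded back with the standard library. The sketch below is illustrative only: it assumes the CSV header uses the 13 attribute names listed above and that the price column holds plain numbers, neither of which is confirmed here. The helper names load_listings and average_price are hypothetical.

```python
import csv

# Assumption: the CSV header matches the 13 attribute names listed above.
FIELDS = [
    "city", "postal_code", "ad_type", "property_type", "reference", "title",
    "publication_date", "modification_date", "new_property",
    "rooms_quantity", "bedrooms_quantity", "price", "photos",
]


def load_listings(csv_file):
    """Read an open CSV file and return one dict per listing."""
    reader = csv.DictReader(csv_file)
    # Keep only the expected columns; absent columns become empty strings.
    return [{name: row.get(name, "") for name in FIELDS} for row in reader]


def average_price(rows):
    """Mean of the prices that parse as numbers, or None if there are none."""
    prices = []
    for row in rows:
        try:
            prices.append(float(row["price"]))
        except (TypeError, ValueError):
            continue  # skip empty or non-numeric prices
    return sum(prices) / len(prices) if prices else None
```

To use it on a real export, open the file with open("demo.csv", newline="", encoding="utf-8") and pass the file object to load_listings. The UTF-8 encoding is also an assumption.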

Installation

$ pip3 install bieniciscraper

Note: Installing the package also installs its external dependencies, requests and retry.
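
For readers unfamiliar with retry logic, here is a minimal standard-library sketch of the general idea: call a function again after a failure, waiting a little longer each time. It is not the package's actual implementation and not the API of the retry library; the decorator name and its parameters (tries, delay, backoff, exceptions) are chosen here for illustration.

```python
import functools
import time


def retry(tries=3, delay=0.0, backoff=2.0, exceptions=(Exception,)):
    """Re-run the wrapped function until it succeeds or `tries` attempts are used."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)
                    wait *= backoff  # wait longer before the next attempt
        return wrapper
    return decorator
```

A function that performs a network call can then be decorated with @retry(tries=3, delay=1) so that a transient error does not abort the whole run.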

Usage

$ bieniciscraper -u https://www.bienici.com/recherche/achat/france/chateau -l 10 -o demo.csv
going to page: 1
total results: 591
total results to scrape: 10
scraped: Château à vendre dans le lot avec dépendances et piscine.
scraped: Turenne Collonges la rouge - Demeure du XVIII siècle de 300  habitables sur une parcelle 1,9 ha à rénover entièrement
scraped: Manoir 15 pièces BIVIERS
scraped: Château du XVIème siècle et son parc au coeur de Lyon
scraped: Domaine 3 hectares proche Etretat
scraped: Vente Château 19 pièces
scraped: ANCIENNE DEPENDANCE DE L'ABBAYE DE CONQUES, CONSTITUEE D'UN CHATEAU
scraped: Château
scraped: Vente Château 8 pièces
scraped: DOMAINE D'EXCEPTION MONTS DU LYONNAIS
limit reached
csv written
elapsed: 1.20 s
~~ success
 _       _         _            
| |     | |       | |          
| | ___ | |__  ___| |_ __ __  
| |/ _ \| '_ \/ __| __/| '__|
| | (_) | |_) \__ \ |_ | |  
|_|\___/|_.__/|___/\__||_|  

Command-line Arguments

  • --url/-u: Specify your search URL.
  • --limit/-l: Limit the number of items you want to scrape.
  • --output/-o: Name the file in which data will be saved.
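
To drive the command-line tool from a Python script, the three flags above can be assembled into an argument list and passed to subprocess. This is a sketch under assumptions: it treats --limit and --output as optional, which is not stated here, and the helper names build_command and run_scraper are hypothetical.

```python
import subprocess


def build_command(url, limit=None, output=None):
    """Build the argument list for one bieniciscraper run."""
    cmd = ["bieniciscraper", "--url", url]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_scraper(url, limit=None, output=None):
    """Run the CLI and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(url, limit, output), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with search URLs that contain characters such as & or ?.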

Important Notes

This package collects data from Bien'Ici's internal API. It converts any Bien'Ici search URL into the equivalent API request, a mapping obtained by reverse-engineering the site's client-side JavaScript.

Beware: scraping is limited to 2,500 ads per search URL. Bien'Ici gives access to at most 100 pages of results for a search and does not display anything beyond them. To work around this limit, split your main search into several narrower searches:

For instance, to scrape all listings in Paris, run one search per postal code:
75001 all listings
75002 all listings
... and so on.
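
After running one scoped search per postal code, the resulting files can be combined into a single export. The sketch below lists the 20 Paris arrondissement postal codes and merges several CSV files while dropping duplicate listings. It assumes each file has a header row with a reference column that identifies a listing and that the files are UTF-8 encoded. The helper names are hypothetical, and the search URL for each postal code still has to be copied from the site.

```python
import csv


def paris_postal_codes():
    """Postal codes of the 20 Paris arrondissements: 75001 to 75020."""
    return [f"750{n:02d}" for n in range(1, 21)]


def merge_csv_files(paths, out_path, key="reference"):
    """Merge CSV exports into out_path, keeping the first row seen per key."""
    seen = set()
    fieldnames = None
    merged = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            if fieldnames is None:
                fieldnames = reader.fieldnames  # header of the first non-empty file
            for row in reader:
                if row.get(key) in seen:
                    continue  # listing already collected by another search
                seen.add(row.get(key))
                merged.append(row)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames or [], extrasaction="ignore")
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

Deduplicating matters because a listing can appear in more than one scoped search, for example when search areas overlap.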

💇

Always ensure that your usage complies with the laws that apply in your jurisdiction. If you have a package-related request, please contact us at: contact@lobstr.io.

Disclaimer

This tool is intended for educational use. Always ensure that scraping a website is within your legal rights before using this or any other scraping tool. Respect the robots.txt of websites and be conscious of ethical and legal considerations.

License

This project is licensed under the MIT License - see the LICENSE file for details.
