Simple web scraper for Craigslist.

Project description

CraigslistScraper

Note: CraigslistScraper is for personal use and data science only.

Updated (12/26/2023): A couple of months ago Craigslist made changes that broke the previous version of this library. The recent updates address those changes, but they required a complete refactor, so the new version 1.1.1 is not backwards compatible with the previous version 1.0.1.

CraigslistScraper is a lightweight tool for scraping Craigslist. Users define what they would like to search for, and CraigslistScraper fetches and parses data from both searches and individual ads.

There are no official docs, but the codebase is only ~200 lines and is documented.

Table of Contents

  • Installation
  • Usage
  • Analyzing
  • License

Installation

To install the package just run:

pip install craigslistscraper

The only requirements are Python 3.7+ and the requests and beautifulsoup4 libraries.

Usage

CraigslistScraper is built around six functions/classes for flexibility, listed below.

For general searches:

  • Search
  • SearchParser
  • fetch_search

For single ads/posts:

  • Ad
  • AdParser
  • fetch_ad

SearchParser and AdParser are BeautifulSoup-like abstractions for extracting specific fields from the HTML returned by Craigslist. Developers may find these useful.
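
For example, if you have already fetched an ad's HTML yourself, you could hand it to AdParser directly. The sketch below is only an assumption about the interface (AdParser(html) plus .title and .price attributes); check the source for the actual constructor and field names.

import requests

import craigslistscraper as cs

# Assumed interface: AdParser wraps raw HTML and exposes parsed fields.
html = requests.get("https://minneapolis.craigslist.org/...").text  # placeholder ad URL
parser = cs.AdParser(html)
print(parser.title, parser.price)  # assumed attribute names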

Search and Ad are classes that lazily fetch data from user-defined searches and ads. To define a search you need at least a query and a city; to define an ad you need at least a url. Examples are provided below and in the examples/ folder.

fetch_search() and fetch_ad() are eager, functional counterparts that return a Search and an Ad, respectively.
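
As a sketch of the lazy and eager styles for a single ad (the placeholder URL, the Ad(url=...) keyword, and the fetch_ad() call signature below are assumptions, not documented API):

import craigslistscraper as cs

# Lazy style: defining the Ad does not fetch anything yet.
ad = cs.Ad(url = "https://minneapolis.craigslist.org/...")  # placeholder URL
status = ad.fetch()  # returns an HTTP-style status code
if status == 200:
    print(ad.to_dict())

# Eager style: fetch_ad() fetches immediately and returns an Ad.
ad = cs.fetch_ad("https://minneapolis.craigslist.org/...")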


Below is a simple example; more examples can be found in the examples/ folder.

import craigslistscraper as cs
import json

# Define the search. Everything is done lazily, and so the html is not 
# fetched at this step.
search = cs.Search(
    query = "bmw e46",
    city = "minneapolis",
    category = "cto"
)

# Fetch the html from the server. Don't forget to check the status. 
status = search.fetch()
if status != 200:
    raise Exception(f"Unable to fetch search with status <{status}>.")

for ad in search.ads:
    # Fetch additional information about each ad. Check the status again.
    status = ad.fetch()
    if status != 200:
        print(f"Unable to fetch ad '{ad.title}' with status <{status}>.")
        continue

    # There is a to_dict() method for convenience. 
    data = ad.to_dict()

    # json.dumps is merely for pretty printing. 
    print(json.dumps(data, indent = 4))

Analyzing

Data can easily be converted to JSON, CSV, etc. and used in various downstream data-analysis tasks. A rough CSV sketch follows the examples below.

[CSV example screenshot]

This data can then be analyzed; some examples include:

[CSV example screenshots]
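
As a rough sketch of the CSV workflow (the keys written to the file come from whatever ad.to_dict() returns, which is an assumption here):

import csv

import craigslistscraper as cs

search = cs.Search(query = "bmw e46", city = "minneapolis", category = "cto")
if search.fetch() != 200:
    raise SystemExit("Unable to fetch search.")

rows = []
for ad in search.ads:
    if ad.fetch() == 200:
        rows.append(ad.to_dict())

# Collect the union of keys so every ad fits in one table.
fieldnames = sorted({key for row in rows for key in row})

with open("ads.csv", "w", newline = "") as f:
    writer = csv.DictWriter(f, fieldnames = fieldnames)
    writer.writeheader()
    writer.writerows(rows)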

License

Distributed under the MIT License. See LICENSE for more information.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

craigslistscraper-1.1.2.tar.gz (45.2 kB)

Uploaded Source

Built Distribution

craigslistscraper-1.1.2-py3-none-any.whl (45.9 kB)

Uploaded Python 3

File details

Details for the file craigslistscraper-1.1.2.tar.gz.

File metadata

  • Download URL: craigslistscraper-1.1.2.tar.gz
  • Upload date:
  • Size: 45.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.9.13 Darwin/22.6.0

File hashes

Hashes for craigslistscraper-1.1.2.tar.gz

  • SHA256: b92cee9c52bb72119ac2ab629274acf622d35ca9f5a633c2cd05bec556dedc93
  • MD5: 42f2e9f264e742d7ddb4afdecde46e0f
  • BLAKE2b-256: ffef0841b98e2b1ccea15874f83540e0c4262089417bbbe1748b21d4455dc889


File details

Details for the file craigslistscraper-1.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for craigslistscraper-1.1.2-py3-none-any.whl

  • SHA256: 7350250f824d3f32d846c0c6cf4c6589aa30af5069e46eace86bf93acbdabd60
  • MD5: d5d79f5daebe08f425ca661f5cd9f21d
  • BLAKE2b-256: 38027b00886df6defa4c1496c34a299a0aaf7095cde94288d3268d53ebb4cba7

