

Reason this release was yanked:

Da documentation

Project description

OhMyScrapper - v0.2.1

This project aims to build a text-based scraper: it reads text that contains links and produces a final PDF with general information about job openings.

This project uses uv by default.

Scope

  • Read texts;
  • Extract links;
  • Use Open Graph (og:) meta tags to extract information.
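As a rough illustration of the last scope item, the sketch below reads og: meta tags from an HTML page using only the Python standard library. It is not taken from the OhMyScrapper source: the class name OpenGraphParser, the function extract_og_tags, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and attributes.get("content") is not None:
            # Keep the first value seen for each og: property.
            self.tags.setdefault(prop, attributes["content"])


def extract_og_tags(html):
    """Return a dict of og: properties found in the given HTML (hypothetical helper)."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Made-up sample page, for illustration only.
sample_html = """
<html><head>
<meta property="og:title" content="Backend Developer">
<meta property="og:description" content="Remote position, Python.">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""
print(extract_og_tags(sample_html))
```

Only meta tags whose property starts with og: are kept, so unrelated tags such as viewport are ignored.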

Installation

I recommend using uv, so you can just run the command below and everything is installed:

uv sync

How to use and test (development only)

OhMyScrapper works in 3 stages:

  1. It collects URLs from a text file (by default input/_chat.txt) and loads them into a database;
  2. It scrapes the collected URLs and reads what is relevant. If it finds new URLs, they are collected as well;
  3. It exports a list of URLs to CSV files.
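Stage 1 can be pictured with the minimal sketch below, which finds URLs in a chat-like text and stores them in SQLite. This is an assumption-laden illustration, not the project's code: the regular expression, the table name urls, the helper names, and the sample chat text are all hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import re
import sqlite3

# Simple URL matcher (an assumption; the real project may use a different rule).
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the URLs found in text, in order of appearance, without duplicates."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def load_urls(connection, text):
    """Create the (hypothetical) urls table if needed and store every URL found."""
    connection.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    connection.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)",
        [(url,) for url in extract_urls(text)],
    )
    connection.commit()
    return [row[0] for row in connection.execute("SELECT url FROM urls ORDER BY rowid")]


# Made-up chat export, for illustration only.
chat = (
    "[01/02/25, 10:15:00] Ana: New opening: https://www.linkedin.com/jobs/view/123456.\n"
    "[01/02/25, 10:16:30] Bob: also https://lnkd.in/abc123 and again https://lnkd.in/abc123"
)
connection = sqlite3.connect(":memory:")
print(load_urls(connection, chat))
```

Using INSERT OR IGNORE with a primary key means loading the same file twice does not create duplicate rows.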

You can run all 3 stages with a single command:

make start

Remember to add your text file in the folder /input with the name _chat.txt!

You will find the exported files in the folder /output like this:

  • /output/report.csv
  • /output/report.csv-preview.html
  • /output/urls-simplified.csv
  • /output/urls-simplified.csv-preview.html
  • /output/urls.csv
  • /output/urls.csv-preview.html

If you prefer to go step by step, here is how.

First, load the text file you want to search for URLs. The idea is to use a WhatsApp chat history export, but any .txt file works.

The default file is input/_chat.txt. If you are using the default file, just run the load command:

make load

or, if you have another file, just use the argument -file like this:

uv run main.py load -file=my-text-file.txt

That creates the database if it doesn't exist and stores every URL that OhMyScrapper finds. After that, scrape the URLs with the command scrap-urls:

make scrap-urls

That scrapes only the LinkedIn URL types we are interested in. For now they are:

  • linkedin_post: https://%.linkedin.com/posts/%
  • linkedin_redirect: https://lnkd.in/%
  • linkedin_job: https://%.linkedin.com/jobs/view/%
  • linkedin_feed: https://%.linkedin.com/feed/%
  • linkedin_company: https://%.linkedin.com/company/%
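The patterns above look like SQL LIKE patterns, where % matches any run of characters. Assuming that reading is right, a URL could be classified roughly as in the sketch below. The names URL_TYPES, like_to_regex, and classify_url are hypothetical and not part of the project's documented interface.

```python
import re

# The URL types listed above, keyed by name.
URL_TYPES = {
    "linkedin_post": "https://%.linkedin.com/posts/%",
    "linkedin_redirect": "https://lnkd.in/%",
    "linkedin_job": "https://%.linkedin.com/jobs/view/%",
    "linkedin_feed": "https://%.linkedin.com/feed/%",
    "linkedin_company": "https://%.linkedin.com/company/%",
}


def like_to_regex(pattern):
    """Translate a pattern with % wildcards into an anchored regular expression."""
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", re.DOTALL)


def classify_url(url):
    """Return the first matching type name, or None when no pattern matches."""
    for url_type, pattern in URL_TYPES.items():
        if like_to_regex(pattern).match(url):
            return url_type
    return None


print(classify_url("https://www.linkedin.com/jobs/view/123456"))
print(classify_url("https://lnkd.in/abc123"))
print(classify_url("https://example.com/careers"))
```

A URL that returns None here corresponds to the case that the --ignore-type argument is meant to cover.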

You can also scrape every other URL generically with the argument --ignore-type:

uv run main.py scrap-urls --ignore-type

And you can make it scrape recursively by adding the argument --recursive:

uv run main.py scrap-urls --recursive

!!! important: we do not yet know whether making too many requests can get you blocked.
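To make the idea of recursive scraping concrete, here is a minimal sketch of a crawl loop that follows newly found URLs and pauses between requests. It is an assumption about how such a loop could look, not the project's implementation: the function crawl, its parameters, and the fake pages are hypothetical, and the fetch function is passed in so the example runs without network access.

```python
import re
import time
from collections import deque

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def crawl(start_urls, fetch, delay_seconds=1.0, max_pages=50, sleep=time.sleep):
    """Visit URLs breadth-first, queueing any new URL found in a fetched page."""
    queue = deque(start_urls)
    seen = set(start_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if visited:
            sleep(delay_seconds)  # pause between requests to reduce load
        body = fetch(url)
        visited.append(url)
        for found in URL_PATTERN.findall(body):
            if found not in seen:
                seen.add(found)
                queue.append(found)
    return visited


# Made-up pages standing in for real HTTP responses.
fake_pages = {
    "https://lnkd.in/abc123": "redirects to https://www.linkedin.com/jobs/view/1",
    "https://www.linkedin.com/jobs/view/1": "see also https://lnkd.in/abc123",
}

print(crawl(["https://lnkd.in/abc123"], lambda url: fake_pages.get(url, ""), delay_seconds=0))
```

The seen set keeps the loop from revisiting a URL, and max_pages caps the total number of requests. Whether a delay like this is enough to avoid being blocked is unknown.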

And we can finally export with the command:

make export
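For orientation, an export step of this kind can be sketched with the standard csv module. The column names url and url_type, the function export_urls, and the sample row are hypothetical; the real report files may contain different columns.

```python
import csv
import io


def export_urls(rows, output):
    """Write rows (dicts with hypothetical url and url_type keys) as CSV."""
    writer = csv.DictWriter(output, fieldnames=["url", "url_type"])
    writer.writeheader()
    writer.writerows(rows)


# An in-memory buffer stands in for a file such as output/urls.csv.
buffer = io.StringIO()
export_urls([{"url": "https://lnkd.in/abc123", "url_type": "linkedin_redirect"}], buffer)
print(buffer.getvalue())
```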

That's the basic usage! You can learn more from the built-in help:

uv run main.py --help

Download files


Source Distribution

ohmyscrapper-0.2.1.tar.gz (10.8 kB)

Uploaded Source

Built Distribution


ohmyscrapper-0.2.1-py3-none-any.whl (15.3 kB)

Uploaded Python 3

File details

Details for the file ohmyscrapper-0.2.1.tar.gz.

File metadata

  • Download URL: ohmyscrapper-0.2.1.tar.gz
  • Upload date:
  • Size: 10.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17 (uv publish, macOS)

File hashes

Hashes for ohmyscrapper-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       2d0d97dcb47cc5f028d20d6de34e551853f89c290d5918c0f681b163e30f19fe
MD5          9189ac282f5253dc859d0d3588636abb
BLAKE2b-256  7a52f37ed5fcfee8f7089bb1a2d452f68e965fca957b3fc6c43cf36fe48ecb6e


File details

Details for the file ohmyscrapper-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: ohmyscrapper-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 15.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17 (uv publish, macOS)

File hashes

Hashes for ohmyscrapper-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       86b4a52644a9a6f7b91cdcacb410788f32ed15420a08cba08b315ff68c8b9b4d
MD5          6dbdf60ece532d972f2acb32d583671a
BLAKE2b-256  30a1aa14c17db205565d8c59a7b4c0b6faa228276e2b8c160d0cec15a3aaa7c6

