Reason this release was yanked:

Bad documentation

Project description

OhMyScrapper - v0.1.1

This project aims to build a scraper that reads text containing links and produces a final PDF with general information about job openings.

This project uses uv by default.

Scope

  • Read texts;
  • Extract links;
  • Use Open Graph meta tags (og:*) to extract information;
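
To illustrate the last item, here is a minimal sketch of reading og: tags from a page using only the Python standard library. The names OpenGraphParser and extract_og_tags and the sample HTML are hypothetical and are not taken from the OhMyScrapper source, which may do this differently.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each og: property.
            self.tags.setdefault(prop, content)


def extract_og_tags(html):
    """Return a dict mapping og: property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.tags


# Hypothetical page content, for illustration only.
sample = """
<html><head>
<meta property="og:title" content="Backend Engineer" />
<meta property="og:description" content="Remote position" />
<meta name="viewport" content="width=device-width" />
</head></html>
"""

print(extract_og_tags(sample))
```

Only meta elements whose property starts with og: are kept, so unrelated tags such as viewport are ignored.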

Installation

I recommend using uv, so you can install everything with the command below:

uv sync

How to use and test (development only)

OhMyScrapper works in 3 stages:

  1. It collects urls from a text file (by default input/_chat.txt) and loads them into a database;
  2. It scrapes the collected urls and reads what is relevant. If it finds new urls, they are collected as well;
  3. It exports a list of urls as CSV files;
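
The first stage can be sketched roughly as follows. This is only an illustration under stated assumptions: the regular expression, the urls table, and the function names extract_urls and store_urls are hypothetical, and an in-memory SQLite database stands in for whatever database the project actually creates.

```python
import re
import sqlite3

# Assumed pattern: anything that starts with http(s):// up to the next
# whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the urls found in text, in order of appearance, without duplicates."""
    found = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url not in found:
            found.append(url)
    return found


def store_urls(connection, urls):
    """Create the (hypothetical) urls table if needed and insert new urls only."""
    connection.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    connection.executemany(
        "INSERT OR IGNORE INTO urls (url) VALUES (?)", [(url,) for url in urls]
    )
    connection.commit()


# Hypothetical chat excerpt, for illustration only.
chat = (
    "Check https://lnkd.in/abc123, and also "
    "https://www.linkedin.com/jobs/view/42. Again: https://lnkd.in/abc123"
)

connection = sqlite3.connect(":memory:")
store_urls(connection, extract_urls(chat))
stored = [row[0] for row in connection.execute("SELECT url FROM urls ORDER BY url")]
print(stored)
```

Using the url as the primary key together with INSERT OR IGNORE means loading the same text twice does not create duplicate rows.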

You can run all 3 stages with a single command:

make start

Remember to put your text file in the /input folder with the name _chat.txt!

You will find the exported files in the folder /output like this:

  • /output/report.csv
  • /output/report.csv-preview.html
  • /output/urls-simplified.csv
  • /output/urls-simplified.csv-preview.html
  • /output/urls.csv
  • /output/urls.csv-preview.html

If you prefer to go step by step, here is how.

First, load the text file you want to search for urls. The idea is to use a WhatsApp chat history, but any txt file works.

The default file is input/_chat.txt. If you are using the default file, just run the load command:

make load

Or, if you have another file, pass it with the -file argument like this:

uv run main.py load -file=my-text-file.txt

That will create a database if it doesn't exist and store every url that OhMyScrapper finds. After that, scrape the urls with the scrap-urls command:

make scrap-urls

That will scrape only the LinkedIn urls we are interested in. For now they are:

  • linkedin_post: https://%.linkedin.com/posts/%
  • linkedin_redirect: https://lnkd.in/%
  • linkedin_job: https://%.linkedin.com/jobs/view/%
  • linkedin_feed: https://%.linkedin.com/feed/%
  • linkedin_company: https://%.linkedin.com/company/%
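
As an illustration of how such patterns could be applied, the sketch below assumes that % is the SQL LIKE wildcard (any run of characters) and lets SQLite do the matching. The URL_TYPES dictionary and the classify_url function are hypothetical names, and the project may match urls in another way.

```python
import sqlite3

# Type names and patterns copied from the list above.
URL_TYPES = {
    "linkedin_post": "https://%.linkedin.com/posts/%",
    "linkedin_redirect": "https://lnkd.in/%",
    "linkedin_job": "https://%.linkedin.com/jobs/view/%",
    "linkedin_feed": "https://%.linkedin.com/feed/%",
    "linkedin_company": "https://%.linkedin.com/company/%",
}


def classify_url(connection, url):
    """Return the first type whose LIKE pattern matches url, or None."""
    for url_type, pattern in URL_TYPES.items():
        # "SELECT ? LIKE ?" evaluates to 1 when url matches pattern, else 0.
        matched = connection.execute("SELECT ? LIKE ?", (url, pattern)).fetchone()[0]
        if matched:
            return url_type
    return None


connection = sqlite3.connect(":memory:")
for url in (
    "https://www.linkedin.com/jobs/view/123",
    "https://lnkd.in/xyz",
    "https://example.com/careers",
):
    print(url, "->", classify_url(connection, url))
```

A url that matches none of the patterns gets no type, which is the kind of url the --ignore-type argument described next is meant for.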

Urls of any other type can also be scraped generically with the --ignore-type argument:

uv run main.py scrap-urls --ignore-type

And we can make it scrape recursively by adding the --recursive argument:

uv run main.py scrap-urls --recursive

!!! important: we are not sure yet what blocks we may run into when making too many requests
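
One common way to reduce the risk of being blocked is to pause between requests and cap how many are made in a run. The sketch below is a general illustration of that idea, not something OhMyScrapper is known to implement. The crawl_politely function, its parameters, and the fake fetch function are all hypothetical.

```python
import time


def crawl_politely(urls, fetch, delay_seconds=2.0, max_requests=50, sleep=time.sleep):
    """Fetch each url at most once, pausing between requests and stopping at a cap."""
    results = {}
    for url in urls:
        if url in results:
            continue  # already fetched, do not request it again
        if len(results) >= max_requests:
            break  # cap reached, leave the remaining urls for another run
        if results:
            sleep(delay_seconds)  # wait before every request except the first
        results[url] = fetch(url)
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP request so the example runs offline."""
    return f"content of {url}"


pages = crawl_politely(
    ["https://lnkd.in/a", "https://lnkd.in/b", "https://lnkd.in/a"],
    fetch=fake_fetch,
    delay_seconds=0.1,
    max_requests=10,
)
print(len(pages))
```

Suitable values for the delay and the cap depend on the site being scraped and are left as assumptions here.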

Finally, we can export the results with the command:

make export

That's the basic usage! You can learn more from the help:

uv run main.py --help
