OhMyScrapper scrapes texts and URLs looking for links and job data, to create a final report with general information about job positions.

Project description

🐶 OhMyScrapper - v0.4.0


Scope

Read texts;
Extract and load URLs;
Scrape the URLs looking for Open Graph (og:) tags and titles;
Export a list of links with relevant information.
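To make the "og: tags and titles" step concrete, here is a minimal sketch using only Python's standard html.parser. It is not OhMyScrapper's own code: OgTagParser and parse_page are hypothetical names, and the sketch assumes Open Graph data appears as meta elements of the form property="og:..." content="...".

```python
from __future__ import annotations

from html.parser import HTMLParser


class OgTagParser(HTMLParser):
    """Hypothetical parser: collects the first <title> and every og: meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.title: str | None = None
        self.og: dict[str, str] = {}
        self._in_title = False
        self._title_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser lowercases tag and attribute names
        if tag == "meta":
            prop = attributes.get("property") or ""
            content = attributes.get("content")
            # Keep the first value seen for each og: property.
            if prop.startswith("og:") and content is not None:
                self.og.setdefault(prop, content)
        elif tag == "title" and self.title is None:
            self._in_title = True
            self._title_parts = []

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def parse_page(html_text: str) -> tuple[str | None, dict[str, str]]:
    """Return (title, og_tags) for one HTML document."""
    parser = OgTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.title, parser.og
```

Only the first title is kept, because pages can contain extra title elements (for example inside inline SVG).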

Installation

You can install it directly with pip:

pip install ohmyscrapper

I recommend using uv, so you can just run the commands below and everything gets installed:

uv add ohmyscrapper
uv run ohmyscrapper --version

You can also run it as a tool, without adding it to a project, for example:

uvx ohmyscrapper --version

How to use and test (development only)

OhMyScrapper works in 3 stages:

It collects URLs from a text and loads them into a database;
It scrapes (accesses) the collected URLs and reads what is relevant. If it finds new URLs, they are collected as well;
It exports a list of URLs to CSV files.
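The "new URLs are collected as well" behaviour can be pictured as a worklist. The sketch below is hypothetical: crawl and fetch_links are illustrative names, fetch_links stands in for whatever downloads a page and returns its links, and max_pages is an assumed safety cap, not an OhMyScrapper option.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def crawl(
    seeds: Iterable[str],
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> list[str]:
    """Visit URLs breadth-first, queueing each newly found link exactly once."""
    seen: set[str] = set()
    queue: deque[str] = deque()
    for url in seeds:  # stage 1: URLs collected from the text
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        links = fetch_links(url)  # stage 2: scrape the page and read its links
        for link in links:
            if link not in seen:  # new URLs are collected as well
                seen.add(link)
                queue.append(link)
    return visited  # stage 3 would export these
```

The seen set is what keeps pages that link back to each other from being scraped twice.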

You can run all 3 stages with a single command:

ohmyscrapper start

Remember to put your text file in the /input folder, with a name ending in .txt!
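As a rough illustration of which files would be picked up, here is a sketch that lists the .txt files in a folder. It assumes /input means an input folder relative to the current working directory; find_input_files is a hypothetical helper, not part of OhMyScrapper.

```python
from pathlib import Path


def find_input_files(folder: str = "input") -> list[Path]:
    """Hypothetical helper: return the .txt files directly inside *folder*, sorted by name."""
    base = Path(folder)
    if not base.is_dir():
        return []  # nothing to load yet
    return sorted(p for p in base.glob("*.txt") if p.is_file())
```

Files with other extensions (notes.md, data.csv) are ignored, which matches the README's "name ending in .txt" rule.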

You will find the exported files in the /output folder, like this:

/output/report.csv
/output/report.csv-preview.html
/output/urls-simplified.csv
/output/urls-simplified.csv-preview.html
/output/urls.csv
/output/urls.csv-preview.html
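The README does not describe what the -preview.html files contain. Assuming each one is simply an HTML table of the matching CSV, a minimal sketch of producing such a preview could look like this (csv_to_html_preview and max_rows are hypothetical):

```python
import csv
import html
import io


def csv_to_html_preview(csv_text: str, max_rows: int = 50) -> str:
    """Hypothetical: render the header and first *max_rows* data rows as an HTML table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1 : max_rows + 1]
    parts = ["<table><thead><tr>"]
    parts += [f"<th>{html.escape(cell)}</th>" for cell in header]
    parts.append("</tr></thead><tbody>")
    for row in body:
        # Escape every cell so scraped titles cannot inject markup.
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</tbody></table>")
    return "".join(parts)
```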

BUT: if you want to go step by step, here is how:

First, load a text file in which you want to look for URLs. It works with any .txt file.

The default folder is /input. Put one or more text files (ending in .txt) in this folder and use the load command:

ohmyscrapper load

Or, if your file is in a different folder, use the -input argument like this:

ohmyscrapper load -input=my-text-file.txt
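Finding URLs in free text usually comes down to a regular expression. The sketch below is one simple approach, not OhMyScrapper's actual pattern: URL_PATTERN and extract_urls are hypothetical, and only http(s) URLs are matched.

```python
import re

# Hypothetical pattern: an http(s) URL runs until whitespace, a quote or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")
# Punctuation that usually ends a sentence rather than a URL.
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(text: str) -> list[str]:
    """Return the unique URLs in *text*, in order of first appearance."""
    urls: list[str] = []
    seen: set[str] = set()
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(TRAILING_PUNCTUATION)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Stripping trailing punctuation handles links written at the end of a sentence or inside parentheses, at the cost of truncating the rare URL that genuinely ends with ")".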

You can also use -input to add a URL directly to the database, like this:

ohmyscrapper load -input=https://cesarcardoso.cc/

That appends the URL as the last entry in the database, to be scraped.
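A sketch of that loading behaviour with Python's sqlite3 module: the database and table are created if missing, duplicates are ignored, and new URLs are appended at the end of the queue. The file name urls.db, the urls table and its columns, and all function names are assumptions for illustration; OhMyScrapper's real schema is not documented here.

```python
import sqlite3
from urllib.parse import urlparse


def is_url(value: str) -> bool:
    """Hypothetical check: treat -input as a URL when it has an http(s) scheme and a host."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def open_db(path: str = "urls.db") -> sqlite3.Connection:
    """Open (or create) the database, creating the hypothetical urls table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS urls (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               url TEXT NOT NULL UNIQUE,
               scraped INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def append_url(conn: sqlite3.Connection, url: str) -> bool:
    """Append *url* as the newest entry; return False if it was already stored."""
    cur = conn.execute("INSERT OR IGNORE INTO urls (url) VALUES (?)", (url,))
    conn.commit()
    return cur.rowcount == 1


def pending_urls(conn: sqlite3.Connection) -> list[str]:
    """URLs not scraped yet, oldest first."""
    rows = conn.execute("SELECT url FROM urls WHERE scraped = 0 ORDER BY id")
    return [url for (url,) in rows]
```

The UNIQUE constraint plus INSERT OR IGNORE lets the same URL appear in many texts without being scraped twice.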

Loading creates the database if it doesn't exist yet and stores every URL OhMyScrapper finds. After that, let's scrape the URLs with the scrap-urls command:

ohmyscrapper scrap-urls --recursive --ignore-type

Without --ignore-type, scrap-urls only scrapes the LinkedIn URLs we are interested in. For now they are:

linkedin_post: https://%.linkedin.com/posts/%
linkedin_redirect: https://lnkd.in/%
linkedin_job: https://%.linkedin.com/jobs/view/%
linkedin_feed: https://%.linkedin.com/feed/%
linkedin_company: https://%.linkedin.com/company/%
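The % in these patterns reads like an SQL LIKE wildcard (any sequence of characters). Assuming that is what it means, a URL can be classified by asking SQLite which pattern it matches. The url_types table and the function names below are hypothetical.

```python
from __future__ import annotations

import sqlite3

# The URL types listed above, in the same order (first match wins).
URL_TYPES = [
    ("linkedin_post", "https://%.linkedin.com/posts/%"),
    ("linkedin_redirect", "https://lnkd.in/%"),
    ("linkedin_job", "https://%.linkedin.com/jobs/view/%"),
    ("linkedin_feed", "https://%.linkedin.com/feed/%"),
    ("linkedin_company", "https://%.linkedin.com/company/%"),
]


def make_type_table() -> sqlite3.Connection:
    """Build a hypothetical in-memory table of (name, LIKE pattern) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url_types (name TEXT NOT NULL, pattern TEXT NOT NULL)")
    conn.executemany("INSERT INTO url_types (name, pattern) VALUES (?, ?)", URL_TYPES)
    return conn


def classify(conn: sqlite3.Connection, url: str) -> str | None:
    """Return the first type whose LIKE pattern matches *url*, else None."""
    row = conn.execute(
        "SELECT name FROM url_types WHERE ? LIKE pattern ORDER BY rowid LIMIT 1",
        (url,),
    ).fetchone()
    return row[0] if row else None
```

Keep in mind that SQLite's LIKE is case-insensitive for ASCII by default and treats _ as a single-character wildcard.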

But we can scrape every other URL generically using the --ignore-type argument:

ohmyscrapper scrap-urls --ignore-type

And we can make it run recursively by adding the --recursive argument:

ohmyscrapper scrap-urls --recursive

!!! Important: we are not sure whether sites will block you for sending too many requests.
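If you scrape many URLs, a common way to reduce that risk is to space out requests to the same host. OhMyScrapper doesn't document a throttling option, so the HostThrottle class below is purely hypothetical; the clock and sleep parameters exist only so it can be tested without real waiting.

```python
from __future__ import annotations

import time
from typing import Callable
from urllib.parse import urlparse


class HostThrottle:
    """Hypothetical helper: keep at least *min_interval* seconds between requests to one host."""

    def __init__(
        self,
        min_interval: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last_request: dict[str, float] = {}

    def wait(self, url: str) -> None:
        """Sleep until *url*'s host may be requested again, then record the request time."""
        host = urlparse(url).netloc
        now = self.clock()
        last = self._last_request.get(host)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last_request[host] = now
```

Calling wait(url) before each request delays only repeated hits on the same host; requests to different hosts (www.linkedin.com and lnkd.in, for example) are not slowed down by each other.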

Finally, we can export the results with these commands:

ohmyscrapper export
ohmyscrapper export --file=output/urls-simplified.csv --simplify
ohmyscrapper report
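The --simplify flag presumably writes fewer columns than the full export. The README doesn't list the columns, so the names in this sketch are hypothetical; it only shows how csv.DictWriter can write one set of scraped rows with either a full or a reduced column list.

```python
import csv
import io

# Hypothetical column sets; the real export columns are not documented in this README.
FULL_COLUMNS = ["url", "url_type", "title", "og:title", "og:description"]
SIMPLE_COLUMNS = ["url", "title"]


def export_csv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Write *rows* as CSV, keeping only *columns*; missing values become empty strings."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in *columns* instead of raising.
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```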

That's the basic usage! You can learn more from the built-in help:

ohmyscrapper --help

License

This package is distributed under the MIT license.

Project details

Release history

0.9.5 (Feb 9, 2026)
0.9.4 (Feb 8, 2026)
0.9.3 (Feb 6, 2026)
0.9.2 (Feb 5, 2026)
0.9.1 (Feb 4, 2026)
0.9.0 (Feb 4, 2026)
0.8.6 (Feb 4, 2026)
0.8.5 (Feb 4, 2026)
0.8.4 (Feb 2, 2026)
0.8.3 (Feb 2, 2026)
0.8.2 (Jan 13, 2026)
0.8.1 (Dec 31, 2025)
0.8.0 (Dec 31, 2025)
0.7.4 (Dec 30, 2025)
0.7.3 (Dec 30, 2025)
0.7.2 (Dec 29, 2025)
0.7.1 (Dec 29, 2025)
0.7.0 (Dec 28, 2025)
0.6.1 (Dec 28, 2025)
0.6.0 (Dec 26, 2025)
0.5.3 (Dec 22, 2025)
0.5.2 (Dec 22, 2025), yanked. Reason: Problem not solved
0.5.1 (Dec 20, 2025)
0.5.0 (Dec 19, 2025)
0.4.0 (Dec 18, 2025), this version
0.3.4 (Dec 18, 2025)
0.3.2 (Dec 18, 2025)
0.3.1 (Dec 18, 2025)
0.3.0 (Dec 17, 2025)
0.2.3 (Dec 17, 2025)
0.2.2 (Dec 17, 2025), yanked. Reason: Bad documentation
0.2.1 (Dec 17, 2025), yanked. Reason: Bad documentation
0.1.1 (Dec 17, 2025), yanked. Reason: Bad documentation

Download files

Download the file for your platform.

Source Distribution

ohmyscrapper-0.4.0.tar.gz (12.0 kB)

Uploaded Dec 18, 2025 Source

Built Distribution


ohmyscrapper-0.4.0-py3-none-any.whl (16.6 kB)

Uploaded Dec 18, 2025 Python 3

File details

Details for the file ohmyscrapper-0.4.0.tar.gz.

File metadata

Download URL: ohmyscrapper-0.4.0.tar.gz
Upload date: Dec 18, 2025
Size: 12.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.17 (uv publish, on macOS)

File hashes

Hashes for ohmyscrapper-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`a43da36c35e9475a1a3e9cb97882e322afba71df6f733721d40d5835ac31b4db`
MD5	`54e6af251360037cb813c5cccfe702bc`
BLAKE2b-256	`488f4b25d89a777500cdf58499433c2f4a9d96ad6e40f090365a92704ebd063a`


File details

Details for the file ohmyscrapper-0.4.0-py3-none-any.whl.

File metadata

Download URL: ohmyscrapper-0.4.0-py3-none-any.whl
Upload date: Dec 18, 2025
Size: 16.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.17 (uv publish, on macOS)

File hashes

Hashes for ohmyscrapper-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`887decccbee3f5f177beb8c622811403f7f656aa362d89ed408db30177809bac`
MD5	`a2dde0fca13a07fd41d8cda6fbe1a98d`
BLAKE2b-256	`60ee66d98338e0cbc466a8c078668f2cca22a4b6745a5feee8d31c524e6a93b2`

