A Python module to generate link previews.
SneakPeek
A Python module and a minimal server for generating link previews.
What is supported
- Any page that supports the Open Graph protocol (which most websites do)
- Special handling for sites like
Installation
Run the following to install
pip install sneakpeek
Usage as a Python Module
From a URL
>>> import sneakpeek
>>> from pprint import pprint
>>> link = sneakpeek.SneakPeek("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
>>> link.fetch()
>>> link.is_valid()
True
>>> pprint(link)
{'description': 'The official video for “Never Gonna Give You Up” by Rick '
                'AstleyTaken from the album ‘Whenever You Need Somebody’ – '
                'deluxe 2CD and digital deluxe out 6th May ...',
 'domain': 'www.youtube.com',
 'image': 'https://i.ytimg.com/vi/dQw4w9WgXcQ/maxresdefault.jpg',
 'image:height': '720',
 'image:width': '1280',
 'scrape': False,
 'site_name': 'YouTube',
 'title': 'Rick Astley - Never Gonna Give You Up (Official Music Video)',
 'type': 'video.other',
 'url': 'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
 'video:height': '720',
 'video:secure_url': 'https://www.youtube.com/embed/dQw4w9WgXcQ',
 'video:tag': 'never gonna give you up karaoke',
 'video:type': 'text/html',
 'video:url': 'https://www.youtube.com/embed/dQw4w9WgXcQ',
 'video:width': '1280'}
>>> link = sneakpeek.SneakPeek(url="https://codingcoffee.dev")
>>> link.fetch()
>>> pprint(link)
{'description': 'A generalist with multi faceted interests and extensive '
                'experience with DevOps, System Design and Full Stack '
                'Development. I like blogging about things which interest me, '
                'have a niche for optimizing and customizing things to the '
                'very last detail, this includes my text editor and operating '
                'system alike.',
 'domain': 'codingcoffee.dev',
 'image': 'https://www.gravatar.com/avatar/7ecdc5e1441ecd501faaf42a6ab9d6c0?s=200',
 'scrape': False,
 'title': 'Ameya Shenoy',
 'type': 'website',
 'url': 'https://codingcoffee.dev'}
Use scrape=True to fetch data by scraping the page instead of relying on Open Graph tags:
>>> link = sneakpeek.SneakPeek(url="https://news.ycombinator.com/item?id=23812063", scrape=True)
>>> link.fetch()
>>> pprint(link)
{'description': '',
 'domain': 'news.ycombinator.com',
 'image': 'y18.gif',
 'scrape': True,
 'title': 'WireGuard as VPN Server on Kubernetes with AdBlocking | Hacker News',
 'type': 'other',
 'url': 'https://news.ycombinator.com/item?id=23812063'}
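The scrape=True result above is consistent with a fallback that takes the page's <title>, a description meta tag, and the first <img> on the page (on Hacker News, the y18.gif logo, kept as the relative path it appears as in the HTML). Here is a minimal standard-library sketch of that general technique. It is an assumption about how such scraping can work, not SneakPeek's actual implementation, and ScrapeParser and scrape_preview are hypothetical names. Deriving domain from the requested URL is also an assumption, though it would explain why the HTML-only example further down reports 'domain': None.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScrapeParser(HTMLParser):
    """Hypothetical fallback parser (not SneakPeek's): <title>, meta description, first <img>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag == "img" and self.image is None:
            # Keep only the first image, exactly as written in src (it may be relative).
            self.image = attrs.get("src")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_preview(html, url=None):
    """Build a preview dict from raw HTML without using Open Graph tags."""
    parser = ScrapeParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "description": parser.description,
        "image": parser.image,
        # Assumption: domain comes from the requested URL, not from the page.
        "domain": urlparse(url).netloc if url else None,
        "url": url,
        "scrape": True,
    }
```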
From HTML
>>> HTML = """
... <html xmlns:og="http://ogp.me/ns">
... <head>
... <title>The Rock (1996)</title>
... <meta property="og:title" content="The Rock" />
... <meta property="og:description" content="The Rock: Directed by Michael Bay. With Sean Connery, Nicolas Cage, Ed Harris, John Spencer. A mild-mannered chemist and an ex-con must lead the counterstrike when a rogue group of military men, led by a renegade general, threaten a nerve gas attack from Alcatraz against San Francisco.">
... <meta property="og:type" content="movie" />
... <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
... <meta property="og:image" content="https://m.media-amazon.com/images/M/MV5BZDJjOTE0N2EtMmRlZS00NzU0LWE0ZWQtM2Q3MWMxNjcwZjBhXkEyXkFqcGdeQXVyNDk3NzU2MTQ@._V1_FMjpg_UX1000_.jpg">
... </head>
... </html>
... """
>>> movie = sneakpeek.SneakPeek(html=HTML)
>>> movie.is_valid()
True
>>> pprint(movie)
{'description': 'The Rock: Directed by Michael Bay. With Sean Connery, Nicolas '
                'Cage, Ed Harris, John Spencer. A mild-mannered chemist and an '
                'ex-con must lead the counterstrike when a rogue group of '
                'military men, led by a renegade general, threaten a nerve gas '
                'attack from Alcatraz against San Francisco.',
 'domain': None,
 'image': 'https://m.media-amazon.com/images/M/MV5BZDJjOTE0N2EtMmRlZS00NzU0LWE0ZWQtM2Q3MWMxNjcwZjBhXkEyXkFqcGdeQXVyNDk3NzU2MTQ@._V1_FMjpg_UX1000_.jpg',
 'scrape': False,
 'title': 'The Rock',
 'type': 'movie',
 'url': 'http://www.imdb.com/title/tt0117500/'}
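Open Graph data lives in <meta property="og:..."> tags, and the keys in the output above (title, type, url, image, description) are those property names with the og: prefix removed. A minimal standard-library sketch of that mapping, assuming SneakPeek does something similar; OpenGraphParser and parse_open_graph are hypothetical names, and the extra fields SneakPeek adds (domain, scrape) are left out.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Hypothetical parser that collects og:* meta properties into a dict."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        content = attrs.get("content")
        if prop.startswith("og:") and content is not None:
            # "og:image:width" -> "image:width". If a property repeats,
            # this sketch keeps the first occurrence (an arbitrary choice).
            self.properties.setdefault(prop[len("og:"):], content)


def parse_open_graph(html):
    """Return the page's Open Graph properties without the og: prefix."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties
```

Run on the HTML snippet above, parse_open_graph(HTML) returns a dict with the keys title, description, type, url and image.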
Usage as a Server
A simple server built with FastAPI and uvicorn serves the requests. Start it with:
sneakpeek serve
Once it is running, you can view the API docs at http://localhost:9000/docs
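FastAPI applications serve their OpenAPI schema at /openapi.json by default, next to the /docs page mentioned above, so a client can discover the server's routes instead of guessing them. A sketch assuming the server runs on the default port 9000 and has not disabled the schema URL; fetch_schema and list_endpoints are hypothetical helpers, not part of SneakPeek.

```python
import json
from urllib.request import urlopen

# Operation keys allowed under an OpenAPI path item; other keys
# (e.g. "parameters", "summary") are not HTTP methods.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_schema(base_url="http://localhost:9000"):
    """Download the OpenAPI schema that FastAPI publishes by default."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_endpoints(schema):
    """Return sorted 'METHOD /path' strings from an OpenAPI schema dict."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for key in operations:
            if key.lower() in HTTP_METHODS:
                endpoints.append(f"{key.upper()} {path}")
    return sorted(endpoints)
```

With the server running, print(list_endpoints(fetch_schema())) lists the routes the docs page describes.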
Usage as a CLI
sneakpeek preview --url "https://github.com/codingcoffee/" | jq
{
  "domain": "github.com",
  "scrape": false,
  "url": "https://github.com/codingCoffee",
  "title": "codingCoffee - Overview",
  "type": "profile",
  "image": "https://avatars.githubusercontent.com/u/13611153?v=4?s=400",
  "description": "Automate anything and everything 🙋‍♂️. codingCoffee has 68 repositories available. Follow their code on GitHub.",
  "error": null,
  "image:alt": "Automate anything and everything 🙋‍♂️. codingCoffee has 68 repositories available. Follow their code on GitHub.",
  "site_name": "GitHub"
}
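The JSON the CLI prints can also be consumed from a script instead of jq. This sketch runs the same sneakpeek preview --url command via subprocess and treats a non-null "error" field as a failure; that reading of "error" is an assumption, and run_preview and parse_preview are hypothetical names.

```python
import json
import subprocess


def parse_preview(stdout):
    """Parse the CLI's JSON output; raise if the 'error' field is set (assumed meaning)."""
    preview = json.loads(stdout)
    if preview.get("error") is not None:
        raise RuntimeError(f"preview failed: {preview['error']}")
    return preview


def run_preview(url):
    """Invoke `sneakpeek preview --url URL` and return the parsed preview dict."""
    result = subprocess.run(
        ["sneakpeek", "preview", "--url", url],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return parse_preview(result.stdout)
```

Assuming sneakpeek is on your PATH, run_preview("https://github.com/codingcoffee/")["title"] should give the same title as the jq output above.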
Docker
As a Server
docker run -it --rm -p 9000:9000 codingcoffee/sneakpeek -- serve --host 0.0.0.0
As a CLI
docker run -it --rm -p 9000:9000 codingcoffee/sneakpeek -- preview --url "https://github.com/codingcoffee"
Configuration
- Sign up for a Twitter developer account
- Create an app
- Set the following environment variables to your app's keys and tokens
TWITTER_CONSUMER_KEY="sample"
TWITTER_CONSUMER_SECRET="sample"
TWITTER_ACCESS_TOKEN="sample"
TWITTER_ACCESS_TOKEN_SECRET="sample"
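Before starting the server or the CLI, it can help to confirm that all four Twitter variables listed above are actually set. A small sketch; missing_twitter_credentials is a hypothetical helper, and it only checks that each variable is present and non-empty, not that the credentials are valid.

```python
import os

# The four variables from the Configuration section.
TWITTER_ENV_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def missing_twitter_credentials(env=None):
    """Return the names of Twitter variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in TWITTER_ENV_VARS if not env.get(name)]
```

Called with no argument, it checks the current process environment and returns an empty list when everything is set.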
Development
pip install -U poetry
git clone https://github.com/codingcoffee/sneakpeek
cd sneakpeek
poetry install
Running Tests
poetry run pytest
- Tested Websites
TODO
- Instagram (using instagram-scraper)
- https://joinfishbowl.com/post_v3ibj1p63t
- CI/CD for publishing to PyPI
Contribution
Have suggestions for optimizing the server image? Found a typo? Need special handling for a new website? Found a bug? Go ahead and create an issue! Contributions of any kind are welcome!
Want to work on a TODO? It's always a good idea to discuss what you are going to do before you start, so frustration can be avoided.
Some rules for coding:
- Use the code style the project uses
- Make a separate branch for each feature, so it can be reviewed on its own
- Write commits with good descriptions, so everyone can see what you did
License
The code in this repository is released under the MIT License.
Built Distribution
Hashes for sneakpeek-0.9.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 02202dabb3cb66eefe7b412ea7c9e99dac003a8f20ab5559584f4c0c2c7655d5
MD5 | 306120a1bdcf14737dedaeac764173a4
BLAKE2b-256 | 9a68225b950593b5d947d0a5b28655e1b98d4b6e3075cbf7e74877c90ce62bae
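To check that a downloaded wheel matches the SHA256 digest listed above, hash the file and compare. A sketch using hashlib; sha256_of_file and matches_digest are hypothetical helper names, and the filename in the usage line assumes the wheel was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for sneakpeek-0.9.0-py3-none-any.whl.
EXPECTED_SHA256 = "02202dabb3cb66eefe7b412ea7c9e99dac003a8f20ab5559584f4c0c2c7655d5"


def sha256_of_file(path, chunk_size=1 << 16):
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example: matches_digest("sneakpeek-0.9.0-py3-none-any.whl", EXPECTED_SHA256).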