Extract social media links from websites

Extract Social Media


Many websites link to their Facebook, Twitter, LinkedIn, and YouTube accounts, and these links can be invaluable for building a complete picture of a company.

This library extracts links or handles for the most commonly used international social media networks.

  • Free software: MIT license

  • Python versions: 2.7, 3.4+

Features

  • Extract social media links/handles from html content

  • Also attempts to extract links/handles from widgets, scripts, etc.

  • Supports most widely used social networks

    • facebook

    • linkedin

    • twitter

    • youtube

    • github

    • google plus

    • pinterest

    • instagram

    • snapchat

    • flipboard

    • flickr

    • weibo

    • periscope

    • telegram

    • soundcloud

    • feedburner

    • vimeo

    • slideshare

    • vkontakte

    • xing
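To illustrate the general idea behind the features above, here is a minimal sketch using only the standard library. It is not this package's implementation: `SOCIAL_DOMAINS`, `SocialLinkParser`, and `find_social_links` are hypothetical names, and the domain set is a small illustrative subset. The sketch scans every tag for `href` and `data-href` attributes and keeps the values whose host is a known social network.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical subset of domains, for illustration only.
SOCIAL_DOMAINS = {"facebook.com", "twitter.com", "linkedin.com",
                  "youtube.com", "github.com"}


class SocialLinkParser(HTMLParser):
    """Collect href/data-href values that point at a known social domain."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Ignore other attributes and empty or missing values.
            if name in ("href", "data-href") and value:
                host = (urlparse(value).hostname or "").lower()
                if host.startswith("www."):
                    host = host[4:]
                if host in SOCIAL_DOMAINS:
                    self.links.append(value)


def find_social_links(html):
    """Return matching links in document order (duplicates are kept)."""
    parser = SocialLinkParser()
    parser.feed(html)
    return parser.links
```

A real extractor would also need to handle handles embedded in scripts and widgets, which this sketch does not attempt.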

Quickstart

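import requests
from html_to_etree import parse_html_bytes
from extract_social_media import find_links_tree

# Fetch the page and parse the raw bytes into an element tree.
res = requests.get('https://techcrunch.com/contact/')
tree = parse_html_bytes(res.content, res.headers.get('content-type'))

# Collect the unique social media links found in the tree.
set(find_links_tree(tree))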

{'http://pinterest.com/techcrunch/',
 'http://www.youtube.com/user/techcrunch',
 'http://www.linkedin.com/company/techcrunch',
 'https://www.facebook.com/techcrunch',
 'https://flipboard.com/@techcrunch',
 'http://instagram.com/techcrunch',
 'https://plus.google.com/+TechCrunch',
 'https://instagram.com/techcrunch',
 'https://twitter.com/techcrunch'}
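
The result above lists the Instagram profile twice, once per URL scheme. One possible way to post-process such a set is to group links by host and ignore the scheme, a leading `www.`, and a trailing slash. This is a sketch using only the standard library; `group_by_network` is a hypothetical helper, not part of this package.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_network(links):
    """Map each host (minus a leading 'www.') to the set of paths seen for it."""
    grouped = defaultdict(set)
    for link in links:
        parts = urlparse(link)
        host = (parts.hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Dropping the scheme and trailing slash merges http/https variants.
        grouped[host].add(parts.path.rstrip("/"))
    return dict(grouped)


links = {
    'http://instagram.com/techcrunch',
    'https://instagram.com/techcrunch',
    'http://pinterest.com/techcrunch/',
    'https://twitter.com/techcrunch',
}
grouped = group_by_network(links)
```

Here `grouped['instagram.com']` holds a single path, so the two scheme variants collapse into one entry.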

Caveats

  • Currently returns every social media link found on a page, whether or not it belongs to the site owner.

    • Future work: find the most relevant links based on link location, link context, company name, etc.
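
Until relevance ranking exists, callers can filter the results themselves. The following is a sketch of one naive heuristic, assuming the company name is known in advance: keep only links whose path contains that name. `filter_by_company` is a hypothetical helper and not part of this package.

```python
import re
from urllib.parse import urlparse


def _normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def filter_by_company(links, company_name):
    """Keep links whose URL path contains the normalized company name."""
    needle = _normalize(company_name)
    if not needle:
        return []
    return [link for link in links
            if needle in _normalize(urlparse(link).path)]


links = [
    'https://twitter.com/techcrunch',
    'https://twitter.com/share',
    'https://plus.google.com/+TechCrunch',
    'https://www.facebook.com/some-author',
]
relevant = filter_by_company(links, 'TechCrunch')
```

This misses accounts whose handle differs from the company name, so it is best treated as a first-pass filter rather than a ranking.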

Credits

This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.

History

0.4.0 (2017-08-18)

  • naive blacklisting for photos, videos, search, tweets, etc.

0.3.0 (2017-08-18)

  • fixed exception when “href” is empty or non-string

0.2.0 (2017-06-08)

  • better test coverage

  • accepting data-href

0.1.0 (unreleased)

  • First release on PyPI.

