Extract social media links from websites

Extract Social Media

Extract social media links from websites.

Many websites link to their Facebook, Twitter, LinkedIn, and YouTube accounts, and these links can be invaluable for building a 360-degree picture of a company.

This library extracts links or handles for the most commonly used international social media networks.

  • Free software: MIT license
  • Python versions: 2.7, 3.4+

Features

  • Extracts social media links/handles from HTML content
  • Also attempts to extract links/handles from widgets, scripts, etc.
  • Supports most widely used social networks
    • facebook
    • linkedin
    • twitter
    • youtube
    • github
    • google plus
    • pinterest
    • instagram
    • snapchat
    • flipboard
    • flickr
    • weibo
    • periscope
    • telegram
    • soundcloud
    • feedburner
    • vimeo
    • slideshare
    • vkontakte
    • xing
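To illustrate what separates a "link" from a "handle", here is a minimal sketch that splits a profile URL into a network name and a handle using only the standard library. It is not the library's implementation: `KNOWN_HOSTS` and `split_network_and_handle` are hypothetical names, the host table covers only a few of the networks listed above, and the handle is assumed to be the first path segment.

```python
from urllib.parse import urlparse

# Hypothetical host-to-network table for illustration only;
# the library's real matching rules are not shown in this README.
KNOWN_HOSTS = {
    "facebook.com": "facebook",
    "twitter.com": "twitter",
    "instagram.com": "instagram",
    "github.com": "github",
    "pinterest.com": "pinterest",
}


def split_network_and_handle(url):
    """Return (network, handle) for a profile URL, or None if unrecognised."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    network = KNOWN_HOSTS.get(host)
    if network is None:
        return None
    # Assumption: the handle is the first non-empty path segment.
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        return None
    return network, segments[0].lstrip("@")


print(split_network_and_handle("https://twitter.com/techcrunch"))
```

Networks whose profile URLs carry a prefix segment (for example `/company/` or `/user/`) would need per-network rules that this sketch leaves out.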

Quickstart

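import requests
from html_to_etree import parse_html_bytes
from extract_social_media import find_links_tree  # import path assumed from the package name

# Fetch the page and parse the response bytes, passing along the Content-Type header
res = requests.get('https://techcrunch.com/contact/')
tree = parse_html_bytes(res.content, res.headers.get('content-type'))

# Collect the social media links found in the parsed tree
set(find_links_tree(tree))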

{'http://pinterest.com/techcrunch/',
 'http://www.youtube.com/user/techcrunch',
 'http://www.linkedin.com/company/techcrunch',
 'https://www.facebook.com/techcrunch',
 'https://flipboard.com/@techcrunch',
 'http://instagram.com/techcrunch',
 'https://plus.google.com/+TechCrunch',
 'https://instagram.com/techcrunch',
 'https://twitter.com/techcrunch'}
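The result above contains the same Instagram profile twice, once over http and once over https. If that matters for your use case, a post-processing step along the following lines could collapse such variants. This is a sketch, not part of the library: `canonical_key` and `dedupe_links` are hypothetical helpers, and they assume that scheme, a leading `www.`, and a trailing slash do not change which profile a link points to.

```python
from urllib.parse import urlparse


def canonical_key(url):
    """Key that ignores scheme, a leading 'www.', and a trailing slash."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # The path is left case-sensitive, since some networks use case-sensitive IDs.
    return host + parsed.path.rstrip("/")


def dedupe_links(links):
    """Keep one link per canonical key, preferring the https variant."""
    seen = {}
    for link in sorted(links):
        key = canonical_key(link)
        if key not in seen or link.startswith("https://"):
            seen[key] = link
    return sorted(seen.values())


links = {
    "http://instagram.com/techcrunch",
    "https://instagram.com/techcrunch",
    "https://twitter.com/techcrunch",
}
print(dedupe_links(links))
```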

Caveats

  • currently returns every social media link found on a page, with no ranking by relevance
    • finding the most relevant links based on link location, link context, company name, etc. still needs to be looked into
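Until the library offers relevance ranking, one of the signals mentioned above, the company name, can be approximated by the caller. The sketch below is an assumption about how that might look, not a feature of the library: `rank_by_company_name` is a hypothetical helper that simply moves links whose path contains the normalised company name to the front.

```python
import re
from urllib.parse import urlparse


def _normalise(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def rank_by_company_name(links, company_name):
    """Sort links so those mentioning the company name in their path come first."""
    needle = _normalise(company_name)

    def score(link):
        path = _normalise(urlparse(link).path)
        return 1 if needle and needle in path else 0

    # Higher score first; ties are broken alphabetically for a stable result.
    return sorted(links, key=lambda link: (-score(link), link))


links = [
    "https://twitter.com/wordpress",
    "https://twitter.com/techcrunch",
    "https://plus.google.com/+TechCrunch",
]
print(rank_by_company_name(links, "TechCrunch"))
```

Link location and link context would need access to the parsed tree rather than the bare URLs, so they are outside the scope of this sketch.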

Credits

This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.

History

0.4.0 (2017-08-18)

  • naive blacklisting for photos, videos, search, tweets, etc.

0.3.0 (2017-08-18)

  • fixed exception when “href” is empty or non-string

0.2.0 (2017-06-08)

  • better test coverage
  • accepting data-href

0.1.0 (unreleased)

  • First release on PyPI.
