Extract Social Media

Extract social media links from websites.

Many websites link to their Facebook, Twitter, LinkedIn, and YouTube accounts, and these links can be invaluable for building a 360-degree view of a company.

This library extracts links or handles for the most commonly used international social media networks.

  • Free software: MIT license
  • Python versions: 2.7, 3.4+

Features

  • Extract social media links/handles from html content
  • Also attempts to extract links/handles from widgets, scripts, etc.
  • Supports most widely used social networks
    • facebook
    • linkedin
    • twitter
    • youtube
    • github
    • google plus
    • pinterest
    • instagram
    • snapchat
    • flipboard
    • flickr
    • weibo
    • periscope
    • telegram
    • soundcloud
    • feedburner
    • vimeo
    • slideshare
    • vkontakte
    • xing

Quickstart

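import requests
from html_to_etree import parse_html_bytes
from extract_social_media import find_links_tree

# Fetch the page and parse it into an lxml tree, using the declared content type
res = requests.get('https://techcrunch.com/contact/')
tree = parse_html_bytes(res.content, res.headers.get('content-type'))

# Collect the social media links found in the tree, without duplicates
set(find_links_tree(tree))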

{'http://pinterest.com/techcrunch/',
 'http://www.youtube.com/user/techcrunch',
 'http://www.linkedin.com/company/techcrunch',
 'https://www.facebook.com/techcrunch',
 'https://flipboard.com/@techcrunch',
 'http://instagram.com/techcrunch',
 'https://plus.google.com/+TechCrunch',
 'https://instagram.com/techcrunch',
 'https://twitter.com/techcrunch'}
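
The result above contains the same Instagram profile twice, once over http and once over https. A minimal sketch of how such near-duplicates could be collapsed after extraction, assuming Python 3 and using only the standard library. The helpers canonical_key and dedupe_links are hypothetical names for illustration and are not part of this package:

```python
from urllib.parse import urlsplit


def canonical_key(url):
    """Return a (host, path) key that ignores scheme, a leading 'www.' and trailing slashes."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith('www.'):
        host = host[4:]
    return host, parts.path.rstrip('/')


def dedupe_links(links):
    """Keep one URL per canonical key, preferring the https variant when both exist."""
    best = {}
    for url in sorted(links):
        key = canonical_key(url)
        current = best.get(key)
        if current is None:
            best[key] = url
        elif url.startswith('https://') and not current.startswith('https://'):
            best[key] = url
    return sorted(best.values())
```

Calling dedupe_links on the set returned in the Quickstart would leave a single Instagram entry. Paths are compared case-sensitively, since some networks treat handles that way.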

Caveats

  • currently finds all social media links on a page, whether or not they are relevant
    • still to do: find the most relevant links based on link location, link context, company name, etc.
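
Until relevance ranking exists, callers can post-filter the results themselves. A rough sketch of the company-name idea mentioned above, assuming Python 3. The function filter_by_name is a hypothetical helper, not something this package provides, and a plain substring match like this will miss accounts whose handle differs from the company name:

```python
from urllib.parse import urlsplit


def filter_by_name(links, name):
    """Keep links whose URL path mentions `name`, ignoring case and non-alphanumeric characters."""
    needle = ''.join(ch for ch in name.lower() if ch.isalnum())
    if not needle:
        return []
    kept = []
    for url in links:
        path = urlsplit(url).path.lower()
        squashed = ''.join(ch for ch in path if ch.isalnum())
        if needle in squashed:
            kept.append(url)
    return kept
```

For example, filtering with the name 'Tech Crunch' keeps 'https://twitter.com/techcrunch' and drops a link to an unrelated account embedded in the same page.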

Credits

This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.

History

0.4.0 (2017-08-18)

  • naive blacklisting for photos, videos, search, tweets, etc.

0.3.0 (2017-08-18)

  • fixed exception when “href” is empty or non-string

0.2.0 (2017-06-08)

  • better test coverage
  • accepting data-href

0.1.0 (unreleased)

  • First release on PyPI.
