A module to extract social media URLs from a given URL

Project description

Social Media URL Extractor

Overview

The Social Media URL Extractor is a Python module that extracts social media URLs from a given webpage. It supports multiple social media platforms and can be used either as a module in your Python projects or as a command-line tool.

Features

  • Extracts URLs for various social media platforms such as Facebook, Twitter, LinkedIn, Instagram, YouTube, Pinterest, TikTok, Reddit, Snapchat, Tumblr, Medium, GitHub, Flickr, VK, Vimeo, Dailymotion, and Quora.
  • Configurable patterns through a YAML file.
  • Can be used as a Python module or a command-line tool.
  • Optional debug mode for detailed logging.

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/social-media-url-extractor.git
    cd social-media-url-extractor

  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use venv\Scripts\activate

  3. Install the dependencies:

    pip install -r requirements.txt

  4. Install the package in editable mode:

    pip install -e .

Usage

As a Command-Line Tool

Run the tool with a URL and an optional configuration file:

extract-social-media-urls https://www.example.com --config=src/social_media_url_extractor/config.yaml

Enable debug mode for detailed logging:

extract-social-media-urls https://www.example.com --config=src/social_media_url_extractor/config.yaml --debug
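
If the command-line tool needs to be driven from another Python script, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of the package: the helper names build_command and run_extractor are hypothetical, and only the command name and the --config and --debug flags are taken from the examples above. The format of the tool's output is not documented here, so the sketch returns it as raw text.

```python
import subprocess
from typing import List, Optional


def build_command(url: str, config: Optional[str] = None, debug: bool = False) -> List[str]:
    """Assemble the argument list for the extract-social-media-urls command.

    Hypothetical helper: only the command name and the two flags come from
    the usage examples in this README.
    """
    cmd = ["extract-social-media-urls", url]
    if config is not None:
        cmd.append(f"--config={config}")
    if debug:
        cmd.append("--debug")
    return cmd


def run_extractor(url: str, config: Optional[str] = None, debug: bool = False) -> str:
    """Run the tool and return its standard output as unparsed text.

    Requires the package to be installed so that the command is on PATH;
    raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_command(url, config, debug),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the URL contains characters such as & or ?.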

As a Python Module

Import the module and create an instance of SocialMediaURLExtractor:

from social_media_url_extractor import SocialMediaURLExtractor
extractor = SocialMediaURLExtractor(config_path='src/social_media_url_extractor/config.yaml')
urls = extractor.extract_urls('https://www.example.com')
for platform, links in urls.items():
    if links:
        print(f'{platform.capitalize()}:')
        for link in links:
            print(f'  - {link}')
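
The loop above suggests that extract_urls returns a mapping from platform name to a collection of links. Under that assumption, the result can be turned into JSON for storage or further processing. This is a sketch only: summarize is a hypothetical helper, and sample is a hand-written stand-in for a real result so that the example runs without network access.

```python
import json
from typing import Dict, List


def summarize(urls: Dict[str, List[str]]) -> str:
    """Drop platforms with no links, de-duplicate the rest, and return JSON."""
    found = {
        platform: sorted(set(links))
        for platform, links in urls.items()
        if links
    }
    return json.dumps(found, indent=2, sort_keys=True)


# Stand-in for the value returned by extractor.extract_urls(...);
# the real return type is assumed, not confirmed by the package.
sample = {
    "facebook": ["https://www.facebook.com/example"],
    "twitter": [],
    "github": ["https://github.com/example", "https://github.com/example"],
}

print(summarize(sample))
```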

Configuration

The URL patterns for different social media platforms are specified in a YAML configuration file. Here is an example config.yaml:

patterns:
  facebook:  
    - "https?://(www\\.)?facebook\\.com/profile\\.php\\?id=[0-9]+/?"  
    - "https?://(www\\.)?facebook\\.com/[a-zA-Z0-9_\\-\\.]+/?"  
    - "https?://(www\\.)?fb\\.com/[a-zA-Z0-9_\\-\\.]+/?"  
  twitter:  
    - "https?://(www\\.)?twitter\\.com/[a-zA-Z0-9_\\-]+/?"  
    - "https?://(www\\.)?x\\.com/[a-zA-Z0-9_\\-]+/?"  
  linkedin:  
    - "https?://(www\\.)?linkedin\\.com/in/[a-zA-Z0-9_\\-]+/?"  
    - "https?://(www\\.)?linkedin\\.com/company/[a-zA-Z0-9_\\-]+/?"  
    - "https?://(www\\.)?linkedin\\.com/school/[a-zA-Z0-9_\\-]+/?"  
  instagram:  
    - "https?://(www\\.)?instagram\\.com/[a-zA-Z0-9_\\-\\.]+/?"  
  youtube:  
    - "https?://(www\\.)?youtube\\.com/[a-zA-Z0-9_\\-]+/?"  
    - "https?://(www\\.)?youtube\\.com/c/[a-zA-Z0-9_\\-]+/?"  
    - "https?://(www\\.)?youtube\\.com/@[a-zA-Z0-9_\\-]+/?"  
  pinterest:  
    - "https?://(www\\.)?pinterest\\.com/[a-zA-Z0-9_\\-]+/?"  
    - "https?://in\\.pinterest\\.com/[a-zA-Z0-9_\\-]+/?"  
  tiktok:  
    - "https?://(www\\.)?tiktok\\.com/@[a-zA-Z0-9_\\-\\.]+/?"  
  reddit:  
    - "https?://(www\\.)?reddit\\.com/user/[a-zA-Z0-9_\\-]+/?"  
    - "https?://(www\\.)?reddit\\.com/r/[a-zA-Z0-9_\\-]+/?"  
  snapchat:  
    - "https?://(www\\.)?snapchat\\.com/add/[a-zA-Z0-9_\\-\\.]+/?"  
  tumblr:  
    - "https?://(www\\.)?[a-zA-Z0-9_\\-]+\\.tumblr\\.com/?"  
  medium:  
    - "https?://(www\\.)?medium\\.com/@[a-zA-Z0-9_\\-\\.]+/?"  
  github:  
    - "https?://(www\\.)?github\\.com/[a-zA-Z0-9_\\-\\.]+/?"  
  flickr:  
    - "https?://(www\\.)?flickr\\.com/photos/[a-zA-Z0-9_\\-]+/?"  
  vk:  
    - "https?://(www\\.)?vk\\.com/[a-zA-Z0-9_\\-]+/?"  
  vimeo:  
    - "https?://(www\\.)?vimeo\\.com/[a-zA-Z0-9_\\-]+/?"  
  dailymotion:  
    - "https?://(www\\.)?dailymotion\\.com/[a-zA-Z0-9_\\-]+/?"  
  quora:  
    - "https?://(www\\.)?quora\\.com/profile/[a-zA-Z0-9_\\-]+/?"  
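
To illustrate how patterns like these behave, here is a minimal sketch that applies a small subset of them to a piece of HTML with Python's re module. It is not the package's implementation, whose internals are not shown here: match_patterns and PATTERNS are hypothetical names, and the dictionary is written out by hand in the shape that loading the YAML above would be expected to produce.

```python
import re
from typing import Dict, List

# Hand-written subset of the configuration above. In a double-quoted YAML
# string "\\." denotes the two characters \. which is what these raw
# strings contain.
PATTERNS: Dict[str, List[str]] = {
    "twitter": [
        r"https?://(www\.)?twitter\.com/[a-zA-Z0-9_\-]+/?",
        r"https?://(www\.)?x\.com/[a-zA-Z0-9_\-]+/?",
    ],
    "github": [
        r"https?://(www\.)?github\.com/[a-zA-Z0-9_\-\.]+/?",
    ],
}


def match_patterns(text: str, patterns: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return, per platform, the distinct URLs matched by any of its patterns."""
    results: Dict[str, List[str]] = {}
    for platform, regexes in patterns.items():
        seen: List[str] = []
        for regex in regexes:
            # finditer + group(0) yields the whole match; re.findall would
            # return only the contents of the optional (www\.) group.
            for match in re.finditer(regex, text):
                url = match.group(0)
                if url not in seen:
                    seen.append(url)
        results[platform] = seen
    return results


html = '<a href="https://x.com/example">X</a> <a href="https://github.com/example/">GitHub</a>'
print(match_patterns(html, PATTERNS))
```

Because several patterns for one platform can match the same link (for example, the generic facebook.com pattern also matches the start of a profile.php URL), collecting matches into a de-duplicated list keeps the output tidy.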

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Download files

Source Distribution

social-media-url-extractor-0.1.1.tar.gz (4.9 kB)

  • SHA256: 1185324f670cde063ce7975f2142bc54056f57151a8182bb9b374f2b4c6d71ea
  • MD5: 3277000c517decdcdd6a38608e978b7d
  • BLAKE2b-256: f6942e74735f2921f350184db7d38f288557ab64bea6606a8f00e419432cffd9

Built Distribution

social_media_url_extractor-0.1.1-py3-none-any.whl (5.6 kB, Python 3)

  • SHA256: f07b6217d679d75dd7fb2f4615a007d59f9262d2024c842509b52eb8fde242a7
  • MD5: ad4273620cbd818c6e495ae26d6c63dd
  • BLAKE2b-256: 92acea9a9a2937331ecc372808e2134583d798803971ff25d5f965169103211d
