A module to extract social media URLs from a given URL
Social Media URL Extractor
Overview
The Social Media URL Extractor is a Python module that extracts social media URLs from a given webpage. It supports multiple social media platforms and can be used both as a module in your Python projects and as a command-line tool.
Features
- Extracts URLs for various social media platforms such as Facebook, Twitter, LinkedIn, Instagram, YouTube, Pinterest, TikTok, Reddit, Snapchat, Tumblr, Medium, GitHub, Flickr, VK, Vimeo, Dailymotion, and Quora.
- Configurable patterns through a YAML file.
- Can be used as a Python module or a command-line tool.
- Optional debug mode for detailed logging.
Installation
- Clone the repository:
  git clone https://github.com/yourusername/social-media-url-extractor.git
  cd social-media-url-extractor
- Create and activate a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows use venv\Scripts\activate
- Install the dependencies:
  pip install -r requirements.txt
- Install the package in editable mode:
  pip install -e .
Usage
As a Command-Line Tool
Run the tool with a URL and an optional configuration file:
extract-social-media-urls https://www.example.com --config=src/social_media_url_extractor/config.yaml
Enable debug mode for detailed logging:
extract-social-media-urls https://www.example.com --config=src/social_media_url_extractor/config.yaml --debug
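The README does not show how the command-line entry point is built. As a hedged illustration only, the two flags above could be parsed with the standard `argparse` module as below; `build_parser`, `configure_logging`, and `main` are hypothetical names, not part of the package's documented API.

```python
import argparse
import logging


# Hypothetical sketch: the real `extract-social-media-urls` entry point may be
# implemented differently; it only mirrors the flags shown in the README.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="extract-social-media-urls",
        description="Extract social media URLs from a webpage.",
    )
    parser.add_argument("url", help="URL of the page to scan")
    parser.add_argument("--config", default=None,
                        help="path to a YAML file with URL patterns")
    parser.add_argument("--debug", action="store_true",
                        help="enable detailed (DEBUG-level) logging")
    return parser


def configure_logging(debug: bool) -> int:
    """Pick a log level from the --debug flag, apply it, and return it."""
    level = logging.DEBUG if debug else logging.INFO
    logging.basicConfig(level=level)  # no-op if the root logger already has handlers
    logging.getLogger().setLevel(level)  # apply the level even in that case
    return level


def main(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    configure_logging(args.debug)
    logging.debug("Scanning %s (config: %s)", args.url, args.config)
    # A real tool would now run the extractor; that step is omitted here.
    return args
```

With this sketch, `main(["https://www.example.com", "--debug"])` parses the URL and switches the root logger to DEBUG level; `--config` is optional and defaults to `None`.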
As a Python Module
Import the module and create an instance of SocialMediaURLExtractor:
from social_media_url_extractor import SocialMediaURLExtractor

extractor = SocialMediaURLExtractor(config_path='src/social_media_url_extractor/config.yaml')
urls = extractor.extract_urls('https://www.example.com')

for platform, links in urls.items():
    if links:
        print(f'{platform.capitalize()}:')
        for link in links:
            print(f'  - {link}')
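The README does not say how `extract_urls` finds candidate links on a page. One plausible step, sketched here with only the standard library and assuming links come from `<a href>` attributes, is to parse the HTML and resolve every link to an absolute URL. `LinkCollector` and `collect_links` are hypothetical names; the real module may use a different parser or fetching mechanism.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF> is seen too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

To try it on a live page, the HTML could first be downloaded with `urllib.request.urlopen(url).read().decode()` (assuming a UTF-8 page) and passed to `collect_links` together with the page URL.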
Configuration
The URL patterns for different social media platforms are specified in a YAML configuration file. Here is an example config.yaml:
patterns:
  facebook:
    - "https?://(www\\.)?facebook\\.com/profile\\.php\\?id=[0-9]+/?"
    - "https?://(www\\.)?facebook\\.com/[a-zA-Z0-9_\\-\\.]+/?"
    - "https?://(www\\.)?fb\\.com/[a-zA-Z0-9_\\-\\.]+/?"
  twitter:
    - "https?://(www\\.)?twitter\\.com/[a-zA-Z0-9_\\-]+/?"
    - "https?://(www\\.)?x\\.com/[a-zA-Z0-9_\\-]+/?"
  linkedin:
    - "https?://(www\\.)?linkedin\\.com/in/[a-zA-Z0-9_\\-]+/?"
    - "https?://(www\\.)?linkedin\\.com/company/[a-zA-Z0-9_\\-]+/?"
    - "https?://(www\\.)?linkedin\\.com/school/[a-zA-Z0-9_\\-]+/?"
  instagram:
    - "https?://(www\\.)?instagram\\.com/[a-zA-Z0-9_\\-\\.]+/?"
  youtube:
    - "https?://(www\\.)?youtube\\.com/[a-zA-Z0-9_\\-]+/?"
    - "https?://(www\\.)?youtube\\.com/c/[a-zA-Z0-9_\\-]+/?"
    - "https?://(www\\.)?youtube\\.com/@[a-zA-Z0-9_\\-]+/?"
  pinterest:
    - "https?://(www\\.)?pinterest\\.com/[a-zA-Z0-9_\\-]+/?"
    - "https?://in\\.pinterest\\.com/[a-zA-Z0-9_\\-]+/?"
  tiktok:
    - "https?://(www\\.)?tiktok\\.com/@[a-zA-Z0-9_\\-\\.]+/?"
  reddit:
    - "https?://(www\\.)?reddit\\.com/user/[a-zA-Z0-9_\\-]+/?"
    - "https?://(www\\.)?reddit\\.com/r/[a-zA-Z0-9_\\-]+/?"
  snapchat:
    - "https?://(www\\.)?snapchat\\.com/add/[a-zA-Z0-9_\\-\\.]+/?"
  tumblr:
    - "https?://(www\\.)?[a-zA-Z0-9_\\-]+\\.tumblr\\.com/?"
  medium:
    - "https?://(www\\.)?medium\\.com/@[a-zA-Z0-9_\\-\\.]+/?"
  github:
    - "https?://(www\\.)?github\\.com/[a-zA-Z0-9_\\-\\.]+/?"
  flickr:
    - "https?://(www\\.)?flickr\\.com/photos/[a-zA-Z0-9_\\-]+/?"
  vk:
    - "https?://(www\\.)?vk\\.com/[a-zA-Z0-9_\\-]+/?"
  vimeo:
    - "https?://(www\\.)?vimeo\\.com/[a-zA-Z0-9_\\-]+/?"
  dailymotion:
    - "https?://(www\\.)?dailymotion\\.com/[a-zA-Z0-9_\\-]+/?"
  quora:
    - "https?://(www\\.)?quora\\.com/profile/[a-zA-Z0-9_\\-]+/?"
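How the module applies these patterns is not documented here. A minimal sketch, assuming each pattern is compiled with Python's `re` module and tested against every collected link with `fullmatch`, could look like the following. `CONFIG`, `compile_patterns`, and `classify` are hypothetical names, and the dict literal stands in for what a YAML loader such as PyYAML's `yaml.safe_load` would return for (part of) the file above; note that YAML's `\\.` inside double quotes becomes the regex `\.`, which is what the raw strings spell out.

```python
import re

# Stand-in for the parsed config.yaml (only two platforms copied here).
CONFIG = {
    "patterns": {
        "twitter": [
            r"https?://(www\.)?twitter\.com/[a-zA-Z0-9_\-]+/?",
            r"https?://(www\.)?x\.com/[a-zA-Z0-9_\-]+/?",
        ],
        "github": [
            r"https?://(www\.)?github\.com/[a-zA-Z0-9_\-\.]+/?",
        ],
    }
}


def compile_patterns(config: dict) -> dict[str, list[re.Pattern]]:
    """Compile every pattern string once, grouped by platform name."""
    return {
        platform: [re.compile(p) for p in patterns]
        for platform, patterns in config["patterns"].items()
    }


def classify(links, compiled) -> dict[str, list[str]]:
    """Group links by platform; each link is kept at most once per platform."""
    result = {platform: [] for platform in compiled}
    for link in links:
        for platform, regexes in compiled.items():
            # fullmatch: the whole link must fit the pattern, not just a prefix.
            if any(rx.fullmatch(link) for rx in regexes) and link not in result[platform]:
                result[platform].append(link)
    return result
```

Using `fullmatch` rejects links with extra path segments (for example the URL of a single tweet), while `re.search` would accept any link that merely starts with a profile-like prefix. Which behavior the actual module has is not stated in this README.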
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Source distribution: social-media-url-extractor-0.1.1.tar.gz