Extract links from bio link services like Linktree
Biosites
A Python package for extracting links from bio link aggregator services such as Linktree, Inpock, Lit.link, and others.
Features
- Extract links from 10+ popular bio link services
- Automatic redirect resolution for shortened URLs
- Configurable user agents
- Async/await support (the extraction API is async-only)
- Type-safe with Pydantic models
- Command-line interface
Supported Services
- Linktree (linktr.ee, linktree.com)
- Lit.link (lit.link)
- Littly (litt.ly)
- Inpock (link.inpock.co.kr)
- Bio.site (bio.site)
- Instabio (instabio.cc)
- LinkBio (linkbio.co)
- Link.me (link.me)
- Generic HTML link extraction for unsupported services
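As a rough illustration of how the hostnames listed above could map to a service, here is a minimal sketch using only the standard library. The `SERVICE_HOSTS` table, the service labels, and `detect_service` are hypothetical names made up for this example; they are not part of the biosites API, and the package's own matching rules may differ.

```python
from urllib.parse import urlparse

# Hostnames come from the list above; the labels and the mapping are illustrative.
SERVICE_HOSTS = {
    "linktr.ee": "linktree",
    "linktree.com": "linktree",
    "lit.link": "litlink",
    "litt.ly": "littly",
    "link.inpock.co.kr": "inpock",
    "bio.site": "biosite",
    "instabio.cc": "instabio",
    "linkbio.co": "linkbio",
    "link.me": "linkme",
}


def detect_service(url: str) -> str:
    """Return a service label for the URL, or "generic" if the host is unknown."""
    host = (urlparse(url).hostname or "").lower()
    # Treat "www.linktr.ee" the same as "linktr.ee".
    if host.startswith("www."):
        host = host[4:]
    return SERVICE_HOSTS.get(host, "generic")


print(detect_service("https://linktr.ee/username"))
print(detect_service("https://example.com/me"))
```

Matching on the parsed hostname rather than on a substring of the whole URL avoids false positives such as a supported domain appearing in a query string.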
Installation
pip install biosites
Or with poetry/uv:
poetry add biosites
# or
uv add biosites
Usage
Command Line
Extract links from a bio page:
biosites https://linktr.ee/username
Extract from shortened URL (automatically follows redirects):
biosites https://bit.ly/shortened-link
Python API
import asyncio

from biosites import LinkExtractor


async def main():
    extractor = LinkExtractor()

    # Extract links from a bio page
    result = await extractor.extract("https://linktr.ee/username")

    # Access extracted links
    for link in result.links:
        print(f"{link.title}: {link.url}")
        if link.metadata:
            print(f"  Metadata: {link.metadata}")

    # Check service type
    print(f"Service: {result.service_type}")


asyncio.run(main())
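Because extraction is a coroutine, several pages can be processed concurrently. The sketch below is one way to do that; it assumes only that the extractor object has an awaitable `extract(url)` method, as in the example above. `extract_many` and its `limit` parameter are hypothetical helpers, not part of biosites.

```python
import asyncio
from typing import Any


async def extract_many(extractor: Any, urls: list[str], limit: int = 5) -> dict[str, Any]:
    """Extract several bio pages concurrently, at most `limit` at a time.

    Returns a dict mapping each URL to its result, or to the exception
    raised for that URL.
    """
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> tuple[str, Any]:
        async with semaphore:
            try:
                return url, await extractor.extract(url)
            except Exception as exc:  # keep going if one page fails
                return url, exc

    pairs = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(pairs)
```

With the real package this might be called as `await extract_many(LinkExtractor(), urls)` inside a coroutine. The semaphore keeps the number of simultaneous requests bounded, which is friendlier to the remote services.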
Custom User Agent
from biosites import LinkExtractor

extractor = LinkExtractor(
    user_agent="MyBot/1.0 (https://example.com/bot)"
)
Check if URL is Supported
from biosites import LinkExtractor

extractor = LinkExtractor()
supported, service = extractor.can_handle("https://linktr.ee/username")
if supported:
    print(f"URL will be handled by {service}")
Handle Redirects
The package automatically follows redirects from URL shorteners:
# Run inside a coroutine, with an extractor created as in the examples above
result = await extractor.extract("https://bit.ly/shortened")
# The shortened URL is followed to the final bio link service

# Access redirect information
if result.metadata and "redirect_chain" in result.metadata:
    print(f"Original URL: {result.metadata['original_url']}")
    print(f"Redirect chain: {result.metadata['redirect_chain']}")
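To make the idea of a redirect chain concrete, here is a minimal, network-free sketch. It is not the package's Redirect Handler: `follow_redirects`, `next_location`, and `max_hops` are hypothetical names, and a real implementation would issue HTTP requests and read `Location` headers instead of consulting a lookup function.

```python
from typing import Callable, Optional


def follow_redirects(
    start_url: str,
    next_location: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> list[str]:
    """Return every URL visited, from `start_url` to the final destination.

    `next_location` returns the redirect target for a URL, or None when
    the URL does not redirect.
    """
    chain = [start_url]
    while True:
        target = next_location(chain[-1])
        if target is None:
            return chain
        if target in chain:
            raise ValueError(f"redirect loop detected at {target}")
        # len(chain) - 1 is the number of redirects followed so far.
        if len(chain) - 1 >= max_hops:
            raise ValueError(f"more than {max_hops} redirects")
        chain.append(target)


# A dict stands in for the network: key redirects to value.
table = {"https://bit.ly/shortened": "https://linktr.ee/username"}
print(follow_redirects("https://bit.ly/shortened", table.get))
```

Capping the number of hops and checking for loops keeps a misbehaving shortener from stalling extraction.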
Development
Setup
# Clone the repository
git clone https://github.com/yourusername/biosites.git
cd biosites
# Install dependencies
uv venv
uv pip install -e ".[dev]"
Running Tests
pytest
Type Checking
mypy biosites
Linting
ruff check biosites tests
ruff format biosites tests
Architecture
The package uses a modular architecture:
- Base Extractor: Abstract base class defining the interface
- Service Extractors: Specialized extractors for each bio service
- Link Extractor: Main entry point that routes to appropriate extractor
- Redirect Handler: Handles URL shorteners and redirects
- Models: Pydantic models for type safety
Each extractor implements:
- can_handle(url): Check if the extractor supports the URL
- extract_links(html, url): Extract links from the HTML content
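The routing described above can be sketched as follows. This is an illustration under assumptions, not the package's code: `BaseExtractor`, `LinktreeExtractor`, `GenericExtractor`, `REGISTRY`, and `pick_extractor` are hypothetical stand-ins, and the real classes and registration mechanism may look different.

```python
from abc import ABC, abstractmethod


class BaseExtractor(ABC):
    """Hypothetical stand-in for the abstract base class."""

    @classmethod
    @abstractmethod
    def can_handle(cls, url: str) -> bool:
        """Return True if this extractor supports the URL."""

    @abstractmethod
    async def extract_links(self, html: str, url: str) -> list:
        """Return the links found in the HTML content."""


class LinktreeExtractor(BaseExtractor):
    @classmethod
    def can_handle(cls, url: str) -> bool:
        return "linktr.ee" in url.lower()

    async def extract_links(self, html: str, url: str) -> list:
        return []  # service-specific parsing would go here


class GenericExtractor(BaseExtractor):
    @classmethod
    def can_handle(cls, url: str) -> bool:
        return True  # fallback for unsupported services

    async def extract_links(self, html: str, url: str) -> list:
        return []  # generic anchor parsing would go here


# Order matters: specialized extractors first, the generic fallback last.
REGISTRY = [LinktreeExtractor, GenericExtractor]


def pick_extractor(url: str, registry=REGISTRY) -> BaseExtractor:
    """Instantiate the first registered extractor that accepts the URL."""
    for extractor_cls in registry:
        if extractor_cls.can_handle(url):
            return extractor_cls()
    raise LookupError(f"no extractor for {url}")
```

Keeping `can_handle` a classmethod lets the router test a URL without constructing every extractor first.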
Adding New Services
To add support for a new bio link service:
- Create a new extractor in biosites/extractors/
- Inherit from BaseLinkExtractor
- Implement can_handle() and extract_links() methods
- Register the extractor in biosites/extractor.py
Example:
from biosites.base import BaseLinkExtractor
from biosites.models import ExtractedLink


class NewServiceExtractor(BaseLinkExtractor):
    @classmethod
    def can_handle(cls, url: str) -> bool:
        return "newservice.com" in url.lower()

    async def extract_links(self, html: str, url: str) -> list[ExtractedLink]:
        # Parse HTML and extract links
        links = []
        # ... extraction logic ...
        return links
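The extraction logic itself depends on the markup of each service. As one possible starting point, the sketch below collects plain anchor tags using only the standard library. It is an assumption-laden illustration: `Link` is a hypothetical stand-in for ExtractedLink (whose real fields may differ), `extract_anchor_links` is not a biosites function, and the package itself lists beautifulsoup4 and selectolax as dependencies rather than `html.parser`.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


@dataclass
class Link:
    """Hypothetical stand-in for ExtractedLink."""
    title: str
    url: str


class _AnchorCollector(HTMLParser):
    """Collect the href and visible text of every <a href=...> element."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list = []
        self._href: Optional[str] = None
        self._text: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            self.links.append(Link(title=title, url=self._href))
            self._href = None


def extract_anchor_links(html: str, base_url: str) -> list:
    parser = _AnchorCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A service-specific extractor would typically narrow this down, for example by reading only the elements or embedded data that the service uses for its link buttons.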
License
MIT
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/new-service)
- Write tests for your changes
- Ensure all tests pass and type checking is clean
- Commit your changes with descriptive messages
- Push to your branch and create a Pull Request
Requirements
- Python 3.10+
- aiohttp
- pydantic
- beautifulsoup4
- selectolax
- click
File details
Details for the file biosites-1.0.0.tar.gz.
File metadata
- Download URL: biosites-1.0.0.tar.gz
- Upload date:
- Size: 302.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c9cc8504acb77a3ddd657da2db82df63588512943eabe038868b9732fed9ab3 |
| MD5 | 25d41f2fd07ce57999ae7742533e9599 |
| BLAKE2b-256 | 0b3b46d5497febcc65fa36a197f8a66b2d51c17b7aa624b76033adeaf32f99c5 |
Provenance
The following attestation bundles were made for biosites-1.0.0.tar.gz:
Publisher: publish.yml on ssut/biosites
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biosites-1.0.0.tar.gz
- Subject digest: 7c9cc8504acb77a3ddd657da2db82df63588512943eabe038868b9732fed9ab3
- Sigstore transparency entry: 748691478
- Sigstore integration time:
- Permalink: ssut/biosites@e690db09856c2f89608f91e81c5f7b4beed69c8e
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/ssut
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e690db09856c2f89608f91e81c5f7b4beed69c8e
- Trigger Event: push
File details
Details for the file biosites-1.0.0-py3-none-any.whl.
File metadata
- Download URL: biosites-1.0.0-py3-none-any.whl
- Upload date:
- Size: 28.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9abe9ad81728837d7c65078f31e8bd214a407dfcaff5957f288b7a4101f5c2af |
| MD5 | 5eedf9a8e613f7f2ce6ce4206a9c0ad1 |
| BLAKE2b-256 | 80b82bd94178bc1aac9300dd91ee41bc61fcd040e8ee859e570ffcaf7f658e82 |
Provenance
The following attestation bundles were made for biosites-1.0.0-py3-none-any.whl:
Publisher: publish.yml on ssut/biosites
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: biosites-1.0.0-py3-none-any.whl
- Subject digest: 9abe9ad81728837d7c65078f31e8bd214a407dfcaff5957f288b7a4101f5c2af
- Sigstore transparency entry: 748691479
- Sigstore integration time:
- Permalink: ssut/biosites@e690db09856c2f89608f91e81c5f7b4beed69c8e
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/ssut
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e690db09856c2f89608f91e81c5f7b4beed69c8e
- Trigger Event: push