copyright-stats-extractor parses headlines/articles on digital copyright enforcement to auto‑extract stats like takedown counts, year, and parties.
Copyright Stats Extractor
A lightweight utility package that parses news headlines or short articles about digital copyright enforcement and automatically extracts key statistics such as the number of takedown requests processed, the year, and the entities involved.
The extractor uses a large language model (LLM) under the hood; by default it uses ChatLLM7 from the langchain_llm7 package, but you can plug in any LangChain chat model you prefer.
📦 Installation
pip install copyright_stats_extractor
🚀 Getting Started
from copyright_stats_extractor import copyright_stats_extractor
# Example text to analyse
user_input = """
In 2023, the Digital Society Agency issued 12,000 takedown requests against
unauthorized streaming sites. Major platforms such as StreamTop and IndiePlay
reported compliance with 95% of the requests. These actions were part of
the global crackdown on digital piracy led by the International Digital
Rights Alliance (IDRA).
"""
# Use the default LLM7 implementation
stats = copyright_stats_extractor(user_input)
print(stats)
Output
[ "year: 2023", "takedown_requests: 12,000", "platforms_involved: StreamTop, IndiePlay", "authority: International Digital Rights Alliance (IDRA)" ]
🔌 Using a Custom LLM
You can provide any LangChain chat model. Examples:
OpenAI
from langchain_openai import ChatOpenAI
from copyright_stats_extractor import copyright_stats_extractor
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
stats = copyright_stats_extractor(user_input, llm=llm)
Anthropic
from langchain_anthropic import ChatAnthropic
from copyright_stats_extractor import copyright_stats_extractor
llm = ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0.2)
stats = copyright_stats_extractor(user_input, llm=llm)
Google Gemini
from langchain_google_genai import ChatGoogleGenerativeAI
from copyright_stats_extractor import copyright_stats_extractor
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.2)
stats = copyright_stats_extractor(user_input, llm=llm)
Note: Any LangChain-compliant chat model can be supplied via the llm argument.
⚙️ Configuration
| Parameter | Type | Optional? | Default | Description |
|---|---|---|---|---|
| user_input | str | Required | – | Text to analyze. |
| api_key | Optional[str] | Yes | None | API key for the default ChatLLM7. If omitted, the package first looks for the LLM7_API_KEY environment variable, then falls back to "None" (you will get an error if no key is available). |
| llm | Optional[BaseChatModel] | Yes | None | Custom LangChain chat model to use instead of the default ChatLLM7. |
By default, ChatLLM7 runs on the free tier, which is adequate for most use cases. For higher throughput, supply a personal API key:
export LLM7_API_KEY="your_api_key_here"
or pass it directly:
stats = copyright_stats_extractor(user_input, api_key="your_api_key_here")
You can obtain a free API key by registering at https://token.llm7.io/.
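The lookup order described in the configuration table can be sketched as follows. This is only an illustration of the documented behaviour, not the package's actual source; the helper name resolve_api_key is hypothetical.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented key lookup order."""
    # 1. An explicitly passed key takes precedence.
    if api_key is not None:
        return api_key
    # 2. Otherwise use the LLM7_API_KEY environment variable;
    # 3. if that is unset, fall back to the literal string "None".
    return os.getenv("LLM7_API_KEY", "None")
```

Calling resolve_api_key("your_api_key_here") returns the explicit key regardless of the environment, which matches passing api_key directly to copyright_stats_extractor.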
📄 Documentation of Output
The function returns a list of strings, each string containing a key‑value pair extracted from the input. The keys correspond to the statistics recognized by the model (e.g. year, takedown_requests, platforms_involved, authority). The format of each string is controlled by an internal prompt that enforces a regular‑expression pattern. If you need a different output structure, customize the prompt and the regex accordingly.
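If a dictionary is more convenient than a list of strings, the result can be post-processed. The sketch below assumes every item has the "key: value" shape shown in the example output; stats_to_dict and parse_count are hypothetical helpers, not part of the package.

```python
import re
from typing import Dict, List

# Assumed item shape: "<key>: <value>", as in the example output.
PAIR_PATTERN = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*:\s*(.+?)\s*$")


def stats_to_dict(stats: List[str]) -> Dict[str, str]:
    """Convert ["key: value", ...] into {"key": "value", ...}."""
    parsed: Dict[str, str] = {}
    for item in stats:
        match = PAIR_PATTERN.match(item)
        if match is None:
            # Skip items that do not follow the assumed shape.
            continue
        key, value = match.groups()
        parsed[key] = value
    return parsed


def parse_count(value: str) -> int:
    """Turn a count with thousands separators, e.g. "12,000", into an int."""
    return int(value.replace(",", ""))
```

For example, parse_count(stats_to_dict(stats)["takedown_requests"]) gives an integer that can be summed or compared across articles, provided the model actually returned that key.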
📈 Limitations
- The extraction accuracy depends on the quality of the LLM prompt and the input text length.
- The default free tier for ChatLLM7 may impose request limits; if you hit them, upgrade your API key.
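When requests fail intermittently, for instance because of free-tier limits, a retry with exponential backoff can help before switching keys. The wrapper below is a minimal sketch: extract_with_retry is a hypothetical helper, and because the exception type raised on a rate limit is not documented here, it catches any Exception.

```python
import time
from typing import Callable, List


def extract_with_retry(
    extract: Callable[[str], List[str]],
    text: str,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call extract(text), retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return extract(text)
        except Exception:
            # Assumption: any exception may be transient (e.g. a rate limit).
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Usage would look like stats = extract_with_retry(copyright_stats_extractor, user_input). The sleep parameter is injectable so the wrapper can be tested without real delays.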
🐛 Issues
Please file bugs or feature requests at the GitHub issues tracker:
https://github.com/chigwell/copyright-stats-extractor/issues
📢 Author
- Eugene Evstafev
Email: hi@euegne.plus
GitHub: https://github.com/chigwell
File details
Details for the file copyright_stats_extractor-2025.12.22110614.tar.gz.
File metadata
- Download URL: copyright_stats_extractor-2025.12.22110614.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 03832f464e0cde7da40f72be7bd0c7889c039bfbd7f0dede414b8b6b33fdbe23 |
| MD5 | ed67856bc2f9a9e3bec3f63b2ae1c320 |
| BLAKE2b-256 | 7800bacb22d06b9d8c6e5d83ef1382f1fdfc97021613f35550195abfd23f0bc0 |
File details
Details for the file copyright_stats_extractor-2025.12.22110614-py3-none-any.whl.
File metadata
- Download URL: copyright_stats_extractor-2025.12.22110614-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85bb9f33cddfe6bb95e05c8b915987784f0dcbae555c296e8534d87cd5f8c02e |
| MD5 | 633b847220e50dfee7afe3082ae653c8 |
| BLAKE2b-256 | 769090d1785ec0a91dea8bcefdf2d1ed393fca20fe24e4aad11ccdffa4b027ca |