
isitpublic


A lightweight, standalone Python library for determining whether works are likely in the public domain, using multiple heuristics and validation methods.

Overview

The isitpublic library provides focused tools to assess whether a work is likely in the public domain based on:

  • Title and content analysis for public domain indicators
  • Heuristic checks for historical authors and time periods
  • Copyright calculations based on author death years or publication dates
  • Jurisdiction-specific copyright law analysis
  • Advanced analysis for databases, audio/video, and software
  • JSON-based data storage for configuration and results

Installation

pip install isitpublic

Quick Start

import asyncio
from isitpublic import PublicDomainValidator, ContentItem

# Create a validator instance
validator = PublicDomainValidator()

# Create a content item to validate
item = ContentItem(
    title="Shakespeare's Hamlet",
    content="A classic play by William Shakespeare",
    snippet="To be or not to be..."
)

# Check if the item is likely in the public domain (async function)
async def main():
    is_pd = await validator.is_likely_public_domain(item)
    print(f"Is likely public domain: {is_pd}")  # True

# Run the async function
asyncio.run(main())

Core Features

1. Content-based Validation

The library checks titles and content for public domain indicators:

import asyncio
from isitpublic import validate_public_domain_status, ContentItem

item = ContentItem(
    title="A Public Domain Work",
    content="This work is in the public domain"
)

# Async function for validation
async def main():
    is_pd = await validate_public_domain_status(item)
    print(is_pd)  # True

# Run the async function
asyncio.run(main())

2. Heuristic Analysis

The library applies heuristics based on:

  • Historical authors (Shakespeare, Darwin, etc.)
  • Time periods (19th century, ancient, etc.)
  • Content types (biblical, folk tales, etc.)
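
To make the idea concrete, here is a toy sketch of this style of heuristic. The keyword lists and function name are illustrative only, not the library's actual rule set:

```python
# Illustrative keyword lists -- not the library's actual data
HISTORICAL_AUTHORS = {"shakespeare", "darwin", "austen", "dickens"}
PERIOD_HINTS = {"19th century", "ancient", "biblical", "folk tale"}

def heuristic_pd_hint(title: str, content: str = "") -> bool:
    """Return True if the text mentions a historical author or period."""
    text = f"{title} {content}".lower()
    if any(author in text for author in HISTORICAL_AUTHORS):
        return True
    return any(hint in text for hint in PERIOD_HINTS)

print(heuristic_pd_hint("Hamlet", "A play by William Shakespeare"))  # True
```

Real heuristics combine many such signals and weigh them against counter-indicators, but the shape is the same: cheap textual checks that flag a work for closer analysis.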

3. Copyright Calculation

Calculate public domain status based on copyright information:

from isitpublic import calculate_pd_from_metadata

metadata = {
    "author_death_year": 1601,  # Over 400 years ago!
    "publication_year": 1600,
    "country": "worldwide"
}

result = calculate_pd_from_metadata(metadata)
print(result)  # {'is_public_domain': True, 'pd_year': 1672, ...}
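
The `pd_year` of 1672 is consistent with a life+70 term that runs to the end of the calendar year: the work enters the public domain on 1 January of `death_year + 71`. A minimal sketch of that arithmetic (not the library's internals):

```python
def life_plus_70_pd_year(author_death_year: int) -> int:
    # Copyright terms typically run to the end of the calendar year, so a
    # life+70 work enters the public domain on 1 January of death_year + 71.
    return author_death_year + 70 + 1

print(life_plus_70_pd_year(1601))  # 1672
```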

4. Jurisdiction-Specific Analysis

Comprehensive analysis across multiple jurisdictions:

from isitpublic import PublicDomainValidator

validator = PublicDomainValidator()

# Generate a comprehensive jurisdiction report
report = validator.generate_jurisdiction_report(
    author_death_year=1616,  # Shakespeare died in 1616
    work_title="Shakespeare's Works",
    work_type="individual"
)

print(f"PD in {report['risk_assessment']['public_domain_percentage']}% of jurisdictions")
print(f"Legal recommendation: {report['legal_recommendations'][0]}")
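
A percentage like the one above can be approximated by comparing the author's death year against each jurisdiction's life+N term. This is a simplified sketch with a few illustrative terms; the library's actual jurisdiction data and edge-case handling are more extensive:

```python
# Illustrative life+N terms; real terms vary by jurisdiction and change
# over time (e.g. Mexico uses life+100, India life+60)
TERMS = {"US": 70, "DE": 70, "MX": 100, "IN": 60}

def pd_percentage(author_death_year: int, current_year: int) -> float:
    """Share of listed jurisdictions where the term has already run out."""
    in_pd = sum(1 for term in TERMS.values()
                if author_death_year + term < current_year)
    return 100.0 * in_pd / len(TERMS)

print(pd_percentage(1616, 2024))  # 100.0
print(pd_percentage(1950, 2024))  # 75.0
```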

5. Database and Compilation Rights Recognition

Advanced analysis of database and compilation rights beyond standard copyright:

# Analyze database rights that exist in addition to copyright
db_analysis = validator.analyze_database_compilation_rights(
    title="Historical Database",
    creation_year=2000,
    compilation_type="database",
    jurisdiction="DE",  # EU jurisdiction with database rights
    substantial_investment_claim=True,
    is_licensed_dataset=False
)

print(f"Has database rights: {db_analysis['database_rights_analysis']['has_rights']}")
print(f"In public domain: {db_analysis['database_rights_analysis']['is_public_domain']}")
print(f"Risk level: {db_analysis['risk_level']}")
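
The core of the EU analysis is a term calculation that is independent of copyright. As a hedged sketch of that one rule (the library also weighs investment claims, licensing, and jurisdiction differences):

```python
def eu_database_right_expired(creation_year: int, current_year: int) -> bool:
    # The EU sui generis database right (Directive 96/9/EC) lasts 15 years
    # from completion; substantial new investment can restart the clock,
    # which this sketch ignores.
    return creation_year + 15 < current_year

print(eu_database_right_expired(2000, 2024))  # True
```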

6. Audio/Video Copyright Analysis (Sampling & Fair Use)

Analyze copyright status for audio, video, and sampled content:

# Analyze audio/video content with sampling considerations
av_analysis = validator.analyze_audio_video_copyright(
    title="Musical Composition",
    creator="Artist Name",
    creation_year=1990,
    sampling_info={
        "sampled_from_year": 1950,  # Original sample source
        "sample_length_seconds": 5,  # Length of sample
        "sampled_from_work": "Old Song"
    },
    intended_use="commercial"  # "personal", "educational", "commercial"
)

print(f"Original work PD: {av_analysis['is_original_pd']}")
print(f"Sampling analysis: {av_analysis['sampling_analysis']}")
print(f"Risk level: {av_analysis['risk_level']}")
print(f"Recommendations: {av_analysis['recommendations']}")
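
One reason this analysis is separate from plain copyright calculation: the sample source and the new work carry independent rights, and risk also depends on the intended use. A deliberately crude sketch of that interaction (not the library's actual scoring):

```python
def sampling_risk(sampled_from_year: int, intended_use: str) -> str:
    # Toy rule of thumb only: in the US, works published before 1929 are
    # public domain as of 2024, but sound recordings follow separate,
    # longer rules, so anything newer is treated conservatively.
    if sampled_from_year < 1929:
        return "low"
    return "high" if intended_use == "commercial" else "medium"

print(sampling_risk(1950, "commercial"))  # high
```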

7. Software and Source Code Analysis

Analyze software licenses and public domain status for code:

# Analyze if software is in public domain based on license
software_analysis = validator.analyze_software_source_pd(
    project_name="Open Source Project",
    license_type="MIT",  # or "GPL-3.0", "CC0", "Unlicense", etc.
    creation_year=2015,
    author_death_year=2000,  # For individual-authored software
    repository_info={
        "has_license_file": True,
        "license_spdx_id": "MIT-0",
        "copyright_holders": ["Author Name"]
    }
)

print(f"Is in public domain: {software_analysis['is_pd']}")
print(f"License analysis: {software_analysis['license_analysis']}")
print(f"Risk level: {software_analysis['risk_level']}")
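
The key distinction the license analysis rests on is that permissive licenses (MIT, BSD, Apache) are not public domain dedications: the code stays copyrighted, just broadly licensed. A sketch of that bucketing with a few illustrative SPDX identifiers:

```python
# Illustrative SPDX buckets: public-domain dedications vs merely
# permissive licenses (permissive != public domain)
PD_DEDICATIONS = {"CC0-1.0", "Unlicense"}
PERMISSIVE = {"MIT", "MIT-0", "BSD-3-Clause", "Apache-2.0"}

def license_pd_status(spdx_id: str) -> str:
    if spdx_id in PD_DEDICATIONS:
        return "public-domain-equivalent"
    if spdx_id in PERMISSIVE:
        return "permissive, not public domain"
    return "unknown"

print(license_pd_status("MIT"))      # permissive, not public domain
print(license_pd_status("CC0-1.0"))  # public-domain-equivalent
```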

8. Database Rights and Compilation Analysis

Handle special rights for databases beyond standard copyright:

# Analyze database rights which vary significantly by jurisdiction
db_rights = validator.analyze_database_compilation_rights(
    title="Statistical Database",
    creation_year=2010,
    compilation_type="database",
    jurisdiction="EU",  # EU has special database rights (sui generis)
    database_contents=["tables", "records", "statistics"],
    substantial_investment_claim=True
)

print(f"Database rights status: {db_rights['database_rights_analysis']['has_rights']}")
print(f"Years until PD: {db_rights['database_rights_analysis']['years_until_pd']}")
print(f"Protection type: {db_rights['database_rights_analysis']['protection_type']}")

9. Performance and Neighboring Rights

Analyze rights in performances beyond the underlying work:

# Analyze performance and neighboring rights (different from the underlying composition)
perf_analysis = validator.analyze_performance_neighboring_rights(
    title="Live Performance Recording",
    performer="Performer Name",
    performance_year=2000,
    recording_year=2001,
    jurisdiction="US"
)

print(f"Performance rights PD: {perf_analysis['performance_rights_analysis']['is_public_domain']}")
print(f"Recording rights PD: {perf_analysis['recording_rights_analysis']['is_public_domain']}")
print(f"Overall risk: {perf_analysis['risk_level']}")
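
Neighboring rights are typically counted from the performance or fixation year rather than the author's death. A minimal sketch of that term check, assuming a flat term (real terms vary: TRIPS sets a 50-year minimum for performers and phonogram producers, and the EU uses 70 years for published sound recordings):

```python
def neighboring_rights_expired(fixation_year: int, current_year: int,
                               term: int = 50) -> bool:
    # Counted from the performance/fixation year, not the author's death;
    # the term is a parameter because it differs by jurisdiction.
    return fixation_year + term < current_year

print(neighboring_rights_expired(2001, 2024))  # False
print(neighboring_rights_expired(1960, 2024))  # True
```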

10. Historical Copyright Law Timeline

Track and analyze changes in copyright law over time:

# Add historical law changes to track legal evolution
law_changes = [
    {
        "effective_date": "1995-01-01",
        "terms": 70,  # Extended from life+50 to life+70
        "description": "Extension of copyright term",
        "law_type": "standard",
        "change_reason": "International treaty obligation"
    }
]

# Track timeline of copyright law changes
timeline_result = validator.track_copyright_law_timeline(
    country="DE",
    law_changes=law_changes,
    source="official_government_record",
    is_historical_data=True
)

# Get historical law at a specific date
law_at_date = validator.get_copyright_law_at_date(
    country="DE",
    target_date="1998-06-01",
    law_type="standard"
)

print(f"Laws in effect in 1998: {law_at_date['current_terms']} years")
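
Conceptually, looking up the law at a date means taking the latest change whose effective date is on or before the target. A self-contained sketch of that lookup (the dates and terms below are illustrative; Germany's move to life+70 came with its 1965 Copyright Act):

```python
from datetime import date

# (date the change took effect, new life+N term) -- illustrative values
changes = [
    (date(1934, 1, 1), 50),
    (date(1965, 9, 9), 70),
]

def term_at(target: date) -> int:
    """Return the term set by the latest change effective on or before target."""
    applicable = [term for effective, term in sorted(changes)
                  if effective <= target]
    return applicable[-1] if applicable else 0

print(term_at(date(1998, 6, 1)))  # 70
print(term_at(date(1950, 1, 1)))  # 50
```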

11. Historical Analysis Reports

Generate comprehensive reports showing law evolution over time:

# Generate historical analysis report for a time period
historical_report = validator.create_historical_analysis_report(
    country="FR",
    start_year=1980,
    end_year=2020,
    include_database_rights=True
)

print(f"Report for {historical_report['country']} (1980-2020)")
print(f"Standard changes: {historical_report['standard_changes_count']}")
print(f"Database changes: {historical_report['database_changes_count']}")

# Access year-by-year analysis
for year in ["1985", "1995", "2005"]:
    if year in historical_report['analysis_by_year']:
        year_analysis = historical_report['analysis_by_year'][year]
        print(f"  {year}: Standard={year_analysis['standard_copyright']} years, "
              f"DB Rights={year_analysis['database_rights']} years")

12. Law Change Impact Analysis

Analyze how specific law changes affect work public domain status:

# Analyze the impact of a specific law change on a work
impact_analysis = validator.analyze_impact_of_law_change(
    country="UK",
    change_date="2013-01-01",  # When UK extended some terms
    work_creation_year=1940,
    author_death_year=1970
)

print(f"Impact of law change on {impact_analysis['change_date']}")
print(f"Work created in {impact_analysis['work_creation_year']}")
print(f"Author died in {impact_analysis['author_death_year']}")
print(f"Potential impact: {impact_analysis['potential_impact']}")
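
The arithmetic behind such an impact assessment is simple: compare the public domain year under the old and new terms. A sketch, using the end-of-year convention from earlier:

```python
def pd_year(author_death_year: int, term: int) -> int:
    # Terms run to the end of the calendar year, hence the +1
    return author_death_year + term + 1

# An author who died in 1970: PD in 2021 under life+50, but a
# retroactive move to life+70 pushes that to 2041
print(pd_year(1970, 50), pd_year(1970, 70))  # 2021 2041
```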

13. JSON Data Management

Save and load configuration and results in structured format:

# Save country copyright data to JSON
validator.save_country_copyright_data('data/copyright_terms.json')

# Validate multiple items and store results in an async function
async def validate_multiple():
    items = [
        ContentItem(title="Work 1", content="Content of work 1"),
        ContentItem(title="Work 2", content="Content of work 2")
    ]
    await validator.validate_and_store_results(items, 'data/validation_results.json')

# Run the async function
asyncio.run(validate_multiple())

# Load educational resources about public domain
pd_basics = validator.get_educational_resource('what_is_pd')
print(f"PD basics: {pd_basics['content'][0]['section']}")

14. Alternative Usage Without Async

For simpler use cases, the library also provides a synchronous function that doesn't require async/await:
from isitpublic import calculate_pd_from_metadata

metadata = {
    "author_death_year": 1601,  # Over 400 years ago!
    "publication_year": 1600,
    "country": "worldwide"
}

result = calculate_pd_from_metadata(metadata)
print(result)  # {'is_public_domain': True, 'pd_year': 1672, ...}

Note that the more advanced validation methods like is_likely_public_domain() are async, while methods like calculate_pd_from_metadata() are synchronous and can be used directly without async/await.

API Reference

PublicDomainValidator

Main validator class with comprehensive validation methods.

Methods:

  • is_likely_public_domain(item, use_wikidata=False) - [ASYNC] Check if content is likely in public domain
  • is_likely_public_domain_with_details(item) - [ASYNC] Detailed analysis with confidence and explanations
  • calculate_pd_status_from_copyright_info(author_death_year=None, publication_year=None, country="worldwide", work_type="individual", is_government_work=False) - [SYNC] Calculate status from copyright data
  • generate_jurisdiction_report(author_death_year, publication_year, work_title, work_type, is_government_work) - [SYNC] Comprehensive jurisdiction analysis
  • assess_use_risk(author_death_year, publication_year, intended_jurisdictions, commercial_use) - [SYNC] Risk assessment for usage
  • save_country_copyright_data(filepath) - [SYNC] Save country copyright data to JSON
  • load_country_copyright_data(filepath) - [SYNC] Load country copyright data from JSON
  • validate_and_store_results(items, output_file, country, work_type, is_government_work) - [ASYNC] Validate multiple items and store results
  • store_pd_calculation_results(metadata_list, output_file) - [SYNC] Perform multiple calculations and store results
  • get_educational_resources(category) - [SYNC] Retrieve educational materials about public domain
  • get_educational_resource(resource_name) - [SYNC] Retrieve specific educational resource

ContentItem

Simple data class for content to be validated.

Attributes:

  • title: Title of the work
  • content: Full content (optional)
  • url: URL of the content (optional)
  • snippet: Snippet or excerpt (optional)

Standalone Functions

  • validate_public_domain_status(item, use_wikidata=False) - [ASYNC] Basic PD validation
  • validate_public_domain_with_explanation(item, country, work_type, is_government_work, use_wikidata) - [ASYNC] PD validation with detailed explanations
  • calculate_pd_from_metadata(metadata) - [SYNC] Calculate status from metadata dict

Async vs Sync Usage

When using async methods, wrap your code in an async function and use asyncio.run():

import asyncio
from isitpublic import PublicDomainValidator, ContentItem

async def main():
    validator = PublicDomainValidator()
    item = ContentItem(title="Shakespeare's Hamlet", content="To be or not to be...")
    is_pd = await validator.is_likely_public_domain(item)
    print(f"Is likely public domain: {is_pd}")

asyncio.run(main())

For sync methods, use them directly:

from isitpublic import calculate_pd_from_metadata

result = calculate_pd_from_metadata({
    "author_death_year": 1601,
    "country": "worldwide"
})
print(result)

About Directory

The library includes educational materials about public domain concepts stored in JSON format in the data/about/ directory:

  • what_is_pd.json - Basic definitions and concepts
  • copyright_terms.json - Information about copyright terms
  • jurisdiction_rules.json - Country-specific public domain rules
  • historical_works.json - Examples of famous public domain works
  • misconceptions.json - Common myths and misunderstandings
  • index.json - Master index of all educational resources

License

Code is licensed under the AGPLv3; see the LICENSE file for details.

Data files are licensed under Creative Commons Attribution Share Alike 4.0 International (CC BY-SA 4.0).

Development & Code Quality

This project uses Skylos for automated code quality and security analysis:

🔍 Code Quality Features

  • Dead Code Detection: Automatically identifies and removes unused imports, variables, and unreachable code
  • Security Scanning: Checks for potential vulnerabilities, path traversal issues, and hardcoded secrets
  • Pre-commit Integration: Automated quality checks before each commit
  • CI/CD Pipeline: GitHub Actions workflow for continuous quality monitoring

🛠️ Development Setup

# Install pre-commit hooks
pre-commit install

# Run skylos manually
uv run skylos src/ --secrets --danger

# Run with verbose output
uv run skylos src/ --verbose

📊 Quality Metrics

  • Dead Code: 0 detected ✅
  • Security Issues: Continuously monitored
  • Code Coverage: Maintained through automated testing
  • Type Safety: Pydantic models ensure data validation

Architecture Note

This is the core isitpublic library focused solely on public domain determination logic. Web API, GraphQL, and advanced application features have been separated into a dedicated application layer that builds upon this library, ensuring the core library remains lightweight and focused on its primary function.
