
A library for cleaning and sorting metadata

Project description

CleanSort Library

A simple and powerful library that helps you clean and organize metadata from websites.

What is this library for?

This library helps you:

  1. Take messy website metadata (like information about books, articles, or journals)
  2. Clean it up by keeping only the important parts (like titles, authors, ISBN numbers)
  3. Organize it neatly by category
  4. Store it in a database for later use
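The four steps above can be pictured with a short standard-library sketch. This is not CleanSort's own code. The following are all assumptions made for illustration:

- the `<meta>` parser
- the `SUPPORTED_FIELDS` tuple, taken from the field names in the Python example later in this README
- the key/value `metadata` table
- the use of SQLite

"Organize" is modelled simply as returning the kept fields in a fixed order.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical list of fields to keep, based on the field names in this
# README's Python example; CleanSort's real configuration may differ.
SUPPORTED_FIELDS = ("title", "author", "isbn", "source_site")


class MetaTagParser(HTMLParser):
    """Step 1: collect (name, content) pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            self.pairs.append((name.strip().lower(), content.strip()))


def clean(raw_html):
    """Steps 1-3: parse, drop unsupported fields, return the rest in a fixed order."""
    parser = MetaTagParser()
    parser.feed(raw_html)
    parser.close()
    found = {name: value for name, value in parser.pairs if name in SUPPORTED_FIELDS}
    return {field: found[field] for field in SUPPORTED_FIELDS if field in found}


def store(conn, record):
    """Step 4: save one cleaned record as (record_id, field, value) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata (record_id INTEGER, field TEXT, value TEXT)"
    )
    record_id = conn.execute(
        "SELECT COALESCE(MAX(record_id), 0) + 1 FROM metadata"
    ).fetchone()[0]
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(record_id, field, value) for field, value in record.items()],
    )
    conn.commit()
    return record_id
```

For example, `clean('<meta name="title" content="Harry Potter"><meta name="viewport" content="width=device-width">')` keeps only the title. `store(sqlite3.connect(":memory:"), record)` then writes that record to a throwaway database.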

Step-by-Step Installation Guide

For Users (Using the Library)

  1. Make sure you have Python 3.7 or higher installed. You can check your version with:

    python --version

  2. Install the CleanSort library using pip:

    pip install cleansort
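Both requirements can be confirmed from Python itself. The sketch below uses only the standard library. `check_environment` is a hypothetical helper, not part of CleanSort.

```python
import importlib.util
import sys


def check_environment(package="cleansort", minimum=(3, 7)):
    """Report whether this Python meets the minimum version and whether the package is importable."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= minimum,
        # find_spec returns None when the package cannot be found on the import path
        "package_installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

Run it with the same interpreter that pip installs into. If `package_installed` is False, see the "Import error" entry under Common Problems.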
    

For Developers (Contributing to the Library)

  1. Clone the repository:

    git clone https://github.com/yourusername/cleansort
    cd cleansort
    
  2. Install dependencies:

    pip install -r requirements.txt
    

How to Use the Library

Simple Python Example

# Import the library
from cleansort import CleanSort

# Create a new CleanSort object
cleaner = CleanSort()

# Example metadata (this could be from a website)
metadata = """
<meta name="title" content="Harry Potter">
<meta name="author" content="J.K. Rowling">
<meta name="isbn" content="978-0-7475-3269-9">
<meta name="source_site" content="books.com">
"""

# Process the metadata
result = cleaner.process_metadata(metadata)

# See the organized results
print(result)

# Get everything from the database
stored_data = cleaner.get_stored_metadata()
print(stored_data)
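The example above passes a hand-written string, but the comment notes that the metadata could come from a website. A real page contains much more than `<meta>` tags. One option is to fetch the page and keep only its `<meta>` tags before handing them to the library.

The sketch below does this with the standard library. `extract_meta_tags` and `fetch_meta_tags` are hypothetical helpers, and the example URL is a placeholder. This README does not say whether `process_metadata` can also accept a full page directly.

```python
import re
import urllib.request

# Matches one <meta ...> tag, case-insensitively; [^>]* also spans line breaks.
# A regex is enough for typical <meta> tags but is not a full HTML parser
# (for example, a '>' inside an attribute value would cut the match short).
META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)


def extract_meta_tags(html):
    """Return only the <meta> tags from an HTML document, one per line."""
    return "\n".join(META_TAG.findall(html))


def fetch_meta_tags(url, timeout=10):
    """Download a page and return its <meta> tags as a single string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_meta_tags(html)


# Usage with the library (needs network access and an installed cleansort):
# from cleansort import CleanSort
# metadata = fetch_meta_tags("https://example.com/some-book-page")  # placeholder URL
# print(CleanSort().process_metadata(metadata))
```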

Using the API from Any Programming Language

  1. First, start the API server:

    • Open a terminal/command prompt
    • Navigate to your project directory
    • Run:
      python run_server.py
      
    • You'll see a message saying the server is running
  2. The server now accepts HTTP requests. Any language that can send an HTTP POST request with a JSON body can use the library through it.

JavaScript Example

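// Using fetch (browsers, or Node.js 18+ where fetch is built in)
async function processMetadata(metadata) {
    const response = await fetch('http://localhost:5000/process', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ metadata })
    });
    // fetch only rejects on network errors, so check the HTTP status explicitly
    if (!response.ok) {
        throw new Error(`Server returned ${response.status}`);
    }
    return await response.json();
}

// Example call
processMetadata('<meta name="title" content="My Book">').then(console.log);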

Java Example

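// Using Java's HttpClient (Java 11+)
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

String url = "http://localhost:5000/process";
String metadata = "<meta name=\"title\" content=\"My Book\">";
// Escape backslashes and quotes so the metadata forms a valid JSON string
String escaped = metadata.replace("\\", "\\\\").replace("\"", "\\\"");
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create(url))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString("{\"metadata\": \"" + escaped + "\"}"))
    .build();
// Send the request and print the JSON reply
// (send throws IOException and InterruptedException; declare or handle them)
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());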
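The same request can be sent from Python with only the standard library. This sketch assumes what the JavaScript and Java examples imply:

- a POST to http://localhost:5000/process
- a JSON body of the form {"metadata": "..."}
- a JSON response

`build_request` and `process_remote` are hypothetical names. Building the body with `json.dumps` handles the escaping of quotes inside the metadata automatically.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/process"  # address used in the examples above


def build_request(metadata, url=API_URL):
    """Create a POST request whose JSON body is {"metadata": <metadata>}."""
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_remote(metadata, url=API_URL, timeout=10):
    """Send metadata to the running server and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(metadata, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from run_server.py running, `process_remote('<meta name="title" content="My Book">')` returns whatever JSON the server sends back.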

What Kind of Data Can It Process?

The library can handle metadata about:

  • Books
  • Articles
  • Journals
  • Book chapters

It looks for these specific pieces of information:

  • Names/titles
  • Author names
  • ISBN numbers
  • Website sources
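ISBNs are often written with hyphens or spaces, as in the 978-0-7475-3269-9 example above. A cleaning step could make them comparable by stripping the separators and checking the ISBN-13 check digit. This README does not say whether CleanSort validates ISBNs.

`normalize_isbn13` is a hypothetical helper that applies the standard ISBN-13 rule: the digits, weighted alternately 1 and 3, must sum to a multiple of 10.

```python
import re


def normalize_isbn13(raw):
    """Return the 13 digits of a valid ISBN-13, or None if it is malformed.

    Hyphens and whitespace are removed first; the check digit is then
    verified: digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10.
    """
    digits = re.sub(r"[\s-]", "", raw)
    if not re.fullmatch(r"[0-9]{13}", digits):
        return None
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return digits if total % 10 == 0 else None
```

For the example ISBN, the weighted sum is 130, so `normalize_isbn13("978-0-7475-3269-9")` returns "9780747532699". Older 10-digit ISBNs are rejected by this sketch and would need a separate rule.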

Common Problems and Solutions

  1. "Import error when using the library"

    • Make sure you installed the library using pip
    • Check that Python is in your system PATH
  2. "Can't connect to the API"

    • Make sure the server is running (python run_server.py)
    • Check that you're using the correct URL (http://localhost:5000)
  3. "Getting empty results"

    • Check that your metadata uses the same <meta name="..." content="..."> tag format as the examples above
    • Make sure it contains at least one of the supported fields
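For the "Can't connect to the API" case, you can check whether anything is listening on the server's port by opening a TCP connection to it. This stdlib sketch defaults to localhost:5000, the address used above. `server_is_listening` is a hypothetical helper. A successful connection only shows that some process accepts connections on that port, not that it is the CleanSort server.

```python
import socket


def server_is_listening(host="localhost", port=5000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures.
        return False
```

If this returns False, start the server with python run_server.py and try again.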

Need Help?

If you run into any problems:

  1. Check the Common Problems section above
  2. Look at the example files in the 'examples' directory
  3. Create an issue on GitHub

License

MIT License - Feel free to use this library in your projects!



Download files


Source Distribution

cleansort-0.1.0.tar.gz (6.5 kB)

Uploaded Source

Built Distribution


cleansort-0.1.0-py3-none-any.whl (7.4 kB)

Uploaded Python 3

File details

Details for the file cleansort-0.1.0.tar.gz.

File metadata

  • Download URL: cleansort-0.1.0.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.2

File hashes

Hashes for cleansort-0.1.0.tar.gz
  Algorithm     Hash digest
  SHA256        49d620be0ec5cb831d4bd7ecd25c35f035e5705338bde026d992a4aa73c9b9fc
  MD5           28b681cf474688ee571d51b9618d151c
  BLAKE2b-256   9d4fe991d110eea3ae9f4aa41e2bac558a075d72a61b46c30d3d2abf5d5adb55


File details

Details for the file cleansort-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: cleansort-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.2

File hashes

Hashes for cleansort-0.1.0-py3-none-any.whl
  Algorithm     Hash digest
  SHA256        9491efdec5cb84beb5eaf9a502623e79925861edbf3677fe9b76894d92ebfa76
  MD5           a5c6f58008cf94d60dda5e7268396c37
  BLAKE2b-256   87acc58f5421eb82c261e556125edc4d7e75806537be7fa6012360237ccfb3b6

