A library for cleaning and sorting metadata
CleanSort Library
A simple and powerful library that helps you clean and organize metadata from websites.
What is this library for?
This library helps you:
- Take messy website metadata (like information about books, articles, or journals)
- Clean it up by keeping only the important parts (like titles, authors, ISBN numbers)
- Organize it neatly by category
- Store it in a database for later use
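The four steps above can be sketched with the Python standard library alone. This is an illustration of the idea, not CleanSort's actual implementation: the `KEEP_FIELDS` set, the `MetaTagParser` class, the `clean` and `store` helpers, and the SQLite table layout are all assumptions made for this example, with field names borrowed from the example metadata used later in this README.

```python
import sqlite3
from html.parser import HTMLParser

# Assumed field names, taken from the example metadata in this README.
KEEP_FIELDS = {"title", "author", "isbn", "source_site"}


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name, content = attr_map.get("name"), attr_map.get("content")
        if name and content is not None:
            self.fields[name.lower()] = content.strip()


def clean(raw_html):
    """Parse the markup and keep only the whitelisted fields."""
    parser = MetaTagParser()
    parser.feed(raw_html)
    return {k: v for k, v in parser.fields.items() if k in KEEP_FIELDS}


def store(conn, record, category):
    """Save one cleaned record under a category label (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata "
        "(category TEXT, title TEXT, author TEXT, isbn TEXT, source_site TEXT)"
    )
    conn.execute(
        "INSERT INTO metadata VALUES (?, ?, ?, ?, ?)",
        (category, record.get("title"), record.get("author"),
         record.get("isbn"), record.get("source_site")),
    )
    conn.commit()


raw = """
<meta name="title" content="Harry Potter">
<meta name="author" content="J.K. Rowling">
<meta name="viewport" content="width=device-width">
"""
conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
record = clean(raw)                 # the viewport tag is dropped here
store(conn, record, "book")
rows = conn.execute("SELECT category, title, author FROM metadata").fetchall()
print(rows)
```

The whitelist keeps the cleaning step explicit: anything not named in `KEEP_FIELDS` is discarded before it reaches the database.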
Step-by-Step Installation Guide
For Users (Using the Library)
- Make sure you have Python installed (version 3.7 or higher):
  - Go to https://www.python.org/downloads/
  - Download and install Python for your operating system
  - On Windows, check "Add Python to PATH" during installation
- Install the CleanSort library using pip:
  pip install cleansort
For Developers (Contributing to the Library)
- Clone the repository:
  git clone https://github.com/yourusername/cleansort
  cd cleansort
- Install dependencies:
  pip install -r requirements.txt
How to Use the Library
Simple Python Example
# Import the library
from cleansort import CleanSort
# Create a new CleanSort object
cleaner = CleanSort()
# Example metadata (this could be from a website)
metadata = """
<meta name="title" content="Harry Potter">
<meta name="author" content="J.K. Rowling">
<meta name="isbn" content="978-0-7475-3269-9">
<meta name="source_site" content="books.com">
"""
# Process the metadata
result = cleaner.process_metadata(metadata)
# See the organized results
print(result)
# Get everything from the database
stored_data = cleaner.get_stored_metadata()
Using the API from Any Programming Language
- First, start the API server:
  - Open a terminal/command prompt
  - Navigate to your project directory
  - Run:
    python run_server.py
  - You'll see a message saying the server is running
- Now you can call the API from any programming language that can send HTTP requests!
JavaScript Example
// Using fetch in browser or Node.js
async function processMetadata(metadata) {
  const response = await fetch('http://localhost:5000/process', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ metadata })
  });
  return await response.json();
}
Java Example
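// Using Java's HttpClient (Java 11 or higher)
// Needs: import java.net.URI; import java.net.http.*;
String url = "http://localhost:5000/process";
String metadata = "<meta name=\"title\" content=\"My Book\">";
// Escape backslashes and double quotes so the metadata forms a valid JSON string
String escaped = metadata.replace("\\", "\\\\").replace("\"", "\\\"");
String json = "{\"metadata\": \"" + escaped + "\"}";
HttpClient client = HttpClient.newHttpClient();
HttpRequest request = HttpRequest.newBuilder()
    .uri(URI.create(url))
    .header("Content-Type", "application/json")
    .POST(HttpRequest.BodyPublishers.ofString(json))
    .build();
// Send the request (throws IOException, InterruptedException) and print the reply
HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
System.out.println(response.body());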
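Python Example (HTTP client)
The same request can be made from Python without installing the library, which may be handy when the server runs on another machine. This is a minimal sketch using only the standard library. It assumes the same endpoint and JSON body shape as the JavaScript and Java examples; `build_request` and `process_metadata` are hypothetical helper names, not part of CleanSort.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/process"  # endpoint used by the examples above


def build_request(metadata, url=API_URL):
    """Build a POST request whose body is {"metadata": "<raw markup>"}."""
    # json.dumps handles quoting, so the markup needs no manual escaping.
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_metadata(metadata, url=API_URL, timeout=10):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_request(metadata, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so it can be inspected safely.
request = build_request('<meta name="title" content="My Book">')
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Call `process_metadata(...)` only once the server is running; otherwise `urlopen` raises `urllib.error.URLError`.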
What Kind of Data Can It Process?
The library can handle metadata about:
- Books
- Articles
- Journals
- Book chapters
It looks for these specific pieces of information:
- Names/titles
- Author names
- ISBN numbers
- Website sources
Common Problems and Solutions
- "Import error when using the library"
  - Make sure you installed the library using pip
  - Check that Python is in your system PATH
- "Can't connect to the API"
  - Make sure the server is running (python run_server.py)
  - Check that you're using the correct URL (http://localhost:5000)
- "Getting empty results"
  - Check that your metadata follows the expected format
  - Make sure it contains at least one of the supported fields
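The checks above can be scripted. The sketch below is a rough diagnostic, not something shipped with CleanSort: the three helper functions are hypothetical, the supported field names are assumed from the example metadata in this README, and the host and port come from the URL mentioned above.

```python
import importlib.util
import re
import socket

# Assumed field names, taken from the example metadata in this README.
SUPPORTED_FIELDS = {"title", "author", "isbn", "source_site"}


def package_installed(name):
    """True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(name) is not None


def server_reachable(host="localhost", port=5000, timeout=2.0):
    """True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def supported_fields_in(metadata):
    """Names of <meta> tags in the markup that match a supported field.

    Simplification: expects name="..." directly after <meta, in double quotes.
    """
    names = re.findall(r'<meta\s+name="([^"]+)"', metadata, flags=re.IGNORECASE)
    return sorted({n.lower() for n in names} & SUPPORTED_FIELDS)


sample = '<meta name="title" content="My Book"><meta name="viewport" content="x">'
print("cleansort installed:", package_installed("cleansort"))
print("server reachable:", server_reachable())
print("supported fields:", supported_fields_in(sample))
```

A reachable port only shows that some process is listening there; it does not confirm that the process is the CleanSort server.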
Need Help?
If you run into any problems:
- Check the Common Problems section above
- Look at the example files in the 'examples' directory
- Create an issue on GitHub
License
MIT License - Feel free to use this library in your projects!
File details
Details for the file cleansort-0.1.0.tar.gz.
File metadata
- Download URL: cleansort-0.1.0.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49d620be0ec5cb831d4bd7ecd25c35f035e5705338bde026d992a4aa73c9b9fc |
| MD5 | 28b681cf474688ee571d51b9618d151c |
| BLAKE2b-256 | 9d4fe991d110eea3ae9f4aa41e2bac558a075d72a61b46c30d3d2abf5d5adb55 |
File details
Details for the file cleansort-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cleansort-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9491efdec5cb84beb5eaf9a502623e79925861edbf3677fe9b76894d92ebfa76 |
| MD5 | a5c6f58008cf94d60dda5e7268396c37 |
| BLAKE2b-256 | 87acc58f5421eb82c261e556125edc4d7e75806537be7fa6012360237ccfb3b6 |