Universal metadata extraction library supporting 13 formats (HTML Meta, Open Graph, Twitter Cards, JSON-LD, Microdata, Microformats, RDFa, Dublin Core, Web App Manifest, oEmbed, rel-links, Images, SEO) with 7 language bindings

These details have not been verified by PyPI

Project links

Project description

MetaOxide

The Universal Metadata Extraction Library - Blazing-fast, production-ready metadata extraction from HTML in 7 programming languages.

Why MetaOxide?

MetaOxide is 200-570x faster than traditional metadata extraction libraries while extracting 13 metadata formats out of the box. Built in Rust with native bindings for Python, Go, Node.js, Java, C#, and WebAssembly.

Key Features

🚀 Blazing Fast: 100,000+ documents/sec (vs. 150-500 for alternatives)
🌍 Universal: 7 language bindings from a single Rust core
📦 Comprehensive: 13 metadata formats (Open Graph, Twitter Cards, JSON-LD, Microformats, etc.)
💪 Production-Ready: 16,500+ lines of code, 700+ tests, battle-tested
🧠 Memory Efficient: 4-9x less memory than alternatives
🔒 Type-Safe: Strong typing across all languages
🔧 Easy to Use: Simple API, extensive documentation

Quick Start

Rust

cargo add meta_oxide

use meta_oxide::MetaOxide;

let html = r#"<!DOCTYPE html>..."#;
let extractor = MetaOxide::new(html, "https://example.com")?;
let metadata = extractor.extract_all()?;

println!("Title: {:?}", metadata.get("title"));

→ Full Rust Guide | API Reference

Python

pip install meta-oxide

from meta_oxide import MetaOxide

html = "<!DOCTYPE html>..."
extractor = MetaOxide(html, "https://example.com")
metadata = extractor.extract_all()

print(f"Title: {metadata['title']}")

Performance: 233x faster than BeautifulSoup

→ Full Python Guide | API Reference

Go

go get github.com/yourusername/meta-oxide-go

import metaoxide "github.com/yourusername/meta-oxide-go"

extractor, _ := metaoxide.NewExtractor(html, "https://example.com")
defer extractor.Free()

metadata, _ := extractor.ExtractAll()
fmt.Printf("Title: %v\n", metadata["title"])

Only Go library with 13 metadata formats

→ Full Go Guide | API Reference

Node.js

npm install meta-oxide

const { MetaOxide } = require('meta-oxide');

const html = '<!DOCTYPE html>...';
const extractor = new MetaOxide(html, 'https://example.com');
const metadata = extractor.extractAll();

console.log('Title:', metadata.title);

Performance: 280x faster than metascraper

→ Full Node.js Guide | API Reference

Java

<dependency>
    <groupId>com.metaoxide</groupId>
    <artifactId>meta-oxide</artifactId>
    <version>0.1.0</version>
</dependency>

try (MetaOxide extractor = new MetaOxide(html, "https://example.com")) {
    Metadata metadata = extractor.extractAll();
    System.out.println("Title: " + metadata.get("title"));
}

Performance: 311x faster than jsoup + Any23

→ Full Java Guide | API Reference

C#

dotnet add package MetaOxide

using var extractor = new MetaOxideExtractor(html, "https://example.com");
var metadata = extractor.ExtractAll();

Console.WriteLine($"Title: {metadata["title"]}");

Performance: 200x faster than HtmlAgilityPack

→ Full C# Guide | API Reference

WebAssembly

npm install meta-oxide-wasm

import init, { MetaOxide } from 'meta-oxide-wasm';

await init();  // Initialize WASM

const extractor = new MetaOxide(html, 'https://example.com');
const metadata = extractor.extractAll();

console.log('Title:', metadata.title);

Performance: 260x faster than native JavaScript parsers

→ Full WASM Guide | API Reference

Supported Metadata Formats

MetaOxide extracts 13 metadata formats out of the box:

Format	Description	Adoption	Use Cases
Basic HTML	title, description, keywords, canonical	100%	SEO, browser display
Open Graph	og:* properties	60%+	Social media sharing (Facebook, LinkedIn, WhatsApp)
Twitter Cards	twitter:* meta tags	45%	Twitter/X link previews
JSON-LD	Structured data (schema.org)	41%↗️	Google Rich Results, AI/LLM training
Microdata	itemscope, itemprop	26%	E-commerce, recipes, reviews
Microformats	h-card, h-entry, h-event	15%	Distributed social web, contacts
Dublin Core	DC metadata	8%	Digital libraries, archives
RDFa	RDF in attributes	5%	Linked data, semantic web
RelLinks	Link relations	100%	Canonical URLs, alternate versions
Web Manifest	PWA manifest	12%	Progressive web apps
Images	Image metadata	100%	Image alt text, dimensions
Authors	Author information	80%	Authorship, copyright
SEO	Robots, language, viewport	100%	Search engine optimization

Performance Comparison

MetaOxide is dramatically faster than traditional libraries:

Throughput (documents/second)

Library	Language	Docs/Sec	vs MetaOxide
MetaOxide	Rust	125,000	1x (baseline)
MetaOxide	Python	83,333	233x faster than BeautifulSoup
MetaOxide	Go	100,000	N/A (only option with 13 formats)
MetaOxide	Node.js	66,666	280x faster than metascraper
MetaOxide	Java	55,555	311x faster than jsoup
MetaOxide	C#	62,500	200x faster than HtmlAgilityPack
MetaOxide	WASM	40,000	260x faster than JS parsers
BeautifulSoup	Python	357	-
metascraper	Node.js	238	-
jsoup + Any23	Java	178	-
HtmlAgilityPack	C#	312	-

Real-World Impact

Processing 1 million e-commerce product pages:

Solution	Time	CPU Hours	AWS Cost
MetaOxide	22 seconds	0.006	$0.0012
BeautifulSoup	140 minutes	2.33	$0.47
Savings	381x faster	388x less	391x cheaper

→ Full Benchmarks

Real-World Examples

Python: Flask API

from flask import Flask, request, jsonify
from meta_oxide import MetaOxide
import requests

app = Flask(__name__)

@app.route('/extract')
def extract():
    url = request.args.get('url')
    response = requests.get(url)

    extractor = MetaOxide(response.text, url)
    metadata = extractor.extract_all()

    return jsonify(metadata)

→ Complete Flask Example

Node.js: Express Server

const express = require('express');
const axios = require('axios');
const { MetaOxide } = require('meta-oxide');

const app = express();

app.get('/extract', async (req, res) => {
    const { url } = req.query;
    const response = await axios.get(url);

    const extractor = new MetaOxide(response.data, url);
    const metadata = extractor.extractAll();

    res.json(metadata);
});

app.listen(3000);

→ Complete Express Example

Go: Concurrent Processing

func extractConcurrently(urls []string) []Metadata {
    var wg sync.WaitGroup
    results := make([]Metadata, len(urls))

    for i, url := range urls {
        wg.Add(1)
        go func(index int, targetURL string) {
            defer wg.Done()

            html := fetchHTML(targetURL)
            extractor, _ := metaoxide.NewExtractor(html, targetURL)
            defer extractor.Free()

            results[index], _ = extractor.ExtractAll()
        }(i, url)
    }

    wg.Wait()
    return results
}

→ Complete Go Example

Architecture

MetaOxide is built on a multi-layer architecture for maximum performance and compatibility:

┌─────────────────────────────────────────────────────────┐
│  Application Layer (Your Code)                          │
│  Rust, Python, Go, Node.js, Java, C#, WebAssembly      │
└──────────────────┬──────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────┐
│  Language Bindings                                       │
│  PyO3, CGO, N-API, JNI, P/Invoke, wasm-bindgen         │
└──────────────────┬──────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────┐
│  C-ABI Layer (Stable Foreign Function Interface)        │
└──────────────────┬──────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────┐
│  Rust Core (16,500+ lines)                              │
│  • HTML Parser (html5ever)                              │
│  • 13 Metadata Extractors                               │
│  • URL Resolution & Utilities                           │
└─────────────────────────────────────────────────────────┘

Key Design Principles:

Single Parse: HTML parsed once, shared across all extractors
Zero-Copy: Minimize memory allocations
Type-Safe: Rust memory safety guarantees
Thread-Safe: Concurrent extraction support
Language-Native: Idiomatic APIs for each language

→ Architecture Overview

Feature Matrix

Feature	Rust	Python	Go	Node.js	Java	C#	WASM
Basic Meta	✓	✓	✓	✓	✓	✓	✓
Open Graph	✓	✓	✓	✓	✓	✓	✓
Twitter Cards	✓	✓	✓	✓	✓	✓	✓
JSON-LD	✓	✓	✓	✓	✓	✓	✓
Microdata	✓	✓	✓	✓	✓	✓	✓
Microformats	✓	✓	✓	✓	✓	✓	✓
Dublin Core	✓	✓	✓	✓	✓	✓	✓
RDFa	✓	✓	✓	✓	✓	✓	✓
All 13 Formats	✓	✓	✓	✓	✓	✓	✓
Type Hints	✓	✓	✓	✓ (TS)	✓	✓	✓ (TS)
Async Support	✓	✓*	✓	✓*	✓	✓	✓*
Thread-Safe	✓	✓	✓	✓	✓	✓	✓
Memory-Safe	✓	✓	✓	✓	✓	✓	✓

*Extraction is synchronous, but compatible with async I/O

Use Cases

Web Scraping

Extract metadata from millions of pages efficiently:

# Process 1M pages in 12 seconds (vs. 46 minutes with BeautifulSoup)
from concurrent.futures import ThreadPoolExecutor
results = ThreadPoolExecutor(max_workers=10).map(extract_from_url, urls)

SEO Tools

Analyze metadata for SEO optimization:

const og = extractor.extractOpenGraph();
const twitter = extractor.extractTwitterCard();
const jsonld = extractor.extractJSONLD();
// Check for missing or malformed metadata

Social Media Preview

Generate link previews like Facebook/Twitter:

og, _ := extractor.ExtractOpenGraph()
fmt.Printf("Title: %s\n", og.Title)
fmt.Printf("Image: %s\n", og.Image)
fmt.Printf("Description: %s\n", og.Description)

AI/ML Training Data

Extract structured data for machine learning:

let jsonld = extractor.extract_jsonld()?;
let microdata = extractor.extract_microdata()?;
// Feed to AI models for training

E-commerce

Extract product metadata:

List<MicrodataItem> products = extractor.extractMicrodata();
for (MicrodataItem item : products) {
    if (item.getType().contains("Product")) {
        System.out.println(item.getProperties().get("name"));
        System.out.println(item.getProperties().get("price"));
    }
}

Browser Extensions

Client-side metadata extraction:

import init, { MetaOxide } from 'meta-oxide-wasm';
await init();

const html = document.documentElement.outerHTML;
const extractor = new MetaOxide(html, window.location.href);
const metadata = extractor.extractAll();

Documentation

Getting Started

API References

Performance

Architecture

Help

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository
git clone https://github.com/yourusername/meta_oxide.git
cd meta_oxide

# Build Rust core
cargo build --release

# Run tests
cargo test

# Build language bindings
# Python
cd bindings/python && pip install -e .

# Go
cd bindings/go && go test ./...

# Node.js
cd bindings/nodejs && npm install && npm test

# Java
cd bindings/java && mvn test

# C#
cd bindings/csharp && dotnet test

# WASM
cd bindings/wasm && wasm-pack build

Roadmap

v0.2.0 (Q1 2026)

Plugin system for custom extractors
Async Rust API
iOS support (Swift bindings)
Streaming parser for infinite documents

v0.3.0 (Q2 2026)

ML-based metadata extraction
Metadata quality scoring
PDF metadata extraction
REST/GraphQL API server

v1.0.0 (Q3 2026)

Stable API
Long-term support
Enterprise features

License

MetaOxide is released under the MIT License.

MIT License

Copyright (c) 2025 MetaOxide Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

Community

GitHub: https://github.com/yourusername/meta_oxide
Discussions: https://github.com/yourusername/meta_oxide/discussions
Issues: https://github.com/yourusername/meta_oxide/issues
Discord: https://discord.gg/metaoxide
Twitter: @metaoxide

Acknowledgments

MetaOxide builds on excellent open-source projects:

html5ever - HTML5 parser
scraper - HTML scraping
PyO3 - Python bindings
wasm-bindgen - WebAssembly bindings

Made with ❤️ by the MetaOxide team

Star ⭐ this repository if you find it useful!

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

Nov 26, 2025

0.1.0

Nov 26, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

meta_oxide-0.1.1.tar.gz (450.1 kB view details)

Uploaded Nov 26, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

meta_oxide-0.1.1-cp314-cp314-manylinux_2_34_x86_64.whl (989.0 kB view details)

Uploaded Nov 26, 2025 CPython 3.14manylinux: glibc 2.34+ x86-64

File details

Details for the file meta_oxide-0.1.1.tar.gz.

File metadata

Download URL: meta_oxide-0.1.1.tar.gz
Upload date: Nov 26, 2025
Size: 450.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for meta_oxide-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e23d0b1fedf1f1d0249d4a55bc96b1c4c1a888d16527a63877469d17cb73217d`
MD5	`fc0b831bda05d9cda73b2a38c81ef7ae`
BLAKE2b-256	`25db0ce51b40e6a43b40ac83dbd3dc27efacb2f59d9cab912511434b3f11d6e5`

See more details on using hashes here.

File details

Details for the file meta_oxide-0.1.1-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

Download URL: meta_oxide-0.1.1-cp314-cp314-manylinux_2_34_x86_64.whl
Upload date: Nov 26, 2025
Size: 989.0 kB
Tags: CPython 3.14, manylinux: glibc 2.34+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for meta_oxide-0.1.1-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm	Hash digest
SHA256	`c937edd2d062387c8602a6d9376b02f9a2b7face38322ace705a5d6756401030`
MD5	`7ab21636c844baf58136cab9d919550d`
BLAKE2b-256	`ae9a1c0a02d0886e4dfb86ada9643e753e9f12cb499080b6c242d9c40eed4ec8`

See more details on using hashes here.

meta-oxide 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

MetaOxide

Why MetaOxide?

Key Features

Quick Start

Rust

Python

Go

Node.js

Java

C#

WebAssembly

Supported Metadata Formats

Performance Comparison

Throughput (documents/second)

Real-World Impact

Real-World Examples

Python: Flask API

Node.js: Express Server

Go: Concurrent Processing

Architecture

Feature Matrix

Use Cases

Web Scraping

SEO Tools

Social Media Preview

AI/ML Training Data

E-commerce

Browser Extensions

Documentation

Getting Started

API References

Performance

Architecture

Help

Contributing

Development Setup

Roadmap

v0.2.0 (Q1 2026)

v0.3.0 (Q2 2026)

v1.0.0 (Q3 2026)

License

Sponsors

Community

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes