app-classifier

Point it at a repo, get back "this is an e-commerce app that does X".

Pattern-based application functional-category inference from routes, data models, and README. Zero runtime dependencies (pure stdlib). Optional LLM polish — bring your own provider.

What problem this solves

Onboarding to a new repo, every engineer asks the same questions: "What does this thing do? Is it a CRUD app or a queue worker? What database? What ports does it need?" The README is usually wrong or stale. Codeowners are unavailable. You end up grepping for clues.

app-classifier answers those questions for a repo on disk, typically in under a second and with no network calls:

What kind of app is this? — e-commerce, blog, social network, admin panel, REST API, auth/SSO, file management, scheduling, or messaging (9 categories, weighted-pattern matching, confidence-scored)
What does it do? — a 2-3 sentence functional description, deterministically composed, optionally LLM-polished
How does it deploy? — runtime + version, framework, web server, databases, caches, ports, env vars, container base image, runtime CVEs

Quick start

pip install app-classifier
app-classifier ./my-repo

=== my-repo ===

Category:    e-commerce (78% confidence)
Runtime:     python 3.11
Framework:   FastAPI
Deploys as:  ASGI server (uvicorn / hypercorn / daphne)
Databases:   PostgreSQL, SQLAlchemy ORM
Cache/Queue: Redis, Celery
Features:    online shopping, messaging

📋 Summary: my-repo · python 3.11 · FastAPI · 23 HTTP route(s) · 5 data model(s) · DB: PostgreSQL, SQLAlchemy ORM

📝 What it does:
  my-repo is a e-commerce application. Primary functionality: online shopping, messaging.
  It models entities like Cart, Order, Product, User serving authenticated users.

🌐 HTTP Routes (23 found):
  GET    /products       →  list_products
  POST   /cart/add       →  add_to_cart
  POST   /checkout       →  checkout
  ...

Python API

from app_classifier import classify

result = classify("./my-repo")

print(result.app_category)              # 'e-commerce'
print(result.app_category_confidence)   # 0.78
print(result.detected_features)         # ['online shopping', 'messaging']
print(result.functional_description)    # "my-repo is a e-commerce application. ..."

# Full structured access
for route in result.routes:
    print(route.method, route.path, route.handler)

for model in result.data_models:
    print(model.name, model.framework, model.fields_hint)

# JSON-serializable
import json
print(json.dumps(result.to_dict(), indent=2))

Just the deployment data?

Skip the classifier and call the hosting analyzer directly:

from app_classifier import analyze_hosting_requirements

report = analyze_hosting_requirements("./my-repo")
print(report.runtime)         # {'language': 'python', 'version': '3.11'}
print(report.web_server)      # {'framework': 'FastAPI', 'deployment_target': '...'}
print(report.databases)       # [{'name': 'PostgreSQL', ...}, ...]
print(report.ports)           # [{'port': 8000, 'source': 'Dockerfile', ...}]
print(report.web_server_vulnerabilities)  # CVEs on the container base image

Optional: LLM polish

classify_async accepts any async callable as the LLM provider, so no SDK is pinned. The shims below all take (prompt, max_tokens, temperature) and return the completion text. If the LLM returns a usable response, it replaces the deterministic functional_description; on any failure (timeout, parse error, rejection by the hallucination guard, or no provider configured), the deterministic version is kept.

# OpenAI shim
async def my_openai_provider(prompt, max_tokens=400, temperature=0.2):
    import openai
    client = openai.AsyncOpenAI()
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens, temperature=temperature,
    )
    return resp.choices[0].message.content


# Anthropic shim
async def my_anthropic_provider(prompt, max_tokens=400, temperature=0.2):
    import anthropic
    client = anthropic.AsyncAnthropic()
    resp = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=max_tokens, temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text


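# Local Ollama shim (uses Ollama's /api/generate endpoint)
async def my_ollama_provider(prompt, max_tokens=400, temperature=0.2):
    import httpx
    # Local generation can be slow; httpx's default 5 s timeout is often too short
    async with httpx.AsyncClient(timeout=60.0) as client:
        r = await client.post("http://localhost:11434/api/generate", json={
            "model": "llama3", "prompt": prompt, "stream": False,
            "options": {"num_predict": max_tokens, "temperature": temperature},
        })
        r.raise_for_status()  # surface HTTP errors so the deterministic text is kept
        return r.json().get("response")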


# Use any of the above
import asyncio
from app_classifier import classify_async
result = asyncio.run(classify_async("./my-repo", llm_provider=my_openai_provider))
print(result.functional_description)
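The keep-or-replace behavior described above can be sketched as a small wrapper around the provider call. `polish_or_keep` is a hypothetical helper, not part of app-classifier's API; it covers the no-provider, exception, timeout, and empty-reply cases but omits the hallucination guard. The stub providers let you exercise each fallback path without a real LLM.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Same call shape as the shims above: provider(prompt, max_tokens=..., temperature=...) -> text
Provider = Callable[..., Awaitable[Optional[str]]]


async def polish_or_keep(
    deterministic: str,
    prompt: str,
    provider: Optional[Provider],
    timeout_s: float = 20.0,
) -> str:
    """Hypothetical helper: prefer the LLM rewrite, fall back to the deterministic text."""
    if provider is None:
        return deterministic  # no provider configured
    try:
        polished = await asyncio.wait_for(
            provider(prompt, max_tokens=400, temperature=0.2), timeout=timeout_s
        )
    except Exception:
        # Timeouts, network errors, and SDK errors all degrade to the deterministic text
        return deterministic
    if not isinstance(polished, str) or not polished.strip():
        return deterministic  # empty or non-text reply
    return polished.strip()


# Stub providers for trying the fallback paths without a real LLM
async def broken_provider(prompt, max_tokens=400, temperature=0.2):
    raise RuntimeError("provider unavailable")


async def slow_provider(prompt, max_tokens=400, temperature=0.2):
    await asyncio.sleep(5)
    return "too late"


async def echo_provider(prompt, max_tokens=400, temperature=0.2):
    return "  A polished summary.  "
```

The point of the design is that the LLM can only ever improve the output: every failure mode collapses to the text you would have gotten without it.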

What detection is supported

Runtimes

Python, Java (JDK 8+), Node.js, Go, Ruby, PHP, and Rust, detected from manifest files, Dockerfiles, and version files (.nvmrc, .python-version, .ruby-version).
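To make the version-file path concrete, here is a minimal sketch of reading a pinned runtime from the files named above. The function name, the lookup order, and the major.minor truncation are assumptions for illustration, not app-classifier's actual code.

```python
from pathlib import Path
from typing import Dict, Optional

# Hypothetical lookup order: the first file found wins
VERSION_FILES = {
    ".python-version": "python",
    ".nvmrc": "node",
    ".ruby-version": "ruby",
}


def runtime_from_version_files(root: str) -> Optional[Dict[str, str]]:
    """Return {'language': ..., 'version': ...} from the first non-empty version file."""
    base = Path(root)
    for filename, language in VERSION_FILES.items():
        path = base / filename
        if not path.is_file():
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        first = lines[0].strip() if lines else ""
        if not first:
            continue  # empty file: try the next candidate
        version = first.lstrip("vV")  # .nvmrc often holds 'v18.17.0'
        # Keep major.minor, matching the '3.11' style shown in the hosting report
        return {"language": language, "version": ".".join(version.split(".")[:2])}
    return None
```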

Web frameworks (route extraction)

Language	Frameworks
Python	Flask, FastAPI, Django
Java	Spring Boot, Struts 2 (struts.xml), classic Spring
Node	Express, Fastify, NestJS, Next.js
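The decorator-based Python frameworks in the table can be approximated with line-oriented regexes. The sketch below covers FastAPI-style `@app.get(...)` / `@router.post(...)` and Flask's `@app.route(..., methods=[...])` only; `extract_routes` and its patterns are hypothetical, and it ignores Django URLconfs, multi-line decorators, and the Java and Node frameworks entirely.

```python
import re
from typing import List, Tuple

# FastAPI / APIRouter style: @app.get("/path"), @router.post('/path')
_VERB_DECORATOR = re.compile(r'@\w+\.(get|post|put|patch|delete)\(\s*["\']([^"\']+)["\']')
# Flask style: @app.route("/path", methods=["GET", "POST"]); methods default to GET
_FLASK_ROUTE = re.compile(
    r'@\w+\.route\(\s*["\']([^"\']+)["\'](?:[^)]*methods\s*=\s*\[([^\]]*)\])?'
)
_DEF = re.compile(r'\s*(?:async\s+)?def\s+(\w+)')


def extract_routes(source: str) -> List[Tuple[str, str, str]]:
    """Return (method, path, handler) triples found in one Python source file."""
    routes: List[Tuple[str, str, str]] = []
    pending: List[Tuple[str, str]] = []  # route decorators waiting for their function
    for line in source.splitlines():
        stripped = line.strip()
        verb = _VERB_DECORATOR.match(stripped)
        flask = None if verb else _FLASK_ROUTE.match(stripped)
        if verb:
            pending.append((verb.group(1).upper(), verb.group(2)))
        elif flask:
            methods = re.findall(r'["\'](\w+)["\']', flask.group(2) or "") or ["GET"]
            pending.extend((m.upper(), flask.group(1)) for m in methods)
        elif not stripped or stripped.startswith("@"):
            continue  # blank lines and other decorators (e.g. @login_required) keep the chain
        else:
            fn = _DEF.match(line)
            if fn:
                routes.extend((method, path, fn.group(1)) for method, path in pending)
            pending = []  # a def consumes the chain; any other statement breaks it
    return routes
```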

Data model ORMs

ORM	Detected from
JPA / Hibernate	`@Entity`, `@Table` annotations
SQLAlchemy	`class X(Base)`
Django ORM	`class X(models.Model)`

Databases / caches / queues

PostgreSQL, MySQL, MongoDB, H2, Oracle, SQL Server, MariaDB, Redis, RabbitMQ, Kafka, Elasticsearch, Celery.
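Many of these services show up as ordinary package dependencies. The sketch below maps `requirements.txt` lines to service names through a small lookup table; the table contents and `services_from_requirements` are assumptions, and per the How it works steps the package also reads other manifests (npm package names, Maven artifact IDs) and config files.

```python
import re
from typing import Iterable, List

# Hypothetical lookup table: normalized package name -> detected service
_DEP_SIGNALS = {
    "psycopg2": "PostgreSQL",
    "psycopg2-binary": "PostgreSQL",
    "asyncpg": "PostgreSQL",
    "pymysql": "MySQL",
    "mysqlclient": "MySQL",
    "pymongo": "MongoDB",
    "redis": "Redis",
    "pika": "RabbitMQ",
    "kafka-python": "Kafka",
    "elasticsearch": "Elasticsearch",
    "celery": "Celery",
    "sqlalchemy": "SQLAlchemy ORM",
}


def services_from_requirements(lines: Iterable[str]) -> List[str]:
    """Return detected services in first-seen order, without duplicates."""
    found: List[str] = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r / -e
        # The package name ends at the first version specifier, extra, or marker
        name = re.split(r"[\s<>=!~;\[]", line, maxsplit=1)[0].lower().replace("_", "-")
        service = _DEP_SIGNALS.get(name)
        if service and service not in found:
            found.append(service)
    return found
```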

Container/deployment

Dockerfile (FROM, EXPOSE, ENV), docker-compose, Kubernetes manifests (including Deployment YAML), Helm charts, Heroku Procfile, Vercel / Netlify configs.
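The Dockerfile signals listed above (FROM, EXPOSE, ENV) can be pulled out with a small line parser. This `parse_dockerfile` is a hypothetical sketch: it joins backslash continuations and handles both `ENV KEY=value` and the legacy `ENV KEY value` form, but ignores build args, heredocs, and variable substitution.

```python
import re
from typing import Dict, List


def parse_dockerfile(text: str) -> Dict[str, List]:
    """Collect base images, exposed ports, and ENV keys from a Dockerfile."""
    info: Dict[str, List] = {"base_images": [], "ports": [], "env": []}
    # Join backslash line continuations so multi-line instructions parse as one
    joined = re.sub(r"\\\r?\n", " ", text)
    for raw in joined.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        instr = parts[0].upper()
        args = parts[1] if len(parts) > 1 else ""
        if instr == "FROM":
            # FROM [--platform=...] image[:tag] [AS name]
            tokens = [t for t in args.split() if not t.startswith("--")]
            if tokens:
                info["base_images"].append(tokens[0])
        elif instr == "EXPOSE":
            for tok in args.split():
                port = tok.split("/", 1)[0]  # strip a /tcp or /udp suffix
                if port.isdigit():
                    info["ports"].append(int(port))
        elif instr == "ENV" and args:
            first = args.split(None, 1)[0]
            if "=" in first:  # modern form: ENV KEY=value KEY2=value2
                keys = [tok.split("=", 1)[0] for tok in args.split() if "=" in tok]
            else:  # legacy form: ENV KEY value
                keys = [first]
            info["env"].extend(keys)
    return info
```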

Runtime CVEs (web-server vulnerabilities)

Curated CVE manifest for nginx, Apache HTTPD, Tomcat, OpenJDK / Eclipse Temurin / Amazon Corretto. ~30 high-impact CVEs covered out of the box. PRs welcome.
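One plausible way to match a container base image against a curated manifest is a plain version-range check. The entries below use placeholder CVE IDs and versions, not real advisories, and the `component` / `introduced_in` / `fixed_in` field names are assumptions; the actual schema is documented in the header of data/web_server_cves.json.

```python
from typing import Dict, List, Tuple

# Placeholder entries: NOT real advisories. Field names are assumptions.
CVE_MANIFEST: List[Dict[str, str]] = [
    {"component": "nginx", "cve": "CVE-EXAMPLE-0001", "fixed_in": "1.25.4"},
    {"component": "nginx", "cve": "CVE-EXAMPLE-0002", "introduced_in": "1.21.0", "fixed_in": "1.22.0"},
    {"component": "tomcat", "cve": "CVE-EXAMPLE-0003", "fixed_in": "10.1.20"},
]


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.25.3' -> (1, 25, 3); stops at the first component with no leading digits."""
    parts: List[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch not in "0123456789":
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def split_image(image: str) -> Tuple[str, str]:
    """'docker.io/library/nginx:1.25.3-alpine' -> ('nginx', '1.25.3')."""
    image = image.split("@", 1)[0]  # drop any @sha256 digest
    head, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:  # no tag, or the colon belonged to a registry port
        head, tag = image, "latest"
    component = head.rsplit("/", 1)[-1].lower()
    return component, tag.split("-", 1)[0]


def matching_cves(image: str, manifest: List[Dict[str, str]]) -> List[str]:
    """Return IDs whose [introduced_in, fixed_in) range contains the image version."""
    component, version = split_image(image)
    have = parse_version(version)
    if not have:
        return []  # 'latest' or a non-numeric tag: nothing to compare
    hits = []
    for entry in manifest:
        if entry["component"] != component:
            continue
        introduced = parse_version(entry.get("introduced_in", "0"))
        if introduced <= have < parse_version(entry["fixed_in"]):
            hits.append(entry["cve"])
    return hits
```

A single `fixed_in` per entry ignores fixes backported to older release branches; a real manifest would need per-branch ranges to avoid false positives there.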

App categories (functional fingerprints)

e-commerce, blog/content, social network, admin panel/dashboard, REST API service, authentication/SSO, file/document management, scheduling/booking, messaging/notification. Each category has a fingerprint of weighted regex patterns, matched against routes, model names, and the README.
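Weighted fingerprint matching can be sketched as a per-category regex score. The two fingerprints use the `{ name, feature_label, signals: [(regex, weight), ...] }` shape from the Contributing section, but their patterns and weights, the matched-weight / total-weight confidence formula, and the 0.3 cutoff are all assumptions for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Illustrative fingerprints in the documented shape; patterns and weights are invented
FINGERPRINTS = [
    {
        "name": "e-commerce",
        "feature_label": "online shopping",
        "signals": [(r"\bcart\b", 3.0), (r"\bcheckout\b", 3.0),
                    (r"\bproducts?\b", 2.0), (r"\borders?\b", 2.0)],
    },
    {
        "name": "blog/content",
        "feature_label": "content publishing",
        "signals": [(r"\bposts?\b", 3.0), (r"\bcomments?\b", 2.0), (r"\btags?\b", 1.0)],
    },
]


def score_categories(routes: List[str], model_names: List[str], readme: str) -> Dict[str, float]:
    """Confidence per category = matched weight / total weight (an assumed formula)."""
    corpus = "\n".join([*routes, *model_names, readme]).lower()
    scores: Dict[str, float] = {}
    for fp in FINGERPRINTS:
        total = sum(weight for _, weight in fp["signals"])
        matched = sum(weight for pattern, weight in fp["signals"] if re.search(pattern, corpus))
        scores[fp["name"]] = matched / total if total else 0.0
    return scores


def pick_category(routes: List[str], model_names: List[str], readme: str,
                  min_confidence: float = 0.3) -> Tuple[Optional[str], float]:
    """Return the best-scoring category, or None when nothing clears the cutoff."""
    scores = score_categories(routes, model_names, readme)
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_confidence else None), scores[best]
```

Reporting the score alongside the label, even when it falls below the cutoff, mirrors the README's principle that the consumer decides whether to trust it.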

How it works

Walk every manifest/config file in the repo (capped at 800 files for speed)
Extract language-specific signals from each file (Maven artifact IDs, npm package names, Python deps, Dockerfile FROM, k8s containerPort, etc.) into a HostingReport
Walk source files to extract HTTP routes + data models per framework
Pattern-match routes + model names + README purpose against 9 category fingerprints (weighted regex)
Compose the 2-3 sentence functional description deterministically
(Optional) Hand the structured signals to your LLM for a polished rewrite
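The deterministic description step can be approximated with plain string templates that follow the shape of the Quick start output. `compose_description` is hypothetical; it picks "a" or "an" from the category's first letter and sorts entity names so the output is stable from run to run.

```python
from typing import List


def _article(word: str) -> str:
    """'an' before a vowel letter, else 'a' (a spelling heuristic, not phonetic)."""
    return "an" if word[:1].lower() in ("a", "e", "i", "o", "u") else "a"


def compose_description(repo: str, category: str, features: List[str],
                        entities: List[str], max_entities: int = 4) -> str:
    """Build a short summary from already-extracted signals; same input, same output."""
    sentences = [f"{repo} is {_article(category)} {category} application."]
    if features:
        sentences.append("Primary functionality: " + ", ".join(features) + ".")
    if entities:
        # Sort so the output does not depend on file-walk order
        shown = ", ".join(sorted(entities)[:max_entities])
        sentences.append(f"It models entities like {shown}.")
    return " ".join(sentences)
```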

Time budget: under 1 second on a 5K-file repo. The scan is bounded: it caps both the number of files visited and the amount read from each file.
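A bounded walk like this can be sketched with `os.walk`: prune heavy directories, stop after a fixed number of files, read at most a fixed number of bytes from each, and skip files that fail to open instead of aborting the pass. The 800-file cap comes from the step list above; `bounded_walk`, the per-file byte cap, and the skip list are assumptions.

```python
import os
from typing import Iterator, Tuple

MAX_FILES = 800                  # from the step list above
MAX_BYTES_PER_FILE = 256 * 1024  # assumed per-file read cap
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__", "dist", "build"}  # assumed


def bounded_walk(root: str, max_files: int = MAX_FILES,
                 max_bytes: int = MAX_BYTES_PER_FILE) -> Iterator[Tuple[str, str]]:
    """Yield (relative_path, text) for up to max_files files, reading at most max_bytes of each."""
    seen = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place (sorted for determinism) so os.walk never descends into skipped dirs
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if seen >= max_files:
                return
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    raw = fh.read(max_bytes)
            except OSError:
                continue  # unreadable file: skip it rather than abort the whole pass
            seen += 1
            yield os.path.relpath(path, root), raw.decode("utf-8", errors="replace")
```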

Design principles

No network. Every signal comes from on-disk content. Bundled CVE manifest, no live API calls.
No SDK pin. The LLM step is provider-agnostic: bring your own callable. app-classifier itself never imports openai; only your provider does.
No surprises. Failures on individual files don't kill the pass. Confidence is always reported; the consumer decides whether to trust it.
Pure read. We never modify the target repo.

Contributing

PRs welcome on three axes:

More category fingerprints — _CATEGORY_FINGERPRINTS in classifier.py. Each is { name, feature_label, signals: [(regex, weight), ...] }.
More CVE entries — data/web_server_cves.json. Schema is documented in the file header.
More framework extractors — route + model extraction for Ruby on Rails, Phoenix, ASP.NET Core, Gin, Rocket, etc. would all be welcome.
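Before opening a PR with a new fingerprint, it can help to sanity-check its shape. The entry below is a hypothetical "background worker" category in the documented `{ name, feature_label, signals }` form, and `validate_fingerprint` is an illustrative helper, not part of the package.

```python
import re
from typing import Any, Dict, List

# Hypothetical new entry; the name, label, patterns, and weights are made up
NEW_FINGERPRINT = {
    "name": "background worker",
    "feature_label": "background job processing",
    "signals": [
        (r"\bcelery\b", 3.0),
        (r"\b(worker|consumer)s?\b", 2.0),
        (r"\b(task|job)s?\b", 1.0),
        (r"\bqueues?\b", 1.0),
    ],
}


def validate_fingerprint(fp: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems: List[str] = []
    for key in ("name", "feature_label", "signals"):
        if key not in fp:
            problems.append(f"missing key: {key}")
    for i, signal in enumerate(fp.get("signals", [])):
        try:
            pattern, weight = signal
        except (TypeError, ValueError):
            problems.append(f"signal {i}: expected a (regex, weight) pair")
            continue
        try:
            re.compile(pattern)
        except (re.error, TypeError) as exc:
            problems.append(f"signal {i}: bad regex {pattern!r}: {exc}")
        if not isinstance(weight, (int, float)) or weight <= 0:
            problems.append(f"signal {i}: weight must be a positive number")
    return problems
```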

Run the test suite

pip install -e ".[test]"
pytest

Tests point the classifier at each fixture directory under tests/fixtures/ and assert the expected category and features.

What this is NOT

Not a security scanner. It surfaces runtime CVEs on the container base image, but the rest of the code is for understanding, not vulnerability detection.
Not a deployment tool. It tells you what the deployment looks like; it doesn't deploy anything.
Not a replacement for a README. It generates a structural sketch; humans still write the narrative.

If you want a full security analysis + fix pipeline that uses this internally, see Codefixer (closed-source).

License

MIT — see LICENSE. Use it however you want. Attribution appreciated but not required.

Acknowledgements

Extracted from Codefixer's hosting_requirements + app_description analyzers. The category-fingerprint approach was inspired by Sourcegraph's "what is this repo?" tooling and the way Backstage classifies services.

Release history

Version	Release date
0.5.3	May 25, 2026
0.5.2	May 24, 2026
0.5.1	May 22, 2026
0.5.0	May 22, 2026
0.2.0	May 21, 2026
0.1.0 (this version)	May 21, 2026

Download files

Source Distribution

app_classifier-0.1.0.tar.gz (43.8 kB)

Uploaded May 21, 2026 Source

Built Distribution

app_classifier-0.1.0-py3-none-any.whl (39.6 kB)

Uploaded May 21, 2026 Python 3

File details

Details for the file app_classifier-0.1.0.tar.gz.

File metadata

Download URL: app_classifier-0.1.0.tar.gz
Upload date: May 21, 2026
Size: 43.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for app_classifier-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2f1f995b5b0cb63b0947b1f52c0248260634c71db03694a74968f4ed5ee7ae5f`
MD5	`aa3c0cf6650cb449d9f7dba44b261d9d`
BLAKE2b-256	`5bcb42a0c1dd575d01349f7ef1b6aeb0ff6ea7a1f7c6bb40d312c9ae9219fa0d`

File details

Details for the file app_classifier-0.1.0-py3-none-any.whl.

File metadata

Download URL: app_classifier-0.1.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 39.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for app_classifier-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d391fcc9bfbcd71d883b1cc5dcb35a7a4c252cf61ecada9ac07e6a0a77b02b0b`
MD5	`9efbbce70dbb851e2446d2aaa9afa99a`
BLAKE2b-256	`99316c1f898d1b9e6b713bdc9771296fc443374baf608035263b731092b7f761`
