Python Oriented Crawling Ongoing (POCONG): a simple crawling framework

Project description

POCONG 🪦

Python Oriented Crawling ON Going

POCONG is a lightweight web crawling framework built in Python.

Features

🔒 Get Free Proxy: Automatic proxy fetching, validation, and rotation from free proxy sources
🌐 Dynamic Media Web Scraping: Extract content, metadata, and media information from web pages with proxy support
📱 Social Media Scraping: Extract data from social media platforms (coming soon)
🛒 E-commerce Scraping: Extract product information from e-commerce websites (coming soon)

Installation

pip install pocong

Usage: Get Proxy from proxy_spiders

You can use the get_proxy and get_proxy_random methods from proxy_spiders to fetch working proxies.

from pocong.proxy_spiders import GetProxy

gp = GetProxy()

# Get the first working proxy
proxy = gp.get_proxy()
print("First working proxy:", proxy)

from pocong.proxy_spiders import GetProxy

gp = GetProxy()

# Get a random working proxy
random_proxy = gp.get_proxy_random()
print("Random working proxy:", random_proxy)

Sample output:

First working proxy: {'ip': '123.45.67.89', 'port': '8080', 'https': 'yes', ...}
Random working proxy: {'ip': '98.76.54.32', 'port': '3128', 'https': 'yes', ...}

You can use the returned proxy dictionary with the requests library, for example:

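import requests

from pocong.proxy_spiders import GetProxy

gp = GetProxy()
proxy = gp.get_proxy()
if proxy:
    # Both schemes are routed through the same HTTP proxy endpoint
    proxies = {
        'http': f"http://{proxy['ip']}:{proxy['port']}",
        'https': f"http://{proxy['ip']}:{proxy['port']}"
    }
    # A timeout keeps a slow or dead free proxy from hanging the request
    response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
    print(response.json())
else:
    print("No working proxy found.")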

get_proxy() will return the first working proxy found.
get_proxy_random() will return a random working proxy (with up to 20 retries).

Both methods return a dictionary with proxy details (e.g., { 'ip': '...', 'port': '...', ... }) or None if no working proxy is found.
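Because both methods can return None, it can help to keep the conversion to a requests-style mapping in one place. The following is a minimal sketch, not part of pocong: the helper name to_requests_proxies is hypothetical, and it assumes only the 'ip' and 'port' keys shown in the sample output above.

```python
def to_requests_proxies(proxy):
    """Convert a proxy dict with 'ip' and 'port' keys into a requests-style mapping.

    Hypothetical helper (not part of pocong). Returns None when no proxy is given.
    """
    if not proxy:
        return None
    endpoint = f"http://{proxy['ip']}:{proxy['port']}"
    # The same HTTP proxy endpoint is used for both URL schemes
    return {"http": endpoint, "https": endpoint}


# Sample dictionary shaped like the output shown above
sample = {"ip": "123.45.67.89", "port": "8080", "https": "yes"}
print(to_requests_proxies(sample))
print(to_requests_proxies(None))
```

The result can be passed directly as the proxies argument of requests.get; passing None leaves the default behaviour of requests unchanged.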

Usage: Dynamic Media Web Scraping

The DynamicScrapingNews class provides comprehensive web scraping capabilities with built-in proxy support for extracting content, metadata, and media information from web pages.

Basic Usage

from pocong.media_spiders import DynamicScrapingNews

# Simple scraping without proxy
scraper = DynamicScrapingNews("https://example.com", use_proxy=False)
result = scraper.scrape()

# Extract specific information
print(f"Title: {result['title']}")
print(f"URL: {result['url']}")
print(f"Media: {result['media']}")
print(f"Published: {result['published_date']}")
print(f"Text content: {result['text'][:200]}...")  # First 200 chars

Proxy Configuration Options

1. Automatic Proxy (Default)

# Uses automatic proxy fetching
scraper = DynamicScrapingNews("https://example.com")
result = scraper.scrape()

2. Manual Proxy Configuration

# Method 1: IP:Port format
scraper = DynamicScrapingNews("https://example.com", 
                              manual_proxy="192.168.1.1:8080")

# Method 2: Full URL format
scraper = DynamicScrapingNews("https://example.com", 
                              manual_proxy="http://192.168.1.1:8080")

# Method 3: Dictionary format
scraper = DynamicScrapingNews("https://example.com", 
                              manual_proxy={"ip": "192.168.1.1", "port": "8080"})

result = scraper.scrape()
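The three formats above describe the same proxy endpoint. As an illustration of how they relate, here is a small sketch that reduces each one to a single URL string. The function normalize_proxy is hypothetical and is not claimed to be how DynamicScrapingNews handles manual_proxy internally; it also assumes an http:// scheme when none is given.

```python
def normalize_proxy(manual_proxy):
    """Reduce the three manual proxy formats to one URL string.

    Hypothetical illustration only, not pocong's internal logic.
    """
    if isinstance(manual_proxy, dict):
        # Dictionary format: {"ip": ..., "port": ...}
        return f"http://{manual_proxy['ip']}:{manual_proxy['port']}"
    if "://" in manual_proxy:
        # Full URL format: already carries a scheme
        return manual_proxy
    # IP:Port format: assume plain HTTP
    return f"http://{manual_proxy}"


formats = [
    "192.168.1.1:8080",
    "http://192.168.1.1:8080",
    {"ip": "192.168.1.1", "port": "8080"},
]
for value in formats:
    print(normalize_proxy(value))
```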

3. No Proxy

# Disable proxy completely
scraper = DynamicScrapingNews("https://example.com", use_proxy=False)
result = scraper.scrape()

4. Manual Proxy Override

# Manual proxy overrides use_proxy setting
scraper = DynamicScrapingNews("https://example.com", 
                              use_proxy=False, 
                              manual_proxy="192.168.1.1:8080")
result = scraper.scrape()
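The four options above imply a simple precedence: a manual proxy wins, otherwise use_proxy decides between automatic fetching and a direct connection. The sketch below restates that rule as a standalone function. The name resolve_proxy_mode and the returned labels are hypothetical and only mirror the documented behaviour, not pocong's actual code.

```python
def resolve_proxy_mode(use_proxy=True, manual_proxy=None):
    """Return which proxy mode the documented options would select.

    Hypothetical restatement of the precedence described above.
    """
    if manual_proxy is not None:
        # A manual proxy overrides the use_proxy setting
        return "manual"
    return "automatic" if use_proxy else "none"


print(resolve_proxy_mode())                                  # default
print(resolve_proxy_mode(use_proxy=False))                   # no proxy
print(resolve_proxy_mode(use_proxy=False,
                         manual_proxy="192.168.1.1:8080"))   # override
```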

Complete Example with Proxy Integration

from pocong.proxy_spiders import GetProxy
from pocong.media_spiders import DynamicScrapingNews

# Get a working proxy (None if no working proxy is found)
proxy = GetProxy().get_proxy()
print(f"Using proxy: {proxy}")

# Use automatic proxy (default behavior)
scraper = DynamicScrapingNews("https://example.com")
result = scraper.scrape()

if proxy:
    # Use manual proxy with ip:port format
    scraper = DynamicScrapingNews("https://example.com",
                                  manual_proxy=f"{proxy['ip']}:{proxy['port']}")
    result = scraper.scrape()

    # Use manual proxy with dictionary format
    scraper = DynamicScrapingNews("https://example.com",
                                  manual_proxy={"ip": proxy['ip'], "port": proxy['port']})
    result = scraper.scrape()
else:
    print("No working proxy found; skipping the manual proxy examples.")

Extracted Data Structure

The scrape() method returns a dictionary containing:

{
    'title': 'Page Title',           # Extracted from og:title or title tag
    'url': 'https://example.com',    # Canonical URL
    'image': 'https://...',          # Featured image URL
    'html': '<html>...</html>',      # Full HTML content
    'text': 'Clean text content',    # Processed text without HTML
    'media': 'example',              # Domain name extracted from URL
    'published_date': datetime(...)  # Publication date if found
}
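Since the result holds a datetime object and the full HTML, it is not directly JSON-serializable or compact. The sketch below shows one way to reduce it to a small record for storage. The function to_record is hypothetical, the sample values are made up, and it assumes published_date may be missing or None when no date is found.

```python
import json
from datetime import datetime


def to_record(result):
    """Build a compact, JSON-serializable summary of a scrape() result.

    Hypothetical helper; assumes the keys listed above.
    """
    published = result.get("published_date")
    return {
        "title": result.get("title"),
        "url": result.get("url"),
        "image": result.get("image"),
        "media": result.get("media"),
        # datetime is not JSON-serializable, so store an ISO 8601 string
        "published_date": published.isoformat() if isinstance(published, datetime) else None,
        # Keep only the text length instead of the full text and HTML
        "text_length": len(result.get("text") or ""),
    }


# Made-up sample shaped like the structure above
sample = {
    "title": "Page Title",
    "url": "https://example.com",
    "image": None,
    "html": "<html>...</html>",
    "text": "Clean text content",
    "media": "example",
    "published_date": datetime(2025, 10, 11),
}
print(json.dumps(to_record(sample), indent=2))
```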

Project details

Release history

1.1.0 (this version): Oct 11, 2025
1.0.1: Sep 21, 2025
1.0.0: Sep 21, 2025
0.0.1: Aug 23, 2025

Download files

Source Distribution

pocong-1.1.0.tar.gz (32.4 kB)

Uploaded Oct 11, 2025 Source

Built Distribution

pocong-1.1.0-py3-none-any.whl (9.9 kB)

Uploaded Oct 11, 2025 Python 3

File details

Details for the file pocong-1.1.0.tar.gz.

File metadata

Download URL: pocong-1.1.0.tar.gz
Upload date: Oct 11, 2025
Size: 32.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pocong-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`00b01c0f5cd5bf49ea50677efab7d28e4736ae405821126c19564ba9e1548411`
MD5	`f791906201df697c48f3458b28376254`
BLAKE2b-256	`94437f67fb38ad09438e33daf7e091df1d38859778d98ac98e11bede7f86fa65`

File details

Details for the file pocong-1.1.0-py3-none-any.whl.

File metadata

Download URL: pocong-1.1.0-py3-none-any.whl
Upload date: Oct 11, 2025
Size: 9.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pocong-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cead32b6a86e1b6e26068e47bd1b866d8c87c42989a30b2e7f78a0a38443af1`
MD5	`686e00041864f5b877d9c6dd535a45e4`
BLAKE2b-256	`82eb6c0a840cd67dcc6a1e94e0c2dc6e0cbfa822c738b8389781a048a71fb5c2`
