Process management and monitoring library

Project description

hget-audio - Website Audio Downloader

[English]

Comprehensive Error Handling

hget-audio implements robust error handling throughout the application. When errors occur:

Non-verbose mode (default):
- Captures all exceptions and displays a user-friendly message
- Recommends using --verbose for detailed error information
- Provides a unique error code for reference
- Logs full error details to a file for later analysis

Verbose mode (--verbose):
- Displays complete error tracebacks
- Shows internal state information for debugging
- Includes additional diagnostic data
- Does not capture exceptions - allows full error propagation
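
The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration, not hget-audio's actual code: the helper name `run_with_error_handling`, its parameters, and the default error code are hypothetical, and the message format only imitates the sample output shown in this README.

```python
import sys


def run_with_error_handling(task, verbose=False, error_code="DL-102"):
    """Run `task` and report failures according to the chosen mode.

    Hypothetical helper: hget-audio's real internals are not shown here.
    """
    if verbose:
        # Verbose mode: no try/except, so any exception propagates
        # to the caller with its full traceback.
        return task()
    try:
        return task()
    except Exception as exc:
        # Non-verbose mode: print a short message with an error code
        # and point the user at --verbose for details.
        print(f"[ERROR] Download failed (Error Code: {error_code})", file=sys.stderr)
        print(f"Error: {exc}", file=sys.stderr)
        print("For more details, run with --verbose flag", file=sys.stderr)
        return None
```

With `verbose=False` a failing task returns `None` after printing the short message; with `verbose=True` the same task raises, which is what "allows full error propagation" means for a caller.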

Error Handling Examples

Without verbose flag:

text
2023-10-15 14:30:25 [ERROR] Download failed (Error Code: DL-102)
Error: Connection timeout while downloading audio file.
Solution: Try increasing timeout with --timeout option
For more details, run with --verbose flag or check error log: errors_20231015_143025.log

With verbose flag:

text
2023-10-15 14:30:25 [ERROR] Full traceback:
  File "/path/to/hget_audio/pipelines.py", line 215, in media_downloaded
    response = super().media_downloaded(response, request, info, item=item)
  File "/path/to/scrapy/pipelines/files.py", line 320, in media_downloaded
    raise FileException("Connection timeout")

scrapy.exceptions.FileException: Connection timeout

Request details:
- URL: https://example.com/audio/large.mp3
- Referer: https://example.com/audio-page
- Size: 150 MB (exceeds max size of 100 MB)
- Format: audio/mpeg
- Retry count: 2/3

System information:
- Python: 3.9.12
- Scrapy: 2.7.1
- Platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.31

Error Code Reference

Code Range	Error Type	Example Codes
100-199	Network Errors	101: Connection, 102: Timeout
200-299	File Validation Errors	201: Invalid type, 202: Size
300-399	Configuration Errors	301: Invalid URL, 302: Invalid depth
400-499	Scraping Errors	401: Parser, 402: Spider
500-599	System Errors	501: Disk full, 502: Permissions
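
As an illustration of how the ranges in this table could be used, the sketch below maps a code to its category. The ranges and category names are copied from the table; the function name `error_category` is hypothetical, and accepting a prefixed form such as `DL-102` is an assumption based on the sample log output, not a documented API.

```python
# Ranges copied from the error code table; names are illustrative only.
ERROR_RANGES = [
    (100, 199, "Network Errors"),
    (200, 299, "File Validation Errors"),
    (300, 399, "Configuration Errors"),
    (400, 499, "Scraping Errors"),
    (500, 599, "System Errors"),
]


def error_category(code):
    """Return the category for a code such as 102 or "DL-102"."""
    # Keep only the part after the last dash so "DL-102" becomes 102.
    number = int(str(code).rsplit("-", 1)[-1])
    for low, high, name in ERROR_RANGES:
        if low <= number <= high:
            return name
    return "Unknown"


print(error_category("DL-102"))
print(error_category(501))
```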

Error Logging

All errors are logged to timestamped files in the error_logs directory:

text
error_logs/
├── errors_20231015_143025.log
├── errors_20231016_093412.log
└── errors_20231017_154723.log

Each log file contains:

Full error traceback
Request and response details
System environment information
Configuration settings at time of error
Memory usage statistics
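
The file names above follow an `errors_YYYYMMDD_HHMMSS.log` pattern. A minimal sketch of producing such a file with the standard library is shown below; the function names `error_log_path` and `write_error_log` are hypothetical, and how hget-audio actually writes its logs is not documented here.

```python
from datetime import datetime
from pathlib import Path


def error_log_path(when, log_dir="error_logs"):
    """Build a path like error_logs/errors_20231015_143025.log for `when`."""
    return Path(log_dir) / f"errors_{when:%Y%m%d_%H%M%S}.log"


def write_error_log(details, when=None, log_dir="error_logs"):
    """Write `details` to a new timestamped file and return its path."""
    path = error_log_path(when or datetime.now(), log_dir)
    # Create the log directory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(details, encoding="utf-8")
    return path
```

Naming files by timestamp keeps one file per failure and makes the directory sort chronologically without any extra index.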

Installation

Using pip
bash
pip install hget-audio
From source
bash
git clone https://github.com/hyy-PROG/hget_audio.git
cd hget_audio
pip install .
Command Line Usage
Basic command
bash
hget-audio "https://example.com/audio-page" -o "my_audios"
Advanced options
bash
hget-audio "https://example.com" \
  -d 3 \
  -c 8 \
  -f "mp3,wav" \
  --exclude "admin,private" \
  --max-size 50 \
  --timeout 30 \
  --retries 3 \
  -o "filtered_audios"
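
The --exclude value above is a comma-separated list of regular expressions. One plausible way such a filter could work is sketched below. This is an assumption, not hget-audio's confirmed behaviour: the helper names are hypothetical, each comma-separated part is treated as its own pattern, and an empty include list is taken to mean "allow everything".

```python
import re


def compile_patterns(spec):
    """Split a comma-separated option value into compiled regexes."""
    return [re.compile(part.strip()) for part in spec.split(",") if part.strip()]


def url_allowed(url, include="", exclude="logout,admin,login"):
    """Return True when `url` passes the include and exclude filters."""
    # Any matching exclude pattern rejects the URL outright.
    if any(p.search(url) for p in compile_patterns(exclude)):
        return False
    includes = compile_patterns(include)
    # An empty include list is treated as "allow everything".
    return not includes or any(p.search(url) for p in includes)


print(url_allowed("https://example.com/audio/a.mp3", exclude="admin,private"))
print(url_allowed("https://example.com/private/a.mp3", exclude="admin,private"))
```

Splitting on commas means a single pattern cannot itself contain a comma under this scheme; the real tool may handle that case differently.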
Full options
bash
hget-audio --help
API Usage
python
from hget_audio.api import download_audio

# Download website audio
result = download_audio(
    url="https://example.com/audio-page",
    output_dir="my_audios",
    depth=2,
    formats="mp3,wav",
    verbose=True  # Enable detailed error reporting
)

print(f"Downloaded {result['audio_downloaded']} audio files")
print(f"Total size: {result['total_size'] / (1024*1024):.2f} MB")
Configuration Options
Option	Description	Default
-o, --output	Output directory	hget.output
-d, --depth	Crawl depth	2
-c, --concurrency	Concurrent requests	16
-f, --formats	Audio formats (comma-separated)	mp3,wav,ogg,m4a,flac,aac
--ignore-robots	Ignore robots.txt rules	False
--user-agent	Custom User-Agent	Default UA
--delay	Request delay (seconds)	0.5
--timeout	Request timeout (seconds)	30
--retries	Max retry attempts	3
--max-size	Max file size (MB)	100
--min-size	Min file size (KB)	1
--include	Include URL patterns (regex)	Empty
--exclude	Exclude URL patterns (regex)	logout,admin,login
--dry-run	Simulation mode (no download)	False
-v, --verbose	Verbose output and error reporting	False
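
To make the defaults in this table concrete, here is a sketch of an `argparse` parser that mirrors it. It is not hget-audio's actual source: the flag names and default values come from the table, while the value types (for example, floats for --timeout and the size limits) and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the options table; types are assumed, not confirmed."""
    p = argparse.ArgumentParser(prog="hget-audio")
    p.add_argument("url")
    p.add_argument("-o", "--output", default="hget.output")
    p.add_argument("-d", "--depth", type=int, default=2)
    p.add_argument("-c", "--concurrency", type=int, default=16)
    p.add_argument("-f", "--formats", default="mp3,wav,ogg,m4a,flac,aac")
    p.add_argument("--ignore-robots", action="store_true")
    p.add_argument("--user-agent", default=None)
    p.add_argument("--delay", type=float, default=0.5)      # seconds
    p.add_argument("--timeout", type=float, default=30)     # seconds
    p.add_argument("--retries", type=int, default=3)
    p.add_argument("--max-size", type=float, default=100)   # MB
    p.add_argument("--min-size", type=float, default=1)     # KB
    p.add_argument("--include", default="")
    p.add_argument("--exclude", default="logout,admin,login")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("-v", "--verbose", action="store_true")
    return p


# Options that are not passed keep the defaults listed in the table.
args = build_parser().parse_args(["https://example.com", "-d", "3", "--max-size", "50"])
print(args.depth, args.concurrency, args.max_size)
```

Note the mixed units: --max-size is given in MB while --min-size is given in KB, so any code comparing them against a byte count has to convert each separately.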
Example Output
text
2023-10-15 14:30:25 [INFO] Starting crawl: https://example.com/audio-page
2023-10-15 14:30:26 [DEBUG] Parsing page (depth=0): https://example.com/audio-page
2023-10-15 14:30:27 [INFO] Audio found: https://example.com/audio/sample1.mp3
2023-10-15 14:30:28 [INFO] Download successful: my_audios/example_com/sample1.mp3
...
2023-10-15 14:31:05 [INFO] Spider closed
==================================================
Scraping Summary
==================================================
Website: https://example.com/audio-page
Output Directory: /path/to/my_audios
Total Pages Crawled: 42
Audio Files Found: 15
Audio Files Downloaded: 12
Audio Files Skipped: 3
Errors Encountered: 0
Total Download Size: 245.7 MB
Contribution Guidelines
Fork the repository

Create your feature branch (git checkout -b feature/your-feature)

Commit your changes (git commit -am 'Add some feature')

Push to the branch (git push origin feature/your-feature)

Create a Pull Request

License
This project is licensed under the MIT License - see the LICENSE file for details.

Contact
For issues or suggestions: support@hget-audio.example


Project details

Release history

1.0.2: Jul 28, 2025
1.0.1: Jul 28, 2025
1.0.0: Jul 28, 2025 (this version)

Download files

Source Distribution

hprocess-1.0.0.tar.gz (55.9 kB)

Uploaded Jul 28, 2025 Source

Built Distribution

hprocess-1.0.0-py3-none-any.whl (54.0 kB)

Uploaded Jul 28, 2025 Python 3

File details

Details for the file hprocess-1.0.0.tar.gz.

File metadata

Download URL: hprocess-1.0.0.tar.gz
Upload date: Jul 28, 2025
Size: 55.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for hprocess-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`154010252e6414c59307efd15ff99665f78092e376ea67fa58651710394eebbb`
MD5	`349cfe9ce3c89ed9ede8ce7a7e0fde77`
BLAKE2b-256	`88e6f205098c9ce3622ea7cdfbe93b0296522df46a6e092eacbaedfa0412b63f`


File details

Details for the file hprocess-1.0.0-py3-none-any.whl.

File metadata

Download URL: hprocess-1.0.0-py3-none-any.whl
Upload date: Jul 28, 2025
Size: 54.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for hprocess-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`71952b869d14ddb6879a7bc6af96d74b6b20a1014ca132305555cbf8606a00d6`
MD5	`d86a2b47d8614a84caf5eb1d83fec1ee`
BLAKE2b-256	`3509f451419327834edc00449eae7a3a87287c735653c807c938ad000bd5cea2`


hprocess 1.0.0
