Skip to main content

Process management and monitoring library

Project description

hget-audio - Website Audio Downloader

License: MIT Python Version Scrapy Version

[English]

Comprehensive Error Handling

hget-audio implements robust error handling throughout the application. When errors occur:

  1. Non-verbose mode (default):

    • Captures all exceptions and displays a user-friendly message
    • Recommends using --verbose for detailed error information
    • Provides a unique error code for reference
    • Logs full error details to a file for later analysis
  2. Verbose mode (--verbose):

    • Displays complete error tracebacks
    • Shows internal state information for debugging
    • Includes additional diagnostic data
    • Does not capture exceptions - allows full error propagation

Error Handling Examples

Without verbose flag: 2023-10-15 14:30:25 [ERROR] Download failed (Error Code: DL-102) Error: Connection timeout while downloading audio file. Solution: Try increasing timeout with --timeout option For more details, run with --verbose flag or check error log: errors_20231015_143025.log

text

With verbose flag: 2023-10-15 14:30:25 [ERROR] Full traceback: File "/path/to/hget_audio/pipelines.py", line 215, in media_downloaded response = super().media_downloaded(response, request, info, item=item) File "/path/to/scrapy/pipelines/files.py", line 320, in media_downloaded raise FileException("Connection timeout")

scrapy.exceptions.FileException: Connection timeout

Request details:

URL: https://example.com/audio/large.mp3

Referer: https://example.com/audio-page

Size: 150 MB (exceeds max size of 100 MB)

Format: audio/mpeg

Retry count: 2/3

System information:

Python: 3.9.12

Scrapy: 2.7.1

Platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.31

text

Error Code Reference

Code Range Error Type Example Codes
100-199 Network Errors 101: Connection, 102: Timeout
200-299 File Validation Errors 201: Invalid type, 202: Size
300-399 Configuration Errors 301: Invalid URL, 302: Invalid depth
400-499 Scraping Errors 401: Parser, 402: Spider
500-599 System Errors 501: Disk full, 502: Permissions

Error Logging

All errors are logged to timestamped files in the error_logs directory: error_logs/ ├── errors_20231015_143025.log ├── errors_20231016_093412.log └── errors_20231017_154723.log

text

Each log file contains:

  1. Full error traceback
  2. Request and response details
  3. System environment information
  4. Configuration settings at time of error
  5. Memory usage statistics

Installation

Using pip

pip install hget-audio
From source
bash
git clone https://github.com/hyy-PROG/hget_audio.git
cd hget_audio
pip install .
Command Line Usage
Basic command
bash
hget-audio "https://example.com/audio-page" -o "my_audios"
Advanced options
bash
hget-audio "https://example.com" \
  -d 3 \
  -c 8 \
  -f "mp3,wav" \
  --exclude "admin,private" \
  --max-size 50 \
  --timeout 30 \
  --retries 3 \
  -o "filtered_audios"
Full options
bash
hget-audio --help
API Usage
python
from hget_audio.api import download_audio

# Download website audio
result = download_audio(
    url="https://example.com/audio-page",
    output_dir="my_audios",
    depth=2,
    formats="mp3,wav",
    verbose=True  # Enable detailed error reporting
)

print(f"Downloaded {result['audio_downloaded']} audio files")
print(f"Total size: {result['total_size'] / (1024*1024):.2f} MB")
Configuration Options
Option	Description	Default
-o, --output	Output directory	hget.output
-d, --depth	Crawl depth	2
-c, --concurrency	Concurrent requests	16
-f, --formats	Audio formats (comma-separated)	mp3,wav,ogg,m4a,flac,aac
--ignore-robots	Ignore robots.txt rules	False
--user-agent	Custom User-Agent	Default UA
--delay	Request delay (seconds)	0.5
--timeout	Request timeout (seconds)	30
--retries	Max retry attempts	3
--max-size	Max file size (MB)	100
--min-size	Min file size (KB)	1
--include	Include URL patterns (regex)	Empty
--exclude	Exclude URL patterns (regex)	logout,admin,login
--dry-run	Simulation mode (no download)	False
-v, --verbose	Verbose output and error reporting	False
Example Output
text
2023-10-15 14:30:25 [INFO] Starting crawl: https://example.com/audio-page
2023-10-15 14:30:26 [DEBUG] Parsing page (depth=0): https://example.com/audio-page
2023-10-15 14:30:27 [INFO] Audio found: https://example.com/audio/sample1.mp3
2023-10-15 14:30:28 [INFO] Download successful: my_audios/example_com/sample1.mp3
...
2023-10-15 14:31:05 [INFO] Spider closed
==================================================
Scraping Summary
==================================================
Website: https://example.com/audio-page
Output Directory: /path/to/my_audios
Total Pages Crawled: 42
Audio Files Found: 15
Audio Files Downloaded: 12
Audio Files Skipped: 3
Errors Encountered: 0
Total Download Size: 245.7 MB
Contribution Guidelines
Fork the repository

Create your feature branch (git checkout -b feature/your-feature)

Commit your changes (git commit -am 'Add some feature')

Push to the branch (git push origin feature/your-feature)

Create a Pull Request

License
This project is licensed under the MIT License - see the LICENSE file for details.

Contact
For issues or suggestions: support@hget-audio.example

[中文]

全面的错误处理
hget-audio 在整个应用程序中实现了强大的错误处理机制。当发生错误时:

非详细模式(默认):

捕获所有异常并显示用户友好的消息

建议使用 --verbose 参数获取详细错误信息

提供唯一的错误代码供参考

将完整错误详情记录到文件以供后续分析

详细模式 (--verbose):

显示完整的错误跟踪信息

显示内部状态信息用于调试

包含额外的诊断数据

不捕获异常 - 允许错误完全传播

错误处理示例
不使用详细标志:

text
2023-10-15 14:30:25 [ERROR] 下载失败 (错误代码: DL-102)
错误: 下载音频文件时连接超时
解决方案: 尝试使用 --timeout 选项增加超时时间
更多详情请使用 --verbose 参数运行或查看错误日志: errors_20231015_143025.log
使用详细标志:

text
2023-10-15 14:30:25 [ERROR] 完整错误跟踪:
  File "/path/to/hget_audio/pipelines.py", line 215, in media_downloaded
    response = super().media_downloaded(response, request, info, item=item)
  File "/path/to/scrapy/pipelines/files.py", line 320, in media_downloaded
    raise FileException("连接超时")
    
scrapy.exceptions.FileException: 连接超时

请求详情:
- URL: https://example.com/audio/large.mp3
- 来源页面: https://example.com/audio-page
- 大小: 150 MB (超过最大 100 MB 限制)
- 格式: audio/mpeg
- 重试次数: 2/3

系统信息:
- Python: 3.9.12
- Scrapy: 2.7.1
- 平台: Linux-5.15.0-86-generic-x86_64-with-glibc2.31
错误代码参考
代码范围	错误类型	示例代码
100-199	网络错误	101: 连接错误, 102: 超时
200-299	文件验证错误	201: 无效类型, 202: 大小不符
300-399	配置错误	301: 无效URL, 302: 无效深度
400-499	抓取错误	401: 解析错误, 402: 爬虫错误
500-599	系统错误	501: 磁盘已满, 502: 权限错误
错误日志记录
所有错误都记录在 error_logs 目录的时间戳文件中:

text
error_logs/
├── errors_20231015_143025.log
├── errors_20231016_093412.log
└── errors_20231017_154723.log
每个日志文件包含:

完整的错误跟踪信息

请求和响应详情

系统环境信息

错误发生时的配置设置

内存使用统计

安装
使用 pip 安装
bash
pip install hget-audio
从源码安装
bash
git clone https://github.com/hyy-PROG/hget_audio.git
cd hget-audio
pip install .
命令行使用
基本命令
bash
hget-audio "https://example.com/audio-page" -o "my_audios"
高级选项
bash
hget-audio "https://example.com" \
  -d 3 \
  -c 8 \
  -f "mp3,wav" \
  --exclude "admin,private" \
  --max-size 50 \
  --timeout 30 \
  --retries 3 \
  -o "filtered_audios"
完整选项
bash
hget-audio --help
API 使用
python
from hget_audio.api import download_audio

# 下载网站音频
result = download_audio(
    url="https://example.com/audio-page",
    output_dir="my_audios",
    depth=2,
    formats="mp3,wav",
    verbose=True  # 启用详细错误报告
)

print(f"下载了 {result['audio_downloaded']} 个音频文件")
print(f"总大小: {result['total_size'] / (1024*1024):.2f} MB")
配置选项
选项	描述	默认值
-o, --output	输出目录	hget.output
-d, --depth	爬取深度	2
-c, --concurrency	并发请求数	16
-f, --formats	音频格式 (逗号分隔)	mp3,wav,ogg,m4a,flac,aac
--ignore-robots	忽略 robots.txt 规则	False
--user-agent	自定义 User-Agent	默认 UA
--delay	请求延迟 ()	0.5
--timeout	请求超时时间 ()	30
--retries	最大重试次数	3
--max-size	最大文件大小 (MB)	100
--min-size	最小文件大小 (KB)	1
--include	包含的 URL 模式 (正则)	空
--exclude	排除的 URL 模式 (正则)	logout,admin,login
--dry-run	模拟运行模式 (不下载)	False
-v, --verbose	详细输出和错误报告	False
示例输出
text
2023-10-15 14:30:25 [INFO] 开始爬取: https://example.com/audio-page
2023-10-15 14:30:26 [DEBUG] 解析页面 (depth=0): https://example.com/audio-page
2023-10-15 14:30:27 [INFO] 发现音频: https://example.com/audio/sample1.mp3
2023-10-15 14:30:28 [INFO] 下载成功: my_audios/example_com/sample1.mp3
...
2023-10-15 14:31:05 [INFO] 爬虫结束
==================================================
爬取统计
==================================================
网站: https://example.com/audio-page
输出目录: /path/to/my_audios
爬取页面: 42
发现音频: 15
下载音频: 12
跳过音频: 3
错误: 0
总下载大小: 245.7 MB
贡献指南
Fork 项目仓库

创建特性分支 (git checkout -b feature/your-feature)

提交更改 (git commit -am '添加新功能')

推送到分支 (git push origin feature/your-feature)

创建 Pull Request

许可证
本项目采用 MIT 许可证 - 详情请见 LICENSE 文件。

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hprocess-1.0.0.tar.gz (55.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hprocess-1.0.0-py3-none-any.whl (54.0 kB view details)

Uploaded Python 3

File details

Details for the file hprocess-1.0.0.tar.gz.

File metadata

  • Download URL: hprocess-1.0.0.tar.gz
  • Upload date:
  • Size: 55.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for hprocess-1.0.0.tar.gz
Algorithm Hash digest
SHA256 154010252e6414c59307efd15ff99665f78092e376ea67fa58651710394eebbb
MD5 349cfe9ce3c89ed9ede8ce7a7e0fde77
BLAKE2b-256 88e6f205098c9ce3622ea7cdfbe93b0296522df46a6e092eacbaedfa0412b63f

See more details on using hashes here.

File details

Details for the file hprocess-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: hprocess-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 54.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for hprocess-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 71952b869d14ddb6879a7bc6af96d74b6b20a1014ca132305555cbf8606a00d6
MD5 d86a2b47d8614a84caf5eb1d83fec1ee
BLAKE2b-256 3509f451419327834edc00449eae7a3a87287c735653c807c938ad000bd5cea2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page