PTINSEARCHER - Web/File Information Extractor

ptinsearcher is a tool designed to extract information from sources such as URLs and files. It can retrieve HTML comments, email addresses, phone numbers, IP addresses, subdomains, HTML forms, links, and document metadata.

Installation

pip install ptinsearcher
sudo apt-get install libmagic1

Adding to PATH

If you cannot invoke the script from your terminal, the directory it was installed to is most likely not on your PATH. To fix this, run the commands below for the shell you use:

For Bash Users

echo "export PATH=\"`python3 -m site --user-base`/bin:\$PATH\"" >> ~/.bashrc
source ~/.bashrc

For ZSH Users

echo "export PATH=\"`python3 -m site --user-base`/bin:\$PATH\"" >> ~/.zshrc
source ~/.zshrc
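
Before editing a shell profile, it may help to confirm that the user-base `bin` directory is really missing from PATH. The following is a minimal Python sketch of that check. The helper name `user_bin_on_path` is hypothetical and not part of ptinsearcher, and the sketch assumes the same `<user-base>/bin` layout that the commands above rely on.

```python
import os
import site


def user_bin_on_path(path_value: str, user_base: str) -> bool:
    """Return True if <user_base>/bin is one of the entries in path_value."""
    target = os.path.normpath(os.path.join(user_base, "bin"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    # site.getuserbase() reports the same directory as `python3 -m site --user-base`.
    base = site.getuserbase()
    if user_bin_on_path(os.environ.get("PATH", ""), base):
        print(f"{base}/bin is already on PATH")
    else:
        print(f"{base}/bin is missing from PATH")
```

If the script reports the directory as missing, the `echo ... >> ~/.bashrc` or `~/.zshrc` command above is the relevant fix.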

Usage examples

   ptinsearcher -u https://www.example.com/
   ptinsearcher -u https://www.example.com/ --extract E        # Extract emails
   ptinsearcher -u https://www.example.com/ --extract UQX      # Extract internal URLs, internal URLs w/ parameters, external URLs
   ptinsearcher -f url_list.txt --grouping
   ptinsearcher -f url_list.txt --grouping-complete
   ptinsearcher -f url_list.txt
   ptinsearcher -u image.jpg -e M
   ptinsearcher -u images/*.jpg -e M
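
The letters U, Q and X used above name three URL categories. As an illustration only, the sketch below shows one plausible reading of them: a link is internal when it resolves to the same host as the page it was found on, it counts as "with parameters" when it carries a query string, and it is external otherwise. The function `classify_url` is hypothetical, and these semantics are an assumption rather than ptinsearcher's actual implementation.

```python
from urllib.parse import urljoin, urlsplit


def classify_url(page_url: str, link: str) -> str:
    """Classify a link found on page_url as 'U', 'Q' or 'X' (assumed semantics)."""
    absolute = urljoin(page_url, link)        # resolve relative links against the page
    page_host = urlsplit(page_url).hostname
    link_parts = urlsplit(absolute)
    if link_parts.hostname != page_host:
        return "X"                            # external: different host
    if link_parts.query:
        return "Q"                            # internal, with URL parameters
    return "U"                                # internal, no parameters


page = "https://www.example.com/"
for href in ("/about", "/search?q=test", "https://other.example.org/"):
    print(href, "->", classify_url(page, href))
```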

Options

   -u   --url                 <url>           Test URL or File
   -f   --file                <file>          Load list of URLs from file
   -d   --domain              <domain>        Domain - merge domain with filepath. Use when wordlist contains filepaths (e.g. /index.php)
   -e   --extract             <extract>       Specify data to extract:
                                 A              All (extracts everything - default option)
                                 E              Emails
                                 S              Subdomains
                                 C              Comments
                                 F              Forms
                                 I              IP addresses
                                 P              Phone numbers
                                 U              Internal URLs
                                 Q              Internal URLs with parameters
                                 X              External URLs
                                 N              Insecure URLs
                                 M              Metadata

   -ey  --extension-yes       <extensions>    Process only URLs from <list> that end with <extension-yes>
   -en  --extension-no        <extensions>    Process only URLs from <list> that do not end with <extension-no>
   -g   --grouping                            Group findings from multiple sources into one table
   -gc  --grouping-complete                   Group and merge findings from multiple sources into one result
   -gp  --group-parameters                    Group URL parameters
   -wp  --without-parameters                  Without URL parameters
   -op  --output-parts                        Save each extract-type to separate file
   -o   --output              <output>        Save output to file
   -p   --proxy               <proxy>         Set proxy (e.g. http://127.0.0.1:8080)
   -T   --timeout             <timeout>       Set timeout
   -c   --cookie              <cookie=value>  Set cookie
   -a   --user-agent          <user-agent>    Set User-Agent
   -t   --threads             <threads>       Set number of threads
   -H   --headers             <header:value>  Set custom header(s)
   -r   --redirects                           Follow redirects (default False)
   -C   --cache                               Cache requests (load from tmp in future)
   -b   --binary                              Search also non-text Content-Type responses
   -v   --version                             Show script version and exit
   -h   --help                                Show this help message and exit
   -j   --json                                Output in JSON format
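
The `-d`, `-ey` and `-en` options describe a preprocessing step on the URL list: file paths are merged with a domain, and entries are kept or dropped by their ending. The sketch below illustrates that idea under stated assumptions. The function `build_targets` is hypothetical, the domain is assumed to be given with its scheme, and the real tool may treat case, query strings or trailing slashes differently.

```python
from urllib.parse import urljoin


def build_targets(lines, domain=None, ext_yes=(), ext_no=()):
    """Turn wordlist lines into target URLs (assumed behaviour of -d, -ey, -en)."""
    targets = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        # -d: prefix the domain when the entry is a file path such as /index.php
        url = urljoin(domain, entry) if domain else entry
        # -ey: keep only URLs ending with one of the listed extensions
        if ext_yes and not url.endswith(tuple(ext_yes)):
            continue
        # -en: drop URLs ending with one of the listed extensions
        if ext_no and url.endswith(tuple(ext_no)):
            continue
        targets.append(url)
    return targets


wordlist = ["/index.php", "/logo.png", "/admin/login.php", ""]
print(build_targets(wordlist, domain="https://www.example.com/", ext_no=(".png",)))
```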

Dependencies

We use ExifTool to extract metadata.

ptlibs
bs4
lxml
pyexiftool
validators
python-magic
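
pyexiftool is a wrapper around the ExifTool command-line program, so the `exiftool` executable presumably has to be installed separately for metadata extraction to work. A minimal sketch for checking that it is reachable on PATH follows; the helper `find_exiftool` is hypothetical and not part of ptinsearcher.

```python
import shutil


def find_exiftool(executable: str = "exiftool"):
    """Return the full path of the ExifTool executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_exiftool()
if path:
    print(f"ExifTool found at {path}")
else:
    print("ExifTool not found; metadata extraction (-e M) will likely fail")
```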

License

Copyright (c) 2025 Penterep Security s.r.o.

ptinsearcher is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

ptinsearcher is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with ptinsearcher. If not, see https://www.gnu.org/licenses/.

Warning

Run this tool only against websites you have been given permission to pentest. We accept no responsibility for any damage or harm this application causes to your computer or your network. Penterep is not responsible for any illegal or malicious use of this code. Be Ethical!

Download files

Source Distribution

ptinsearcher-1.0.42.tar.gz (37.0 kB)

Uploaded Source

Built Distribution

ptinsearcher-1.0.42-py3-none-any.whl (41.4 kB)

Uploaded Python 3

File details

Details for the file ptinsearcher-1.0.42.tar.gz.

File metadata

  • Download URL: ptinsearcher-1.0.42.tar.gz
  • Size: 37.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for ptinsearcher-1.0.42.tar.gz
Algorithm    Hash digest
SHA256       73e74e2b5995af0567a1e43b90209c724f5df1f9fe378a6f8a387e86e8aadba9
MD5          553fbe2cfcad0119cb89bd3752aa3239
BLAKE2b-256  48b44c6836b202bbd10662a877a3e7416d185cc5e39e1c15497225a23d4dd091
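
The SHA256 digest listed above for ptinsearcher-1.0.42.tar.gz can be compared against a downloaded copy of the file. The following is a minimal sketch of that check. The helper `sha256_of` is hypothetical, and the archive is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published for the source distribution.
EXPECTED = "73e74e2b5995af0567a1e43b90209c724f5df1f9fe378a6f8a387e86e8aadba9"
archive = Path("ptinsearcher-1.0.42.tar.gz")  # assumed download location

if archive.exists():
    ok = sha256_of(archive) == EXPECTED
    print("digest matches" if ok else "digest MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The same function works for the wheel file if `EXPECTED` and `archive` are swapped for the wheel's values.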

File details

Details for the file ptinsearcher-1.0.42-py3-none-any.whl.

File metadata

  • Download URL: ptinsearcher-1.0.42-py3-none-any.whl
  • Size: 41.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for ptinsearcher-1.0.42-py3-none-any.whl
Algorithm    Hash digest
SHA256       5864aa6fdcad1b9ae2a6de7db54481bac916c4585b55c2d050acb23ad98a1f55
MD5          0ff6d06b2336781fbfcd7bf1711f03a8
BLAKE2b-256  829222bd65a6ac9fea7c68da66932cff2b48b0d2c49b5986193e8bd21b6e4fb5
