A set of open-source Information Security tools for the 🦜🔗 LangChain framework

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

package workflow

🤿 port43

⚠️ [work-in-progess]

A set of open-source Information Security tools for the 🦜🔗 LangChain framework

Premise

Port43 can help you build Information Security-based LLM applications.

A few use-cases include ...

Enabling Threat and SOC Analysts to query SIEM's using natural language
Parsing and extracting data from DNS, WHOIS, and RDAP queries
Gathering HTML, favicons, certificates, or screenshots from phishing sites on the internet
Connecting popular Information Security API's (shodan, virustotal, etc.) with LLM's

... or combining any or all of the steps above into a single workflow!

Quickstart

Check out the examples/ folder for each example's complete code.

Basic example: WHOIS

WHOIS is a query and response protocol that is used for querying databases that store an Internet resource's registered users or assignees - Wikipedia

Unlike the modern RDAP standard which uses a JSON schema, the format of WHOIS responses follow a semi-free text format. So in other words, WHOIS is "Fragile, unparseable, obsolete... and universally relied upon"

In order to parse WHOIS text responses from different registrars into a set of standardized key-value pairs that can be used by applications many open-source libraries have implemented a combination of regular expressions and text mining techniques. Despite some success the amount of edge-cases or registrars with unconventional implementations has caused an overall inconsistent feel for many developers wishing to integrate WHOIS data into their applications.

For example, here is the authoritative output of whois umich.edu, which doesn't necessary follow the conventional single line key:value format:

-------------------------------------------------------------

Domain Name: UMICH.EDU

Registrant:
	University of Michigan -- ITD
	ITCS, Arbor Lakes
	4251 Plymouth Road
	Ann Arbor, MI 48105-2785
	USA

Administrative Contact:
	Domain Admin
	University of Michigan
	ITS, Arbor Lakes
	4251 Plymouth Road
	Ann Arbor, MI 48105-3640
	USA
	+1.7347641817
	domainreg@umich.edu

Technical Contact:
	 
	University of Michigan
	ITS, Arbor Lakes
	4251 Plymouth Road
	Ann Arbor, MI 48105-3640
	USA
	+1.7347641817
	domainreg@umich.edu

Name Servers:
	UMICH-EDU.DNS.UMICH.COM
	UMICH-EDU.DNS.UMICH.ORG
	UMICH-EDU.DNS.UMICH.NET

Domain record activated:    07-Oct-1985
Domain record last updated: 04-Jan-2024
Domain expires:             31-Jul-2024

Fortunately, the ever-growing capabilities of LLM's have made it possible to frame this problem in terms of an "AI-assistant" (aka ChatModel) leading to impressive results with zero pre- and post-processing.

Here is some example code:

# get a blob of WHOIS text
text, _ = asyncwhois.whois("umich.edu", authoritative_only=True)
# craft a prompt to extract key/values from the whois text
# the prompt asks the LLM to take the text and convert it into a standardized JSON format
prompt = WhoisTextToJson  # port43.prompts.whois_text_to_json.py
# pull any open-source LLM from HuggingFace
# or use Ollama: model = llm = ChatOllama("mistral")
llm = HuggingFaceHub(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    task="text-generation",
    huggingfacehub_api_token=<HF_API_TOKEN>,
    model_kwargs={"max_new_tokens": 2048},
)
# wrapper for HuggingFace LLM's
model = ChatHuggingFace(llm=llm)
# LCEL
chain = prompt | model | StrOutputParser()
# view the result
pprint(chain.invoke(input={"data": text}))

View the Result

Note that there is absolutely no postprocessing of the LLM output. The LLM was able to match all keys/values on its own. Further processing could be added to convert timestamps, fill-in null values, or modify values for a specific use-case.

{
  "admin_address": "University of Michigan -- ITD\\nITCS, Arbor Lakes\\n4251 Plymouth Road\\nAnn Arbor, MI 48105-2785\\nUSA",
  "admin_city": "Ann Arbor",
  "admin_country": "USA",
  "admin_email": "domainreg@umich.edu",
  "admin_fax": "+1.7347641817",
  "admin_id": "",
  "admin_name": "",
  "admin_organization": "University of Michigan -- ITD",
  "admin_phone": "+1.7347641817",
  "admin_state": "",
  "admin_zipcode": "48105-3640",
  "billing_address": "University of Michigan -- ITD\\nITCS, Arbor Lakes\\n4251 Plymouth Road\\nAnn Arbor, MI 48105-3640\\nUSA",
  "billing_city": "Ann Arbor",
  "billing_country": "USA",
  "billing_email": "",
  "billing_fax": "+1.7347641817",
  "billing_id": "",
  "billing_name": "",
  "billing_organization": "University of Michigan -- ITD",
  "billing_phone": "+1.7347641817",
  "billing_state": "",
  "billing_zipcode": "48105-3640",
  "created": "07-Oct-1985",
  "dnssec": "",
  "domain_name": "UMICH.EDU",
  "expires": "31-Jul-2024",
  "name_servers": [
    "UMICH-EDU.DNS.UMICH.ORG",
    "UMICH-EDU.DNS.UMICH.NET",
    "UMICH-EDU.DNS.UMICH.COM"
  ],
  "registrant_address": "University of Michigan -- ITD\\nITCS, Arbor Lakes\\n4251 Plymouth Road\\nAnn Arbor, MI 48105-2785\\nUSA",
  "registrant_city": "Ann Arbor",
  "registrant_country": "USA",
  "registrant_email": "",
  "registrant_fax": "+1.7347641817",
  "registrant_id": "",
  "registrant_name": "",
  "registrant_organization": "University of Michigan -- ITD",
  "registrant_phone": "+1.7347641817",
  "registrant_state": "",
  "registrant_zipcode": "48105-2785",
  "registrar": "",
  "registrar_abuse_email": "",
  "registrar_abuse_phone": "",
  "registrar_iana_id": "",
  "registrar_url": "",
  "status": [
    "active"
  ],
  "tech_address": "University of Michigan\\nITS, Arbor Lakes\\n4251 Plymouth Road\\nAnn Arbor, MI 48105-3640\\nUSA",
  "tech_city": "Ann Arbor",
  "tech_country": "USA",
  "tech_email": "",
  "tech_fax": "+1.7347641817",
  "tech_id": "",
  "tech_name": "",
  "tech_organization": "University of Michigan",
  "tech_phone": "+1.7347641817",
  "tech_state": "",
  "tech_zipcode": "48105-3640",
  "updated": "04-Jan-2024"
}

This whois example is just scratching the surface of what kind of problems LLM's can tackle. Again, the goal of Port43 is to highlight more use-cases and expand AI-first information security workflows.

Basic Agent: Finding DNS Records

# add some tools
tools = [DNSTool(), WHOISTool()]
# get the ReAct prompt
prompt = get_react_json_prompt(tools, render_args=True)
# init any LLM; in this example we're using mistral via Ollama
# figure out how to use Ollama here: https://ollama.com
llm = ChatOllama(model="mistral", temperature=0)
# have the model stop after solving the exercise
chat_model_with_stop = llm.bind(stop=["\nObservation"])
# create the agent
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: (
            _format_chat_history(x["chat_history"]) if x.get("chat_history") else []
        ),
        "agent_scratchpad": lambda x: format_log_to_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | chat_model_with_stop
    | ReActJsonSingleInputOutputParser()
)
# create an executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
pprint(
    agent_executor.invoke(
        {
            "input": "How many DNS records does google.com have? What are the MX records?"
        }
    )
)

View the Result

examples/scripts/basic_react_agent_01.py

"""
> Entering new AgentExecutor chain...
 Thought: I need to find out how many DNS records google.com has and what its MX records are. I can use the dns_search tool for this.
Action:```json
{
    "action": "dns_search",
    "action_input": {
        "hostname": "google.com"
    }
}
```{
  "A": "142.250.191.142",
  "NS": "ns4.google.com.",
  "SOA": "ns1.google.com. dns-admin.google.com. 611883130 900 900 1800 60",
  "MX": "10 smtp.google.com.",
  "TXT": "\"apple-domain-verification=30afIBcvSuDV2PLX\"",
  "AAAA": "2607:f8b0:4009:818::200e",
  "CAA": "0 issue \"pki.goog\""
} Observation: The DNS records for google.com include one A record, two NS records, one SOA record, one MX record, one TXT record, one AAAA record, and one CAA record. The MX record is "10 smtp.google.com."
Thought: I now have the information to answer the original question.
Final Answer: Google.com has a total of 7 DNS records, including 1 A record, 2 NS records, 1 SOA record, 1 MX record, 1 TXT record, 1 AAAA record, and 1 CAA record. The MX records are "10 smtp.google.com."

> Finished chain.
{'input': 'How many DNS records does google.com have? What are the MX records?',
 'output': 'Google.com has a total of 7 DNS records, including 1 A record, 2 '
           'NS records, 1 SOA record, 1 MX record, 1 TXT record, 1 AAAA '
           'record, and 1 CAA record. The MX records are "10 smtp.google.com."'}
"""

Advanced use-case: Threat Hunting using Natural Language

coming soon...

Advanced use-case: Domain Monitoring & Phishing Detection

coming soon...

Roadmap

Continue to expand the number of Tools
- common interface for SIEM query integrations (Splunk, Elasticsearch, SumoLogic, etc.)
- popular infosec API's (shodan, virustotal, ..., etc.)
- popular open-source cli libraries (dnstwist, ..., etc.)
Add examples for advanced use-cases
Abstract some of the LangChain Agent setup

Project details

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

0.1.dev0 pre-release

Mar 5, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

port43-0.1.dev0.tar.gz (17.9 kB view details)

Uploaded Mar 5, 2024 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

port43-0.1.dev0-py3-none-any.whl (18.9 kB view details)

Uploaded Mar 5, 2024 Python 3

File details

Details for the file port43-0.1.dev0.tar.gz.

File metadata

Download URL: port43-0.1.dev0.tar.gz
Upload date: Mar 5, 2024
Size: 17.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for port43-0.1.dev0.tar.gz
Algorithm	Hash digest
SHA256	`3766f949c8743b5448be96f62c70de6261d64b46b5ad3987225edc608354480a`
MD5	`e03fe81192741e213a4fd5e2f97834c1`
BLAKE2b-256	`9b4db65426aa9418ed2140dfcfc1ac5fb02764db93a4bf1bf1bdfacb4ecdb688`

See more details on using hashes here.

File details

Details for the file port43-0.1.dev0-py3-none-any.whl.

File metadata

Download URL: port43-0.1.dev0-py3-none-any.whl
Upload date: Mar 5, 2024
Size: 18.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for port43-0.1.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9d32140497ee0ba409c2cfc7ef1f599e4a9c63eeaddd2a0172e86efd8ee4b0c8`
MD5	`80ebad66b9beb6784196ff24f6275b7f`
BLAKE2b-256	`c98608d07b0ad021f035d191f7a6a6d55c415d6170f95c6545c73bfcd44d1ef5`

See more details on using hashes here.

port43 0.1.dev0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🤿 port43

Premise

Quickstart

Basic example: WHOIS

Basic Agent: Finding DNS Records

Advanced use-case: Threat Hunting using Natural Language

Advanced use-case: Domain Monitoring & Phishing Detection

Roadmap

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes