PySparkIP

An API for working with IP addresses in Apache Spark. Built on top of ipaddress.

Usage
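
PySparkIP is distributed on PyPI, so a standard pip install should work:

pip install PySparkIP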

License

This project is licensed under the Apache License. Please see the LICENSE file for more details.

Tutorial

Initialize

Before using PySparkIP, initialize it by passing the SparkSession to SparkIPInit:

from pyspark.sql import SparkSession
from src.SparkIP.SparkIP import *

spark = SparkSession.builder.appName("ipTest").getOrCreate()
SparkIPInit(spark)
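
The SparkSQL examples below query a table named IPAddresses with a string column IPAddress. PySparkIP doesn't create that view for you; a minimal, hypothetical setup:

df = spark.createDataFrame(
    [("192.168.0.1",), ("::1",), ("2001:db8::8a2e:370:7334",)],
    ["IPAddress"],  # column name the examples below expect
)
df.createOrReplaceTempView("IPAddresses")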

SparkSQL Functions

Check address types

# Multicast
spark.sql("SELECT * FROM IPAddresses WHERE isMulticast(IPAddress)")

# Private
spark.sql("SELECT * FROM IPAddresses WHERE isPrivate(IPAddress)")

# Global
spark.sql("SELECT * FROM IPAddresses WHERE isGlobal(IPAddress)")

# Unspecified
spark.sql("SELECT * FROM IPAddresses WHERE isUnspecified(IPAddress)")

# Reserved
spark.sql("SELECT * FROM IPAddresses WHERE isReserved(IPAddress)")

# Loopback
spark.sql("SELECT * FROM IPAddresses WHERE isLoopback(IPAddress)")

# Link Local
spark.sql("SELECT * FROM IPAddresses WHERE isLinkLocal(IPAddress)")

# IPv4 Mapped
spark.sql("SELECT * FROM IPAddresses WHERE isIPv4Mapped(IPAddress)")

# 6to4
spark.sql("SELECT * FROM IPAddresses WHERE is6to4(IPAddress)")

# Teredo
spark.sql("SELECT * FROM IPAddresses WHERE isTeredo(IPAddress)")

# IPv4
spark.sql("SELECT * FROM IPAddresses WHERE isIPv4(IPAddress)")

# IPv6
spark.sql("SELECT * FROM IPAddresses WHERE isIPv6(IPAddress)")
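
Each check is an ordinary SQL boolean expression, so they compose with AND/OR/NOT like any other predicate; for example, to keep only private IPv4 addresses:

# Private IPv4 addresses only
spark.sql("SELECT * FROM IPAddresses WHERE isIPv4(IPAddress) AND isPrivate(IPAddress)")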

Output address in different formats

# Exploded
spark.sql("SELECT explodedIP(IPAddress) FROM IPAddresses")

# Compressed
spark.sql("SELECT compressedIP(IPAddress) FROM IPAddresses")

# Teredo
spark.sql("SELECT teredo(IPAddress) FROM IPAddresses")

# IPv4 Mapped
spark.sql("SELECT IPv4Mapped(IPAddress) FROM IPAddresses")

# 6to4
spark.sql("SELECT sixtofour(IPAddress) FROM IPAddresses")
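
The formatters are plain SQL expressions as well, so several can be selected and aliased in one query:

# Exploded and compressed forms side by side
spark.sql("SELECT explodedIP(IPAddress) AS exploded, compressedIP(IPAddress) AS compressed FROM IPAddresses")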

Sort or compare IP Addresses

# SparkSQL integers max out at LONG_MAX (2^63 - 1), but IPv6 addresses are 128-bit.
# To sort or compare IPv6 addresses, use ipAsBinary.
# To sort or compare IPv4 addresses, use either ipv4AsNum or ipAsBinary,
# but ipv4AsNum is more efficient.

# Compare
spark.sql("SELECT * FROM IPAddresses WHERE ipAsBinary(IPAddress) > ipAsBinary('192.209.45.194')")

# Sort
spark.sql("SELECT * FROM IPAddresses SORT BY ipAsBinary(IPAddress)")

# Sort ONLY IPv4
spark.sql("SELECT * FROM IPv4 SORT BY ipv4AsNum(IPAddress)")
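
ipv4AsNum works for comparisons too; combined with isIPv4 from above, a mixed table can be filtered numerically (the threshold address here is arbitrary):

# Numeric comparison over IPv4 addresses only
spark.sql("SELECT * FROM IPAddresses WHERE isIPv4(IPAddress) AND ipv4AsNum(IPAddress) > ipv4AsNum('192.0.2.0')")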

IP network functions

# Network contains
spark.sql("SELECT * FROM IPAddresses WHERE networkContains(IPAddress, '195.0.0.0/16')")
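
Since the library is built on ipaddress, the network argument should accept any notation ipaddress.ip_network parses, IPv6 included (untested sketch):

# IPv6 network membership
spark.sql("SELECT * FROM IPAddresses WHERE networkContains(IPAddress, '2001:db8::/32')")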

IP Set

Create IP Sets using:

  • IP addresses:

import ipaddress  # the stdlib module PySparkIP is built on

ip = ipaddress.ip_address("189.118.188.64")
ipSet = IPSet(ip)

  • IP networks:

net = ipaddress.ip_network('::/16')
ipSet = IPSet(net)

  • strings representing IP addresses or IP networks:

ipStr = '192.0.0.0'
ipSet = IPSet(ipStr)

  • lists, tuples, or sets containing any/all of the above:

setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipSet = IPSet(setOfIPs)

  • or a mixture of any/all of the above (or no arguments at all, for an empty set):

setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipStr = '192.0.0.0'
net = ipaddress.ip_network('::/16')
ip = ipaddress.ip_address("189.118.188.64")
ipSet = IPSet(setOfIPs, '0.0.0.0', ipStr, net, ip)
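
A set built this way can be queried immediately, for instance with the contains method documented further below (the commented result is an assumption based on the constructor arguments):

mixedSet = IPSet(setOfIPs, net, ip)
mixedSet.contains('189.118.188.64')  # expected True: ip above is in the set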

Register IP Sets for use in SparkSQL:

Before using an IP Set in SparkSQL, register it by passing it to SparkIPSets:

ipSet = IPSet('::')
ipSet2 = IPSet()

# Pass the set, then the set name
SparkIPSets.add(ipSet, 'ipSet')
SparkIPSets.add(ipSet2, 'ipSet2')

Remove IP Sets from the SparkSQL registry:

SparkIPSets.remove('ipSet', 'ipSet2')

Use IP Sets in SparkSQL:

# Note: pass the set's registered name (a string) to SparkSQL, not the Python variable itself

# Initialize an IP Set
setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipSet = IPSet(setOfIPs)

# Register it
SparkIPSets.add(ipSet, 'ipSet')

# Use it!
# Set Contains
spark.sql("SELECT * FROM IPAddresses WHERE setContains(IPAddress, 'ipSet')")

# Show sets available to use
SparkIPSets.setsAvailable()

# Remove a set
SparkIPSets.remove('ipSet')

# Clear sets available
SparkIPSets.clear()

IP Set functions (outside of SparkSQL):

ipSet = IPSet()

# Add
ipSet.add('0.0.0.0', '::/16')

# Remove
ipSet.remove('::/16')

# Contains
ipSet.contains('0.0.0.0', '::')

# Clear
ipSet.clear()

# Show all
ipSet.showAll()

# Union
ipSet2 = IPSet('2001::', '::33', 'ffff::f')
ipSet.union(ipSet2)

# Intersects
ipSet.intersects(ipSet2)

# Diff
ipSet.diff(ipSet2)

# Is empty
ipSet.isEmpty()
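
Tying these together, a small lifecycle sketch; the commented results are expectations inferred from the method names, not verified output:

ipSet = IPSet('0.0.0.0', '::')
ipSet.contains('0.0.0.0', '::')  # checks the exact elements added above
ipSet.remove('::')
ipSet.showAll()                  # expect only 0.0.0.0 to remain
ipSet.clear()
ipSet.isEmpty()                  # expect True once cleared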
