PySparkIP
An API for working with IP addresses in Apache Spark. Built on top of ipaddress.
Usage
- pip install -i https://test.pypi.org/simple/ PySparkIP==1.0.2
- from SparkIP.SparkIP import *
License
This project is licensed under the Apache License. Please see the LICENSE file for more details.
Tutorial
Initialize
Before using PySparkIP, initialize it by passing the SparkSession to SparkIPInit:
from pyspark.sql import SparkSession
from SparkIP.SparkIP import *
spark = SparkSession.builder.appName("ipTest").getOrCreate()
SparkIPInit(spark)
SparkSQL Functions
Check address types
# Multicast
spark.sql("SELECT * FROM IPAddresses WHERE isMulticast(IPAddress)")
# Private
spark.sql("SELECT * FROM IPAddresses WHERE isPrivate(IPAddress)")
# Global
spark.sql("SELECT * FROM IPAddresses WHERE isGlobal(IPAddress)")
# Unspecified
spark.sql("SELECT * FROM IPAddresses WHERE isUnspecified(IPAddress)")
# Reserved
spark.sql("SELECT * FROM IPAddresses WHERE isReserved(IPAddress)")
# Loopback
spark.sql("SELECT * FROM IPAddresses WHERE isLoopback(IPAddress)")
# Link Local
spark.sql("SELECT * FROM IPAddresses WHERE isLinkLocal(IPAddress)")
# IPv4 Mapped
spark.sql("SELECT * FROM IPAddresses WHERE isIPv4Mapped(IPAddress)")
# 6to4
spark.sql("SELECT * FROM IPAddresses WHERE is6to4(IPAddress)")
# Teredo
spark.sql("SELECT * FROM IPAddresses WHERE isTeredo(IPAddress)")
# IPv4
spark.sql("SELECT * FROM IPAddresses WHERE isIPv4(IPAddress)")
# IPv6
spark.sql("SELECT * FROM IPAddresses WHERE isIPv6(IPAddress)")
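Since PySparkIP is built on top of `ipaddress`, the predicates above correspond closely to that module's properties. A rough pure-Python sketch of what each check tests (the exact UDF-to-property mapping is an assumption):

```python
import ipaddress

# Each line mirrors one of the SQL predicates above.
assert ipaddress.ip_address("224.0.0.1").is_multicast          # isMulticast
assert ipaddress.ip_address("10.0.0.1").is_private             # isPrivate
assert ipaddress.ip_address("8.8.8.8").is_global               # isGlobal
assert ipaddress.ip_address("::").is_unspecified               # isUnspecified
assert ipaddress.ip_address("240.0.0.1").is_reserved           # isReserved
assert ipaddress.ip_address("127.0.0.1").is_loopback           # isLoopback
assert ipaddress.ip_address("fe80::1").is_link_local           # isLinkLocal

# The IPv6-specific checks return a value when applicable, None otherwise.
assert ipaddress.ip_address("::ffff:192.0.2.1").ipv4_mapped is not None  # isIPv4Mapped
assert ipaddress.ip_address("2002:c000:204::").sixtofour is not None     # is6to4
assert ipaddress.ip_address(
    "2001:0:4136:e378:8000:63bf:3fff:fdd2").teredo is not None           # isTeredo

# Address-family checks.
assert isinstance(ipaddress.ip_address("10.0.0.1"), ipaddress.IPv4Address)  # isIPv4
assert isinstance(ipaddress.ip_address("::1"), ipaddress.IPv6Address)       # isIPv6
```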
Output addresses in different formats
# Exploded
spark.sql("SELECT explodedIP(IPAddress) FROM IPAddresses")
# Compressed
spark.sql("SELECT compressedIP(IPAddress) FROM IPAddresses")
# Teredo
spark.sql("SELECT teredo(IPAddress) FROM IPAddresses")
# IPv4 Mapped
spark.sql("SELECT IPv4Mapped(IPAddress) FROM IPAddresses")
# 6to4
spark.sql("SELECT sixtofour(IPAddress) FROM IPAddresses")
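The output formats above also come from `ipaddress`. A minimal sketch of the underlying properties, assuming the UDFs wrap them directly:

```python
import ipaddress

v6 = ipaddress.ip_address("2001:db8::1")
exploded = v6.exploded      # explodedIP analogue: full 8-group form
compressed = v6.compressed  # compressedIP analogue: shortest form

# teredo / IPv4Mapped / sixtofour analogues (None when not applicable):
mapped = ipaddress.ip_address("::ffff:192.0.2.1").ipv4_mapped
six = ipaddress.ip_address("2002:c000:204::").sixtofour
server, client = ipaddress.ip_address(
    "2001:0:4136:e378:8000:63bf:3fff:fdd2").teredo
```

For example, `exploded` is `'2001:0db8:0000:0000:0000:0000:0000:0001'`, while `compressed` round-trips back to `'2001:db8::1'`.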
Sort or compare IP Addresses
# SparkSQL doesn't support values > LONG_MAX
# To sort or compare IPv6 addresses, use ipAsBinary
# To sort or compare IPv4 addresses, use either ipv4AsNum or ipAsBinary
# But ipv4AsNum is more efficient
# Compare
spark.sql("SELECT * FROM IPAddresses WHERE ipAsBinary(IPAddress) > ipAsBinary('192.209.45.194')")
# Sort
spark.sql("SELECT * FROM IPAddresses SORT BY ipAsBinary(IPAddress)")
# Sort ONLY IPv4
spark.sql("SELECT * FROM IPv4 SORT BY ipv4AsNum(IPAddress)")
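The reason these keys sort correctly: an IPv4 address fits in a 32-bit integer, and a fixed-width binary encoding compares lexicographically the same way it compares numerically. A hedged sketch of both keys in plain Python (padding IPv4 to 16 bytes is an assumption about how `ipAsBinary` unifies the two families):

```python
import ipaddress

ips = ["192.209.45.194", "10.0.0.1", "172.16.5.9"]

# ipv4AsNum analogue: a 32-bit integer orders IPv4 addresses correctly.
by_num = sorted(ips, key=lambda s: int(ipaddress.IPv4Address(s)))

# ipAsBinary analogue: fixed-width bytes; left-pad IPv4 to 16 bytes
# so IPv4 and IPv6 sort in one keyspace.
def as_binary(s):
    return ipaddress.ip_address(s).packed.rjust(16, b"\x00")

mixed = sorted(ips + ["::1", "2001:db8::1"], key=as_binary)
```

`by_num` comes back as `["10.0.0.1", "172.16.5.9", "192.209.45.194"]`; in `mixed`, the padded IPv4 addresses land between `::1` and `2001:db8::1`.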
IP network functions
# Network contains
spark.sql("SELECT * FROM IPAddresses WHERE networkContains(IPAddress, '195.0.0.0/16')")
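Semantically, `networkContains` is membership testing against a CIDR block. The same check in plain `ipaddress` terms:

```python
import ipaddress

net = ipaddress.ip_network("195.0.0.0/16")

# An address is "contained" when it falls inside the network's range.
inside = ipaddress.ip_address("195.0.12.7") in net    # True
outside = ipaddress.ip_address("196.0.12.7") in net   # False
```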
IP Set
Create IP Sets using:
- IP addresses
ip = ipaddress.ip_address("189.118.188.64")
ipSet = IPSet(ip)
- IP networks
net = ipaddress.ip_network('::/16')
ipSet = IPSet(net)
- strings representing IP addresses or IP networks
ipStr = '192.0.0.0'
ipSet = IPSet(ipStr)
- lists, tuples, or sets containing any/all of the above
setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipSet = IPSet(setOfIPs)
- Or a mixture of any/all/none of the above!
setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipStr = '192.0.0.0'
net = ipaddress.ip_network('::/16')
ip = ipaddress.ip_address("189.118.188.64")
ipSet = IPSet(setOfIPs, '0.0.0.0', ipStr, net, ip)
Register IP Sets for use in SparkSQL:
Before using IP Sets in SparkSQL, register each set by passing it, along with a name, to SparkIPSets
ipSet = IPSet('::')
ipSet2 = IPSet()
# Pass the set, then the set name
SparkIPSets.add(ipSet, 'ipSet')
SparkIPSets.add(ipSet2, 'ipSet2')
Remove IP Sets from registered sets in SparkSQL:
SparkIPSets.remove('ipSet', 'ipSet2')
Use IP Sets in SparkSQL:
# Note: in SparkSQL, pass the set's registered name as a string, not the Python variable
# Initialize an IP Set
setOfIPs = {"192.0.0.0", "5422:6622:1dc6:366a:e728:84d4:257e:655a", "::"}
ipSet = IPSet(setOfIPs)
# Register it
SparkIPSets.add(ipSet, 'ipSet')
# Use it!
# Set Contains
spark.sql("SELECT * FROM IPAddresses WHERE setContains(IPAddress, 'ipSet')")
# Show sets available to use
SparkIPSets.setsAvailable()
# Remove a set
SparkIPSets.remove('ipSet')
# Clear sets available
SparkIPSets.clear()
IP Set functions (outside of SparkSQL):
ipSet = IPSet()
# Add
ipSet.add('0.0.0.0', '::/16')
# Remove
ipSet.remove('::/16')
# Contains
ipSet.contains('0.0.0.0', '::')
# Clear
ipSet.clear()
# Show all
ipSet.showAll()
# Union
ipSet2 = IPSet('2001::', '::33', 'ffff::f')
ipSet.union(ipSet2)
# Intersects
ipSet.intersects(ipSet2)
# Diff
ipSet.diff(ipSet2)
# Is empty
ipSet.isEmpty()
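The method names above suggest standard set semantics. A loose plain-Python analogue using a `set` of `ipaddress` objects (this sketch covers only plain addresses, not the networks an IPSet can also hold):

```python
import ipaddress

# Two toy "IP sets" built from parsed addresses.
a = {ipaddress.ip_address("2001::"), ipaddress.ip_address("::33")}
b = {ipaddress.ip_address("::33"), ipaddress.ip_address("ffff::f")}

union = a | b         # union analogue
intersection = a & b  # intersects analogue
difference = a - b    # diff analogue
empty = len(a) == 0   # isEmpty analogue
```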