Transform GraphQL queries into Pandas data-frames.
Pluck 🚀 🍊
Pluck is a GraphQL client that transforms queries into Pandas data-frames.
Installation
Install Pluck from PyPI:
pip install pluck-graphql
Introduction
The easiest way to get started is to run pluck.execute with a query.
Let's read the first five SpaceX launches into a data-frame:
import pluck
SpaceX = "https://api.spacex.land/graphql"
query = """
{
  launches(limit: 5) {
    mission_name
    launch_date_local
    rocket {
      rocket_name
    }
  }
}
"""
frame, = pluck.execute(query, url=SpaceX)
frame
launches.mission_name | launches.launch_date_local | launches.rocket.rocket_name |
---|---|---|
Thaicom 6 | 2014-01-06T14:06:00-04:00 | Falcon 9 |
AsiaSat 6 | 2014-09-07T01:00:00-04:00 | Falcon 9 |
OG-2 Mission 2 | 2015-12-22T21:29:00-04:00 | Falcon 9 |
FalconSat | 2006-03-25T10:30:00+12:00 | Falcon 1 |
CRS-1 | 2012-10-08T20:35:00-04:00 | Falcon 9 |
Implicit Mode
The query above uses implicit mode. This is where the entire response is normalized into a single data-frame.
The return value from execute is an instance of pluck.Response. This object is iterable and enumerates the data-frames in the query. Because this query uses implicit mode, the iterator contains only a single data-frame (note that the trailing comma is still required).
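The trailing comma matters because execute returns an iterable rather than a bare data-frame. A minimal sketch in plain Python, using a one-element list as a hypothetical stand-in for pluck.Response (the list and the placeholder string are assumptions for illustration only):

```python
# Stand-in for a pluck.Response that holds exactly one data-frame.
# A plain list is used only to illustrate unpacking; it is not Pluck's type.
response = ["<data-frame>"]

# With the trailing comma, Python unpacks the single element.
frame, = response

# Without it, the name is bound to the whole iterable instead.
not_a_frame = response

# Unpacking also checks the length: two targets against one element fails.
try:
    first, second = response
except ValueError as error:
    unpack_error = str(error)
```

Writing the target as [frame] = response is equivalent and can be easier to spot when reading.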
@frame directive
Pluck goes beyond implicit mode by providing a custom @frame directive.
The @frame directive marks the portions of the GraphQL response that we want to transform into data-frames. The directive is removed before the query is sent to the GraphQL server.
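To make the removal step concrete, here is a rough sketch that strips the directive with a regular expression. This is not Pluck's implementation, which is not shown here; the helper name strip_frame_directive is hypothetical, and the pattern assumes the directive takes no arguments, as in the examples in this document.

```python
import re

# Matches the bare `@frame` directive plus the whitespace before it.
# Assumption: the directive takes no arguments.
FRAME_DIRECTIVE = re.compile(r"\s*@frame\b")


def strip_frame_directive(query: str) -> str:
    """Return the query with every @frame directive removed."""
    return FRAME_DIRECTIVE.sub("", query)


query = "{ launches(limit: 5) @frame { mission_name } }"
server_query = strip_frame_directive(query)
print(server_query)
```

A regular expression would also rewrite the text @frame inside a string argument, so a robust version would more likely work on a parsed query.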
Using the same query, rather than use implicit mode, let's pluck the launches field from the response:
query = """
{
  launches(limit: 5) @frame {
    mission_name
    launch_date_local
    rocket {
      rocket_name
    }
  }
}
"""
launches, = pluck.execute(query, url=SpaceX)
launches
mission_name | launch_date_local | rocket.rocket_name |
---|---|---|
Thaicom 6 | 2014-01-06T14:06:00-04:00 | Falcon 9 |
AsiaSat 6 | 2014-09-07T01:00:00-04:00 | Falcon 9 |
OG-2 Mission 2 | 2015-12-22T21:29:00-04:00 | Falcon 9 |
FalconSat | 2006-03-25T10:30:00+12:00 | Falcon 1 |
CRS-1 | 2012-10-08T20:35:00-04:00 | Falcon 9 |
The column names are no longer prefixed with launches because it is now the root of the data-frame.
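The effect of the root on column names can be sketched with a small flattening helper. This is an illustration under stated assumptions, not Pluck's code: the flatten function and its prefix parameter are hypothetical, and the record is a trimmed copy of the first launch above.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into one dict keyed by dotted JSON paths."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


launch = {
    "mission_name": "Thaicom 6",
    "rocket": {"rocket_name": "Falcon 9"},
}

# Implicit mode: paths start at the top of the response.
implicit_columns = list(flatten(launch, prefix="launches"))
# @frame on launches: paths start at the framed field.
framed_columns = list(flatten(launch))
```

Here implicit_columns carries the launches prefix on every name, while framed_columns does not.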
Multiple @frame directives
We can also pluck multiple data-frames from a single GraphQL query.
Let's query the first five SpaceX rockets as well:
query = """
{
  launches(limit: 5) @frame {
    mission_name
    launch_date_local
    rocket {
      rocket_name
    }
  }
  rockets(limit: 5) @frame {
    name
    type
    company
    height {
      meters
    }
    mass {
      kg
    }
  }
}
"""
launches, rockets = pluck.execute(query, url=SpaceX)
Now we have the original launches and a new rockets data-frame:
rockets
name | type | company | height.meters | mass.kg |
---|---|---|---|---|
Falcon 1 | rocket | SpaceX | 22.25 | 30146 |
Falcon 9 | rocket | SpaceX | 70 | 549054 |
Falcon Heavy | rocket | SpaceX | 70 | 1420788 |
Starship | rocket | SpaceX | 118 | 1335000 |
Lists
When a response includes a list, the data-frame is automatically expanded to include one row per item in the list. This is repeated for every subsequent list in the response.
For example, let's query the first five capsules and which missions they have been used for:
query = """
{
  capsules(limit: 5) @frame {
    id
    type
    status
    missions {
      name
    }
  }
}
"""
capsules, = pluck.execute(query, url=SpaceX)
capsules
id | type | status | missions.name |
---|---|---|---|
C105 | Dragon 1.1 | unknown | CRS-3 |
C101 | Dragon 1.0 | retired | COTS 1 |
C109 | Dragon 1.1 | destroyed | CRS-7 |
C110 | Dragon 1.1 | active | CRS-8 |
C110 | Dragon 1.1 | active | CRS-14 |
C106 | Dragon 1.1 | active | CRS-4 |
C106 | Dragon 1.1 | active | CRS-11 |
C106 | Dragon 1.1 | active | CRS-19 |
Rather than five rows, we have eight; each row contains a capsule and one of its missions.
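The expansion can be sketched in plain Python. This is an illustration of the idea rather than Pluck's implementation: the expand helper is hypothetical, it assumes list items are objects, and it keeps the parent row when a list is empty, which may differ from what Pluck does.

```python
def expand(record: dict, prefix: str = "") -> list[dict]:
    """Flatten a record into rows, emitting one row per list item."""
    rows = [{}]
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            branches = expand(value, path)
        elif isinstance(value, list):
            # Each list item contributes its own rows.
            # Assumption: an empty list keeps the parent row.
            branches = [row for item in value for row in expand(item, path)] or [{}]
        else:
            branches = [{path: value}]
        # Combine every row so far with every row from this field.
        rows = [{**row, **branch} for row in rows for branch in branches]
    return rows


capsules = [
    {"id": "C110", "missions": [{"name": "CRS-8"}, {"name": "CRS-14"}]},
    {"id": "C109", "missions": [{"name": "CRS-7"}]},
]
rows = [row for capsule in capsules for row in expand(capsule)]
```

Two capsules become three rows, because C110 is repeated once per mission. A list of flat dicts like this can be passed straight to pandas.DataFrame.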
Nested @frame directives
Frames can also be nested. If a nested @frame is within a list, the rows from every item are combined into a single data-frame.
For example, we can pluck the first five cores and their missions:
query = """
{
  cores(limit: 5) @frame {
    id
    status
    missions @frame {
      name
      flight
    }
  }
}
"""
cores, missions = pluck.execute(query, url=SpaceX)
Now we have the cores data-frame:
cores
id | status | missions.name | missions.flight |
---|---|---|---|
B1015 | lost | CRS-6 | 22 |
B0006 | lost | CRS-1 | 9 |
B1034 | lost | Inmarsat-5 F4 | 40 |
B1016 | lost | TürkmenÄlem 52°E / MonacoSAT | 23 |
B1025 | inactive | CRS-9 | 32 |
B1025 | inactive | Falcon Heavy Test Flight | 55 |
And we also have the missions data-frame that has been combined from every item in cores:
missions
name | flight |
---|---|
CRS-6 | 22 |
CRS-1 | 9 |
Inmarsat-5 F4 | 40 |
TürkmenÄlem 52°E / MonacoSAT | 23 |
CRS-9 | 32 |
Falcon Heavy Test Flight | 55 |
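Conceptually, the nested frame is the concatenation of each parent's list, in order. A minimal sketch with two cores from the table above, hand-written as plain dicts (an illustration only, not Pluck's internals):

```python
cores = [
    {"id": "B1015", "status": "lost",
     "missions": [{"name": "CRS-6", "flight": 22}]},
    {"id": "B1025", "status": "inactive",
     "missions": [{"name": "CRS-9", "flight": 32},
                  {"name": "Falcon Heavy Test Flight", "flight": 55}]},
]

# Rows for the nested frame: the missions of every core, concatenated in order.
mission_rows = [mission for core in cores for mission in core["missions"]]
```

The nested rows do not carry the id of their parent core, which matches the missions table above; request that field inside the nested frame if it is needed.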
Aliases
Column names can be modified using normal GraphQL aliases.
For example, let's tidy up the field names in the launches data-frame:
query = """
{
  launches(limit: 5) @frame {
    mission: mission_name
    launch_date: launch_date_local
    rocket {
      name: rocket_name
    }
  }
}
"""
launches, = pluck.execute(query, url=SpaceX)
launches
mission | launch_date | rocket.name |
---|---|---|
Thaicom 6 | 2014-01-06T14:06:00-04:00 | Falcon 9 |
AsiaSat 6 | 2014-09-07T01:00:00-04:00 | Falcon 9 |
OG-2 Mission 2 | 2015-12-22T21:29:00-04:00 | Falcon 9 |
FalconSat | 2006-03-25T10:30:00+12:00 | Falcon 1 |
CRS-1 | 2012-10-08T20:35:00-04:00 | Falcon 9 |
Column names
Columns are named according to the JSON path of the element in the response.
However, we can use a different naming strategy by passing column_names to execute.
For example, let's use short for the column names:
query = """
{
  launches: launches(limit: 5) @frame {
    name: mission_name
    launch_date: launch_date_local
    rocket {
      rocket: rocket_name
    }
  }
}
"""
launches, = pluck.execute(query, column_names="short", url=SpaceX)
launches
name | launch_date | rocket |
---|---|---|
Thaicom 6 | 2014-01-06T14:06:00-04:00 | Falcon 9 |
AsiaSat 6 | 2014-09-07T01:00:00-04:00 | Falcon 9 |
OG-2 Mission 2 | 2015-12-22T21:29:00-04:00 | Falcon 9 |
FalconSat | 2006-03-25T10:30:00+12:00 | Falcon 1 |
CRS-1 | 2012-10-08T20:35:00-04:00 | Falcon 9 |
If the short column name results in a conflict (two or more columns with the same name), the conflict is resolved by prefixing the name with the name of its parent.
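One way to picture the conflict rule is the sketch below. It is a guess at the behaviour described, not Pluck's code: the short_names helper is hypothetical, and both the dot separator and the choice to prefix every conflicting column are assumptions.

```python
from collections import Counter


def short_names(paths: list[str]) -> list[str]:
    """Shorten dotted paths to their last segment, adding the parent on conflict."""
    leaves = [path.rsplit(".", 1)[-1] for path in paths]
    counts = Counter(leaves)
    names = []
    for path, leaf in zip(paths, leaves):
        if counts[leaf] > 1:
            # Conflict: keep the parent segment as well, e.g. "rocket.name".
            names.append(".".join(path.split(".")[-2:]))
        else:
            names.append(leaf)
    return names


no_conflict = short_names(["name", "launch_date", "rocket.rocket"])
conflict = short_names(["mission.name", "rocket.name"])
```

In no_conflict every column collapses to its last segment, while in conflict both columns keep their parent so the names stay unique.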
The naming strategy can also be changed per data-frame by specifying a dict[str, str] where the key is the name of the data-frame.
Leaf fields
The @frame directive can also be used on leaf fields.
For example, we can extract only the name of the mission from past launches:
query = """
{
  launchesPast(limit: 5) {
    mission: mission_name @frame
  }
}
"""
launches, = pluck.execute(query, url=SpaceX)
launches
mission |
---|
Starlink-15 (v1.0) |
Sentinel-6 Michael Freilich |
Crew-1 |
GPS III SV04 (Sacagawea) |
Starlink-14 (v1.0) |
Responses
Most of the time, Pluck is used to transform the GraphQL query directly into one or more data-frames. However, it is also possible to retrieve the raw GraphQL response (as well as the data-frames) by not immediately iterating over the return value.
The return value is a pluck.Response object. It contains the data and errors from the raw GraphQL response, and a map of type Dict[str, DataFrame] containing each data-frame in the query. The name of each frame corresponds to the field on which the @frame directive is placed, or default when using implicit mode.
query = """
{
  launches(limit: 5) @frame {
    id
    mission_name
    rocket {
      rocket_name
    }
  }
  landpads(limit: 5) @frame {
    id
    full_name
    location {
      region
      latitude
      longitude
    }
  }
}
"""
response = pluck.execute(query, url=SpaceX)
# print(response.data.keys())
# print(response.errors)
# print(response.frames.keys())
launches, landpads = response
landpads
id | full_name | location.region | location.latitude | location.longitude |
---|---|---|---|---|
LZ-1 | Landing Zone 1 | Florida | 28.4858 | -80.5444 |
LZ-2 | Landing Zone 2 | Florida | 28.4858 | -80.5444 |
LZ-4 | Landing Zone 4 | California | 34.633 | -120.615 |
OCISLY | Of Course I Still Love You | Florida | 28.4104 | -80.6188 |
JRTI-1 | Just Read The Instructions V1 | Florida | 28.4104 | -80.6188 |
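Holding on to the response makes it possible to check for errors before unpacking. The sketch below uses a hypothetical FakeResponse class that mirrors only the three attributes named above. It is not pluck.Response, and both the iteration order and the use of None for "no errors" are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in with the data, errors and frames attributes."""
    data: dict
    errors: Optional[list] = None
    frames: dict = field(default_factory=dict)

    def __iter__(self):
        # Assumption: iterating yields the frames in insertion order.
        return iter(self.frames.values())


response = FakeResponse(
    data={"launches": [], "landpads": []},
    frames={"launches": "<launches frame>", "landpads": "<landpads frame>"},
)

# Inspect the raw response first, then unpack the data-frames.
if response.errors:
    raise RuntimeError(response.errors)
launches, landpads = response
```

The same response can also be indexed by name through frames, for example response.frames["landpads"].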
pluck.create
Pluck also provides a create factory function, which returns a customized execute function that closes over the url and other configuration.
gql = pluck.create(url=SpaceX)
query = """
{
  launches(limit: 5) @frame {
    id
    mission_name
    rocket {
      rocket_name
    }
  }
  landpads(limit: 5) @frame {
    id
    full_name
    location {
      region
      latitude
      longitude
    }
  }
}
"""
launches, landpads = gql(query)
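The "closes over" wording refers to an ordinary Python closure. A minimal sketch of the pattern, assuming a hypothetical make_client factory that returns a description of the request instead of sending it (the names and the example URL are illustrative and not part of Pluck):

```python
def make_client(url: str, **config):
    """Return an execute-like function with url and config baked in."""
    def execute(query: str) -> dict:
        # Stand-in for a real request: report what would be sent.
        return {"url": url, "query": query, **config}
    return execute


# Configure once, then reuse the returned function for every query.
gql = make_client(url="https://example.invalid/graphql", column_names="short")
request = gql("{ launches { id } }")
```

Each call to gql needs only the query, because the url and the extra configuration were captured when the function was created.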