
Project description

Azure Devops PySpark: A productive library to extract data from Azure Devops and apply agile metrics.

What is it?

Azure Devops PySpark is a Python package that provides a productive way to extract data from Azure Devops and build agile metrics. It runs on PySpark, so everything the engine offers is available to you.

Main Features

  • Get authenticated quickly and simply.

  • All of the project's columns are mapped automatically; pick just the ones you want to build your dataframes with.

  • SparkSession already created in the spark variable.

  • Get all your organization's backlogs with the method all_backlog.

  • Get all your organization's teams with the method all_teams.

  • Get all your organization's iterations with the method all_iterations.

  • Get all your organization's members with the method all_members.

  • Get all your organization's items with the method all_items.

  • Get all your organization's tags with the method all_tags.

  • Explore the simplicity of the Agile class to build powerful metrics for your organization.

How to install?

pip install azure-devops-pyspark

For local use you must also install pyspark>=3.2.1 and configure the required environment variables.
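
The exact configuration depends on your machine, but a minimal sketch of the variables typically needed (the paths below are placeholders, set before importing the library) looks like this:

import os

os.environ.setdefault('JAVA_HOME', '/usr/lib/jvm/java-11-openjdk-amd64')  # JDK used by Spark (placeholder path)
os.environ.setdefault('SPARK_HOME', '/opt/spark')                         # local Spark installation (placeholder path)
os.environ.setdefault('PYSPARK_PYTHON', 'python3')                        # Python interpreter used by Spark workers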

The Code

The code and issue tracker are hosted on GitHub: https://github.com/gusantos1/azure-devops-pyspark

Quick example

from AzureDevopsPySpark import Azure, Agile
from pyspark.sql.functions import datediff  # used in the agile metrics below

devops = Azure('ORGANIZATION', 'PROJECT', 'TOKEN')
## Filter columns
devops.filter_columns([
    'IterationPath', 'Id', 'State', 'WorkItemType',
    'CreatedDate', 'ClosedDate', 'Iteration_Start_Date', 'Iteration_End_Date'
])

## Basic data structures
df_members = devops.all_members().data
df_backlog = devops.all_backlog().data
df_iterations = devops.all_iterations().data
df_items = devops.all_items().data

## or

## Pyspark Dataframe data structure
df_members = devops.all_members().df
df_backlog = devops.all_backlog().df
df_iterations = devops.all_iterations().df
df_items = devops.all_items().df
## Agile Metrics
agile = Agile()

## A new dataframe
df_agil = df_items.join(df_iterations, 'IterationPath')

## Metrics

## Average time between CreatedDate and ClosedDate of items in the last 90 days.
lead_time = agile.avg(
    df=df_agil,
    ref=[datediff, 'ClosedDate', 'CreatedDate'], # The day difference between the CreatedDate and ClosedDate of each item.
    iteration_path='IterationPath', # GroupBy.
    new='LeadTimeDays', # New column name.
    literal_filter=['ClosedDate >= 90'], # Filtering items from the last 90 days.
    filters={'WorkItemType': 'Task', 'State': 'Closed'} # Custom filters for metric.
).df

The project also provides a notebook with examples of applications using the library's methods.

How does it work?

Azure Methods

All public methods of this class return a Response object with data and df attributes: data is a basic Python data structure and df is a PySpark dataframe.

  • all_backlog

    Returns all backlog work items within a project.
    all_backlog(self)
    
  • filter_columns

    Mapped columns that are not in the list passed as an argument will be excluded.
    filter_columns(self, only: List[str])
    
  • all_iterations

    Returns all iterations in the project.
    all_iterations(self, only: List[str] = None, exclude: List[str] = None)
    
  • all_items

    Returns all work items in the project. Results can be filtered with an SQL-like clause through the query parameter (default None). Ex: Where [System.WorkItemType] = 'Task' AND [System.AssignedTo] = 'Guilherme Silva' returns all tasks assigned to Guilherme Silva; see the sketch after this list.
    all_items(self, query:str = None, params_endpoint:str = None)
    
  • all_members

    Returns all members in the project.
    all_members(self, only: List[str] = None, exclude: List[str] = None, params_endpoint: str = None)
    
  • all_tags

    Returns all tags registered in the project.
    all_tags(self)
    
  • all_teams

    Returns all teams registered in the project.
    all_teams(self, only: List[str] = None, exclude: List[str] = None, params_endpoint:str = None)	
    
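For example, a minimal sketch of all_items with the documented query filter, reusing the authenticated devops object from the Quick example (the field values are illustrative):

## Only closed tasks, filtered through the query parameter.
closed_tasks = devops.all_items(
    query="Where [System.WorkItemType] = 'Task' AND [System.State] = 'Closed'"
).df
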

Response Methods

  • show

    Show a pyspark dataframe.
    show(self, select: List[str] = None, truncate: bool = True)
    
  • data

    Returns the data in its basic Python structure.
    data(self)
    
  • df

    Returns a PySpark dataframe.
    df(self)
    
  • table

    Creates a table on the cluster, in Delta format and overwrite mode by default (see the combined sketch after this list).
    table(self, database: str, table: str, format = 'delta', mode = 'overwrite')
    
  • parquet

    Writes Parquet files to path, in overwrite mode by default.
    parquet(self, path: str, mode: str = 'overwrite', partitionBy: str = None, compression: str = None)
    
  • view

    Creates a table view.

    view(self, name: str)
    
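A minimal sketch combining these methods, assuming the authenticated devops object from the Quick example; the path, view and database names are placeholders:

response = devops.all_items()

response.show(select=['Id', 'State', 'WorkItemType'], truncate=False)  # preview selected columns
response.parquet(path='/tmp/azure_devops/items', mode='overwrite')     # write Parquet files to the path
response.view(name='vw_items')                                         # register a table view
response.table(database='analytics', table='azure_items')              # Delta table, overwrite mode by default
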

Agile Methods

The Agile class accepts any PySpark dataframe and is built from aggregation methods and filter options that make it flexible to customize agile metrics. Agile does not ship a cycle time method, for example, but you can build one from the avg method with your own customizations, as shown in the sketch below.

All public methods of this class return a Detail object with detail and df attributes: detail is the dataframe before aggregation and df is the aggregated dataframe.
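
For instance, a cycle-time sketch built on avg, analogous to the lead-time metric in the Quick example (reusing agile, df_agil and the datediff import from there); ActivatedDate is an assumed column and would also need to be included in filter_columns:

## Average days between ActivatedDate (assumed column) and ClosedDate per iteration.
cycle_time = agile.avg(
    df=df_agil,
    ref=[datediff, 'ClosedDate', 'ActivatedDate'],  # days each item spent in progress
    iteration_path='IterationPath',                 # GroupBy.
    new='CycleTimeDays',                            # New column name.
    filters={'WorkItemType': 'Task', 'State': 'Closed'}
).df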

  • avg, count, max, min, sum

    After filtering a dataframe, it performs the operation on the column passed as an argument in ref.
    avg(self, df, ref: Union[str, list], iteration_path: str, new: str, literal_filter: List[str] = None, between_date: Dict[str, str] = None, group_by: List[str] = None, **filters)
    
  • custom

    Agile.custom receives two PySpark dataframes, the information needed to merge them, and the name of a Python operator that performs the operation between the two columns. Supported operators: is_, is_not, add, and_, truediv, floordiv, mod, mul, pow, sub and ceil (PySpark).
    custom(self, df_left, def_right, left: str, right: str, how: str, op: operator, left_ref: str, right_ref: str, new: str)
    
  • multiple_join

    Receives a list of dataframes and merges them on a column name shared between them (see the sketch after this list).
    multiple_join(self, dfs: list, on: List[str], how: str = 'left')
    
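A minimal usage sketch of multiple_join, assuming the dataframes from the Quick example all share the IterationPath column:

## Join the item, iteration and backlog dataframes on their common column.
df_joined = agile.multiple_join(
    dfs=[df_items, df_iterations, df_backlog],
    on=['IterationPath'],
    how='left'
)
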

Dependencies

certifi >= 2021.10.8

charset-normalizer >= 2.0.12

idna >= 3.3

requests >= 2.27.1

urllib3 >= 1.26.9

python-dateutil >= 2.8.2

Author

The azure-devops-pyspark library was written by Guilherme Silva < https://www.linkedin.com/in/gusantosdev/ > in 2022.

https://github.com/gusantos1/azure-devops-pyspark

License

GNU General Public License v3.0.

