
Project description

Azure Devops PySpark: A productive library to extract data from Azure Devops and apply agile metrics.

What is it?

Azure Devops PySpark is a Python package that provides a productive way to extract data from Azure Devops and build agile metrics. It runs on PySpark, so everything the engine offers is available to you.

Main Features

  • Get authenticated quickly and simply.

  • All of the project's columns are mapped automatically; pick just the ones you want to build your dataframes with.

  • SparkSession already created in the spark variable.

  • Get all your organization's backlogs with the method all_backlog.

  • Get all your organization's teams with the method all_teams.

  • Get all your organization's iterations with the method all_iterations.

  • Get all your organization's members with the method all_members.

  • Get all your organization's items with the method all_items.

  • Get all your organization's tags with the method all_tags.

  • Explore the simplicity of the Agile class to build powerful metrics for your organization.

How to install?

pip install azure-devops-pyspark

For local use you must also install pyspark>=3.2.1 and configure the required environment variables.
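
The exact configuration depends on your machine, but a minimal sketch of the variables typically needed (the paths below are placeholders, set before importing the library) looks like this:

import os

os.environ.setdefault('JAVA_HOME', '/usr/lib/jvm/java-11-openjdk-amd64')  # JDK used by Spark (placeholder path)
os.environ.setdefault('SPARK_HOME', '/opt/spark')                         # local Spark installation (placeholder path)
os.environ.setdefault('PYSPARK_PYTHON', 'python3')                        # Python interpreter used by Spark workers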

The Code

The code and issue tracker are hosted on GitHub: https://github.com/gusantos1/azure-devops-pyspark

Quick example

from AzureDevopsPySpark import Azure, Agile
from pyspark.sql.functions import datediff  # used in the agile metrics below

devops = Azure('ORGANIZATION', 'PROJECT', 'TOKEN')
## Filter columns
devops.filter_columns([
    'IterationPath', 'Id', 'State', 'WorkItemType',
    'CreatedDate', 'ClosedDate', 'Iteration_Start_Date', 'Iteration_End_Date'
])

## Basic data structures
df_members = devops.all_members().data
df_backlog = devops.all_backlog().data
df_iterations = devops.all_iterations().data
df_items = devops.all_items().data

## or

## Pyspark Dataframe data structure
df_members = devops.all_members().df
df_backlog = devops.all_backlog().df
df_iterations = devops.all_iterations().df
df_items = devops.all_items().df
## Agile Metrics
agile = Agile()

## A new dataframe
df_agil = df_items.join(df_iterations, 'IterationPath')

## Metrics

## Average time between CreatedDate and ClosedDate of items in the last 90 days.
lead_time = agile.avg(
    df=df_agil,
    ref=[datediff, 'ClosedDate', 'CreatedDate'], # The day difference between the CreatedDate and ClosedDate of each item.
    iteration_path='IterationPath', # GroupBy.
    new='LeadTimeDays', # New column name.
    literal_filter=['ClosedDate >= 90'], # Filtering items from the last 90 days.
    filters={'WorkItemType': 'Task', 'State': 'Closed'} # Custom filters for metric.
).df

The project also provides a notebook with examples of applications using the library's methods.

How does it work?

Azure Methods

All public methods of this class return a Response object with data and df attributes: data is a basic Python data structure and df is a PySpark dataframe.

  • all_backlog

    Returns all backlog work items within a project.
    all_backlog(self)
    
  • filter_columns

    Mapped columns that are not in the list passed as an argument will be excluded.
    filter_columns(self, only: List[str])
    
  • all_iterations

    Returns all iterations in the project.
    all_iterations(self, only: List[str] = None, exclude: List[str] = None)
    
  • all_items

    Returns all work items in the project. Results can be filtered with an SQL-like clause through the query parameter (default None). Ex: Where [System.WorkItemType] = 'Task' AND [System.AssignedTo] = 'Guilherme Silva' returns all tasks assigned to Guilherme Silva; see the sketch after this list.
    all_items(self, query:str = None, params_endpoint:str = None)
    
  • all_members

    Returns all members in the project.
    all_members(self, only: List[str] = None, exclude: List[str] = None, params_endpoint: str = None)
    
  • all_tags

    Returns all tags registered in the project.
    all_tags(self)
    
  • all_teams

    Returns all teams registered in the project.
    all_teams(self, only: List[str] = None, exclude: List[str] = None, params_endpoint:str = None)	
    
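For example, a minimal sketch of all_items with the documented query filter, reusing the authenticated devops object from the Quick example (the field values are illustrative):

## Only closed tasks, filtered through the query parameter.
closed_tasks = devops.all_items(
    query="Where [System.WorkItemType] = 'Task' AND [System.State] = 'Closed'"
).df
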

Response Methods

  • show

    Show a pyspark dataframe.
    show(self, select: List[str] = None, truncate: bool = True)
    
  • data

    Returns the data in its basic Python structure.
    data(self)
    
  • df

    Returns a PySpark dataframe.
    df(self)
    
  • table

    Creates a table on the cluster, in Delta format and overwrite mode by default (see the combined sketch after this list).
    table(self, database: str, table: str, format = 'delta', mode = 'overwrite')
    
  • parquet

    Writes Parquet files to path, in overwrite mode by default.
    parquet(self, path: str, mode: str = 'overwrite', partitionBy: str = None, compression: str = None)
    
  • view

    Creates a table view.

    view(self, name: str)
    
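A minimal sketch combining these methods, assuming the authenticated devops object from the Quick example; the path, view and database names are placeholders:

response = devops.all_items()

response.show(select=['Id', 'State', 'WorkItemType'], truncate=False)  # preview selected columns
response.parquet(path='/tmp/azure_devops/items', mode='overwrite')     # write Parquet files to the path
response.view(name='vw_items')                                         # register a table view
response.table(database='analytics', table='azure_items')              # Delta table, overwrite mode by default
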

Agile Methods

The Agile class accepts any PySpark dataframe and is built from aggregation methods and filter options that make it flexible to customize agile metrics. Agile does not ship a cycle time method, for example, but you can build one from the avg method with your own customizations, as shown in the sketch below.

All public methods of this class return a Detail object with detail and df attributes: detail is the dataframe before aggregation and df is the aggregated dataframe.
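
For instance, a cycle-time sketch built on avg, analogous to the lead-time metric in the Quick example (reusing agile, df_agil and the datediff import from there); ActivatedDate is an assumed column and would also need to be included in filter_columns:

## Average days between ActivatedDate (assumed column) and ClosedDate per iteration.
cycle_time = agile.avg(
    df=df_agil,
    ref=[datediff, 'ClosedDate', 'ActivatedDate'],  # days each item spent in progress
    iteration_path='IterationPath',                 # GroupBy.
    new='CycleTimeDays',                            # New column name.
    filters={'WorkItemType': 'Task', 'State': 'Closed'}
).df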

  • avg, count, max, min, sum

    After filtering a dataframe, it performs the operation on the column passed as an argument in ref.
    avg(self, df, ref: Union[str, list], iteration_path: str, new: str, literal_filter: List[str] = None, between_date: Dict[str, str] = None, group_by: List[str] = None, **filters)
    
  • custom

    Agile.custom receives two PySpark dataframes, the information needed to merge them, and the name of a Python operator that performs the operation between the two columns. Supported operators: is_, is_not, add, and_, truediv, floordiv, mod, mul, pow, sub and ceil (PySpark).
    custom(self, df_left, def_right, left: str, right: str, how: str, op: operator, left_ref: str, right_ref: str, new: str)
    
  • multiple_join

    Receives a list of dataframes and merges them on a column name shared between them (see the sketch after this list).
    multiple_join(self, dfs: list, on: List[str], how: str = 'left')
    
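A minimal usage sketch of multiple_join, assuming the dataframes from the Quick example all share the IterationPath column:

## Join the item, iteration and backlog dataframes on their common column.
df_joined = agile.multiple_join(
    dfs=[df_items, df_iterations, df_backlog],
    on=['IterationPath'],
    how='left'
)
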

Dependencies

certifi >= 2021.10.8

charset-normalizer >= 2.0.12

idna >= 3.3

requests >= 2.27.1

urllib3 >= 1.26.9

python-dateutil >= 2.8.2

Author

The azure-devops-pyspark library was written by Guilherme Silva < https://www.linkedin.com/in/gusantosdev/ > in 2022.

https://github.com/gusantos1/azure-devops-pyspark

License

GNU General Public License v3.0.

