Client library for the Affinda API

Python Client Library for Affinda Document Parser API

This is a Python client for the Affinda document parsing API. It wraps all available endpoints and handles authentication and signing. You may also want to refer to the full API documentation for additional information.

Installation

pip install affinda

API Version Compatibility

The Affinda API is currently on v3. Breaking changes in the API have required new major versions of the client library. See below for which client versions are compatible with which API version.

Affinda API version    affinda-python versions
v2                     0.1.0 - 3.x.x
v3                     >= 4.x.x
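
As a rough illustration of the compatibility table, the sketch below maps a client version string to the API version it targets. The helper name `api_version_for_client` is hypothetical and not part of the affinda package; the only rule it encodes is the one stated in the table (major version 4 and above targets v3, anything earlier targets v2).

```python
from importlib import metadata


def api_version_for_client(client_version: str) -> str:
    """Return the Affinda API version a given affinda-python release targets.

    Hypothetical helper: it only encodes the compatibility table
    (0.1.0 - 3.x.x -> v2, >= 4.x.x -> v3).
    """
    major = int(client_version.split(".")[0])
    return "v3" if major >= 4 else "v2"


if __name__ == "__main__":
    try:
        # Look up the installed client version, if the package is present.
        installed = metadata.version("affinda")
        print(f"affinda {installed} targets API {api_version_for_client(installed)}")
    except metadata.PackageNotFoundError:
        print("affinda is not installed")
```

A check like this can be useful at application start-up when a project still depends on the v2 API and must not pick up a 4.x release by accident.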

Quickstart

If you don't have an API token, obtain one from affinda.com.

from pathlib import Path
from pprint import pprint

from affinda import AffindaAPI, TokenCredential
from affinda.models import WorkspaceCreate, CollectionCreate

token = "REPLACE_API_TOKEN"
file_pth = Path("PATH_TO_DOCUMENT.pdf")

credential = TokenCredential(token=token)
client = AffindaAPI(credential=credential)

# First get the organisation, by default your first one will have free credits
my_organisation = client.get_all_organizations()[0]

# And within that organisation, create a workspace, for example for Recruitment:
workspace_body = WorkspaceCreate(
    organization=my_organisation.identifier,
    name="My Workspace",
)
recruitment_workspace = client.create_workspace(body=workspace_body)

# Finally, create a collection that will contain our uploaded documents, for example resumes, by selecting the
# appropriate extractor
collection_body = CollectionCreate(
    name="Resumes", workspace=recruitment_workspace.identifier, extractor="resume"
)
resume_collection = client.create_collection(collection_body)

# Now we can upload a resume for parsing
with open(file_pth, "rb") as f:
    resume = client.create_document(file=f, file_name=file_pth.name, collection=resume_collection.identifier)

pprint(resume.as_dict())
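
The quickstart hardcodes the token for brevity. A minimal sketch of reading it from the environment instead is shown below; the helper `load_token` and the variable name `AFFINDA_API_TOKEN` are assumptions made for this example, not something the client library defines or reads on its own.

```python
import os


def load_token(env_var: str = "AFFINDA_API_TOKEN") -> str:
    """Read the API token from an environment variable.

    The variable name is an assumption for this sketch; the affinda
    client does not look it up automatically.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your Affinda API token")
    return token
```

The returned value can then be passed on as in the quickstart, for example `TokenCredential(token=load_token())`.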

Samples

Samples for all operations using the client can be found here.

API reference

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[4.7.2] - 2023-07-07

Changed

  • Set CustomFieldConfig default to 0.5

Fixed

  • Fixed serialisation of Document to Invoice, Resume etc in get_document()

[4.7.1] - 2023-06-28

Added

  • Add XML response to API spec for GET /v3/documents to match existing functionality

[4.7.0] - 2023-06-27

Added

  • Allow creating/updating a data point's parent and displayEnumValue properties
  • Allow explicitly setting a document as low_priority

Changed

  • Make slug and organization required when creating data point

Removed

  • Remove data point's similarTo property

[4.6.0] - 2023-06-16

Added

  • Add tailoredExtractorRequested to Collection
  • Add endpoint for updating resume and JD data

[4.5.1] - 2023-06-14

Added

  • Add rawText to invoice data

[4.5.0] - 2023-06-09

Added

  • Ability to post/patch languages for resumes in v2
  • Add include_public parameter to /data_points endpoint
  • Add base_extractor parameter to collection creation endpoint

Changed

  • Make extractor a non required field (internal use)

[4.4.0] - 2023-06-07

Added

  • Endpoints for add/remove tag for documents
  • Identifier field in DocumentUpdate model
  • Allow setting region_bias when uploading document
  • rawText field to JobDescription Model
  • Required fields for resthook subscriptions
  • Add fieldsLayout to Collection schema

Deprecated

  • Deprecate Collection.fields in favor of Collection.fieldsLayout

[4.3.5] - 2023-05-09

Changed

  • Nest line item table rows correctly.

[4.3.4] - 2023-05-09

Changed

  • Nest line item table rows correctly.

[4.3.3] - 2023-05-09

Added

  • Add Organization.validationToolConfig for configuration of the embeddable validation tool
  • Phone number details to Resume Candidate info
  • Add some filters to GET /documents endpoint: failed, ready, validatable
  • Custom fields to Job Descriptions
  • Add custom data to job description search results
  • Add international_country_code to phone number details

Changed

  • Provide additional filters for data point choices, and allow data point choices to be specified for any existing text field.
  • Allow custom resume fields to be nullable
  • Allow custom job description fields to be nullable
  • Make "pdf" property in SearchResults nullable

Removed

  • Remove include_child filter from /data_points endpoint

Fixed

  • Update python_requires to be PEP compliant

[4.3.2] - 2023-04-20

Changed

  • rawText is now not nullable
  • OccupationGroupSearchResult.children is now optional

Fixed

  • Allow rejectDuplicates to be null

[4.3.1] - 2023-03-29

Added

  • Add whitelistIngestAddresses to Workspace

Fixed

  • Make search config action fields required

[4.3.0] - 2023-03-28

Added

  • Adding group annotation content type
  • Add rejectDuplicates setting to workspace
  • Add hideToolbar to resume & JD search config
  • Add ExtractorConfig object to Collection

[4.2.0] - 2023-03-20

Fixed

  • Use OccupationGroupResult for v3 SearchAndMatch detail
  • Fixed return type for InvoiceData.currencyCode

Changed

  • Don't require Field.slug

Added

  • Add redactedText field to ResumeData

[4.1.0] - 2023-03-15

Fixed

  • Fixed type and path of data_point and data_point_choices
  • Fixed missing data field on base Document type
  • Fixed search and match return types
  • fixed document error return types
  • Ensure list endpoints have 'results' and 'count' properties required

Changed

  • Minor re-ordering of API spec paths
  • Change Document API tag from Document API - Upload Documents to Document API - Document

[4.0.1] - 2023-03-10

Fixed

  • Fixed resume search response object

[4.0.0] - 2023-03-09

Added

  • Add resthook subscription endpoints
  • Add py.typed marker file
  • Add link to affinda help docs for resthook creation

Changed

  • Remove extractor's id field, use identifier field instead

Removed

  • Remove extractor's id field
  • Removed v2 endpoints

[2.1.0] - 2023-02-06

Added

  • Add document.collection.extractor.identifier to DocumentMeta
  • Add cell to valid content types
  • Add EU API server to api docs
  • Add latitude and longitude to Location
  • Add expectedremuneration, jobtitle, language, skill and yearsexperience to AnnotationContentType
  • Re-add DataPoint.similarTo
  • Add exclude parameter to /documents query
  • add ingest email to Workspace and Collection

Changed

  • Updated endpoints for old v2 and newer v3 to point to the correct places.
  • Changed Document top level structure to more closely resemble api v2 with top level keys of meta, data and error
  • ResumeSearchParameters.resume, ResumeSearchParameters.jobdescription, JobDescriptionSearchParameters.resume, DataPoint.organization
  • Update azure-core version in setup.cfg and pin setuptools as latest version doesn't build

Fixed

  • Fixed various nullable fields not being nullable, and vice versa

Removed

  • Master/child accounts endpoints

[2.0.0] - 2023-01-13

Added

  • Added endpoints: Organization, Membership, Invitation, tags
  • Added name, organization to DataPoint, change id to identifier
  • Add new objects schemas Organization, OrganizationMembership, Invitation

Changed

  • Identifier instead of id as URL param
  • Update data point filters
  • Allow unlimited nesting in field config
  • Change document state from "export" to "archive"

Fixed

  • Collection identifier should be nullable
  • Don't paginate extractors endpoint
  • Fix avatar uploads
  • Allow writing resthookSignatureKey

[1.9.0] - 2023-01-12

  • Yanked as this was a breaking release, see newer release for more info

[1.8.0] - 2023-01-12

Changed

  • Allow non TLS http requests

[1.7.0] - 2023-01-10

Added

  • Add rectangles to Annotation, add position to referee, add actions to JobDescriptionSearchConfig

[1.6.0] - 2023-01-09

Fixed

  • Bump version to force new release

[1.5.1] - 2023-01-08

Changed

  • Allowing a few more fields in ResumeData to be null

[1.5.0] - 2022-11-17

Fixed

  • Document meta pages without images should be nullable
  • Small fixes for accreditation and education return objects
  • Various nullable fields in the API spec

Security

  • Bumped package versions for patch reasons

Added

  • Add reject_duplicates to document upload endpoint
  • XML 404 response schema
  • CustomData to resume search spec
  • suggest skills and job titles endpoints

Changed

  • Update spec to allow XML content-type return from resumes, make totalYearsExperience nullable
  • Allow additionalProperties for custom data upload (resumes) and search

[1.4.2] - 2022-09-23

Changed

  • Update API spec to match API response.

[1.4.1] - 2022-09-23

Added

  • Add job description search config and embed endpoints
  • Update index endpoint with document type parameter

Fixed

  • Fix casing of some properties to match API response.

[1.4.0] - 2022-08-25

Changed

  • Update modelerfour version to latest
  • Update types of objects for some endpoints using AllOf attributes for better client library generation
  • Changed and updated tag order to better match documentation needs
  • Updated autorest client version

Deprecated

  • Deprecated resume_formats and reformatted_resumes endpoints

Added

  • Reverse match functionality - search job descriptions with a resume, or with a set of parameters.

[1.3.1] - 2022-08-10

Added

  • Add search expression to 1v1 match

[1.3.0] - 2022-07-27

Added

  • Add ability to find other candidates that have similar attributes to a resume
  • Add an endpoint to get the matching score between a resume and a job description

[1.2.0] - 2022-07-04

Added

  • add "tables" property to InvoiceData

[1.1.0] - 2022-07-03

Added

  • Ability to update resume data in the search system
  • New endpoint for creating and managing users within a master account

[1.0.2] - 2022-05-07

Fixed

  • Make expiry time native date time

[1.0.1] - 2022-05-01

Added

  • Add review URL in the invoice response that allows embedding of the Affinda Invoice Review UI

[1.0.0] - 2022-04-28

Added

  • added confidence

Changed

  • changed strings to objects

[0.4.1] - 2022-04-19

Fixed

  • Fixes bug in create_invoice when URL is not specified

[0.4.0] - 2022-04-13

Changed

  • Update autorest dependencies

[0.3.0] - 2022-04-06

Added

  • Resume search

[0.2.2] - 2022-03-25

Added

  • Add iso 3166 country code to locations

[0.2.1] - 2021-12-09

Added

  • Bump version

[0.2.0] - 2021-10-06

Added

  • Invoices endpoint

Removed

  • Removed 'url' format from url strings in api spec

[0.1.13] - 2021-10-05

Changed

  • Pin azure-core to 1.18.0

[0.1.12] - 2021-10-05

Changed

  • Pin azure-core

[0.1.11] - 2021-10-05

Changed

  • Pinning azure-core dependency due to incompatible changes in 1.19

[0.1.10] - 2021-09-30

Added

  • Adding LinkedIn to ResumeData

Changed

  • Reformatted code with black
  • Minor changes
  • Very minor formatting changes

[0.1.9] - 2021-09-08

Added

  • Profession in ResumeData model
  • Unified Error models

[0.1.8] - 2021-09-06

Fixed

  • wait=true in API spec

[0.1.7] - 2021-09-05

Fixed

  • Code samples naming conversion

[0.1.6] - 2021-09-05

Changed

[0.1.5] - 2021-08-25

Added

  • Added flake, editorconfig, tox.ini etc files to match best practices for existing Draftable/Affinda projects (thanks @ralish!)

[0.1.4] - 2021-08-18

Fixed

  • Update README.md to fix install instructions

[0.1.3] - 2021-08-18

Fixed

  • Update README.md to hard link to github hosted logo to fix display on PyPi

[0.1.2] - 2021-08-18

  • Initial release

The MIT License (MIT)

Copyright (c) Affinda

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
