Skip to main content

Retrieves information from Sinta (http://sinta.ristekbrin.go.id) via scraping.

Project description

Sinta Scraper

Retrieves information from Sinta (http://sinta.ristekbrin.go.id) via scraping.

Code Sample

Code sample for all functions is available as a Google Colab Notebook: Open In Colab

Installation

pip install sinta-scraper

Dependencies: bs4, requests, dicttoxml, and dict2xml. These will be automatically installed by pip with the above command.

Importing

import sinta_scraper as sinta

Available Functions

  • author()

Retrieves a single author's information by Sinta ID. For example:

author_id = '5975467'
author = sinta.author(author_id)

print(author['name'])
# Output: Agus Zainal Arifin

The output format is the Python dictionary. The structure is given in the following sample output.

{
    'id': '5975467',
    'name': 'AGUS ZAINAL ARIFIN',
    'url': 'http://sinta.ristekbrin.go.id/authors/detail?id=5975467&view=overview',
    'affiliation': {
        'id': '417',
        'name': 'Institut Teknologi Sepuluh Nopember',
        'url': 'http://sinta.ristekbrin.go.id/affiliations/detail/?id=417&view=overview'
    },
    'department': 'Teknik Informatika',
    'areas': [
        'computer vision',
        'image processing',
        'information retrieval',
        'medical imaging',
        'machine learning'
    ],
    'score': {
        'overall': 38.24,
        '3_years': 8.36,
        'overall_v2': 3485.0,
        '3_years_v2': 1345.0
    },
    'rank': {
        'national': 596,
        '3_years_national': 509,
        'affiliation': 28,
        '3_years_affiliation': 22
    },
    'scopus': {
        'documents': 52,
        'citations': 355,
        'h-index': 8,
        'i10-index': 6,
        'g-index': 14,
        'articles': 28,
        'conferences': 24,
        'others': 0,
        'Q1': 5,
        'Q2': 11,
        'Q3': 9,
        'Q4': 2,
        'undefined': 25
    },
    'scholar': {
        'documents': 232,
        'citations': 1087,
        'h-index': 13,
        'i10-index': 25,
        'g-index': 25
    },
    'wos': {
        'documents': 1,
        'citations': null,
        'h-index': null,
        'i10-index': null,
        'g-index': null
    },
    'sinta': {
        'S0': 1,
        'S1': 3,
        'S2': 1,
        'S3': 2,
        'S4': 0,
        'S5': 0,
        'uncategorized': 225
    },
    'books': 0,
    'ipr': 2
}
  • authors()

Retrieves several author's information by Sinta ID. For example:

author_ids = ['5975467', '6005015', '29555']
authors = sinta.authors(author_ids)

print(authors[1]['name'])
# Output: MAURIDHI HERY PURNOMO

The output is a list of dictionaries with the same structure given by the author() function.

  • dept_authors()

Retrieves a list of authors associated with a department. Department ID and affiliation ID must be specified. The output structure is different from that given by the previous function. This function retrieves only the ID's and names of each author. For example:

dept_id = '55001'
affil_id = '417'
authors = sinta.dept_authors(dept_id, affil_id)

print(authors[:3])
# Output: [{'id': '29555', 'name': 'Riyanarto Sarno'}, {'id': '5975467', 'name': 'Agus Zainal Arifin'}, {'id': '6023328', 'name': 'Nanik Suciati'}]
  • depts_authors()

Does the same thing as dept_authors() except that you can specify a list of department ID's as argument. For example:

dept_ids = ['55001', '20201', '24201']
affil_id = '417'
authors = sinta.depts_authors(dept_ids, affil_id)

print(authors[:-3])
# Output: [{'id': '6674726', 'name': 'Cahayahati'}, {'id': '6690103', 'name': 'Ari Santoso'}, {'id': '6199111', 'name': 'Lucky Putri Rahayu'}]
  • affil()

Retrieves information about an affiliation. For example:

affil_id = '417'
affil = sinta.affil(affil_id)

print(affil)

# Output:
{
    'name': 'Institut Teknologi Sepuluh Nopember',
    'url': 'https://its.ac.id',
    'score': {
        'overall': 34332,
        'overall_v2': 514402,
        '3_years': 6569,
        '3_years_v2': 202157
    },
    'rank': {
        'national': 7,
        '3_years_national': 10
    },
    'journals': 18,
    'verified_authors': 1084,
    'lecturers': 961
}
  • affils()

Retrieves information about several affiliations. For example:

affil_ids = ['417', '404']
affils = sinta.affils(affil_ids)

print(affils)

# Output:

Other Output Formats

Other formats can be used by specifying the output_format argument:

author = sinta.author(id, output_format='json')

Avalable output formats:

  • 'dictionary' (default)
  • 'json'
  • 'xml'

JSON output can be pretty-printed by setting pretty_print=True:

author = sinta.author(id, output_format='json', pretty_print=True)

For XML output, there are two library options which can be specified in the xml_library argument. These libraries give different output formats. The options are:

  • dicttoxml (default)
  • dict2xml

For example:

author = sinta.author(id, output_format='xml', xml_library='dict2xml')

If you want the XML output to be pretty-printed, you need to choose dict2xml instead of xmltodict since the latter does not produce pretty-printed XML output.

Todo

  • Other output formats: CSV.
  • affil(affil_id) function.
  • find_affil(keyword) function.
  • affil_depts(affil_id) function.
  • affil_authors(affil_id) function.
  • dept(dept_id) function.
  • find_dept(keyword) function.
  • author_scopus_docs(author_id) function.
  • author_wos_docs(author_id) function.
  • author_books(author_id) function.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for sinta-scraper, version 0.13.1
Filename, size File type Python version Upload date Hashes
Filename, size sinta_scraper-0.13.1-py3-none-any.whl (25.7 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size sinta-scraper-0.13.1.tar.gz (9.6 kB) File type Source Python version None Upload date Hashes View

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring DigiCert DigiCert EV certificate Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page