Handle Web of Science export files

Development Status
- 3 - Alpha
Intended Audience
- Developers
- Science/Research
License
- OSI Approved
Natural Language
- English
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering
- Software Development

Project description

wosfile

wosfile is a Python package designed to read and handle data exported from Clarivate Analytics Web of Science™. It supports both tab-delimited files and so-called ‘plain text’ files.

The point of wosfile is to read export files from WoS and give you a simple data structure—essentially a dict—that can be further analyzed with tools available in standard Python or with third-party packages. If you're looking for a ‘one-size-fits-all’ solution, this is probably not it.

Pros:

- It has no requirements beyond Python 3.6+ and the standard library.
- It is completely iterator-based, which makes it useful for large datasets: at no point should more than a single record be held in memory.
- Simple API: usually you need just one function, wosfile.records_from().
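
To make the iterator-based point concrete, here is a minimal sketch of the general idea using only the standard library. It is not wosfile's implementation: the function name iter_tab_records and the sample data are made up for this illustration, and it only assumes a tab-delimited export whose first row holds the field tags.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def iter_tab_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one record (a dict keyed by field tag) per row.

    Hypothetical helper: rows are produced lazily, so only the
    current record needs to be held in memory.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Made-up two-record export, standing in for a real file on disk.
sample = io.StringIO(
    "PT\tAU\tTI\tPY\n"
    "J\tDoe, J\tFirst title\t2019\n"
    "J\tRoe, R\tSecond title\t2020\n"
)

# Each record is consumed and discarded before the next one is read.
titles = [rec["TI"] for rec in iter_tab_records(sample)]
print(titles)
```

In practice you would call wosfile.records_from() instead, which also handles the ‘plain text’ format and field parsing.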

Cons:

- Pure Python, so might be slow.
- At the moment, wosfile does little more than reading WoS files and generating Record objects for each record. While it does some niceties like parsing address fields, it does not have any analysis functionality.
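
Because records behave like dicts, analysis can be done with the standard library. The sketch below computes the mean number of authors per publication year. It uses plain dicts as stand-ins for records, and it assumes that PY holds the year as a string and that AU is a list of author names; check both against your own data.

```python
from collections import defaultdict
from statistics import mean

# Stand-in records (made up); in practice these would come from
# wosfile.records_from(files).
records = [
    {"PY": "2019", "AU": ["Doe, J", "Roe, R"]},
    {"PY": "2020", "AU": ["Doe, J"]},
    {"PY": "2020", "AU": ["Poe, E", "Roe, R", "Doe, J", "Moe, M"]},
]

# Collect the author count of every record under its publication year.
team_sizes = defaultdict(list)
for rec in records:
    team_sizes[rec["PY"]].append(len(rec.get("AU", [])))

# Average the counts per year, in chronological order.
mean_team_size = {year: mean(sizes) for year, sizes in sorted(team_sizes.items())}
print(mean_team_size)
```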

Examples

These examples use a dataset exported from Web of Science in multiple separate files (the maximum number of exported records per file is 500).

Subject categories in our data

import glob
import wosfile
from collections import Counter

subject_cats = Counter()
# Create a list of all relevant files. Our folder may contain multiple export files.
files = glob.glob("data/savedrecs*.txt")

# wosfile will read each file in the list in turn and yield each record
# for further handling
for rec in wosfile.records_from(files):
    # Records are very thin wrappers around a standard Python dict,
    # whose keys are the WoS field tags.
    # Here we look at the SC field (subject categories) and update our counter
    # with the categories in each record.
    # Fall back to an empty list for records without an SC field.
    subject_cats.update(rec.get("SC", []))

# Show the five most common subject categories in the data and their number.
print(subject_cats.most_common(5))
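
The counting step can be tried without any export files. This sketch uses made-up dicts in place of records and assumes, as the example above implies, that the SC field is a list of category names.

```python
from collections import Counter

# Stand-in records (made up); the last one has no SC field at all.
records = [
    {"SC": ["Physics", "Chemistry"]},
    {"SC": ["Physics"]},
    {},
]

subject_cats = Counter()
for rec in records:
    # update() counts every element of the iterable it receives,
    # so each category in the list is tallied once per record.
    subject_cats.update(rec.get("SC", []))

print(subject_cats.most_common(2))
```

Note that passing a single string to Counter.update() would count its individual characters, which is why the field needs to be a list.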

Citation network

For this example you will need the NetworkX package. The data must be exported as ‘Full Record and Cited References’.

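import glob

import networkx as nx
import wosfile

# Same export files as in the previous example.
files = glob.glob("data/savedrecs*.txt")

# Create a directed network (empty at this point).
G = nx.DiGraph()
nodes_in_data = set()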

for rec in wosfile.records_from(files):
    # Each record has a record_id, a standard string uniquely identifying the reference.
    nodes_in_data.add(rec.record_id)
    # The CR field is a list of cited references. Each reference is formatted the same
    # as a record_id. This means that we can add citation links by connecting the record_id
    # to the reference.
    for reference in rec.get("CR", []):
        G.add_edge(rec.record_id, reference)

# At this point, our network also contains all references that were not in the original data.
# The line below ensures that we only retain publications from the original data set.
G.remove_nodes_from(set(G) - nodes_in_data)
# Show some basic statistics and save as Pajek file for visualization and/or further analysis.
# nx.info() was removed in NetworkX 3.0; printing the graph gives a short summary in recent versions.
print(G)
nx.write_pajek(G, 'network.net')
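
The filtering logic of the example above can also be sketched without NetworkX. The snippet below uses made-up dicts whose "id" key stands in for record_id (a hypothetical simplification), keeps only citations between publications in the data set, and counts how often each one is cited.

```python
from collections import Counter

# Stand-in records (made up): "X" and "Y" are cited references
# that are not part of the exported data set.
records = [
    {"id": "A", "CR": ["B", "X"]},
    {"id": "B", "CR": ["C", "Y"]},
    {"id": "C", "CR": []},
]

nodes_in_data = {rec["id"] for rec in records}

# Keep a citation link only if the cited reference is itself in the data.
edges = [
    (rec["id"], ref)
    for rec in records
    for ref in rec.get("CR", [])
    if ref in nodes_in_data
]

# Number of citations each publication receives from within the data set.
in_degree = Counter(target for _, target in edges)
print(edges)
print(in_degree)
```

This gives the same edges as building the full graph and then removing the outside nodes, but it never stores the references that fall outside the data.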

Other Python packages

The following packages also read WoS files (+ sometimes much more):

Other packages query WoS directly through the API and/or by scraping the web interface:

- pywos (elsewhere called wos-statistics)
- wos
- wosclient

Release history

- 0.6 (this version): Apr 26, 2022
- 0.5: Apr 28, 2021
- 0.4.2: Feb 24, 2020
- 0.4.1: Jul 8, 2019

Download files

Source Distribution

wosfile-0.6.tar.gz (11.0 kB)

Uploaded Apr 26, 2022 Source

Built Distribution

wosfile-0.6-py3-none-any.whl (10.1 kB)

Uploaded Apr 26, 2022 Python 3

File details

Details for the file wosfile-0.6.tar.gz.

File metadata

Download URL: wosfile-0.6.tar.gz
Upload date: Apr 26, 2022
Size: 11.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for wosfile-0.6.tar.gz
Algorithm	Hash digest
SHA256	`25f2b6b81c22d8212ddaf3162649e5205c0fc93d1fc0a6933b3e9b15523be506`
MD5	`e1978f9e7ec654d61fb59b0a7c491b70`
BLAKE2b-256	`e8688f533f52352024db6ac2f1ffb182cff2b9b08d8653672068c9874fffe590`
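
A downloaded file can be checked against a published digest with the standard hashlib module. The sketch below is one way to do this; the helper name sha256_matches is made up, and the payload is placeholder bytes rather than the real archive.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA256 digest of data equals expected_hex."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


# Placeholder bytes for illustration. For a real check, read the
# downloaded file in binary mode and pass the SHA256 value listed above.
payload = b"example bytes, not the real archive"
digest = hashlib.sha256(payload).hexdigest()

ok = sha256_matches(payload, digest)          # unchanged data
bad = sha256_matches(payload + b"!", digest)  # altered data
print(ok, bad)
```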

File details

Details for the file wosfile-0.6-py3-none-any.whl.

File metadata

Download URL: wosfile-0.6-py3-none-any.whl
Upload date: Apr 26, 2022
Size: 10.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for wosfile-0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2cb152fc24af3a28c0cc6ea37abc42f086df0d8fcec7e4264f1dfe289e40614f`
MD5	`2441673a86f7abd42608436c716ab3de`
BLAKE2b-256	`3fdb3ef34ad065596e7fb28926cf67d8057110c2aa94c93b35129833a33dc1ab`
