ArXiv-Miner: Mine/Scrape Arxiv-Papers To Structured Datasets

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Project description

ArXiv-Miner

ArXiv Miner is a toolkit for mining research papers on CS ArXiv.

What is ArXiv-Miner

arxiv-miner is a small library that helps power Sci-Genie, a search engine for searching the full text of papers on CS ArXiv. arxiv-miner extracts and parses LaTeX documents from CS ArXiv, and it supports storing and searching the parsed documents with Elasticsearch. The library can also be applied to other ArXiv domains such as Math, Physics, and Biology.

Documentation

All documentation on how to install and use arxiv-miner is available on the documentation website or in the docs folder. Contribution guidelines are provided there as well.

Why was ArXiv-Miner created?

ArXiv Miner was created to make it easy to scrape, parse, and search research content on ArXiv. The library stitches together solutions from the code of tools such as arxiv-sanity, arxiv-vanity/engrafo, arxivscraper, tex2py, cso-classifier, and axcell. The parsed structure of the content can serve as a heuristic baseline for search or for scientific research mining and AI applications.

Core Components of ArXiv-Miner

Scraping
Parsing
Indexing/Storage
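
As a rough illustration of how the parsing and indexing components fit together, here is a minimal sketch that uses only the Python standard library. It is not the arxiv-miner API: the names `parse_sections` and `build_index`, the regex-based section splitting, and the in-memory inverted index are all assumptions made for this example. The scraping step is skipped, and the `papers` dictionary stands in for LaTeX sources that would already have been downloaded. The real library stores and searches parsed documents with Elasticsearch.

```python
import re
from collections import defaultdict

# Matches \section{Title} and \section*{Title}; nested braces are not handled.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")


def parse_sections(latex_source):
    """Split LaTeX source into (title, body) pairs, one per section command."""
    matches = list(SECTION_RE.finditer(latex_source))
    sections = []
    for i, match in enumerate(matches):
        # A section body runs until the next section command or end of file.
        if i + 1 < len(matches):
            end = matches[i + 1].start()
        else:
            end = len(latex_source)
        title = match.group(1).strip()
        body = latex_source[match.end():end].strip()
        sections.append((title, body))
    return sections


def build_index(papers):
    """Map each lowercase word to the (paper_id, section_title) pairs containing it."""
    index = defaultdict(set)
    for paper_id, source in papers.items():
        for title, body in parse_sections(source):
            for word in re.findall(r"[a-z]+", body.lower()):
                index[word].add((paper_id, title))
    return index


# Hypothetical stand-in for scraped LaTeX sources, keyed by a made-up paper id.
papers = {
    "paper-1": r"\section{Introduction} We study transformers. "
               r"\section{Method} We train models.",
}
index = build_index(papers)
print(sorted(index["transformers"]))
```

Looking up a word returns the papers and sections that mention it, which is the kind of section-level full-text search the parsed structure makes possible.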

Family Of Projects With ArXiv-Miner

arxiv-table-miner : Coming Soon.
arxiv-table-ml-models : Coming Soon.
semantic-scholar-data-pipeline : https://github.com/valayDave/semantic-scholar-data-pipeline

Disclaimer

This project was developed cowboy-coder style during the COVID-19 pandemic, so it may contain bugs and the code is not well optimized. It was developed primarily to aid CS and Machine Learning/AI research, but the tool can be extended to all 3M+ documents on ArXiv.

Call For Contributors

Contributions that improve the project or fix bugs are welcome. Please read the contribution guide in the documentation.

Credits and Appreciation

This project, like all others, has been built on the shoulders of giants. A big thanks to the creators of the libraries and open source projects that aided the development of arxiv-miner and its family of projects.

Licence

MIT

Release history

2.0.3 (this version): Jul 7, 2021
2.0.1: May 28, 2021

Download files

Source Distribution

arxiv_miner-2.0.3.tar.gz (57.8 kB)

Uploaded Jul 7, 2021 Source

Built Distribution

arxiv_miner-2.0.3-py3-none-any.whl (49.0 kB)

Uploaded Jul 7, 2021 Python 3

File details

Details for the file arxiv_miner-2.0.3.tar.gz.

File metadata

Download URL: arxiv_miner-2.0.3.tar.gz
Upload date: Jul 7, 2021
Size: 57.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.3.1 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.2

File hashes

Hashes for arxiv_miner-2.0.3.tar.gz
Algorithm	Hash digest
SHA256	`8239aafe164bf3791a3113ba6ddbc1d08dead0cc1bc87b86efc9e184e01895df`
MD5	`563e841236adfe2c4406fb827389089b`
BLAKE2b-256	`ade2d65585c7b8c4499c00dac2013b6d4e92fc1933f3a51f81daa6be89f73a88`

File details

Details for the file arxiv_miner-2.0.3-py3-none-any.whl.

File metadata

Download URL: arxiv_miner-2.0.3-py3-none-any.whl
Upload date: Jul 7, 2021
Size: 49.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.3.1 pkginfo/1.7.0 requests/2.22.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.2

File hashes

Hashes for arxiv_miner-2.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d47bc3b29a2f46769d701596fdbdef99799d0cfa312672662d869db81e1269f7`
MD5	`8026274003d27e1966e97e37a118fc7f`
BLAKE2b-256	`e175b4923e31637ff7cfc137b9fdd13ad1f34854ae0a96b38a1567c5af771df2`
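
The published digests can be used to check a downloaded file before installing it. The sketch below uses Python's standard `hashlib` module and the SHA256 values listed above. The helper names `sha256_of` and `verify_download` are made up for this example, and it assumes the file kept the name shown in the tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "arxiv_miner-2.0.3.tar.gz":
        "8239aafe164bf3791a3113ba6ddbc1d08dead0cc1bc87b86efc9e184e01895df",
    "arxiv_miner-2.0.3-py3-none-any.whl":
        "d47bc3b29a2f46769d701596fdbdef99799d0cfa312672662d869db81e1269f7",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's digest matches the published one."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published hash for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

Calling `verify_download("arxiv_miner-2.0.3.tar.gz")` on a file in the current directory returns `True` only when its contents match the published digest. A file with any other name raises `KeyError` rather than passing silently.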
