Create import CSVs for a Neo4j Wikipedia Page graph

# wiki2neo

[![PyPI version shields.io](https://img.shields.io/pypi/v/wiki2neo.svg)](https://pypi.python.org/pypi/wiki2neo/)

Produce [Neo4j](https://neo4j.com/) import CSVs from [Wikipedia database dumps](https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia)
to build a graph of links between Wikipedia pages.

## Installation

```bash
$ pip install wiki2neo
```

## Usage

```
Usage: wiki2neo [OPTIONS] [WIKI_XML_INFILE]

  Parse Wikipedia pages-articles-multistream.xml dump into two Neo4j
  import CSV files:

      Node (Page) import, headers=["title:ID", "wiki_page_id"]
      Relationships (Links) import, headers=[":START_ID", ":END_ID"]

  Reads from stdin by default, pass [WIKI_XML_INFILE] to read from file.

Options:
  -p, --pages-outfile FILENAME  Node (Pages) CSV output file
                                [default: pages.csv]
  -l, --links-outfile FILENAME  Relationships (Links) CSV output file
                                [default: links.csv]
  --help                        Show this message and exit.
```

Import the resulting CSVs into Neo4j:

```bash
$ neo4j-admin import --nodes:Page pages.csv \
      --relationships:LINKS_TO links.csv \
      --ignore-duplicate-nodes --ignore-missing-nodes --multiline-fields
```
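
With the import finished and Neo4j running, the graph can be queried with Cypher. A minimal sketch using `cypher-shell`, assuming the default `neo4j` username and an illustrative page title; the `:Page` label and `:LINKS_TO` relationship type come from the import command above:

```bash
# List pages that "Graph theory" links to (the title is illustrative);
# replace the credentials with your own.
$ echo 'MATCH (p:Page {title: "Graph theory"})-[:LINKS_TO]->(q:Page)
        RETURN q.title LIMIT 10;' | cypher-shell -u neo4j -p "$NEO4J_PASSWORD"
```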

