Create import CSVs for a Neo4j Wikipedia Page graph

Project description

wiki2neo

Produce Neo4j import CSVs from Wikipedia database dumps to build a graph of links between Wikipedia pages.

Installation

$ pip install wiki2neo

Usage

Usage: wiki2neo [OPTIONS] [WIKI_XML_INFILE]

  Parse Wikipedia pages-articles-multistream.xml dump into two Neo4j import
  CSV files:

      Node (Page) import, headers=["title:ID", "id"]
      Relationships (Links) import, headers=[":START_ID", ":END_ID"]

  Reads from stdin by default, pass [WIKI_XML_INFILE] to read from file.

Options:
  -p, --pages-outfile FILENAME  Node (Pages) CSV output file  [default: pages.csv]
  -l, --links-outfile FILENAME  Relationships (Links) CSV output file  [default: links.csv]
  --help                        Show this message and exit.
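
To make the two header layouts above concrete, here is a minimal sketch that writes files of the same shape with Python's standard csv module. The page titles, page ids, and the write_csv helper are invented for illustration; this is not wiki2neo's own code, only an example of what the documented headers imply.

```python
import csv
import io

# Hypothetical sample data (not taken from a real dump):
# (title, page id) pairs for nodes, (source title, target title) pairs for links.
pages = [("Graph theory", "12401"), ("Leonhard Euler", "17902")]
links = [("Graph theory", "Leonhard Euler")]


def write_csv(header, rows, out):
    """Write a header row followed by data rows to a text stream."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


# In-memory buffers stand in for pages.csv and links.csv.
pages_out = io.StringIO()
links_out = io.StringIO()

# Headers as documented in the usage text above.
write_csv(["title:ID", "id"], pages, pages_out)
write_csv([":START_ID", ":END_ID"], links, links_out)

print(pages_out.getvalue())
print(links_out.getvalue())
```

Because the node header marks the title column as the ID, each relationship row refers to pages by title.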

Import resulting CSVs into Neo4j:
$ neo4j-admin import --nodes:Page pages.csv \
        --relationships:LINKS_TO links.csv \
        --ignore-duplicate-nodes --ignore-missing-nodes --multiline-fields

Downloads from Wikipedia are in compressed xml.bz2 format. The simplest usage is to pipe the decompressed output straight into wiki2neo:

$ bzcat pages-articles-multistream.xml.bz2 | wiki2neo
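
Where bzcat is not available, the same streaming decompression can be sketched with Python's standard bz2 module. The helper name decompress_to and the chunk size are assumptions made for this example; the sketch only shows how the compressed dump can be streamed without holding it in memory.

```python
import bz2
import shutil


def decompress_to(path, out, chunk_size=1024 * 1024):
    """Stream the decompressed contents of a .bz2 file into a binary stream.

    bz2.open reads multi-stream archives, and copyfileobj moves the data
    in fixed-size chunks, so the whole dump is never held in memory.
    """
    with bz2.open(path, "rb") as src:
        shutil.copyfileobj(src, out, chunk_size)
```

The out argument can be any binary writable, for example the stdin pipe of a wiki2neo process started with subprocess.Popen(["wiki2neo"], stdin=subprocess.PIPE), since wiki2neo reads from stdin by default.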

Download files

Source Distribution

wiki2neo-0.0.3.tar.gz (2.7 kB view details)

Uploaded Source

Built Distribution

wiki2neo-0.0.3-py2.py3-none-any.whl (2.1 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file wiki2neo-0.0.3.tar.gz.

File metadata

  • Download URL: wiki2neo-0.0.3.tar.gz
  • Size: 2.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

File hashes

Hashes for wiki2neo-0.0.3.tar.gz
Algorithm Hash digest
SHA256 b5aa13d070b7bb184c663b873de121e240fb0e1db18ec55bba32915a2302733d
MD5 4d2f1514934c378c887e69613c0e43e9
BLAKE2b-256 42bf0aafbbffef69b36aacf69afc5177eceebd262f9122461224d35ab596e0ba

File details

Details for the file wiki2neo-0.0.3-py2.py3-none-any.whl.

File metadata

  • Download URL: wiki2neo-0.0.3-py2.py3-none-any.whl
  • Size: 2.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

File hashes

Hashes for wiki2neo-0.0.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 be81aa623ad52ead3415a26bb2fcaa3ad1e72173c516f9f2512485a163c01090
MD5 c56855c07b4a7202f87ae7ae39a1401c
BLAKE2b-256 a95a93dc8634b60808a00fa2b8eab2e54e48fa396817c27c1e653ed11fbf7885
