Skip to main content

No project description provided

Project description

# NetworkDisk

Aim is to provide a generic class for manupulating graphs and networks on disk.
The final goal is to provide:
* compatibility with networkx;
* tools dedicated to perform searches into on graph stored on (local or remote) Disk;
* evaluating complex queries by interfacing with on disk databases.

In order to keep the class versatile and simple to use,
we define abstract class to overload with methods dedicated to handle
specific "ondisk" technologies.

Examples include:
* naive implementation on RAM;
* SQLite local implementation;
* PostgreSQL implementation;
* Cassandra distributed implementation (TODO);
* Any efficient database implementing abstract functions
will do but to get acceptable performance, some tuning with
a dedicated work on index and data representation have to be
done.

Efficiency of the package heavily relies on the implementation.
From PostgreSQL implementation, the package inherit ACID transactions
and concurrent access to the graph by several clients.

The package however provides some examples of generic implementation
in SQL as well as methods for improving searches and (TODO) support
of query languages.

## The Graphs abstraction methods

## Searches

Searches in graphs can be an intensive task when it come to very large
graphs. Few methods exists to index data in order to find path and
subgraphs between nodes.
Furthermore, those searches can be restricted by filter and automaton
aggregation (TODO) using datas that labels every node.

In this package, searches return an intermediate data-structure, called
a _multilayeredDAG_. From a multilayeredDAG, it is easy (linear) to
compute paths, to enumerate shortest paths and to count the number of
paths.

## Embedding
(TODO)

The NetworkDisk package offer some methods to compute some topology,
called embeddings, for a given network.
Some of the computations are intensive and require a lot of RAM and
are not suitable for dynamic graphs, others can be performed directly
on disk and may also be maintained under update.


When an embedding is available, it becomes possible to improve
searches within the graph by best-first search like algorithm.

# Embedding-guided Search

Efficiently answering regular path queries on very large graphs, by
guiding the search through distances in an embedding of the graph.
This embedding is precomputed in quasi-linear time and space,
independently of the regular query.

# TODO list

## TODO on Abstract class

3. TODO `remove_embedding_from_node` and `remove_embedding` to EmbeddedGraph
4. Add `type` to edges (that become multiegdes?). Maybe extend Graph for that.
5. Import `RDF`
7. Add for `landmarks` maintenance for new nodes.
9. TODO: Introduced other kind of embeddings.
10. TODO: Should we care about properties values type?
11. TODO: Bulk loading implementation

## TODO add EmbeddedDataGraph

From now, we don't need EmbeddedDataGraph (which is just a merge of Data
Graph and embeddedGraph). However some specific methods can be useful here:

1. Data biais random_walk
2. Data biais landmarks computations

## TODO on SQLish Implementations

6. Implement on disk embedding computation for landmarks embeddings
7. See with Magnet for ondisk implementation of word2vec? (not existing)
8. TODO: Implement functions in SQL local to database instead of python version when possible.
9. TODO/ Use the edges/nodes/properties pass through the `__init__` to perform a
bulkload (desactivate index, load datas, add index)

## TODO on sqlite Implementations

1. Look for SQLite Extensions allowing to deal with nicer data Type.
i.e. Maybe for indexation of json documents?
2. DONE: Management of default sqlite scripts: move it to an external module?
3. DONE: Add table for degrees.

## TODO on PostgreSQL Implementation

1. DONE Postgresql
2. TODO: distant embedding and search computation
3. Use of JSON_B indexation
4. Use of full_text_search indexes.

## TODO On multilayerDag

1. DONE: Exctract root-to-root betweeness centrality for nodes (#nb of shortest path
from root to root going through the node ). Should be linear in the
multilayerDag.

2. DONE: Multilateral Breath First Search

3. TODO: merge bilatéral BFS with multilateral BFS.

Project details


Release history Release notifications

This version
History Node

0.2

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
networkdisk-0.2.tar.gz (32.1 kB) Copy SHA256 hash SHA256 Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page