Using BioNetDB

Populating BioNetDB

Before running queries and analyses on BioNetDB, you must populate the database using the administration command-line tool, i.e., the bionetdb-admin.sh script.

$ ./build/bin/bionetdb-admin.sh 

Program:     BioNetDB (OpenCB)

Description: BioNetDB implements a storage engine to work with biological networks using a NoSQL Graph database

Usage:       bionetdb-admin.sh [-h|--help] [--version] <command> [options]

Commands:
            download  Download all different data sources provided in the configuration.yml file
               build  Build the data models in CSV format files
              import  Import the built data models in format CSV files into the BioNetDB database

BioNetDB is designed to ingest very large amounts of data. To make this process as efficient as possible, BioNetDB uses Neo4j's bulk import tool, neo4j-admin import, which loads large data sets from a collection of CSV files.
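To make the CSV-based import more concrete, the sketch below writes two tiny files in the header format that neo4j-admin import expects: an :ID column and a :LABEL column for nodes, and :START_ID, :END_ID, and :TYPE columns for relationships. The file names, labels, property names, and the relationship row are illustrative assumptions, not the files that the BioNetDB build step actually produces.

```shell
#!/bin/sh
# Hypothetical example files; BioNetDB's real build output may look different.
outdir="${TMPDIR:-/tmp}/bionetdb-csv-example"
mkdir -p "$outdir"

# Node file: ":ID" marks the unique identifier column, ":LABEL" the node label.
cat > "$outdir/genes.csv" <<'EOF'
geneId:ID,name,:LABEL
ENSG00000141510,TP53,GENE
ENSG00000012048,BRCA1,GENE
EOF

# Relationship file: start node ID, end node ID, and relationship type.
# The row is for illustration only.
cat > "$outdir/interactions.csv" <<'EOF'
:START_ID,:END_ID,:TYPE
ENSG00000141510,ENSG00000012048,INTERACTS_WITH
EOF

# A bulk load of such files would look roughly like this with Neo4j 4.x
# (newer Neo4j releases use a different subcommand layout):
#   neo4j-admin import --nodes="$outdir/genes.csv" \
#                      --relationships="$outdir/interactions.csv"
```

With bionetdb-admin.sh, the import command is expected to drive this bulk load for you, so the files above only illustrate the kind of input the Neo4j tool consumes.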

To populate BioNetDB, follow these steps:

  1. Download biological data, e.g., genes, proteins, disease panels, variants, and pathways.

  2. Create the CSV files from the biological data. This step is called build.

  3. Import the CSV files into the BioNetDB database.

Let's see how to perform those steps using the bionetdb-admin.sh command line.
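The three steps can be chained in a small wrapper that stops at the first failure. This is a minimal sketch: the subcommand names come from the usage output above, while the wrapper functions and the DOWNLOAD_OPTS, BUILD_OPTS, IMPORT_OPTS, and DRY_RUN variables are hypothetical names introduced here. The per-step options are deliberately left empty, because they depend on your installation; check the help output of bionetdb-admin.sh for the options each command accepts.

```shell
#!/bin/sh
# Path taken from the usage example; override it if the script lives elsewhere.
BIONETDB_ADMIN="${BIONETDB_ADMIN:-./build/bin/bionetdb-admin.sh}"

# Hypothetical variables holding the options for each step (empty by default).
DOWNLOAD_OPTS="${DOWNLOAD_OPTS:-}"
BUILD_OPTS="${BUILD_OPTS:-}"
IMPORT_OPTS="${IMPORT_OPTS:-}"

# Run one bionetdb-admin.sh subcommand, or only print it when DRY_RUN=1.
run_step() {
    step="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$BIONETDB_ADMIN" "$step" "$@"
    else
        "$BIONETDB_ADMIN" "$step" "$@"
    fi
}

# Download, build, and import in order; "&&" aborts the chain on the first error.
# The *_OPTS variables are left unquoted on purpose so they split into words.
populate() {
    run_step download $DOWNLOAD_OPTS &&
    run_step build $BUILD_OPTS &&
    run_step import $IMPORT_OPTS
}
```

Calling the function as DRY_RUN=1 populate prints the three command lines without executing them, which is a convenient way to check the options before starting a long download or import.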

Download

The BioNetDB configuration file contains a section called download where users specify the locations of the biological data to download.

Execute the following command line to download the biological data into the directory ~/data:

Build

Once the data is downloaded, it must be converted into CSV files before it can be imported. The CSV files are created using the build command:

Import

The CSV files created previously are loaded into the database by executing the import command:

In addition to populating the database, the import command creates indexes on the main nodes to speed up subsequent queries and analyses.
