Database servers

From Alliance Doc
Revision as of 17:49, 22 March 2018

Database servers available for researchers

Compute Canada offers access to MySQL and Postgres database servers for researchers:

  • Cedar MySQL server
    • Description: General purpose server for the researcher wanting to set up SQL tables in MySQL and issue SQL commands against them.
    • Server name: cedar-mysql-vm.int.cedar.computecanada.ca
    • Short server name: cedar-mysql-vm (can be used instead of the long name on most compute nodes)
    • Version: MariaDB version 10.2 Community Edition
    • Documentation: http://www.mariadb.com
  • Cedar Postgres server
    • Description: General purpose server for the researcher wanting to set up SQL tables in PostgreSQL and issue SQL commands against them. The PostGIS extension is available for those who need to do geocoding.
    • Server name: cedar-pgsql-vm.int.cedar.computecanada.ca
    • Short server name: cedar-pgsql-vm (can be used instead of the long name on most compute nodes)
    • Version: PostgreSQL version 10.1, PostGIS version 2.4 extension available
    • Documentation: https://www.postgresql.org and https://postgis.net/documentation


Cedar MySQL server

The Cedar MySQL server runs as a VM called "cedar-mysql-vm" (full name: cedar-mysql-vm.int.cedar.computecanada.ca) on a database machine. Users who have accounts on the MySQL server can connect to it only from the Cedar head node (cedar.computecanada.ca), the Cedar compute nodes and the joffre machine.

For security, users cannot make an SSH connection to the database server directly.

MySQL account and connection

If you need the privileges to create your own database, you will need a MySQL account. To get a MySQL account on the Cedar MySQL server, please send a request to support@computecanada.ca with the following information:

  • Name
  • Compute Canada account
  • Amount of database space needed for your project

Once the account is created, all the information required to connect to the MySQL server will be stored in a file called .my.cnf in your home directory on Cedar. The file contains:

  • the MySQL username, which is the same as your Compute Canada username,
  • the password, which is a random string of characters, not the same as your Compute Canada password,
  • the name of the machine which runs the MySQL server.

The file is confidential and readable only by the user. Please do not change the permissions for this file, nor delete it.
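If you want to confirm that your .my.cnf is still readable only by you, without touching it, a check along these lines may help. This is a minimal sketch, assuming GNU coreutils stat (as on Linux systems such as Cedar); the function name mycnf_is_private is hypothetical.

```bash
# Hypothetical helper: succeed only if the option file (default ~/.my.cnf)
# exists and has no group or "other" permission bits set.
# It only reads the file's mode; it never changes permissions.
mycnf_is_private() {
    local file="${1:-$HOME/.my.cnf}"
    local mode
    [ -f "$file" ] || return 2          # 2 = file not found
    mode=$(stat -c '%a' "$file")        # GNU stat: octal mode, e.g. 600
    # Prefix a 0 so the shell reads the mode as octal, then test the
    # group/other bits with the mask 077.
    [ $(( 0$mode & 077 )) -eq 0 ]
}

# Example:
#   mycnf_is_private && echo "~/.my.cnf is readable only by you"
```

The function deliberately reports rather than repairs, since the file's permissions should not be changed; if the check fails, contact support@computecanada.ca.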

To open an interactive connection to the MySQL server, you should load the latest version of the MySQL client into your environment: the client available by default is older and does not support all the features offered by the server. Here are the steps to load the latest mysql client tool:

  • step 1
  • step 2

Here is how to run the client:

[name@server ~]$ mysql

Please do not pass the -p option when running mysql. If you omit -p, the required password is taken automatically from your .my.cnf file.

It is acceptable to submit a long-running SQL command from the Cedar head node, since the work is done on the database server. However, a script that issues SQL commands must be submitted as a job to the scheduler. See Running jobs for details.
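A batch script along the following lines runs a file of SQL statements as a scheduler job. This is a sketch assuming the Slurm scheduler used on Cedar: the time, memory and account values, the database name david_db1, the file queries.sql and the helper run_sql_file are all placeholders or hypothetical names, and Running jobs remains the authoritative reference for job options.

```bash
#!/bin/bash
# Placeholder resource requests: adjust them for your own work.
#SBATCH --time=01:00:00
#SBATCH --mem=1G
# Hypothetical allocation name; replace it with your own account.
#SBATCH --account=def-someuser

# Placeholder names; override them by exporting DB_NAME / SQL_FILE before sbatch.
DB_NAME="${DB_NAME:-david_db1}"
SQL_FILE="${SQL_FILE:-queries.sql}"

# Feed every statement in the SQL file to the mysql client. The server,
# user and password come from ~/.my.cnf, so no -p option is needed.
run_sql_file() {
    local db="$1" file="$2"
    if [ ! -r "$file" ]; then
        echo "cannot read SQL file: $file" >&2
        return 1
    fi
    mysql "$db" < "$file"
}

# SLURM_JOB_ID is set only inside a Slurm job, so this guard keeps the
# queries from being run by accident on the head node.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_sql_file "$DB_NAME" "$SQL_FILE"
else
    echo "submit this script with sbatch instead of running it directly" >&2
fi
```

You would then submit it with sbatch followed by the script's file name.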

Rules to create a MySQL database

To set up MySQL tables and query them, you need to create your own database; you may create several. You can choose any name for a database, but it must start with

[username]_

For example, if your username were "david", the name of the database would have to start with "david_", and the commands to create a database called "david_db1" would be:

[name@server ~]$ mysql
 mysql> CREATE DATABASE david_db1;
 mysql> quit
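Because every database name must begin with [username]_, checking a name before creating the database can catch mistakes early. A minimal sketch, assuming your MySQL username equals $USER (the page states it matches your Compute Canada username); the helper names are hypothetical, and restricting names to letters, digits and underscores is a conservative choice of this sketch, not a server rule.

```bash
# Hypothetical helper: succeed if NAME starts with "USER_" and contains only
# letters, digits and underscores (a conservative restriction of this sketch).
valid_db_name() {
    local user="$1" name="$2"
    case "$name" in
        "${user}"_*) ;;               # required "[username]_" prefix
        *) return 1 ;;
    esac
    case "$name" in
        *[![:alnum:]_]*) return 1 ;;  # reject any other character
    esac
    return 0
}

# Hypothetical helper: print a CREATE DATABASE statement, or fail with a
# message if the name breaks the naming rule.
create_db_sql() {
    if ! valid_db_name "$1" "$2"; then
        echo "database name must start with '${1}_': $2" >&2
        return 1
    fi
    printf 'CREATE DATABASE %s;\n' "$2"
}

# Example (the mysql client reads your credentials from ~/.my.cnf):
#   create_db_sql "$USER" "${USER}_db1" | mysql
```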

Here is an example of how to work with your new database and create a table in it, populate it, and query it:

[name@server ~]$ mysql
 mysql> USE david_db1;
 mysql> CREATE TABLE fubar (age integer, id varchar(10));
 mysql> INSERT INTO fubar VALUES (34, '1122');
 mysql> INSERT INTO fubar VALUES (22, '2233');
 mysql> SELECT * FROM fubar WHERE age > 30;
 mysql> SELECT age FROM fubar WHERE id = '1122';
 mysql> quit

The new database is automatically accessible from the Cedar head node, the compute nodes and joffre, so you should not need to issue any additional grant. However, if you want another user who has a MySQL account on the server to be able to read the tables in your database, you can issue this MySQL command:

[name@server ~]$ mysql
 mysql> GRANT SELECT ON [database name].* TO '[username 2]'@'172.%';
 mysql> quit

where [database name] is the name of your database and [username 2] is the MySQL user who receives the grant.
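If you grant read access often, the statement can be generated with the placeholders filled in. A minimal sketch: grant_select_sql and the user name "jane" are hypothetical, and rejecting names with characters other than letters, digits and underscores is an assumption made so the values can go into SQL unquoted.

```bash
# Hypothetical helper: print the GRANT statement shown above for database DB
# and MySQL user OTHER_USER, refusing empty or unusual names.
grant_select_sql() {
    local db="$1" other="$2" n
    for n in "$db" "$other"; do
        case "$n" in
            ''|*[![:alnum:]_]*)
                echo "invalid database or user name: '$n'" >&2
                return 1 ;;
        esac
    done
    # %% prints a literal % for the '172.%' host pattern.
    printf "GRANT SELECT ON %s.* TO '%s'@'172.%%';\n" "$db" "$other"
}

# Example: give the hypothetical user "jane" read access to david_db1.
#   grant_select_sql david_db1 jane | mysql
```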

MySQL connectivity for BioPerl

BioPerl is a collection of open source Perl tools for bioinformatics, genomics and life science. Documentation can be found at bioperl.org.

There are several BioPerl modules which can be used to upload data to a database. To connect to a MySQL server from one of these, the BioPerl command line should contain the -d option as follows:

-d [database name]:mysql_read_default_file=.my.cnf ....
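The value passed to -d can also be built in a script. A minimal sketch: the helper name is hypothetical, and it defaults to the absolute path $HOME/.my.cnf (as the GBrowse example below does) on the assumption that a relative .my.cnf is looked up in the current working directory rather than in your home directory. The loader name bp_seqfeature_load.pl in the usage comment is only an assumed example.

```bash
# Hypothetical helper: build the "-d" value used by BioPerl loaders,
# i.e. "[database name]:mysql_read_default_file=<option file>".
bioperl_mysql_dsn() {
    local db="$1" cnf="${2:-$HOME/.my.cnf}"
    printf '%s:mysql_read_default_file=%s\n' "$db" "$cnf"
}

# Example with a BioPerl loader (assumed script name; add its other options):
#   bp_seqfeature_load.pl -d "$(bioperl_mysql_dsn "${USER}_db1")" features.gff3
```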

MySQL connectivity for GBrowse

GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes.

Documentation: http://gmod.org/wiki/GBrowse

To connect to MySQL from GBrowse, the database connection line in the GBrowse configuration file should contain:

db_args       =     -adaptor DBI::mysql
                    -dsn [database name];mysql_read_default_file=/home/[username]/.my.cnf
                    -user [username]

where [username] is the corresponding user name and [database name] is the name of the database.
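To avoid typing errors when filling in the placeholders, the db_args lines can be printed with your own values and pasted into your GBrowse configuration file. A sketch under stated assumptions: the helper name is hypothetical, the username defaults to $USER, and $HOME is assumed to be your /home/[username] directory.

```bash
# Hypothetical helper: print the GBrowse db_args lines for MySQL with the
# placeholders filled in. Continuation lines keep their leading spaces.
gbrowse_mysql_db_args() {
    local db="$1" user="${2:-$USER}"
    local pad='                    '
    printf 'db_args       =     -adaptor DBI::mysql\n'
    printf '%s-dsn %s;mysql_read_default_file=%s/.my.cnf\n' "$pad" "$db" "$HOME"
    printf '%s-user %s\n' "$pad" "$user"
}

# Example:
#   gbrowse_mysql_db_args "${USER}_db1"
```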

Cedar PostgreSQL server

The Cedar PostgreSQL server runs as a VM called "cedar-pgsql-vm" (full name: cedar-pgsql-vm.int.cedar.computecanada.ca) on a database machine. Users who have accounts on the PostgreSQL server can connect to it only from the Cedar head node (cedar.computecanada.ca), the Cedar compute nodes and the joffre machine.

For security, users cannot make an SSH connection to the database server directly.

To get an account and database on the Cedar PostgreSQL server, send a request to support@computecanada.ca with the following information:

  • Name
  • Compute Canada account
  • Amount of database space needed for your project
  • Whether the PostGIS extension is required for the database

PostgreSQL account and connection

PostgreSQL uses IDENT authentication for connections from the compute nodes, so you do not need to supply a password for your PostgreSQL account. When you connect from the head node, however, PostgreSQL uses PAM authentication, so you will be prompted for your Compute Canada password during an interactive session. Example:

[name@server ~]$ psql -h cedar-pgsql-vm -d db_[username]

where db_[username] is the name of the database that was set up with your PostgreSQL account. (If you need more databases for your PostgreSQL account, please send a request to support@computecanada.ca.)

The example above runs the older version of the psql interactive client that is installed by default on all the nodes:

[name@server ~]$ psql --version
psql (PostgreSQL) 9.6.2

You can load the most recent version of the psql client with the module command; it stays loaded until you log out of your session. Example:

[name@server ~]$ module load postgresql
[name@server ~]$ psql --version
psql (PostgreSQL) 10.2
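In a script, you may want to load the newer client only when the default one is older than version 10. A minimal sketch: the helper name is hypothetical, and it assumes the psql --version output format shown above.

```bash
# Hypothetical helper: print the major version from `psql --version` output,
# e.g. "psql (PostgreSQL) 9.6.2" gives 9 and "psql (PostgreSQL) 10.2" gives 10.
# With no argument it runs psql itself; it prints nothing if no version is found.
psql_major_version() {
    local out="${1:-$(psql --version)}"
    printf '%s\n' "$out" | sed -n 's/.*(PostgreSQL) \([0-9][0-9]*\).*/\1/p'
}

# Example: load the newer client module only if the default psql is too old.
#   if [ "$(psql_major_version)" -lt 10 ]; then
#       module load postgresql
#   fi
```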

PostgreSQL connectivity for BioPerl

To connect to PostgreSQL from a Perl module, for example from one of the BioPerl modules, the command line should contain the "-a" and "-d" options as follows:

-a DBI::Pg  -d dbi:Pg:dbname=[database name] ....

where [database name] is the name of your database.
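As with MySQL, the -d value can be assembled in a script. A minimal sketch: the helper name is hypothetical, the default database name db_$USER follows the db_[username] naming described above, and the optional host argument (appended as ;host=..., which DBD::Pg accepts) is an assumption for the case where your environment does not already point to cedar-pgsql-vm. The loader name in the usage comment is only an assumed example.

```bash
# Hypothetical helper: build the "-d" value for a BioPerl loader using the
# DBI::Pg adaptor. The database name defaults to db_$USER; a host is added
# only if you pass one.
bioperl_pg_dsn() {
    local db="${1:-db_$USER}" host="${2:-}"
    if [ -n "$host" ]; then
        printf 'dbi:Pg:dbname=%s;host=%s\n' "$db" "$host"
    else
        printf 'dbi:Pg:dbname=%s\n' "$db"
    fi
}

# Example with a BioPerl loader (assumed script name; add its other options):
#   bp_seqfeature_load.pl -a DBI::Pg -d "$(bioperl_pg_dsn)" features.gff3
```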

PostgreSQL connectivity for GBrowse

GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes.

Documentation: http://gmod.org/wiki/GBrowse

To connect to PostgreSQL from GBrowse, the corresponding line in the configuration file should contain:

db_args       =    -dsn dbi:Pg:dbname=[database name]
                   -user [username]