Database servers: Difference between revisions

From Alliance Doc
Jump to navigation Jump to search
No edit summary
No edit summary
Line 17: Line 17:
== Cedar MySQL server ==
== Cedar MySQL server ==


The Cedar MySQL server runs as a VM called "cedar-mysql-vm" (full name: cedar-mysql-vm.int.cedar.computecanada.ca) on a database machine.
The Cedar MySQL server offers MariaDB version 10.2 which is compatible with other flavours of MySQL. For information on compatibility, see https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/ . For information on the latest features in MariaDB version 10.2, see https://mariadb.com/resources/blog/whats-new-mariadb-server-102 .
 
The Cedar MySQL MariaDB server runs as a VM called "cedar-mysql-vm" (full name: cedar-mysql-vm.int.cedar.computecanada.ca) on a database machine.
Users who have accounts on the MySQL server are able to connect only through the Cedar headnode
Users who have accounts on the MySQL server are able to connect only through the Cedar headnode
(cedar.computecanada.ca), Cedar compute nodes and cedar portal (https://gateway.cedar.computecanada.ca).
(cedar.computecanada.ca), Cedar compute nodes and cedar portal (https://gateway.cedar.computecanada.ca).
Line 36: Line 38:
It is acceptable to submit a long-running SQL command from the Cedar head node, as most of the CPU usage is taken from the database server side. However, if you are running a script which is issuing SQL commands and uses lots of CPU, then it needs to be submitted as a job to the scheduler. See [[Running jobs]] for details.
It is acceptable to submit a long-running SQL command from the Cedar head node, as most of the CPU usage is taken from the database server side. However, if you are running a script which is issuing SQL commands and uses lots of CPU, then it needs to be submitted as a job to the scheduler. See [[Running jobs]] for details.


=== Rules to create a MySQL database ===
=== How to set up your MySQL database ===


In order to be able to set up MySQL tables and query them, you need to create your own database. To create a database, the name of the database is arbitrary but it must start with "<username>_"
In order to be able to set up MySQL tables and query them, you need to create your own database. To create a database, the name of the database is arbitrary but it must start with "<username>_"


where <username> is your MySQL username. For example, if your username were "david" the name of the database '''must''' start with "david_" and the commands to create a database called "david_db1" would be:
where <username> is your MySQL username. For example, if your username were "david" the name of the database '''must''' start with "david_" and the commands to create a database called "david_db1", for example, would be:


{{Command|mysql
{{Command|mysql
Line 48: Line 50:


Users can create multiple MySQL databases as long as they all start with "<username>_".  
Users can create multiple MySQL databases as long as they all start with "<username>_".  
The created databases will automatically be accessible only to you from the cedar head node, cedar compute nodes, and cedar portal (gateway.cedar.computecanada.ca).
The created databases will automatically be accessible only to you from the cedar head node, cedar compute nodes, and cedar portal (gateway.cedar.computecanada.ca) and you will have full privileges to create SQL objects such as tables, views, etc..
 
=== Examples of working with a MYSQL database ===


The following are examples of working with a sample database called david_db1.
=== How to work with your MYSQL database ===


==== Example: Create a table, populate it, and query it: ====
Suppose you have account "david" and have created a database called "david_db1" and want to use it. Here's how:


{{Command|mysql
{{Command|mysql
  mysql> -- List available databases to you
  mysql> -- List available databases. Confirm david_db1 is in the list
  mysql> SHOW DATABASES;
  mysql> SHOW DATABASES;
  mysql> -- Suppose we see david_db1 in the list and want to work with it
  mysql> -- Get into the database
mysql> USE david_db1;
mysql> -- Create a new table with 4 fields in the database
mysql> CREATE TABLE fubar (age integer, lastname varchar(50), firstname varchar(50), middlename varchar(50));
mysql> -- Insert records into the table
mysql> INSERT INTO fubar VALUES (34, 'Jones','Mary','Sue');
mysql> INSERT INTO fubar VALUES (22, 'Smith','Robert','A.');
mysql> INSERT INTO fubar VALUES (55, 'Lee','Charles','E.');
mysql> -- Execute some queries against the table
mysql> SELECT * FROM fubar WHERE age > 30;
mysql> SELECT age,lastname FROM fubar WHERE age {{=}} 34;
mysql> quit
}}
 
==== Example: Create an index ====
 
If the table gets large, to optimize queries against the age field, look at creating an index for it like this:
 
{{Command|mysql
  mysql> USE david_db1;
  mysql> USE david_db1;
  mysql> -- List the tables you own
  mysql> ... Issue SQL commands. See below for information.
mysql> SHOW TABLES;
mysql> -- Create a new index called fubar_idx1 on the fubar table
mysql> CREATE INDEX fubar_idx1 ON fubar(age);
mysql> -- Issue queries again
mysql> SELECT * FROM fubar WHERE age {{=}} 34;
  mysql> quit
  mysql> quit
}}
}}


==== Example: Create a View ====
Useful websites for information on using MariaDB:
 
* https://mariadb.com/kb/en/ - MariaDB Knowledgebase
For a more complex query that you plan to issue frequently, consider creating a view for it.
* https://mariadb.com/kb/en/library/training-tutorials/ - MariaDB Training and Tutorials
Example:
* https://mariadb.com/kb/en/library/sql-statements-structure/ - MariaDB SQL Statement Structure
 
* https://mariadb.com/kb/en/library/optimization-and-indexes/ - MariaDB Optimization and Indexes
{{Command|mysql
mysql> USE david_db1;
mysql> CREATE VIEW fubar_sortover30 AS SELECT lastname, firstname, middlename, age
    -> FROM fubar WHERE age > 30 ORDER BY lastname, firstname, middlename;
mysql> SELECT * FROM fubar_sortover30;
mysql> quit
}}


=== MySQL connectivity for BioPerl ===
=== MySQL connectivity for BioPerl ===
Line 124: Line 94:


== Cedar PostgreSQL server ==
== Cedar PostgreSQL server ==
The Cedar Postgres server offers Postgres version 10.1 along with the PostGIS extension version 2.4 . Postgres 10.1 mainly is an upgrade to correct bugs in version 10. For information on the latest features in Postgres version 10, see https://wiki.postgresql.org/wiki/New_in_postgres_10 .


The Cedar PostgreSQL server runs as a VM called "cedar-pgsql-vm" (full name: cedar-pgsql-vm.int.cedar.computecanada.ca) on a database machine.
The Cedar PostgreSQL server runs as a VM called "cedar-pgsql-vm" (full name: cedar-pgsql-vm.int.cedar.computecanada.ca) on a database machine.
Line 134: Line 106:
* Your Compute Canada username
* Your Compute Canada username
* Amount of database space needed for your project
* Amount of database space needed for your project
* A name for the database. If you supply no name, we will construct a name for the database from your username, such as "db_david". The name must not already be in use by anyone else.
* Do you need the PostGIS 2.4 extension for the database?
* Do you need the PostGIS extension for the database?


=== PostgreSQL account and connection ===
=== PostgreSQL account and connection ===


Your created PostgreSQL account will have the same username as your computecanada. You will be given a database. The name of the database is arbitrary but it must start with "<username>_" where <username> is your computecanada username. Users are not able to create database. If you need more database please send us a request. PostgreSQL uses IDENT authentication for connection from cedar compute nodes which means no  password is needed to be
Your created PostgreSQL account will have the same username as your Compute Canada username. You will be given a database. The name of the database will typically be "db_<username>" where <username> is your Compute Canada username. Users are not able to create databases. If you need more databases please send us a request at support@computecanada.ca . PostgreSQL uses IDENT authentication for connection from cedar compute nodes which means no  password is needed to be supplied to your PostgreSQL account. For security reasons your Compute Canada password is to NEVER be required or used in a script. This also means that user A from a cedar compute node cannot make a PostgreSQL connection to user B.
supplied to your PostgreSQL account. This also means that user A from cedar computenodes cannot make PostgreSQL connection to user B. Using passwordless connection from computenodes is adventurous because if you have a job script in which you need a PostgreSQL connection you do not need to hard code your computecanada password into your job script which would be potentially a big security concern.  
   
   
When access is from the head node, however, PostgreSQL uses PAM authentication which means that you will be prompted
When access is from the head node, however, PostgreSQL uses PAM authentication which means that you will be prompted
Line 154: Line 124:
psql (PostgreSQL) 9.6.2}}
psql (PostgreSQL) 9.6.2}}


You can load the most recent version of the psql Postgres client which will stay loaded until you log off from your session. Example:
You should load the most recent version of the psql Postgres client which will stay loaded until you log off from your session. Example:


{{Command|module load postgresql}}
{{Command|module load postgresql}}

Revision as of 19:35, 28 March 2018

Database servers available for researchers

Compute Canada offers access to MySQL and Postgres database servers for researchers:

  • Cedar MySQL server
    • Description: General purpose server for the researcher wanting to set up SQL tables in MySQL and issue SQL commands against them.
    • Server name: cedar-mysql-vm.int.cedar.computecanada.ca
    • Short server name: cedar-mysql-vm (can be used instead of long name on most compute nodes)
    • Version: MariaDB version 10.2 Community Edition
    • Documentation: http://www.mariadb.com
  • Cedar Postgres server
    • Description: General purpose server for the researcher wanting to set up SQL tables in MySQL and issue SQL commands against them. Includes a PostGIS extension available for those needing to do geocoding.
    • Server name: cedar-pgsql-vm.int.cedar.computecanada.ca
    • Short server name: cedar-pgsql-vm (can be used instead of long name on most compute nodes)
    • Version: PostgreSQL version 10.1, PostGIS version 2.4 extension available
    • Documentation: https://www.postgresql.org and https://postgis.net/documentation


Cedar MySQL server

The Cedar MySQL server offers MariaDB version 10.2 which is compatible with other flavours of MySQL. For information on compatibility, see https://mariadb.com/kb/en/library/mariadb-vs-mysql-compatibility/ . For information on the latest features in MariaDB version 10.2, see https://mariadb.com/resources/blog/whats-new-mariadb-server-102 .

The Cedar MySQL MariaDB server runs as a VM called "cedar-mysql-vm" (full name: cedar-mysql-vm.int.cedar.computecanada.ca) on a database machine. Users who have accounts on the MySQL server are able to connect only through the Cedar headnode (cedar.computecanada.ca), Cedar compute nodes and cedar portal (https://gateway.cedar.computecanada.ca).

For security, users cannot make an SSH connection to the database server directly.

MySQL account and connection

If you need the privileges to create your own database, you will need a MySQL account. To get a MySQL account on the Cedar MySQL server, please send a request to support@computecanada.ca with the following information:

  • Your Compute Canada username
  • Amount of database space needed for your project

We will then create an MySQL account with the same username as your computecanada username and a 16 digit random string password. The username, password, and database server name and other required information to make a MySQL connection will be stored in a file called .my.cnf located in your home directory on Cedar. The file is confidential and owned by root and only user can read that, e.g., user cannot modify it, however user can delete that. So, please do not delete this file.

Command

Question.png
[name@server ~]$ mysql

from cedar head node can be used to connect to the MySQL server. Please do not use the -p or -h option as an argument in running mysql. The required password and server name is this case and any other information will be automatically taken from your .my.cnf file if you do not use such options.

It is acceptable to submit a long-running SQL command from the Cedar head node, as most of the CPU usage is taken from the database server side. However, if you are running a script which is issuing SQL commands and uses lots of CPU, then it needs to be submitted as a job to the scheduler. See Running jobs for details.

How to set up your MySQL database

In order to be able to set up MySQL tables and query them, you need to create your own database. To create a database, the name of the database is arbitrary but it must start with "<username>_"

where <username> is your MySQL username. For example, if your username were "david" the name of the database must start with "david_" and the commands to create a database called "david_db1", for example, would be:

Question.png
[name@server ~]$ mysql
 mysql> CREATE DATABASE david_db1;
 mysql> quit

Users can create multiple MySQL databases as long as they all start with "<username>_". The created databases will automatically be accessible only to you from the cedar head node, cedar compute nodes, and cedar portal (gateway.cedar.computecanada.ca) and you will have full privileges to create SQL objects such as tables, views, etc..

How to work with your MYSQL database

Suppose you have account "david" and have created a database called "david_db1" and want to use it. Here's how:

Question.png
[name@server ~]$ mysql
 mysql> -- List available databases. Confirm david_db1 is in the list
 mysql> SHOW DATABASES;
 mysql> -- Get into the database
 mysql> USE david_db1;
 mysql> ... Issue SQL commands. See below for information.
 mysql> quit

Useful websites for information on using MariaDB:

MySQL connectivity for BioPerl

BioPerl is a collection of open source Perl tools for bioinformatics, genomics and life science. Documentation can be found at bioperl.org.

There are several BioPerl modules which can be used to upload data to a database. To connect to a MySQL server from one of these, the BioPerl command line should contain the -d option as follows:

-d [database name]:mysql_read_default_file=.my.cnf ....

MySQL connectivity for GBrowse

GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes.

Documentation: http://gmod.org/wiki/GBrowse

In order to connect to MySQL from gbrowse, the corresponding line to connect to MySQL should contain:

db_args       =     -adaptor DBI::mysql
                    -dsn [database name];mysql_read_default_file=/home/[username]/.my.cnf
                    -user [username]

where [username] is the corresponding user name and [database name] is the name of the database.

Cedar PostgreSQL server

The Cedar Postgres server offers Postgres version 10.1 along with the PostGIS extension version 2.4 . Postgres 10.1 mainly is an upgrade to correct bugs in version 10. For information on the latest features in Postgres version 10, see https://wiki.postgresql.org/wiki/New_in_postgres_10 .

The Cedar PostgreSQL server runs as a VM called "cedar-pgsql-vm" (full name: cedar-pgsql-vm.int.cedar.computecanada.ca) on a database machine. Users who have accounts on the PostgreSQL server are able to connect only through the Cedar headnode (cedar.computecanada.ca), Cedar compute nodes and Cedar portal machine.

For security, users cannot make an SSH connection to the database server directly.

To get an account and database on the Cedar PostgreSQL server, send a request to support@computecanada.ca with the following information:

  • Your Compute Canada username
  • Amount of database space needed for your project
  • Do you need the PostGIS 2.4 extension for the database?

PostgreSQL account and connection

Your created PostgreSQL account will have the same username as your Compute Canada username. You will be given a database. The name of the database will typically be "db_<username>" where <username> is your Compute Canada username. Users are not able to create databases. If you need more databases please send us a request at support@computecanada.ca . PostgreSQL uses IDENT authentication for connection from cedar compute nodes which means no password is needed to be supplied to your PostgreSQL account. For security reasons your Compute Canada password is to NEVER be required or used in a script. This also means that user A from a cedar compute node cannot make a PostgreSQL connection to user B.

When access is from the head node, however, PostgreSQL uses PAM authentication which means that you will be prompted for a password. Your password for PostgreSQL connection is the same as your Compute Canada password for this case. Example:

Question.png
[name@server ~]$ psql -h cedar-pgsql-vm -d db_[username]

where db_[username] is the name of the database that was set up for you with your PostgreSQL account. (If you require more databases to be set up for your PostgreSQL account to use, please send a request to support@computecanada.ca).

The example above runs an older version of the psql Postgres interactive client that by default is installed on all the nodes:

Question.png
[name@server ~]$ psql --version
psql (PostgreSQL) 9.6.2

You should load the most recent version of the psql Postgres client which will stay loaded until you log off from your session. Example:

Question.png
[name@server ~]$ module load postgresql
Question.png
[name@server ~]$ psql --version
psql (PostgreSQL) 10.2

Examples of working with a PostgreSQL database

The following are examples of working with a sample database called db_david.

Example: Create a table, populate it, and query it:

Question.png
[name@server ~]$ psql -h cedar-pgsql-vm -d db_david
 Password:<-- Enter Compute Canada password
 db_david=> -- List relations (i.e. tables) command
 db_david=> \l
 db_david=> -- Create a new table with 4 fields in the database
 db_david=> CREATE TABLE fubar (age integer, lastname varchar(50), firstname varchar(50), middlename varchar(50));
 db_david=> -- Insert records into the table
 db_david=> INSERT INTO fubar VALUES (34, 'Jones','Mary','Sue');
 db_david=> INSERT INTO fubar VALUES (22, 'Smith','Robert','A.');
 db_david=> INSERT INTO fubar VALUES (55, 'Lee','Charles','E.');
 db_david=> -- Execute some queries against the table
 db_david=> SELECT * FROM fubar WHERE age > 30;
 db_david=> SELECT age,lastname FROM fubar WHERE age = 34;
 db_david=> -- Quit command
 db_david=> \q

PostgreSQL connectivity for BioPerl

To connect to Postgres from a Perl module, for example from one of BioPerl modules, the command line should contain "-a" and "-d" options as follows:

-a DBI::Pg  -d dbi:Pg:dbname=[database name] ....

where [database name] is the name of your database.

PostgreSQL connectivity for gbrowse

GBrowse is a combination of database and interactive web pages for manipulating and displaying annotations on genomes.

Documentation: http://gmod.org/wiki/GBrowse

In order to connect to postres from gbrowse, the corresponding line in the configuration file should contain:

db_args       =    -dsn dbi:Pg:dbname=[database name ]
                   -user [username]