Galaxy: Difference between revisions

From Alliance Doc

Revision as of 08:40, 14 January 2021

Introduction

Galaxy is an open-source, web-based platform for data-intensive biomedical research. It aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain-agnostic and is now used as a general workflow management system in bioinformatics.

The list of tutorials here suggests the range of applications of Galaxy.

Galaxy on Cedar

On Cedar, we provide one Galaxy instance per research group. Installing Galaxy requires a special setup that must be done by Compute Canada (CC) staff. If you need Galaxy for your group, please email the support team.

Galaxy directory structure

Galaxy is usually installed in the group's project directory and contains several sub-directories. The name of the top-level Galaxy directory is formed from the first two characters of the PI's username followed by "glxy". For example, if the PI's username is "davidc", the top-level Galaxy directory will be "daglxy", located in /project/group name/, where group name is the PI's default group name, e.g., def-davidc. The main Galaxy directory contains the following sub-directories, a layout that differs slightly from the original Galaxy package.

  • config: Contains all the configuration files required to set up and optimize the Galaxy server. Below we explain the basics of some of the configuration files that must be set up to be compatible with our HPC environment; we do not cover every option.
  • galaxy: Contains the core Galaxy package, which is written mostly in Python.
  • logs: Contains two files, galaxy.log and server.log. Messages produced during server startup or shutdown are written to server.log, while messages produced while the server is running are written to galaxy.log.
  • plugins: Contains all plugins. In the original Galaxy package this directory is located inside the galaxy directory.
  • tmp: Contains all temporary files that Galaxy needs for compiling and installing tool sheds.
  • venv: A Python virtual environment directory that contains all Python package dependencies.
  • tool-data: Contains data used by tools. See the samples in data-integration.
  • tool-dependencies: Contains all packages needed for tool sheds. By default, packages in this directory are installed using Anaconda.
  • database: Contains the input, output, and error files of all jobs that run on cluster nodes.
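As a minimal sketch of the naming rule and layout described above, the following Python snippet derives the top-level directory name from a PI username and checks a directory listing against the expected sub-directories. The function names are hypothetical, and the assumption that the default group is always "def-" followed by the PI username is taken only from the def-davidc example; your group name may differ.

```python
from pathlib import PurePosixPath

# Sub-directories listed in this section.
EXPECTED_SUBDIRS = [
    "config", "galaxy", "logs", "plugins", "tmp",
    "venv", "tool-data", "tool-dependencies", "database",
]


def galaxy_top_dir(pi_username: str) -> str:
    """First two characters of the PI username followed by 'glxy'."""
    return pi_username[:2] + "glxy"


def galaxy_root(pi_username: str) -> PurePosixPath:
    """Build the expected location of the Galaxy instance.

    Assumption: the PI's default group is 'def-' + username,
    as in the def-davidc example.
    """
    group = f"def-{pi_username}"
    return PurePosixPath("/project") / group / galaxy_top_dir(pi_username)


def missing_subdirs(present):
    """Return the expected sub-directories that are not in 'present'."""
    found = set(present)
    return [name for name in EXPECTED_SUBDIRS if name not in found]


print(galaxy_root("davidc"))  # /project/def-davidc/daglxy
```

On a real installation, `missing_subdirs(os.listdir(path))` could be used to confirm that the layout matches the list above.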

Galaxy files ownership and modification

All files of your Galaxy instance belong to a "pseudo-account", a shared account that is created by an administrator at installation time. A pseudo-account does not belong to an individual person but to a specific group. Everyone in the group can log in to the pseudo-account using SSH keys. The name of the pseudo-account is the same as the name of the top-level Galaxy directory explained above, e.g., daglxy. To modify any file of your Galaxy instance, such as a configuration file, you first need to log in to the pseudo-account. Before you can log in, you must generate an SSH key pair, store your public key somewhere in your home directory, and tell the administrator where it is. The administrator will install your public key in the appropriate place, after which you can log in to your pseudo-account.

Galaxy server management

Starting the Galaxy server is the first thing users need to do. The Galaxy server should NOT be run on a Cedar login node or on any compute node. We have a dedicated server called "gateway" for this purpose. It hosts a web server and has the relevant Cedar filesystems, /project and /home, mounted on it. For security reasons, users cannot make an SSH connection to this machine; instead, we have set up a website on it that allows users to start and stop their own Galaxy server. The website also lets users use the Galaxy web interface to communicate with the server. To do so, go to https://gateway.cedar.computecanada.ca/ and click on the Galaxy link. You will be asked to enter your username and password, which are the same as your Compute Canada credentials. Once you authenticate, you will be automatically redirected to your Galaxy server manager page, where you can manage your server or use the Galaxy web interface.

Galaxy configuration

Files in the config directory are used to configure your Galaxy server. Configuring and optimizing Galaxy is tricky, and explaining all the configuration files is beyond the scope of this article. Here we explain some configuration details you should know about. We recommend that you go through these configuration files and set them up appropriately. However, we strongly recommend that you do not overwrite the variables set by the administrator. Here are some of those variables:

  • File galaxy.yml: This is the main and most important configuration file. The following variables are set in this file:
    • http: contains your unique port number.
    • database_connection: specifies your Galaxy database and your database server.
    • virtualenv: the path to a Python virtual environment on the gateway machine.
    • file_path, new_file_path, tool_config_file, shed_tool_config_file, tool_dependency_dir, tool_data_path, visualization_plugins_directory, job_working_directory, cluster_files_directory, template_cache_path, citation_cache_data_dir, citation_cache_lock_dir: the appropriate paths for tools, tool sheds and dependencies.
  • File job_conf.xml: Variables in this file are used for submitting jobs to Slurm. Different packages can have different job specifications. For example, the package "spades" uses 8 cores with a wall-time of 3 hours, and its jobs are submitted under your default group name, def-xxxxx. Please take a look at this file and set up your desired job specifications. Note that any change to these configuration files requires the server to be restarted.
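To make the job_conf.xml description more concrete, here is a minimal sketch in Python that parses an illustrative fragment and reports which Slurm options each tool would be submitted with. The fragment is an assumption based on Galaxy's XML job configuration format (destinations carrying a nativeSpecification parameter, and tools mapped to destinations); the destination id "spades_8core" is hypothetical, and the file installed for your instance may be organized differently.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; not a copy of the file installed on Cedar.
# 8 cores, 3 hours wall-time, default group def-xxxxx, as in the spades example.
JOB_CONF = """
<job_conf>
    <destinations>
        <destination id="spades_8core" runner="slurm">
            <param id="nativeSpecification">--cpus-per-task=8 --time=03:00:00 --account=def-xxxxx</param>
        </destination>
    </destinations>
    <tools>
        <tool id="spades" destination="spades_8core"/>
    </tools>
</job_conf>
"""


def native_specs_by_tool(xml_text: str) -> dict:
    """Map each tool id to the Slurm options of its destination."""
    root = ET.fromstring(xml_text)

    # Collect the nativeSpecification string of every destination.
    specs = {}
    for dest in root.iter("destination"):
        param = dest.find("param[@id='nativeSpecification']")
        if param is not None:
            specs[dest.get("id")] = (param.text or "").strip()

    # Resolve each tool to its destination's options (None if not found).
    return {
        tool.get("id"): specs.get(tool.get("destination"))
        for tool in root.iter("tool")
    }


for tool_id, spec in native_specs_by_tool(JOB_CONF).items():
    print(f"{tool_id}: {spec}")
```

A check like this can be run against your own job_conf.xml after editing it, before restarting the server, to confirm that each tool points at the job specification you intended.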