Galaxy: Difference between revisions
Revision as of 15:35, 13 January 2021
Introduction
Galaxy is an open-source, web-based platform for data-intensive biomedical research. It aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain-agnostic and is now used as a general workflow management system in bioinformatics.
The list of tutorials here suggests the range of applications of Galaxy.
Galaxy on Cedar
On Cedar, we provide one Galaxy instance per research group. Installing Galaxy requires a special setup that must be done by Compute Canada (CC) staff. If you need Galaxy for your group, please email the support team.
Galaxy Directory Structure
Galaxy is usually installed in the group's project directory and contains several sub-directories. The name of the Galaxy top directory is formed from the first two characters of the PI's username followed by "glxy". For example, if the PI's username is "davidc", the Galaxy top directory will be "daglxy", and it is located in /project/group name/, where group name is the PI's default group name, e.g., def-davidc. The Galaxy main directory contains the following sub-directories, a layout that differs slightly from the original Galaxy package.
- config: Contains all the configuration files required to set up and optimize the Galaxy server. Below we explain the basics of the configuration files that must be set up to be compatible with our HPC environment; we do not cover every option.
- galaxy: Contains the core Galaxy package, which is written mostly in Python.
- logs: Contains two files, galaxy.log and server.log. All messages during startup or shutdown of the server are written to server.log, while all messages during the run are written to galaxy.log.
- plugins: Contains all plugins. In the original Galaxy package, this directory is located in the galaxy directory.
- tmp: Contains all temporary files that Galaxy needs for compiling and installing tool sheds.
- venv: A Python virtual environment directory that contains all Python package dependencies.
- tool-data: Contains data used by tools. See the samples in data-integration (https://galaxyproject.org/admin/data-integration).
- tool-dependencies: Contains all packages needed for tool sheds. By default, packages in this directory are installed using Anaconda.
- database: Contains the input, output, and error files of all jobs that run on cluster nodes.
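As a small illustration of the directory naming rule described at the start of this section, the following Python sketch derives the Galaxy top directory and its full path from a PI username and group name. The function names galaxy_top_dir and galaxy_path are hypothetical and exist only for this example; they are not part of Galaxy or of any Cedar tooling.

```python
from pathlib import PurePosixPath


def galaxy_top_dir(pi_username: str) -> str:
    """Return the top directory name: first two characters of the PI username + 'glxy'."""
    if len(pi_username) < 2:
        raise ValueError("PI username must have at least two characters")
    return pi_username[:2] + "glxy"


def galaxy_path(pi_username: str, group_name: str) -> PurePosixPath:
    """Return the expected location under /project/<group name>/ (hypothetical helper)."""
    return PurePosixPath("/project") / group_name / galaxy_top_dir(pi_username)


# Example from the text: PI "davidc" with default group "def-davidc".
print(galaxy_path("davidc", "def-davidc"))  # /project/def-davidc/daglxy
```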
Galaxy Files ownership and modification
All files of your Galaxy instance belong to a "pseudo account" (or "shared account") that the admin creates at installation time. Pseudo accounts do not belong to a real user; they belong to a specific group. They never expire, and everyone in the group can log in as the pseudo account using an SSH key. The name of the pseudo account is the same as the name of the top Galaxy directory explained above, e.g., daglxy. To modify any file of your Galaxy instance, e.g., the configuration files, you first need to log in as the pseudo account. To do so, generate an SSH key pair, store your public key somewhere in your home directory, and let the admin know where it is. The admin will then store your public key in the appropriate place, after which you will be able to log in to your pseudo account.
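The key generation and login steps can be sketched in Python as follows. This is only an illustration under several assumptions: the key type (ed25519), the key file name, and the host name cedar.computecanada.ca are not specified on this page and should be confirmed with the support team, and the helper functions are hypothetical. The sketch only builds and prints the commands; it does not run them.

```python
import shlex
from pathlib import Path
from typing import List


def keygen_command(key_path: Path, comment: str) -> List[str]:
    """Build an ssh-keygen command; the public key is written to <key_path>.pub."""
    # Assumption: an ed25519 key is acceptable; the page does not name a key type.
    return ["ssh-keygen", "-t", "ed25519", "-f", str(key_path), "-C", comment]


def login_command(pseudo_account: str, host: str, key_path: Path) -> List[str]:
    """Build the ssh command used to log in as the pseudo account."""
    return ["ssh", "-i", str(key_path), f"{pseudo_account}@{host}"]


# Hypothetical key location and host name, used for illustration only.
key = Path.home() / ".ssh" / "galaxy_ed25519"
print(shlex.join(keygen_command(key, "galaxy pseudo account key")))
print(shlex.join(login_command("daglxy", "cedar.computecanada.ca", key)))
```

The login command will only work after the admin has installed your public key for the pseudo account.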
Galaxy Server
The Galaxy server cannot be run on Cedar; please do not run the startup script on Cedar. Instead, we use another machine (called the gateway) that runs a web server and has all Cedar /project and /home filesystems mounted. For security reasons, users cannot connect to this machine via SSH, but you can manage your Galaxy server on it by going to https://gateway.cedar.computecanada.ca/ and following the Galaxy link.
Galaxy configuration
All files in the config directory are used to configure your Galaxy server. Configuring and optimizing Galaxy is tricky and requires broad scientific and technical knowledge, and explaining all of it is beyond the scope of this page. We assume that users who request Galaxy have the knowledge to further set up their own Galaxy instance. Here we explain some important basic settings that the admin must make in order for the server to work. We recommend going through the configuration files and setting them up appropriately. However, we strongly recommend that you do not overwrite the following variables, which are set by the admin:
- File galaxy.yml: The main and most important configuration file. The following variables are set in this file:
  - http: contains your unique port number.
  - database_connection: the name of your Galaxy database and your database server.
  - virtualenv: the path to the Python virtual environment on the gateway machine.
  - file_path, new_file_path, tool_config_file, shed_tool_config_file, tool_dependency_dir, tool_data_path, visualization_plugins_directory, job_working_directory, cluster_files_directory, template_cache_path, citation_cache_data_dir, citation_cache_lock_dir: set the appropriate paths for tools, tool sheds and dependencies.
- File job_conf.xml: All variables in this file are used for job submission to Cedar. Different packages have different job specifications; for example, the package "spades" uses 8 cores with a walltime of 3 hours, and the job is submitted under your default group name, def-xxxxx. Please take a look at this file and set up your desired job specification.
Note that any change to these configuration files requires a restart of the server.
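To review the job specification of each tool before editing job_conf.xml, a short Python sketch like the one below can help. The embedded XML is an illustrative sample in the general Galaxy job_conf.xml layout, not a copy of the file installed on Cedar: the destination id spades_8core, the slurm runner, and the use of a nativeSpecification parameter are assumptions, and tool_job_specs is a hypothetical helper. Check the names in your own file before relying on it.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; the real file in your config directory may differ.
SAMPLE_JOB_CONF = """
<job_conf>
    <destinations>
        <destination id="spades_8core" runner="slurm">
            <param id="nativeSpecification">--cpus-per-task=8 --time=03:00:00 --account=def-xxxxx</param>
        </destination>
    </destinations>
    <tools>
        <tool id="spades" destination="spades_8core"/>
    </tools>
</job_conf>
"""


def tool_job_specs(xml_text: str) -> dict:
    """Map each tool id to the native job specification of its destination."""
    root = ET.fromstring(xml_text.strip())
    specs = {}
    for dest in root.findall("./destinations/destination"):
        param = dest.find("./param[@id='nativeSpecification']")
        has_text = param is not None and param.text
        specs[dest.get("id")] = param.text.strip() if has_text else ""
    # Tools whose destination is not defined get an empty specification.
    return {
        tool.get("id"): specs.get(tool.get("destination"), "")
        for tool in root.findall("./tools/tool")
    }


for tool_id, spec in tool_job_specs(SAMPLE_JOB_CONF).items():
    print(f"{tool_id}: {spec}")
```

In this sample, the 8 cores, 3-hour walltime, and def-xxxxx account mirror the "spades" example given above.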