GBrowse

From Alliance Doc

Latest revision as of 22:32, 22 June 2022


Introduction

GBrowse is a combination of a database and interactive web pages for manipulating and displaying annotations on genomes. It is accessed through a web interface. GBrowse is installed on Cedar, and the web address of the installation is https://gateway.cedar.computecanada.ca.

The Cedar installation differs in some ways from the standard GBrowse setup described on the official website (http://gmod.org/wiki/GBrowse), particularly with regard to authentication and authorization.

Requesting access to GBrowse

In order for GBrowse to be able to access your files and directories, our staff will create a shared account for each research group that requests access to GBrowse. While using GBrowse, any member of a research group can read GBrowse config files and input files belonging to any other member of that group. If you wish to use GBrowse, the Principal Investigator (PI) of your group must agree to this change from the usual file security practices. Have the PI write to our technical support indicating that they want a GBrowse account to be created for the group, and that they understand the implications of a shared account.

You must also have a database account on Cedar. If you already have one, please give the name of your database in your email. If you do not have a database account yet, please read the Database servers page carefully and answer the questions given there for setting up a database.

Setting up GBrowse

Config files

Since GBrowse needs to be able to read config files of all users within a group, place your GBrowse config files in the following directory:

/project/GROUPID/gbrowse/USERNAME/conf

where GROUPID is your group ID and USERNAME is your username. We will create a symbolic link from ${HOME}/gbrowse-config/ to this directory for your convenience. Files in this directory must be readable by all members of the group, so please do not change the group permissions of files in this directory.
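To check that nothing in your config directory has lost its group-read permission, you can use find. This is a minimal sketch, not an official tool: check_conf_readable is a hypothetical helper name, it assumes GNU find as found on Linux, and it only inspects permission bits, not group ownership.

```shell
#!/bin/bash
# Sketch only: list GBrowse config files that members of your group
# cannot read. No output means every regular file is group-readable.
# Usage: check_conf_readable DIR    e.g. check_conf_readable ~/gbrowse-config/
check_conf_readable() {
    # -H follows a symbolic link given on the command line (such as
    # ~/gbrowse-config/) but not links found inside the tree.
    # "! -perm -g=r" matches files whose group-read bit is not set.
    find -H "$1" -type f ! -perm -g=r -print
}
```

Any path it prints can be fixed with chmod g+r on that file.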

Configuring the database connection

If you use MySQL, you need the following in your GBrowse config files:

[username_example_genome:database]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor DBI::mysql
                -dsn DATABASE;mysql_read_default_file=/home/SHARED/.my.cnf

where DATABASE is the name of your database and SHARED is the name of your group's shared account. The .my.cnf file is a text file created by our staff. It contains the information the shared account needs to connect to MySQL.

If you use PostgreSQL instead, you need the following in your GBrowse config files:

[username_example_genome:database]
db_adaptor    = Bio::DB::SeqFeature::Store
db_args       = -adaptor DBI::Pg
                -dsn dbi:Pg:dbname=DATABASE

where DATABASE is the name of your database.

Using GBrowse

Input files

GBrowse can read .bam files directly, so you do not need to upload them to the database in order to display them. For GBrowse to read your .bam files directly:

  • Copy the files to your /project directory and make them readable by the group.
  • The directory that contains the .bam files must have the setgid and group-execute bits turned on; that is, the output of ls -l must show a small "s" in the group-execute field (not a large "S").
  • Make sure that the .bam file's group ownership is set to your group and not to your username. For example, jsmith:jsmith is wrong; jsmith:def-kjones is right.
  • Edit your config file to specify the path to the .bam file. Here is an example:
[example_bam:database]
db_adaptor        = Bio::DB::Sam
db_args           = -bam /project/GROUPID/USERNAME/gbrowse_bam_files/example_file.bam
search options    = default
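The ownership and permission requirements above can be applied with standard coreutils commands. Below is a minimal sketch, not an official script: prepare_bam_dir is a hypothetical helper name, def-kjones is the example group used above, and the path is the one from the example config; substitute your own group and directory.

```shell
#!/bin/bash
# Sketch only: make a directory of .bam files readable by your research group
# so that the GBrowse shared account (a member of that group) can read them.
# Usage: prepare_bam_dir GROUP DIR
#   e.g. prepare_bam_dir def-kjones /project/GROUPID/USERNAME/gbrowse_bam_files
prepare_bam_dir() {
    local group="$1" dir="$2"
    # Group ownership must be the research group, not your personal group.
    chgrp -R "$group" "$dir"
    # g+r lets the group read the files; X adds group execute only to
    # directories (and files already executable), so the group can enter them.
    chmod -R g+rX "$dir"
    # setgid on the directory: new files created in it inherit its group.
    chmod g+s "$dir"
    # Show the result: the group-execute field should now be a small "s".
    ls -ld "$dir"
}
```

Using X rather than x avoids marking the .bam files themselves as executable.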

Uploading files to the database

This can be done with the bp_seqfeature_load.pl script from BioPerl. Here are the commands to run:

module load bioperl/1.7.1
bp_seqfeature_load.pl -c -d DATABASE:mysql_read_default_file=/home/USERNAME/.my.cnf \
   example_genomic_sequence.fa header_file

In this example DATABASE is the name of your database and example_genomic_sequence.fa is the FASTA file containing the entire genome that you want to visualize with GBrowse. header_file contains details about the length of the chromosomes. Here is an example of a header file:

##sequence-region I 1 15072434
##sequence-region II 1 15279421
##sequence-region III 1 13783801
##sequence-region IV 1 17493829
##sequence-region V 1 20924180
##sequence-region X 1 17718942
##sequence-region MtDNA 1 13794
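If you do not already have a header file, one way to produce it is to compute each sequence's length from the FASTA file itself. This is a minimal sketch under stated assumptions: the FASTA file is uncompressed, the sequence name is the first word after ">" on each header line, and make_header is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: print one "##sequence-region NAME 1 LENGTH" line per FASTA record.
# Usage: make_header example_genomic_sequence.fa > header_file
make_header() {
    awk '
        # A header line starts a new record: emit the previous record first.
        /^>/ {
            if (name != "") printf "##sequence-region %s 1 %d\n", name, len
            # Sequence name = first word after the ">" character
            name = substr($1, 2)
            len = 0
            next
        }
        # Sequence lines: count residues, ignoring spaces, tabs and CRs.
        {
            gsub(/[ \t\r]/, "")
            len += length($0)
        }
        # Emit the last record.
        END {
            if (name != "") printf "##sequence-region %s 1 %d\n", name, len
        }
    ' "$1"
}
```

For a file containing a record ">I" with 6 bases and a record ">MtDNA" with 7 bases, this prints "##sequence-region I 1 6" followed by "##sequence-region MtDNA 1 7".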

We remind you that the above commands should be run via the job scheduler. Do not run these on the head node!
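A job script for the loading step might look like the following. This is a sketch, not a tested recipe: the account def-kjones is the example group used above, and the time and memory requests are placeholders that you should size for your own genome. Note that -c (re)creates the database tables, so it erases any existing contents of DATABASE.

```shell
#!/bin/bash
# Hypothetical account: replace with your own group's allocation.
#SBATCH --account=def-kjones
# Placeholder resource requests: adjust for the size of your genome.
#SBATCH --time=03:00:00
#SBATCH --mem=4G
#SBATCH --job-name=gbrowse-load

# BioPerl provides bp_seqfeature_load.pl
module load bioperl/1.7.1

# -c: create (reinitialize) the database; -d: database DSN with your own .my.cnf
bp_seqfeature_load.pl -c -d DATABASE:mysql_read_default_file=/home/USERNAME/.my.cnf \
    example_genomic_sequence.fa header_file
```

Save it as, for example, load_gbrowse.sh and submit it with sbatch load_gbrowse.sh.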

Once you have uploaded your data to your database, you need to grant read access to the SHARED account so that GBrowse can read your database. Please see How to share your MySQL data on the Database servers page.