GBrowse: Difference between revisions

From Alliance Doc

Revision as of 22:26, 11 April 2018


This article is a draft

This is not a complete article: This is a draft, a work in progress that is intended to be published into an article, which may or may not be ready for inclusion in the main wiki. It should not necessarily be considered factual or authoritative.



Introduction

GBrowse is a combination of a database and interactive web pages for manipulating and displaying annotations on genomes. It requires a web interface to display. GBrowse has been installed on cedar, and our installation may differ from others. Here we briefly explain how GBrowse is set up specifically on cedar. For more information about GBrowse in general and how it works, please see the official GBrowse website: http://gmod.org/wiki/GBrowse


GBrowse Installation

We have installed GBrowse on a server which has access to the /home and /project directories of cedar. The server is called "cedar portal" and its web address is "https://gateway.cedar.computecanada.ca". On this server we have installed the apache-2.4.6 web server with the apache itk module. To give GBrowse access to users' files and directories while maintaining security, each research group interested in using GBrowse gets a dedicated shared account. The itk module is used to assign the uid of the shared account to a specific apache vhost, and GBrowse is then installed on that vhost. Each vhost gets its own port number for connections. Therefore there is one GBrowse instance per research group, and every GBrowse instance acts as a user with the uid of that group's shared account. In this way GBrowse is able to read the files of users within a research group. If you need GBrowse on cedar portal, please send a request to support@computecanada.ca.

GBrowse Setup

GBrowse Config Files

Since GBrowse needs to be able to read the config files of all users within a group, the config files for each group are installed in /project, in the following directory:

/project/GROUPID/gbrowse/USERNAME/conf

where GROUPID is your group id and USERNAME is your username. For convenience, there is a symbolic link from this directory to ${HOME}/gbrowse/conf so that you can access the files in this directory more easily. Files in this directory must be readable by all members of the group; therefore, please do not change the group permissions of files in this directory.

Input Files

GBrowse can read .bam files directly, i.e., you do not need to upload them to the database in order to display them. If you want GBrowse to read this kind of file, pay attention to the following:

  • Files need to be copied to your /project directory and they must be readable by the group.
  • The directory that contains the files, and all of its parent directories, must be readable by the group; that is, the group mode of these directories should have the SGID bit (set group ID upon execution) shown as a lowercase "s" (not a capital "S"), which means group execute permission is also set.
  • Make sure that the group of your files in the project directory is set to your research group and not to your username.
  • Edit your conf file to specify the path to the bam file. Here is an example:
         [example_bam:database]
         db_adaptor        = Bio::DB::Sam
         db_args           = -bam /project/GROUPID/USERNAME/gbrowse_bam_files/example_file.bam
         search options    = default
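
The permission requirements in the list above can be checked from the command line before pointing GBrowse at a file. The following is a minimal sketch, not part of the cedar setup itself: the function names dir_mode_ok, file_mode_ok and check_group_access are hypothetical, and the wrapper assumes GNU coreutils stat (the -c '%A' option), which prints modes in the same form as ls -l.

```shell
# Sketch only: the helper names below are hypothetical, not GBrowse or cedar tools.

# Succeeds if a mode string such as "drwxr-sr-x" describes a directory that is
# group-readable and has SGID with group execute (lowercase "s").
dir_mode_ok() {
    case "$1" in
        d???r?s*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Succeeds if a mode string such as "-rw-r-----" is group-readable.
file_mode_ok() {
    case "$1" in
        ????r*) return 0 ;;
        *)      return 1 ;;
    esac
}

# $1: file to check; $2: topmost directory to check (assumed: your group's
# directory under /project). Assumes GNU stat.
check_group_access() {
    file_mode_ok "$(stat -c '%A' "$1")" || { echo "not group-readable: $1"; return 1; }
    dir=$(dirname "$1")
    while :; do
        dir_mode_ok "$(stat -c '%A' "$dir")" || { echo "check group bits: $dir"; return 1; }
        # Stop at the requested top directory, or when there is nowhere further up to go.
        if [ "$dir" = "$2" ] || [ "$dir" = "/" ] || [ "$dir" = "." ]; then
            break
        fi
        dir=$(dirname "$dir")
    done
    echo "ok"
}

# Example call (paths are placeholders, as in the conf example above):
# check_group_access /project/GROUPID/USERNAME/gbrowse_bam_files/example_file.bam /project/GROUPID
```

The group owner of a file can be inspected separately with ls -l; this sketch only looks at the mode bits.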


Setup Database

Before using GBrowse, users need to set up their own database. More information about Compute Canada databases is here. If you decide to use MySQL as your database, the first thing you need to do is to grant the shared account read access to your corresponding database. This can be done by the following command within MySQL:


To make a MySQL connection from GBrowse, you can use the following setup in your corresponding GBrowse config file:

                    [username_example_genome:database]
                    db_adaptor    = Bio::DB::SeqFeature::Store
                    db_args       = -adaptor DBI::mysql
                                    -dsn DATABASE;mysql_read_default_file=/home/SHARED/.my.cnf
                                    -user SHARED

where SHARED is the shared account of your group that will be given to you and DATABASE is the name of the database. The .my.cnf file is a text file that is created by the administrator; it contains all the information required to make a MySQL connection from GBrowse. If you decide to use Postgres, you first need to give the shared account read access to the database you want to use. This can be done by the administrator only. Please send a request to support@computecanada.ca.

Upload Files to Database

This can be done with the BioPerl module. Run the following commands:

  • module load bioperl/1.7.1
  • bp_seqfeature_load.pl -c -d USERNAME_example_genome:mysql_read_default_file=/home/USERNAME/.my.cnf example_genomic_sequence.fa header_file

where, in this example, USERNAME is your username, USERNAME_example_genome is the name of your database, and example_genomic_sequence.fa is the FASTA file containing the entire genome that you want to visualize in GBrowse. header_file contains details about the lengths of the chromosomes. Here is an example of a header file:

         ##sequence-region I 1 15072434
         ##sequence-region II 1 15279421
         ##sequence-region III 1 13783801
         ##sequence-region IV 1 17493829
         ##sequence-region V 1 20924180
         ##sequence-region X 1 17718942
         ##sequence-region MtDNA 1 13794
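
If you do not already have a header file, one way to produce lines of this form is to compute each sequence's length from the FASTA file itself. The sketch below is an illustration rather than part of the cedar setup: fasta_to_header is a hypothetical helper name, and it assumes that the sequence name is the first word after ">" on each FASTA header line and that every region starts at position 1.

```shell
# Sketch only: fasta_to_header is a hypothetical helper, not a BioPerl or GBrowse tool.
# Prints one "##sequence-region NAME 1 LENGTH" line per sequence in the FASTA file $1.
fasta_to_header() {
    awk '
        /^>/ {
            # A new sequence starts: emit the previous one, then reset the counter.
            if (name != "") printf "##sequence-region %s 1 %d\n", name, len
            name = substr($1, 2)   # first word of the header line, without ">"
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")    # ignore whitespace and carriage returns
            len += length($0)
        }
        END {
            if (name != "") printf "##sequence-region %s 1 %d\n", name, len
        }
    ' "$1"
}

# Example call (file names are the placeholders used above):
# fasta_to_header example_genomic_sequence.fa > header_file
```

It is worth comparing the generated lengths with the published chromosome lengths for your assembly before loading.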


Please note that