GRCh37/hg19 GRCh38/hg38 Multi-Genome Tutorial
From GenPlay, Einstein Genome Analyzer
Latest revision as of 16:12, 27 June 2014
Goal: This tutorial illustrates how the multi-genome mode of GenPlay can be used to simultaneously display data aligned on different reference genomes. In this tutorial we will compare gene annotation data aligned on GRCh37/Hg19 with gene annotation data aligned on GRCh38/Hg38.
Prerequisite: GenPlay needs to be installed on your computer. If you haven't installed GenPlay yet, please visit the Downloads page and follow the instructions to download and install GenPlay.
Note: The final result of this tutorial is available as a project that can be loaded from the Projects page of this website.
Getting started
In order to set up and manage a Multi-Genome Project in GenPlay, please refer to the following sections of the documentation:
Downloading Files
- XML settings file (right click on the link and select Save Link As...)
- VCF file
- Indexed VCF file (Tabix)
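- RefSeq BED file for GRCh38/hg38: http://genplay.einstein.yu.edu/library/Human/hg38/Gene_Annotation/Genes_RefSeq_hg38_06.05.2014.bed (right click on the link and select Save Link As...)
- RefSeq BED file for GRCh37/hg19: http://genplay.einstein.yu.edu/library/Human/hg19/Gene_Annotation/Genes_RefSeq_hg19_09.20.2013.bed (right click on the link and select Save Link As...)
- DNA sequence file for GRCh38/hg38: http://genplay.einstein.yu.edu/library/Human/hg38/DNA_Sequence/hg38.2bit
- DNA sequence file for GRCh37/hg19: http://genplay.einstein.yu.edu/library/Human/hg19/DNA_Sequence/DNA_hg19_09.20.2013.2bit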
Starting a New Project
Selecting the Reference Assembly
After starting GenPlay you will be prompted to select a name, a clade, a genome and an assembly for your project. You can enter hg19 - hg38 Tutorial for the name. Select the mammal clade, the human genome and the hg38 assembly (figure 1).
Then, click on the tool box button on the assembly line. A new window will appear allowing you to select chromosomes. For this tutorial we will work only on the basic chromosomes (chr1 to chr22 plus chrX and chrY). You can select the basic chromosomes by clicking on the Basics button (figure 2).
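If you want to pre-filter your own data files outside GenPlay, the set of basic chromosomes described above can be written down explicitly. The following is only a minimal sketch; the names `BASIC_CHROMOSOMES` and `is_basic_chromosome` are hypothetical helpers for illustration and are not part of GenPlay.

```python
# Names of the "basic" chromosomes used in this tutorial:
# chr1 to chr22 plus chrX and chrY.
BASIC_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def is_basic_chromosome(name: str) -> bool:
    """Return True if the name is one of the 24 basic chromosomes."""
    return name in BASIC_CHROMOSOMES


# Example: keep only the basic chromosomes from a list of sequence names.
names = ["chr1", "chrX", "chrM", "chr1_KI270706v1_random"]
print([n for n in names if is_basic_chromosome(n)])
```

Mitochondrial, unplaced and alternate sequences are deliberately excluded, which matches the selection made with the Basics button.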
Setting the Multi-Genome Parameters
Manually
Next we need to set up a multi-genome project. To do so, click on the Multi Genome Project radio button at the bottom of the screen and click on Select VCF. Click on the Add... label of the File column to select the VCF file to load, and select the VCF file downloaded earlier. Only one VCF file is loaded in this tutorial. It contains the differences between the reference genome GRCh37/hg19 and the reference genome GRCh38/hg38.
File column
Click on the Add... label and then on the Add... menu and select the VCF file downloaded earlier (hg19ToHg38.vcf.gz).
Raw name(s) column
The raw name is filled in automatically. In this tutorial there is only one genome besides the hg38 assembly: hg19.
Nickname column
The nickname can be used to differentiate samples having the same raw name. In this example we can keep the default nickname.
Group column
Since this tutorial is about comparing reference genomes, a suitable generic group name is Reference genome. Click on the Group 1 text of the Group column and then click on the pencil to edit the group name.
The result is shown in figure 3.
Automatically
You can automatically set up the multi-genome project by clicking on the Import Config button at the bottom of the project screen and selecting the XML file downloaded earlier. When you choose this option, make sure that the VCF file and the XML file are in the same directory.
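Before importing the configuration, you may want to confirm that the two files really sit in the same directory. The sketch below is one way to check this outside GenPlay. The VCF file name comes from this tutorial, but the directory and the XML file name `settings.xml` are assumptions used only for illustration.

```python
from pathlib import Path


def same_directory(vcf_path: str, xml_path: str) -> bool:
    """Return True if both paths point into the same directory."""
    # resolve() normalizes relative paths; the files need not exist.
    return Path(vcf_path).resolve().parent == Path(xml_path).resolve().parent


# Hypothetical locations of the downloaded files.
print(same_directory("downloads/hg19ToHg38.vcf.gz", "downloads/settings.xml"))
```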
Displaying SNPs, Insertions and Deletions
Once you're done with the previous step, click on Create to initialize the project. This should only take a few seconds.
We can now display variants. Let's start by loading SNPs. To do so, right click on the handler of the first track (the blue part of the track with a number on it) and then select the Add Variant Layer option (figure 4).
Then click on the SNPs check box (figure 5).
Using the same method, load insertions on track 2 and deletions on track 3. The result should be similar to what is shown in figure 6.
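To get a rough idea of how many variants of each type to expect on the three tracks, the VCF records can be classified by comparing the lengths of the REF and ALT alleles. This is a minimal sketch under the usual VCF column layout (REF in column 4, ALT in column 5); the function names are hypothetical, and GenPlay's own classification may differ in edge cases.

```python
from collections import Counter
from typing import Iterable


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length."""
    # Symbolic or missing alleles (e.g. <DEL>, *, .) are not classified here.
    if alt.startswith("<") or alt in ("*", "."):
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


def count_variants(lines: Iterable[str]) -> Counter:
    """Count variant types over the data lines of a VCF file."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):  # ALT may list several alleles
            counts[classify_variant(ref, alt)] += 1
    return counts


# Usage with the compressed file from this tutorial (not run here):
#   import gzip
#   with gzip.open("hg19ToHg38.vcf.gz", "rt") as handle:
#       print(count_variants(handle))
```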
Displaying Gene Annotation Layers
Let's start by loading the hg38 gene annotation.
Right click on the handler of track 4 and select the Add Layer(s) option (figure 7).
Then select the hg38 gene annotation file downloaded at the beginning of this tutorial. On the next screen select Gene Annotation Layer (figure 8).
Then we need to tell GenPlay that the data were aligned on the hg38 reference genome (figure 9).
We now need to repeat the same operation for hg19. You will need to select the other gene annotation file and then select hg19 as the genome used for the alignment (figure 10).
The result of this step is shown in figure 11.
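The two annotation files can also be compared outside GenPlay, for example to list the gene names that appear in only one assembly. The sketch below assumes the standard BED layout, with the feature name in the fourth column; whether the downloaded RefSeq files follow exactly this layout is an assumption, and `bed_names` is a hypothetical helper.

```python
from typing import Iterable, Set


def bed_names(lines: Iterable[str]) -> Set[str]:
    """Collect the values of the name column (column 4) of a BED file."""
    names = set()
    for line in lines:
        # Skip blank lines, comments and optional header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) >= 4:
            names.add(fields[3])
    return names


# Tiny made-up example records, for illustration only.
hg38_lines = ["chr1\t100\t200\tGENE_A\n", "chr1\t300\t400\tGENE_B\n"]
hg19_lines = ["chr1\t150\t250\tGENE_A\n"]

only_in_hg38 = bed_names(hg38_lines) - bed_names(hg19_lines)
print(sorted(only_in_hg38))
```

With the real files, pass open file handles instead of the example lists.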
Adding DNA Sequence Layers
We are going to insert two blank tracks. To do so, right click on the track handler of track 1 and select the Insert option of the contextual menu. Repeat this operation to insert a second empty track.
Now click on the first track handler and select Add Layer(s). Then, select the hg38 DNA file downloaded at the beginning of this tutorial. When asked which genome was used for the alignment, select hg38.
Repeat this operation for the hg19 DNA sequence file. Make sure to select hg19/Maternal allele as the genome used for the alignment.
You should now be able to visualize the DNA sequences. Note that you might need to zoom in to see them, which is easily done with the mouse wheel.
The final result of this tutorial is shown in figure 12.











