Genomics Terminology
Bradley Monk (talk | contribs)
Revision as of 23:17, 28 October 2020
proband
A proband is an individual serving as the starting point for the genetic study of a family (used especially in medicine). A proband is usually the first affected individual in a family who brings a genetic disorder to the attention of the medical community.
trio analysis
A trio refers to 2 parents + 1 offspring (2 + 1 = 3, hence trio). In medical genetics, trio analysis usually means analyzing a proband's genome together with the genomes of both parents. A trio-based exome approach is fundamental to identifying heterozygous dominant pathogenic variants, typically variants found in an affected proband but absent from both unaffected parents.
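The filtering idea behind trio analysis can be sketched in a few lines. This is a minimal illustration, not a production variant-calling pipeline: the `TrioGenotype` class, the `candidate_de_novo` function, the variant IDs, and the encoding of genotypes as alternate-allele counts (0, 1, or 2) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioGenotype:
    """Genotypes at one variant site, as alternate-allele counts (0, 1, or 2).

    Hypothetical structure for illustration only.
    """
    variant_id: str
    proband: int
    mother: int
    father: int


def candidate_de_novo(sites):
    """Return IDs of sites where the proband is heterozygous (1 alternate
    allele) and both parents carry no alternate allele."""
    return [
        s.variant_id
        for s in sites
        if s.proband == 1 and s.mother == 0 and s.father == 0
    ]


# Made-up example sites.
sites = [
    TrioGenotype("var_A", proband=1, mother=0, father=0),  # absent from both parents
    TrioGenotype("var_B", proband=1, mother=1, father=0),  # also present in the mother
    TrioGenotype("var_C", proband=2, mother=1, father=1),  # homozygous in the proband
]

print(candidate_de_novo(sites))
```

In practice such a filter would be only a first pass; real analyses also have to account for sequencing depth, genotype quality, and population frequency, none of which are modeled here.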
Molecular organization and gene content
The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed autosomes, plus a 23rd pair of sex chromosomes: XX in females and XY in males. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table.
Chromosome | Length (mm) | Base pairs | Variations | Protein-coding genes | Pseudogenes | Total long ncRNA | Total small ncRNA | miRNA | rRNA | snRNA | snoRNA | Misc ncRNA | Links | Centromere position (Mbp) | Cumulative (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 85 | 248,956,422 | 12,151,146 | 2058 | 1220 | 1200 | 496 | 134 | 66 | 221 | 145 | 192 | EBI | 125 | 7.9 |
2 | 83 | 242,193,529 | 12,945,965 | 1309 | 1023 | 1037 | 375 | 115 | 40 | 161 | 117 | 176 | EBI | 93.3 | 16.2 |
3 | 67 | 198,295,559 | 10,638,715 | 1078 | 763 | 711 | 298 | 99 | 29 | 138 | 87 | 134 | EBI | 91 | 23 |
4 | 65 | 190,214,555 | 10,165,685 | 752 | 727 | 657 | 228 | 92 | 24 | 120 | 56 | 104 | EBI | 50.4 | 29.6 |
5 | 62 | 181,538,259 | 9,519,995 | 876 | 721 | 844 | 235 | 83 | 25 | 106 | 61 | 119 | EBI | 48.4 | 35.8 |
6 | 58 | 170,805,979 | 9,130,476 | 1048 | 801 | 639 | 234 | 81 | 26 | 111 | 73 | 105 | EBI | 61 | 41.6 |
7 | 54 | 159,345,973 | 8,613,298 | 989 | 885 | 605 | 208 | 90 | 24 | 90 | 76 | 143 | EBI | 59.9 | 47.1 |
8 | 50 | 145,138,636 | 8,221,520 | 677 | 613 | 735 | 214 | 80 | 28 | 86 | 52 | 82 | EBI | 45.6 | 52 |
9 | 48 | 138,394,717 | 6,590,811 | 786 | 661 | 491 | 190 | 69 | 19 | 66 | 51 | 96 | EBI | 49 | 56.3 |
10 | 46 | 133,797,422 | 7,223,944 | 733 | 568 | 579 | 204 | 64 | 32 | 87 | 56 | 89 | EBI | 40.2 | 60.9 |
11 | 46 | 135,086,622 | 7,535,370 | 1298 | 821 | 710 | 233 | 63 | 24 | 74 | 76 | 97 | EBI | 53.7 | 65.4 |
12 | 45 | 133,275,309 | 7,228,129 | 1034 | 617 | 848 | 227 | 72 | 27 | 106 | 62 | 115 | EBI | 35.8 | 70 |
13 | 39 | 114,364,328 | 5,082,574 | 327 | 372 | 397 | 104 | 42 | 16 | 45 | 34 | 75 | EBI | 17.9 | 73.4 |
14 | 36 | 107,043,718 | 4,865,950 | 830 | 523 | 533 | 239 | 92 | 10 | 65 | 97 | 79 | EBI | 17.6 | 76.4 |
15 | 35 | 101,991,189 | 4,515,076 | 613 | 510 | 639 | 250 | 78 | 13 | 63 | 136 | 93 | EBI | 19 | 79.3 |
16 | 31 | 90,338,345 | 5,101,702 | 873 | 465 | 799 | 187 | 52 | 32 | 53 | 58 | 51 | EBI | 36.6 | 82 |
17 | 28 | 83,257,441 | 4,614,972 | 1197 | 531 | 834 | 235 | 61 | 15 | 80 | 71 | 99 | EBI | 24 | 84.8 |
18 | 27 | 80,373,285 | 4,035,966 | 270 | 247 | 453 | 109 | 32 | 13 | 51 | 36 | 41 | EBI | 17.2 | 87.4 |
19 | 20 | 58,617,616 | 3,858,269 | 1472 | 512 | 628 | 179 | 110 | 13 | 29 | 31 | 61 | EBI | 26.5 | 89.3 |
20 | 21 | 64,444,167 | 3,439,621 | 544 | 249 | 384 | 131 | 57 | 15 | 46 | 37 | 68 | EBI | 27.5 | 91.4 |
21 | 16 | 46,709,983 | 2,049,697 | 234 | 185 | 305 | 71 | 16 | 5 | 21 | 19 | 24 | EBI | 13.2 | 92.6 |
22 | 17 | 50,818,468 | 2,135,311 | 488 | 324 | 357 | 78 | 31 | 5 | 23 | 23 | 62 | EBI | 14.7 | 93.8 |
X | 53 | 156,040,895 | 5,753,881 | 842 | 874 | 271 | 258 | 128 | 22 | 85 | 64 | 100 | EBI | 60.6 | 99.1 |
Y | 20 | 57,227,415 | 211,643 | 71 | 388 | 71 | 30 | 15 | 7 | 17 | 3 | 8 | EBI | 10.4 | 100 |
mtDNA | 0.0054 | 16,569 | 929 | 13 | 0 | 0 | 24 | 0 | 2 | 0 | 0 | 0 | EBI | N/A | 100 |
total | | 3,088,286,401 | 155,630,645 | 20412 | 14600 | 14727 | 5037 | 1756 | 532 | 1944 | 1521 | 2213 | | | |
Table 1 (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between adjacent base pairs in the DNA double helix. A more recent estimate based on updated data puts the total chromosome length at 205.00 cm for the diploid male genome and 208.23 cm for the diploid female genome, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts and does not include products of alternative pre-mRNA splicing or modifications to protein structure that occur after translation.
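The length estimate described above is a single multiplication, sketched here for a few rows of Table 1. The function and constant names are illustrative assumptions; the base-pair counts and the 0.34 nm figure are taken from the text.

```python
NM_PER_BASE_PAIR = 0.34   # distance between base pairs, as quoted in the text
NM_PER_MM = 1_000_000     # nanometers per millimeter


def length_mm(base_pairs: int) -> float:
    """Estimate the stretched-out length of a DNA molecule in millimeters."""
    return base_pairs * NM_PER_BASE_PAIR / NM_PER_MM


# Base-pair counts copied from Table 1.
base_pairs = {
    "1": 248_956_422,
    "21": 46_709_983,
    "X": 156_040_895,
}

for name, bp in base_pairs.items():
    print(f"chromosome {name}: about {round(length_mm(bp))} mm")

# Sum of all molecules in the reference genome (the "total" row).
print(f"total: about {length_mm(3_088_286_401) / 1000:.2f} m")
```

Rounded to whole millimeters, these three rows reproduce the 85, 16, and 53 mm listed in the table.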
The number of genes in the human genome is not entirely clear because the function of numerous transcripts remains unknown. This is especially true for non-coding RNA (see below). The number of protein-coding genes is better known, but there are still on the order of 1,400 questionable genes, usually with short open reading frames, that may or may not encode functional proteins. Table 2 gives estimates from various projects and shows these discrepancies.
Table 2. Number of human genes in different databases as of July 2018

 | Gencode | Ensembl | RefSeq | CHESS |
---|---|---|---|---|
protein-coding genes | 19,901 | 20,376 | 20,345 | 21,306 |
lncRNA genes | 15,779 | 14,720 | 17,712 | 18,484 |
antisense RNA | 5501 | 28 | 2694 | |
miscellaneous RNA | 2213 | 2222 | 13,899 | 4347 |
Pseudogenes | 14,723 | 1740 | 15,952 | |
total transcripts | 203,835 | 203,903 | 154,484 | 328,827 |
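To make the discrepancy concrete, the protein-coding row of Table 2 can be compared directly. This is only a small arithmetic sketch; the dictionary and variable names are assumptions, and the counts are copied from the table.

```python
# Protein-coding gene counts from Table 2 (July 2018).
protein_coding = {
    "Gencode": 19_901,
    "Ensembl": 20_376,
    "RefSeq": 20_345,
    "CHESS": 21_306,
}

lowest = min(protein_coding, key=protein_coding.get)
highest = max(protein_coding, key=protein_coding.get)
spread = protein_coding[highest] - protein_coding[lowest]

print(f"lowest:  {lowest} ({protein_coding[lowest]:,})")
print(f"highest: {highest} ({protein_coding[highest]:,})")
print(f"spread:  {spread:,} genes")
```

The spread between the lowest and highest counts is 1,405 genes, which is of the same order as the roughly 1,400 questionable genes mentioned in the text, although the text does not state that the two figures are related.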
Human genetic variations are unique DNA sequence differences identified in the individual human genome sequences analyzed by Ensembl as of December 2016. The number of identified variations is expected to increase as further personal genomes are sequenced and analyzed. In addition to the gene content shown in Table 1, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequences in the EBI genome browser.
Small non-coding RNAs are RNAs of up to 200 bases that do not have protein-coding potential. These include microRNAs, or miRNAs (post-transcriptional regulators of gene expression); small nuclear RNAs, or snRNAs (the RNA components of spliceosomes); and small nucleolar RNAs, or snoRNAs (involved in guiding chemical modifications to other RNA molecules). Long non-coding RNAs are RNA molecules longer than 200 bases that do not have protein-coding potential. These include ribosomal RNAs, or rRNAs (the RNA components of ribosomes), and a variety of other long RNAs that are involved in regulation of gene expression, epigenetic modifications of DNA nucleotides and histone proteins, and regulation of the activity of protein-coding genes. Small discrepancies between the total small ncRNA numbers and the numbers of specific types of small ncRNAs arise because the former values were sourced from Ensembl release 87 and the latter from Ensembl release 68.
The entropy rate of the genome differs significantly between coding and non-coding sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about 45 million base pairs), but lower for the non-coding parts. It ranges between 1.5 and 1.9 bits per base pair for individual chromosomes, except for the Y chromosome, which has an entropy rate below 0.9 bits per base pair.
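The 2-bit ceiling follows from the four-letter DNA alphabet: a sequence in which A, C, G, and T are equally likely and independent carries log2(4) = 2 bits per base. The sketch below uses a crude block-entropy estimator (entropy of overlapping k-mers divided by k) on toy sequences. It is an assumption-laden illustration, not the method behind the figures quoted above, and the function name `block_entropy_per_base` is invented for the example.

```python
import math
from collections import Counter


def block_entropy_per_base(sequence: str, k: int = 1) -> float:
    """Shannon entropy of overlapping k-mers, divided by k (bits per base).

    A rough estimate only: reliable entropy-rate estimates need long
    sequences and larger context sizes than are used here.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return entropy / k


balanced = "ACGT" * 50      # all four bases equally frequent, but periodic
homopolymer = "A" * 200     # a single repeated base

print(block_entropy_per_base(balanced, k=1))     # base composition alone
print(block_entropy_per_base(balanced, k=2))     # context exposes the repeat
print(block_entropy_per_base(homopolymer, k=1))
```

With k = 1 the balanced sequence reaches the 2-bit maximum because only base composition is measured; with k = 2 the estimate drops to about 1 bit per base because the repeat makes each base predictable from its neighbor. This is the sense in which repetitive sequence lowers the entropy rate.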