Difference between revisions of "Genomics Terminology"

From Bradwiki
Line 16: Line 16:
  
 
== Molecular organization and gene content ==
 
== Molecular organization and gene content ==
{{see also|Lists of human genes}}
 
  
The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed [[autosome]]s, plus the 23rd pair of [[sex chromosome]]s (XX) in the female, and (XY) in the male.  These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the [[mitochondrial DNA]], a comparatively small circular molecule present in each [[mitochondrion]]. Basic information about these molecules and their gene content, based on a [[reference genome]] that does not represent the sequence of any specific individual, are provided in the following table. (Data source: [http://useast.ensembl.org/Homo_sapiens/Location/Genome?r=Y:1-1000 Ensembl genome browser release 87]{{Dead link|date=January 2020 |bot=InternetArchiveBot |fix-attempted=yes }}, December 2016 for most values;  [http://jul2012.archive.ensembl.org/Homo_sapiens/Location/Chromosome?r=1:1-1000000 Ensembl genome browser release 68], July 2012 for miRNA, rRNA, snRNA, snoRNA.)
+
The total length of the [https://en.wikipedia.org/wiki/Human_genome human genome] is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed [https://en.wikipedia.org/wiki/autosome autosomes], plus the 23rd pair of sex chromosomes (XX) in the female, and (XY) in the male.  These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, are provided in the following table.
  
 
{| class="wikitable sortable" style="font-size:75%;"
 
{| class="wikitable sortable" style="font-size:75%;"
 
|-
 
|-
! Chromosome !! Length<br />([[Millimetre|mm]]) !! Base<br />pairs !! Variations !! Protein-<br />coding<br />genes !! Pseudo-<br />genes !! Total<br />long<br />ncRNA || Total<br />small<br />ncRNA || miRNA !! rRNA !! snRNA !! snoRNA !! Misc<br />ncRNA !! Links !! Centromere<br />position<br />([[Mega base pairs|Mbp]]) || Cumulative<br />(%)
+
! Chromosome !! Length<br />(Millimetre|mm) !! Base<br />pairs !! Variations !! Protein-<br />coding<br />genes !! Pseudo-<br />genes !! Total<br />long<br />ncRNA || Total<br />small<br />ncRNA || miRNA !! rRNA !! snRNA !! snoRNA !! Misc<br />ncRNA !! Links !! Centromere<br />position<br />(Mega base pairs|Mbp) || Cumulative<br />(%)
 
|-
 
|-
[[Chromosome 1 (human)|1]] || 85 ||style="text-align: right;"| 248,956,422 ||style="text-align: right;"| 12,151,146 || 2058 || 1220 || 1200 || 496 || 134 || 66 || 221 || 145 || 192 || [https://archive.today/20130414235101/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=1 EBI] || 125 || 7.9
+
|  Chromosome 1 (human)|1 || 85 ||style="text-align: right;"| 248,956,422 ||style="text-align: right;"| 12,151,146 || 2058 || 1220 || 1200 || 496 || 134 || 66 || 221 || 145 || 192 || [https://archive.today/20130414235101/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=1 EBI] || 125 || 7.9
 
|-
 
|-
[[Chromosome 2 (human)|2]] || 83 ||style="text-align: right;"| 242,193,529 ||style="text-align: right;"| 12,945,965 || 1309 || 1023 || 1037 || 375 || 115 || 40 || 161 || 117 || 176 || [https://archive.today/20130414170207/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=2 EBI] || 93.3 || 16.2
+
|  Chromosome 2 (human)|2 || 83 ||style="text-align: right;"| 242,193,529 ||style="text-align: right;"| 12,945,965 || 1309 || 1023 || 1037 || 375 || 115 || 40 || 161 || 117 || 176 || [https://archive.today/20130414170207/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=2 EBI] || 93.3 || 16.2
 
|-
 
|-
[[Chromosome 3 (human)|3]] || 67 ||style="text-align: right;"| 198,295,559 ||style="text-align: right;"| 10,638,715 || 1078 || 763 || 711 || 298 || 99 || 29 || 138 || 87 || 134 || [https://archive.today/20130414155057/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=3 EBI] || 91 || 23
+
|  Chromosome 3 (human)|3 || 67 ||style="text-align: right;"| 198,295,559 ||style="text-align: right;"| 10,638,715 || 1078 || 763 || 711 || 298 || 99 || 29 || 138 || 87 || 134 || [https://archive.today/20130414155057/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=3 EBI] || 91 || 23
 
|-
 
|-
[[Chromosome 4 (human)|4]] || 65 ||style="text-align: right;"| 190,214,555 ||style="text-align: right;"| 10,165,685 || 752 || 727 || 657 || 228 || 92 || 24 || 120 || 56 || 104 || [https://archive.today/20130414184734/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=4 EBI] || 50.4 || 29.6
+
|  Chromosome 4 (human)|4 || 65 ||style="text-align: right;"| 190,214,555 ||style="text-align: right;"| 10,165,685 || 752 || 727 || 657 || 228 || 92 || 24 || 120 || 56 || 104 || [https://archive.today/20130414184734/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=4 EBI] || 50.4 || 29.6
 
|-
 
|-
[[Chromosome 5 (human)|5]] || 62 ||style="text-align: right;"| 181,538,259 ||style="text-align: right;"| 9,519,995 || 876 || 721 || 844 || 235 || 83 || 25 || 106 || 61 || 119 || [https://archive.today/20130414165438/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=5 EBI] || 48.4 || 35.8
+
|  Chromosome 5 (human)|5 || 62 ||style="text-align: right;"| 181,538,259 ||style="text-align: right;"| 9,519,995 || 876 || 721 || 844 || 235 || 83 || 25 || 106 || 61 || 119 || [https://archive.today/20130414165438/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=5 EBI] || 48.4 || 35.8
 
|-
 
|-
[[Chromosome 6 (human)|6]] || 58 ||style="text-align: right;"| 170,805,979 ||style="text-align: right;"| 9,130,476 || 1048 || 801 || 639 || 234 || 81 || 26 || 111 || 73 || 105 || [https://archive.today/20130414210620/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=6 EBI] || 61 || 41.6
+
|  Chromosome 6 (human)|6 || 58 ||style="text-align: right;"| 170,805,979 ||style="text-align: right;"| 9,130,476 || 1048 || 801 || 639 || 234 || 81 || 26 || 111 || 73 || 105 || [https://archive.today/20130414210620/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=6 EBI] || 61 || 41.6
 
|-
 
|-
[[Chromosome 7 (human)|7]] || 54 ||style="text-align: right;"| 159,345,973 ||style="text-align: right;"| 8,613,298 || 989 || 885 || 605 || 208 || 90 || 24 || 90 || 76 || 143 || [https://archive.today/20130414191348/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=7 EBI] || 59.9 || 47.1
+
|  Chromosome 7 (human)|7 || 54 ||style="text-align: right;"| 159,345,973 ||style="text-align: right;"| 8,613,298 || 989 || 885 || 605 || 208 || 90 || 24 || 90 || 76 || 143 || [https://archive.today/20130414191348/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=7 EBI] || 59.9 || 47.1
 
|-
 
|-
[[Chromosome 8 (human)|8]] || 50 ||style="text-align: right;"| 145,138,636 ||style="text-align: right;"| 8,221,520 || 677 || 613 || 735 || 214 || 80 || 28 || 86 || 52 || 82 || [https://archive.today/20130414151536/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=8 EBI] || 45.6 || 52
+
|  Chromosome 8 (human)|8 || 50 ||style="text-align: right;"| 145,138,636 ||style="text-align: right;"| 8,221,520 || 677 || 613 || 735 || 214 || 80 || 28 || 86 || 52 || 82 || [https://archive.today/20130414151536/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=8 EBI] || 45.6 || 52
 
|-
 
|-
[[Chromosome 9 (human)|9]] || 48 ||style="text-align: right;"| 138,394,717 ||style="text-align: right;"| 6,590,811 || 786 || 661 || 491 || 190 || 69 || 19 || 66 || 51 || 96 || [https://archive.today/20130414154313/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=9 EBI] || 49 || 56.3
+
|  Chromosome 9 (human)|9 || 48 ||style="text-align: right;"| 138,394,717 ||style="text-align: right;"| 6,590,811 || 786 || 661 || 491 || 190 || 69 || 19 || 66 || 51 || 96 || [https://archive.today/20130414154313/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=9 EBI] || 49 || 56.3
 
|-
 
|-
[[Chromosome 10 (human)|10]] || 46 ||style="text-align: right;"| 133,797,422 ||style="text-align: right;"| 7,223,944 || 733 || 568 || 579 || 204 || 64 || 32 || 87 || 56 || 89 || [https://archive.today/20130414155104/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=10 EBI] || 40.2 || 60.9
+
|  Chromosome 10 (human)|10 || 46 ||style="text-align: right;"| 133,797,422 ||style="text-align: right;"| 7,223,944 || 733 || 568 || 579 || 204 || 64 || 32 || 87 || 56 || 89 || [https://archive.today/20130414155104/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=10 EBI] || 40.2 || 60.9
 
|-
 
|-
[[Chromosome 11 (human)|11]] || 46 ||style="text-align: right;"| 135,086,622 ||style="text-align: right;"| 7,535,370 || 1298 || 821 || 710 || 233 || 63 || 24 || 74 || 76 || 97 || [https://archive.today/20130414155450/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=11 EBI] || 53.7 || 65.4
+
|  Chromosome 11 (human)|11 || 46 ||style="text-align: right;"| 135,086,622 ||style="text-align: right;"| 7,535,370 || 1298 || 821 || 710 || 233 || 63 || 24 || 74 || 76 || 97 || [https://archive.today/20130414155450/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=11 EBI] || 53.7 || 65.4
 
|-
 
|-
[[Chromosome 12 (human)|12]] || 45 ||style="text-align: right;"| 133,275,309 ||style="text-align: right;"| 7,228,129 || 1034 || 617 || 848 || 227 || 72 || 27 || 106 || 62 || 115 || [https://archive.today/20130414163842/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=12 EBI] || 35.8 || 70
+
|  Chromosome 12 (human)|12 || 45 ||style="text-align: right;"| 133,275,309 ||style="text-align: right;"| 7,228,129 || 1034 || 617 || 848 || 227 || 72 || 27 || 106 || 62 || 115 || [https://archive.today/20130414163842/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=12 EBI] || 35.8 || 70
 
|-
 
|-
[[Chromosome 13 (human)|13]] || 39 ||style="text-align: right;"| 114,364,328 ||style="text-align: right;"| 5,082,574 || 327 || 372 || 397 || 104 || 42 || 16 || 45 || 34 || 75 || [https://archive.today/20130414153908/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=13 EBI] || 17.9 || 73.4
+
|  Chromosome 13 (human)|13 || 39 ||style="text-align: right;"| 114,364,328 ||style="text-align: right;"| 5,082,574 || 327 || 372 || 397 || 104 || 42 || 16 || 45 || 34 || 75 || [https://archive.today/20130414153908/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=13 EBI] || 17.9 || 73.4
 
|-
 
|-
[[Chromosome 14 (human)|14]] || 36 ||style="text-align: right;"| 107,043,718 ||style="text-align: right;"| 4,865,950 || 830 || 523 || 533 || 239 || 92 || 10 || 65 || 97 || 79 || [https://archive.today/20130414221716/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=14 EBI] || 17.6 || 76.4
+
|  Chromosome 14 (human)|14 || 36 ||style="text-align: right;"| 107,043,718 ||style="text-align: right;"| 4,865,950 || 830 || 523 || 533 || 239 || 92 || 10 || 65 || 97 || 79 || [https://archive.today/20130414221716/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=14 EBI] || 17.6 || 76.4
 
|-
 
|-
[[Chromosome 15 (human)|15]] || 35 ||style="text-align: right;"| 101,991,189 ||style="text-align: right;"| 4,515,076 || 613 || 510 || 639 || 250 || 78 || 13 || 63 || 136 || 93 || [https://archive.today/20130414185000/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=15 EBI] || 19 || 79.3
+
|  Chromosome 15 (human)|15 || 35 ||style="text-align: right;"| 101,991,189 ||style="text-align: right;"| 4,515,076 || 613 || 510 || 639 || 250 || 78 || 13 || 63 || 136 || 93 || [https://archive.today/20130414185000/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=15 EBI] || 19 || 79.3
 
|-
 
|-
[[Chromosome 16 (human)|16]] || 31 ||style="text-align: right;"|90,338,345 || style="text-align: right;"|5,101,702 || 873 || 465 || 799 || 187 || 52 || 32 || 53 || 58 || 51 || [https://archive.today/20130414182905/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=16 EBI] || 36.6 || 82
+
|  Chromosome 16 (human)|16 || 31 ||style="text-align: right;"|90,338,345 || style="text-align: right;"|5,101,702 || 873 || 465 || 799 || 187 || 52 || 32 || 53 || 58 || 51 || [https://archive.today/20130414182905/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=16 EBI] || 36.6 || 82
 
|-
 
|-
[[Chromosome 17 (human)|17]] || 28 ||style="text-align: right;"| 83,257,441 ||style="text-align: right;"| 4,614,972 || 1197 || 531 || 834 || 235 || 61 || 15 || 80 || 71 || 99 || [https://archive.today/20130414171249/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=17 EBI] || 24 || 84.8
+
|  Chromosome 17 (human)|17 || 28 ||style="text-align: right;"| 83,257,441 ||style="text-align: right;"| 4,614,972 || 1197 || 531 || 834 || 235 || 61 || 15 || 80 || 71 || 99 || [https://archive.today/20130414171249/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=17 EBI] || 24 || 84.8
 
|-
 
|-
[[Chromosome 18 (human)|18]] || 27 ||style="text-align: right;"| 80,373,285 ||style="text-align: right;"| 4,035,966 || 270 || 247 || 453 || 109 || 32 || 13 || 51 || 36 || 41 || [https://archive.today/20130414160719/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=18 EBI] || 17.2 || 87.4
+
|  Chromosome 18 (human)|18 || 27 ||style="text-align: right;"| 80,373,285 ||style="text-align: right;"| 4,035,966 || 270 || 247 || 453 || 109 || 32 || 13 || 51 || 36 || 41 || [https://archive.today/20130414160719/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=18 EBI] || 17.2 || 87.4
 
|-
 
|-
[[Chromosome 19 (human)|19]] || 20 ||style="text-align: right;"| 58,617,616 ||style="text-align: right;"| 3,858,269 || 1472 || 512 || 628 || 179 || 110 || 13 || 29 || 31 || 61 || [https://archive.today/20130414165626/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=19 EBI] || 26.5 || 89.3
+
|  Chromosome 19 (human)|19 || 20 ||style="text-align: right;"| 58,617,616 ||style="text-align: right;"| 3,858,269 || 1472 || 512 || 628 || 179 || 110 || 13 || 29 || 31 || 61 || [https://archive.today/20130414165626/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=19 EBI] || 26.5 || 89.3
 
|-
 
|-
[[Chromosome 20 (human)|20]] || 21 ||style="text-align: right;"| 64,444,167 ||style="text-align: right;"| 3,439,621 || 544 || 249 || 384 || 131 || 57 || 15 || 46 || 37 || 68 || [https://archive.today/20130414185621/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=20 EBI] || 27.5 || 91.4
+
|  Chromosome 20 (human)|20 || 21 ||style="text-align: right;"| 64,444,167 ||style="text-align: right;"| 3,439,621 || 544 || 249 || 384 || 131 || 57 || 15 || 46 || 37 || 68 || [https://archive.today/20130414185621/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=20 EBI] || 27.5 || 91.4
 
|-
 
|-
[[Chromosome 21 (human)|21]] || 16 ||style="text-align: right;"| 46,709,983 ||style="text-align: right;"| 2,049,697 || 234 || 185 || 305 || 71 || 16 || 5 || 21 || 19 || 24 || [https://archive.today/20130414191700/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=21 EBI] || 13.2 || 92.6
+
|  Chromosome 21 (human)|21 || 16 ||style="text-align: right;"| 46,709,983 ||style="text-align: right;"| 2,049,697 || 234 || 185 || 305 || 71 || 16 || 5 || 21 || 19 || 24 || [https://archive.today/20130414191700/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=21 EBI] || 13.2 || 92.6
 
|-
 
|-
[[Chromosome 22 (human)|22]] || 17 ||style="text-align: right;"| 50,818,468 ||style="text-align: right;"| 2,135,311 || 488 || 324 || 357 || 78 || 31 || 5 || 23 || 23 || 62 || [https://archive.today/20130414213655/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=22 EBI] || 14.7 || 93.8
+
|  Chromosome 22 (human)|22 || 17 ||style="text-align: right;"| 50,818,468 ||style="text-align: right;"| 2,135,311 || 488 || 324 || 357 || 78 || 31 || 5 || 23 || 23 || 62 || [https://archive.today/20130414213655/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=22 EBI] || 14.7 || 93.8
 
|-
 
|-
[[X chromosome|X]] || 53 ||style="text-align: right;"| 156,040,895 ||style="text-align: right;"| 5,753,881 || 842 || 874 || 271 || 258 || 128 || 22 || 85 || 64 || 100 || [https://archive.today/20130414192751/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=X EBI] || 60.6 || 99.1
+
|  X chromosome|X || 53 ||style="text-align: right;"| 156,040,895 ||style="text-align: right;"| 5,753,881 || 842 || 874 || 271 || 258 || 128 || 22 || 85 || 64 || 100 || [https://archive.today/20130414192751/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=X EBI] || 60.6 || 99.1
 
|-
 
|-
[[Y chromosome|Y]] || 20 ||style="text-align: right;"| 57,227,415 ||style="text-align: right;"| 211,643 || 71 || 388 || 71 || 30 || 15 || 7 || 17 || 3 || 8 || [https://archive.today/20130414161928/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=Y EBI] || 10.4 || 100
+
|  Y chromosome|Y || 20 ||style="text-align: right;"| 57,227,415 ||style="text-align: right;"| 211,643 || 71 || 388 || 71 || 30 || 15 || 7 || 17 || 3 || 8 || [https://archive.today/20130414161928/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=Y EBI] || 10.4 || 100
 
|-
 
|-
[[Mitochondrial DNA|mtDNA]] || 0.0054 ||style="text-align: right;"| 16,569 ||style="text-align: right;"| 929 || 13 || 0 || 0 || 24 || 0 || 2 || 0 || 0 || 0 || [https://archive.today/20130414220526/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=MT EBI] || N/A || 100
+
|  Mitochondrial DNA|mtDNA || 0.0054 ||style="text-align: right;"| 16,569 ||style="text-align: right;"| 929 || 13 || 0 || 0 || 24 || 0 || 2 || 0 || 0 || 0 || [https://archive.today/20130414220526/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=MT EBI] || N/A || 100
 
|-
 
|-
  
Line 78: Line 77:
 
|}
 
|}
  
'''Table 1''' (above) summarizes the physical organization and gene content of the human [[reference genome]], with links to the original analysis, as published in the [[Ensembl]] database at the [[European Bioinformatics Institute]] (EBI) and [[Wellcome Trust Sanger Institute]]. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between base pairs in the [[DNA#Properties|DNA double helix]]. A recent estimation of human chromosome lengths based on updated data reports 205.00&nbsp;cm for the diploid male genome and 208.23&nbsp;cm for female, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively.<ref>{{cite journal | vauthors = Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L | title = On the length, weight and GC content of the human genome | journal = BMC Research Notes | volume = 12 | issue = 1 | pages = 106 | date = February 2019 | pmid = 30813969 | pmc = 6391780 | doi = 10.1186/s13104-019-4137-z }}</ref> The number of proteins is based on the number of initial [[precursor mRNA]] transcripts, and does not include products of [[Alternative splicing|alternative pre-mRNA splicing]], or modifications to protein structure that occur after [[Translation (biology)|translation]].
+
'''Table 1''' (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between base pairs in the DNA double helix. A recent estimation of human chromosome lengths based on updated data reports 205.00&nbsp;cm for the diploid male genome and 208.23&nbsp;cm for female, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing, or modifications to protein structure that occur after translation.
 +
 
 +
'''The number of genes''' in the human genome is not entirely clear because the function of numerous transcripts remains unclear. This is especially true for non-coding RNA (see below). The number of protein-coding genes is better known but there are still on the order of 1,400 questionable genes which may or may not encode functional proteins, usually encoded by short open reading frames. Table 2 gives estimates from various projects and shows these discrepancies.
  
'''The number of genes''' in the human genome is not entirely clear because the function of numerous [[Transcription (biology)|transcripts]] remains unclear. This is especially true for [[non-coding RNA]] (see below). The number of protein-coding genes is better known but there are still on the order of 1,400 questionable genes which may or may not encode functional proteins, usually encoded by short [[open reading frame]]s. Table 2 gives estimates from various projects and shows these discrepancies.
 
 
{| class="wikitable"
 
{| class="wikitable"
|+Table 2. Number of human genes in different databases as of July 2018<ref>{{cite journal | vauthors = Salzberg SL | title = Open questions: How many genes do we have? | journal = BMC Biology | volume = 16 | issue = 1 | pages = 94 | date = August 2018 | pmid = 30124169 | pmc = 6100717 | doi = 10.1186/s12915-018-0564-x }}</ref>
+
|+Table 2. Number of human genes in different databases as of July 2018
 
!
 
!
!Gencode<ref>{{Cite web|url=http://www.gencodegenes.org/stats/current.html|title=Gencode statistics, version 28|access-date=12 July 2018|archive-url=https://web.archive.org/web/20180302114250/http://www.gencodegenes.org/stats/current.html|archive-date=2 March 2018|url-status=dead}}</ref>
+
!Gencode
!Ensemble<ref>{{Cite web|url=http://ensembl.org/Homo_sapiens/Info/Annotation|title=Ensemble statistics for version 92.38, corresponding to Gencode v28 |access-date=12 July 2018}}</ref>
+
!Ensemble
!Refseq<ref>{{Cite web|url=http://www.ncbi.nlm.nih.gov/genome/annotation_euk/Homo_sapiens/108/|title=NCBI Homo sapiens Annotation Release 108 |access-date=|date = 2016|publisher = NIH}}</ref>
+
!Refseq
!CHESS<ref>{{Cite web|url=http://ccb.jhu.edu/chess|title=CHESS statistics, version 2.0 |access-date=|publisher = Johns Hopkins University|website = Center for Computational Biology}}</ref>
+
!CHESS
 
|-
 
|-
 
|protein-coding genes
 
|protein-coding genes
Line 125: Line 125:
 
|328,827
 
|328,827
 
|}
 
|}
[[Human genetic variation|Variations]] are unique DNA sequence differences that have been identified in the individual human genome sequences analyzed by Ensembl as of December 2016. The number of identified variations is expected to increase as further [[Personal genomics|personal genomes]] are sequenced and analyzed. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequences in the EBI genome browser.
 
  
Small [[non-coding RNA]]s are RNAs of as many as 200 bases that do not have protein-coding potential. These include: [[microRNA]]s, or miRNAs (post-transcriptional regulators of gene expression), [[small nuclear RNA]]s, or snRNAs (the RNA components of [[spliceosome]]s), and [[small nucleolar RNA]]s, or snoRNA (involved in guiding chemical modifications to other RNA molecules). [[Long non-coding RNA]]s are RNA molecules longer than 200 bases that do not have protein-coding potential. These include: [[ribosomal RNA]]s, or rRNAs (the RNA components of [[ribosome]]s), and a variety of other long RNAs that are involved in [[regulation of gene expression]], [[epigenetic]] modifications of DNA nucleotides and [[histone]] proteins, and regulation of the activity of protein-coding genes. Small discrepancies between total-small-ncRNA numbers and the numbers of specific types of small ncNRAs result from the former values being sourced from Ensembl release 87 and the latter from Ensembl release 68.
 
  
===Information content===
+
Human genetic variations are unique DNA sequence differences that have been identified in the individual human genome sequences analyzed by Ensembl as of December 2016. The number of identified variations is expected to increase as further Personal genomics|personal genomes are sequenced and analyzed. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequences in the EBI genome browser.
  
[[File:Genes and base pairs on chromosomes.svg|thumb|right|400px|Diagram showing the number of base pairs on each chromosome in green.]]
+
Small non-coding RNAs are RNAs of as many as 200 bases that do not have protein-coding potential. These include: microRNAs, or miRNAs (post-transcriptional regulators of gene expression), small nuclear RNAs, or snRNAs (the RNA components of spliceosomes), and small nucleolar RNAs, or snoRNA (involved in guiding chemical modifications to other RNA molecules). Long non-coding RNAs are RNA molecules longer than 200 bases that do not have protein-coding potential. These include: ribosomal RNAs, or rRNAs (the RNA components of ribosomes), and a variety of other long RNAs that are involved in regulation of gene expression, epigenetic modifications of DNA nucleotides and histone proteins, and regulation of the activity of protein-coding genes. Small discrepancies between total-small-ncRNA numbers and the numbers of specific types of small ncNRAs result from the former values being sourced from Ensembl release 87 and the latter from Ensembl release 68.
The [[haploid]] human genome (23 [[chromosomes]]) is about 3 billion base pairs long and contains around 30,000 genes.<ref>{{Cite web|url=https://www.genome.gov/11006943/human-genome-project-completion-frequently-asked-questions/|title=Human Genome Project Completion: Frequently Asked Questions|website=National Human Genome Research Institute (NHGRI)|language=en-US|access-date=2019-02-02}}</ref> Since every base pair can be coded by 2 bits, this is about 750 [[megabyte]]s of data. An individual somatic ([[diploid]]) cell contains twice this amount, that is, about 6 billion base pairs. Men have fewer than women because the Y chromosome is about 57 million base pairs whereas the X is about 156 million. Since individual genomes vary in sequence by less than 1% from each other, the variations of a given human's genome from a common reference can be [[Lossless data compression|losslessly compressed]] to roughly 4 megabytes.<ref name="Christley">{{cite journal | vauthors = Christley S, Lu Y, Li C, Xie X | title = Human genomes as email attachments | journal = Bioinformatics | volume = 25 | issue = 2 | pages = 274–5 | date = January 2009 | pmid = 18996942 | doi = 10.1093/bioinformatics/btn582 | doi-access = free }}</ref>
 
  
The [[entropy rate]] of the genome differs significantly between coding and non-coding sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about 45 million base pairs), but less for the non-coding parts. It ranges between 1.5 and 1.9 bits per base pair for the individual chromosome, except for the Y-chromosome, which has an entropy rate below 0.9 bits per base pair.<ref name="Liu">{{cite journal | doi = 10.1186/1471-2164-9-509 | volume=9 | title=Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples | year=2008 | journal=BMC Genomics | page=509 | author=Liu Z| pmid=18973670 | pmc=2628393 }}, fig. 6, using the [[Lempel-Ziv]] estimators of entropy rate.</ref>
+
The entropy rate of the genome differs significantly between coding and non-coding sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about 45 million base pairs), but less for the non-coding parts. It ranges between 1.5 and 1.9 bits per base pair for the individual chromosome, except for the Y-chromosome, which has an entropy rate below 0.9 bits per base pair.
{{clear right}}
 

Revision as of 22:17, 28 October 2020


proband

A proband is an individual serving as the starting point for the genetic study of a family (used especially in medicine). A proband is usually the first affected individual in a family who brings a genetic disorder to the attention of the medical community.


trio analysis

A trio consists of two parents and one offspring (2 + 1 = 3, hence trio). In medical genetics, trio analysis usually means analysing a proband's genome together with the genomes of both parents. An exome trio-based approach is fundamental to identifying heterozygous dominant pathogenic variants (in an affected proband with unaffected parents).
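
The filtering idea behind trio analysis can be sketched in a few lines of Python. This is a minimal illustration, not a clinical pipeline. The genotype encoding (pairs of 0 for the reference allele and 1 for the alternate allele), the variant identifiers and the function name `proband_only_heterozygous` are all hypothetical, and the sketch assumes that "unaffected parents" means both parents carry only reference alleles at the site.

```python
from typing import Dict, Tuple

# Hypothetical encoding: a genotype is a pair of alleles,
# 0 = reference allele, 1 = alternate allele.
Genotype = Tuple[int, int]


def proband_only_heterozygous(proband: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Return True if the proband is heterozygous and neither parent carries the alternate allele."""
    proband_het = sorted(proband) == [0, 1]
    parents_ref = mother == (0, 0) and father == (0, 0)
    return proband_het and parents_ref


# Made-up calls for one trio: variant id -> (proband, mother, father).
trio_calls: Dict[str, Tuple[Genotype, Genotype, Genotype]] = {
    "var_a": ((0, 1), (0, 0), (0, 0)),  # present only in the proband
    "var_b": ((0, 1), (0, 1), (0, 0)),  # inherited from the mother
    "var_c": ((1, 1), (0, 1), (0, 1)),  # homozygous in the proband
}

candidates = [
    variant_id
    for variant_id, (proband, mother, father) in trio_calls.items()
    if proband_only_heterozygous(proband, mother, father)
]
print(candidates)  # ['var_a']
```

A real analysis would also need to account for sequencing depth, genotype quality and sample identity checks, none of which are modelled here.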



Molecular organization and gene content

The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed autosomes, plus a 23rd pair of sex chromosomes: XX in the female and XY in the male. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table.

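{| class="wikitable sortable"
|-
! Chromosome !! Length (mm) !! Base pairs !! Variations !! Protein-coding genes !! Pseudogenes !! Total long ncRNA !! Total small ncRNA !! miRNA !! rRNA !! snRNA !! snoRNA !! Misc ncRNA !! Links !! Centromere position (Mbp) !! Cumulative (%)
|-
| 1 || 85 || 248,956,422 || 12,151,146 || 2058 || 1220 || 1200 || 496 || 134 || 66 || 221 || 145 || 192 || EBI || 125 || 7.9
|-
| 2 || 83 || 242,193,529 || 12,945,965 || 1309 || 1023 || 1037 || 375 || 115 || 40 || 161 || 117 || 176 || EBI || 93.3 || 16.2
|-
| 3 || 67 || 198,295,559 || 10,638,715 || 1078 || 763 || 711 || 298 || 99 || 29 || 138 || 87 || 134 || EBI || 91 || 23
|-
| 4 || 65 || 190,214,555 || 10,165,685 || 752 || 727 || 657 || 228 || 92 || 24 || 120 || 56 || 104 || EBI || 50.4 || 29.6
|-
| 5 || 62 || 181,538,259 || 9,519,995 || 876 || 721 || 844 || 235 || 83 || 25 || 106 || 61 || 119 || EBI || 48.4 || 35.8
|-
| 6 || 58 || 170,805,979 || 9,130,476 || 1048 || 801 || 639 || 234 || 81 || 26 || 111 || 73 || 105 || EBI || 61 || 41.6
|-
| 7 || 54 || 159,345,973 || 8,613,298 || 989 || 885 || 605 || 208 || 90 || 24 || 90 || 76 || 143 || EBI || 59.9 || 47.1
|-
| 8 || 50 || 145,138,636 || 8,221,520 || 677 || 613 || 735 || 214 || 80 || 28 || 86 || 52 || 82 || EBI || 45.6 || 52
|-
| 9 || 48 || 138,394,717 || 6,590,811 || 786 || 661 || 491 || 190 || 69 || 19 || 66 || 51 || 96 || EBI || 49 || 56.3
|-
| 10 || 46 || 133,797,422 || 7,223,944 || 733 || 568 || 579 || 204 || 64 || 32 || 87 || 56 || 89 || EBI || 40.2 || 60.9
|-
| 11 || 46 || 135,086,622 || 7,535,370 || 1298 || 821 || 710 || 233 || 63 || 24 || 74 || 76 || 97 || EBI || 53.7 || 65.4
|-
| 12 || 45 || 133,275,309 || 7,228,129 || 1034 || 617 || 848 || 227 || 72 || 27 || 106 || 62 || 115 || EBI || 35.8 || 70
|-
| 13 || 39 || 114,364,328 || 5,082,574 || 327 || 372 || 397 || 104 || 42 || 16 || 45 || 34 || 75 || EBI || 17.9 || 73.4
|-
| 14 || 36 || 107,043,718 || 4,865,950 || 830 || 523 || 533 || 239 || 92 || 10 || 65 || 97 || 79 || EBI || 17.6 || 76.4
|-
| 15 || 35 || 101,991,189 || 4,515,076 || 613 || 510 || 639 || 250 || 78 || 13 || 63 || 136 || 93 || EBI || 19 || 79.3
|-
| 16 || 31 || 90,338,345 || 5,101,702 || 873 || 465 || 799 || 187 || 52 || 32 || 53 || 58 || 51 || EBI || 36.6 || 82
|-
| 17 || 28 || 83,257,441 || 4,614,972 || 1197 || 531 || 834 || 235 || 61 || 15 || 80 || 71 || 99 || EBI || 24 || 84.8
|-
| 18 || 27 || 80,373,285 || 4,035,966 || 270 || 247 || 453 || 109 || 32 || 13 || 51 || 36 || 41 || EBI || 17.2 || 87.4
|-
| 19 || 20 || 58,617,616 || 3,858,269 || 1472 || 512 || 628 || 179 || 110 || 13 || 29 || 31 || 61 || EBI || 26.5 || 89.3
|-
| 20 || 21 || 64,444,167 || 3,439,621 || 544 || 249 || 384 || 131 || 57 || 15 || 46 || 37 || 68 || EBI || 27.5 || 91.4
|-
| 21 || 16 || 46,709,983 || 2,049,697 || 234 || 185 || 305 || 71 || 16 || 5 || 21 || 19 || 24 || EBI || 13.2 || 92.6
|-
| 22 || 17 || 50,818,468 || 2,135,311 || 488 || 324 || 357 || 78 || 31 || 5 || 23 || 23 || 62 || EBI || 14.7 || 93.8
|-
| X || 53 || 156,040,895 || 5,753,881 || 842 || 874 || 271 || 258 || 128 || 22 || 85 || 64 || 100 || EBI || 60.6 || 99.1
|-
| Y || 20 || 57,227,415 || 211,643 || 71 || 388 || 71 || 30 || 15 || 7 || 17 || 3 || 8 || EBI || 10.4 || 100
|-
| mtDNA || 0.0054 || 16,569 || 929 || 13 || 0 || 0 || 24 || 0 || 2 || 0 || 0 || 0 || EBI || N/A || 100
|-
| total || || 3,088,286,401 || 155,630,645 || 20412 || 14600 || 14727 || 5037 || 1756 || 532 || 1944 || 1521 || 2213 || || ||
|}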

Table 1 (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between base pairs in the DNA double helix. A recent estimation of human chromosome lengths based on updated data reports 205.00 cm for the diploid male genome and 208.23 cm for female, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing, or modifications to protein structure that occur after translation.
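
The length estimate described above is a single multiplication, sketched here in Python. The function and constant names are illustrative assumptions; the 0.34 nm spacing and the chromosome base-pair counts come from the text and Table 1.

```python
NM_PER_BASE_PAIR = 0.34   # distance between adjacent base pairs, as stated in the text
NM_PER_MM = 1_000_000     # nanometers in one millimetre


def chromosome_length_mm(base_pairs: int) -> float:
    """Estimate the stretched-out length of a DNA molecule in millimetres."""
    return base_pairs * NM_PER_BASE_PAIR / NM_PER_MM


# Base-pair counts taken from Table 1.
print(round(chromosome_length_mm(248_956_422)))  # chromosome 1 -> 85
print(round(chromosome_length_mm(58_617_616)))   # chromosome 19 -> 20
```

Not every length in Table 1 is reproduced exactly by this calculation, so the sketch should be read as an illustration of the method rather than as the source of the tabulated values.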

The number of genes in the human genome is not precisely known because the function of numerous transcripts remains unclear. This is especially true for non-coding RNA (see below). The number of protein-coding genes is better established, but there are still on the order of 1,400 questionable genes, usually encoded by short open reading frames, which may or may not encode functional proteins. Table 2 gives estimates from various projects and shows these discrepancies.

Table 2. Number of human genes in different databases as of July 2018
Gencode Ensembl Refseq CHESS
protein-coding genes 19,901 20,376 20,345 21,306
lncRNA genes 15,779 14,720 17,712 18,484
antisense RNA 5501 28 2694
miscellaneous RNA 2213 2222 13,899 4347
Pseudogenes 14,723 1740 15,952
total transcripts 203,835 203,903 154,484 328,827


Human genetic variations are unique DNA sequence differences that have been identified in the individual human genome sequences analyzed by Ensembl as of December 2016. The number of identified variations is expected to increase as further personal genomes are sequenced and analyzed. In addition to the gene content shown in Table 1, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequences in the EBI genome browser.

Small non-coding RNAs are RNAs of up to 200 bases that do not have protein-coding potential. These include microRNAs, or miRNAs (post-transcriptional regulators of gene expression), small nuclear RNAs, or snRNAs (the RNA components of spliceosomes), and small nucleolar RNAs, or snoRNAs (involved in guiding chemical modifications to other RNA molecules). Long non-coding RNAs are RNA molecules longer than 200 bases that do not have protein-coding potential. These include ribosomal RNAs, or rRNAs (the RNA components of ribosomes), and a variety of other long RNAs that are involved in regulation of gene expression, epigenetic modifications of DNA nucleotides and histone proteins, and regulation of the activity of protein-coding genes. Small discrepancies between the total small ncRNA numbers and the numbers of specific types of small ncRNAs arise because the former values are sourced from Ensembl release 87 and the latter from Ensembl release 68.

The entropy rate of the genome differs significantly between coding and non-coding sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about 45 million base pairs), but lower for the non-coding parts. It ranges between 1.5 and 1.9 bits per base pair for individual chromosomes, except for the Y chromosome, which has an entropy rate below 0.9 bits per base pair.
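
The 2-bit ceiling follows from there being four possible bases. The Python sketch below computes a single-symbol (zero-order) Shannon entropy, which is a deliberately simpler measure than an entropy rate estimated over long sequences; it only illustrates why 2 bits per base is the maximum. The function name and the toy sequences are illustrative assumptions.

```python
import math
from collections import Counter


def bits_per_base(sequence: str) -> float:
    """Zero-order Shannon entropy of a sequence, in bits per symbol."""
    counts = Counter(sequence)
    total = len(sequence)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


print(bits_per_base("ACGT" * 5))  # all four bases equally frequent -> 2.0
print(bits_per_base("AACC"))      # only two bases, equally frequent -> 1.0
print(bits_per_base("AAAA"))      # a single repeated base -> 0.0
```

Because this measure ignores the order of the bases, it cannot detect repeats or longer-range structure, which is what lowers the entropy rate of real non-coding sequence.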