Genomics Terminology: Difference between revisions
Bradley Monk (talk | contribs) (Created page with " === proband === A proband is an individual serving as the starting point for the genetic study of a family (used especially in medicine). A proband is usually the first af...") |
== gnomad ==
The '''G'''enome '''A'''ggregation '''D'''atabase ('''gnomAD''' or '''gnomad''') is a resource developed with the goal of aggregating both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available to the wider scientific community. Version 2 (v2) of the gnomad dataset (GRCh37) spans 125,748 exome sequences and 15,708 whole-genome sequences from unrelated individuals sequenced as part of various disease-specific and population genetic studies. The version 3 (v3) dataset (GRCh38) spans 71,702 genomes. All data is released without restriction on use.
* [https://gnomad.broadinstitute.org '''gnomad''' (Broad Institute) homepage]
* [https://console.cloud.google.com/marketplace/product/broad-institute/gnomad gnomad on Google BigQuery]
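As a rough illustration of what "summary data" looks like in practice, the sketch below parses one tab-separated VCF record and recomputes the allele frequency from the allele count (AC) and allele number (AN). It assumes the record carries <code>AC</code>, <code>AN</code> and <code>AF</code> keys in its INFO column; the record itself and the helper name <code>parse_info</code> are made up for illustration and are not taken from a gnomAD release.

```python
def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column ('KEY=VALUE;FLAG;...') into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        # Keys without a value are flags; store them as True.
        info[key] = value if value else True
    return info


# Hypothetical record (not a real gnomAD variant). The eight fixed VCF
# columns are: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
line = "1\t12345\t.\tA\tG\t.\tPASS\tAC=3;AN=250000;AF=1.2e-05"

chrom, pos, _id, ref, alt, _qual, filt, info_field = line.split("\t")[:8]
info = parse_info(info_field)

# Allele frequency is the alternate allele count divided by the
# total number of alleles called at this site.
allele_frequency = int(info["AC"]) / int(info["AN"])
print(chrom, pos, ref, alt, allele_frequency)
```

For the full per-chromosome files a dedicated VCF reader is likely a better fit; this only shows how the summary fields relate to each other.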
== proband ==
A proband is an individual serving as the starting point for the genetic study of a family (used especially in medicine). A proband is usually the first affected individual in a family who brings a genetic disorder to the attention of the medical community.
== trio analysis ==
A trio refers to 2 parents + 1 offspring (2 + 1 = 3, hence trio). In medical genetics, trio analysis usually means analyzing a proband's genome together with the genomes of both parents. An exome trio-based approach is fundamental to identifying heterozygous dominant pathogenic variants that arise de novo, that is, variants present in an affected proband but absent from both unaffected parents.
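The core comparison in a trio analysis can be sketched as a simple filter: keep variants that are heterozygous in the proband and homozygous reference in both parents. The genotype encoding (number of alternate alleles), the function name <code>de_novo_candidates</code> and the variant identifiers below are all hypothetical; a real pipeline would also apply depth and quality filters that are omitted here.

```python
from typing import Dict, List

# Genotypes are encoded as the number of alternate alleles:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
Genotypes = Dict[str, int]


def de_novo_candidates(proband: Genotypes,
                       mother: Genotypes,
                       father: Genotypes) -> List[str]:
    """Return variant IDs that are heterozygous in the proband and
    homozygous reference in both parents.

    Variants with a missing parental genotype are skipped, because
    absence of a call is not evidence that the parent lacks the allele.
    """
    hits = []
    for variant_id, genotype in proband.items():
        if (genotype == 1
                and mother.get(variant_id) == 0
                and father.get(variant_id) == 0):
            hits.append(variant_id)
    return sorted(hits)


# Made-up example calls for one trio.
proband = {"var_a": 1, "var_b": 1, "var_c": 2, "var_d": 1}
mother = {"var_a": 0, "var_b": 1, "var_c": 1, "var_d": 0}
father = {"var_a": 0, "var_b": 0, "var_c": 1}

print(de_novo_candidates(proband, mother, father))
```

Here only <code>var_a</code> passes: <code>var_b</code> is inherited from the mother, <code>var_c</code> is homozygous in the proband, and <code>var_d</code> has no call in the father.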
== Hail ==
[https://hail.is/docs/0.2/index.html Hail] is an open-source library for scalable data exploration and analysis, with a particular emphasis on genomics. See the overview for a high-level walkthrough of the library, the GWAS tutorial for a simple example of conducting a genome-wide association study, and the installation page to get started using Hail.
* [https://hail.is/docs/0.2/index.html Hail homepage]
* [https://github.com/danking/hail-cloud-docs/blob/master/how-to-cloud.md Hail - ''how to cloud'']
== GATK ==
The '''G'''enome '''A'''nalysis '''T'''ool'''K'''it (GATK) is a genomic analysis toolkit focused on variant discovery. GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope also includes somatic short variant calling, as well as copy number variation (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK includes many utilities for related tasks such as processing and quality control of high-throughput sequencing data, and it bundles the popular Picard toolkit.
These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. Although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.
* [https://gatk.broadinstitute.org/hc/en-us GATK homepage]
* [https://gatk.broadinstitute.org/hc/en-us/community/topics GATK community topics]
== human genome ==
The total length of the [https://en.wikipedia.org/wiki/Human_genome human genome] is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed [https://en.wikipedia.org/wiki/autosome autosomes], plus the 23rd pair of sex chromosomes: XX in the female and XY in the male. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table.
{| class="wikitable sortable" style="font-size:75%;"
|-
! Chromosome !! Length<br />(mm) !! Base<br />pairs !! Variations !! Protein<br />coding<br />genes !! Pseudo-<br />genes !! long<br />ncRNA !! small<br />ncRNA !! miRNA !! rRNA !! snRNA !! snoRNA !! gnomAD<br />exome.vcf !! Links !! Centromere<br />pos (Mbp) !! Cumulative<br />(%)
|-
| [https://en.wikipedia.org/wiki/Chromosome_1 1] || 85 ||style="text-align: right;"| 248,956,422 ||style="text-align: right;"| 12,151,146 || 2058 || 1220 || 1200 || 496 || 134 || 66 || 221 || 145 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.1.vcf.bgz 5.77 GiB] || [https://archive.today/20130414235101/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=1 EBI] || 125 || 7.9
|-
| [https://en.wikipedia.org/wiki/Chromosome_2 2] || 83 ||style="text-align: right;"| 242,193,529 ||style="text-align: right;"| 12,945,965 || 1309 || 1023 || 1037 || 375 || 115 || 40 || 161 || 117 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.2.vcf.bgz 4.20 GiB] || [https://archive.today/20130414170207/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=2 EBI] || 93.3 || 16.2
|-
| [https://en.wikipedia.org/wiki/Chromosome_3 3] || 67 ||style="text-align: right;"| 198,295,559 ||style="text-align: right;"| 10,638,715 || 1078 || 763 || 711 || 298 || 99 || 29 || 138 || 87 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.3.vcf.bgz 3.29 GiB] || [https://archive.today/20130414155057/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=3 EBI] || 91 || 23
|-
| [https://en.wikipedia.org/wiki/Chromosome_4 4] || 65 ||style="text-align: right;"| 190,214,555 ||style="text-align: right;"| 10,165,685 || 752 || 727 || 657 || 228 || 92 || 24 || 120 || 56 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.4.vcf.bgz 2.17 GiB] || [https://archive.today/20130414184734/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=4 EBI] || 50.4 || 29.6
|-
| [https://en.wikipedia.org/wiki/Chromosome_5 5] || 62 ||style="text-align: right;"| 181,538,259 ||style="text-align: right;"| 9,519,995 || 876 || 721 || 844 || 235 || 83 || 25 || 106 || 61 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.5.vcf.bgz 2.51 GiB] || [https://archive.today/20130414165438/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=5 EBI] || 48.4 || 35.8
|-
| [https://en.wikipedia.org/wiki/Chromosome_6 6] || 58 ||style="text-align: right;"| 170,805,979 ||style="text-align: right;"| 9,130,476 || 1048 || 801 || 639 || 234 || 81 || 26 || 111 || 73 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.6.vcf.bgz 2.83 GiB] || [https://archive.today/20130414210620/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=6 EBI] || 61 || 41.6
|-
| [https://en.wikipedia.org/wiki/Chromosome_7 7] || 54 ||style="text-align: right;"| 159,345,973 ||style="text-align: right;"| 8,613,298 || 989 || 885 || 605 || 208 || 90 || 24 || 90 || 76 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.7.vcf.bgz 2.88 GiB] || [https://archive.today/20130414191348/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=7 EBI] || 59.9 || 47.1
|-
| [https://en.wikipedia.org/wiki/Chromosome_8 8] || 50 ||style="text-align: right;"| 145,138,636 ||style="text-align: right;"| 8,221,520 || 677 || 613 || 735 || 214 || 80 || 28 || 86 || 52 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.8.vcf.bgz 2.13 GiB] || [https://archive.today/20130414151536/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=8 EBI] || 45.6 || 52
|-
| [https://en.wikipedia.org/wiki/Chromosome_9 9] || 48 ||style="text-align: right;"| 138,394,717 ||style="text-align: right;"| 6,590,811 || 786 || 661 || 491 || 190 || 69 || 19 || 66 || 51 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.9.vcf.bgz 2.40 GiB] || [https://archive.today/20130414154313/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=9 EBI] || 49 || 56.3
|-
| [https://en.wikipedia.org/wiki/Chromosome_10 10] || 46 ||style="text-align: right;"| 133,797,422 ||style="text-align: right;"| 7,223,944 || 733 || 568 || 579 || 204 || 64 || 32 || 87 || 56 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.10.vcf.bgz 2.23 GiB] || [https://archive.today/20130414155104/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=10 EBI] || 40.2 || 60.9
|-
| [https://en.wikipedia.org/wiki/Chromosome_11 11] || 46 ||style="text-align: right;"| 135,086,622 ||style="text-align: right;"| 7,535,370 || 1298 || 821 || 710 || 233 || 63 || 24 || 74 || 76 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.11.vcf.bgz 3.61 GiB] || [https://archive.today/20130414155450/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=11 EBI] || 53.7 || 65.4
|-
| [https://en.wikipedia.org/wiki/Chromosome_12 12] || 45 ||style="text-align: right;"| 133,275,309 ||style="text-align: right;"| 7,228,129 || 1034 || 617 || 848 || 227 || 72 || 27 || 106 || 62 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.12.vcf.bgz 3.07 GiB] || [https://archive.today/20130414163842/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=12 EBI] || 35.8 || 70
|-
| [https://en.wikipedia.org/wiki/Chromosome_13 13] || 39 ||style="text-align: right;"| 114,364,328 ||style="text-align: right;"| 5,082,574 || 327 || 372 || 397 || 104 || 42 || 16 || 45 || 34 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.13.vcf.bgz 0.98 GiB] || [https://archive.today/20130414153908/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=13 EBI] || 17.9 || 73.4
|-
| [https://en.wikipedia.org/wiki/Chromosome_14 14] || 36 ||style="text-align: right;"| 107,043,718 ||style="text-align: right;"| 4,865,950 || 830 || 523 || 533 || 239 || 92 || 10 || 65 || 97 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.14.vcf.bgz 2.02 GiB] || [https://archive.today/20130414221716/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=14 EBI] || 17.6 || 76.4
|-
| [https://en.wikipedia.org/wiki/Chromosome_15 15] || 35 ||style="text-align: right;"| 101,991,189 ||style="text-align: right;"| 4,515,076 || 613 || 510 || 639 || 250 || 78 || 13 || 63 || 136 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.15.vcf.bgz 2.08 GiB] || [https://archive.today/20130414185000/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=15 EBI] || 19 || 79.3
|-
| [https://en.wikipedia.org/wiki/Chromosome_16 16] || 31 ||style="text-align: right;"| 90,338,345 ||style="text-align: right;"| 5,101,702 || 873 || 465 || 799 || 187 || 52 || 32 || 53 || 58 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.16.vcf.bgz 3.04 GiB] || [https://archive.today/20130414182905/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=16 EBI] || 36.6 || 82
|-
| [https://en.wikipedia.org/wiki/Chromosome_17 17] || 28 ||style="text-align: right;"| 83,257,441 ||style="text-align: right;"| 4,614,972 || 1197 || 531 || 834 || 235 || 61 || 15 || 80 || 71 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.17.vcf.bgz 3.62 GiB] || [https://archive.today/20130414171249/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=17 EBI] || 24 || 84.8
|-
| [https://en.wikipedia.org/wiki/Chromosome_18 18] || 27 ||style="text-align: right;"| 80,373,285 ||style="text-align: right;"| 4,035,966 || 270 || 247 || 453 || 109 || 32 || 13 || 51 || 36 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.18.vcf.bgz 0.88 GiB] || [https://archive.today/20130414160719/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=18 EBI] || 17.2 || 87.4
|-
| [https://en.wikipedia.org/wiki/Chromosome_19 19] || 20 ||style="text-align: right;"| 58,617,616 ||style="text-align: right;"| 3,858,269 || 1472 || 512 || 628 || 179 || 110 || 13 || 29 || 31 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.19.vcf.bgz 4.30 GiB] || [https://archive.today/20130414165626/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=19 EBI] || 26.5 || 89.3
|-
| [https://en.wikipedia.org/wiki/Chromosome_20 20] || 21 ||style="text-align: right;"| 64,444,167 ||style="text-align: right;"| 3,439,621 || 544 || 249 || 384 || 131 || 57 || 15 || 46 || 37 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.20.vcf.bgz 1.44 GiB] || [https://archive.today/20130414185621/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=20 EBI] || 27.5 || 91.4
|-
| [https://en.wikipedia.org/wiki/Chromosome_21 21] || 16 ||style="text-align: right;"| 46,709,983 ||style="text-align: right;"| 2,049,697 || 234 || 185 || 305 || 71 || 16 || 5 || 21 || 19 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.21.vcf.bgz 0.65 GiB] || [https://archive.today/20130414191700/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=21 EBI] || 13.2 || 92.6
|-
| [https://en.wikipedia.org/wiki/Chromosome_22 22] || 17 ||style="text-align: right;"| 50,818,468 ||style="text-align: right;"| 2,135,311 || 488 || 324 || 357 || 78 || 31 || 5 || 23 || 23 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.22.vcf.bgz 1.43 GiB] || [https://archive.today/20130414213655/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=22 EBI] || 14.7 || 93.8
|-
| [https://en.wikipedia.org/wiki/X_chromosome X] || 53 ||style="text-align: right;"| 156,040,895 ||style="text-align: right;"| 5,753,881 || 842 || 874 || 271 || 258 || 128 || 22 || 85 || 64 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.X.vcf.bgz 1.33 GiB] || [https://archive.today/20130414192751/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=X EBI] || 60.6 || 99.1
|-
| [https://en.wikipedia.org/wiki/Y_chromosome Y] || 20 ||style="text-align: right;"| 57,227,415 ||style="text-align: right;"| 211,643 || 71 || 388 || 71 || 30 || 15 || 7 || 17 || 3 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.Y.vcf.bgz 15.66 GiB] || [https://archive.today/20130414161928/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=Y EBI] || 10.4 || 100
|-
| [https://en.wikipedia.org/wiki/Category:Human_mitochondrial_genes mtDNA] || 0.0054 ||style="text-align: right;"| 16,569 ||style="text-align: right;"| 929 || 13 || 0 || 0 || 24 || 0 || 2 || 0 || 0 || N/A || [https://archive.today/20130414220526/http://useast.ensembl.org/Homo_sapiens/Location/Chromosome?r=MT EBI] || N/A || 100
|-
| [https://en.wikipedia.org/wiki/Category:Genes_by_human_chromosome total] || ||style="text-align: right;"| 3,088,286,401 ||style="text-align: right;"| 155,630,645 || 20412 || 14600 || 14727 || 5037 || 1756 || 532 || 1944 || 1521 || [https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.vcf.bgz 58.81 GiB] || || ||
|}
'''Table 1''' (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between adjacent base pairs in the DNA double helix. A more recent estimate of human chromosome lengths, based on updated data, reports 205.00 cm for the diploid male genome and 208.23 cm for the female, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing or modifications to protein structure that occur after translation.
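The length estimate described for Table 1 is a single multiplication, sketched below for chromosome 1. The constant and function names are chosen here for illustration. Results for some other rows may differ slightly from the tabulated values, so treat this as a check of the method and not as a way to regenerate the table.

```python
# Distance between adjacent base pairs in the DNA double helix, in nanometers.
NM_PER_BASE_PAIR = 0.34
NM_PER_MM = 1_000_000


def chromosome_length_mm(base_pairs: int) -> float:
    """Estimate the stretched-out length of a DNA molecule in millimeters."""
    return base_pairs * NM_PER_BASE_PAIR / NM_PER_MM


# Base-pair count for chromosome 1, taken from Table 1.
chr1_mm = chromosome_length_mm(248_956_422)
print(round(chr1_mm, 1))  # about 84.6 mm, which rounds to the 85 mm in Table 1
```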
== human genes (count) ==
'''The number of genes''' in the human genome (see: [https://en.wikipedia.org/wiki/Category:Genes_by_human_chromosome '''full gene list''']) is not entirely clear because the function of numerous transcripts remains unclear. This is especially true for non-coding RNA (see below). The number of protein-coding genes is better known, but on the order of 1,400 questionable genes remain that may or may not encode functional proteins; these are usually encoded by short open reading frames. Table 2 gives estimates from various projects and shows these discrepancies.
{| class="wikitable"
|+Table 2. Number of human genes<br />(according to different databases)
!
!Gencode
!Ensembl
!RefSeq
!CHESS
|-
|protein-coding genes
|19,901
|20,376
|20,345
|21,306
|-
|lncRNA genes
|15,779
|14,720
|17,712
|18,484
|-
|antisense RNA
|5501
|
|28
|2694
|-
|miscellaneous RNA
|2213
|2222
|13,899
|4347
|-
|Pseudogenes
|14,723
|1740
|15,952
|
|-
|total transcripts
|203,835
|203,903
|154,484
|328,827
|}
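To put a number on the disagreement between databases, the short sketch below computes the spread of the protein-coding gene counts listed in Table 2. The dictionary simply restates the table row; the variable names are illustrative.

```python
# Protein-coding gene counts per database, copied from Table 2.
protein_coding = {
    "Gencode": 19_901,
    "Ensembl": 20_376,
    "RefSeq": 20_345,
    "CHESS": 21_306,
}

lowest = min(protein_coding, key=protein_coding.get)
highest = max(protein_coding, key=protein_coding.get)

# Difference between the largest and smallest estimate.
spread = protein_coding[highest] - protein_coding[lowest]
print(f"{lowest} to {highest}: spread of {spread} genes")
```

The spread between the lowest and highest estimate is 1,405 genes, the same order of magnitude as the number of questionable protein-coding genes mentioned above.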
Latest revision as of 00:22, 29 October 2020