Genomics Terminology
proband
A proband is an individual serving as the starting point for the genetic study of a family (used especially in medicine). A proband is usually the first affected individual in a family who brings a genetic disorder to the attention of the medical community.
trio analysis
A trio consists of two parents and one offspring (2 + 1 = 3, hence trio). In medical genetics, trio analysis usually means analyzing a proband's genome together with the genomes of both parents. An exome trio-based approach is fundamental to identifying heterozygous dominant pathogenic variants, such as de novo variants that are present in an affected proband but absent from both unaffected parents.
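The filtering idea behind a trio comparison can be sketched in a few lines of Python. This is a minimal illustration, not a clinical pipeline: the `TrioCall` record, the helper names, and the variant labels are hypothetical, genotypes are assumed to be VCF-style strings such as `0/1`, and the sketch ignores call quality, read depth, sex chromosomes, and recessive or compound-heterozygous inheritance.

```python
from typing import List, NamedTuple, Set


class TrioCall(NamedTuple):
    """Genotypes of one variant in a trio (hypothetical record layout)."""
    variant: str   # illustrative label, not a real variant
    proband: str   # VCF-style genotype, e.g. "0/1"
    mother: str
    father: str


def alleles(genotype: str) -> Set[str]:
    """Split a genotype such as '0/1' or '0|1' into its set of alleles."""
    return set(genotype.replace("|", "/").split("/"))


def is_heterozygous(genotype: str) -> bool:
    """True when two different alleles were called (no missing '.' allele)."""
    found = alleles(genotype)
    return len(found) == 2 and "." not in found


def is_hom_ref(genotype: str) -> bool:
    """True when only the reference allele '0' was called."""
    return alleles(genotype) == {"0"}


def de_novo_candidates(calls: List[TrioCall]) -> List[str]:
    """Variants heterozygous in the proband and absent from both parents."""
    return [
        call.variant
        for call in calls
        if is_heterozygous(call.proband)
        and is_hom_ref(call.mother)
        and is_hom_ref(call.father)
    ]


# Made-up example calls, for illustration only.
example_calls = [
    TrioCall("variant_A", proband="0/1", mother="0/0", father="0/0"),
    TrioCall("variant_B", proband="0/1", mother="0/1", father="0/0"),
    TrioCall("variant_C", proband="1/1", mother="0/1", father="0/1"),
    TrioCall("variant_D", proband="0|1", mother="0/0", father="./."),
]

print(de_novo_candidates(example_calls))
```

Only `variant_A` passes the filter: `variant_B` is inherited from the mother, `variant_C` is homozygous in the proband, and `variant_D` has a missing paternal genotype, so absence in the father cannot be confirmed.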
human genome
The total length of the human genome is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed autosomes, plus a 23rd pair of sex chromosomes: XX in the female and XY in the male. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in each mitochondrion. Basic information about these molecules and their gene content, based on a reference genome that does not represent the sequence of any specific individual, is provided in the following table.
Chromosome | Length (mm) | Base pairs | Variations | Protein-coding genes | Pseudogenes | long ncRNA | small ncRNA | miRNA | rRNA | snRNA | snoRNA | gnomAD exome.vcf | Links | Centromere pos (Mbp) | Cumulative (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 85 | 248,956,422 | 12,151,146 | 2058 | 1220 | 1200 | 496 | 134 | 66 | 221 | 145 | 5.77 GiB | EBI | 125 | 7.9 |
2 | 83 | 242,193,529 | 12,945,965 | 1309 | 1023 | 1037 | 375 | 115 | 40 | 161 | 117 | 4.20 GiB | EBI | 93.3 | 16.2 |
3 | 67 | 198,295,559 | 10,638,715 | 1078 | 763 | 711 | 298 | 99 | 29 | 138 | 87 | 3.29 GiB | EBI | 91 | 23 |
4 | 65 | 190,214,555 | 10,165,685 | 752 | 727 | 657 | 228 | 92 | 24 | 120 | 56 | 2.17 GiB | EBI | 50.4 | 29.6 |
5 | 62 | 181,538,259 | 9,519,995 | 876 | 721 | 844 | 235 | 83 | 25 | 106 | 61 | 2.51 GiB | EBI | 48.4 | 35.8 |
6 | 58 | 170,805,979 | 9,130,476 | 1048 | 801 | 639 | 234 | 81 | 26 | 111 | 73 | 2.83 GiB | EBI | 61 | 41.6 |
7 | 54 | 159,345,973 | 8,613,298 | 989 | 885 | 605 | 208 | 90 | 24 | 90 | 76 | 2.88 GiB | EBI | 59.9 | 47.1 |
8 | 50 | 145,138,636 | 8,221,520 | 677 | 613 | 735 | 214 | 80 | 28 | 86 | 52 | 2.13 GiB | EBI | 45.6 | 52 |
9 | 48 | 138,394,717 | 6,590,811 | 786 | 661 | 491 | 190 | 69 | 19 | 66 | 51 | 2.40 GiB | EBI | 49 | 56.3 |
10 | 46 | 133,797,422 | 7,223,944 | 733 | 568 | 579 | 204 | 64 | 32 | 87 | 56 | 2.23 GiB | EBI | 40.2 | 60.9 |
11 | 46 | 135,086,622 | 7,535,370 | 1298 | 821 | 710 | 233 | 63 | 24 | 74 | 76 | 3.61 GiB | EBI | 53.7 | 65.4 |
12 | 45 | 133,275,309 | 7,228,129 | 1034 | 617 | 848 | 227 | 72 | 27 | 106 | 62 | 3.07 GiB | EBI | 35.8 | 70 |
13 | 39 | 114,364,328 | 5,082,574 | 327 | 372 | 397 | 104 | 42 | 16 | 45 | 34 | 0.98 GiB | EBI | 17.9 | 73.4 |
14 | 36 | 107,043,718 | 4,865,950 | 830 | 523 | 533 | 239 | 92 | 10 | 65 | 97 | 2.02 GiB | EBI | 17.6 | 76.4 |
15 | 35 | 101,991,189 | 4,515,076 | 613 | 510 | 639 | 250 | 78 | 13 | 63 | 136 | 2.08 GiB | EBI | 19 | 79.3 |
16 | 31 | 90,338,345 | 5,101,702 | 873 | 465 | 799 | 187 | 52 | 32 | 53 | 58 | 3.04 GiB | EBI | 36.6 | 82 |
17 | 28 | 83,257,441 | 4,614,972 | 1197 | 531 | 834 | 235 | 61 | 15 | 80 | 71 | 3.62 GiB | EBI | 24 | 84.8 |
18 | 27 | 80,373,285 | 4,035,966 | 270 | 247 | 453 | 109 | 32 | 13 | 51 | 36 | 0.88 GiB | EBI | 17.2 | 87.4 |
19 | 20 | 58,617,616 | 3,858,269 | 1472 | 512 | 628 | 179 | 110 | 13 | 29 | 31 | 4.30 GiB | EBI | 26.5 | 89.3 |
20 | 21 | 64,444,167 | 3,439,621 | 544 | 249 | 384 | 131 | 57 | 15 | 46 | 37 | 1.44 GiB | EBI | 27.5 | 91.4 |
21 | 16 | 46,709,983 | 2,049,697 | 234 | 185 | 305 | 71 | 16 | 5 | 21 | 19 | 0.65 GiB | EBI | 13.2 | 92.6 |
22 | 17 | 50,818,468 | 2,135,311 | 488 | 324 | 357 | 78 | 31 | 5 | 23 | 23 | 1.43 GiB | EBI | 14.7 | 93.8 |
X | 53 | 156,040,895 | 5,753,881 | 842 | 874 | 271 | 258 | 128 | 22 | 85 | 64 | 1.33 GiB | EBI | 60.6 | 99.1 |
Y | 20 | 57,227,415 | 211,643 | 71 | 388 | 71 | 30 | 15 | 7 | 17 | 3 | 15.66 GiB | EBI | 10.4 | 100 |
mtDNA | 0.0054 | 16,569 | 929 | 13 | 0 | 0 | 24 | 0 | 2 | 0 | 0 | N/A | EBI | N/A | 100 |
total | | 3,088,286,401 | 155,630,645 | 20412 | 14600 | 14727 | 5037 | 1756 | 532 | 1944 | 1521 | 58.81 GiB | | | |
Table 1 (above) summarizes the physical organization and gene content of the human reference genome, with links to the original analysis, as published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers, the distance between adjacent base pairs in the DNA double helix. A recent estimate of human chromosome lengths based on updated data reports 205.00 cm for the diploid male genome and 208.23 cm for the diploid female genome, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing or modifications to protein structure that occur after translation.
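The length estimate described above is a single multiplication, sketched here in Python. The function and constant names are illustrative; the base-pair counts are taken from Table 1. Some tabulated lengths appear to be rounded differently, so the results agree with the table only approximately.

```python
BASE_PAIR_RISE_NM = 0.34   # distance between adjacent base pairs, in nanometers
NM_PER_MM = 1_000_000      # nanometers in one millimeter


def length_mm(base_pairs: int) -> float:
    """Estimate the stretched-out length of a DNA molecule in millimeters."""
    return base_pairs * BASE_PAIR_RISE_NM / NM_PER_MM


# Base-pair counts copied from Table 1.
base_pairs_by_molecule = {
    "1": 248_956_422,
    "21": 46_709_983,
    "total": 3_088_286_401,
}

for name, base_pairs in base_pairs_by_molecule.items():
    print(f"{name}: {length_mm(base_pairs):.1f} mm")
```

Chromosome 1 comes out at about 84.6 mm, in line with the tabulated 85 mm, and the full reference sequence at roughly 1,050 mm, or a little over one meter for a single copy of the genome.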
number of human genes
The number of genes in the human genome is not precisely known because the function of numerous transcripts remains unclear. This is especially true for non-coding RNA (see below). The number of protein-coding genes is better established, but there are still on the order of 1,400 questionable genes that may or may not encode functional proteins, usually encoded by short open reading frames. Table 2 gives estimates from various projects and shows these discrepancies.
Gene type | GENCODE | Ensembl | RefSeq | CHESS |
---|---|---|---|---|
protein-coding genes | 19,901 | 20,376 | 20,345 | 21,306 |
lncRNA genes | 15,779 | 14,720 | 17,712 | 18,484 |
antisense RNA | 5501 | 28 | 2694 | |
miscellaneous RNA | 2213 | 2222 | 13,899 | 4347 |
Pseudogenes | 14,723 | 1740 | 15,952 | |
total transcripts | 203,835 | 203,903 | 154,484 | 328,827 |
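
The size of the discrepancies between catalogs can be read off directly from the rows of Table 2 that are complete for all four projects. The short Python sketch below uses illustrative names (`counts`, `summarize`) and reports, for each row, which catalog gives the lowest and highest count and the difference between them.

```python
from typing import Dict, Tuple

CATALOGS = ("GENCODE", "Ensembl", "RefSeq", "CHESS")

# Rows of Table 2 that have a value for every catalog, in CATALOGS order.
counts: Dict[str, Tuple[int, int, int, int]] = {
    "protein-coding genes": (19_901, 20_376, 20_345, 21_306),
    "lncRNA genes": (15_779, 14_720, 17_712, 18_484),
    "total transcripts": (203_835, 203_903, 154_484, 328_827),
}


def summarize(
    table: Dict[str, Tuple[int, ...]],
) -> Dict[str, Tuple[str, str, int]]:
    """Map each row to (lowest catalog, highest catalog, max minus min)."""
    summary = {}
    for row, values in table.items():
        lowest = CATALOGS[values.index(min(values))]
        highest = CATALOGS[values.index(max(values))]
        summary[row] = (lowest, highest, max(values) - min(values))
    return summary


for row, (lowest, highest, spread) in summarize(counts).items():
    print(f"{row}: lowest {lowest}, highest {highest}, spread {spread:,}")
```

For protein-coding genes the spread is 1,405 (GENCODE lowest, CHESS highest), the same order of magnitude as the roughly 1,400 questionable genes mentioned above, while the transcript totals differ by more than 170,000 between RefSeq and CHESS.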