The number of genome projects has increased as technological improvements continue to lower the cost of sequencing. (A) Exponential growth of genome sequence databases since 1995. (B) The cost in US Dollars (USD) to sequence one million bases. (C) The cost in USD to sequence a 3,000 Mb (human-sized) genome on a log-transformed scale.
General schema showing the relationships of the genome, transcriptome, proteome, and metabolome (lipidome).
Overview of a genome project. First, the genome must be selected, which involves several factors including cost and relevance. Second, the sequence is generated and assembled at a given sequencing center (such as BGI or DOE JGI). Third, the genome sequence is annotated at several levels: DNA, protein, gene pathways, or comparatively.
An ABI PRISM 3100 Genetic Analyzer. Such capillary sequencers automated early large-scale genome sequencing efforts.
Illumina Genome Analyzer II System. Illumina technologies have set the standard for high-throughput massively parallel sequencing.
Environmental Shotgun Sequencing (ESS) is a key technique in metagenomics. (A) Sampling from habitat; (B) filtering particles, typically by size; (C) Lysis and DNA extraction; (D) cloning and library construction; (E) sequencing the clones; (F) sequence assembly into contigs and scaffolds.

Interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.

- Genomics




Bioinformatics

Interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex.

Early bioinformatics: computational alignment of experimentally determined sequences of a class of related proteins.
Map of the human X chromosome (from the National Center for Biotechnology Information website)
Sequences of genetic material are frequently used in bioinformatics and are easier to manage using computers than manually.
Sequences being compared in a MUSCLE multiple sequence alignment (MSA). Each sequence name (leftmost column) identifies a louse species, while the sequences themselves are in the second column.
Sequencing analysis steps
Microarray vs RNA-Seq
3-dimensional protein structures such as this one are common subjects in bioinformatic analyses.
Interactions between proteins are frequently visualized and analyzed using networks. This network is made up of protein–protein interactions from Treponema pallidum, the causative agent of syphilis and other diseases.

Bioinformatics includes biological studies that use computer programming as part of their methodology, as well as specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics.


Geneticist

Biologist who studies genetics, the science of genes, heredity, and variation of organisms.

A geneticist explaining gene sequencing.

Geneticists also participate in more specific genetics courses such as molecular genetics, transmission genetics, population genetics, quantitative genetics, ecological genetics, and genomics.

Systems biology

Computational and mathematical analysis and modeling of complex biological systems.

An illustration of the systems approach to biology
Trends in systems biology research, shown as the number of articles among the 30 most-cited systems biology papers of a given period that include a specific topic.
Overview of signal transduction pathways
A simple three-protein negative feedback loop modeled with mass action kinetic differential equations. Each protein interaction is described by a Michaelis–Menten reaction.
Plot of concentrations vs time for the simple three-protein negative feedback loop. All parameters are set to either 0 or 1 for initial conditions. The reaction is allowed to proceed until it reaches equilibrium. The plot shows the change in each protein over time.
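The captions above do not give the model's equations, so the following is only a minimal sketch of how such a loop might be simulated. It assumes a hypothetical wiring (A drives B, B drives C, C accelerates the removal of A), Michaelis–Menten rate terms, all rate constants equal to 1, all initial concentrations equal to 0, and a plain forward-Euler integrator. The function names and the exact form of the equations are illustrative assumptions, not the model shown in the figure.

```python
def michaelis_menten(substrate, vmax=1.0, km=1.0):
    """Michaelis-Menten rate: vmax * S / (km + S)."""
    return vmax * substrate / (km + substrate)


def derivatives(a, b, c):
    """Rates of change for a hypothetical three-protein negative feedback loop.

    Assumed wiring (not taken from the source figure):
      - A is produced at a constant rate of 1 and decays linearly;
        C additionally removes A at a Michaelis-Menten rate (the feedback).
      - B is produced at a Michaelis-Menten rate driven by A and decays linearly.
      - C is produced at a Michaelis-Menten rate driven by B and decays linearly.
    """
    da = 1.0 - a - c * michaelis_menten(a)
    db = michaelis_menten(a) - b
    dc = michaelis_menten(b) - c
    return da, db, dc


def simulate(a=0.0, b=0.0, c=0.0, dt=0.01, t_end=50.0):
    """Integrate the loop with forward Euler; returns a list of (t, a, b, c)."""
    steps = int(round(t_end / dt))
    trajectory = [(0.0, a, b, c)]
    for i in range(1, steps + 1):
        da, db, dc = derivatives(a, b, c)
        a, b, c = a + dt * da, b + dt * db, c + dt * dc
        trajectory.append((i * dt, a, b, c))
    return trajectory


if __name__ == "__main__":
    t, a, b, c = simulate()[-1]
    print(f"t={t:.1f}  A={a:.3f}  B={b:.3f}  C={c:.3f}")
```

Plotting the second to fourth columns of the returned trajectory against the first would give a concentration-versus-time plot of the kind the caption describes. Forward Euler is used only to keep the sketch dependency-free; an adaptive ODE solver would be the usual choice for stiffer models.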

Items that may be held in a computer database include: phenomics, organismal variation in phenotype as it changes during its life span; genomics, organismal deoxyribonucleic acid (DNA) sequence, including intra-organismal cell specific variation.


Genome

All genetic information of an organism.

A labeled diagram explaining the different parts of a prokaryotic genome
An image of the 46 chromosomes making up the diploid genome of a human male. (The mitochondrial chromosome is not shown.)
Part of a DNA sequence: prototypification of the complete genome of a virus
Composition of the human genome
Log-log plot of the total number of annotated proteins in genomes submitted to GenBank as a function of genome size.

The study of the genome is called genomics.


Proteomics

Large-scale study of proteins.

Robotic preparation of MALDI mass spectrometry samples on a sample carrier

After genomics and transcriptomics, proteomics is the next step in the study of biological systems.


Pleiotropy

Pleiotropy (from Greek πλείων pleion, 'more', and τρόπος tropos, 'way') occurs when one gene influences two or more seemingly unrelated phenotypic traits.

Simple genotype–phenotype map that only shows additive pleiotropy effects. G1, G2, and G3 are different genes that contribute to phenotypic traits P1, P2, and P3.
Pleiotropy seems limited for many traits in humans since the SNP overlap, as measured by variance accounted for, between many polygenic predictors is small.
Peacock with albinism
The blood of a two-week-old infant is collected for a PKU screening.
Photomicrograph of normal-shaped and sickle-shaped red blood cells from a patient with sickle cell disease
Patient with Marfan Syndrome
Chicken exhibiting the frizzle feather trait

Studies on fungal evolutionary genomics have shown pleiotropic traits that simultaneously affect adaptation and reproductive isolation, converting adaptations directly to speciation.

DNA sequencing

Process of determining the nucleic acid sequence – the order of nucleotides in DNA.

An example of the results of automated chain-termination DNA sequencing.
Frederick Sanger, a pioneer of sequencing. Sanger is one of the few scientists to have been awarded two Nobel prizes, one for the sequencing of proteins, and the other for the sequencing of DNA.
The 5,386 bp genome of bacteriophage φX174. Each coloured block represents a gene.
Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping DNA regions.
Multiple, fragmented sequence reads must be assembled together on the basis of their overlapping areas.
An Illumina HiSeq 2500 sequencer
Illumina NovaSeq 6000 flow cell
An Illumina MiSeq sequencer
A BGI MGISEQ-2000RS sequencer
Library preparation for the SOLiD platform
Sequencing of the TAGGCT template with IonTorrent, PacBioRS and GridION
Total cost of sequencing a human genome over time as calculated by the NHGRI.
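The captions above note that fragmented sequence reads are assembled on the basis of their overlapping regions. As a rough illustration of that idea only, the sketch below repeatedly merges the pair of reads with the longest exact suffix-to-prefix overlap. It is a toy greedy approach, not the algorithm used by any particular assembler: it assumes error-free reads from a single strand, and the function names, the `min_overlap` threshold, and the example reads are all hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Greedily merge reads by their longest exact overlap.

    Returns the list of contigs left when no pair overlaps by at least
    `min_overlap` bases. Sequencing errors, reverse complements, and
    repeats are deliberately ignored in this toy version.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Hypothetical reads sampled from one short sequence.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Real assemblers have to cope with sequencing errors, both strands, and repeated regions, which is why they typically rely on overlap graphs or de Bruijn graphs rather than a pairwise greedy merge like this one.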

The Sanger method, in mass production form, is the technology which produced the first human genome in 2001, ushering in the age of genomics.

Human Genome Project

International scientific research project with the goal of determining the base pairs that make up human DNA, and of identifying, mapping and sequencing all of the genes of the human genome from both a physical and a functional standpoint.

Logo of the Human Genome Project
The first printout of the human genome to be presented as a series of books, displayed at the Wellcome Collection, London

Because of widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as major advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by U.S. President Bill Clinton and British Prime Minister Tony Blair on June 26, 2000).

Model organism

Non-human species that is extensively studied to understand particular biological phenomena, with the expectation that discoveries made in the model organism will provide insight into the workings of other organisms.

Escherichia coli is a gram-negative prokaryotic model organism
Drosophila melanogaster, one of the most famous subjects for genetics experiments
Saccharomyces cerevisiae, one of the most intensively studied eukaryotic model organisms in molecular and cell biology
Laboratory mice, widely used in medical research

Inquiries about the DNA of organisms are classed as genetic models (with short generation times, such as the fruitfly and nematode worm), experimental models, and genomic parsimony models, investigating pivotal position in the evolutionary tree.


Metagenomics

Study of genetic material recovered directly from environmental samples.

In metagenomics, the genetic materials (DNA, C) are extracted directly from samples taken from the environment (e.g. soil, sea water, human gut, A) after filtering (B), and are sequenced (E) after multiplication by cloning (D) in an approach called shotgun sequencing. These short sequences can then be put together again using assembly methods (F) to deduce the individual genomes or parts of genomes that constitute the original environmental sample. This information can then be used to study the species diversity and functional potential of the microbial community of the environment.
Flow diagram of a typical metagenome project
Schematic representation of the main steps necessary for the analysis of whole metagenome shotgun sequencing-derived data. The software related to each step is shown in italics.
Metagenomics allows the study of microbial communities like those present in this stream receiving acid drainage from surface coal mining.

While traditional microbiology and microbial genome sequencing and genomics rely upon cultivated clonal cultures, early environmental gene sequencing cloned specific genes (often the 16S rRNA gene) to produce a profile of diversity in a natural sample.
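A "profile of diversity" of this kind can be thought of as a table of how often each taxon appears among the sequenced marker genes. The sketch below is one hedged way to express that: it assumes each cloned 16S sequence has already been assigned a taxon label (the labels used here are hypothetical), tallies relative abundances, and summarizes them with the Shannon index. The Shannon index is a commonly used diversity measure chosen here for illustration; the source does not name a specific metric.

```python
from collections import Counter
import math


def diversity_profile(taxon_labels):
    """Relative abundance of each taxon among the classified sequences."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(profile):
    """Shannon diversity H = -sum(p * ln p) over taxa with p > 0."""
    return -sum(p * math.log(p) for p in profile.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical taxon assignments for eight cloned 16S rRNA sequences.
    labels = ["Proteobacteria"] * 4 + ["Firmicutes"] * 3 + ["Actinobacteria"]
    profile = diversity_profile(labels)
    for taxon, fraction in profile.items():
        print(f"{taxon}: {fraction:.3f}")
    print(f"Shannon index: {shannon_index(profile):.3f}")
```

A community dominated by one taxon gives an index near 0, while an even spread across many taxa gives a higher value.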