Bioinformatics

Early bioinformatics: computational alignment of experimentally determined sequences of a class of related proteins.
Map of the human X chromosome (from the National Center for Biotechnology Information website)
Sequences of genetic material are frequently used in bioinformatics and are easier to manage using computers than manually.
These sequences are being compared in a MUSCLE multiple sequence alignment (MSA). The sequence names (leftmost column) come from various louse species, and the sequences themselves are in the second column.
Sequencing analysis steps
Microarray vs. RNA-Seq
3-dimensional protein structures such as this one are common subjects in bioinformatic analyses.
Interactions between proteins are frequently visualized and analyzed using networks. This network is made up of protein–protein interactions from Treponema pallidum, the causative agent of syphilis and other diseases.

Interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex.

Genomics

Interdisciplinary field of biology focusing on the structure, function, evolution, mapping, and editing of genomes.

The number of genome projects has increased as technological improvements continue to lower the cost of sequencing. (A) Exponential growth of genome sequence databases since 1995. (B) The cost in US Dollars (USD) to sequence one million bases. (C) The cost in USD to sequence a 3,000 Mb (human-sized) genome on a log-transformed scale.
General schema showing the relationships of the genome, transcriptome, proteome, and metabolome (lipidome).
Overview of a genome project. First, the genome must be selected, which involves several factors including cost and relevance. Second, the sequence is generated and assembled at a given sequencing center (such as BGI or DOE JGI). Third, the genome sequence is annotated at several levels: DNA, protein, gene pathways, or comparatively.
An ABI PRISM 3100 Genetic Analyzer. Such capillary sequencers automated early large-scale genome sequencing efforts.
Illumina Genome Analyzer II System. Illumina technologies have set the standard for high-throughput massively parallel sequencing.
Environmental Shotgun Sequencing (ESS) is a key technique in metagenomics. (A) sampling from habitat; (B) filtering particles, typically by size; (C) lysis and DNA extraction; (D) cloning and library construction; (E) sequencing the clones; (F) sequence assembly into contigs and scaffolds.

Genomics also involves the sequencing and analysis of genomes. It uses high-throughput DNA sequencing and bioinformatics to assemble entire genomes and to analyze their function and structure.
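The assembly step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch that assumes short, error-free reads with unambiguous overlaps. The reads below are a made-up example, and the helpers `overlap` and `greedy_assemble` are hypothetical. Production assemblers instead build overlap graphs or de Bruijn graphs and must cope with sequencing errors and repeats, where greedy merging can produce the wrong sequence.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (returns 0).
    """
    start = 0
    while True:
        # Candidate start positions are where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: fall back to concatenating what remains
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return "".join(reads)


# Hypothetical toy reads sampled from one short sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ATTAGACCTGCCGGAATAC
```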

Interdisciplinarity

Interdisciplinarity or interdisciplinary studies involves the combination of two or more academic disciplines into one activity (e.g., a research project).

Collage of images representing different academic disciplines

Examples include quantum information processing, an amalgamation of quantum physics and computer science, and bioinformatics, combining molecular biology with computer science.

Sequence alignment

Alignment of 27 avian influenza hemagglutinin protein sequences colored by residue conservation (top) and residue properties (bottom)
A profile HMM modelling a multiple sequence alignment

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
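A classic way to compute a global alignment of two sequences is the Needleman–Wunsch dynamic-programming algorithm. The sketch below is a minimal version that assumes a linear gap penalty and arbitrary illustrative scores (match +1, mismatch -1, gap -1). Practical tools typically use substitution matrices such as BLOSUM62 for proteins, together with affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), using '-' for gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

When several alignments tie for the best score, the traceback order (diagonal, then up, then left) decides which one is reported.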

Pattern recognition

Automated recognition of patterns and regularities in data.

The face was automatically detected by special software.

Pattern recognition has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
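The sketch below is a minimal illustration of automated pattern recognition. It classifies a query point by majority vote among its k nearest labelled examples, which is the k-nearest-neighbour rule. k-NN is only one of many pattern-recognition techniques, and the toy points and labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two groups of 2-D feature vectors.
train = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"),
]
print(knn_predict(train, (0.5, 0.5)))  # A
print(knn_predict(train, (5.5, 5.2)))  # B
```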

Systems biology

Computational and mathematical analysis and modeling of complex biological systems.

An illustration of the systems approach to biology
Trends in systems biology research, shown as the number of articles among the 30 most-cited systems biology papers of each period that include a specific topic
Overview of signal transduction pathways
A simple three-protein negative feedback loop modeled with mass-action kinetic differential equations. Each protein interaction is described by a Michaelis–Menten reaction.
Plot of concentrations versus time for the simple three-protein negative feedback loop. All initial conditions are set to either 0 or 1, and the reaction is allowed to proceed until it reaches equilibrium. The plot shows the change in each protein's concentration over time.

The focus on the dynamics of the studied systems is the main conceptual difference between systems biology and bioinformatics.
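The captions above describe a three-protein negative feedback loop whose interactions follow Michaelis–Menten kinetics. The sketch below models that kind of system under hypothetical rate laws:

- Z represses production of X.
- X activates Y through a Michaelis–Menten term.
- Y activates Z through a Michaelis–Menten term.
- Every protein decays first-order.

All parameters are set to 1, and the system is integrated with forward Euler for transparency. None of these choices come from the original figure. In practice, an adaptive solver such as SciPy's `solve_ivp` would be more robust.

```python
def simulate_feedback_loop(x0=1.0, y0=0.0, z0=0.0, t_end=50.0, dt=0.01,
                           vmax=1.0, km=1.0, ki=1.0, decay=1.0):
    """Forward-Euler simulation of a hypothetical X -> Y -> Z -| X loop.

    Returns a list of (t, x, y, z) tuples, one per time step.
    """
    def rates(x, y, z):
        dx = vmax / (1.0 + z / ki) - decay * x  # Z represses production of X (assumed form)
        dy = vmax * x / (km + x) - decay * y    # X activates Y (Michaelis–Menten term)
        dz = vmax * y / (km + y) - decay * z    # Y activates Z (Michaelis–Menten term)
        return dx, dy, dz

    x, y, z = x0, y0, z0
    trajectory = [(0.0, x, y, z)]
    for step in range(1, int(round(t_end / dt)) + 1):
        dx, dy, dz = rates(x, y, z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        trajectory.append((step * dt, x, y, z))
    return trajectory


# Initial conditions of 0 or 1, echoing the caption; all rate constants are 1.
trajectory = simulate_feedback_loop()
t, x, y, z = trajectory[-1]
print(f"t={t:.1f}  X={x:.4f}  Y={y:.4f}  Z={z:.4f}")
```

With these parameters the loop relaxes to a steady state rather than oscillating, so the final values satisfy each rate equation with its derivative set to zero.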

Computational biology

Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modelling and computational simulation techniques to the study of biological, ecological, behavioral, and social systems.

This timeline displays the year-by-year progress of the Human Genome Project in the context of genetics and genomics as a whole since 1865. The project started in 1990, and by 1995 the first bacterial genome, H. influenzae, had been sequenced. Four years later, in 1999, chromosome 22 became the first human chromosome to be completely sequenced. Three years after that, drafts of the mouse, rat, and rice genomes had all been completed. Finally, in the following year, a complete draft of the human genome was finished, satisfying the project's initial goals, though work continues to this day.
A partially sequenced genome.
Figure 1: Heat map of the Jaccard similarity index matrix for two given nuclear profiles.
Figure 2: Radar chart comparing the percentage of features in each cluster.
Figure 3: Heat map of the normalized linkage values for two given genomic windows.
Figure 1: Heat map of Jaccard distances of nuclear profiles
Diagram showing a simple random forest.

Computational biology, which includes many aspects of bioinformatics and much more, is the science of using biological data to develop algorithms or models in order to understand biological systems and relationships.
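The heat-map captions above refer to a Jaccard similarity index matrix for nuclear profiles. For two sets A and B, the Jaccard index is |A ∩ B| / |A ∪ B|, and the Jaccard distance is 1 minus that value. The sketch below assumes each nuclear profile is represented as the set of genomic windows it contains. The profile and window names are hypothetical, and treating two empty sets as identical is a convention chosen here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 1.0  # assumption: two empty profiles count as identical
    return len(a & b) / len(union)


def jaccard_matrix(profiles: dict[str, set]) -> tuple[list[str], list[list[float]]]:
    """Pairwise Jaccard similarity matrix; rows and columns follow `names`."""
    names = list(profiles)
    matrix = [[jaccard(profiles[p], profiles[q]) for q in names] for p in names]
    return names, matrix


# Hypothetical nuclear profiles: the genomic windows each one contains.
profiles = {
    "NP1": {"w1", "w2", "w3"},
    "NP2": {"w2", "w3", "w4"},
    "NP3": {"w5"},
}
names, sim = jaccard_matrix(profiles)
dist = [[1.0 - s for s in row] for row in sim]  # Jaccard distance = 1 - similarity
print(names)
for row in sim:
    print(row)  # [1.0, 0.5, 0.0] / [0.5, 1.0, 0.0] / [0.0, 0.0, 1.0]
```

A matrix like `sim` or `dist` is what a heat map of the kind described in the captions would visualize.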

Biological engineering

Application of the principles of biology and the tools of engineering to create usable, tangible, economically viable products.

Some biological machines
Modeling of the spread of disease using cellular automata and nearest-neighbor interactions

Biological engineering employs knowledge and expertise from a number of pure and applied sciences, such as mass and heat transfer, kinetics, biocatalysts, biomechanics, bioinformatics, separation and purification processes, bioreactor design, surface science, fluid mechanics, thermodynamics, and polymer science.

Information engineering (field)

Engineering discipline that deals with the generation, distribution, analysis, and use of information, data, and knowledge in systems.

Object detection for a stop sign.
An example of clustering in machine learning.
An example of how the 2D Fourier transform can be used to remove unwanted information from an X-ray scan.

The components of information engineering include more theoretical fields such as machine learning, artificial intelligence, control theory, signal processing, and information theory, and more applied fields such as computer vision, natural language processing, bioinformatics, medical image computing, cheminformatics, autonomous robotics, mobile robotics, and telecommunications.

Cluster analysis

Task of grouping a set of objects in such a way that objects in the same group are more similar (in some sense) to each other than to those in other groups (clusters).

The result of a cluster analysis shown as the coloring of the squares into three clusters.
Single-linkage on Gaussian data. At 35 clusters, the biggest cluster starts fragmenting into smaller parts, while before it was still connected to the second largest due to the single-link effect.
Single-linkage on density-based clusters. 20 clusters extracted, most of which contain single elements, since linkage clustering does not have a notion of "noise".
k-means separates data into Voronoi cells, which assumes equal-sized clusters (not adequate here)
k-means cannot represent density-based clusters
On Gaussian-distributed data, EM (expectation–maximization) works well, since it uses Gaussians for modelling clusters
Density-based clusters cannot be modeled using Gaussian distributions
Density-based clustering with DBSCAN.
DBSCAN assumes clusters of similar density, and may have problems separating nearby clusters
OPTICS is a DBSCAN variant that improves the handling of clusters with different densities

Cluster analysis is a main task of exploratory data analysis and a common technique for statistical data analysis, used in many fields, including pattern recognition, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning.
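The k-means method named in the captions above is usually computed with Lloyd's algorithm. It alternately assigns each point to its nearest centroid and moves each centroid to the mean of its assigned points, stopping when nothing changes. The sketch below assumes points are equal-length numeric tuples and uses plain random initialization. Library implementations such as scikit-learn's `KMeans` default to k-means++ initialization and can run several restarts.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length coordinate tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # simple random initialization (an assumption)

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids

    return centroids, [nearest(p) for p in points]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The result depends on the starting centroids, so on less well-separated data k-means can converge to a poor local optimum. This is why restarts or smarter initialization are common.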

Structural biology

Branch of molecular biology, biochemistry, and biophysics concerned with the molecular structure of biological macromolecules, how they acquire the structures they have, and how alterations in their structures affect their function.

Hemoglobin, the oxygen transporting protein found in red blood cells
Examples of protein structures from the Protein Data Bank (PDB)
Flowchart of how structural biology plays a role in drug discovery

A third approach that structural biologists take to understanding structure is bioinformatics: looking for patterns among the diverse sequences that give rise to particular shapes.
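One simple example of relating sequence patterns to structure is a hydropathy profile. It averages the Kyte–Doolittle hydrophobicity scale over a sliding window, and long stretches of high values can hint at buried or membrane-spanning segments. This sketch is chosen for illustration and is not the specific method the text refers to. The default window size is an assumption, and sequences containing letters outside the 20 standard amino acids raise a `KeyError`.

```python
# Kyte–Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean Kyte–Doolittle score of each length-`window` stretch of `sequence`."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]  # KeyError on non-standard letters
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# Hypothetical peptide: a hydrophobic core flanked by charged residues.
print(hydropathy_profile("KKDEILVLIAVFLIVKKDE", window=5))
```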