Sequence alignment

sequence identity, alignment, alignments, aligned, aligning, align, biological sequences aligned, conserved, display convention, alignment comparisons
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.
284 Related Articles

Bioinformatics

bioinformatic, bioinformatician, bio-informatics
Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein–protein interactions, genome-wide association studies, the modeling of evolution and cell division/mitosis.

Segregating site

non-conservative replacements, non-conservative mutations, variable
Below the protein sequences is a key denoting conserved sequence, conservative mutations, semi-conservative mutations, and non-conservative mutations.
Segregating sites are positions in a sequence alignment that show differences (polymorphisms) between related genes, that is, positions that are not conserved.
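As a minimal sketch of the definition above, the following Python function lists the columns of an alignment in which the sequences differ. The function name `segregating_sites` and the example sequences are hypothetical, and treating a gap character as just another symbol is an assumption made for brevity.

```python
def segregating_sites(aligned):
    """Return 0-based column indices at which the aligned sequences differ."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    # A column is segregating when it holds more than one distinct symbol.
    # Assumption: a gap ('-') is counted like any other symbol.
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


sites = segregating_sites(["ATGCA", "ATGTA", "ACGCA"])
print(sites)
```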

Consensus sequence

consensus sequences, canonical sequence, consensus
For multiple sequences the last row in each column is often the consensus sequence determined by the alignment; the consensus sequence is also often represented in graphical format with a sequence logo in which the size of each nucleotide or amino acid letter corresponds to its degree of conservation.
In molecular biology and bioinformatics, the consensus sequence (or canonical sequence) is the calculated order of most frequent residues, either nucleotide or amino acid, found at each position in a sequence alignment.
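The "most frequent residue at each position" rule can be illustrated with a short Python sketch. The function name `consensus` is hypothetical, and the tie-breaking behaviour (the residue seen first among equally frequent ones wins) is an assumption of this sketch rather than a standard convention.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent symbol of each column of an alignment."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        # most_common keeps first-seen order among ties (an assumption of
        # this sketch, not a biological convention).
        result.append(counts.most_common(1)[0][0])
    return "".join(result)


print(consensus(["ATGCA", "ATGTA", "ACGCA"]))
```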

Smith–Waterman algorithm

Smith-Waterman algorithm, Smith-Waterman, Smith and Waterman
The Smith–Waterman algorithm is a general local alignment method based on the same dynamic programming scheme but with additional choices to start and end at any place.
The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two strings of nucleic acid sequences or protein sequences.
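A minimal sketch of the local-alignment recurrence follows. It computes only the best local score, not the aligned region itself, and it assumes a simple match/mismatch scheme with a linear gap penalty; the parameter values and the function name `smith_waterman_score` are illustrative choices, not taken from the source.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets an alignment start anywhere; tracking the
            # maximum over all cells lets it end anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("AAAGGG", "TTTGGG"))
```

With these parameters the shared "GGG" region dominates the score, while the unrelated prefixes contribute nothing because cell values are never allowed to fall below zero.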

SAM (file format)

SAM, Sequence Alignment Map, SAM format
The SAM/BAM files use the CIGAR (Compact Idiosyncratic Gapped Alignment Report) string format to represent an alignment of a sequence to a reference by encoding a sequence of events (e.g. match/mismatch, insertions, deletions).
Sequence Alignment Map (SAM) is a text-based format, developed by Heng Li, Bob Handsaker, et al., originally for storing biological sequences aligned to a reference sequence.
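To make the "sequence of events" encoding concrete, here is a small Python sketch that splits a CIGAR string into (length, operation) pairs and derives how many reference and read bases the alignment covers. The helper names are hypothetical; the sets of reference-consuming (M, D, N, =, X) and query-consuming (M, I, S, =, X) operations follow the SAM specification. The placeholder CIGAR `*` (alignment unavailable) is not handled and raises an error here.

```python
import re

# One or more digits followed by a single SAM operation code.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) pairs."""
    ops = _CIGAR_OP.findall(cigar)
    # Reject strings containing anything the pattern did not consume.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def reference_span(cigar):
    """Number of reference bases covered by the alignment."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def query_length(cigar):
    """Number of read bases described by the CIGAR string."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")


print(parse_cigar("3M1I2D4M"), reference_span("3M1I2D4M"), query_length("3M1I2D4M"))
```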

Needleman–Wunsch algorithm

Needleman-Wunsch algorithm, Needleman-Wunsch, Needleman and Wunsch
(This does not mean global alignments cannot start and/or end in gaps.) A general global alignment technique is the Needleman–Wunsch algorithm, which is based on dynamic programming.
The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences.
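The dynamic programming idea behind global alignment can be sketched as follows. This version returns only the optimal global score and assumes a match/mismatch scheme with a linear gap penalty; the parameter values and the name `needleman_wunsch_score` are illustrative assumptions.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (linear gap penalty)."""
    # First row: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: aligning a[:i] against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # Unlike local alignment, the score is read from the final cell,
    # so both sequences are aligned end to end.
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```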

Biopython

There are also several programming packages which provide this conversion functionality, such as BioPython, BioRuby and BioPerl.
Separate modules extend Biopython's capabilities to sequence alignment, protein structure, population genetics, phylogenetics, sequence motifs, and machine learning.

Dynamic programming

dynamic, dynamic contracting problems, dynamic programming (DP)
These include slow but formally correct methods like dynamic programming.
Dynamic programming is widely used in bioinformatics for tasks such as sequence alignment, protein folding, RNA structure prediction and protein-DNA binding.

FASTA format

FASTA, fasta sequences
Most web-based tools allow a limited number of input and output formats, such as FASTA format and GenBank format, and the output is not easily editable.
Sequences may be protein sequences or nucleic acid sequences, and they can contain gaps or alignment characters (see sequence alignment).
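As an illustration of the format, the sketch below reads FASTA text into a dictionary, keeping gap characters in the sequences untouched. It is a simplified reader written for this example (the name `parse_fasta` is hypothetical) and assumes that each record starts with a `>` header line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The whole header (minus '>') is used as the key in this sketch.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence lines of one record are concatenated; gap or
            # alignment characters such as '-' are preserved as-is.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


example = ">seq1 demo\nACGT\nAC-T\n>seq2\nGG\n"
print(parse_fasta(example))
```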

BioRuby

There are also several programming packages which provide this conversion functionality, such as BioPython, BioRuby and BioPerl.
It contains classes for DNA and protein sequence analysis, sequence alignment, biological database parsing, structural biology and other bioinformatics tasks.

Conserved sequence

sequence conservation, conserved, highly conserved
Below the protein sequences is a key denoting conserved sequence, conservative mutations, semi-conservative mutations, and non-conservative mutations.
Conserved sequences are typically identified by bioinformatics approaches based on sequence alignment.

DNA

deoxyribonucleic acid, double-stranded DNA, dsDNA
The DNA sequence may be aligned with other DNA sequences to identify homologous sequences and locate the specific mutations that make them distinct.

EMBOSS

Several conversion programs that provide graphical and/or command line interfaces are available, such as READSEQ and EMBOSS.
The EMBOSS package contains a variety of applications for sequence alignment, rapid database searching with sequence patterns, protein motif identification (including domain analysis), and much more.

Substitution matrix

substitution matrices, BLOSUM50
In typical usage, protein alignments use a substitution matrix to assign scores to amino-acid matches or mismatches, and a gap penalty for matching an amino acid in one sequence to a gap in the other.
Substitution matrices are usually seen in the context of amino acid or DNA sequence alignments, where the similarity between sequences depends on their divergence time and the substitution rates as represented in the matrix.
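The scoring scheme described above (a matrix score for each aligned residue pair plus a penalty for each residue aligned to a gap) can be sketched in Python as follows. The small matrix `TOY_MATRIX`, the gap penalty of -4, and the function name are illustrative assumptions; a real analysis would use a full published matrix.

```python
# Illustrative scores for a handful of residue pairs (an assumption of
# this sketch, not a complete substitution matrix).
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "G"): 0, ("G", "G"): 6,
    ("L", "L"): 4, ("L", "I"): 2, ("I", "I"): 4,
}


def score_alignment(row_a, row_b, matrix, gap=-4):
    """Score two aligned rows using a substitution matrix and a gap penalty."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap  # residue aligned to a gap
        elif (x, y) in matrix:
            total += matrix[(x, y)]
        else:
            total += matrix[(y, x)]  # the matrix is treated as symmetric
    return total


print(score_alignment("AL-G", "AIAG", TOY_MATRIX))
```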

Edit distance

distance cost, family of distance metrics, Levenshtein algorithm
Sequence alignments are also used for non-biological sequences, such as calculating the distance cost between strings in a natural language or in financial data.
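For non-biological strings, one common distance cost is the Levenshtein distance, which counts the minimum number of single-character insertions, deletions, and substitutions. The sketch below is one straightforward way to compute it, using the same kind of dynamic programming table as sequence alignment; unit costs for all three operations are assumed.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```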

Multiple sequence alignment

MSA, multiple alignment, alignment
Multiple sequence alignment is an extension of pairwise alignment to incorporate more than two sequences at a time.
A multiple sequence alignment (MSA) is a sequence alignment of three or more biological sequences, generally protein, DNA, or RNA.

Bowtie (sequence analysis)

Bowtie, Bowtie 2, Bowtie2
The Burrows–Wheeler transform has been successfully applied to fast short read alignment in popular tools such as Bowtie and BWA.
Bowtie is a software package commonly used for sequence alignment and sequence analysis in bioinformatics.
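The Burrows–Wheeler transform itself can be shown with a naive Python sketch that sorts all rotations of the text. This illustrates the transform only; it is not how Bowtie or BWA build or search their indexes, and the `$` terminator is an assumption of this sketch (it must not occur in the text and must sort before every other symbol).

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform via sorted rotations."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Build every rotation of the terminated text and sort them;
    # the transform is the last column of the sorted rotations.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))
```

Sorting full rotations takes quadratic memory, so this form is only suitable for short example strings.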

Sequence logo

consensus logo, HMM logo
For multiple sequences the last row in each column is often the consensus sequence determined by the alignment; the consensus sequence is also often represented in graphical format with a sequence logo in which the size of each nucleotide or amino acid letter corresponds to its degree of conservation.
Like a sequence logo, a consensus logo is created from a collection of aligned protein or DNA/RNA sequences and conveys information about the conservation of each position of a sequence motif or sequence alignment.
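One common way to quantify the degree of conservation that a logo displays is the information content of each column in bits, computed as the maximum possible entropy minus the observed entropy. The sketch below assumes a four-letter nucleotide alphabet, omits the small-sample correction that some logo tools apply, and counts gap characters as ordinary symbols; the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(aligned, alphabet_size=4):
    """Information content in bits for each column of an alignment."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    info = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned)
        total = sum(counts.values())
        # Shannon entropy of the observed symbol frequencies.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # A fully conserved column reaches log2(alphabet_size) bits.
        info.append(math.log2(alphabet_size) - entropy)
    return info


print(column_information(["AC", "AG", "AT", "AA"]))
```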

BioPerl

Perl biology tools
There are also several programming packages which provide this conversion functionality, such as BioPython, BioRuby and BioPerl.

Conservative replacement

conservative mutation, conservative, conservative substitution
Below the protein sequences is a key denoting conserved sequence, conservative mutations, semi-conservative mutations, and non-conservative mutations.

Protein structure prediction

secondary structure prediction, experimental and modeling approaches, predict
Many variations of the Clustal progressive implementation are used for multiple sequence alignment, phylogenetic tree construction, and as input for protein structure prediction.
The best modern methods of secondary structure prediction in proteins reach about 80% accuracy; this high accuracy allows the predictions to be used as features for improving fold recognition and ab initio protein structure prediction, classification of structural motifs, and refinement of sequence alignments.

Sequence assembly

genome assembly, assembled, assembly
Sequence alignment is also a part of genome assembly, where sequences are aligned to find overlap so that contigs (long stretches of sequence) can be formed.
In bioinformatics, sequence assembly refers to aligning and merging fragments from a longer DNA sequence in order to reconstruct the original sequence.
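The overlap step can be illustrated with a small Python sketch that finds the longest suffix of one fragment matching a prefix of another and merges the two. It assumes exact matches only, whereas real assemblers must tolerate sequencing errors; the function names and the minimum overlap of 3 are illustrative assumptions.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0  # no overlap of at least min_length


def merge(a, b, min_length=3):
    """Merge two fragments on their overlap, or return None if none exists."""
    k = overlap(a, b, min_length)
    return a + b[k:] if k else None


print(overlap("ATGCGT", "CGTAAC"), merge("ATGCGT", "CGTAAC"))
```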

List of sequence alignment software

Sequence alignment software, Burrows-Wheeler Aligner, BWA
A more complete list of available software, categorized by algorithm and alignment type, is available in the list of sequence alignment software, but common software tools used for general sequence alignment tasks include ClustalW2 and T-Coffee for alignment, and BLAST and FASTA3x for database searching.
This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment.

Point accepted mutation

PAM, PAM Matrices, PAM matrix
A series of matrices called PAM matrices (Point Accepted Mutation matrices, originally defined by Margaret Dayhoff and sometimes referred to as "Dayhoff matrices") explicitly encode evolutionary approximations regarding the rates and probabilities of particular amino acid mutations.
In bioinformatics, PAM matrices are regularly used as substitution matrices to score sequence alignments for proteins.

Distance matrix

distance matrices, dissimilarity matrix, distance (or cost) matrix
The DALI method, or distance matrix alignment, is a fragment-based method for constructing structural alignments based on contact similarity patterns between successive hexapeptides in the query sequences.
They are used in structural and sequential alignment, and for the determination of protein structures from NMR or X-ray crystallography.