What is the difference between match length, chain length, and sequence length?

Sequence length is the number of nucleotides in a particular sequence. Chain length means the same thing. The match length is the stretch of nucleotides over which two sequences align, including any mismatches that fall within that stretch.

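Does match length include mismatches?

Yes. The match length is the full stretch over which the two sequences align, and any mismatches that fall within that stretch are counted as part of it.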

How is the Score calculated?

Score refers to the BLAST score, which is influenced more by the length of the pairwise alignment between the search sequence and the retrieved reference sequence than by the number of mismatches. Several references describe the BLAST score in detail (since BLAST can and should be parametrized).
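To illustrate why alignment length can outweigh mismatches, here is a minimal sketch of a raw match/mismatch score. The function name raw_score and the reward/penalty values (+1 per match, -2 per mismatch) are assumptions chosen for illustration, not the parameters used by this product; real BLAST scoring also applies gap costs and converts the raw score to a bit score.

```python
def raw_score(matches, mismatches, reward=1, penalty=-2):
    """Raw alignment score under a simple match/mismatch scheme.

    reward and penalty are illustrative assumptions; gap costs are ignored.
    """
    return matches * reward + mismatches * penalty


# A short, perfect alignment versus a long alignment with a few mismatches.
short_perfect = raw_score(matches=500, mismatches=0)
long_imperfect = raw_score(matches=1390, mismatches=10)

print(short_perfect)   # 500
print(long_imperfect)  # 1370
```

Under these assumed parameters the longer alignment scores higher despite its ten mismatches, which is the behavior described above.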

How are Identities and Percent Similarities determined?

Identities is the number of matched positions in a pairwise alignment. Percent Similarity is that number expressed as a percentage of the total number of nucleotides aligned.
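The calculation can be sketched as follows. This is a minimal illustration that assumes the two sequences are already aligned to the same length, with "-" marking a gap; the function name identities is hypothetical.

```python
def identities(aligned_a, aligned_b):
    """Return (matched positions, aligned positions, percent) for two aligned strings."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    # A position counts as matched only if both sequences carry the same nucleotide.
    matched = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    total = len(aligned_a)
    return matched, total, 100.0 * matched / total


matched, total, percent = identities("ACGTACGTAC", "ACGTTCGTAC")
print(f"{matched}/{total} ({percent:.1f}%)")  # 9/10 (90.0%)
```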

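Is it true that more mismatches result in lower Identities and Scores?

For a given alignment, yes: each additional mismatch means one fewer matched position, so Identities go down. The Score drops as well, but because it is influenced more by the alignment length than by the number of mismatches, a long alignment with some mismatches can still score higher than a short alignment with none.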

A mismatch of 61 is shown. What does this mean?

This means there are 61 mismatches along the whole match length (not a mismatch length of 61).

What do square brackets around an organism name mean?

Square brackets ([ ]) around a genus typically indicate that the organism name awaits appropriate action by the research community to be transferred to another genus. Square brackets can also indicate a naming issue; for example, the original record does not carry the current nomenclature for the organism.

How are gaps handled?

A gap counts as a mismatch. BLAST may discontinue the alignment when the diversity in a region is too high, in which case the mismatches beyond that point are not accounted for by BLAST. If in doubt, perform a quick multiple alignment to see whether BLAST has indeed truncated an alignment prematurely. (Note: multiple alignments work for this purpose because they are created using an algorithm different from BLAST.)
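As a minimal sketch of how gaps feed into the mismatch count, the hypothetical helper below separates substitutions from gap positions in a pairwise alignment and sums them. It assumes both strings are already aligned to equal length and that "-" marks a gap.

```python
def mismatch_breakdown(aligned_a, aligned_b, gap="-"):
    """Return (substitutions, gap positions) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = 0
    gaps = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gaps += 1            # a gap in either sequence counts as a mismatch
        elif a != b:
            substitutions += 1   # two different nucleotides
    return substitutions, gaps


substitutions, gaps = mismatch_breakdown("ACG-TAC", "ACGTTAG")
print(substitutions + gaps)  # 2 mismatches: one substitution plus one gap
```

Note that this only counts positions inside the aligned region. Differences beyond a point where the alignment was cut short never reach such a count, which is why a multiple alignment is a useful cross-check.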

How is the ID Report Result field populated?

Often, the Report Result field is populated automatically from the identification chosen and specified by the user building an ID report. The report may be manually edited by an authorized user (e.g., to specify a slash call). This is a field within the database that can be configured to the specific needs of an organization.

What is the difference between searching the IDNS Reference Sequences (blue) and searching the organization's Sample Sequences (red)? When and how do an organization's Accessions become References?

The main work area is the red area, where a new sample is added or the sample database is mined, e.g., to identify similar cases. The private reference database (blue) is optional and is intended for institutions that wish to build their own private reference database of very well-characterized isolates, unusual isolates, isolates of a particular genus of (research) interest, etc. Usually, customers focus on building the sample database (red area) first and then consider making use of the private reference database as experience grows.

When searching references, what does selecting IUPAC aware do?

If the consensus sequence of an isolate has ambiguous positions (e.g., Y, R, K...), perhaps as a result of inter-operon variability in the 16S, each ambiguous position will be counted as a mismatch by BLAST. However, setting the search flag "IUPAC aware" means that no mismatch will be recorded if the reference sequence contains one of the nucleotides that is valid for the IUPAC code present in your sample sequence at that position. IUPAC aware searches take longer to run because more permutations must be computed.
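The effect of the flag can be sketched as below. This is an illustration only, not the product's implementation: the function names are hypothetical, the sequences are assumed to be aligned and of equal length, and the rule used (the reference's nucleotides must all be allowed by the sample's IUPAC code) is an assumption based on the description above. The IUPAC table itself is the standard nucleotide ambiguity code.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_mismatch(sample_base, reference_base, iupac_aware):
    """Decide whether one aligned position counts as a mismatch."""
    s, r = sample_base.upper(), reference_base.upper()
    if not iupac_aware:
        return s != r  # plain comparison: Y versus C is a mismatch
    # IUPAC aware: no mismatch if the reference is covered by the sample's code.
    return not set(IUPAC[r]) <= set(IUPAC[s])


def count_mismatches(sample, reference, iupac_aware=False):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(is_mismatch(s, r, iupac_aware) for s, r in zip(sample, reference))


sample = "ACGYTRA"     # Y = C or T, R = A or G
reference = "ACGCTGA"
print(count_mismatches(sample, reference))                    # 2
print(count_mismatches(sample, reference, iupac_aware=True))  # 0
```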

When examining a nucleotide alignment for differences/similarities, what is represented by the various symbols?

The --- symbol(s) indicate no data available at this position for this sequence; the ... symbol(s) indicate a match with the reference at this position; the *** symbol(s) indicate all aligned sequences match at this position.
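To make the notation concrete, here is a small sketch that renders an alignment with these three symbols. The layout is an assumption made for illustration (reference on the first row, one row per aligned sequence, and a final row of * marks); the actual display may be arranged differently. The function name render_alignment is hypothetical, and "-" in the input is taken to mean no data.

```python
def render_alignment(reference, others):
    """Render aligned sequences against a reference using -, . and * symbols."""
    if any(len(seq) != len(reference) for seq in others):
        raise ValueError("all sequences must be aligned to the reference length")
    rows = [reference]
    for seq in others:
        rows.append("".join(
            "-" if s == "-"                # no data at this position
            else ("." if s == r else s)    # match with reference, else show the base
            for s, r in zip(seq, reference)
        ))
    # Final row: * where every aligned sequence matches the reference.
    rows.append("".join(
        "*" if all(seq[i] == reference[i] for seq in others) else " "
        for i in range(len(reference))
    ))
    return rows


for row in render_alignment("ACGTAC", ["ACGTTC", "ACG-AC"]):
    print(row)
```

With this input the two sequence rows come out as "....T." and "...-..", and the final row marks positions 1 to 3 and 6 with *.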