SmartGene - Integrated Database Network System

What is the difference between match length, chain length, and sequence length?

Sequence length is the nucleotide length of a particular sequence. Chain length means the same thing. Match length is the stretch of nucleotides over which two sequences align, including any mismatches that fall within that stretch.

Does match length include mismatches?

Yes

How is the Score calculated?

Score refers to the BLAST (Basic Local Alignment Search Tool) score. BLAST is a registered trademark of the National Library of Medicine. BLAST finds regions of similarity between nucleotide or protein sequences and is used to search sequence databases for optimal local alignments to a query sequence. When comparing BLAST results, sort by both Score and Mismatches. Per NCBI's definition page, the raw score of a BLAST alignment is calculated as the sum of its substitution and gap scores.
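To illustrate the raw score definition, the following minimal Python sketch sums substitution and gap scores over a gapped pairwise alignment. The scoring values (MATCH_REWARD, MISMATCH_PENALTY, GAP_OPEN, GAP_EXTEND) and the function raw_score are hypothetical and chosen for illustration only; actual BLAST parameters depend on the program and search settings, and this is not SmartGene's or NCBI's implementation.

```python
# Hypothetical scoring values for illustration only; real BLAST reward,
# penalty and gap costs depend on the program and search settings.
MATCH_REWARD = 1
MISMATCH_PENALTY = -2
GAP_OPEN = -5    # charged once at the start of each gap run
GAP_EXTEND = -2  # charged for every position in a gap run


def raw_score(query_aln: str, subject_aln: str) -> int:
    """Sum substitution and gap scores over a gapped pairwise alignment.

    Both strings must have equal length; '-' marks a gap. A gap run of
    length k costs GAP_OPEN + k * GAP_EXTEND (an affine gap model).
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap_side = None  # which sequence held the gap in the previous column
    for q, s in zip(query_aln.upper(), subject_aln.upper()):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if q == "-" or s == "-":
            gap_side = "query" if q == "-" else "subject"
            if gap_side != prev_gap_side:
                score += GAP_OPEN  # a new gap run starts here
            score += GAP_EXTEND
            prev_gap_side = gap_side
        else:
            score += MATCH_REWARD if q == s else MISMATCH_PENALTY
            prev_gap_side = None
    return score


print(raw_score("ACGGT", "AC-GT"))  # -3
```

With these assumed values, the example alignment scores -3: four matching columns contribute +4 and one single-position gap contributes -5 - 2 = -7.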

How are Identities and Percent Similarities determined?

Identities are the number of matched positions in a pairwise alignment. Percent Similarity is calculated by expressing the Identities as a percentage of the total number of nucleotides aligned.
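A minimal Python sketch of this calculation follows. It assumes the alignment is given as two equal-length strings with '-' for gaps, and it assumes that "total number of nucleotides aligned" means every alignment column (the match length, including mismatches and gaps). The function identity_stats is hypothetical.

```python
def identity_stats(query_aln: str, subject_aln: str) -> tuple[int, int, float]:
    """Return (identities, match_length, percent_similarity) for a pairwise alignment.

    Assumes two equal-length aligned strings with '-' for gaps. The match
    length is taken to be every alignment column, so mismatches and gaps
    are included in the denominator.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    match_length = len(query_aln)
    # An identity is a column where both sequences have the same nucleotide.
    identities = sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if q == s and q != "-"
    )
    percent = 100.0 * identities / match_length if match_length else 0.0
    return identities, match_length, percent


print(identity_stats("ACGTACGTAC", "ACGTTCGTAC"))  # (9, 10, 90.0)
```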

Is it true that more mismatches result in lower Identities and Scores?

Yes

A mismatch value of 61 is shown; what does this mean?

This means there are 61 nucleotide mismatches along the whole match length (not a mismatch length of 61).

What do square brackets around an organism name mean?

Square brackets ([ ]) around a genus name typically indicate that the organism awaits appropriate action by the research community to be transferred to another genus. Square brackets can also indicate a naming issue, for example that the original record does not carry the current nomenclature for the organism.

How are gaps handled?

A gap counts as a mismatch. BLAST may discontinue an alignment when the diversity in a region is too high, so mismatches beyond that point are not accounted for by BLAST. When in doubt, perform a quick multiple alignment to check whether BLAST has truncated an alignment prematurely. (Note: Multiple alignments work for this purpose because they are created using an algorithm different from BLAST.)
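The Python sketch below counts a gap column as a mismatch, assuming a hypothetical format of two equal-length aligned strings with '-' for gaps; the function count_mismatches is illustrative only. Because every column is either an identity or a mismatch, identities plus mismatches always equal the match length, which is why more mismatches mean fewer identities. The sketch can only count what is in the reported alignment: any region that BLAST dropped after truncating the alignment is simply not seen.

```python
def count_mismatches(query_aln: str, subject_aln: str) -> int:
    """Count mismatches over the match length of a pairwise alignment.

    Assumes two equal-length aligned strings with '-' for gaps. A column
    with a gap in either sequence counts as a mismatch, as does a column
    with two different nucleotides.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if "-" in (q, s) or q != s
    )


query = "ACGT-ACGTA"
subject = "ACGTTACCTA"
mismatches = count_mismatches(query, subject)
identities = len(query) - mismatches  # every column is an identity or a mismatch
print(mismatches, identities)  # 2 8
```

Here the two mismatches are one gap and one substitution.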

How is the Result field in the ID Report populated?

Often, the Result field in the ID Report is populated automatically from the identification chosen by the user building the ID report. An authorized user may edit the text of the result manually (e.g. to specify a slash call). The Result field is a database field that can be configured to the specific needs of an organization.

What is the difference between searching the IDNS Reference Sequences (blue) and searching the organization's Sample Sequences (red)? When and how do an organization's Accessions become References?

The main work area is the red area, where new samples are added and the sample database can be mined, e.g. to identify similar cases. The private reference database (blue) is optional and is intended for institutions that wish to build their own private reference database of very well-characterized isolates, unusual isolates, isolates of a particular genus of research interest, etc. Usually, customers focus on building the sample database (red area) first and then consider making use of the private reference database as experience grows.

When searching references, what does selecting IUPAC aware do?

If the consensus sequence of an isolate has ambiguous positions (e.g. Y, R, K), perhaps as a result of inter-operon variability in the 16S rRNA gene, BLAST counts each ambiguous position as a mismatch. Setting the search flag "IUPAC aware", however, means that no mismatch is recorded if the reference sequence contains a nucleotide that is valid for the IUPAC code present in your sample sequence at that position. IUPAC aware searches take longer to run because of the additional computation of permutations.
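The following Python sketch models the behavior described above under simplifying assumptions: it compares pre-aligned sample and reference strings column by column, using the standard IUPAC nucleotide ambiguity codes. It does not reproduce BLAST's own scoring of ambiguous bases or SmartGene's implementation, and the names IUPAC, is_mismatch, mismatch_count and expansion_count are hypothetical. expansion_count illustrates why such searches cost more: each ambiguous position multiplies the number of concrete sequences that an ambiguous sequence stands for.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def is_mismatch(sample_base: str, ref_base: str, iupac_aware: bool) -> bool:
    """Decide whether one aligned column counts as a mismatch (simplified model)."""
    s, r = sample_base.upper(), ref_base.upper()
    if "-" in (s, r):
        return True   # a gap always counts as a mismatch
    if s == r:
        return False  # identical symbols
    if not iupac_aware:
        return True   # an ambiguity code never equals a plain base
    # IUPAC aware: accept the reference base if the sample code allows it.
    return r not in IUPAC.get(s, "")


def mismatch_count(sample_aln: str, ref_aln: str, iupac_aware: bool) -> int:
    """Count mismatched columns between two equal-length aligned strings."""
    return sum(is_mismatch(s, r, iupac_aware) for s, r in zip(sample_aln, ref_aln))


def expansion_count(seq: str) -> int:
    """Number of concrete A/C/G/T sequences an ambiguous sequence stands for.

    seq must contain only IUPAC codes listed above (no gaps).
    """
    return prod(len(IUPAC[base]) for base in seq.upper())


print(mismatch_count("ACYTR", "ACTTA", iupac_aware=False))  # 2
print(mismatch_count("ACYTR", "ACTTA", iupac_aware=True))   # 0
print(expansion_count("ACYTR"))                             # 4
```

In this example the sample's Y (C or T) and R (A or G) are counted as mismatches in a plain comparison, but not in the IUPAC aware comparison, because the reference carries T and A at those positions.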

When examining a nucleotide alignment for differences and similarities, what do the various symbols represent?

The --- symbol(s) indicate that no data is available at this position for this sequence; the ... symbol(s) indicate a match with the reference at this position; the *** symbol(s) indicate that all aligned sequences match at this position.
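As an illustration, this Python sketch produces a display using these symbols from a reference and a set of sequences already aligned to it. It assumes that '-' in an input sequence marks a position with no data; the function name render_alignment and the exact layout are hypothetical and may differ from the IDNS display.

```python
def render_alignment(reference: str, sequences: list[str]) -> list[str]:
    """Build display rows: the reference, one row per sequence, then a consensus row.

    Assumes every sequence is already aligned to the reference (same length)
    and that '-' marks a position with no data for that sequence.
    """
    for seq in sequences:
        if len(seq) != len(reference):
            raise ValueError("all sequences must be aligned to the reference")
    rows = [reference]
    for seq in sequences:
        row = []
        for ref_base, base in zip(reference.upper(), seq.upper()):
            if base == "-":
                row.append("-")   # no data at this position
            elif base == ref_base:
                row.append(".")   # matches the reference
            else:
                row.append(base)  # differing base is shown explicitly
        rows.append("".join(row))
    consensus = []
    for i, ref_base in enumerate(reference.upper()):
        column = [seq[i].upper() for seq in sequences]
        # '*' only where the reference and every sequence have data and agree.
        agree = ref_base != "-" and all(base == ref_base for base in column)
        consensus.append("*" if agree else " ")
    rows.append("".join(consensus))
    return rows


for line in render_alignment("ACGTACGT", ["ACGTACGA", "--GTACGT"]):
    print(line)
```

For this input the sequence rows are ".......A" and "--......", and the consensus row marks positions 3 to 7 with *, since the first two positions lack data in one sequence and the last position differs in the other.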