Precomputed results were generated by running PADLOC v2.0.0 with PADLOC-DB v2.0.0 over RefSeq v209 Bacteria and Archaea genomes. Last updated on 25 October 2023.
See the notes at the bottom of this page for guidance in interpreting output.
Accession: GCF_000146755.1_ASM14675v1
Name: Streptococcus sp. oral taxon 071 str. 73H25AP
Please note: CRISPRDetect numbers CRISPR arrays within contigs. PADLOC renumbers the arrays with sequential numbering across all contigs, thereby ensuring all array numbers are unique.
Interpreting output:
Systems separated by < 3 ORFs are grouped into 'defence loci' for visualisation.
Pseudogenes are outlined in red. They are only represented in the precomputed RefSeq data (not in user-supplied genomes). Only pseudogenes with predicted protein homologs in RefSeq v209 are shown.
In general, the accuracy of detection for single-gene systems will be less than for multi-gene systems, as gene colocalisation greatly enhances detection specificity. It is important to consider the significance of each HMM hit (including E-values and alignment coverages) when determining the accuracy of detection. Lower E-values and higher alignment coverages generally indicate better hits.
Notes for specific systems:
"*_other" models require only two genes to be present and co-localised, their purpose is to capture systems that may be broken up by large insertions or breaks in contigs, these should be manually inspected. In particular, larger systems may be split by contig breaks, and will only be identified by "*_other" models if at least two genes co-occur.
The "DMS_other" model is a loose model that includes various DNA-modification system proteins (e.g. RM, BREX, DISARM). The "DMS_other" is provided to capture potential new DNA-modification based systems. Hits to this model should be carefully inspected and not assumed to be bona fide systems.
Wadjet systems are similar to SMC chromosome partitioning systems, and "wadjet_other" models may actually match SMC systems.
There are several phoshorothioation (PT) system models, including "PT_Dnd*", "PT_Pbe*", and "PT_Ssp*" which are separated into modification and restriction modules, as these modules are known to mix-and-match in genomes and are not necessarily co-localised. In general, it should be expected that at least one modification and one restriction module be present in a genome for an active system. It should also be noted that several PT system genes are too small to be correctly annotated in some genomes and therefore may be missed by PADLOC.
Some defence systems are reported to be found co-localised exclusively with other specific defence systems. In particular, "PrrC" should occur only in association with RM Type I systems. In these cases, PADLOC typically uses relaxed detection rules (to enable discovery of new biological mechanisms) and users should manually verify these types of expected associations.
CRISPR-Cas Type V ("cas_type_V") and CRISPR-Cas Type VI ("cas_type_IV") models require Cas12/13 and at least one Cas adaptation gene or CRISPR array. To improve chances of detecting these systems, CRISPRDetect should be run.
Loading output into genome viewing software:
The .gff output generated by PADLOC can be loaded into some types of graphical software for system visualisation. For example, in Geneious Prime (load the corresponding .gbff/.gb file first):