Validating padloc-db data • padlocdev

Expanding HMM metadata table

The raw HMM metadata table read-in by read_hmm_meta() can have multiple names listed in protein.name and secondary.name (separated by the delimiter |). To split these assignments into multiple rows, use expand_hmm_meta().

hmm_meta <- padlocdev_example("padloc-db/hmm_meta.txt") |> read_hmm_meta()

hmm_meta |> subset(grepl("HamA", protein.name))
#> # A tibble: 4 × 15
#>   hmm.acc hmm.name hmm.description protein.name secondary.name author number.seq
#>   <chr>   <chr>    <chr>           <chr>        <chr>          <chr>       <dbl>
#> 1 PDLC00… HamA1_0… DUF1837 domain  HamA1|HamA   NA             Payne…         49
#> 2 PDLC00… HamA1_0… DUF1837 domain  HamA1|HamA   NA             Payne…          6
#> 3 PDLC00… HamA2_0… DUF1837 domain  HamA2|HamA   NA             Payne…          7
#> 4 PDLC00… HamA2_0… DUF1837 domain  HamA2|HamA   NA             Payne…        144
#> # ℹ 8 more variables: length.hmm <dbl>, e.value.threshold <dbl>,
#> #   hmm.coverage.threshold <dbl>, target.coverage.threshold <dbl>,
#> #   system <chr>, literature.ref <chr>, database.ref <chr>, comments <chr>

hmm_meta_expanded <- expand_hmm_meta(hmm_meta)
hmm_meta_expanded |> subset(grepl("HamA", protein.name))
#> # A tibble: 8 × 15
#>   hmm.acc hmm.name hmm.description protein.name secondary.name author number.seq
#>   <chr>   <chr>    <chr>           <chr>        <chr>          <chr>       <dbl>
#> 1 PDLC00… HamA1_0… DUF1837 domain  HamA1        NA             Payne…         49
#> 2 PDLC00… HamA1_0… DUF1837 domain  HamA         NA             Payne…         49
#> 3 PDLC00… HamA1_0… DUF1837 domain  HamA1        NA             Payne…          6
#> 4 PDLC00… HamA1_0… DUF1837 domain  HamA         NA             Payne…          6
#> 5 PDLC00… HamA2_0… DUF1837 domain  HamA2        NA             Payne…          7
#> 6 PDLC00… HamA2_0… DUF1837 domain  HamA         NA             Payne…          7
#> 7 PDLC00… HamA2_0… DUF1837 domain  HamA2        NA             Payne…        144
#> 8 PDLC00… HamA2_0… DUF1837 domain  HamA         NA             Payne…        144
#> # ℹ 8 more variables: length.hmm <dbl>, e.value.threshold <dbl>,
#> #   hmm.coverage.threshold <dbl>, target.coverage.threshold <dbl>,
#> #   system <chr>, literature.ref <chr>, database.ref <chr>, comments <chr>

The HMM metadata table can be re-collapsed using collapse_hmm_meta().

hmm_meta_collapsed <- collapse_hmm_meta(hmm_meta_expanded)
hmm_meta_collapsed |> subset(grepl("HamA", protein.name))
#> # A tibble: 4 × 15
#>   hmm.acc hmm.name hmm.description protein.name secondary.name author number.seq
#>   <chr>   <chr>    <chr>           <chr>        <chr>          <chr>       <dbl>
#> 1 PDLC00… HamA1_0… DUF1837 domain  HamA1|HamA   NA             Payne…         49
#> 2 PDLC00… HamA1_0… DUF1837 domain  HamA1|HamA   NA             Payne…          6
#> 3 PDLC00… HamA2_0… DUF1837 domain  HamA2|HamA   NA             Payne…          7
#> 4 PDLC00… HamA2_0… DUF1837 domain  HamA2|HamA   NA             Payne…        144
#> # ℹ 8 more variables: length.hmm <dbl>, e.value.threshold <dbl>,
#> #   hmm.coverage.threshold <dbl>, target.coverage.threshold <dbl>,
#> #   system <chr>, literature.ref <chr>, database.ref <chr>, comments <chr>

Expanding secondary gene assignments

Raw system model files can contain references to gene groups a.k.a secondary.name assignments. Before testing model validity, these gene groups need to be expanded.

First, read in the data.

model <- padlocdev_example("padloc-db/sys/DRT_class_II.yaml") |> read_padloc_model()

The model for DRT_class_II has one ‘gene’ in secondary_genes i.e. Drt_class_II which is actually a group of genes.

model[["core_genes"]]
#> [1] "Drt_class_II"

Several genes in the HMM metadata table have been assigned Drt_class_II as a secondary.name to group them together.

hmm_meta_expanded |> subset(secondary.name == "Drt_class_II")
#> # A tibble: 4 × 15
#>   hmm.acc hmm.name hmm.description protein.name secondary.name author number.seq
#>   <chr>   <chr>    <chr>           <chr>        <chr>          <chr>       <dbl>
#> 1 PDLC02… Drt2_00… NA              Drt2         Drt_class_II   Payne…         92
#> 2 PDLC02… Drt3a_0… NA              Drt3a        Drt_class_II   Payne…         53
#> 3 PDLC02… Drt3a_0… NA              Drt3a        Drt_class_II   Payne…         68
#> 4 PDLC02… Drt3b_0… NA              Drt3b        Drt_class_II   Payne…        107
#> # ℹ 8 more variables: length.hmm <dbl>, e.value.threshold <dbl>,
#> #   hmm.coverage.threshold <dbl>, target.coverage.threshold <dbl>,
#> #   system <chr>, literature.ref <chr>, database.ref <chr>, comments <chr>

Use expand_gene_groups() to expand gene groups for a particuar gene category, based on the secondary.name assignments in the HMM metadata table.

genes <- expand_gene_groups(model, "core_genes", hmm_meta_expanded)
genes
#> [1] "Drt2"  "Drt3a" "Drt3b"

These can be assigned back onto the model. Other genes that are listed in the gene category being expanded, and are not part of a gene group will also get carried over.

model[["core_genes"]] <- genes
model[["core_genes"]]
#> [1] "Drt2"  "Drt3a" "Drt3b"

A wrapper is provided to expand the gene assignments for all gene categories of all system models in a list, and assign them back onto the models.

models <- padlocdev_example("padloc-db/sys") |> multi_read_padloc_model()
models[["DRT_class_II"]][c("core_genes", "secondary_genes", "neutral_genes", "prohibited_genes")]
#> $core_genes
#> [1] "Drt_class_II"
#> 
#> $secondary_genes
#> [1] "NA"
#> 
#> $neutral_genes
#> [1] "NA"
#> 
#> $prohibited_genes
#> [1] "Drt_class_I"   "Drt_class_III"

models_expanded <- expand_gene_groups_all(models, hmm_meta_expanded)
models_expanded[["DRT_class_II"]][c("core_genes", "secondary_genes", "neutral_genes", "prohibited_genes")]
#> $core_genes
#> [1] "Drt2"  "Drt3a" "Drt3b"
#> 
#> $secondary_genes
#> [1] "NA"
#> 
#> $neutral_genes
#> [1] "NA"
#> 
#> $prohibited_genes
#> [1] "Drt1a" "Drt1b" "Drt4"  "Drt5"

Working through an example of a broken database

An example of a broken database with various issues has been included, to demonstrate database validation functions. First, we’ll read in the database files.

path_hmm <- padlocdev_example("padloc-db-broken/hmm")
path_sys <- padlocdev_example("padloc-db-broken/sys")
path_hmm_meta <- padlocdev_example("padloc-db-broken/hmm_meta.txt")
path_sys_meta <- padlocdev_example("padloc-db-broken/sys_meta.txt")
path_sys_groups <- padlocdev_example("padloc-db-broken/sys_groups.txt")

hmms <- multi_read_hmm_header(path_hmm)
models <- multi_read_padloc_model(path_sys)
hmm_meta <- read_hmm_meta(path_hmm_meta)
sys_meta <- read_sys_meta(path_sys_meta)
sys_groups <- read_sys_groups(path_sys_groups)

Verify that HMM file names match their accession

Before trying to expand anything, lets see if there’s anything missing. First, we’ll check that the HMM file names match their accessions, as this may affect other stages of validation. To do this, use verify_hmm_names().

hmm_name_verification <- verify_hmm_names(hmms)
hmm_name_verification
#> $names_match
#> # A tibble: 40 × 3
#>    file_name accession_field match
#>    <chr>     <chr>           <lgl>
#>  1 PDLC00150 PDLC00150       TRUE 
#>  2 PDLC00151 PDLC00151       TRUE 
#>  3 PDLC00174 PDLC00174       TRUE 
#>  4 PDLC00175 PDLC00175       TRUE 
#>  5 PDLC00195 PDLC00195       TRUE 
#>  6 PDLC00196 PDLC00196       TRUE 
#>  7 PDLC00199 PDLC00199       TRUE 
#>  8 PDLC00200 PDLC00200       TRUE 
#>  9 PDLC00202 PDLC00202       TRUE 
#> 10 PDLC00203 PDLC00203       TRUE 
#> # ℹ 30 more rows
#> 
#> $names_mismatch
#> # A tibble: 2 × 3
#>   file_name accession_field match
#>   <chr>     <chr>           <lgl>
#> 1 PDLC02505 PDLC00012       FALSE
#> 2 PDLC03000 PDLC02511       FALSE

Here, we can see that the HMMs with filenames PDLC03000.hmm and PDLC02505.hmm do not have matching accession fields. It’s unclear whether the filename or the accession fields are incorrect, but we could check whether there are mentions of either of these HMMs in the metadata table to investigate.

check <- list(
  file_names = hmm_name_verification[["names_mismatch"]][["file_name"]],
  accessions = hmm_name_verification[["names_mismatch"]][["accession_field"]]
)

in_hmm_meta <- purrr::map(check, function(x) subset(hmm_meta, hmm.acc %in% x))
in_hmm_meta
#> $file_names
#> # A tibble: 1 × 15
#>   hmm.acc hmm.name hmm.description protein.name secondary.name author number.seq
#>   <chr>   <chr>    <chr>           <chr>        <chr>          <chr>       <dbl>
#> 1 PDLC02… BrxL_00… BrxL            BrxL         NA             Payne…         NA
#> # ℹ 8 more variables: length.hmm <dbl>, e.value.threshold <dbl>,
#> #   hmm.coverage.threshold <dbl>, target.coverage.threshold <dbl>,
#> #   system <chr>, literature.ref <chr>, database.ref <chr>, comments <chr>
#> 
#> $accessions
#> # A tibble: 1 × 15
#>   hmm.acc hmm.name hmm.description protein.name secondary.name author number.seq
#>   <chr>   <chr>    <chr>           <chr>        <chr>          <chr>       <dbl>
#> 1 PDLC02… BrxP_00… BrxP            BrxP         NA             Payne…         NA
#> # ℹ 8 more variables: length.hmm <dbl>, e.value.threshold <dbl>,
#> #   hmm.coverage.threshold <dbl>, target.coverage.threshold <dbl>,
#> #   system <chr>, literature.ref <chr>, database.ref <chr>, comments <chr>

Now we can see that the HMM with filename PDLC02505.hmm has an entry in the HMM metadata table, so the accession is most likely what is incorrect, and should be updated to be PDLC02505. In addition, the HMM with accession PDLC02511 has an entry in the metadata table, so the filename PDLC03000.hmm should be updated to be PDLC02511.hmm.

If any HMMs listed in the mismatch table did not show up in the metadata table, we might deduce that both the filename and the accession are incorrect, or this HMM hasn’t been added to the metadata table yet (the name mismatch should be resolved before adding), or the HMM has been included in the hmm directory in error. Either way, these cases should also be dealt with.

Compare the profile HMMs with the HMM metadata table

Next, we’ll compare the HMMs with the metadata table, to ensure that all HMMs have an entry and vice-versa. To do this, use compare_hmm_files_hmm_meta().

comparison <- compare_hmm_files_hmm_meta(hmms, hmm_meta)
comparison
#> $`hmm_files without hmm_meta`
#> # A tibble: 2 × 4
#>   file.name NAME              ACC       hmm_files
#>   <chr>     <chr>             <chr>     <lgl>    
#> 1 PDLC04000 Extra_HMM         PDLC04000 TRUE     
#> 2 PDLC05000 Another_extra_HMM PDLC05000 TRUE     
#> 
#> $`hmm_meta without hmm_files`
#> # A tibble: 3 × 2
#>   hmm.name   hmm_meta
#>   <chr>      <lgl>   
#> 1 BrxA_00001 TRUE    
#> 2 BrxE_00001 TRUE    
#> 3 BrxP_00001 TRUE    
#> 
#> $`hmm_files with hmm_meta`
#> # A tibble: 40 × 3
#>    hmm.name    hmm_files hmm_meta
#>    <chr>       <lgl>     <lgl>   
#>  1 HamA1_00001 TRUE      TRUE    
#>  2 HamA1_00002 TRUE      TRUE    
#>  3 HamB1_00001 TRUE      TRUE    
#>  4 HamB1_00002 TRUE      TRUE    
#>  5 HamA2_00001 TRUE      TRUE    
#>  6 HamA2_00002 TRUE      TRUE    
#>  7 HamB2_00001 TRUE      TRUE    
#>  8 HamB2_00002 TRUE      TRUE    
#>  9 HamC2_00001 TRUE      TRUE    
#> 10 HamC2_00002 TRUE      TRUE    
#> # ℹ 30 more rows

Here, we can see that there are two HMMs that exist, but are not mentioned in the HMM metadata table: PDLC04000.hmm and PDLC05000.hmm. There are also three HMMs that are mentioned in the metadata table but do not exist: BrxA_00001, BrxE_00001, and BrxP_00001.

Compare the system models with the system metadata table

Next, we’ll compare the system models with the system metadata table, to ensure that all systems have an entry and vice-versa. To do this, use compare_sys_files_sys_meta().

comparison <- compare_sys_files_sys_meta(models, sys_meta)
comparison
#> $`sys_files without sys_meta`
#> # A tibble: 4 × 2
#>   yaml.name        sys_files
#>   <chr>            <lgl>    
#> 1 brex_other       TRUE     
#> 2 generic_system_A TRUE     
#> 3 generic_system_B TRUE     
#> 4 hachiman_type_I  TRUE     
#> 
#> $`sys_meta without sys_files`
#> # A tibble: 3 × 2
#>   yaml.name        sys_meta
#>   <chr>            <lgl>   
#> 1 brex_type_II     TRUE    
#> 2 DRT_type_II      TRUE    
#> 3 hachiman_type_II TRUE    
#> 
#> $`sys_files with sys_meta`
#> # A tibble: 14 × 3
#>    yaml.name      sys_files sys_meta
#>    <chr>          <lgl>     <lgl>   
#>  1 brex_type_I    TRUE      TRUE    
#>  2 brex_type_III  TRUE      TRUE    
#>  3 brex_type_IV   TRUE      TRUE    
#>  4 brex_type_V    TRUE      TRUE    
#>  5 brex_type_VI   TRUE      TRUE    
#>  6 DRT_class_I    TRUE      TRUE    
#>  7 DRT_class_II   TRUE      TRUE    
#>  8 DRT_class_III  TRUE      TRUE    
#>  9 DRT_other      TRUE      TRUE    
#> 10 DRT_type_I     TRUE      TRUE    
#> 11 DRT_type_III   TRUE      TRUE    
#> 12 DRT_type_IV    TRUE      TRUE    
#> 13 DRT_type_V     TRUE      TRUE    
#> 14 hachiman_other TRUE      TRUE

Here, we can see that four system models do not have entries in the system metadata table: brex_other, generic_system_A, generic_system_B, and hachiman_type_I. Additionaly, there are three entries in the metadata table that do not have corresponding system models: brex_type_II, DRT_type_II, and hachiman_type_II.

Compare the system metadata table with the system groups table

Next, we’ll compare the system metadata table with the system groups table, to ensure that all system models have been assigned to a group. To do this, use compare_sys_meta_sys_groups().

comparison <- compare_sys_meta_sys_groups(sys_meta, sys_groups)
comparison
#> $`sys_meta without sys_groups`
#> # A tibble: 2 × 2
#>   yaml.name      sys_meta
#>   <chr>          <lgl>   
#> 1 DRT_class_I    TRUE    
#> 2 hachiman_other TRUE    
#> 
#> $`sys_groups without sys_meta`
#> # A tibble: 2 × 2
#>   yaml.name       sys_groups
#>   <chr>           <lgl>     
#> 1 brex_other      TRUE      
#> 2 hachiman_type_I TRUE      
#> 
#> $`sys_meta with sys_groups`
#> # A tibble: 15 × 3
#>    yaml.name        sys_meta sys_groups
#>    <chr>            <lgl>    <lgl>     
#>  1 brex_type_I      TRUE     TRUE      
#>  2 brex_type_II     TRUE     TRUE      
#>  3 brex_type_III    TRUE     TRUE      
#>  4 brex_type_IV     TRUE     TRUE      
#>  5 brex_type_V      TRUE     TRUE      
#>  6 brex_type_VI     TRUE     TRUE      
#>  7 DRT_class_II     TRUE     TRUE      
#>  8 DRT_class_III    TRUE     TRUE      
#>  9 DRT_other        TRUE     TRUE      
#> 10 DRT_type_I       TRUE     TRUE      
#> 11 DRT_type_II      TRUE     TRUE      
#> 12 DRT_type_III     TRUE     TRUE      
#> 13 DRT_type_IV      TRUE     TRUE      
#> 14 DRT_type_V       TRUE     TRUE      
#> 15 hachiman_type_II TRUE     TRUE

Here, we can see that two systems with entries in the metadata table have not been assigned system groups: DRT_class_I and hachiman_other. Additionally, two of the systems that were previously identified as having missing entries in the metadata table have been assigned groups: brex_other and hachiman_type_I. The systems generic_system_A and generic_system_B do not appear here as they do not have entries in the metadata table or the system groups table.

Compare the system models with the HMM metadata table

Once the above issues have been resolved, we can expand the hmm metadata table and the system models (for the purposes of this example, we won’t be correcting the issues here).

hmm_meta_expanded <- expand_hmm_meta(hmm_meta)
models_expanded <- expand_gene_groups_all(models, hmm_meta_expanded)

Now we can compare the expanded models with the expanded metadata table to ensure that all genes mentioned in the models have entries in the metadata table and vice-versa. Use compare_sys_files_hmm_meta().

comparison <- compare_sys_files_hmm_meta(models_expanded, hmm_meta_expanded)
comparison
#> $`hmm_meta_expanded without sys_genes`
#> # A tibble: 2 × 2
#>   protein.name hmm_meta_expanded
#>   <chr>        <lgl>            
#> 1 HamA2        TRUE             
#> 2 HamB2        TRUE             
#> 
#> $`sys_genes without hmm_meta_expanded`
#> # A tibble: 5 × 2
#>   protein.name sys_genes
#>   <chr>        <lgl>    
#> 1 GenA         TRUE     
#> 2 GenB         TRUE     
#> 3 GenC         TRUE     
#> 4 GenD         TRUE     
#> 5 GenE         TRUE     
#> 
#> $`hmm_meta_expanded with sys_genes`
#> # A tibble: 27 × 3
#>    protein.name hmm_meta_expanded sys_genes
#>    <chr>        <lgl>             <lgl>    
#>  1 HamA1        TRUE              TRUE     
#>  2 HamA         TRUE              TRUE     
#>  3 HamB1        TRUE              TRUE     
#>  4 HamB         TRUE              TRUE     
#>  5 HamC2        TRUE              TRUE     
#>  6 HamC         TRUE              TRUE     
#>  7 Drt1a        TRUE              TRUE     
#>  8 Drt1b        TRUE              TRUE     
#>  9 Drt2         TRUE              TRUE     
#> 10 Drt3a        TRUE              TRUE     
#> # ℹ 17 more rows

Here, we can see that there are two genes in the HMM metadata table that are not used in any of the system models: HamA2 and HamB2. This is perhaps unsuprising, as we already established that there is no system model present for hachiman_type_II, which would use these genes. In addition, there are five genes used by system models which are not referenced in the HMM metadata table: GenA, GenB, GenC, GenD, and GenE. To see which systems use those genes, we can use which_uses().

genes_without_meta <- comparison[["sys_genes without hmm_meta_expanded"]][["protein.name"]]
which_uses(genes_without_meta, models_expanded)
#> $GenA
#> [1] "generic_system_A" "generic_system_A"
#> 
#> $GenB
#> [1] "generic_system_A" "generic_system_A"
#> 
#> $GenC
#> [1] "generic_system_B"
#> 
#> $GenD
#> [1] "generic_system_B"
#> 
#> $GenE
#> [1] "generic_system_B"

Checking that system models are valid

Now we’ve checked that all information is present in the database (and fixed any issues), we can now check each of the system models to ensure that they are actually capable of identifying systems with their given parameters.

Use valid_padloc_model() for assessing model validity. This function provides useful error information for fixing a model if it is invalid. It should be after expanding gene groups.

models_expanded[["brex_type_III"]]
#> $force_strand
#> [1] FALSE
#> 
#> $maximum_separation
#> [1] 3
#> 
#> $minimum_core
#> [1] 6
#> 
#> $minimum_total
#> [1] 6
#> 
#> $core_genes
#> [1] "BrxA"   "BrxC"   "BrxF"   "PglXI"  "BrxHII" "PglZ"  
#> 
#> $secondary_genes
#> [1] "NA"
#> 
#> $neutral_genes
#> [1] "NA"
#> 
#> $prohibited_genes
#> [1] "NA"

valid_padloc_model(models_expanded[["brex_type_III"]])
#> [1] TRUE

models_expanded[["brex_type_I"]]
#> $force_strand
#> [1] FALSE
#> 
#> $maximum_separation
#> [1] 3
#> 
#> $minimum_core
#> [1] 7
#> 
#> $minimum_total
#> [1] 6
#> 
#> $core_genes
#> [1] "BrxA" "BrxB" "BrxC" "PglX" "PglZ" "BrxL"
#> 
#> $secondary_genes
#> [1] "BrxA"
#> 
#> $neutral_genes
#> [1] "NA"
#> 
#> $prohibited_genes
#> [1] "BrxE"

valid_padloc_model(models_expanded[["brex_type_I"]])
#> Error in `valid_padloc_model()`:
#> ! Invalid padloc system model:
#> ℹ `minimum_core` must be <= the number of `core_genes`
#> ✖ `minimum_core` is 7, but number of `core_genes` is 6
#> ℹ `core_genes` should not overlap with `secondary_genes`.
#> ✖ Overlapping genes include: "BrxA"

The model for brex_type_I is invalid, as the required number of core genes is greater than the number of core genes specifed. Additionaly, the gene BrxA has been specified in both the core_genes and secondary_genes.

The wrapper report_padloc_model_validity() is provided to test the validity of all system models in a list and return the output as a list of ?purrr::safely-style sub-lists with two components result and error.

validity_report <- report_padloc_model_validity(models_expanded)

For each model, if it is valid, result is TRUE and error is NULL.

validity_report[["DRT_class_II"]]
#> $result
#> [1] TRUE
#> 
#> $error
#> NULL

If the model is invalid, result is FALSE and error is an ?rlang::rlang_error object that describes the reason the model is invalid.

validity_report[["brex_type_I"]]
#> $result
#> [1] FALSE
#> 
#> $error
#> <error/rlang_error>
#> Error:
#> ! Invalid padloc system model:
#> ℹ `minimum_core` must be <= the number of `core_genes`
#> ✖ `minimum_core` is 7, but number of `core_genes` is 6
#> ℹ `core_genes` should not overlap with `secondary_genes`.
#> ✖ Overlapping genes include: "BrxA"

To return just the error messages for all model in the list, use why_invalid().

why_invalid(validity_report)
#> $brex_type_I
#> <error/rlang_error>
#> Error:
#> ! Invalid padloc system model:
#> ℹ `minimum_core` must be <= the number of `core_genes`
#> ✖ `minimum_core` is 7, but number of `core_genes` is 6
#> ℹ `core_genes` should not overlap with `secondary_genes`.
#> ✖ Overlapping genes include: "BrxA"
#> 
#> $DRT_type_I
#> <error/rlang_error>
#> Error:
#> ! Invalid padloc system model:
#> ℹ `maximum_separation` must be a whole number of type integer or double.
#> ✖ Instead, got character: zero.
#> ℹ `minimum_core` must be a whole number of type integer or double.
#> ✖ Instead, got character: two.
#> ℹ `minimum_total` must be a whole number of type integer or double.
#> ✖ Instead, got character: two.
#> ℹ `minimum_core` must be <= the number of `core_genes`
#> ✖ `minimum_core` is "two", but number of `core_genes` is 2
#> ℹ `minimum_total` must be <= the combined number of `core_genes` and
#> `secondary_genes`.
#> ✖ `minimum_total` is "two", but `core_genes` + `secondary_genes` is 2
#> 
#> $DRT_type_III
#> <simpleError: the condition has length > 1>
#> 
#> $generic_system_A
#> <error/rlang_error>
#> Error:
#> ! Invalid padloc system model:
#> ℹ `minimum_core` must be <= the number of `core_genes`
#> ✖ `minimum_core` is 3, but number of `core_genes` is 2
#> ℹ `minimum_total` must be <= the combined number of `core_genes` and
#> `secondary_genes`.
#> ✖ `minimum_total` is 4, but `core_genes` + `secondary_genes` is 3
#> ℹ `core_genes` should not overlap with `secondary_genes`.
#> ✖ Overlapping genes include: "GenB"
#> ℹ `core_genes` should not overlap with `prohibited_genes`.
#> ✖ Overlapping genes include: "GenA"
#> 
#> $generic_system_B
#> <error/rlang_error>
#> Error:
#> ! Invalid padloc system model:
#> ℹ `core_genes` can not contain the special value "NA".
#> ✖ Got "GenC", "GenD", and "NA".
#> ℹ `secondary_genes` can not contain the special value "NA" if additional genes
#> are specified.
#> ✖ Got "GenE" and "NA".

Some malformed models may fail for reasons outside of the scope of the current validity checks, resulting in cryptic error messages, like that seen for DRT_type_III. These models should be inspected manually for the cause of the error.

validity_report[["DRT_type_III"]]
#> $result
#> [1] FALSE
#> 
#> $error
#> <simpleError: the condition has length > 1>