
Splitting a database into groups

Once a padloc-db is validated, it is useful to split it into groups for batched searches with padloc.

Read sys_groups.txt using read_sys_groups().

path <- padlocdev_example("sys_groups.txt")
sys_groups <- read_sys_groups(path)
head(sys_groups)

Use filter_models() to filter a list of padloc models for those belonging to a particular group, as defined in the system groups table.

path <- padlocdev_example("sys")
models <- multi_read_padloc_model(path)
models_group_1 <- filter_models(models, sys_groups, "group_1")
names(models_group_1)
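The idea behind filtering by group can be pictured with plain R objects. The sketch below is not the padlocdev implementation. It assumes, purely for illustration, that models are held in a named list and that the groups table has columns called `system` and `group`; the helper `filter_by_group_sketch()` and all toy names are hypothetical.

```r
# Hypothetical stand-ins: a named list of models and a groups table.
# The column names `system` and `group` are assumptions for illustration.
toy_models <- list(
  sys_A = "model A",
  sys_B = "model B",
  sys_C = "model C"
)
toy_groups <- data.frame(
  system = c("sys_A", "sys_B", "sys_C"),
  group  = c("group_1", "group_2", "group_1"),
  stringsAsFactors = FALSE
)

# Keep only the list elements whose names are assigned to `group_name`.
filter_by_group_sketch <- function(models, groups, group_name) {
  wanted <- groups$system[groups$group == group_name]
  models[names(models) %in% wanted]
}

toy_group_1 <- filter_by_group_sketch(toy_models, toy_groups, "group_1")
names(toy_group_1)
#> [1] "sys_A" "sys_C"
```

Matching on list names keeps the original order of the models and silently ignores systems that appear in the groups table but have no model loaded.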

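To pull out the HMM metadata for the models in a group, read the HMM metadata table, expand it, and filter it against the group's models with filter_hmm_meta().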

hmm_meta <- padlocdev_example("hmm_meta.txt") |> read_hmm_meta()
hmm_meta_expanded <- expand_hmm_meta(hmm_meta)
hmm_meta_group_1 <- filter_hmm_meta(models_group_1, hmm_meta_expanded)

To copy HMMs from a particular system group to a new directory, use system_cp_hmm(). This and the related system_cp_* functions use the fs package to manipulate system files.

Before trying to copy any HMMs, it's important to verify that the name of each HMM file corresponds to its accession, i.e. the ACC field in the HMM header. This is not required for the database to function, though it is probably good practice, and it matters for this exercise because the accession is what we'll use to identify the HMMs to copy.

Use verify_hmm_names() to check that the names match. It returns a list of two tibbles: names_match, which lists HMMs whose file name and accession match, and names_mismatch, which lists HMMs whose file name and accession differ.

group_1_names_verify <- verify_hmm_names(models_group_1)
group_1_names_verify
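To make the check concrete, here is a minimal base-R sketch of comparing file names with the ACC field. It is not how verify_hmm_names() is implemented and it returns data frames rather than tibbles. The toy files contain only the header lines needed for the comparison, and the accessions (`TOY00001` and so on) and the helper `read_acc()` are hypothetical.

```r
# Write two toy HMM files to a temporary directory. Only the header lines
# needed for the check are included; these are not complete HMMER profiles.
hmm_dir <- file.path(tempdir(), "hmm_name_check")
dir.create(hmm_dir, showWarnings = FALSE)
writeLines(c("NAME  toy_a", "ACC   TOY00001"),
           file.path(hmm_dir, "TOY00001.hmm"))
writeLines(c("NAME  toy_b", "ACC   TOY00099"),
           file.path(hmm_dir, "TOY00002.hmm"))

# Read the accession from the first header line that starts with "ACC".
read_acc <- function(path) {
  header <- readLines(path)
  acc_line <- grep("^ACC[[:space:]]", header, value = TRUE)[1]
  sub("^ACC[[:space:]]+", "", acc_line)
}

# Compare each file name (without extension) with its accession.
files <- list.files(hmm_dir, pattern = "\\.hmm$", full.names = TRUE)
check <- data.frame(
  file_name = tools::file_path_sans_ext(basename(files)),
  acc       = vapply(files, read_acc, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

name_check <- list(
  names_match    = check[check$file_name == check$acc, ],
  names_mismatch = check[check$file_name != check$acc, ]
)
name_check$names_mismatch
```

Here the second toy file is reported as a mismatch because its file name is `TOY00002` while its header declares `TOY00099`. The sketch does not handle files that lack an ACC line.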

In this case all names match, so we can continue. Use filter_hmm_meta() to extract the rows of the HMM metadata table that are relevant to the models in the group, then copy the corresponding HMMs with system_cp_hmm().

system_cp_hmm()
#> Error in system_cp_hmm(): argument "new_path" is missing, with no default
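For orientation, the kind of fs operations involved in copying a subset of files by accession might look like the sketch below. This is an illustration under stated assumptions, not the implementation or signature of system_cp_hmm(): the directory names, the toy accessions, and the idea of building file paths as `<accession>.hmm` are all assumptions made for the example.

```r
library(fs)

# Toy source directory with three placeholder files standing in for HMMs.
src_dir <- path(tempdir(), "toy_hmm_src")
new_dir <- path(tempdir(), "toy_hmm_group_1")
dir_create(src_dir)
all_acc <- c("TOY00001", "TOY00002", "TOY00003")
file_create(path(src_dir, all_acc, ext = "hmm"))

# Accessions to copy. In practice these would come from the filtered
# HMM metadata for the group (an assumption for this illustration).
group_acc <- c("TOY00001", "TOY00003")

# Create the destination and copy only the files for the chosen accessions.
dir_create(new_dir)
file_copy(
  path(src_dir, group_acc, ext = "hmm"),
  path(new_dir, group_acc, ext = "hmm"),
  overwrite = TRUE
)
path_file(dir_ls(new_dir))
```

Because the files are located by accession, this approach only works when file names and ACC fields agree, which is why the name check comes first.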

Use the wrapper divide_database() to create sub-databases for all groups in the system groups table.

# divide_database(
#   sys_expanded = sys_expanded,
#   sys_groups = sys_groups,
#   hmm_meta_expanded = hmm_meta_expanded,
#   sys_meta = sys_meta,
#   path = "path/to/database",
#   new_path = "/path/to/new/database"
# )
