Splitting a database into groups
Once a padloc-db is validated, it is useful to split it into groups for batched searches with padloc.
Read sys_groups.txt using read_sys_groups().
path <- padlocdev_example("sys_groups.txt")
sys_groups <- read_sys_groups(path)
head(sys_groups)
Use filter_models() to keep only the padloc models that belong to a
particular group, as defined in the system groups table.
path <- padlocdev_example("sys")
models <- multi_read_padloc_model(path)
models_group_1 <- filter_models(models, sys_groups, "group_1")
names(models_group_1)
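The idea behind this kind of group filter can be sketched in base R. This is only an illustration, not padlocdev's implementation: the column names `system` and `group`, the toy model list, and the helper `filter_models_sketch()` are all assumptions made for the example.

```r
# Hypothetical stand-in for the system groups table. The column names
# `system` and `group` are assumptions, not padlocdev's documented schema.
sys_groups_demo <- data.frame(
  system = c("sys_a", "sys_b", "sys_c"),
  group = c("group_1", "group_2", "group_1"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a named list of models, one element per system.
models_demo <- list(
  sys_a = list(genes = c("a1", "a2")),
  sys_b = list(genes = "b1"),
  sys_c = list(genes = c("c1", "c2", "c3"))
)

# Keep only the list elements whose names are assigned to `group_name`.
filter_models_sketch <- function(models, groups, group_name) {
  wanted <- groups$system[groups$group == group_name]
  models[names(models) %in% wanted]
}

group_1_demo <- filter_models_sketch(models_demo, sys_groups_demo, "group_1")
names(group_1_demo)
#> [1] "sys_a" "sys_c"
```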
To pull the HMM metadata for the models in this group:
hmm_meta <- padlocdev_example("hmm_meta.txt") |> read_hmm_meta()
hmm_meta_expanded <- expand_hmm_meta(hmm_meta)
hmm_meta_group_1 <- filter_hmm_meta(models_group_1, hmm_meta_expanded)
To copy HMMs from a particular system group to a new directory, use
system_cp_hmm(). This and the related system_cp_* functions use the fs
package to manipulate files.
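As a rough picture of what an fs-based copy step looks like, the sketch below copies a selected set of HMM files into a new directory. It does not reproduce system_cp_hmm(): the temporary directories, the file names, and the `wanted_acc` vector are made up for the example, and only general-purpose fs functions are used.

```r
library(fs)

# Throwaway source and destination directories (hypothetical paths).
src_dir <- path(tempfile("hmm_src_"))
dst_dir <- path(tempfile("hmm_dst_"))
dir_create(src_dir)

# Three empty placeholder files standing in for HMM profiles.
file_create(path(src_dir, c("PF00001.hmm", "PF00002.hmm", "PF00003.hmm")))

# Accessions selected for copying (assumed to equal the file names).
wanted_acc <- c("PF00001", "PF00003")
src_files <- path(src_dir, wanted_acc, ext = "hmm")

# Stop early if any expected file is missing.
stopifnot(all(file_exists(src_files)))

# Copy each selected file to the same file name in the new directory.
dir_create(dst_dir)
file_copy(src_files, path(dst_dir, path_file(src_files)))

copied <- path_file(dir_ls(dst_dir))
copied
```

Identifying files by accession only works when file names and accessions agree, which is why the name check comes before any copying.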
Before trying to copy any HMMs, verify that the name of each HMM file
corresponds to its accession, i.e. the ACC field in the HMM header. This
is not required for the database to function, though it is probably good
practice, and it matters for this exercise because the accession is what
we'll use to identify the HMMs to copy.
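What such a check amounts to can be sketched in base R: read the ACC line from each HMM header and compare it with the file name minus its extension. The helpers `read_hmm_acc()` and `compare_names_sketch()` and the two demo files are hypothetical, written only for this illustration, and the sketch takes file paths rather than padloc models.

```r
# Read the ACC field from the header of an HMMER3 profile file.
# Returns NA if the first 30 lines contain no ACC line.
read_hmm_acc <- function(path) {
  header <- readLines(path, n = 30)
  acc_line <- grep("^ACC[[:space:]]+", header, value = TRUE)
  if (length(acc_line) == 0) return(NA_character_)
  trimws(sub("^ACC[[:space:]]+", "", acc_line[1]))
}

# Compare each file name (extension removed) with the accession in its header.
compare_names_sketch <- function(paths) {
  file_name <- tools::file_path_sans_ext(basename(paths))
  accession <- vapply(paths, read_hmm_acc, character(1), USE.NAMES = FALSE)
  data.frame(
    file_name = file_name,
    accession = accession,
    match = !is.na(accession) & file_name == accession,
    stringsAsFactors = FALSE
  )
}

# Two minimal demo files: one named after its accession, one not.
demo_dir <- tempfile("hmm_demo_")
dir.create(demo_dir)
write_demo_hmm <- function(file, acc) {
  writeLines(
    c("HMMER3/f [demo]", "NAME  demo", paste0("ACC   ", acc), "LENG  10"),
    file.path(demo_dir, file)
  )
}
write_demo_hmm("PF00001.hmm", "PF00001")
write_demo_hmm("renamed.hmm", "PF00002")

checked <- compare_names_sketch(list.files(demo_dir, full.names = TRUE))
checked
```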
Use verify_hmm_names() to check that the names match. It returns a list
of two tibbles: names_match, which lists HMMs with matching file and
accession names, and names_mismatch, which lists HMMs with different
file and accession names.
group_1_names_verify <- verify_hmm_names(models_group_1)
group_1_names_verify
In this case all names match, so we can continue. Use
filter_hmm_meta() to extract the rows of the HMM metadata
table that are relevant to the models in the group.
system_cp_hmm()
#> Error in system_cp_hmm(): argument "new_path" is missing, with no default
Use the wrapper divide_database() to create
sub-databases for all groups in the system groups table.
# divide_database(
# sys_expanded = sys_expanded,
# sys_groups = sys_groups,
# hmm_meta_expanded = hmm_meta_expanded,
# sys_meta = sys_meta,
# path = "path/to/database",
# new_path = "/path/to/new/database"
# )
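The general shape of a per-group split, one sub-directory for each group in the table, can be sketched in base R. This says nothing about what divide_database() does internally: the toy table, its column names, the output location, and the `systems.txt` placeholder file are all assumptions made for the illustration.

```r
# Hypothetical system groups table (column names are assumptions).
sys_groups_demo <- data.frame(
  system = c("sys_a", "sys_b", "sys_c"),
  group = c("group_1", "group_2", "group_1"),
  stringsAsFactors = FALSE
)

# One character vector of system names per group.
systems_by_group <- split(sys_groups_demo$system, sys_groups_demo$group)

# Create one sub-directory per group under a throwaway root directory and
# record the member systems in a placeholder file.
new_root <- tempfile("padloc_db_split_demo_")
for (group_name in names(systems_by_group)) {
  group_dir <- file.path(new_root, group_name)
  dir.create(group_dir, recursive = TRUE)
  writeLines(systems_by_group[[group_name]], file.path(group_dir, "systems.txt"))
}

list.files(new_root)
#> [1] "group_1" "group_2"
```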