Returns a tibble giving the distance between the reference document and each corpus in a vector of corpora. The tibble is sorted by increasing distance, so the closest match appears first.

corpora_best_match(
  refDoc,
  corpora,
  metric = "cosine_similarity",
  model_name = "cb_ns_500_10"
)

Arguments

refDoc	character vector containing the reference document
corpora	character vector of corpora to compare against the reference document
metric	character vector naming the metric used to calculate distance, optional (default: "cosine_similarity")
model_name	character vector naming the pretrained sentence embedding model, optional (default: "cb_ns_500_10")

Value

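tibble with one row per corpus and two columns: corpora (the corpus text) and metric (its distance from the reference document), sorted by increasing distance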

Examples

coRPysprofiling::corpora_best_match(
  "kitten meows",
  c(
    "ice cream is yummy",
    "cat meowed",
    "dog barks",
    "The Hitchhiker's Guide to the Galaxy has become an international multi-media phenomenon"
  )
)
#> Model was not found locally. Downloading and processing this pretrained model can take up to 20 minutes in total.
#> This will only need to be run once for each pretrained model.
#> Downloading pretrained model for sentence embedding. This part may take up to 10 minutes with stable internet connection...
#> Download Complete! Processing raw files. This part may also take up to 10 minutes...
#> Downloaded model found. Loading downloaded model...
#> Downloaded model found. Loading downloaded model...
#> Downloaded model found. Loading downloaded model...
#> # A tibble: 4 x 2
#>   corpora                                                                 metric
#>   <chr>                                                                    <dbl>
#> 1 cat meowed                                                               0.344
#> 2 dog barks                                                                0.404
#> 3 ice cream is yummy                                                       0.835
#> 4 The Hitchhiker's Guide to the Galaxy has become an international multi~  1.18
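
The ranking shown above can be illustrated with a minimal base R sketch. It assumes the reported value is a cosine distance (1 minus cosine similarity), which would be consistent with values above 1 in the output but is not stated in this documentation. The embedding vectors and the helper cosine_distance() are hypothetical and are not part of coRPysprofiling; the package derives its embeddings from the pretrained model instead.

```r
# Hypothetical helper: cosine distance between two numeric vectors.
# Assumption: distance = 1 - cosine similarity (not confirmed by the package docs).
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy embeddings with made-up values; a real sentence embedding
# model would produce much longer vectors.
ref <- c(1, 0, 1)
embeddings <- list(
  "ice cream is yummy" = c(-1, 1, 0),
  "cat meowed" = c(1, 0.2, 0.9),
  "dog barks" = c(0.5, 1, 0.5)
)

# Distance from the reference vector to each toy embedding.
distances <- vapply(embeddings, cosine_distance, numeric(1), b = ref)

# Collect into a data frame and sort by increasing distance,
# mirroring the column names of the tibble shown above.
ranked <- data.frame(
  corpora = names(distances),
  metric = unname(distances),
  stringsAsFactors = FALSE
)
ranked <- ranked[order(ranked$metric), ]
rownames(ranked) <- NULL

print(ranked)
```

The first row of ranked is the closest match. The same reading applies to the tibble returned by corpora_best_match().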