Returns a tibble giving the distance between the reference document and each corpus in a vector of corpora. The tibble is sorted by increasing distance, so the closest match comes first.

corpora_best_match(
  refDoc,
  corpora,
  metric = "cosine_similarity",
  model_name = "cb_ns_500_10"
)

Arguments

refDoc

character vector for reference document

corpora

character vector for corpora

metric

character vector naming the metric used to calculate distance, optional (default: "cosine_similarity")

model_name

character vector naming the pretrained sentence embedding model to use, optional (default: "cb_ns_500_10")

Value

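tibble with one row per corpus and two columns: corpora (character, the corpus text) and metric (double, the distance from the reference document), sorted by increasing metric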

Examples

coRPysprofiling::corpora_best_match("kitten meows", c("ice cream is yummy", "cat meowed", "dog barks", "The Hitchhiker's Guide to the Galaxy has become an international multi-media phenomenon"))
#> Model was not found locally. Downloading and processing this pretrained model can take up to 20 minutes in total.
#> This will only need to be run once for each pretrained model.
#> Downloading pretrained model for sentence embedding. This part may take up to 10 minutes with stable internet connection...
#> Download Complete! Processing raw files. This part may also take up to 10 minutes...
#> Downloaded model found. Loading downloaded model...
#> Downloaded model found. Loading downloaded model...
#> Downloaded model found. Loading downloaded model...
#> # A tibble: 4 x 2
#>   corpora                                                                 metric
#>   <chr>                                                                    <dbl>
#> 1 cat meowed                                                               0.344
#> 2 dog barks                                                                0.404
#> 3 ice cream is yummy                                                       0.835
#> 4 The Hitchhiker's Guide to the Galaxy has become an international multi~  1.18
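
The ordering in the output above can be illustrated with a minimal base R sketch. It rests on two assumptions that this page does not confirm: that the metric behaves like a cosine distance (1 minus cosine similarity, which would be consistent with a value such as 1.18), and that each document is represented by a numeric embedding vector. The vectors, corpus names, and the helper `cosine_distance` below are hypothetical and are not part of coRPysprofiling; the real embeddings come from the pretrained model.

```r
# Hypothetical 3-dimensional embeddings (assumption for illustration only);
# the package itself derives vectors from a pretrained sentence embedding model.
ref_vec <- c(1, 0, 0)
corpus_vecs <- list(
  "corpus a" = c(0, 1, 0),
  "corpus b" = c(1, 1, 0),
  "corpus c" = c(2, 0, 0)
)

# Hypothetical helper: cosine distance = 1 - cosine similarity.
# Whether the package computes exactly this is an assumption.
cosine_distance <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Distance from the reference vector to each corpus vector.
distances <- vapply(corpus_vecs, cosine_distance, numeric(1), b = ref_vec)

# Collect the results and sort by increasing distance (closest first).
ranked <- data.frame(
  corpora = names(distances),
  metric = unname(distances),
  stringsAsFactors = FALSE
)
ranked <- ranked[order(ranked$metric), ]
rownames(ranked) <- NULL
print(ranked)
```

In this sketch the first row of `ranked` is the closest corpus, in the same way that "cat meowed" is the first row of the tibble returned for "kitten meows".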