get_eval_embed generates embeddings from a co-occurrence data file and evaluates them.
It returns a summary list containing the metadata, the evaluation results and the embeddings themselves.
get_eval_embed(
CO_file,
freq_file,
dims = seq(100, 1000, 100),
out_dir = NULL,
freq_min = 1000,
threshold = 10,
normalize = TRUE,
use.dataframe = FALSE,
save.summary = TRUE
)
CO_file: Co-occurrence data file in .csv, .parquet or .Rdata format.
If use.dataframe = TRUE, it should be an R data frame instead of a file name.
The data should be a table with 3 columns index1, index2, count:
index1: The index of code1.
index2: The column index of code2.
count: The co-occurrence count for the pair (code1, code2).
If the columns are not named index1, index2 and count, the first 3 columns are used as index1, index2 and count, in that order.
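As a minimal sketch of the layout described above, the toy data frame below has the three expected columns. The values and the names co_toy and co_unnamed are made up for illustration. The last step only mimics the documented fallback to the first 3 columns and is not the function's actual code.

```r
# Toy co-occurrence table (illustrative values only).
co_toy <- data.frame(
  index1 = c(1L, 1L, 2L),
  index2 = c(1L, 2L, 2L),
  count  = c(120, 35, 80)
)

# A table whose columns carry other names (hypothetical example).
co_unnamed <- data.frame(
  row = c(1L, 1L, 2L),
  col = c(1L, 2L, 2L),
  n   = c(120, 35, 80)
)

# Mimic the documented fallback: treat the first 3 columns as
# index1, index2 and count, in that order.
expected <- c("index1", "index2", "count")
if (!identical(names(co_unnamed)[1:3], expected)) {
  names(co_unnamed)[1:3] <- expected
}

str(co_unnamed)
```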
freq_file: Frequency count file in .csv, .parquet or .Rdata format.
If use.dataframe = TRUE, it should be an R data frame instead of a file name.
The data should be a table with 4 columns index, code, description, freq_count:
index: The index of the code.
code: The name of the code.
description: The description text of the code.
freq_count: The frequency count of the code.
If the columns are not named index, code, description and freq_count, the first 4 columns are used as index, code, description and freq_count, in that order.
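A frequency table with the four columns described above could look like the sketch below. The codes, descriptions and counts are invented placeholders, and freq_toy is a hypothetical name.

```r
# Toy frequency table (illustrative values only).
freq_toy <- data.frame(
  index       = 1:3,
  code        = c("CODE_A", "CODE_B", "CODE_C"),
  description = c("first example code",
                  "second example code",
                  "third example code"),
  freq_count  = c(5200, 870, 1500),
  stringsAsFactors = FALSE
)

# Confirm the column names match the documented layout.
stopifnot(identical(names(freq_toy),
                    c("index", "code", "description", "freq_count")))
str(freq_toy)
```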
dims: A numeric vector of embedding dimensions. Default is seq(100, 1000, 100).
out_dir: Output folder. If NULL, it defaults to your_working_directory/output.
freq_min: The frequency count cutoff for code filtering. Codes whose frequency count is less than freq_min are filtered out. Default is 1000.
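The effect of the cutoff can be illustrated on a toy frequency table. This is a sketch of the rule as stated (drop codes with a count below freq_min), not the function's internal code. The data and the name kept are hypothetical.

```r
# Toy frequency table (illustrative values only).
freq_toy <- data.frame(
  index      = 1:3,
  code       = c("CODE_A", "CODE_B", "CODE_C"),
  freq_count = c(5200, 870, 1500),
  stringsAsFactors = FALSE
)

freq_min <- 1000

# Keep only codes whose frequency count is at least freq_min;
# CODE_B (870) falls below the cutoff and is dropped.
kept <- freq_toy[freq_toy$freq_count >= freq_min, ]
kept$code
```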
threshold: An integer; the threshold used to build the SPPMI (shifted positive pointwise mutual information) matrix. Default is 10.
normalize: TRUE or FALSE; whether to normalize the embedding. Default is TRUE.
use.dataframe: TRUE or FALSE. If TRUE, CO_file and freq_file are accepted as R data frames rather than file names.
save.summary: TRUE or FALSE. If FALSE, the function does not save the results to out_dir.
Returns a list containing the metadata, the embeddings and the evaluation results. If save.summary = TRUE, the list is also saved in out_dir as an .Rdata file.
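A call using the arguments documented above might look like the following sketch. The file names cooccurrence.csv and freq.csv are hypothetical placeholders, and the call is guarded so that it only runs when the package providing get_eval_embed is loaded and both files exist. The structure of the returned list is not detailed here, so the example only inspects its top level.

```r
# Hypothetical input files; replace with real paths.
co_path   <- "cooccurrence.csv"
freq_path <- "freq.csv"

# A shorter dimension grid than the default, to keep the run small.
dims <- seq(100, 300, 100)

# Run only if the function and both input files are available.
if (exists("get_eval_embed", mode = "function") &&
    file.exists(co_path) && file.exists(freq_path)) {
  result <- get_eval_embed(
    CO_file       = co_path,
    freq_file     = freq_path,
    dims          = dims,
    out_dir       = file.path(getwd(), "output"),
    freq_min      = 1000,
    threshold     = 10,
    normalize     = TRUE,
    use.dataframe = FALSE,
    save.summary  = TRUE
  )
  # Inspect only the top level of the returned summary list.
  str(result, max.level = 1)
}
```

To pass data frames instead of file names, set use.dataframe = TRUE and supply the data frames as CO_file and freq_file.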