Introduction

The DataGen_rare_group function generates synthetic data for rare group analysis: structured datasets that can be used to test and validate algorithms. This vignette shows how to call DataGen_rare_group with example inputs and how to inspect the result.


Load the Required Library

Load the MUGS package before running the example:

library(MUGS)


Generate Synthetic Data

Run the DataGen_rare_group function to generate the synthetic dataset:

# Input parameters, passed positionally in the order listed here
seed <- 1
p <- 5
n1 <- 100
n2 <- 100
n.common <- 50
n.group <- 30
sigma.eps.1 <- 1
sigma.eps.2 <- 3
ratio.delta <- 0.05
network.k <- 5
rho.beta <- 0.5
rho.U0 <- 0.4
rho.delta <- 0.7
sigma.rare <- 10
n.rare <- 20
group.size <- 5

# Generate data
DataGen.out <- DataGen_rare_group(
  seed, p, n1, n2, n.common, n.group,
  sigma.eps.1, sigma.eps.2, ratio.delta, network.k,
  rho.beta, rho.U0, rho.delta, sigma.rare, n.rare, group.size
)

Examine the Output

Explore the structure and key components of the generated dataset:

# View structure of the output
str(DataGen.out)
#> List of 12
#>  $ delta1        : num [1:100, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ delta2        : num [1:100, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ u.1           : num [1:100, 1:5] 0.206 1.437 0.28 0.71 -0.543 ...
#>  $ u.2           : num [1:100, 1:5] 0.468 1.595 -0.152 -1.13 -0.165 ...
#>  $ S.1           : num [1:100, 1:100] 1.393 -0.529 2.842 1.97 0.438 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:100] "1" "2" "3" "4" ...
#>   .. ..$ : chr [1:100] "1" "2" "3" "4" ...
#>  $ S.2           : num [1:100, 1:100] 9.553 1.097 -6.412 7.909 0.147 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:100] "51" "52" "53" "54" ...
#>   .. ..$ : chr [1:100] "51" "52" "53" "54" ...
#>  $ S.1.0         : num [1:100, 1:100] 2.019 0.0913 2.4329 1.0762 -0.636 ...
#>  $ S.2.0         : num [1:100, 1:100] 2.471 0.644 0.321 -1.221 -0.615 ...
#>  $ X.group.source:'data.frame':  100 obs. of  30 variables:
#>   ..$ .data_1 : int [1:100] 1 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_2 : int [1:100] 0 1 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_3 : int [1:100] 0 0 1 0 0 0 0 0 0 0 ...
#>   ..$ .data_4 : int [1:100] 0 0 0 1 0 0 0 0 0 0 ...
#>   ..$ .data_5 : int [1:100] 0 0 0 0 1 0 0 0 0 0 ...
#>   ..$ .data_6 : int [1:100] 0 0 0 0 0 1 0 0 0 0 ...
#>   ..$ .data_7 : int [1:100] 0 0 0 0 0 0 1 0 0 0 ...
#>   ..$ .data_8 : int [1:100] 0 0 0 0 0 0 0 1 0 0 ...
#>   ..$ .data_9 : int [1:100] 0 0 0 0 0 0 0 0 1 0 ...
#>   ..$ .data_10: int [1:100] 0 0 0 0 0 0 0 0 0 1 ...
#>   ..$ .data_11: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_12: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_13: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_14: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_15: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_16: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_17: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_18: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_19: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_20: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_21: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_22: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_23: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_24: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_25: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_26: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_27: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_28: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_29: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_30: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>  $ X.group.target:'data.frame':  100 obs. of  30 variables:
#>   ..$ .data_1 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_2 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_3 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_4 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_5 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_6 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_7 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_8 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_9 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_10: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_11: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_12: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_13: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_14: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_15: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_16: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_17: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_18: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_19: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_20: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_21: int [1:100] 1 0 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_22: int [1:100] 0 1 0 0 0 0 0 0 0 0 ...
#>   ..$ .data_23: int [1:100] 0 0 1 0 0 0 0 0 0 0 ...
#>   ..$ .data_24: int [1:100] 0 0 0 1 0 0 0 0 0 0 ...
#>   ..$ .data_25: int [1:100] 0 0 0 0 1 0 0 0 0 0 ...
#>   ..$ .data_26: int [1:100] 0 0 0 0 0 1 0 0 0 0 ...
#>   ..$ .data_27: int [1:100] 0 0 0 0 0 0 1 0 0 0 ...
#>   ..$ .data_28: int [1:100] 0 0 0 0 0 0 0 1 0 0 ...
#>   ..$ .data_29: int [1:100] 0 0 0 0 0 0 0 0 1 0 ...
#>   ..$ .data_30: int [1:100] 0 0 0 0 0 0 0 0 0 1 ...
#>  $ pairs.rel.CV  :'data.frame':  305 obs. of  3 variables:
#>   ..$ row : chr [1:305] "17" "116" "21" "81" ...
#>   ..$ col : chr [1:305] "77" "146" "142" "113" ...
#>   ..$ type: chr [1:305] "related" "related" "related" "related" ...
#>  $ pairs.rel.EV  :'data.frame':  305 obs. of  3 variables:
#>   ..$ row : chr [1:305] "10" "1" "50" "42" ...
#>   ..$ col : chr [1:305] "130" "92" "140" "71" ...
#>   ..$ type: chr [1:305] "related" "related" "related" "related" ...
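
The indicator data frames X.group.source and X.group.target appear to be one-hot encoded, with one column per group: each row visible in the str() output carries a single 1. Assuming that holds for every row, a short helper can collapse such a data frame back to one group index per row. The name onehot_to_group is hypothetical and not part of MUGS, and the toy data frame below only stands in for the real output.

```r
# Hypothetical helper (not part of MUGS): collapse a one-hot indicator
# data frame into one group index per row.
onehot_to_group <- function(X) {
  M <- as.matrix(X)
  # Assumption: every entry is 0 or 1 and every row has exactly one 1;
  # stop early if that is not the case.
  stopifnot(all(M %in% c(0, 1)), all(rowSums(M) == 1))
  # Column position of the single 1 in each row
  max.col(M, ties.method = "first")
}

# Toy stand-in for X.group.source: 6 rows, 3 groups
toy <- data.frame(.data_1 = c(1L, 0L, 0L, 1L, 0L, 0L),
                  .data_2 = c(0L, 1L, 0L, 0L, 1L, 0L),
                  .data_3 = c(0L, 0L, 1L, 0L, 0L, 1L))
toy_groups <- onehot_to_group(toy)
table(toy_groups)  # number of rows per group
```

With the real output, onehot_to_group(DataGen.out$X.group.source) would return the group index of each source row, provided the one-hot assumption holds; the stopifnot() check fails loudly if it does not.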

# Print the first few rows and columns of the S.1 matrix
cat("\nFirst 5 rows and columns of S.1:\n")
#> 
#> First 5 rows and columns of S.1:
print(DataGen.out$S.1[1:5, 1:5])
#>            1          2         3         4          5
#> 1  1.3925742 -0.5291006  2.842317  1.969923  0.4383936
#> 2 -0.5291006 12.0059956  7.980915  2.551015 -2.5642034
#> 3  2.8423166  7.9809147  8.981994  3.166662 -2.1011907
#> 4  1.9699229  2.5510151  3.166662  6.419417 -6.7080622
#> 5  0.4383936 -2.5642034 -2.101191 -6.708062  5.6919244

# Print the first few rows and columns of the S.2 matrix
cat("\nFirst 5 rows and columns of S.2:\n")
#> 
#> First 5 rows and columns of S.2:
print(DataGen.out$S.2[1:5, 1:5])
#>            51         52         53         54         55
#> 51  9.5531329  1.0969313 -6.4123782  7.9090043  0.1468358
#> 52  1.0969313  3.6596876 -0.9416046 -2.6162838 -5.4793315
#> 53 -6.4123782 -0.9416046 -1.6856890 -0.7816355 -3.5461235
#> 54  7.9090043 -2.6162838 -0.7816355  9.0438723 -2.6619630
#> 55  0.1468358 -5.4793315 -3.5461235 -2.6619630 -0.3270468
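
The dimnames suggest that S.1 is labelled from "1" onward while S.2 starts at "51", so the two matrices presumably overlap on a subset of entities (plausibly the n.common shared ones, although the vignette does not state this). The sketch below finds the labels shared by two named matrices and looks up matrix entries for a table of pairs shaped like pairs.rel.CV. The helper names are hypothetical and not part of MUGS, and the matrices are toy stand-ins.

```r
# Hypothetical helpers (not part of MUGS).

# Labels present in the row names of both matrices
shared_ids <- function(A, B) intersect(rownames(A), rownames(B))

# Look up S[row, col] for each pair whose labels both occur in S.
# `pairs` is a data frame with character columns `row` and `col`.
pair_values <- function(S, pairs) {
  keep <- pairs$row %in% rownames(S) & pairs$col %in% colnames(S)
  found <- pairs[keep, , drop = FALSE]
  # A two-column character matrix indexes a matrix by its dimnames
  found$value <- S[cbind(found$row, found$col)]
  found
}

# Toy stand-ins: two symmetric 4 x 4 matrices overlapping on "3" and "4"
A <- matrix(1:16, 4, 4)
A <- A + t(A)
dimnames(A) <- list(as.character(1:4), as.character(1:4))
B <- diag(4)
dimnames(B) <- list(as.character(3:6), as.character(3:6))

common <- shared_ids(A, B)
toy_pairs <- data.frame(row = c("1", "3", "5"), col = c("2", "4", "6"),
                        type = "related", stringsAsFactors = FALSE)
vals <- pair_values(A, toy_pairs)  # the ("5", "6") pair is dropped
```

On the real output, shared_ids(DataGen.out$S.1, DataGen.out$S.2) would list the labels common to both matrices, and isSymmetric(DataGen.out$S.1) would confirm the symmetry visible in the printed 5 x 5 corner.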

Notes

  1. Custom Parameters: Change p, n1, n2, n.group, and the other inputs to test different scenarios.
  2. Reproducibility: Calling the function again with the same seed and the same parameters reproduces the same dataset.
  3. Applications: Use the generated data to test rare group detection algorithms or to benchmark performance.
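
The reproducibility note can be checked directly by calling a generator twice with identical arguments and comparing the results. The helper below is a hypothetical sketch, not part of MUGS, and toy_gen is a stand-in generator that takes the seed as its first argument, mirroring how DataGen_rare_group is called in this vignette.

```r
# Hypothetical helper: TRUE if two calls with the same arguments
# return identical objects.
is_reproducible <- function(gen, ...) {
  identical(gen(...), gen(...))
}

# Stand-in generator (assumed signature): seed first, then dimensions
toy_gen <- function(seed, n, p) {
  set.seed(seed)
  list(u = matrix(rnorm(n * p), n, p))
}

same_seed <- is_reproducible(toy_gen, 1, 10, 3)
diff_seed <- identical(toy_gen(1, 10, 3), toy_gen(2, 10, 3))
```

With MUGS loaded, is_reproducible(DataGen_rare_group, seed, p, n1, n2, n.common, n.group, sigma.eps.1, sigma.eps.2, ratio.delta, network.k, rho.beta, rho.U0, rho.delta, sigma.rare, n.rare, group.size) should return TRUE if the function derives all of its randomness from seed.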

Summary

This vignette demonstrated how to use the DataGen_rare_group function to simulate structured data for rare group analysis. Adjust input parameters to suit specific use cases or experimental setups. For further details, refer to the package documentation.