BONMI<-function(W,r,weights=NULL,rsvd.use=FALSE){
  #Input:
  ##W: a list of PPMI matrices, with rownames and colnames being the features
  ##r: rank
  ##weights: the weight vector for the PPMI matrices. If weights = NULL, it is estimated from the data.
  ##rsvd.use: if rsvd.use=TRUE, the 'rsvd' function (from the 'rsvd' package, which must be loaded) is used to calculate the SVD, which is much faster than the 'svd' function when r is small.
  ##Note: the helpers embedding() and Procrustes() are called below but are not defined in this function.
  m = length(W)
  # Collect the sorted union of feature names across all matrices
  codes = NULL
  for(s in 1:m){
    codes = union(codes,rownames(W[[s]]))
  }
  codes = sort(codes)
  if(is.null(weights)){
    # Default weight of each matrix: its (r+1)-th singular value divided by sqrt(number of rows)
    weights = sapply(1:m, function(s){
      if(rsvd.use){
        set.seed(1)
        fit = rsvd(W[[s]],r+1)
      }else{
        fit = svd(W[[s]],r+1,r+1)
      }
      w = fit$d[r+1]/sqrt(nrow(W[[s]]))
      return(w)
    })
  }
  # Wc: weighted sum of the inputs; C: number of matrices covering each entry; Weis: total weight per entry
  Wc = matrix(0,nrow=length(codes),ncol=length(codes))
  C = matrix(0,nrow=length(codes),ncol=length(codes))
  Weis = matrix(0,nrow=length(codes),ncol=length(codes))
  rownames(Wc) = colnames(Wc) = codes
  rownames(C) = colnames(C) = codes
  rownames(Weis) = colnames(Weis) = codes
  for(s in 1:m){
    id = match(rownames(W[[s]]),codes)
    Wc[id,id] = Wc[id,id] + weights[s]*W[[s]]
    C[id,id] = C[id,id]+1
    Weis[id,id] = Weis[id,id]+weights[s]
  }
  # Weighted average over the entries observed in at least one matrix
  Wc[C>0] = Wc[C>0]/Weis[C>0]
  # Wo (observed entries, NA elsewhere) is not used below
  Wo = Wc; Wo[C==0] = NA
  # Restrict the averaged matrix to each source's features and compute rank-r embeddings
  W.new = list()
  for(s in 1:m){
    Is = match(rownames(W[[s]]),rownames(Wc))
    W.new[[s]] = Wc[Is,Is]
  }
  fits = lapply(1:m, function(s){
    if(rsvd.use){
      set.seed(1)
      return(rsvd(W.new[[s]],r))
    }else{
      return(svd(W.new[[s]],r,r))
    }
  })
  Xs = lapply(1:m, function(s){
    U = embedding(fits[[s]],r)
    rownames(U) = rownames(W[[s]])
    U
  })
  # Impute the blocks that no single matrix covers, using pairwise Procrustes alignment
  Wm = matrix(0,nrow=nrow(Wc),ncol=ncol(Wc))
  M = matrix(0,nrow=nrow(Wc),ncol=ncol(Wc))
  rownames(Wm) = colnames(Wm) = rownames(Wc)
  rownames(M) = colnames(M) = rownames(Wc)
  # seq_len() skips the loop when m = 1 (1:(m-1) would iterate over c(1,0))
  for(s in seq_len(m-1)){
    for(k in (s+1):m){
      name12 = intersect(rownames(W[[s]]),rownames(W[[k]]))
      ids = match(name12,rownames(W[[s]]))
      idk = match(name12,rownames(W[[k]]))
      # drop=FALSE keeps single-row selections as matrices
      Osk = Procrustes(Xs[[s]][ids,,drop=FALSE],Xs[[k]][idk,,drop=FALSE])
      Wsk = Xs[[s]][-ids,,drop=FALSE]%*%t(Osk)%*%t(Xs[[k]][-idk,,drop=FALSE])
      id1 = match(rownames(W[[s]])[-ids],rownames(Wm))
      id2 = match(rownames(W[[k]])[-idk],rownames(Wm))
      Wm[id1,id2] = Wm[id1,id2]+Wsk
      Wm[id2,id1] = Wm[id2,id1]+t(Wsk)
      M[id1,id2] = M[id1,id2] + 1
      M[id2,id1] = M[id2,id1] + 1
    }
  }
  # Average the imputed entries, then restore the observed entries from Wc
  Wm[M>0] = Wm[M>0]/M[M>0]
  Wm[C>0] = Wc[C>0]
  if(rsvd.use){
    set.seed(1)
    fit.W = rsvd(Wm,r)
  }else{
    fit.W = svd(Wm,r,r)
  }
  X = embedding(fit.W,r)
  rownames(X) = rownames(Wm)
  return(X)
}

BONMI function
Description
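BONMI is the main function for learning a unified low-dimensional representation from a list of PPMI matrices. It combines the matrices over the union of their features by taking a weighted average of the entries they share, fills in feature pairs that no single matrix covers by aligning per-matrix embeddings with a Procrustes step, and then computes a rank-r SVD of the completed matrix, optionally using randomized SVD for efficiency.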
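The alignment step can be illustrated with a small orthogonal Procrustes sketch. The helper below, `procrustes_rotation`, is a hypothetical stand-in written for this example; the package's own `Procrustes` helper is not shown here and may use a different argument order or transpose convention. Given two embeddings of the same features, it finds the orthogonal matrix that best maps one onto the other.

```r
# Hypothetical helper (not the package's Procrustes function):
# returns the orthogonal Q minimizing ||A %*% Q - B||_F.
procrustes_rotation <- function(A, B) {
  s <- svd(t(A) %*% B)   # t(A) %*% B = U D V'
  s$u %*% t(s$v)         # optimal rotation Q = U V'
}

set.seed(1)
A  <- matrix(rnorm(15), nrow = 5)            # 5 shared features, rank 3
Q0 <- qr.Q(qr(matrix(rnorm(9), nrow = 3)))   # a random orthogonal matrix
B  <- A %*% Q0                               # same embedding, rotated

Q <- procrustes_rotation(A, B)
max(abs(A %*% Q - B))   # close to zero: the rotation is recovered
```

Because inner products between embeddings are unchanged by a shared rotation, an alignment of this kind is what makes embeddings from two different matrices comparable.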
Arguments
- W: A list of PPMI matrices, where each matrix has row and column names corresponding to features.
- r: The target rank for dimensionality reduction (number of components to retain).
- weights (optional, default NULL): A vector with one weight per matrix in W, controlling each matrix's contribution where the matrices overlap. If NULL, the weights are estimated from the data.
- rsvd.use (optional, default FALSE): A boolean flag indicating whether to use randomized SVD (TRUE) or standard SVD (FALSE).
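As a rough illustration of the input format and of the default weighting, the sketch below builds one toy symmetric matrix with feature names and computes its weight the way the function body does when weights = NULL. The data and the names `W1`, `w1`, and `code1` to `code6` are made up for this example.

```r
# Toy symmetric matrix standing in for one PPMI matrix (hypothetical data)
set.seed(42)
A  <- matrix(rnorm(36), nrow = 6)
W1 <- (A + t(A)) / 2
rownames(W1) <- colnames(W1) <- paste0("code", 1:6)

r <- 2
# Default weight as in the function body: the (r+1)-th singular value
# divided by the square root of the number of rows
d  <- svd(W1)$d
w1 <- d[r + 1] / sqrt(nrow(W1))
w1
```

A list such as `list(W1, W2)`, with each element named in this way, is the shape of input that the W argument expects.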
Output
- Returns a matrix of embeddings with one row per feature in the union of features across the matrices in W; the row names are the feature names. Each row is the rank-r embedding of that feature in the shared space.
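Judging from the function body, the row names of the result are the sorted union of the row names of the input matrices. The short sketch below reproduces only that step on two toy matrices with hypothetical feature names; it uses `Reduce` in place of the loop in the function, which gives the same result.

```r
# Two toy matrices with partly overlapping, hypothetical feature names
W <- list(
  matrix(0, 2, 2, dimnames = list(c("b", "a"), c("b", "a"))),
  matrix(0, 2, 2, dimnames = list(c("c", "b"), c("c", "b")))
)

# Sorted union of row names, as computed at the start of the function body
codes <- sort(Reduce(union, lapply(W, rownames)))
codes
```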
See also
See BONMI_package for code example.