BONMI function

Description

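BONMI is the main function for learning a single low-dimensional embedding of features from a list of PPMI matrices that cover overlapping feature sets. It combines the matrices over the union of their features, using a weighted average where sources overlap. It then imputes the blocks that no source observes by aligning per-source rank-r embeddings with a Procrustes step, and embeds the completed matrix with a rank-r SVD. Randomized SVD can replace the standard SVD for speed.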

BONMI<-function(W,r,weights=NULL,rsvd.use=FALSE){
  # Input:
  ## W: a list of PPMI matrices whose row and column names are the features
  ## r: rank of the embedding
  ## weights: weight vector for the PPMI matrices. If weights = NULL, it is estimated from the data.
  ## rsvd.use: if TRUE, use the 'rsvd' function instead of 'svd'; this is much faster when r is small.
  
  m = length(W)
  codes = NULL
  for(s in 1:m){
    codes = union(codes,rownames(W[[s]]))
  }
  codes = sort(codes)
  
  if(is.null(weights)){
    weights = sapply(1:m, function(s){
      if(rsvd.use){
        set.seed(1)
        fit = rsvd(W[[s]],r+1)
      }else{
        fit = svd(W[[s]],r+1,r+1)
      }
      w = fit$d[r+1]/sqrt(nrow(W[[s]]))
      return(w)
    })
  }
  
  Wc = matrix(0,nrow=length(codes),ncol=length(codes))
  C = matrix(0,nrow=length(codes),ncol=length(codes))
  Weis = matrix(0,nrow=length(codes),ncol=length(codes))
  
  rownames(Wc) = colnames(Wc) = codes
  rownames(C) = colnames(C) = codes
  rownames(Weis) = colnames(Weis) = codes
  
  for(s in 1:m){
    id = match(rownames(W[[s]]),codes)
    Wc[id,id] = Wc[id,id] + weights[s]*W[[s]]
    C[id,id] = C[id,id]+1
    Weis[id,id] = Weis[id,id]+weights[s] 
  }
  Wc[C>0] = Wc[C>0]/Weis[C>0]
  Wo = Wc; Wo[C==0] = NA
  
  W.new = list()
  for(s in 1:m){
    Is = match(rownames(W[[s]]),rownames(Wc))
    W.new[[s]] = Wc[Is,Is]
  }
  
  fits = lapply(1:m, function(s){
    if(rsvd.use){
      set.seed(1)
      return(rsvd(W.new[[s]],r))
    }else{
      return(svd(W.new[[s]],r,r))
    }
  })
  
  Xs = lapply(1:m, function(s){ 
    U = embedding(fits[[s]],r)
    rownames(U) = rownames(W[[s]])
    U
  })
  
  Wm = matrix(0,nrow=nrow(Wc),ncol=ncol(Wc))
  M = matrix(0,nrow=nrow(Wc),ncol=ncol(Wc))
  rownames(Wm) = colnames(Wm) = rownames(Wc) 
  rownames(M) = colnames(M) = rownames(Wc) 
  
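  # For each pair of sources, align the two embeddings on their shared
  # features, then impute the block between the features unique to each.
  # Skipped when m = 1, where 1:(m-1) would iterate over c(1, 0).
  if(m > 1){
    for(s in 1:(m-1)){
      for(k in (s+1):m){
        name12 = intersect(rownames(W[[s]]),rownames(W[[k]]))
        ids = match(name12,rownames(W[[s]]))
        idk = match(name12,rownames(W[[k]]))
        Osk = Procrustes(Xs[[s]][ids,],Xs[[k]][idk,])
        # drop=FALSE keeps a single remaining row as a 1 x r matrix
        Wsk = Xs[[s]][-ids,,drop=FALSE]%*%t(Osk)%*%t(Xs[[k]][-idk,,drop=FALSE])
        id1 = match(rownames(W[[s]])[-ids],rownames(Wm))
        id2 = match(rownames(W[[k]])[-idk],rownames(Wm))
        Wm[id1,id2] = Wm[id1,id2]+Wsk; Wm[id2,id1] = Wm[id2,id1]+t(Wsk)
        M[id1,id2] = M[id1,id2] + 1; M[id2,id1] = M[id2,id1] + 1
      }
    }
  }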
  Wm[M>0] = Wm[M>0]/M[M>0]
  Wm[C>0] = Wc[C>0]

 
  if(rsvd.use){
    set.seed(1)
    fit.W = rsvd(Wm,r)
  }else{
    fit.W = svd(Wm,r,r)
  }
  
  X = embedding(fit.W,r)
  rownames(X) = rownames(Wm)
  return(X)
}
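
The function above calls two helpers, embedding and Procrustes, that are not defined in this section. The sketch below shows one plausible form for each, under stated assumptions: embedding_sketch scales the leading left singular vectors by the square roots of the singular values, and procrustes_sketch solves the standard orthogonal Procrustes problem. Both names are hypothetical, and the package's own helpers may differ in scaling or argument order.

```r
# Hypothetical stand-ins for the package helpers; the real definitions may differ.

# Rank-r embedding from an SVD fit: leading left singular vectors scaled by
# the square roots of the singular values. For a symmetric positive
# semi-definite input of rank r, E %*% t(E) reproduces the input.
embedding_sketch <- function(fit, r) {
  fit$u[, 1:r, drop = FALSE] %*% diag(sqrt(fit$d[1:r]), nrow = r)
}

# Orthogonal Procrustes: the orthogonal matrix Q minimizing ||A %*% Q - B||_F,
# obtained from the SVD of t(A) %*% B.
procrustes_sketch <- function(A, B) {
  s <- svd(crossprod(A, B))
  s$u %*% t(s$v)
}

# Toy check 1: B is A rotated by a known orthogonal matrix R.
set.seed(1)
A <- matrix(rnorm(20), nrow = 10, ncol = 2)
theta <- pi / 6
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
B <- A %*% R
Q <- procrustes_sketch(A, B)
align_err <- max(abs(A %*% Q - B))        # close to zero

# Toy check 2: a rank-2 symmetric matrix is recovered from its embedding.
M2 <- tcrossprod(A)
E <- embedding_sketch(svd(M2), 2)
embed_err <- max(abs(tcrossprod(E) - M2)) # close to zero
```

In this sketch the rotation is applied on the right of the first argument. The call in BONMI multiplies by t(Osk), so the package helper may use the opposite convention.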

Arguments

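  • W: A list of square PPMI matrices. Each matrix must have identical row and column names giving its features; the feature sets of different matrices may overlap only partially.
  • r: The target rank, that is, the number of embedding dimensions to retain.
  • weights (optional, default NULL): A numeric vector with one weight per matrix in W, used when averaging entries that several matrices share. If NULL, each weight is estimated as the (r+1)-th singular value of the matrix divided by the square root of its number of rows.
  • rsvd.use (optional, default FALSE): A logical flag. If TRUE, randomized SVD (rsvd) is used instead of the standard svd, which is much faster when r is small. In this mode the function calls set.seed(1) before each decomposition, which resets the caller's random number stream.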
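
As an illustration of the input format and the default weights, the sketch below builds two toy symmetric matrices with partially overlapping feature names and computes the weights the same way the weights = NULL branch does with rsvd.use = FALSE. The helper make_toy and all values are made up for the example; real inputs would be PPMI matrices.

```r
# Two toy symmetric matrices standing in for PPMI matrices (values are made up).
set.seed(1)
make_toy <- function(features) {
  Z <- matrix(rnorm(length(features) * 3), nrow = length(features))
  M <- tcrossprod(Z)                        # symmetric by construction
  dimnames(M) <- list(features, features)   # same names on rows and columns
  M
}
W <- list(
  make_toy(c("a", "b", "c", "d", "e")),
  make_toy(c("c", "d", "e", "f", "g", "h"))
)

r <- 2
# Default weights: the (r+1)-th singular value of each matrix divided by
# the square root of its number of rows.
weights <- sapply(W, function(Ws) svd(Ws)$d[r + 1] / sqrt(nrow(Ws)))

# Features the combined embedding would cover: the sorted union of row names.
codes <- sort(Reduce(union, lapply(W, rownames)))
```

Here the two sources share the features c, d and e, which is the overlap the alignment step relies on.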

Output

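  • Returns a single matrix of rank-r embeddings with one row per feature, where the features are the sorted union of the row names across all matrices in W. Row names are the feature names. The per-source embeddings and the Procrustes alignment matrices are computed internally but are not returned.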
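
A common use of such an embedding is to compare features by cosine similarity. The sketch below assumes the return value is the feature-by-dimension matrix produced by return(X) in the code above; the matrix X here is a made-up stand-in, not actual BONMI output.

```r
# Toy embedding standing in for the matrix returned by BONMI (values are made up).
X <- matrix(c(1, 0,
              2, 0,
              0, 3),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), NULL))

# Cosine similarity between all pairs of features.
Xn <- X / sqrt(rowSums(X^2))   # scale each row to unit length
cos_sim <- tcrossprod(Xn)      # rows and columns are named by feature

cos_sim["a", "b"]  # 1: same direction
cos_sim["a", "c"]  # 0: orthogonal
```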

See also

See BONMI_package for a code example.