---
title: "An Introduction to the package OSNMTF"
author: "Xiaoyao Yin"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{An Introduction to the package OSNMTF}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette presents the **OSNMTF** package, which implements a novel framework named orthogonal sparse non-negative matrix tri-factorization (OSNMTF) for bi-clustering in **R**. The objective is to provide an implementation of the proposed method, which is designed for cancer subtyping, gene set functional enrichment and subtype-specific drug target identification. This is achieved by factorizing the data matrix (e.g. mRNA data with each row as a sample) into the row coefficient matrix, the association matrix and the column coefficient matrix. Orthogonal constraints are introduced to improve interpretability and to rank the importance of genes. A sparsity constraint is introduced to reflect the prior knowledge that each cancer subtype should be related to only a few gene sets.

## Installation
The latest stable version of the package can be installed from any CRAN repository mirror:
```{r,eval=FALSE}
# Install
install.packages('OSNMTF')
# Load
library(OSNMTF)
```
The latest development version is available from https://cran.r-project.org/package=OSNMTF and may be downloaded from there and installed manually:
```{r,eval=FALSE}
install.packages('/path/to/file/OSNMTF.tar.gz',repos=NULL,type="source")
```
**Support**: Users interested in this package are encouraged to email Xiaoyao Yin (yinxy1992@sina.com) with enquiries, bug reports, feature requests, suggestions or OSNMTF-related discussions.

## Usage
We give an example of how to use this package below.

### Simulation data generation
We generate simulated data with five row clusters and four column clusters via the function *simu_data_generation*.
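
To make the idea of "five row clusters and four column clusters" concrete, the following base-R sketch builds a small toy matrix with that block structure. It is only an illustration under stated assumptions and is not how *simu_data_generation* constructs its output: the names `toy`, `block_means`, `row_cluster` and `col_cluster`, the matrix size, the block means and the noise level are all hypothetical choices made for this example.

```{r,eval=FALSE}
# Hypothetical toy example, NOT the package's simulation procedure.
set.seed(1)

# Assumed cluster labels: 5 row clusters and 4 column clusters, 20 members each
row_cluster <- rep(seq_len(5), each = 20)   # 100 rows
col_cluster <- rep(seq_len(4), each = 20)   # 80 columns

# Assumed mean for each (row cluster, column cluster) block
block_means <- matrix(seq_len(20), nrow = 5, ncol = 4)

# Expand the block means to full size and add a little Gaussian noise
toy <- block_means[row_cluster, col_cluster] +
  matrix(rnorm(100 * 80, sd = 0.1), nrow = 100, ncol = 80)

dim(toy)

# Plot without reordering or scaling, so the 5 x 4 block pattern stays visible
heatmap(toy, Rowv = NA, Colv = NA, scale = "none")
```

A bi-clustering method applied to a matrix like `toy` would be expected to recover both the row grouping and the column grouping at the same time.
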
The simulated data matrix Sim is a similarity matrix of two groups of samples, X1 and X2. The first group, X1, comprises 100 samples with 100 features belonging to 5 clusters; each cluster consists of 20 samples with mean {10,20,30,40,50} and variance 1. The second group, X2, comprises 80 samples with 100 features belonging to 4 clusters; each cluster consists of 20 samples with mean {5,10,15,20,25} and variance 1. The data can be generated by running:
```{r,eval=FALSE}
simu_data <- simu_data_generation()
```
**Structure of the simulated data**: The simulated data has a clear cluster structure, as shown in the heatmap: