The MASTA algorithm is a semi-supervised learning method that requires three input data files:
Long-form longitudinal data for predicting the time-to-event outcomes [longitudinal]
Follow-up time data giving the length of follow-up for each patient [follow_up_time]
Labeled data with time-to-event outcomes and baseline predictors [survival]
In Step I of the MASTA algorithm, longitudinal and follow_up_time are used to extract features from the estimated subject-specific intensity functions of individual encounters. In Step II, survival and follow_up_time are used to train and evaluate risk prediction models with survival outcomes. The MASTA package includes all three data files as sample data.
Each subject has exactly one record in follow_up_time. The variable train_valid indicates which cohort each subject belongs to: training (1) or validation (2).
?follow_up_time
head(follow_up_time)
##   id  fu_time train_valid
## 1  1 49.41273           1
## 2  2 13.93018           1
## 3  3 12.55031           1
## 4  4 14.85010           1
## 5  5 80.65708           1
## 6  6 42.64476           1
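The cohort indicator can be used to separate subjects before any modelling. The following is a minimal sketch in base R. It rebuilds the six rows printed above by hand instead of loading the package data, and the names fu, fu_train, fu_valid, and cohort_n are illustrative and not part of the MASTA package.

```r
# Rebuild the six rows shown by head(follow_up_time); values are copied from the output above.
fu <- data.frame(
  id          = 1:6,
  fu_time     = c(49.41273, 13.93018, 12.55031, 14.85010, 80.65708, 42.64476),
  train_valid = c(1, 1, 1, 1, 1, 1)
)

# Each subject should appear exactly once.
stopifnot(anyDuplicated(fu$id) == 0)

# Split by cohort: 1 = training, 2 = validation.
fu_train <- fu[fu$train_valid == 1, ]
fu_valid <- fu[fu$train_valid == 2, ]

# Count subjects per cohort; fixing the factor levels keeps an empty cohort in the table.
cohort_n <- table(factor(fu$train_valid, levels = c(1, 2),
                         labels = c("training", "validation")))
print(cohort_n)
```

All six rows shown here belong to the training cohort, so fu_valid is empty in this sketch. The same subsetting applies unchanged to the full follow_up_time data.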
Each subject has exactly one record in survival. The variable event_ind indicates whether the subject had an event (1) or not (0). For subjects without an event (i.e., event_ind = 0), event_time in this data set should equal fu_time in follow_up_time. The current version of the MASTA package requires this data set to include at least one baseline predictor.
?survival
head(survival)
##   id event_ind event_time cov_1 cov_2 cov_3
## 1  1         1    9.36345    79     1     0
## 2  2         0   13.93018    81     0     0
## 3  3         0   12.55031    55     1     1
## 4  4         0   14.85010    72     1     0
## 5  5         0   80.65708    83     1     1
## 6  6         1   15.70431    47     1     0
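The two requirements on survival described above (censored subjects carry their follow-up time as event_time, and at least one baseline predictor is present) can be checked before running the algorithm. The sketch below is one possible check in base R, not a function provided by the MASTA package. It uses only the six rows printed in this document, and the names fu, surv, both, censor_ok, and predictor_ok are hypothetical. It also assumes that every column other than id, event_ind, and event_time is a baseline predictor.

```r
# Rows copied from head(follow_up_time) and head(survival) as printed in this document.
fu <- data.frame(
  id      = 1:6,
  fu_time = c(49.41273, 13.93018, 12.55031, 14.85010, 80.65708, 42.64476)
)
surv <- data.frame(
  id         = 1:6,
  event_ind  = c(1, 0, 0, 0, 0, 1),
  event_time = c(9.36345, 13.93018, 12.55031, 14.85010, 80.65708, 15.70431),
  cov_1      = c(79, 81, 55, 72, 83, 47),
  cov_2      = c(1, 0, 1, 1, 1, 1),
  cov_3      = c(0, 0, 1, 0, 1, 0)
)

# Join the two tables on the subject identifier.
both <- merge(surv, fu, by = "id")

# Censored subjects (event_ind == 0) should have event_time equal to fu_time.
# all.equal() compares with a numeric tolerance, which is safer than == for doubles.
censored  <- both$event_ind == 0
censor_ok <- isTRUE(all.equal(both$event_time[censored], both$fu_time[censored]))

# Assumption: every column besides the three core ones is a baseline predictor.
predictors   <- setdiff(names(surv), c("id", "event_ind", "event_time"))
predictor_ok <- length(predictors) >= 1

print(c(censor_ok = censor_ok, predictor_ok = predictor_ok))
```

Merging on id rather than comparing the columns positionally keeps the check valid even if the two data sets are sorted differently.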