Survival analysis is a common procedure in biostatistics. Fortunately, R has extensive packages supporting these methods, including survival and KMsurv.
Generating Data
First, let’s load the packages and generate a data frame with two date columns: one for a hypothetical intervention date and one for an event or censoring date. The number of patients will be n = 2000.
The resulting data frame will appear as follows:
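A minimal sketch of this step might look like the following. The data frame name kmdf, the one-year enrollment window, and the normal distribution for time-to-event (mean 365 days, standard deviation 90) are all assumptions made for illustration; only the column names origin_date and event_date come from the text. The final call prints the first few rows.

```r
library(survival)  # provides Surv() and survfit(), used in later steps

set.seed(42)       # make the simulated data reproducible
n <- 2000          # number of patients

# Hypothetical intervention dates spread over one year (assumed window)
origin_date <- as.Date("2010-01-01") + sample(0:364, n, replace = TRUE)

# Days from intervention to event or censoring.
# Assumed: normally distributed, rounded, and floored at 1 day.
days_to_event <- pmax(1, round(rnorm(n, mean = 365, sd = 90)))

kmdf <- data.frame(
  origin_date = origin_date,
  event_date  = origin_date + days_to_event
)

head(kmdf)  # first six rows
```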
To generate a survival object, though, we need each patient’s follow-up time, which is the difference between the event_date and the origin_date. (We generated the dates above only to practice taking the difference between dates.) The difference between dates can be taken as follows:
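One way to do this is with base R's difftime(), sketched here on a tiny two-row data frame with made-up dates. The column name survival_days is an assumption. Plain subtraction (event_date - origin_date) would also work; difftime() just makes the unit explicit.

```r
# Two illustrative patients (dates are made up for this sketch)
kmdf <- data.frame(
  origin_date = as.Date(c("2010-01-01", "2010-03-15")),
  event_date  = as.Date(c("2010-12-31", "2011-03-15"))
)

# difftime() returns a "difftime" object; as.numeric() converts it
# to a plain number of days, which is what Surv() expects as a time.
kmdf$survival_days <- as.numeric(
  difftime(kmdf$event_date, kmdf$origin_date, units = "days")
)

kmdf$survival_days
```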
Fitting and Plotting the Survival Model
Now we will fit a survival model with a constant as its only term, which yields a single survival curve for all patients, and then plot it.
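A sketch of the fit, assuming the follow-up times are stored in a column called survival_days. The status column (1 = event observed, 0 = censored) and its roughly 10% censoring rate are assumptions added here, since Surv() needs an event indicator to handle censoring. The formula "~ 1" is the usual way to express a constant-only model in survfit().

```r
library(survival)

set.seed(42)
n <- 2000

# Simulated follow-up data (distribution and censoring rate are assumed)
kmdf <- data.frame(
  survival_days = pmax(1, round(rnorm(n, mean = 365, sd = 90))),
  status        = rbinom(n, 1, 0.9)  # 1 = event, 0 = censored
)

# Kaplan-Meier estimate; "~ 1" means one curve for the whole sample
km_fit <- survfit(Surv(survival_days, status) ~ 1, data = kmdf)

# For a single curve, plot() draws the estimate with its confidence band
plot(km_fit,
     xlab = "Days since intervention",
     ylab = "Survival probability")
```

Calling summary(km_fit) or print(km_fit) gives the estimated survival table and the median survival time, respectively.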
Plotting Survival Curves for Multiple Groups
What about the (common) case in which there are multiple groups?
Let’s start by generating another group of n = 2000 patients whose survival is longer, but with a greater standard deviation.
We’ll need to bind the two data frames together, so we’ll add a column to indicate the group names to each data frame before we stick them together.
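The two steps above can be sketched as follows. To keep the example short, follow-up times are simulated directly in days rather than derived from dates. The helper make_group(), the data frame name kmdf2, the group labels, and the distribution parameters are all hypothetical; only kmdf_2groups is a name taken from the text.

```r
set.seed(42)
n <- 2000

# Hypothetical helper: simulate follow-up days and an event indicator
make_group <- function(n, mean_days, sd_days) {
  data.frame(
    survival_days = pmax(1, round(rnorm(n, mean = mean_days, sd = sd_days))),
    status        = rbinom(n, 1, 0.9)  # assumed: 1 = event, 0 = censored
  )
}

kmdf  <- make_group(n, mean_days = 365, sd_days = 90)   # original group
kmdf2 <- make_group(n, mean_days = 500, sd_days = 150)  # longer, more variable

# Label each data frame before binding so the rows stay identifiable
kmdf$group  <- "Group 1"
kmdf2$group <- "Group 2"

# rbind() stacks the rows; both data frames must share column names
kmdf_2groups <- rbind(kmdf, kmdf2)

table(kmdf_2groups$group)  # row count per group
```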
We’ll now generate and plot the survival model as before, with the new data frame kmdf_2groups. Note, however, that the term which was formerly constant is now set to the group variable of the new data frame, such that we get two survival curves (one for each group).
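A self-contained sketch of the two-group fit, with the simulated data rebuilt inline so the block runs on its own. The column names, the censoring rate, the distribution parameters, and the plot colors are assumptions. The only change from the single-curve model is the right-hand side of the formula, which is now the group column.

```r
library(survival)

set.seed(42)
n <- 2000

# Combined data for both groups (all simulation settings are assumed)
kmdf_2groups <- data.frame(
  survival_days = pmax(1, round(c(rnorm(n, mean = 365, sd = 90),
                                  rnorm(n, mean = 500, sd = 150)))),
  status        = rbinom(2 * n, 1, 0.9),  # 1 = event, 0 = censored
  group         = rep(c("Group 1", "Group 2"), each = n)
)

# One Kaplan-Meier curve per level of `group`
km_fit_groups <- survfit(Surv(survival_days, status) ~ group,
                         data = kmdf_2groups)

curve_cols <- c("blue", "red")
plot(km_fit_groups,
     col  = curve_cols,
     xlab = "Days since intervention",
     ylab = "Survival probability")

# The strata names identify which curve belongs to which group
legend("topright",
       legend = names(km_fit_groups$strata),
       col    = curve_cols,
       lty    = 1)
```

To test whether the two curves differ, survdiff() from the same package runs a log-rank test on the same formula.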
Conclusion
Generating survival curves in R is very simple, and can be quite rewarding (particularly if you are using a real data set).