Chapter 3 Methods

The online tool uses a random forest (Breiman 2001) to reduce the false-positive rate by incorporating information from multiple metabolic analytes.

A random forest model contains many trees, and each tree casts one vote in a binary classification. We defined the RF Score as the fraction of trees that vote for true positive. The ROC curve is drawn by varying the cutoff applied to the RF Score. Each RF Score cutoff corresponds to one sensitivity on this curve, so we express the cutoff that separates true positives from false positives in the random forest in terms of sensitivity.
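The relationship between tree votes, RF Score, and sensitivity can be illustrated with a minimal Python sketch. It is not the tool's implementation: the five-tree forest, the vote matrix, the labels, and the function names `rf_score` and `sensitivity_at_cutoff` are all hypothetical and chosen only for illustration.

```python
def rf_score(votes):
    """Fraction of trees voting 'true positive' (1) for one sample."""
    return sum(votes) / len(votes)


def sensitivity_at_cutoff(scores, labels, cutoff):
    """Share of true positives (label 1) whose RF Score is >= cutoff."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(s >= cutoff for s in positives) / len(positives)


# Hypothetical votes from a 5-tree forest for four screen-positive samples
# (1 = tree votes "true positive", 0 = tree votes "false positive").
tree_votes = [
    [1, 1, 1, 0, 1],  # confirmed true positive
    [1, 0, 1, 0, 0],  # confirmed true positive
    [0, 0, 1, 0, 0],  # confirmed false positive
    [0, 0, 0, 0, 0],  # confirmed false positive
]
labels = [1, 1, 0, 0]

scores = [rf_score(v) for v in tree_votes]
print(scores)
print(sensitivity_at_cutoff(scores, labels, 0.5))
print(sensitivity_at_cutoff(scores, labels, 0.4))
```

In this toy example, lowering the cutoff from 0.5 to 0.4 recovers the second true positive, which shows why fixing a target sensitivity determines the RF Score cutoff.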

To choose the default cutoff, we repeated the 10-fold cross-validation 1000 times and, for each repeat, calculated the RF Score at which the model attains the same sensitivity as the state NBS program. The median of these RF Scores is used as the default cutoff, that is, the RF Score at which we expect to match the sensitivity of the state NBS program.
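The cutoff-selection step can be sketched as follows, assuming the cross-validated RF Scores of the confirmed true positives are already available for each repeat. The function names, the three toy repeats (standing in for the 1000 repeats), and the target sensitivity of 0.75 are hypothetical; the text does not state the actual sensitivity of the state NBS program, and the rule used here (the largest cutoff that still reaches the target) is one reasonable reading of the procedure.

```python
import math
from statistics import median


def cutoff_for_sensitivity(positive_scores, target_sensitivity):
    """Largest RF Score cutoff at which at least the target share of
    true positives has a score >= cutoff."""
    ranked = sorted(positive_scores, reverse=True)
    # Number of true positives that must be retained to reach the target.
    needed = max(math.ceil(target_sensitivity * len(ranked)), 1)
    return ranked[needed - 1]


def default_cutoff(per_repeat_positive_scores, target_sensitivity):
    """Median, over cross-validation repeats, of the per-repeat cutoffs."""
    cutoffs = [
        cutoff_for_sensitivity(scores, target_sensitivity)
        for scores in per_repeat_positive_scores
    ]
    return median(cutoffs)


# Hypothetical out-of-fold RF Scores of four true positives in three repeats.
repeats = [
    [0.9, 0.8, 0.6, 0.3],
    [0.95, 0.7, 0.5, 0.2],
    [0.85, 0.8, 0.7, 0.4],
]

print(default_cutoff(repeats, 0.75))
```

Taking the median rather than the mean keeps the default cutoff from being pulled by a few repeats in which an unusual fold assignment produces an extreme value.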

References

Breiman, Leo. 2001. “Random Forests.” Machine Learning 45 (1). Springer: 5–32.