Regression of Results by Subgroup used to Predict using New Data using R

by DataEdLinks   Last Updated January 14, 2018 12:26 PM

I have a large data file (LMTESTData) that contains internal data and the results of an external assessment. Rather than subsetting manually by subject, I have tried a number of variants of by() and ddply() to run a linear regression for each subgroup, without success.

colnames(LMTESTData)
[1] "StudentNumber"  "SubjectCode"    "SubjectName"    "ExamMark"       "AssessmentMark" "U"              "hmkk"
[8] "TESmk"          "Year"

The regression model is lm(hmkk ~ ExamMark + AssessmentMark), fitted separately for each SubjectCode.

Once the model is working, my next challenge will be to predict hmkk given SubjectCode, ExamMark and AssessmentMark for each StudentNumber.

Dummy Data Set

LMTESTData <- data.frame(StudentNumber = 1:100, SubjectCode = rep(c("A","B","C","D","E"), times = 20),
                hmkk = rnorm(100, mean = 72), ExamMark = rnorm(100, mean = 62),
                AssessmentMark = rnorm(100, mean = 68))
Tags : r lm


Answers 1


This is the classic R split-lapply pattern: split the data frame by SubjectCode, then fit one model per piece. If you were delivering just the coefficients (or perhaps predict()-ions), it could be done with sapply instead, delivering a matrix:

lapply( split(LMTESTData, LMTESTData$SubjectCode),
         function(d) lm(hmkk ~ ExamMark + AssessmentMark, data=d)
         )
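
To make the sapply remark and the prediction step from the question concrete, here is a minimal sketch. It assumes the dummy data from the question (with set.seed added for reproducibility). The names models, coef_matrix, newData and hmkk_pred, and the values in newData, are made up for illustration and do not come from the original post.

```r
# Dummy data as in the question; set.seed() is added here so the run is repeatable
set.seed(1)
LMTESTData <- data.frame(
  StudentNumber  = 1:100,
  SubjectCode    = rep(c("A", "B", "C", "D", "E"), times = 20),
  hmkk           = rnorm(100, mean = 72),
  ExamMark       = rnorm(100, mean = 62),
  AssessmentMark = rnorm(100, mean = 68)
)

# One fitted lm object per SubjectCode (a named list)
models <- lapply(
  split(LMTESTData, LMTESTData$SubjectCode),
  function(d) lm(hmkk ~ ExamMark + AssessmentMark, data = d)
)

# sapply() simplifies the per-subject coefficient vectors into a matrix:
# one row per coefficient, one column per SubjectCode
coef_matrix <- sapply(models, coef)

# Hypothetical new students to predict for (illustrative values only)
newData <- data.frame(
  StudentNumber  = 101:103,
  SubjectCode    = c("A", "C", "E"),
  ExamMark       = c(61, 62.5, 63),
  AssessmentMark = c(68, 67, 69)
)

# Predict each row with the model fitted for that row's subject.
# Subjects that have no fitted model are skipped and keep NA.
newData$hmkk_pred <- NA_real_
codes <- intersect(names(models), as.character(unique(newData$SubjectCode)))
for (code in codes) {
  rows <- newData$SubjectCode == code
  newData$hmkk_pred[rows] <- predict(models[[code]], newdata = newData[rows, ])
}
```

Keeping the list of lm objects, rather than only the coefficient matrix, leaves predict(), summary() and the other model methods available for each subject later on.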
42-
August 01, 2015 07:03 AM
