I have a large data file (LMTESTData) that contains internal data and the results of an external assessment. Rather than subsetting manually, I have tried a number of variants of by and ddply to run a linear regression for each subject, without success.
colnames(LMTESTData)  "StudentNumber" "SubjectCode" "SubjectName" "ExamMark" "AssessmentMark" "U" "hmkk"  "TESmk" "Year"
The regression model is
lm(hmkk ~ ExamMark + AssessmentMark), fitted separately for each SubjectCode.
Once the model is working, my next challenge will be to predict hmkk given SubjectCode, ExamMark and AssessmentMark for each StudentNumber.
Dummy Data Set
LMTESTData = data.frame(StudentNumber = 1:100, SubjectCode = c("A","B","C","D","E"), hmkk = rnorm(100, mean=72), ExamMark = rnorm(100, mean=62), AssessmentMark = rnorm(100, mean=68))
This is the classic split-lapply idiom in R. If you were delivering just the coefficients (or perhaps predict()-ions), it could be done with sapply delivering a matrix; with lapply you get a list of fitted models, one per SubjectCode:
lapply( split(LMTESTData, LMTESTData$SubjectCode), function(d) lm(hmkk ~ ExamMark + AssessmentMark, data=d) )
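For the coefficient matrix and the per-student predictions mentioned in the question, something along these lines might work. It is only a sketch run against the dummy data: the names models, coef_matrix and hmkk_pred are made up for illustration, and set.seed is there just so the random numbers are reproducible.

```r
set.seed(1)  # assumption: fixed seed, only so the dummy data is reproducible

LMTESTData <- data.frame(
  StudentNumber  = 1:100,
  SubjectCode    = c("A", "B", "C", "D", "E"),  # recycled to 100 rows
  hmkk           = rnorm(100, mean = 72),
  ExamMark       = rnorm(100, mean = 62),
  AssessmentMark = rnorm(100, mean = 68)
)

# One fitted model per SubjectCode, returned as a named list
models <- lapply(
  split(LMTESTData, LMTESTData$SubjectCode),
  function(d) lm(hmkk ~ ExamMark + AssessmentMark, data = d)
)

# sapply simplifies the per-subject coefficient vectors into a matrix:
# one row per coefficient, one column per subject code
coef_matrix <- sapply(models, coef)

# Predict hmkk for every student using the model of that student's subject
LMTESTData$hmkk_pred <- NA_real_  # hypothetical column name
for (code in names(models)) {
  rows <- LMTESTData$SubjectCode == code
  LMTESTData$hmkk_pred[rows] <- predict(models[[code]],
                                        newdata = LMTESTData[rows, ])
}
```

Here the predictions are made on the same rows the models were fitted to; for new students you would pass a data frame with SubjectCode, ExamMark and AssessmentMark as newdata in the same way.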