by DataEdLinks
Last Updated January 14, 2018 12:26 PM

I have a large data file (LMTESTData) that contains internal data and the results of an external assessment. Rather than subsetting manually, I have tried a number of variants of `by` and `ddply` to run a linear regression, without success.

```
colnames(LMTESTData)
[1] "StudentNumber" "SubjectCode" "SubjectName" "ExamMark" "AssessmentMark" "U" "hmkk"
[8] "TESmk" "Year"
```

The regression model is `lm(hmkk ~ ExamMark + AssessmentMark)` for each SubjectCode.

Once the model is working, my next challenge will be to predict hmkk given SubjectCode, ExamMark and AssessmentMark for each StudentNumber.

Dummy Data Set

```
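# 100 students across 5 subjects (SubjectCode is recycled); all marks drawn with sd = 1
LMTESTData <- data.frame(StudentNumber = 1:100, SubjectCode = c("A","B","C","D","E"),
  hmkk = rnorm(100, mean = 72), ExamMark = rnorm(100, mean = 62),
  AssessmentMark = rnorm(100, mean = 68))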
```

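This is the classic R split-lapply pattern: split the data frame by `SubjectCode`, then fit the model to each piece. If you were delivering just the coefficients (or perhaps the `predict()`-ions), it could be done with `sapply`, which delivers a matrix. The `lapply` version returns a list of fitted models, one per subject: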

```
lapply( split(LMTESTData, LMTESTData$SubjectCode),
        function(d) lm(hmkk ~ ExamMark + AssessmentMark, data=d)
)
```
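To make the `sapply` remark and the follow-up prediction step concrete, here is a minimal sketch. It assumes the dummy data from the question (seeded here only so the run is reproducible); the names `models`, `coef_matrix` and `hmkk_pred` are illustrative and do not come from the original post.

```r
# Illustrative data with the same shape as the question's dummy set.
set.seed(1)
LMTESTData <- data.frame(
  StudentNumber  = 1:100,
  SubjectCode    = c("A", "B", "C", "D", "E"),
  hmkk           = rnorm(100, mean = 72),
  ExamMark       = rnorm(100, mean = 62),
  AssessmentMark = rnorm(100, mean = 68)
)

# One lm fit per SubjectCode, kept in a list named by subject code.
models <- lapply(
  split(LMTESTData, LMTESTData$SubjectCode),
  function(d) lm(hmkk ~ ExamMark + AssessmentMark, data = d)
)

# sapply simplifies the equal-length coefficient vectors into a matrix:
# one row per coefficient, one column per subject code.
coef_matrix <- sapply(models, coef)

# Predict hmkk for each student using the model fitted for that student's subject.
LMTESTData$hmkk_pred <- NA_real_
for (code in names(models)) {
  rows <- LMTESTData$SubjectCode == code
  LMTESTData$hmkk_pred[rows] <- predict(models[[code]], newdata = LMTESTData[rows, ])
}
```

Keeping the fitted models in a list means the same objects can be reused for `summary()`, `coef()` or `predict()` later. To score new students, pass a data frame with `ExamMark` and `AssessmentMark` columns as `newdata` to the model for the matching subject code.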
