There are two primary sources of data for the indicators project: the LMS and the student administration system. The LMS contains the activity logs, while the student administration system contains results and campus data for each student. The trouble is that we only have read access to both sources, which makes reporting difficult: we can't create temporary tables or storage areas for the enormous amount of data we are processing.
The solution was to copy both data sources to a third database system, in this case MS SQL Server, so we can quickly create data stores of processed data. It's taken about two weeks to get to this stage, and we still need to develop some sort of system that automatically refreshes the data on a regular basis so that our copy stays up to date.
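The refresh step doesn't exist yet, but the shape of it is straightforward. Here's a minimal sketch of a drop-and-reload job, using in-memory SQLite databases as stand-ins for the read-only source and our SQL Server copy (all table and column names here are hypothetical):

```python
import sqlite3

# Stand-ins: in reality the source is the read-only LMS database and
# the destination is our MS SQL Server copy.
source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")

source.execute("CREATE TABLE activity (student_id INTEGER, course TEXT, hits INTEGER)")
source.executemany("INSERT INTO activity VALUES (?, ?, ?)",
                   [(1, "NURS11101", 42), (2, "ENGG12201", 7)])
dest.execute("CREATE TABLE activity (student_id INTEGER, course TEXT, hits INTEGER)")

def refresh(src, dst, table):
    """Wipe the local copy of `table` and re-pull every row from the source.

    The crudest possible refresh; a production job would run on a
    schedule and probably copy only new or changed rows."""
    rows = src.execute(f"SELECT * FROM {table}").fetchall()
    dst.execute(f"DELETE FROM {table}")
    placeholders = ", ".join("?" * len(rows[0])) if rows else ""
    dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    dst.commit()
    return len(rows)

copied = refresh(source, dest, "activity")
```

Because we only have read access upstream, the job never writes back to the source; everything lands in the copy.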
For our presentation on Tuesday we are generating a chunk of data that looks at student results against LMS hit counts and makes some comparisons between campuses. We have limited the dataset to Terms 1 and 2 of 2007 and 2008 only. That still equates to over one thousand courses and around 175,000 student-course enrolments, so the amount of data to be processed is huge and it's taking ages to run the scripts that aggregate it into some meaningful format. For example, the script that generates basic activity data for each student in this dataset runs for around 30 hours.
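The aggregation itself boils down to counting log rows per student-course enrolment within the chosen terms. A toy sketch of that logic with made-up rows (the real scripts do this in SQL over millions of rows, which is where the 30 hours goes):

```python
from collections import Counter

# Hypothetical activity-log rows: (student_id, course, year, term).
log = [
    (1, "NURS11101", 2007, 1),
    (1, "NURS11101", 2007, 1),   # same student hits the course twice
    (2, "ENGG12201", 2008, 2),
    (3, "MGMT20001", 2009, 1),   # outside our window, excluded
]

# Keep only Terms 1 and 2 of 2007 and 2008, then count hits per
# (student, course) enrolment.
in_scope = [(sid, course) for (sid, course, year, term) in log
            if year in (2007, 2008) and term in (1, 2)]
hits = Counter(in_scope)
```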
There are still some outstanding to-dos on the technical front before we can sink our teeth into the analysis, such as:
- Check the indexing on SQL Server.
- Fix my local MySQL issues.
- Aggregate the activity and grade data based on the four terms being analyzed.
- Backup the SQL Server.
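On the first item: with around 175,000 student-courses, a missing index turns every aggregation into a full table scan. A small SQLite illustration of the principle (the idea carries over to SQL Server, though the table and index names here are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE activity
              (student_id INTEGER, course TEXT,
               year INTEGER, term INTEGER, hits INTEGER)""")

# An index matching the columns we filter on lets the engine seek
# straight to the 2007/2008 Term 1/2 rows instead of scanning everything.
db.execute("CREATE INDEX idx_activity_year_term ON activity (year, term)")

plan = db.execute("""EXPLAIN QUERY PLAN
                     SELECT student_id, SUM(hits) FROM activity
                     WHERE year = 2007 AND term = 1
                     GROUP BY student_id""").fetchall()
uses_index = any("idx_activity_year_term" in row[-1] for row in plan)
```

On SQL Server the equivalent check is the estimated execution plan; the point is just to confirm the filter columns are actually covered before letting a 30-hour script loose.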
Our approach to the non-technical aspects of the project is emergent. We are pulling some very general data from the system across a wide cross-section of courses and students, and the results we get will direct the next stage of the project. For example, in a recent post I suggested that there is an indication that LMS hit counts do relate to student success. If there are exceptions to this, and the preliminary data suggest there are, we will investigate further and try to figure out why, keeping in mind that this data is potentially indicative but by no means absolute: there are just too many variables to consider. Again I keep coming back to the fact that teaching and learning is incredibly complex, with many dimensions. Technology adds another complication to the mix.
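One way to put a number on the hit-count/success relationship is a plain Pearson correlation. This is my illustration, not necessarily the statistic the project will settle on, and the figures are invented:

```python
# Invented per-student figures: (total LMS hits, final result %).
pairs = [(10, 40), (50, 55), (120, 70), (200, 85), (300, 90)]

def pearson(xs, ys):
    """Pearson correlation coefficient: +1 is a perfect positive
    linear relationship, 0 is none, -1 a perfect negative one."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson([h for h, _ in pairs], [g for _, g in pairs])
```

A high r on data like this is only a starting point: it says nothing about the exceptions, and the exceptions are exactly what we want to chase.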
A quick mind dump of some factors influencing our interpretation of the captured data, and its accuracy:
- We are measuring activity against results. This rests on the often-wrong assumption that a student's result is an indicative measure of good teaching and learning.
- CQUni has a wide variety of courses ranging from nursing to engineering, business, and so on. Each course has vastly different assessment requirements: a nursing student may have to demonstrate that she can find a vein to draw blood, while an engineering student has to recite a particular formula. Systemic data doesn't differentiate between the two.
- Tracking student behavior through a system isn't necessarily accurate. For example, one student downloads a PDF once, while the next student clicks on the same PDF every time they refer to it.
- Staff development activities. A staff member who has redesigned their course will statistically change that course's data.
- Policy interventions. A faculty recently mandated minimum course requirements. This will affect the resulting data.
Our anticipated direction into March will be based on the broad data collected over the next few days. We intend to look at this data and seek advice from others as to which aspects require further investigation.