This is just a quick update on the Indicators project as described in an earlier post. We are currently re-arranging the data so that it’s in a format that is quick and easy to query and is limited to just the data we are interested in. Initially our investigations are limited to terms 1 and 2 of 2007 and 2008, and only to courses with more than 50 Flex students. So the scope of our research is approximately 390 courses with approximately 40,000 student instances. We’ve limited the current research to Flex students because this cohort engages wholly online, and including other cohorts would confuse the results. Some tasks that are being or have been completed include:
• An instance of MySQL was configured on a de-commissioned server that was available and met our requirements.
• The Blackboard database holds data from 2004 to the present day, which equates to 200 million rows of data. As we are only interested in terms 1 and 2 of 2007 and 2008, we are migrating only these terms across to the MySQL instance to limit the size of the data set and improve query times. It’s anticipated that the resulting data set will be approximately 10-20 million rows.
• The results database has in excess of 500,000 rows, and by drawing only the rows required we could reduce this to around 50,000 rows.
• Once the data sets have been rationalized we can create some intermediate tables holding processed data derived from the original data sets. Queries that take a long time to run can be executed once and the results stored in the MySQL database for analysis.
• Some more research is required to look at dividing student activity into categories, for example distinguishing between an administrative hit and a pedagogical hit. An administrative hit might be a click on a folder on the way to a course PDF, whereas opening the PDF itself would be a pedagogical hit. It’s not a good method, I know, but it may give us an idea of the “signal-to-noise” ratio within a course.
• One thing that is going to take some time is the session data. Login and logout times are recorded by the system, and these can be extracted and reported on to determine when and for how long students are using the system during term.
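
As a rough illustration of the session reporting mentioned in the last point, here is a minimal Python sketch. It is not our actual extraction code: the row layout (student id, login time, logout time), the timestamp format, the sample values and the name summarise_sessions are all made up for the example. It also assumes that a session with no recorded logout is simply left out of the duration totals, which may or may not be the right call for the real data.

```python
from collections import defaultdict
from datetime import datetime

# Assumed timestamp format; the real export may differ.
FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical rows: (student_id, login_time, logout_time).
SESSIONS = [
    ("s001", "2008-03-03 09:00:00", "2008-03-03 09:45:00"),
    ("s001", "2008-03-04 20:15:00", "2008-03-04 20:30:00"),
    ("s002", "2008-03-03 21:00:00", "2008-03-03 22:30:00"),
    ("s002", "2008-03-05 10:00:00", None),  # no logout recorded
]


def summarise_sessions(rows):
    """Return (minutes online per student, number of logins per hour of day)."""
    minutes_per_student = defaultdict(float)
    logins_per_hour = defaultdict(int)
    for student_id, login, logout in rows:
        start = datetime.strptime(login, FMT)
        # "When": count every login against its hour of day.
        logins_per_hour[start.hour] += 1
        if logout is None:
            # Assumption: sessions without a logout are excluded from durations.
            continue
        end = datetime.strptime(logout, FMT)
        if end < start:
            # Skip inconsistent rows rather than record a negative duration.
            continue
        # "How long": accumulate the session length in minutes.
        minutes_per_student[student_id] += (end - start).total_seconds() / 60
    return dict(minutes_per_student), dict(logins_per_hour)


minutes, by_hour = summarise_sessions(SESSIONS)
print(minutes)
print(by_hour)
```

The same two summaries could just as easily be produced by a query against an intermediate table in MySQL; the sketch is only meant to show the shape of the calculation.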