Identifying ‘at risk’ students is the easy bit

We are currently trialling a simple system designed to help teaching academics identify students who may be at risk of failing. We hope to identify struggling students earlier than we previously could, so that support and academic interventions can be targeted more efficiently. Basically, we are amalgamating data from the student information system (SIS) and the learning management system (LMS), which in our case is Moodle. At my university we have a large array of academic disciplines, and students can study via a number of modes other than online or distance learning. This makes predictions based on previous student results, student demographics and patterns of online behavior from the LMS very difficult at best. Our thinking, based in complexity science, is that the information needs to be directed to the point and time where it can best be used to influence the outcome. In our case, and for this particular trial, the point of need sits with the academic teacher.

The following is a ‘dummy’ screen grab from the system which is explained further down.


  • Mail merge simply allows the teaching academic to select multiple students in order to send a personalized email.
  • Prior fail indicates whether or not this student has failed this particular course previously.
  • Pass rate is the number of courses this student has passed out of the number they have attempted.
  • Load is the number of courses that this student is attempting this term.
  • GPA is self-explanatory.
  • Clicking on the student name takes the teacher to another page with more details on the student’s academic history.
  • Week. This is arranged by weeks of the term. (Moodle courses at my university are made available to students two weeks prior to the official term start). The numbers in these columns are simply the number of clicks that these students have made within the course site during that week.
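
As a rough sketch of how the columns above could be derived, assume the SIS supplies one record per course attempt and the LMS supplies one dated record per click. The record layouts, course codes, dates and function names below are illustrative assumptions, not the actual system.

```python
from collections import Counter
from datetime import date


def pass_rate(attempts):
    """Return (passed, attempted) from a list of (course_code, passed) pairs."""
    passed = sum(1 for _course, ok in attempts if ok)
    return passed, len(attempts)


def prior_fail(attempts, course_code):
    """True if any earlier attempt at this course was a fail."""
    return any(code == course_code and not ok for code, ok in attempts)


def weekly_clicks(click_dates, term_start):
    """Count clicks per week relative to the official term start.

    Week 0 is the first week of term; weeks -1 and -2 cover the
    pre-term period in which the course site is already open.
    """
    return Counter((d - term_start).days // 7 for d in click_dates)


# Hypothetical history for one student: (course_code, passed)
history = [("COIT101", True), ("MATH102", False), ("MATH102", True)]
print(pass_rate(history))              # passed and attempted counts
print(prior_fail(history, "MATH102"))  # has this course been failed before?

# Hypothetical click dates, with an assumed term start of 4 March 2024
clicks = [date(2024, 2, 19), date(2024, 3, 4), date(2024, 3, 11)]
print(weekly_clicks(clicks, term_start=date(2024, 3, 4)))
```

Floor division keeps the pre-term weeks negative, so the two weeks of early access show up as their own columns rather than being folded into week one.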

As the title of this post suggests, this is the easy part. Once a student has been identified by the teaching academic as potentially being at risk of failing, then what? The reasons that students fall into the ‘at risk’ category are as extraordinarily diverse as our students. Some may be struggling academically, some financially or personally, and some (like me) are just struggling for time. Every student’s situation is different and a one-size-fits-all approach to intervention is not going to work. This is why I continue to be fascinated by the extraordinary effort universities are putting into refining their ‘at risk’ student identification algorithms. Accurate statistical models are all well and good, but they are rather pointless without an intervention strategy that is capable of dealing with diversity and complexity.

Quick Indicators Project Update

Some progress was made today on the Indicators project. We’ve consolidated back to four broad aims.

  1. Col’s Masters project.
  2. Ken’s Masters project.
  3. Produce something useful for the wider CQUniversity community.
  4. Produce some publications.

One of the first things that we need to do is to get some publications happening as soon as possible based on the Indicators project. Some ideas were bandied around today, such as:

  • Col and Ken. Shane Dawson has several publications that link nicely to the data we are developing with the Indicators. One suggestion was to draw on some of the research he performed at QUT and compare the data extracted there with our Indicators data extracted in the CQUniversity context.
  • Nathaniel, Rolley and Col. Perhaps some data visualizations around some general indicators data such as overall results in the CQUni context.

The Indicators Project

The following writing is an attempt to get the project down on paper and arrange my thinking moving forward. It summarizes information already posted and articulates (poorly) our intentions. Some considerations to note:

  • The project is very flexible and may go off on a tangent in different directions depending on what the extracted data indicates. Like all data mining projects of this vein, it can really only ever produce an indication of a pattern without absolute confirmation, hence the title.
  • The project is agile and collaborative using a safe-fail approach. Expect many iterations.

The Indicators Project.
Course management systems (CMS), also known as learning management systems (LMS), are web-based environments that allow instructors to deliver courses online by providing a set of tools that facilitate learning, such as documents, multimedia, quizzes and assignments. The Indicators project at CQUniversity will attempt to analyze data captured by the commercial Blackboard LMS, based on a framework that uses the seven principles of good practice in undergraduate education, in order to achieve three broad goals.
  • Exploration. We intend to benchmark courses based on user interactions against the framework in order to identify potential lead (and lag) indicators of effective and less effective practice within CQUniversity’s unique context. We can evaluate the seven principles framework in the context of our eLearning technologies.
  • Information sharing. The information captured and processed can be made available to staff and potentially students to better inform their future practices and behaviors. More on this further down, but suffice to say we wish to produce something useful to the staff and students rather than create another metric for administrators to measure course/staff/student performance. Many of our teaching staff are subject matter experts who’ve received only limited instruction in course design, and especially online course design, so a tool that can visualize their course mechanics in context could be useful.
  • Technical information. Information on tool and feature usage as well as user behavior over time can be modeled to inform the project folk responsible for the transition to a single LMS (Moodle) so they can make informed decisions based on such things as peak loads, peak load times and outage planning.

This work will build on previous work such as that of Heathcote and Dawson, who used LMS data mining to supplement qualitative course evaluation with quantitative LMS data. One particularly interesting aspect of their research looked at discussion forum usage.

“Data on discussion forums usage was mined to look at a) the overall averages of contributions to discussion forums, b) the amount of posting and replies (where replies may indicate an ongoing conversation) and c) the amount of learner to learner interactions vs learner to teacher or teacher to learner interactions (which may indicate how much peer support and interaction is occurring)” (Heathcote and Dawson 2005)

CMS are widely used throughout universities around the world and a great deal of effort has been put into research and ideas on how to structure online course environments. However, most of these studies do not rely on empirical data from the actual use of a web-based information system (Berg 2004). In an effort to provide some empirical data in the CQUniversity context, we investigated the possibility of extracting user behavior data from our Blackboard LMS. Most LMS, such as CQUniversity’s Blackboard system, record every click users make within the system and store it in a database that can be used for further reporting. Due to the quantity of data that is stored, vendors often advise purging this activity data on a regular basis to prevent excessive database size causing slow query times. Due to a combination of circumstances this hasn’t been happening at CQUniversity, and we now have a complete record of every user’s clicks within the system since it was first implemented in 2005.

We began with a mountain of data stored in two main locations.
  • The student administration system containing student results and demographics.
  • The LMS database containing user activities within the system.
This separation of data and security issues on the databases that held the information meant that we couldn’t query the data as quickly and easily as we would have liked. The first step was to create a project database with the appropriate permissions that enabled us to query and modify large amounts of data very quickly and easily without affecting the operation of the original data sources. Following this process we still had an enormous amount of data to contend with, and we decided to limit the data set to four terms across two years. Our rationalized data set contains:
  • Over 151,000 student/course units.
  • Over 26,000 unique students.
  • Over 1,500 courses.
  • Over 34,000,000 hits or clicks (interestingly, only 1.8 million of these are direct hits on documents and links).
  • Over 266,000 forum posts or replies.

Armed with this wealth of data we began a project that looked at ways of utilizing the data to produce something useful and worthwhile to staff, students and administrators. This is proving to be a significant challenge as indicated by other researchers such as Dawson & McWilliam in a report to the Australian Learning and Teaching Council.

“With IT systems capturing a variety of student and teaching data, the challenge for (higher education institutions) is to interpret this data readily and accurately and translate such findings into improved teaching practice” (Dawson and McWilliam 2008)

The seven principles of good practice in undergraduate education (Chickering and Ehrmann 1996) have provided us with a framework by which we can start analyzing the mountains of data, and early indications are that the data tends to loosely affirm the chosen framework. For example, in previous posts I’ve produced a graph of the average hits for students across all 151,000 units.

Avg Hits per Grade

This graph could be interpreted to support the 5th principle, which “emphasizes time on task”, while the following graph could be interpreted as supporting either the 1st or 2nd principle, which are “encourages contact between students and faculty” and “develops reciprocity and cooperation among students”, as it represents student grade vs discussion board participation.

Discussion board participation
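
Both graphs boil down to the same calculation: group the student/course units by final grade and average a per-unit measure. The following is a minimal sketch of that grouping, assuming each unit has already been reduced to a record with a grade, a hit count and a forum post count. The field names, grade labels and numbers are made up for illustration and are not drawn from our data.

```python
from collections import defaultdict


def mean_by_grade(units, field):
    """Average one numeric field across units, grouped by final grade."""
    totals = defaultdict(lambda: [0, 0])  # grade -> [running sum, count]
    for unit in units:
        bucket = totals[unit["grade"]]
        bucket[0] += unit[field]
        bucket[1] += 1
    return {grade: total / count for grade, (total, count) in totals.items()}


# Hypothetical student/course units (illustrative values only)
units = [
    {"grade": "HD", "hits": 400, "posts": 10},
    {"grade": "HD", "hits": 600, "posts": 20},
    {"grade": "F", "hits": 50, "posts": 0},
    {"grade": "F", "hits": 150, "posts": 2},
]

print(mean_by_grade(units, "hits"))   # average hits per grade
print(mean_by_grade(units, "posts"))  # average forum posts per grade
```

An average like this only indicates a pattern. It says nothing about whether more clicks cause better grades or whether stronger students simply click more.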

So, assuming that the seven principles are a good place to start, the next step is to develop a series of questions based on the seven principles to interpret the data. For example, the 1st principle is “encourages contact between students and faculty”. With the data set we have, how can we query the data to test against this principle? The following questions could be applicable:

  • Does the course have a discussion board?
  • What are the staff/student ratios of postings to replies?
  • What percentage of the students are visiting the discussion board and how many of these are actively participating?
  • Does discussion board participation have an effect on the student’s grade?
  • Does staff participation have an effect on the student’s grade?
  • How often did announcements get posted in the course?
  • How often did emails get sent to the student cohort?
  • Are there differences in discussion participation rates based on cohort? (online, campus, international)
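
To make a couple of these questions concrete, here is a hedged sketch of how the visiting, participating and staff/student figures might be computed for a single course. It assumes the forum data has been reduced to a list of enrolled student ids, a list of ids that opened the forum, and a list of (author, role) posts. It also reads the ratio question simply as staff posts divided by student posts, which is only one possible interpretation. All names and values are hypothetical.

```python
def forum_summary(enrolled, forum_visitors, posts):
    """Summarize forum contact for one course.

    enrolled:       ids of enrolled students
    forum_visitors: ids of users who opened the discussion board
    posts:          (author_id, role) pairs, role being "staff" or "student"
    """
    enrolled = set(enrolled)
    visitors = set(forum_visitors) & enrolled
    student_authors = {a for a, role in posts if role == "student"}
    posters = student_authors & visitors
    staff_posts = sum(1 for _a, role in posts if role == "staff")
    student_posts = sum(1 for _a, role in posts if role == "student")
    return {
        # share of enrolled students who visited the board at all
        "pct_visiting": 100 * len(visitors) / len(enrolled) if enrolled else 0.0,
        # share of those visitors who actually posted
        "pct_visitors_posting": 100 * len(posters) / len(visitors) if visitors else 0.0,
        # staff posts per student post (None when students never posted)
        "staff_to_student_posts": staff_posts / student_posts if student_posts else None,
    }


summary = forum_summary(
    enrolled=["s1", "s2", "s3", "s4"],
    forum_visitors=["s1", "s2", "s3"],
    posts=[("t1", "staff"), ("s1", "student"), ("s1", "student"),
           ("s2", "student"), ("t1", "staff")],
)
print(summary)
```

Running the same summary per cohort (online, campus, international) would give the comparison asked for in the last question.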

Once we have a full set of questions and the data has been extracted and manipulated into a usable format, we can make comparisons of the data based on a variety of different criteria. For example, student results for specific cohorts, such as campus and age, can be evaluated against aspects of the LMS like discussion board usage, hit counts, content, etc.

The project also has some parallel activities associated with it that will take advantage of some of the processed data. One of these is the personal learning environments (PLE) project, whose members are interested in how students are interacting with the available tools within the LMS. The library folk are interested in gathering some information on how students are accessing course-related materials from the library website, and another possibility is finding candidate courses for curriculum re-design based on observations of interactions with the course design.

The project may also shed some light on other research findings to do with online learning that can be tested in the distinct context of CQUniversity, such as:

Online discussion/learning may be more supportive of experimentation, divergent thinking, exploration of multiple perspectives, complex understanding, and reflection than F2F discussion. (Parker and Gemino, 2001; Picciano 2002 taken from Swan 2004)

Online discussion/learning may be less supportive of convergent thinking, instructor directed inquiry, and scientific thinking than F2F discussion. (Parker and Gemino, 2001; Picciano 2002 taken from Swan 2004)

The quantity and quality of instructor interactions with students is linked to student learning. (Jiang and Ting, 2000 taken from Swan 2004)

Student learning is related to the quantity and quality of postings in online discussions and to the value instructors place on them. (Jiang and Ting, 2000 taken from Swan 2004)

Interactions with course interfaces are a real factor in learning; difficult or negative interactions with interfaces can depress learning. (Hillman et al., 1994; Hewit, 2003 taken from Swan 2004)

So far I’ve described the mechanical or easy side of the Indicators project, where we look at what’s happened and try to infer something from it. In David’s blog he describes these as lag indicators, whereas the more interesting and last part of the project is to develop some lead indicators to assist staff and students. We haven’t discussed this to any extent, but it might take the form of an RSS feed that alerts staff to potential issues. The example David uses is an early quiz that may be testing for a key concept that is built on throughout the course. If there is an unexpectedly high failure rate in this early quiz, then the academic can quickly adjust the content to ensure that this concept is covered appropriately to prevent issues later on in the course. However, this is a long way down the road and there is a lot of work before that.
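
To give a feel for what such a lead indicator could look like, here is a minimal sketch of the early quiz example. It assumes quiz scores are available as percentages, and the pass mark and the "unexpectedly high" threshold are placeholder values chosen for illustration, not anything we have settled on.

```python
def flag_early_quiz(scores, pass_mark=50.0, max_fail_rate=0.3):
    """Flag a quiz whose failure rate exceeds an assumed threshold.

    scores:        percentage scores for one early quiz
    pass_mark:     assumed pass mark (placeholder value)
    max_fail_rate: assumed tolerable share of fails (placeholder value)
    """
    if not scores:
        return None  # nothing to report until someone has sat the quiz
    fail_rate = sum(1 for s in scores if s < pass_mark) / len(scores)
    return {"fail_rate": fail_rate, "flag": fail_rate > max_fail_rate}


# Hypothetical week-3 quiz results
result = flag_early_quiz([30, 40, 45, 80, 90])
if result and result["flag"]:
    print(f"Early quiz fail rate {result['fail_rate']:.0%}: review this concept")
```

In practice the threshold would probably need to come from the course's own history rather than a fixed number, since what counts as unexpected differs from course to course.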

I recently received some polite, and not unjustified, criticism of the project because it didn’t have a clearly defined methodology and planning structure. However, we made a point early on of developing the project in much the same way as agile software development occurs: with many iterations, teamwork, collaboration and process adaptability throughout the life of the project, drawing upon the advice and experience of as many other folk as are willing to talk to us. A safe-fail approach is being used, so expect to see many iterations as the project progresses.


Alan Berg, V. M., Frank Benneker. Blackboard 6 usage patterns and implications for the University of Amsterdam. Central Computing Services, Universiteit van Amsterdam. Amsterdam, Universiteit van Amsterdam: 5.

Chickering, A. W., Ehrmann, S. C. (1996). Implementing the seven principles: Technology as lever. Retrieved on February 20, 2009 from

Heathcote & Dawson. (2005). “Data Mining for Evaluation, Benchmarking and Reflective Practice in a LMS.” E-Learn 2005: World conference on E-Learning in corporate, government, healthcare and higher education.

Dawson, S., McWilliam, E. (2008). Investigating the application of IT generated data as an indicator of learning and teaching performance, Queensland University of Technology and the University of British Columbia: 41.

Swan, K. (2004). Relationships between Interactions and Learning in Online Environments., SLOAN-C: A collection of research findings and their implications for practice.