Risk analytics. What are we learning?

In a previous post I introduced an ‘at risk’ system we are trialing this term. Very simply, we are combining data from Moodle and the student information system into a webpage whereby the teaching academic can quickly ascertain which students may need some extra support. Feedback from teaching staff has identified a number of improvements that will need to be made for the next version which is due to start in term 2 of this year. These are:

  • More intervention options are required at different levels. At the moment we are only providing the teaching staff with a mail merge option as an initial intervention. This is obviously inadequate and there should be a range of intervention options for teaching staff to choose from. These may include things like phone calls or SMS messages. The important point is that the details required for the teaching academic to conduct an intervention need to be provided.
  • Linked with intervention options are the levels of intervention. One thing we are seeing is students who have low levels of Moodle activity across a number of their courses. The intervention for a student who is not engaging across all of their courses is probably not the responsibility of a particular teaching staff member. The system should notify the student support centre by default in addition to providing the teaching academic with the option of referring the student to the student support centre.
  • Tracking triage. At the moment our system does not track interventions that are made, based on the information provided. I am thinking of rectifying this with a doctor’s surgery approach. So when an academic raises an intervention event for a particular student, a case is generated and ‘patient’ history is tracked. This allows for the tracking of intervention effectiveness along with a range of reporting options. It also generates useful intelligence for the teachers taking on these students in future terms.
  • The order of the student list. At the moment the students are sorted based solely on their GPA. The next version uses a basic algorithm that sorts the students based on the urgency with which they need support. This takes into account the student’s current course load, GPA and Moodle activity. The Moodle activity component is looking for consecutive weeks of low Moodle activity that we know from past experience is a useful indicator of a struggling student.
  • Key dates. At the moment the system does not recognize key dates throughout the term. It needs to identify mid-term breaks and assessment due dates in future versions.
  • Assessment submission and gradebook information. At the moment, this system does not recognize assessment due dates or Moodle gradebook grades. The grades that students receive for early assessments are a valuable indicator of how that student is tracking. This system needs to include assessment data in order to present a more complete picture to the teaching academics.
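
The ordering idea in the student list item above can be sketched roughly as follows. This is a minimal illustration, not the algorithm used in the system: the weights, the low-activity threshold, the 7-point GPA scale and all names (`StudentRecord`, `urgency`, `longest_low_run`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudentRecord:
    name: str
    gpa: float                # GPA from the SIS (a 0-7 scale is assumed here)
    course_load: int          # number of courses attempted this term
    weekly_clicks: List[int]  # Moodle clicks per week in this course, oldest first


def longest_low_run(weekly_clicks: List[int], low_threshold: int) -> int:
    """Length of the longest run of consecutive weeks at or below the threshold."""
    longest = current = 0
    for clicks in weekly_clicks:
        current = current + 1 if clicks <= low_threshold else 0
        longest = max(longest, current)
    return longest


def urgency(student: StudentRecord, low_threshold: int = 5, max_gpa: float = 7.0) -> float:
    """Higher score means more urgent. The weights are placeholders, not tuned values."""
    low_weeks = longest_low_run(student.weekly_clicks, low_threshold)
    gpa_gap = max_gpa - student.gpa
    return 2.0 * low_weeks + 1.0 * gpa_gap + 0.5 * student.course_load


def sort_by_urgency(students: List[StudentRecord]) -> List[StudentRecord]:
    """Most urgent students first."""
    return sorted(students, key=urgency, reverse=True)
```

Consecutive low-activity weeks are weighted most heavily in this sketch because that is the indicator described above as useful from past experience. How the three inputs should really be balanced is a question for the teaching staff rather than something the code can settle.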

One other thing is bugging me with regard to this system: the language that we are using is very much deficit language, e.g. ‘at risk’ student. This suggests that the student is the problem, and I do not believe that this accurately portrays the purpose of this system, which is to more efficiently personalize student support. Perhaps ‘student success indicators’ is a more positive way to frame it?

Learning analytics and TPACK

David’s recent blog post talks about institutional eLearning using TPACK as a lens. A brief conversation with David yesterday resulted in an idea for using TPACK to analyse how universities are applying learning analytics.

TPACK


David’s post talked about the three facets of TPACK:
•    Technological knowledge – how to use technologies.
•    Pedagogical knowledge – how to teach.
•    Content knowledge – knowledge of what the students are meant to be learning.
TPACK suggests that the most effective eLearning results when these knowledge types are combined.

Learning analytics
University information systems collect an amazing array of data that can be used to inform and enhance learning and teaching. The process of analyzing the collected data for the enhancement of learning and teaching is broadly known as learning analytics.

The context
One of the things that was plainly evident to me from the many conversations at the ASCILITE 2012 and Southern SOLAR flare conferences last year was that many, if not most, universities are looking to learning analytics to improve their student retention rates. CQUniversity is no different in that we are also looking at how our collected data can contribute to a reduction in our student attrition statistics.

Our paper to ASCILITE last year pointed out some likely problems that universities will face when attempting to implement learning analytics in any meaningful way, not the least of which is the problem of organisational silos. David’s post hints at this when he says that technological knowledge is typically housed within the institutional IT division; pedagogical knowledge is housed within the central learning and teaching division and the content knowledge is housed within the faculties.

From what I am hearing, learning analytics projects in universities are mostly encapsulated within the institutional IT division and it is not hard to understand why. From a senior manager’s perspective, learning analytics is about data from IT systems and this falls into the domain of the institution’s IT division. This would seem to be a logical choice on the surface, but I would suggest this approach is far from ideal as it fails to include context.

As we pointed out in our ASCILITE paper, the interpretation of learning analytics data is almost impossible without reference to the context from which it was taken, due to its non-causal nature. Then we have the problem whereby learning analytics data tends to produce clear patterns at the macro levels (institution/school/faculty) and seemingly random patterns at the micro levels (course/student group/students). Based on our five years of experience with a learning analytics project, either of these problems precludes the (useful) application of learning analytics by any single organisational entity within a university.

I am hoping that TPACK can help me explain things a little better. For example, we know that attrition is a complex beast and the reasons for student attrition vary greatly from student to student. Given the diversity of reasons for why students drop out of their university studies, it would seem that an approach to addressing student attrition based entirely on technology is woefully inadequate. Like effective eLearning, using learning analytics to address student attrition also needs to include pedagogical and content knowledge in addition to technological knowledge. It’s the intersection point of these three knowledge areas that will be most effective.

Where to from here
My blog post from last week gave some insight into how we are trying to address student attrition by providing teaching academics with better information within their learning and teaching context. I am currently expanding this trial and trying to move away from the overly technical approaches used to date. You will notice from the previous post that once a student who is at risk of failing has been identified, we provided the teaching academic with the ability to email the student as a way of conducting an initial intervention. TPACK would suggest that this does not go far enough in providing the teaching academic (or students) with pedagogical suggestions on what to do next. So I am now thinking about how to include pedagogical advice or suggestions in the system for the next version that is due to start next term.

One example is that the system in its current form only considers the student within a single course. Some students have next to no Moodle activity across all of the courses they are currently attempting. The intervention requirements for such a student exceed the responsibility of any single teaching academic. So perhaps there is an opportunity to include the student support area as an alternative to a simple course-based intervention. Additionally, there need to be more intervention options for the teaching academics besides the mail-merge facility, not to mention a mechanism for getting some of this data out to the students so they better appreciate their situation.
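
A cross-course check of this kind could look something like the sketch below. It assumes weekly click counts are available per course for each student; the threshold, the two-week window and the function name `needs_support_referral` are illustrative choices, not part of the current system.

```python
from typing import Dict, List


def needs_support_referral(clicks_by_course: Dict[str, List[int]],
                           low_threshold: int = 5,
                           recent_weeks: int = 2) -> bool:
    """True when the student is inactive in every course they are attempting.

    clicks_by_course maps a course code to weekly Moodle click counts, oldest first.
    A course counts as inactive when each of the most recent weeks is at or
    below the threshold.
    """
    if not clicks_by_course:
        return False
    for weekly_clicks in clicks_by_course.values():
        recent = weekly_clicks[-recent_weeks:]
        if any(clicks > low_threshold for clicks in recent):
            # Active in at least one course, so this stays a course-level matter.
            return False
    return True
```

A student flagged this way would be referred to the student support area, while a student who is quiet in only one course remains with that course’s teaching academic.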

Any comments or suggestions would be warmly welcomed.

Identifying ‘at risk’ students is the easy bit

We are currently trialling a simple system designed to help teaching academics identify students who may be at risk of failing. We are hoping to identify struggling students earlier than we previously could so as to more efficiently target support and academic interventions. Basically we are amalgamating data from the student information system (SIS) and the learning management system (LMS), which in our case is Moodle. At my university we have a large array of academic disciplines and students can study via a number of modes other than online or distance learning. This makes predictions based on previous student results, student demographics and student patterns of online behavior from the LMS very difficult at best. Our thinking, based in complexity science, is that the information needs to be directed to the point and time where it can best be used to influence the outcome. In our case, and for this particular trial, the point of need sits with the academic teacher.

The following is a ‘dummy’ screen grab from the system which is explained further down.

Image

  • Mail merge simply allows the teaching academic to select multiple students in order to send a personalized email.
  • Prior fail indicates whether or not this student has failed this particular course previously.
  • Pass rate is the number of courses this student has passed out of the number they have attempted.
  • Load is the number of courses that this student is attempting this term.
  • GPA is self-explanatory.
  • Clicking on the student name takes the teacher to another page with more details on the student’s academic history.
  • Week. This is arranged by weeks of the term. (Moodle courses at my university are made available to students two weeks prior to the official term start). The numbers in these columns are simply the number of clicks that these students have made within the course site during that week.
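
To make the columns above a little more concrete, here is a rough sketch of how one row might be assembled from the two data sources. The field names in the `sis` dictionary, the function names and the week numbering (0 for the first week of term, -1 and -2 for the two pre-term weeks) are all assumptions for illustration; the real SIS and Moodle schemas are not shown in this post.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def week_of_term(click_day: date, term_start: date) -> int:
    """Week index relative to term start: 0 is the first week of term,
    -1 and -2 are the pre-term weeks when the course site is already open."""
    return (click_day - term_start).days // 7


def clicks_per_week(click_days: List[date], term_start: date) -> Dict[int, int]:
    """Number of clicks in each week, keyed by week index."""
    return dict(Counter(week_of_term(day, term_start) for day in click_days))


def student_row(sis: dict, click_days: List[date], term_start: date) -> dict:
    """Merge one student's SIS fields with their weekly Moodle click counts."""
    return {
        "name": sis["name"],
        "prior_fail": sis["prior_fail"],
        "pass_rate": f'{sis["passed"]}/{sis["attempted"]}',  # passed out of attempted
        "load": sis["load"],
        "gpa": sis["gpa"],
        "weeks": clicks_per_week(click_days, term_start),
    }
```

Floor division keeps the pre-term weeks negative, so a click one day before term starts lands in week -1 rather than week 0.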

As the title of this post suggests, this is the easy part. Once a student has been identified by the teaching academic as potentially being at risk of failing, then what? The reasons that students fall into the ‘at risk’ category are as extraordinarily diverse as our students. Some may be struggling academically, some financially or personally, and some (like me) are just struggling for time. Every student’s situation is different and a one-size-fits-all approach to intervention is not going to work. This is why I continue to be fascinated by the extraordinary effort universities are putting into refining their ‘at risk’ student identification algorithms. Accurate statistical models are all well and good, but they are all rather pointless without an intervention strategy that is capable of dealing with diversity and complexity.

Analytics and complexity

This post is a quick summary of the paper and presentation that we did for ASCILITE2012 in lovely Wellington, New Zealand. Basically the paper introduced the concept of analytics as information arising from interactions occurring within a complex adaptive system. You can read the full paper here.

Some definitions

  • Managerialism. Universities are increasingly managed as if they were businesses in a competitive marketplace. Accountability for public funding requires the rational allocation of resources and the intentional management of change. This teleological approach to the management of universities is known as managerialism and its influence has extended to how universities manage their learning and teaching.
  • Educational data mining. “Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.” George Siemens, 2011 (http://www.learninganalytics.net/?paged=2)
  • Academic analytics. This is the use by universities of data collected through educational data mining. It marries statistical techniques and predictive modeling with the large data sets collected by higher education institutions (HEI), including those from learning management systems. Academic analytics has been described as business intelligence for HEI and is focused on the needs of the institution, such as recruitment, retention and pass rates. (Open University, 2012)
  • Learning analytics. Learning analytics is again the use of data developed through educational data mining, but it is more focused on better understanding and optimizing learning and the learning environment. According to George Siemens (2011), learning analytics is “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs”.

The Indicators Project

The Indicators project is an analytics project that has been running at CQUniversity since 2008. The project started when the members were responsible for supporting academic staff with their use of the, then, Blackboard learning management system. We found that the activity database table that holds a record of every staff and student click within the system had never been cleared. So we started looking at correlations between student activity within the LMS and their resulting grades. While these correlations based on aggregate data are somewhat interesting, their utility is perhaps limited as we will show later.

Some simple patterns

The following charts simply correlate student activity within the LMS with their resulting grade. Students on the horizontal axis are grouped by the final grades they received. Note that at CQUniversity, the grades are: HD=high distinction, D=distinction, C=credit, P=pass, F=fail, WF=withdraw fail. These charts are simple examples from the hundreds that we have developed over the last four years.

Simple correlation

Student clicks on Moodle against the grade they received

Slide12

The first day of Moodle access against their resulting grade

Slide13

The number of question marks within Moodle forum contributions per grade group
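
The grouping behind charts like these is simple enough to sketch. The snippet below assumes the data has already been reduced to one (final grade, measure) pair per student, where the measure might be total clicks or the number of question marks in forum posts; the function name and record shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Grade groups in the order used on the horizontal axis.
GRADE_ORDER = ["HD", "D", "C", "P", "F", "WF"]


def mean_by_grade(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average of the measure for each grade group, in GRADE_ORDER.

    Grades outside GRADE_ORDER are ignored.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for grade, value in records:
        grouped[grade].append(value)
    return {g: sum(grouped[g]) / len(grouped[g]) for g in GRADE_ORDER if g in grouped}
```

Averages like this are exactly the kind of aggregate that looks tidy at the institutional level, which is part of the reason for the caution that follows.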

Analytics as the next big thing

We are noticing a large increase in the amount of hype around learning analytics as evidenced by the following:

“BIG data sets showing what students do online may prove as vital to education as genome databases have been to genetics or Europe’s Large Hadron Collider to physics” (The Australian, 15th September, 2012)

“EDUCAUSE and the Bill and Melinda Gates Foundation have targeted learning analytics as one of 5 categories for funding initiatives” (Educause, 2012)

“Learning analytics promises to harness the power of advances in data mining, interpretation, and modeling to improve understandings of teaching and learning, and to tailor education to individual students more effectively” (Horizon report, 2011)

We urge some caution as there are well-known cycles associated with the hype around new educational technologies. It also seems to us that many are reporting on the amazing potential of analytics without a corresponding balance of healthy skepticism.

Slide18

Some potential problems

Based on our experience with the Indicators project we have identified a number of likely problems that universities will face with their analytics projects. These are:

  • Abstraction losing detail
  • Organisational structures
  • Confusion between correlation and causation
  • Assumptions of causality

Abstraction losing detail

We think Gardner Campbell summed this up nicely in his presentation to LAK12.

“…the nature of learning analytics and its reliance on abstracting patterns or relationships from data has a tendency to hide the complexity of reality” Gardner Campbell (2012)

For example, consider the following chart, which shows the correlation between student posts and replies to the LMS discussion forums and their resulting grades.

Slide22

And compare this to the average number of student forum contributions for each course across an academic year

Slide23

Or even the number of forum posts and replies for a single high achieving student

Slide24

Our experience with the Indicators project is that the devil is very much in the detail when it comes to aggregated analytics data, in that the data aggregations we see at the macro level of analysis don’t really help us a great deal at the micro levels (single courses, students etc).
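
One way to see this macro versus micro gap in code is to compute the grade-group averages once for the whole data set and then again for each course, and list the courses where the overall pattern does not hold. This is only a sketch under assumptions: the record shape, the function names and the idea of testing for a strictly ‘descending’ pattern from HD to F are mine, not something the Indicators project prescribes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GRADE_ORDER = ["HD", "D", "C", "P", "F"]

Record = Tuple[str, str, int]  # (course_code, final_grade, forum_contributions)


def grade_means(records: List[Record]) -> Dict[str, float]:
    """Average forum contributions per grade group (grades outside GRADE_ORDER are ignored)."""
    grouped = defaultdict(list)
    for _course, grade, posts in records:
        grouped[grade].append(posts)
    return {g: sum(grouped[g]) / len(grouped[g]) for g in GRADE_ORDER if g in grouped}


def is_descending(means: Dict[str, float]) -> bool:
    """True when average contributions never rise as grades fall from HD to F."""
    values = list(means.values())
    return all(a >= b for a, b in zip(values, values[1:]))


def courses_breaking_pattern(records: List[Record]) -> List[str]:
    """Courses whose own grade means do not follow the descending pattern."""
    by_course = defaultdict(list)
    for record in records:
        by_course[record[0]].append(record)
    return sorted(c for c, rows in by_course.items() if not is_descending(grade_means(rows)))
```

A data set can pass `is_descending` as a whole and still return a long list from `courses_breaking_pattern`, which is the situation described above.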

Organisational structures

Most universities are structured in a very deliberate reductionist way. People are organized into units based on their task or role within the university. For example, IT folk tend to live in the IT area, the finance folk live in the finance area, and so on. While these divisions between organizational units are imaginary, I’m sure most of us have experienced the frustration associated with a lack of cross-unit cooperation. The ever-constant battle for budgets often leads to inter-departmental rivalry, which can hinder the cross-organizational collaboration that analytics requires.

For example, because we aren’t IT and it is unusual to give non-IT folk access to the backend databases of systems such as Blackboard, Moodle and Peoplesoft, we had considerable difficulty in getting permission to access these resources in order to pursue our analytics research. The silos within universities are potentially a very significant problem, as analytics is going to require a set of skills that, given typical university structures, doesn’t usually exist in a single department. For example, database administrators, educational developers and educational technologists do not typically belong in a single organisational silo.

Confusion between correlation and causation

Consider the simple pattern from before, which correlated student forum posts and replies with their resulting grade:

Slide22

Now look at the following figure, where the correlation did not necessarily hold true as we moved from the macro to the micro level:

Slide24

The data/information we are extracting from the learning environments stems from a very complex interplay of variables, and it would be dangerous to assume that what happened in the past is going to happen in the future.

Assumptions of causality

One of the more worrisome problems that I can foresee is the assumption that analytics data is based on causality. To be more precise, the worry is that management will introduce key performance indicators based on analytics data that is aggregated and without reference to the context in which it was gathered. We simply cannot assume that the data we are looking at represents a universal constant when in fact the underlying system is vastly more complex.

A possible path forward

So from the perspective of using analytics to enhance learning and teaching, we are much less concerned with the retrospective data representations and interpretations at the macro level, even though the correlations at this level often appear to be quite distinct. We are aiming to focus more on the micro levels, which are more attuned to the context in which the data is being gathered. What we are talking about here is a bottom-up approach to the representation and interpretation of analytics data, and to some extent, this stands in opposition to the way that universities and their learning environments are currently being managed. We are proposing to do this by looking at analytics through the lens of complex adaptive systems.

Complex adaptive systems (CAS)

“A CAS is a dynamic network of semi-autonomous, competing and collaborating individuals who interact and co-evolve in nonlinear ways with their surrounding environment. These interactions lead to various webs of relationships that influence the system’s performance” (Boustani, 2012)

CAS are a variation on complex systems and have been described as systems that involve many components that adapt, learn or change as they interact. Each agent within a complex adaptive system is nested within other systems, all of which are evolving and interacting, so that we cannot understand any of the agents or systems without reference to the others. In simple terms, context is king when it comes to using analytics to improve learning and teaching, so we can’t easily interpret analytics data without reference to the context from which it is derived.

Where to from here?

So given that analytics data is extracted from a complex array of interacting systems, we are intending to focus our efforts at the course/teacher/student levels. That way the people operating within the context are the people interpreting and making decisions based on the analytics data. While we see the importance of providing analytics-derived insights to students in the future, much like Purdue University has done with its Signals project, initially (given the constraints we currently have) it’s likely to be the teacher who has the right mix of closeness and knowledge about the context from which the analytics information is extracted. So we are aiming to provide the teaching academics with better information.

To borrow David’s car analogy, it’s about using analytics to augment the driver. In a lot of modern cars, when you get out and leave the lights on, the car turns the lights off for you, or it gives that annoying beep when you haven’t fastened your seat belt. The car is smart enough to help the driver out with the vehicle’s operation. These sorts of augmentations are what we would like to see within the LMS. We are concerned with using analytics to nurture evolutionary improvement from the present rather than rationally targeting some idealistic future state.

Analytics is not an IT thing

The Indicators project is an analytics project that has been running at CQUniversity for several years now. Interest in analytics appears to be booming at the moment with a large number of universities instigating projects. For me a worrying trend appears to be the IT centric approach that universities are taking. To explain my concerns, we need to first have a look at some of the definitions around analytics. The following table from Siemens (2011) broadly outlines some of these definitions.

Type of Analytics | Level or Object of Analysis | Who Benefits?
Learning Analytics / Educational data mining | Course-level: social networks, conceptual development, discourse analysis, “intelligent curriculum” | Learners, faculty
Learning Analytics / Educational data mining | Departmental: predictive modeling, patterns of success/failure | Learners, faculty
Academic Analytics | Institutional: learner profiles, performance of academics, knowledge flow | Administrators, funders, marketing
Academic Analytics | Regional (state/provincial): comparisons between systems | Funders, administrators
Academic Analytics | National and International | National governments, education authorities

Educational data mining is concerned with developing methods for exploring the unique types of data that come from educational settings and using those methods to better understand students and the settings in which they learn.

Academic analytics marries statistical techniques and predictive modeling with the large data sets collected by HEI, including those collected by the LMS. Academic analytics has been described as business intelligence for HEI and is focused on the needs of the institution, such as recruitment, retention and pass rates (Open University, 2012).

Learning analytics is more specific than academic analytics as it is focused exclusively on the learning process (Siemens & Long, 2011) and is often based on learning and teaching theories (Open University, 2012). Applications that apply analytics into the learning environments perhaps fit into the learning analytics definition.

As an educational technologist who has been tinkering with analytics for the last few years, I get concerned when I hear IT companies saying things like:

“IBM is a leader in metrics and uses learning analytics to gauge learning effectiveness, drive learning recommendations, and aid in decision making.”

or this

“By analyzing student data and getting down to ever finer detail, educators and administrators using these analytic systems gain a much deeper understanding of the student, enabling the decision-makers to anticipate the next stage, the next need, specific performance challenges, and even potential outcomes, and guide the affected individual to the right action for a given situation.”

I get concerned about these things for a number of reasons, but two in particular stand out.

Analytics as an IT thing.

Based on my observations, there appears to be an underlying assumption that analytics belongs with IT departments. The rise of managerialism in higher education has meant that organizational structures are based on decomposition into specialized units with rigid command and control processes. This has led to institutional silos that limit and inhibit cross-unit information sharing, cooperation and collaboration. For example, IT belongs to the IT department, student administration belongs to the student admin section, and so on. While I see a large role for IT in educational data mining, academic analytics and learning analytics, the interpretation and application of analytics information has to involve the educators, as per the learning analytics section of George’s table above. Individual courses are highly contextual in that the patterns of behavior that students exhibit will be quite different from course to course.

Assumptions of causality.

A danger exists where correlations found within analytics data are seen as universal constants and this leads to decision making that assumes that patterns in the data are reproducible. IT companies and IT departments, and to a certain extent management, like high-level abstract data, averages and summaries. Performance based analytics data is based on exhibited student behavior and it is unlikely that this will be repeated from one course to the next due to the ever-changing context. A lot of the correlations we have found with the Indicators project are quite distinct at the institutional or even the departmental levels. These correlations are rarely as distinct when looking at individual courses or students as per the following figures that are merely comparing the number of distance student clicks within Moodle course sites and their resulting grades.

References

Siemens, G. (2011). Learning and Knowledge Analytics. Retrieved 1/11/2011 from http://www.learninganalytics.net/?p=131

Making central and departmental IT work together

I read this article from Campus Technology this morning and it rang some bells for me. The article is about the tension between central IT and departmental IT within universities. Competition and distrust, service duplication and lack of communication between IT areas of Utah State University are some of the problems mentioned in this article. The article mentioned three key approaches that have helped to significantly offset the aforementioned problems:

  • Central IT adopted a philosophy to engage with departmental IT people at multiple levels. This involved including departmental IT in decision-making up front which encourages a sense of ownership with the end solutions.
  • Central IT treated people as if they are all part of the same organization. This meant giving access to systems, distributing control and providing tools for them to use.
  • Central IT does not force choices on departments. This battle will always be lost as there will always be someone outside your reporting line who does not have to conform.

The approaches above resonate with me and I believe that these strategies are something that centralized IT and learning and teaching areas should take notice of.

Update on the previous post

This post is a quick expansion on the previous post that was looking at the correlations between when students first access their Moodle sites and their resulting grades. I’ve expanded the data to include a much larger population (35627) of distance students. The following chart groups students by their resulting grade and shows the average day that each group first accessed their Moodle sites. Note that 0 on the y-axis represents the official starting date of term. At CQUniversity courses are available to students 2 weeks prior to start of term, which explains the negative values on the y-axis. The x-axis is representative of students’ grades, which at CQUniversity are:

HD – High Distinction
D – Distinction
C – Credit
P – Pass
F – Fail

Average First Access Day Grouped by Grade

In response to some queries about what this chart is showing: the 0 value on the y-axis is day 0 of term, that is, the day that term officially starts. Moodle courses at CQUniversity are made available to students two weeks prior to start of term. So the students receiving C, D and HD grades, on average, accessed their Moodle sites prior to the start of term. P and F students, on average, first accessed their course sites after the start of term. I hope this makes sense.
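
For anyone wanting to reproduce this kind of chart, the calculation is roughly as sketched below. It assumes each student’s first access date and final grade are already known and, as a simplification, that all students share a single term start date; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def first_access_day(first_access: date, term_start: date) -> int:
    """Days between term start and first Moodle access; negative means before term started."""
    return (first_access - term_start).days


def average_first_access_by_grade(rows: Iterable[Tuple[str, date]],
                                  term_start: date) -> Dict[str, float]:
    """Average first-access day for each grade group.

    rows are (final_grade, first_access_date) pairs, one per student.
    """
    days = defaultdict(list)
    for grade, first_access in rows:
        days[grade].append(first_access_day(first_access, term_start))
    return {grade: sum(values) / len(values) for grade, values in days.items()}
```

With real data spanning several terms, each student’s offset would need to be calculated against the start date of the term they were enrolled in before the averages are taken.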