Burton, Kelley; Cuffe, Natalie --- "The Design and Implementation of Criterion-referenced Assessment in a First Year Undergraduate Core Law Unit" [2005] LegEdRev 8; (2005) 15(1&2) Legal Education Review 159


TEACHING NOTE
The Design and Implementation of Criterion-referenced Assessment in a First Year Undergraduate Core Law Unit

The University Academic Board at the Queensland University of Technology (QUT) approved a new QUT Assessment Policy1 in September 2003, which requires a criterion-referenced approach as opposed to a norm-referenced approach to assessment. In 2004, in accordance with the QUT Implementation Plan, the QUT School of Law raised awareness of criterion-referenced assessment and implemented it in first year core undergraduate law units. The Implementation Plan anticipates that all law units across all year levels will implement criterion-referenced assessment between 2005 and 2007. This teaching note will distinguish norm-referenced assessment from criterion-referenced assessment and justify why QUT is implementing criterion-referenced assessment. It will focus on how the authors of this article designed, implemented and evaluated criterion-referenced assessment in a first year core undergraduate law unit, LWB143 Legal Research and Writing, in 2004. In 2004, the unit had a cohort of approximately 600 students and 12 members in the teaching team. Ten members of the teaching team were involved in marking the items of assessment and eight of them were casual academics. In light of the experience in 2004, the authors provide some insight into the way forward.

Norm-referenced assessment ranks a student’s performance against their peers and results in a normal distribution of grades, which is commonly referred to as using a bell curve or “grading on the curve”.2 Jackson identifies three problems with norm-referenced assessment.3 The first problem is that if academics use feedback from previous years to inform improvements in their teaching and learning, the success or failure of this cannot be measured by improved student outcomes. The second problem is that students become more competitive and are less likely to work co-operatively with their peers because they perceive that their marks will increase if they hamper other students. The third problem is that it does not recognise that the abilities of students in a cohort in one year may vary from the abilities of students in a cohort in a subsequent year.

In contrast, the QUT Manual of Policies and Procedures defines criterion-referenced assessment as follows:

In addition to the problems with norm-referenced assessment, the use of criterion-referenced assessment is justified because it increases the validity of the assessment task.5 Validity measures whether the desired learning outcomes are achieved.6 Another benefit is increased reliability of the assessment task.7 Reliability measures whether different markers mark a piece of work consistently and that the same marker is consistent in their marking.8 Criterion-referenced assessment also motivates students by providing them with explicit and attainable standards in advance so that they can concentrate on improving their personal best performances, rather than competing with their peers.9 In 2004, LWB143 Legal Research and Writing experienced the benefits of increased validity and reliability and these are discussed in more detail below.

Even though the QUT School of Law is moving towards the use of criterion-referenced assessment in all law units, it cannot be said that only norm-referenced assessment was used prior to the introduction of the new QUT Assessment Policy. Previously, the markers used explicit or implicit criteria and only adjusted the marks against the performance of other students where the distribution of marks for a piece of assessment or overall grades fell “well outside the norm-based guidelines”.10 Arguably, the QUT School of Law previously used a hybrid of both approaches to assessment.

For the QUT School of Law, the new QUT Assessment Policy will require a change in practice, that is, the need to design and mark according to explicit criteria and performance standards. Law academics will need to monitor the spread of marks or grades generated by the criterion-referenced assessment approach to ensure that they are not bunched at the extremes. Bunching at the extremes may suggest that the assessment task was too difficult or easy, or that there was not a shared understanding by the markers of the criteria and performance standards. However, this does not mean that the law academics should endeavour to attain a normal distribution of grades.11 The new approach by the QUT School of Law will be strongly oriented towards criterion-referencing. This is consistent with the best practice model advocated by the Centre for the Study of Higher Education, which involves “striking a balance between criterion-referencing and norm-referencing. This balance should be strongly oriented towards criterion-referencing as the primary and dominant principle”.12
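The monitoring step described above can be illustrated with a minimal Python sketch. It is not a procedure taken from the QUT Assessment Policy: the function names, the cut-offs of 50 and 85 per cent and the 40 per cent warning threshold are all assumptions chosen for illustration. The sketch only reports whether a large share of marks sits at either extreme; it does not try to fit or enforce a normal distribution.

```python
def share_at_extremes(percent_marks, low=50, high=85):
    """Return the fractions of marks below `low` and at or above `high`.

    The default cut-offs are assumptions for illustration only.
    """
    if not percent_marks:
        raise ValueError("no marks supplied")
    n = len(percent_marks)
    below = sum(1 for m in percent_marks if m < low) / n
    above = sum(1 for m in percent_marks if m >= high) / n
    return below, above


def looks_bunched(percent_marks, threshold=0.4):
    """Flag a spread in which more than `threshold` of marks sit at one extreme.

    The threshold is a hypothetical value, not a figure from the policy.
    """
    below, above = share_at_extremes(percent_marks)
    return below > threshold or above > threshold


# Hypothetical marks for ten students on one item of assessment.
sample_marks = [90, 88, 92, 86, 70, 60, 45, 95, 87, 91]
flagged = looks_bunched(sample_marks)
```

A flagged result would be a prompt to review the difficulty of the task or the markers' shared understanding of the criteria, not a reason to rescale the marks.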

In 2004, the authors designed criterion-referenced assessment sheets for four items of assessment in LWB143 Legal Research and Writing. The criteria used in the memorandum of advice are more likely to be compatible with the learning objectives of other units and, therefore, serve as a better example to other law academics who plan to change their assessment regime to one of criterion-referenced assessment. The criterion-referenced assessment sheet used in LWB143 Legal Research and Writing in the second semester of 2004 is extracted in Appendix 1.

In this example, the assessment criteria are presented in the first column (on the left hand side of the page). The assessment criteria are aligned with the learning objectives for the unit. This alignment ensures that the assessment task is valid because the memorandum of advice is measuring the “desired learning outcomes”.13 It also compels the students to concentrate on the learning objectives of a unit.

In this example, there are four performance standards presented across the page, that is, excellent, good, sound and poor. Each performance standard has a descriptor indicating what is required to perform at a certain standard on a criterion. QUT currently has seven grades of assessment, but drafting seven performance standards for each criterion is a difficult task. The literature suggests that drafting clear criteria and performance standards continues to challenge academics.14 The authors have simplified this process by using four performance standards that correlate to the seven grades and percentages as follows:

In the second semester of 2004, the criterion-referenced assessment sheets were released to students before they did the assessment. The students were instructed to raise any questions about the criteria or performance standards with their tutor, who was one of the markers. By releasing the criterion-referenced assessment sheets in advance, the students were encouraged to become familiar with the assessment details and requirements.

In the second semester of 2004, there were ten markers in LWB143 Legal Research and Writing, each with varying degrees of teaching and marking experience. Eight of these 10 markers were casual academics and two of them had never taught the unit before. To ensure that the marking team had a shared understanding of the criteria and performance standards, the markers were provided with written marking guidelines indicating how each criterion was weighted and what was required to achieve each standard. An example of this, relating to the criterion “Analysis of the issues in light of the relevant law”, is as follows:

*Analysis of the issues in light of the relevant law*
High level of analysis of issues in light of relevant law; demonstrates creative and original thinking	9-10
Persuasive level of analysis of issues in light of relevant law; some level of creative or original thinking	7-8
Superficial level of analysis of issues in light of relevant law; little or no creative or original thinking	5-6
Lacks analysis of issues in light of relevant law; no creative or original thinking	0-4

Analysis of the issues in light of the relevant law 10 marks

This criterion requires the students to demonstrate their understanding of the law, an appreciation of the material facts and their ability to apply the relevant law to the facts. The application of the law to the facts for each of the following five areas is worth a possible two marks:

1. Divorce
2. Stalking
3. Drug Possession
4. Drug Importation
5. Fixtures

If a student has missed an issue or failed to identify a legal authority, their analysis will be incomplete and they should not receive two out of two for that area.

Award two marks for the area, if the student has comprehensive and correct analysis.

Award one mark for the area, if the student has made a genuine effort in analysing, but could have been more comprehensive.

Award zero marks for the area, if the student has made little or no effort to analyse.

As an example of analysis:

“It is clear the dishwasher is a fixture.” = zero marks

“The dishwasher is a fixture because it was physically connected to the plumbing.” = one mark

“The dishwasher is a fixture because it was connected to the plumbing, fitted in between two cupboards and below the kitchen bench and its removal revealed an untiled section of the floor.” = two marks
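The arithmetic in the marking guideline above can be sketched in Python as follows. The five areas, the zero to two marks per area and the mark ranges 9-10, 7-8, 5-6 and 0-4 come from the guideline and the descriptor table; the function names are hypothetical, and labelling the four ranges as excellent, good, sound and poor, in that order, is an assumption.

```python
AREAS = ("Divorce", "Stalking", "Drug Possession", "Drug Importation", "Fixtures")

# Lowest total mark for each range in the descriptor table.
# The labels are assumed to follow the order excellent, good, sound, poor.
STANDARDS = [(9, "Excellent"), (7, "Good"), (5, "Sound"), (0, "Poor")]


def analysis_mark(area_marks):
    """Sum the 0, 1 or 2 marks awarded for each of the five areas (maximum 10)."""
    missing = set(AREAS) - set(area_marks)
    if missing:
        raise ValueError(f"no mark recorded for: {sorted(missing)}")
    for area in AREAS:
        if area_marks[area] not in (0, 1, 2):
            raise ValueError(f"{area}: mark must be 0, 1 or 2")
    return sum(area_marks[area] for area in AREAS)


def standard_for(total):
    """Return the performance standard whose mark range contains `total`."""
    if not 0 <= total <= 10:
        raise ValueError("total must be between 0 and 10")
    for lowest, label in STANDARDS:
        if total >= lowest:
            return label


# Hypothetical marks awarded to one memorandum of advice.
example = {
    "Divorce": 2,
    "Stalking": 1,
    "Drug Possession": 2,
    "Drug Importation": 1,
    "Fixtures": 2,
}
example_total = analysis_mark(example)
example_standard = standard_for(example_total)
```

In this hypothetical case the student gives a comprehensive analysis in three areas and a partial analysis in two, which places the total in the second mark range.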

In addition to the written marking guidelines, the markers were provided with examples of marked memoranda of advice for the grades of 7, 6, 5 and 4 that had been marked by the unit co-ordinator. These resources helped to ensure that the marking team had a shared understanding of the criteria and performance standards, as well as giving examples of the written feedback a marker would be expected to write on a memorandum of advice. In the first semester of 2005, the markers were invited to provide feedback on the implementation of criterion-referenced assessment sheets in the previous semester.

The authors designed a survey instrument to obtain feedback from the markers. Only six out of the 10 markers responded using the survey instrument. There was one response from a full-time academic and five responses from casual academics. Two other casual academics responded positively, but did not use the survey instrument. The markers were asked to respond to the following five statements by selecting strongly disagree (SD), disagree (D), neutral (N), agree (A) or strongly agree (SA). The table below indicates the statements put to the markers and the average of the responses as a percentage.

The markers were also asked open-ended questions so that they could provide feedback on the implementation of criterion-referencing in the unit. One of the themes emerging from this feedback was that criterion-referenced assessment increased reliability, that is, consistent marking. Reliability was particularly important in this unit because there were approximately 600 students in 2004 and 10 markers with varying degrees of marking experience. Criterion-referenced assessment sheets increased reliability by facilitating the systematic use of marking criteria and performance standards by the markers, who had a shared understanding of the marking criteria and performance standards. The literature recognises the need for the markers to have a shared understanding of the criteria and performance standards because divergent views will cause the students to have divergent views.15 The comments from the markers that supported the increased reliability of the assessment task were as follows: “Made marking a lot easier and took some of the ‘guess work’ out of marking similar assignments” and “The criterion-referenced assessment sheets helped me to justify why one piece of work was better than another and thus deserved a higher mark”.

In addition to increased reliability, another theme emerging from the markers’ feedback was that the criterion-referenced assessment sheets enabled the marker to identify strengths and weaknesses in a piece of assessment. This feedback from the markers is consistent with the literature.16 The law academics should feed this information into the structure and content of the generic feedback provided to students. They should also use it to inform future teaching and assessment approaches in the unit. As one marker said: “The criterion-referenced assessment sheets helped me to identify strengths and weaknesses in a piece of work, which was useful in providing feedback and made me feel more confident about marking consistently”.

Feedback enhances student learning and the literature asserts that at the very least it will indicate what the student has done right to meet the unit objectives and what the student has done wrong in failing to meet the unit objectives.17 The markers claimed that the criterion-referenced assessment sheets enabled them to provide worthwhile feedback to students in a systematic way and advised them what specifically to comment on. However, it is also recognised that circumstances may arise where a marker needs to tailor feedback to the needs of an individual student. For example, a particular student may have approached an assessment task in a very different way to that anticipated by the marker. A comment from a marker in LWB143 Legal Research and Writing supporting this argument was: “Students still need a certain amount of personalised feedback”.

Some law academics fear that providing explicit criteria, performance standards and personalised feedback gives law students ammunition when they seek a review of their grade or assessment item. The Centre for the Study of Higher Education suggests that students should be able to understand their marks when criterion-referenced assessment is used.18 The experience in this unit in 2004 was that even though a minority of students sought a review of their grade or assessment item, the markers were able to substantiate the marks by referring to the criteria and performance standards. One of the markers recognised this issue in the following comment: “They [criterion referenced assessment sheets] were useful in providing feedback to students who questioned their mark. I was able to refer to the sheet with the descriptors and advise where they did not complete the task well.”

One of the markers recognised that the overall mark using the criterion-referenced assessment was lower in some instances than if the assessment had been marked holistically because some students had just fallen short of the next performance standard for more than one of the criteria. This comment signifies the importance in legal education of content and skills, for example, not only what is said but how it is said. The comment also recognises the importance of determining the desired learning outcomes when designing the criterion-referenced assessment. To overcome this difficulty of having a prescriptive marking guide, the marker suggested that another criterion be added that would be entitled, “General overall impression”, to reward students who had been original or creative in their approach to the assessment task. However, the reliability of this new criterion would require the markers to have a consistent view on originality and creativity.

One of the markers commented that they were surprised at times by the high marks generated by using criterion-referenced assessment in 2004. Similarly, the literature indicates that academics are concerned that criterion-referenced assessment will result in marks that are skewed away from a normal distribution.19 The unit co-ordinator in LWB143 Legal Research and Writing in 2004 was conscious of this and counteracted this problem on subsequent items of assessment by providing a more prescriptive marking guide and changing the weightings of some of the criteria. The impact of this was to increase the reliability and validity of the assessment tasks.

Even though criterion-referenced assessment was used in the unit, the overall grades for the students at the end of the semester represented a normal distribution of grades. This is not the aim of the new QUT Assessment Policy,20 but it does to some extent support the notion that the assessment tasks were appropriate and that the markers had a shared understanding of the criteria and performance standards. However, the authors are continuously striving to improve their approach to criterion-referenced assessment and are using the feedback from 2004 to inform the way forward in 2005.

After engaging in self-reflection, the authors have determined the way forward in 2005 is to invite peer feedback from the 2004 markers in the unit and to have discussions with QUT Teaching and Learning Support Services. Discussions were also held with some of the delegates at the recent Australasian Law Teachers’ Association (ALTA) Conference in July in Hamilton, New Zealand.21 The two main goals are to refine the criterion-referenced assessment sheets so that they are more explicit and to engage in processes that will enhance the shared understanding of the criteria and performance standards between the markers and students.

In managing the first goal, the authors invited peer feedback from QUT Teaching and Learning Support Services on the appropriateness of the performance standard descriptors. These discussions suggested that the “excellent” performance standard descriptors for some of the criteria were too high and the “sound” performance standard descriptors for some of the criteria were too low.22 It is the experience of the authors that as the number of performance standards increases, it is more difficult to articulate the boundaries between the performance standards. It is expected that the wording of the performance standards and perhaps the criteria will change over time in light of experience.

In an effort to meet the first goal of making the criterion-referenced assessment sheets more explicit, the authors will indicate the weightings of each criterion to students prior to undertaking the assessment task. Some learning objectives are more important and, debatably, some are more subjective than others. The outcome of this is that the criteria are regularly weighted differently. One of the markers commented that they found it useful to know how the marks are allocated to the criteria and advocated that the students would find this information useful because they could determine which skills were being emphasised.

After allocating marks to each criterion, there are two views on how to allocate marks across the performance standards. One view is to allocate a single mark or a narrow range of marks to each performance standard to increase the reliability of an assessment task. This makes it easier to defend marks when students apply for a review of an assessment item. Awarding a single mark or a narrow range of marks to each performance standard may benefit those law students who fall just short of the next performance standard. Further, this will not automatically lead to a bunching of overall marks for an assessment task because there are several criteria listed on the criteria sheet on which a student may fall within any of the four performance standards. The other view is to allocate a wider range of marks to the performance standards and give the markers more discretion to use their professional judgment. This less prescriptive approach awards marks to students who submit original or creative work. However, the drawback with this approach is that it decreases reliability because different markers may have differing views on originality and creativity and award marks on these factors inconsistently.
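The contrast between the two views can be made concrete with a short Python sketch. Everything numeric in it is hypothetical: the three criteria, their weightings, the single proportion attached to each performance standard and the wider ranges are invented for illustration and are not the values used in LWB143 Legal Research and Writing.

```python
# Hypothetical criteria and the maximum marks (weightings) allocated to each.
WEIGHTS = {"Identification of issues": 5, "Analysis": 10, "Written expression": 5}

# First view: one fixed proportion of a criterion's marks per performance standard.
FIXED_SHARE = {"Excellent": 0.9, "Good": 0.75, "Sound": 0.55, "Poor": 0.3}

# Second view: a wider range within which the marker exercises judgment.
SHARE_RANGE = {
    "Excellent": (0.85, 1.0),
    "Good": (0.65, 0.84),
    "Sound": (0.5, 0.64),
    "Poor": (0.0, 0.49),
}


def total_fixed(judgments):
    """Total mark when each standard carries a single mark per criterion."""
    return sum(WEIGHTS[c] * FIXED_SHARE[s] for c, s in judgments.items())


def total_range(judgments):
    """Lowest and highest totals that markers could award under the wider ranges."""
    low = sum(WEIGHTS[c] * SHARE_RANGE[s][0] for c, s in judgments.items())
    high = sum(WEIGHTS[c] * SHARE_RANGE[s][1] for c, s in judgments.items())
    return low, high


# Two markers who agree on the standard reached for every criterion.
judgments = {
    "Identification of issues": "Good",
    "Analysis": "Sound",
    "Written expression": "Excellent",
}
single_total = total_fixed(judgments)
lowest_total, highest_total = total_range(judgments)
```

Under the first view the two markers necessarily arrive at the same total. Under the second, the gap between `lowest_total` and `highest_total` shows how far their totals could differ even though they agree on every performance standard, which is the reliability cost described above.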

The authors plan to ensure there is an enhanced understanding of the criteria and performance standards by inviting markers to a hands-on workshop before semester starts to review the 2004 criterion-referenced assessment sheets. This initiative will give the markers the opportunity to debate the meaning of the criteria and performance standards, offer more explicit wording and give them a larger sense of ownership over the criterion-referenced assessment. The authors will continue the practice of providing the markers with examples of marked items of assessment using the criterion-referenced assessment sheets. They will also instigate more cross-marking between the markers, which will increase reliability. Another initiative is to build an online discussion forum for the markers so that they can provide words of caution or offer advice arising from their marking experience.

In 2005, the authors plan to enhance the student understanding of the criteria and performance standards by conducting a hands-on workshop inviting students to critique and apply the criteria and performance standards. The students will also be invited to provide feedback on a formal survey instrument. It is anticipated that the survey instrument will specifically question whether they understood the assessment requirements, whether the hands-on workshop helped their understanding of the marking criteria and performance standards, whether the assessment aligned with the learning objectives of the unit and whether the criterion-referenced assessment sheets provided them with worthwhile feedback on their learning and progress.

The way beyond 2005 includes building a collection of marked assessment using criterion-referenced assessment sheets as examples for markers and the law students in the unit so that they can examine what is necessary to attain each performance standard. A further goal is to determine how second and later year units in the law degree build on the first year core law unit’s criterion-referenced assessment sheets to reflect the fact that the law students are incrementally developing their skills as they progress through the law degree.23 In this light, the wording of the performance standards should be incrementally higher for each year level of the law degree.

The outcome of using criterion-referenced assessment in LWB143 Legal Research and Writing in 2004 was increased reliability and validity of assessment tasks. After reflecting on the experience in 2004 and inviting peer feedback from a range of sources, the authors have identified two main goals for 2005. These goals are to refine the criterion-referenced assessment sheets so that they are more explicit and to engage in processes that will enhance the shared understanding of the criteria and performance standards between the markers and law students.

1 QUT, Manual of Policies and Procedures (2003) cl 9.1.3 http://www.qut.edu.au/admin/mopp/C/C_09_01.html (accessed 13 October 2005).

2 P Nightingale, IT Te Wiata, S Toohey, G Ryan, C Hughes & D Magin, Assessing Learning in Universities (Sydney: University of New South Wales Press, 1996) 9.

3 S Jackson, A Project to Facilitate the Implementation of Criterion-Referenced Assessment in the School of Law (2004) QUT Teaching and Learning Support Services https://olt.qut.edu.au/udf/FELLOW09/gen/index.cfm?fa=getFile&rNum=1638031&nc=1 (accessed 13 October 2005).

9 DT Neil & DA Wadley, A Generic Framework for Criterion Referenced Assessment of Undergraduate Essays (1999) 23 Journal of Geography in Higher Education 303.

12 Centre for the Study of Higher Education, A Comparison of Norm-Referencing and Criterion-Referencing Methods for Determining Student Grades in Higher Education (Australian Universities Teaching Committee, 2002) http://www.cshe.unimelb.edu.au/assessinglearning/05/normvcrit.html (accessed 13 October 2005).

14 L Dunn, S Parry & C Morgan, Seeking Quality in Criterion Referenced Assessment, Papers presented at the Learning Communities and Assessment Cultures Conference 2002 (Northumbria: EARLI Special Interest Group on Assessment and Evaluation, 2002) http://www.leeds.ac.uk/educol/documents/00002257.htm (accessed 13 October 2005).

15 The Australian Association for Research in Education Qualitatively Different Conceptions of Criteria used to Assess Student Learning, in S Barrie, A Brew & M McCulloch eds, AARE Journal (James Cook University, 1999) http://www.aare.edu.au/99pap/bre99209.htm (accessed 13 October 2005).

16 B O’Donovan, M Price & C Rust, The Student Experience of Criterion-Referenced Assessment through the use of a Common Criteria Assessment Grid (2001) 38 Innovations in Learning and Teaching International 74.

17 Teaching and Educational Development Institute, Grades and Feedback (The University of Queensland, 1998) http://www.tedi.uq.edu.au/teaching/assessment/grades.html (accessed 13 October 2005).

21 K Burton & N Cuffe, CRAFT: Criterion Referenced Assessment for Teachers, paper presented at the Sixtieth Australasian Law Teachers Association Conference, 4-7 July 2005.

22 For example, the word “all” in the “excellent performance” standard for the first two criteria on the LWB143 Legal Research and Writing’s criteria extracted in Appendix 1 was too high and arguably almost impossible to achieve. Similarly, the word “superficial” in the “sound” performance standard for the analysis criterion was too low and was more appropriate for the “poor” performance standard.

*Performance Standard*	*Grade*	*Percent*
Excellent	7	85 – 100
Good	6 and 5	65 – 84
Sound	4	50 – 64
Poor	3, 2 and 1	< 50
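The correlation in the table above can be expressed as a simple lookup, sketched here in Python. The band boundaries and grades are copied from the table; the function name is hypothetical, and treating a fractional percentage such as 84.5 as falling in the lower band is an assumption, since the table lists whole numbers only.

```python
# (lowest percentage, performance standard, QUT grades covered), from the table.
BANDS = [
    (85, "Excellent", (7,)),
    (65, "Good", (6, 5)),
    (50, "Sound", (4,)),
    (0, "Poor", (3, 2, 1)),
]


def performance_standard(percent):
    """Return (standard, grades) for a percentage mark between 0 and 100."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for lowest, standard, grades in BANDS:
        if percent >= lowest:
            return standard, grades


# A hypothetical mark of 72 per cent.
standard, grades = performance_standard(72)
```

Because two of the four performance standards cover more than one grade, the lookup returns the set of possible grades and does not choose between them; the table does not say how a mark is split between, for example, grades 6 and 5.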