
Osborn, James; Sterling, Leon --- "JUSTICE: A Judicial Search Tool Using Intelligent Concept Extraction" [1999] CompLRes 35 (23 July 1999)

JUSTICE: A Judicial Search Tool Using Intelligent Concept Extraction

James Osborn and Leon Sterling

Intelligent Agent Laboratory
Department of Computer Science and Software Engineering
University of Melbourne
Parkville, Victoria, 3052
AUSTRALIA
+61 3 9344 9100
{osborn, leon}@cs.mu.oz.au


ABSTRACT

A legal knowledge based system called JUSTICE is presented which provides conceptual information retrieval for legal cases. JUSTICE can identify heterogeneous representations of concepts across all major Australian jurisdictions. The knowledge representation scheme used for legal and common sense concepts is inspired by human processes for identifying concepts and by the expected order and location of concepts. The knowledge is supported by flexible search functions and string utilities. JUSTICE works with both plaintext and HTML representations of legal cases over file systems and the World Wide Web. In creating JUSTICE, an ontology for legal cases was developed, and is implicit within JUSTICE. The identification of concepts within data is shown to be a process enabling conceptual information retrieval and search, conceptual summarisation, automated statistical analysis, and the conversion of informal documents into formalised semi-structured representations. JUSTICE was tested on the precision, recall and usefulness of its concept identifications and achieved good results. The results show the promise of our approach to conceptual information retrieval and establish JUSTICE as an intelligent legal research aid offering improved multifaceted access to the concepts within legal cases.

Keywords

Intelligent law information systems, Intelligent research aid, Conceptual information retrieval, Legal WWW agent, Legal Knowledge Representation, Legal Ontology

1 INTRODUCTION

Concept based information retrieval is currently receiving widespread attention. Within legal domains the necessity of the transition from syntax based methods to semantic ones has been apparent for many years. The benefits of concept based searching over legal data have been advocated by many AI and law researchers.

The modern-day movement toward semantic information retrieval can be seen as trying to achieve the concept identification abilities of humans while providing real-time or near real-time results. This requires solving the challenging task of making the system understand legal cases in the same manner as a legal expert (Bing, 1989).

Semantic retrieval is a problem because current search engine technology does not adequately address queries for abstract concepts which have heterogeneous representations. Consider, for example, the problem of extracting all the five-star movies from a critic's Web site, all the recipes from a cook's travelogue, or the winners in legal cases. Most search engines and tools use standard lexical information retrieval techniques with some heuristics, e.g. giving a match near the top of a document a higher score or using synonym expansion on search terms. A paradigm shift toward semantics is needed so researchers can treat information collections as they would an expert human user who had fully processed and remembered that same collection.

Extracting abstract concepts is often difficult because the required information is not stored in a structured manner, and instances of the concepts are not explicitly recorded within the system as part of an abstract concept. One way of achieving concept based searching would be to force the originators of the data to adhere to a shared ontology. An ontology is a formalised specification of the conceptualisation of knowledge within a domain (Bench-Capon and Visser, 1997). Adherence to an ontology would make conceptual searches within that ontology trivial. It is an argument of this paper that to enable semantic querying, concepts should be marked up during the creation process. Initial effort invested during creation will reduce the difficulty of the task later and provide higher levels of accuracy. Of course there is a limit to the effort the creators of cases will expend, and for this reason such formalism should initially be restricted to the headnotes of cases. Once an agreed ontology is in existence, a form can be created which the author must complete. When the case is exported (i.e. distributed and published) a simple program can be used to delimit each field of the form with the correct meta-tag. Legacy data cannot be dealt with as easily and will require an automated concept identifier.

Developing a complete system which functions over an entire case is very difficult. Rather than tackle the whole problem, we chose to focus on a subset of the problem which still provides improved access to case law.

JUSTICE is the name of the tool created, and is an acronym for A Judicial Search Tool using Intelligent Concept Extraction. JUSTICE is an attempt to bridge the gap between current syntax methods and concept identification[1]. The research was motivated by a desire to provide legal researchers with a tool which would provide concept based searching of case law. The initial insight was a belief that a knowledge based approach to extracting legal concepts would perform well in the domain of legal cases, especially as regards the headnote. JUSTICE is able to recognise and extract abstract legal concepts from heterogeneous digital representations of legal cases. The ability to recognise concepts enables many functions including: conceptual searching; conceptual summarisation; the collection of statistics across concepts; and the ability to convert informal documents to formalised representations, e.g. plaintext to XML.

Section 2 outlines the domain within which JUSTICE works, including both concepts and legal cases. Section 3 discusses the currently available tools, and points to future developments including a formalised ontology for legal cases. Section 4 outlines the methodology used to create JUSTICE, and the knowledge representation scheme employed. Section 5 presents the architecture of the system. Section 6 presents the results and discussion, and section 7 presents the conclusions.

2 DOMAIN

2.1 Concepts

A concept is defined by the Oxford English Dictionary as a general notion, an abstract idea, or an idea or mental picture of a group or class of objects. The ability to completely capture an informal abstract idea within a fixed precise definition is generally regarded as impossible; a more useful view is that instances of a concept are better described as having a family resemblance (Wittgenstein, 1968). This has implications for knowledge based systems, which often try to define every possible instance of a concept with inflexible rules. The approach of adding more and more heuristics to increase recognition of all instances of a concept is ultimately flawed in most domains. This limitation of the methodology was accepted for the current research because of a belief that such methods would capture enough concept instances in the legal domain to be useful. Identifying those instances which fall between the rules requires methods other than pure knowledge based approaches.

2.2 Legal Cases

A legal case is composed of two significant parts[2]: the headnote and the judgment (of which there may be more than one). JUSTICE focuses mainly on the headnote of a case. The headnote provides a summary of aspects of the case. The types of concepts which appear in the headnote are sufficiently interesting to be of great use to legal researchers. Paper law report headnotes contain human summaries of facts and law, but these do not appear in the digital counterparts. Some of the concepts possible in digital headnotes include: case name, parties, citation, judgment date, hearing date, judges, representation (i.e. lawyers), and law cited.

The judgment of a case is examined for case segmentation, the order concept and the winner/loser concept. The headnote is that part of a case which is likely to be further formalised by the courts, and so is the aspect most likely to benefit from a formal ontology. We hope that this work encourages the beginnings of such a project, and that once the benefits of identifying headnote concepts are known, a move toward further formalisation will follow.

Extracting concepts from headnotes is a difficult problem because of the varied representations created through the currently ad-hoc process of headnote creation. Headnotes can differ across years, courts, judges, and headnote authors.

JUSTICE can extract twenty-two concepts from a case. JUSTICE records both the start and stop locations of each concept, along with the concept content, that is, the text which established the abstract concept. The start and stop markers are needed to enable accurate concept identification, and to allow for the conversion of syntax-based documents (e.g. plaintext or HTML) into semantically segmented documents, e.g. XML.
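
To make the role of the start and stop markers concrete, the following is a minimal sketch (our own illustration; the class and field names are not taken from the JUSTICE source) of how an identified concept might be recorded:

    // Hypothetical record of a single concept identification. The start and
    // stop offsets allow tags to be inserted into the original document later.
    public class ConceptMatch {
        final String name;     // e.g. "Citation" or "JudgmentDate"
        final int start;       // offset of the first character of the concept
        final int stop;        // offset just past the last character
        final String content;  // the text which established the concept

        public ConceptMatch(String name, int start, int stop, String content) {
            this.name = name;
            this.start = start;
            this.stop = stop;
            this.content = content;
        }
    }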

Some concepts are subsidiary or are used to segment a case into its components, e.g. start of judgment, and are not usefully searched on. The complexity of the concepts differs greatly: the simplest concept identification uses two heuristics, the most complicated twenty-six.

The concepts which are identified by JUSTICE and usefully searched on include: headnote, heading section, case name, court name, division, registry, parties (initiator and answerer), judge, judgment date, citation, order, and winner/loser. The definitions of the concepts identified are mostly obvious; the winner/loser concept, however, requires some explanation.

Analysing legal cases in terms of winning and losing is often inappropriate; further, such a distinction does not neatly divide cases. Cases with complex orders, multi-party cases with different orders for each party and single-party cases are all difficult to analyse with the concept of winner/loser. Nevertheless, lawyers often talk of winners and losers, and such a conceptual distinction has real value, especially when the interest is the law and not practicalities. JUSTICE defines winning as winning the judgment, i.e. the court rules in that party's favour. JUSTICE returns one of four answers when locating the winner/loser (an illustrative sketch of these outcomes follows the list):

a) The Initiator won.

This includes the plaintiff, applicants, appellants, prosecution etc.

b) The Answerer won.

This includes the respondent, the defendant, defence etc.

c) There was no clear decision in this case.

This means that no decision could be found, either in summary form or as free text in a part of the judgment.

d) There was no clear winner in this case.

This means that a decision was found, but no clear winner was apparent; the decision could be interpreted such that either party could have won.
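
These four outcomes form a closed set. As a minimal illustration (our own sketch, using modern Java syntax rather than the Java 1.1 the system was written in; the names are not taken from the JUSTICE source):

    // Hypothetical representation of the four winner/loser answers.
    public enum Outcome {
        INITIATOR_WON,      // plaintiff, applicant, appellant, prosecution etc.
        ANSWERER_WON,       // respondent, defendant, defence etc.
        NO_CLEAR_DECISION,  // no decision found, in summary form or as free text
        NO_CLEAR_WINNER     // a decision was found but no clear winner was apparent
    }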

3 AVAILABLE TOOLS

Current legal information retrieval tools have not changed much since the 1960s. Perhaps the most obvious change is the increased availability of document collections and the large increase in the amount of data available. A move toward semantics was fairly obvious from an early stage, but developments toward this goal have been slow in arriving. Queries on document collections can be classified in two ways: 1) the range of document segments within instances of a document collection, e.g. cases, which may be searched; and 2) the type of search available, e.g. ranked or boolean.

Most document collections are divided into jurisdictions so searches can be limited or listed according to their source. Full text searches have been around the longest, while more recent document collections offer finer-grained searches, including at least the title. Common fields include citation and court. Proprietary CD-ROMs from legal publishers such as Butterworths offer some of the best segmented searches, with some including judge and representation. The SCALEplus system (http://SCALEplus.law.gov.au/) provides segmented search, but its usefulness is undermined by the unreliability and inaccuracy of the segments. Legal researchers demand high levels of accuracy. Any system which is expected to be useful must be complete and able to be trusted by the researcher. This requirement is more important for legal document collections than for other document collections. If such accuracy is not available the system will not be used.

The flexibility of search methods is increasing, e.g. additional operators like near or same paragraph are becoming available. However, most searching still relies upon lexical matching which limits what is available to the user.

Two closely related projects deserve special mention.

The Supreme Court of Canada SGML project (Poulin et al., 1997): An SGML-based Internet publishing system has been developed for the Supreme Court of Canada on an experimental basis. The project is converting Supreme Court cases into SGML and experimenting with conceptual search engines. It is difficult for the project to obtain data, and its members admit the digital version is not in a reliable state, which will severely limit the desirability of using the site for serious research. The project provides and/or searches on the following fields: indexed as, dates, present judges, abstract, names of the parties, cases cited, statutes and regulations cited, authors cited, and full text.

SALOMON

The SALOMON project (Moens et al., 1997) has reported good results with an implemented but publicly unavailable system that aims to automatically summarise Belgian criminal cases. The results are limited by the use of only Belgian criminal cases, which are "clearly structured and the decisions have a fixed, recurring composition" (page 116 of Moens et al., 1997). SALOMON uses a dual methodology, knowledge based and statistical, which focuses on the following nine concepts: court, date, victim, accused, alleged offence, transition formulation, legal foundations, verdict, and conclusion. Comparisons across concept identification are problematic; see section 6.1. They report an across-concept average recall rate of 81.2% and an average precision rate of 80.2%.[3]

3.1 Future Developments

Segmentation of cases is useful and is equivalent to concept identification for simple concepts. More complicated concepts require more sophisticated identification. For example, even if a summary order is available and can be searched, the winner/loser concept cannot be located by a syntax search.

As stated above, one way of enabling conceptual IR is to get judges to use formal methods to record their judgments. Although this method would be highly effective, it is unlikely that judges would adopt such measures. More probable is convincing headnote authors to increase the use of formalisms when they create headnotes. It is hoped that the presented ontology will encourage such a move.

Full conceptual information retrieval would provide new multifaceted access to data. It would provide conceptual searching, closing the gap between human mental queries and current syntax approaches, and would allow statistical results to be gathered across concepts. Legal researchers could get better information and collect profiles across any concept or collection of concepts. This would provide new information to the community, e.g. the average time it takes a particular judge to deliver a judgment, and may affect the way lawyers argue or present a case, e.g. if a judge shows a pattern of following another judge's judicial reasoning.
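
As an illustration of the kind of statistical query this would enable, the following sketch (our own example, not part of JUSTICE; it assumes the judge, hearing date and judgment date concepts have already been extracted for each case) computes the average number of days between hearing and judgment for each judge:

    import java.time.LocalDate;
    import java.time.temporal.ChronoUnit;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical profile: average days from hearing to judgment, per judge,
    // over concepts assumed to have been extracted from a collection of cases.
    public class JudgeDelayProfile {
        public static Map<String, Double> averageDelays(List<String> judges,
                                                        List<LocalDate> hearingDates,
                                                        List<LocalDate> judgmentDates) {
            Map<String, long[]> totals = new HashMap<>();  // judge -> {sum of days, count}
            for (int i = 0; i < judges.size(); i++) {
                long days = ChronoUnit.DAYS.between(hearingDates.get(i), judgmentDates.get(i));
                long[] t = totals.computeIfAbsent(judges.get(i), k -> new long[2]);
                t[0] += days;
                t[1] += 1;
            }
            Map<String, Double> averages = new HashMap<>();
            for (Map.Entry<String, long[]> e : totals.entrySet()) {
                averages.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
            }
            return averages;
        }
    }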

3.1.1 Semantic Collections

The World Wide Web exploded with HTML largely by accident, and the suddenness of this explosion meant that the best technology was not used. The Web was always going to move in the direction of semantics, and XML is likely to be the representation format.

XML will greatly improve the depth and quality of digital collections of human knowledge and gossip on the web. Data will be more accessible, and a paradigm shift away from format and onto semantics will be encouraged. Intelligent search engines will emerge, browsers will understand documents, and web agents will add value to data in truly amazing ways.

Predicting time frames for change is largely guesswork. Some people had predicted the completed XML transformation of the entire web by mid-1997. It is now apparent that the conversion of data from HTML to XML will be a long and gradual process.

Future tools will provide semantic templates which allow documents to be saved in XML marked up format. This will be enabled by common semantic ontologies, and streamlined by the use of forms to aid data entry. These will work for newly created data, but legacy data (i.e. plaintext and HTML) must be analysed and the concepts identified. Concept identification is the process required to convert such documents into semantically marked up collections. Cases can be segmented and more abstract concepts can be contained within tags. JUSTICE is able to do this for some concepts within legal cases.

JUSTICE is currently an agent based system, and such systems have serious shortfalls for large document collections in that they do not provide real-time results.[4] The same technology, however, could be used to create an XMLised collection of legal cases; an XML-aware search engine could then provide the same set of answers as JUSTICE, and enable statistical queries, but in real-time.

3.1.2 Legal Case Ontology - LegalCase.dtd

Accurate concept identification in complicated domains requires knowledge of an ontology. This makes concept identification more accurate and allows inter-conceptual relations to be used for integrity checking. JUSTICE has an implicit knowledge of part of an ontology which was created to cover legal cases with British heritage. This ontology was formalised as a graph and as an XML DTD, and is available as LegalCase.dtd from http://www.cs.mu.oz.au/~osborn. LegalCase.dtd consists of seventy-six concepts, most of which cover possible concepts within headnotes. It provides compatibility with past cases and allows for useful new concepts to be included. The concepts mapped within judgments are limited to facts, law and order.[5] The DTD makes the concepts explicit and shows the complex web of interrelationships between them. JUSTICE does not yet contain knowledge of the entire ontology.

JUSTICE provides a useful search tool, but it is hoped that a formalised ontology will be adopted by headnote authors and used to explicitly represent concepts within legal cases. The conversion of legacy cases, i.e. plaintext and HTML, to semantic representations can be achieved using a tool which identifies concepts within documents and has knowledge of the applicable ontology. The tool can insert appropriate tags at the start and stop locations of simple concepts or encode more complicated concepts within stand-alone tags.
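
A minimal sketch of this tagging step (our own illustration; the element name passed in is hypothetical and not taken from LegalCase.dtd) is shown below. Inserting the closing tag before the opening one keeps the earlier offset valid:

    // Hypothetical sketch: wrap an identified concept in XML-style tags using
    // its start and stop offsets within the case text.
    public class ConceptTagger {
        public static String tagConcept(String caseText, String element, int start, int stop) {
            StringBuilder sb = new StringBuilder(caseText);
            sb.insert(stop, "</" + element + ">");  // insert the later tag first...
            sb.insert(start, "<" + element + ">");  // ...so the stop offset is not shifted
            return sb.toString();
        }
    }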

4 METHODOLOGY

4.1 Knowledge based approach

A knowledge based approach was used to capture domain knowledge. The choice of methodology was guided by a desire to create a useful tool. Each concept is described by multiple heuristics. Drawing from expert systems and good software engineering practice, the rules are stored separately from the processing elements (Stefik, 1995). This allows heuristics to be easily changed or added, so adding non-Australian domains would be possible within the current architecture. Although a knowledge based approach has limitations, particularly instances falling between the spaces of formalised rules, it has proved to be a very useful approach. JUSTICE provides support for the proposition that a knowledge based approach is useful in semi-structured domains.

4.2 Knowledge Engineering

Legal domain expertise was relied upon to create rules which described the concepts. The processes which humans may use to extract concepts from text were considered, and four ideas surfaced and guided development:

1) The expectation of information, and graded loosening of a filter (sketched in the code after this list);

2) Relevance Filters;

3) Flexible Pattern Matching using text, format, white-space and position; and

4) Best Guess mechanisms, and ranking schemes.
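
As a hedged illustration of the first idea, graded loosening of a filter can be read as trying the strictest pattern first and falling back to progressively looser ones; the sketch below is our own simplification, and the patterns a caller would supply are assumptions rather than the actual JUSTICE heuristics:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical sketch of graded loosening: patterns are ordered from
    // strict to loose and the first one that matches wins.
    public class LooseningFilter {
        public static String findWithLoosening(String text, String[] patterns) {
            for (String p : patterns) {
                Matcher m = Pattern.compile(p).matcher(text);
                if (m.find()) {
                    return m.group();   // strictest successful pattern wins
                }
            }
            return null;                // expectation not realised
        }
    }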

The knowledge engineering task involved examining hundreds of cases and trying to write down the rules which could be used to describe the concepts. The descriptions were constrained to using what emerged as a useful set of descriptors.

4.3 Knowledge Representation

There are many possible knowledge representation schemes, both generic and legally specific. The available languages did not meet our specific needs, and so a purpose-built language was developed.

The knowledge representation used was a set of primitives which emerged from the knowledge engineering process as useful descriptors of the domain. The set of primitives was collected into separate components of the architecture; these were not domain specific but were used to create domain-specific heuristics.

The custom KR scheme consists of three components:

1) Graphical description language (the Viewer class)

The implicit assumption made in most documents (and especially HTML), that documents are for humans to view, was made explicit. When documents are created there is much value within them which is not captured by traditional syntax matching approaches. This component aims to use the information a human user extracts from text but which is lost with lexical methods. The primitives enabled by this component include: ConceptBreak, Find, FreeFlowingText, Heading, Importance_rank, LineBreak, LowerCase, NextLine, PrevLine, SectionBreak, ThisLine, UpperCase, and WhiteSpace.

The set of primitives works with both HTML and plaintext. The architecture is such that changes in HTML can easily be incorporated into the system. Dealing with HTML is often difficult because HTML markup is, in practice, very unreliable. Tags such as <B>Supreme </B><B>Court</B> are not uncommon, especially where the text has been automatically marked up. A simple approach of stripping all tags results in useful information being lost and prevents concept positions from matching up with the original HTML source. For this reason many heuristics use the primitive find, which locates strings with regard to how they appear to a viewer, not just by straight syntax matching.
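
A hedged sketch of the idea behind such a viewer-level find (our own simplification, not the JUSTICE implementation) is to search the text as a reader would see it, with tags removed, while reporting an offset into the original HTML:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical viewer-level find: "<B>Supreme </B><B>Court</B>" still
    // matches "Supreme Court", and the returned offset points into the
    // original HTML so concept positions line up with the source.
    public class ViewerFind {
        public static int viewerFind(String html, String target) {
            StringBuilder visible = new StringBuilder();
            List<Integer> map = new ArrayList<>();   // visible index -> html index
            boolean inTag = false;
            for (int i = 0; i < html.length(); i++) {
                char c = html.charAt(i);
                if (c == '<')      { inTag = true; }
                else if (c == '>') { inTag = false; }
                else if (!inTag)   { visible.append(c); map.add(i); }
            }
            int pos = visible.indexOf(target);
            return pos < 0 ? -1 : map.get(pos);
        }
    }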

2) Expected Concept Locations (the Case class)

Use of concept location has been a popular method within information retrieval and dates back to before 1960. Using expected concept order and position to guide the expected location of concepts allows for greater accuracy and better efficiency when locating concepts. The use of such a mechanism raises the possibility of trickle-down error, where a concept depends upon a concept which has been incorrectly identified. For this reason expected concept location should only ever be used as a guide. Checking mechanisms need to be in place to ensure errors are trapped; and alternative heuristics need to be defined in case expectations are not realised.

The primitives allowed when specifying position include (where X and Y are integers marking a place in the document, and X is usually the start or end of a concept): after(X,Y), before(X,Y), between(X,Y,Z), concept(X,NUM), nearEnd(X), nearStart(X), next(X,Y), prev(X,Y), within(X,Y).
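
As a hedged reading of how these primitives might be combined (the offsets and the example heuristic are assumptions for illustration, not the actual rule syntax):

    // Hypothetical reading of three positional primitives as boolean checks
    // over character offsets; the actual JUSTICE semantics may differ.
    public class PositionPrimitives {
        static boolean after(int x, int y)          { return x > y; }
        static boolean before(int x, int y)         { return x < y; }
        static boolean between(int x, int y, int z) { return y <= x && x <= z; }

        // e.g. a heuristic might require a candidate citation to start after
        // the end of the case name and before the end of the headnote.
        static boolean plausibleCitationStart(int start, int caseNameEnd, int headnoteEnd) {
            return after(start, caseNameEnd) && before(start, headnoteEnd);
        }
    }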

3) String Utilities

These are an assorted collection of primitives which provide many useful functions including flexible pattern matching. They also allow for dealing with HTML tables (which change the meaning of tags) and handle ASCII special characters.

5 ARCHITECTURE

Figure 5.1: The architecture and control flow of JUSTICE

As can be seen from figure 5.1, JUSTICE consists of several interrelated classes:

* The FileNavigate and NetNavigate classes manage file and network data.

* Control coordinates the control flow in accordance with the options chosen.

* The Utilities class contains many useful primitives which aid the representation of the domain knowledge.

* Viewer provides many primitives which allow pattern matching to be done at the abstraction level of a human viewer. It also provides flexible pattern matching and interprets HTML and text with regard to their effect on how the document is presented to a reader.

* Case holds references to sections of a judgment, including concept start and stop markers and the current position of parsing. Methods for moving around a judgment are also included.

* Individual legal and common sense experts are collected within the ConceptExperts class. They combine the primitives from the other classes to describe heuristics which define the concepts.

JUSTICE works on plaintext and HTML represented cases. It is entirely written in Java 1.1 and operates over a TCP/IP network or a file system. The user simply interacts with the GUI to select the desired options. JUSTICE currently allows searching and summarisation over concepts. Cases can be processed individually or in batch mode over files and directories. Statistical queries are available but are not as yet fully automated.

Each concept is described by a collection of rules. These rules were created during the knowledge engineering process and consist of collections of primitives permitted by the system. Rules within a collection are ranked according to three schemes: a) appropriateness given known data; b) heuristics relating to the most likely position; and c) a relative ranking scheme. These schemes provide resolution if more than one rule fires, and improve the accuracy of results.
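
A minimal sketch of this resolution step (our own illustration; the Rule interface and the single combined score simplify the three ranking schemes described above) is to score every rule that fires and keep the best-ranked match:

    import java.util.List;

    // Hypothetical resolution of several fired rules by rank.
    public class RuleResolver {
        interface Rule {
            String tryMatch(String caseText);  // extracted text, or null if the rule does not fire
            double rank();                     // combined ranking score
        }

        public static String resolve(List<Rule> rules, String caseText) {
            String best = null;
            double bestRank = Double.NEGATIVE_INFINITY;
            for (Rule r : rules) {
                String match = r.tryMatch(caseText);
                if (match != null && r.rank() > bestRank) {
                    best = match;
                    bestRank = r.rank();
                }
            }
            return best;   // highest-ranked rule that fired, or null if none fired
        }
    }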

6 RESULTS

6.1 Methodology of testing

Evaluating concept identification tools is a difficult task. Comparisons between tools are difficult because of differences in the structures of domains and the difficulty of comparing the different concepts identified. Further, testing the results of such tools is a manual and hence time-consuming process.

An obvious approach is to utilise the traditional measures of information retrieval, namely precision and recall. These measures must be slightly altered (Moens et al., 1997): precision is defined as the proportion of correct responses over the number of responses the tool returned, and recall as the proportion of correct responses over the number of responses a human expert would return. Precision measures the degree of accuracy and any errors in the returned concepts; recall measures the degree of completeness and any errors by reason of missing concepts.
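
Expressed as formulas (a direct restatement of the definitions above, in LaTeX notation):

    \text{Precision} = \frac{\text{correct responses returned}}{\text{responses returned by the tool}}
    \qquad
    \text{Recall} = \frac{\text{correct responses returned}}{\text{responses a human expert would return}}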

The precision and recall statistics were collected using a very strict measure of correctness. The summarisation feature of JUSTICE was used to output a listing of results over the test set of cases; one of the authors then manually identified the concepts within the same set of cases and compared the results. If a correct concept was identified by JUSTICE but extraneous data was also returned, e.g. a bracket, then the extraction was recorded as incorrect. An additional metric, useable, was included to better record the usefulness of extractions. The criterion for useable correctness was whether the extracted concept would be returned if the JUSTICE search feature, which uses substring matching, was used to search for the correct concept. That is, useable counts extractions with extraneous data as correct.
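
In effect (a hedged reading of the criterion, not code from JUSTICE), an extraction counts as useable if a substring search for the correct concept over the extracted text would still find it:

    // Hypothetical check for the useable metric: extraneous characters are
    // tolerated as long as the extraction still contains the correct concept.
    public class UseableCheck {
        public static boolean isUseable(String extracted, String correctConcept) {
            return extracted != null && extracted.contains(correctConcept);
        }
    }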

The usefulness of precision/recall metrics for concept extraction tools is worthy of separate research, but was incidental to the current work. As can be seen from the results, the precision and recall statistics were often the same. This occurred because most concepts appear in every case, and JUSTICE returns an answer for every case.

JUSTICE was assessed as a legal research tool, so many of the concepts which it identifies were not directly assessed. The assessed concepts, however, depend upon the accurate identification of the other concepts; e.g. most concepts depend on an accurate identification of the headnote start and stop concepts.

Testing was carried out over heterogeneous data from heterogeneous sources. JUSTICE was tested with randomly chosen and previously unseen data. Given the small test sets of the non-Australian and plaintext cases, the HTML Australian results are the best indicator of the performance of JUSTICE.

6.2 Australian results

The Australian data was taken from AustLII (http://www.austlii.edu.au) and SCALEplus (http://SCALEplus.law.gov.au/). The HTML test data consisted of 100 cases taken from all the major Australian jurisdictions available.

Within the results table, HS stands for the Heading Section.

Across concepts these results are: precision 96.3%, recall 96.1%, useable 98%.


                 Precision   Recall   Useable
HS                  100        100      100
Parties              87         87      100
JudgmentDate        100        100      100
Citation            100         98       98
Court                97         97       99
Division            100        100      100
Registry             98         98      100
Judge                99         99       99
Winner/Loser         86         86       86

Table 6.2.1: JUSTICE results on HTML Australian cases, expressed as percentages.

The plaintext data test set consisted of 20 randomly selected cases.


                 Precision   Recall   Useable
HS                  100        100      100
Parties              75         75      100
JudgmentDate         95         95       95
Citation             90         90       90
Court                85         85       85
Division            100        100      100
Registry            100        100      100
Judge                90         90       90
Winner/Loser         75         75       75

Table 6.2.2: JUSTICE results with plaintext Australian cases, expressed as percentages.

Across concepts these results are: precision 90%, recall 90%, useable 92.8%.

6.3 Non-Australian Results

JUSTICE was designed to work on Australian cases, but given the similarities between bodies of case law descended from British law, it was interesting to trial JUSTICE on such cases. Results on US and UK data, before any domain-specific adjustments, were limited to four concepts: the heading section, the parties, the court and the judges.

Twenty US cases were taken from FindLaw (http://www.findlaw.com).


           Precision   Recall   Useable
HS             75         75      100
Parties         0          0      100
Court          45         45       45
Judge          10         10       10

Table 6.3.1: JUSTICE results with US data, expressed as percentages.

Across concepts these results are: precision 32.5%, recall 32.5%, useable 63.8%.

Fifteen UK cases were taken from two sites (http://www.parliament.thestationeryoffice.co.uk/pa/ld/ldjudinf.htm and http://www.smithbernal.com/casebase_search_frame.htm).


           Precision   Recall   Useable
HS             93         93      100
Parties        13.3       13.3    100
Court           5          5       53.3
Judge           5          5        5

Table 6.3.2: JUSTICE results with UK data, expressed as percentages.

Across concepts these results are: precision 29.1%, recall 29.1%, useable 64.6%.

6.4 Discussion

Formal testing pointed to shortfalls within the heuristics which showed the need for further refinement of the knowledge base. Formulating heuristics is an iterative process, so it was not surprising that the heuristics were not fully optimised. After examining the causes of the errors, it is believed that all concepts except winner/loser could be identified at very high levels of accuracy.

Results from the plaintext data are lower than those from the HTML data. This is to be expected, as plaintext data carries much less information about the nature of its text.

JUSTICE performed relatively badly on non-Australian cases. This is not surprising given that no effort was made to customise the concept descriptions to those domains, and that concepts have quite different representations there, e.g. in UK House of Lords cases judges are called Lords. The results also show a weakness of a knowledge based approach, namely the need to customise the knowledge base for each different domain.

Testing highlighted the fact that more inter-conceptual checking could aid in the detection of errors. Further, using such checks would enable a single agent to work over different domains, whether different courts or different countries.

Many errors were obvious as errors to a person with domain knowledge. This is a pleasing result, as it means that JUSTICE is very reliable in practice: an answer which does not look obviously wrong is highly likely to be correct. JUSTICE coupled with an expert can therefore provide very accurate answers and avoid false positives.

6.5 Extensions

Future extensions will include the complete mapping of all concepts possible in the ontology to rules within JUSTICE. Some complex concepts can be found within the judgments of cases. Extensions of JUSTICE could incorporate statistical methods for extracting these concepts from the full text of cases.

7 CONCLUSIONS

JUSTICE is a useful legal research tool providing previously unavailable concept based searching, summarisation and statistical compilation over collections of legal cases. The implementation required the identification of an ontology for legal cases, which has been formalised. This is believed to be the first of its kind for Australian legal cases. The results of JUSTICE have extended previous research by increasing accuracy while also extracting concepts from heterogeneous domains.

Further, the identification of concepts within data has been shown to be the technique required to enable concept based searching, summarisation, automated statistical collection, and the conversion of informal semi-structured plaintext and HTML into formalised semi-structured representations.

As a prototype system, JUSTICE and LegalCase.dtd provide a sound basis on which to encourage and extend efforts to increase the richness of access to legal information. It is hoped that a settled legal ontology will become commonplace for legal cases, and that previous decisions not using such an ontology can be converted using JUSTICE. Until this situation arises JUSTICE can be used by researchers to enable concept based searching, summarisation and statistical information gathering from legal cases.

8 REFERENCES

Bench-Capon, T. and Visser, P. (1997) Ontologies in Legal Information Systems; The need for Explicit Specifications of Domain Conceptualisations, Proceedings of the Sixth International Conference on Artificial Intelligence and Law, (pp. 132-??).

Bing, J. (1987) Designing Text Retrieval Systems for Conceptual Searching, Proceedings of the First International Conference on Artificial Intelligence and Law, (pp. 43-??).

Bing, J. (1989) The Law of the Books and the Law of the Files - Possibilities and Probabilities of Legal Information Systems, In Vandenberghe G, Advanced Topics of Law and Information Technology (pp. 151-??), Kluwer, The Netherlands.

Bray, J. Beyond HTML: XML and automated web processing, http://developer.netscape.com/viewsource/bray_xml.html, copied Nov 1998.

Daniels, J. and Rissland, E. (1997) Finding Legally Relevant Passages in Case Opinions, Proceedings of the Sixth International Conference on Artificial Intelligence and Law, (pp. 39-??).

Dick, J. (1987) Conceptual Retrieval and Case Law, Proceedings of the First International Conference on Artificial Intelligence and Law, (pp. 106-??).

Greenleaf, G. (1997) The AustLII Papers - New Directions in Law via the Internet, The Journal of Information, Law and Technology (JILT) (2). <http://elj.warwick.ac.uk/jilt/leginfo/97_2gree/>

Hafner, C. (1987) Conceptual Organization of Case Law Knowledge Bases, Proceedings of the First International Conference on Artificial Intelligence and Law, (pp. 35-??).

Moens, M-F., Uyttendaele, C. and Dumortier, J. (1997) Abstracting of Legal Cases: The SALOMON Experience, Proceedings of the Sixth International Conference on Artificial Intelligence and Law, (pp. 114-??).

van Noortwijk, K. and De Mulder R. (1997) The Similarity of Text Documents, Journal of Information, Law and Technology (JILT) 2. http://elj.warwick.ac.uk/jilt/artifint/97_2noor/default.htm

Poulin, D., Huard, G. and Lavoie, A., (1997) The other formalisation of Law: SGML modelling and tagging, Proceedings of the Sixth International Conference on Artificial Intelligence and Law, (pp. 82-??). http://www.droit.umontreal.ca/doc/csc-scc/en/index/permission.html

Stefik, M. (1995) Introduction to Knowledge Systems, Morgan Kaufmann Publishers, San Francisco

Sterling, L. (1997) On Finding Needles in WWW Haystacks, Proceedings of the 10th Australian Joint Conference on Artificial Intelligence (Abdul Sattar, ed.), Springer-Verlag Lecture Notes in Artificial Intelligence, Vol. 1342, (pp. 25-36).

Wittgenstein, L. (1968) Philosophical Investigations, Blackwell, London.

Zeleznikow, J. & Hunter D. (1994) Building Intelligent Legal Information Systems. Kluwer Law and Taxation Publishers, Deventer, The Netherlands.


[1] Throughout this paper, the terms concept identification and concept extraction are used interchangeably.

[2] Sometimes a third part, an Endnote, occurs which may contain a summary order and a certification by the associate of the authenticity of paper judgments.

[3] This is calculated by averaging case and segment category, alleged offences and opinion of the court using the stricter legal evaluation result. Results for individual initial structuring of cases are not given, only an average across all identifications.

[4] JUSTICE took seventeen seconds to extract the concepts from one hundred cases (4.56 megabytes) on a file system.

[5] The existence of a clear distinction between facts and law is not accepted by all jurisprudential philosophy, but is a satisfactory distinction for practical purposes.

