Abstract: This article outlines the general approach to constructing a knowledge graph and discusses the characteristics and issues specific to constructing one for the legal industry.
Definition of Terms#
Ontology#
Definition: A conceptual model abstracted from the objective world that expresses the concepts, properties, and relationships recognized in a domain.
Function: In a knowledge graph, the ontology is the abstract representation that describes the graph's upper-level schema. It reflects common-sense or relatively stable knowledge and carries no intelligence value by itself. It is referred to as the "schema layer" of the knowledge graph.
Example: "Plaintiff" is a concept in the field of civil litigation, and the plaintiff also has some related concepts, such as "claim."
Instances#
Definition: Concrete things, properties, or relationships in the objective world that correspond to the ontology.
Function: Extracting and using instances is what gives the knowledge graph its intelligence value. A knowledge base is formed by instantiating the ontology, and the instances constitute the "data layer" of the knowledge graph.
Example: In a civil litigation case, "John Smith" is the plaintiff, so "John Smith" is an instance of the ontology concept "plaintiff."
Entity#
Definition: Simply put, an entity is the integration of ontology, instances, and relationships.
Example: "Plaintiff" is a concept in the ontology, which also specifies related properties such as "claim." "John Smith" is a specific plaintiff in a particular case and is therefore an instance. Taken together, John Smith, the ontology concept "plaintiff," and the related properties are referred to as an entity.
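The split between the schema layer and the data layer can be sketched in a few lines of Python. This is only an illustration: the `Concept` and `Instance` classes, their fields, and the claim text are assumptions made for this example and do not come from any particular knowledge graph library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A schema-layer (ontology) concept and the properties it declares."""
    name: str
    properties: tuple[str, ...] = ()


@dataclass
class Instance:
    """A data-layer individual that instantiates one concept."""
    name: str
    concept: Concept
    values: dict[str, str] = field(default_factory=dict)

    def undeclared_properties(self) -> list[str]:
        """Return property names the instance uses but its concept does not declare."""
        return sorted(set(self.values) - set(self.concept.properties))


# Schema layer: "plaintiff" is a concept that declares a "claim" property.
plaintiff = Concept(name="plaintiff", properties=("claim",))

# Data layer: "John Smith" is a plaintiff in one particular case.
john = Instance(
    name="John Smith",
    concept=plaintiff,
    values={"claim": "repayment of a loan"},  # hypothetical claim text
)
```

In this sketch, the "entity" described above corresponds to the `john` object viewed together with the concept and properties it points to.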
Current Status of the Knowledge Graph Industry#
The analysis of the current status of the knowledge graph industry is divided into two parts: the classification and typical applications of knowledge graphs, and the construction patterns of knowledge graphs. The purpose is to understand the current industry technical solutions, clarify the positioning of legal knowledge graphs, and explore possible construction paths.
Classification and Typical Applications of Knowledge Graphs#
Knowledge graphs can be divided into open knowledge graphs (also known as generic knowledge graphs) and vertical domain knowledge graphs (also known as industry knowledge graphs). Open knowledge graphs contain important concepts, entities, and their relationships in almost all domains, emphasizing the breadth of knowledge coverage. Domain knowledge graphs are knowledge bases built for one specific domain or a few specific domains, emphasizing the accuracy of knowledge.
The typical knowledge graphs collected below are described by graph name, entity count, relationship count, knowledge source, and similar fields. These are common dimensions for evaluating and analyzing knowledge graphs, and they also deserve attention when constructing a legal knowledge graph.
Open Knowledge Graphs (Generic Knowledge Graphs)#
- Open knowledge graphs, also known as generic knowledge graphs, include some typical cases in China.
- Some typical open knowledge graphs in foreign countries
Vertical Domain Knowledge Graphs (Industry Knowledge Graphs)#
- Vertical domain knowledge graphs, also known as industry knowledge graphs, include typical applications in China.
Comparison of the Two Types of Graphs#
Vertical domain knowledge graphs have all the characteristics of knowledge graphs in general and should absorb the technologies of open (generic) knowledge graphs to promote their own development.
However, because of their domain characteristics, they differ significantly from open (generic) knowledge graphs in knowledge characteristics, knowledge sources, application domains, and audiences. In construction methods in particular, there is currently no unified, mature process, and key technologies such as knowledge acquisition and knowledge fusion are still at an exploratory stage.
Construction Patterns of Knowledge Graphs#
It is generally believed that there are three construction methods for knowledge graphs: bottom-up, top-down, and a combination of both. The main difference between the first two methods is the order of "ontology construction" and "instance extraction." The specific analysis is as follows:
Bottom-Up Construction Mode#
Construction Method: Step 1: Instance extraction. First, extract instances, relationships, etc. from some unstructured and semi-structured data sources and add them to the knowledge base to form the data layer. Step 2: Ontology construction. Abstract concepts from the processed data layer to form the schema layer.
Application Scenario: Suitable for constructing knowledge graphs with large amounts of data, such as the encyclopedic graphs DBpedia and zhishi.me and the linguistic resources WordNet and Da Cilin. It is mainly used for semantic search, emphasizes the breadth of knowledge, and does not demand high accuracy.
Main Disadvantages: Difficult to construct a standardized ontology layer, low accuracy.
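The two bottom-up steps can be sketched as a toy Python example. It assumes a hypothetical input format of one "<name> is the <role>" statement per line and a simple regular expression as the extractor; real systems would use far more capable extraction, so this only illustrates the order of the steps.

```python
import re
from collections import defaultdict

# Hypothetical semi-structured input: one "<name> is the <role>" statement per line.
SENTENCES = [
    "John Smith is the plaintiff",
    "Acme Ltd is the defendant",
    "Mary Jones is the plaintiff",
]

PATTERN = re.compile(r"^(?P<name>.+?) is the (?P<role>\w+)$")


def extract_instances(sentences):
    """Step 1: pull (instance, role) pairs out of raw text to form the data layer."""
    pairs = []
    for sentence in sentences:
        match = PATTERN.match(sentence)
        if match:
            pairs.append((match.group("name"), match.group("role")))
    return pairs


def induce_schema(pairs):
    """Step 2: abstract concepts from the data layer by grouping instances per role."""
    schema = defaultdict(list)
    for name, role in pairs:
        schema[role].append(name)
    return dict(schema)


data_layer = extract_instances(SENTENCES)
schema_layer = induce_schema(data_layer)
```

Because the concepts are whatever labels happen to appear in the data, noisy input leads directly to a noisy schema, which matches the disadvantage noted above.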
Top-Down Construction Mode#
Construction Method: Step 1: Ontology construction. Start with the top-level concepts to build the top-level ontology, then refine concepts and relationships into a well-structured concept hierarchy tree. Data sources can be used to help extract the ontology, a process known as ontology learning. Step 2: Instance extraction. Fill the extracted instances and relationships into the constructed schema layer ontology to form the data layer of the knowledge graph.
Application Scenario: Suitable for specific domains, where the graph can support knowledge reasoning for functions such as auxiliary analysis and decision support. An example is the Traditional Chinese Medicine Case Knowledge Graph. Industry knowledge graphs demand professionalism and accuracy, which in turn requires a strict ontology layer schema.
Main Disadvantages: Strong dependence on manual work, and ontology updates are limited because they must be made by professional personnel. Generally applicable only to knowledge graphs with small amounts of data.
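A minimal Python sketch of the top-down order follows. The concept hierarchy, the concept names, and the candidate list are hypothetical and chosen only to show the schema being fixed first and instances being filled in afterwards.

```python
# Step 1: the schema layer is fixed up front as a concept hierarchy (child -> parent).
# The concept names are hypothetical examples.
HIERARCHY = {
    "litigant": None,
    "plaintiff": "litigant",
    "defendant": "litigant",
}


def ancestors(concept):
    """Walk up the hierarchy and return the chain of parent concepts."""
    chain = []
    parent = HIERARCHY[concept]
    while parent is not None:
        chain.append(parent)
        parent = HIERARCHY[parent]
    return chain


def fill_instances(candidates):
    """Step 2: keep only candidates whose concept exists in the schema layer."""
    accepted, rejected = [], []
    for name, concept in candidates:
        (accepted if concept in HIERARCHY else rejected).append((name, concept))
    return accepted, rejected


accepted, rejected = fill_instances([
    ("John Smith", "plaintiff"),
    ("Acme Ltd", "defendant"),
    ("Bob Lee", "bystander"),  # not in the schema, so it is rejected
])
```

The strict schema keeps the data layer clean, but anything the schema authors did not anticipate (here, "bystander") is dropped until someone updates the ontology by hand.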
Hybrid Mode#
Construction Method: Step 1: Initial instance extraction. Perform a preliminary extraction of instances from the data. Step 2: Ontology construction. Summarize the new knowledge and data found in the extraction results to assist ontology construction and iteration. Step 3: Instance extraction. Perform a new round of instance extraction based on the updated schema layer.
Application Scenario: For example, the Baidu Knowledge Graph is constructed with a hybrid method that draws on internal data, external data, and user data.
Main Issues: The basis for initial instance extraction is unclear and may require certain basic or pre-processing experience.
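The three hybrid steps can be sketched as a single iteration in Python. The seed schema, the candidate labels, and the `min_support` threshold are all assumptions introduced for this example; the source does not specify how the initial extraction is bootstrapped, which is exactly the issue noted above.

```python
from collections import Counter

# Hypothetical candidate (name, label) pairs produced by a preliminary extractor.
CANDIDATES = [
    ("John Smith", "plaintiff"),
    ("Acme Ltd", "defendant"),
    ("Jane Roe", "third party"),
    ("Sam Poe", "third party"),
    ("Bob Lee", "bystander"),
]


def extract(candidates, schema):
    """Keep candidates whose label is a known concept; return the rest as leftovers."""
    kept = [c for c in candidates if c[1] in schema]
    leftovers = [c for c in candidates if c[1] not in schema]
    return kept, leftovers


def revise_schema(schema, leftovers, min_support=2):
    """Promote leftover labels seen at least min_support times to new concepts."""
    counts = Counter(label for _, label in leftovers)
    return schema | {label for label, n in counts.items() if n >= min_support}


schema = {"plaintiff", "defendant"}            # seed schema (assumed)
kept, leftovers = extract(CANDIDATES, schema)  # Step 1: initial extraction
schema = revise_schema(schema, leftovers)      # Step 2: ontology iteration
kept, leftovers = extract(CANDIDATES, schema)  # Step 3: new extraction round
```

In a real project the promotion in Step 2 would more likely be reviewed by domain experts than decided by a frequency threshold alone.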
Summary
Each of the three construction methods has its own advantages and disadvantages. The top-down method better reflects the hierarchy between concepts, but it depends heavily on manual work and its ontology layer is slow to update, so it suits only knowledge graphs with small amounts of data. The bottom-up method updates quickly and supports knowledge graphs with large amounts of data, but it suffers from high knowledge noise and low accuracy. The hybrid method is flexible, but constructing its schema layer is difficult.
Construction Ideas for Legal Knowledge Graphs#
Legal knowledge graphs have strong domain characteristics. When constructing one, it is necessary to consider not only the feasibility of the technical path but also the needs and priorities of the industry.
Conflict Between Legal Thinking and General Big Data Thinking#
Conflict of Deductive Thinking
Law is a kind of social norm whose aims differ from those of the natural sciences. The enforcement of law is based on deductive reasoning within the framework of the syllogism. In contrast, the epistemology of general big data emphasizes empirical induction.
Conflict of Causal Thinking
Causality occupies an important position in legal thinking and legal methods. This is incompatible with the "correlation analysis" emphasized by empirical general big data thinking, which looks for statistical association rather than cause and effect.
Conflict of Reasoning Thinking
The judicial process is usually regarded as an important consensus-building mechanism. Any decision must be based on proof, reasoning, and deliberation. Therefore, legal thinking emphasizes interpretive reasoning. Currently, deep learning algorithms, especially neural network algorithms, which are more commonly used in general big data, are continuously questioned by legal professionals due to their lack of interpretability.
From "Data-Driven" to "Knowledge + Data-Driven"#
The industry experience in knowledge graph construction described above, together with the analysis of the thinking conflicts unique to the legal industry, suggests the following guidance on construction methods:
Based on legal theory, scientifically determine the ontology of the legal domain as a prerequisite for knowledge graph construction#
First, because a legal knowledge graph is a typical industry knowledge graph, its domain ontology must be set in advance to clarify the boundaries of mining and analysis. Second, to address concerns about interpretability, legal theory must be involved in constructing the ontology layer. For example, theories such as the "four elements" and the "three-tier hierarchy" for criminal offenses, and the "basis of claim" theory for civil cases, can be used to clarify the structure of the ontology and the relationships between its parts. In addition, the ontology of the legal domain needs to be tied to business scenario requirements, forming different sub-domain ontology sets for different legal materials.
Based on legal knowledge, finely divide and extract the data set as necessary preparation for instance extraction#
Legal data materials are complex and diverse, with various types and varying values. Poor input data will inevitably produce incorrect output. To avoid "garbage in, garbage out," it is necessary to classify and identify legal material data based on legal expertise. For example, when using judicial documents, factors such as the time nodes of legal changes, case types, and regions need to be considered.
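A minimal Python sketch of this kind of data set division is shown below. The `JudgmentDoc` fields, the document IDs, the region names, and the cut-off date are hypothetical; which time nodes, case types, and regions matter in practice is a question for legal experts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class JudgmentDoc:
    """Minimal metadata for one judicial document (all fields are illustrative)."""
    doc_id: str
    decided_on: date
    case_type: str
    region: str


def select_documents(docs, *, effective_from, case_type, regions):
    """Keep documents decided on or after a legal change, of one case type, in given regions."""
    return [
        d for d in docs
        if d.decided_on >= effective_from
        and d.case_type == case_type
        and d.region in regions
    ]


docs = [
    JudgmentDoc("A-001", date(2019, 5, 1), "civil", "Region X"),
    JudgmentDoc("A-002", date(2021, 3, 9), "civil", "Region X"),
    JudgmentDoc("A-003", date(2021, 6, 2), "criminal", "Region X"),
    JudgmentDoc("A-004", date(2022, 1, 15), "civil", "Region Y"),
]

selected = select_documents(
    docs,
    effective_from=date(2021, 1, 1),  # hypothetical date on which the relevant law changed
    case_type="civil",
    regions={"Region X"},
)
```

Filtering before extraction keeps documents decided under superseded rules, or from other case types and regions, out of the data layer.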
Based on legal knowledge, define instance extraction rules in detail to ensure data quality and accuracy#
Instance extraction rules also need to be defined with legal expertise. In judicial documents, for example, the same legal concept may appear several times in the text, and the positions taken on it may contradict one another. In criminal cases, the prosecution, the defendant, and the court may hold completely opposite opinions on whether the defendant voluntarily surrendered. According to the writing logic of judicial documents, the final determination appears in the judgment analysis section (starting with "The court believes..."). To ensure accuracy, the extraction of such elements should therefore be restricted to the specific paragraphs that carry the court's determination.
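The paragraph-restricted rule can be sketched in Python as follows. The judgment text is invented, the marker string is taken from the phrase quoted above, and the two regular expressions are simplistic assumptions; real documents would need more robust section detection and phrasing coverage.

```python
import re

# Marker taken from the text above; real judgments may phrase it differently.
COURT_MARKER = "The court believes"

# Hypothetical judgment text in which three parties take positions on surrender.
JUDGMENT = (
    "The prosecution argues that the defendant did not voluntarily surrender.\n"
    "The defendant argues that he voluntarily surrendered.\n"
    "The court believes that the defendant voluntarily surrendered "
    "and may be given a lighter punishment.\n"
)


def court_section(text):
    """Return the text from the court's analysis marker onward, or '' if it is absent."""
    index = text.find(COURT_MARKER)
    return text[index:] if index != -1 else ""


def finds_voluntary_surrender(text):
    """Read the surrender finding from the court's analysis section only.

    Returns True or False for a positive or negative finding, None if not addressed.
    """
    section = court_section(text)
    if re.search(r"did not voluntarily surrender", section):
        return False
    if re.search(r"voluntarily surrendered", section):
        return True
    return None


finding = finds_voluntary_surrender(JUDGMENT)
```

Running the same patterns over the whole document would hit the prosecution's contrary statement first, which is the contradiction the rule is meant to avoid.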
Legal theory should also serve as an explanatory framework for analyzing the results of knowledge graph applications#
In the process of interpreting the output results of the knowledge graph, the reasoning basis and process need to be explicitly displayed. For example, for the recommendation function of similar cases, it is necessary to clarify the basis for judging similar cases, such as simultaneously satisfying specific legal ontology elements and consistent relationships between ontologies. Through the explanatory framework determined during ontology construction, practical cases can be explained.
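As a rough illustration of an explainable similar-case check, the Python sketch below compares two cases on a list of required ontology elements and returns the basis for the result alongside the verdict. The element names, their values, and the rule that every required element must match are assumptions for this example, not a prescribed similarity criterion.

```python
def explain_similarity(case_a, case_b, required_elements):
    """Compare two cases on required ontology elements and report the basis.

    Each case is a dict mapping an ontology element to its extracted value.
    Returns (is_similar, matched, mismatched) so the reasoning can be shown.
    """
    matched, mismatched = [], []
    for element in required_elements:
        if element in case_a and case_a.get(element) == case_b.get(element):
            matched.append(element)
        else:
            mismatched.append(element)
    return not mismatched, matched, mismatched


# Hypothetical element names and values for two civil cases.
case_1 = {
    "cause_of_action": "private lending",
    "basis_of_claim": "loan contract",
    "written_contract": True,
}
case_2 = {
    "cause_of_action": "private lending",
    "basis_of_claim": "loan contract",
    "written_contract": False,
}

is_similar, matched, mismatched = explain_similarity(
    case_1,
    case_2,
    required_elements=["cause_of_action", "basis_of_claim", "written_contract"],
)
```

Returning the matched and mismatched elements, rather than a bare score, is what lets the result be explained in terms of the ontology.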