PERO2: Machine Teaching based on a Normalized Ontological Knowledge Base

Abstract: This work aims to extend the deployment of machine teaching systems in educational institutions by facilitating their appropriation by human users (teachers and students), and to explore interactive machine teaching in which the human remains an indispensable pillar of the teaching process. It is part of the PERO2 project, an intelligent system for learning reasoning and problem solving in physical science, specifically the teaching subdomain “Electricity”. The research integrates a semantic layer into the architecture of PERO2 by representing the system’s knowledge base as a normalized domain ontology and by using this ontological knowledge base instead of a relational database. To design the ontological knowledge base, we propose a hybrid construction method with two main phases: (*) the conception of our domain ontology, called “OntoPhyScEx”, and (**) the normalization, also called semantic validation, of this domain ontology. The integration and exploitation of the ontology are discussed in another paper.


I. INTRODUCTION
Learning of reasoning and problem solving remains a fundamental issue in the curricula of secondary-level students. Physical science is a privileged domain for learning reasoning: it allows the use of rules, laws, theorems and properties in different types of deduction to carry out the reasoning.
PERO [1] is a machine teaching system based on a self-explanatory resolution model: it solves exercises in the Electricity domain and generates an explanation for each step leading to the solution. Our contribution is to improve the capacity and efficiency of the first version of the system (PERO1) by developing a second version (PERO2). The improvement addresses the following two points:
(1) Provide a declarative knowledge base for the system. PERO1 stores its knowledge in a MySQL relational database. The drawback of a traditional database is its low availability, particularly when the volume of data is large. Moreover, because the data handled by the system are relational, they do not support semantic processing or reasoning; it is therefore vital to store and organize this declarative knowledge in an ontological knowledge base.
(2) Broaden the scope of PERO2 to cover several subdomains of physical science (thermodynamics, mechanics, etc.).
Thus, our overall proposal comprises two steps: first, the conception of a domain ontology of physical science that describes the relevant concepts and properties; second, the integration of this ontology within the PERO2 system. In this paper we focus on the conception and construction step; the integration step is discussed in [2].
The conception and construction step results in an ontology-based model that satisfies formal quality criteria related to the form of the categories of knowledge to which the elements of the ontology belong. First, it relies on the definition of characteristics of concepts called meta-properties [3], which require rigorous analysis to define concepts and their links in a formal, explicit and domain-independent manner in order to structure the hierarchy of the taxonomy. Second, it relies on the modularity criterion, which decomposes the ontology into a set of modules. These qualitative criteria result in an explicit ontological commitment that normalizes the meaning attributed to the concepts used and produces a reusable ontology.
To illustrate this model in practice, we choose the use case of an RLC electrical circuit (see Section III-A-5). The system is expected to explore the model in order to infer the relevant concepts.
The rest of this paper is organized as follows. Section II surveys related work on the construction of domain ontologies. Section III introduces our construction methodology, with a detailed description of the first phase (construction of the original taxonomy) and the second phase (semantic validation of our ontology). Section IV discusses the integration of our ontology "OntoPhyScEx" within the architecture of PERO2, and Section V concludes the work with future directions.

II. RELATED WORK
Several research studies have addressed the construction of ontologies, and many methodologies have emerged. We can divide them into two categories.
Methodologies based on software engineering consider ontologies as software components that fit into computer systems and bring them a semantic dimension: [4] in the METHONTOLOGY project, [5,6] under the DYNAMO project, [7] and [8] under OntoDB, [9] as part of the SISRO method, etc. The ontology development process comprises project management activities, development activities and integral activities. Project management ensures the smooth running of the process and includes planning, control of tasks, supervision and quality assurance. Development activities cover the construction of the ontology through its specification, conceptualization, formalization, implementation and maintenance. Integral activities support the development and include knowledge acquisition, integration, evaluation, documentation and configuration management.
Methodologies based on knowledge engineering fall into two large families. (a) Semi-automatic methods rely on software tools to build ontologies, such as ARCHON [10], TERMINAE [11], DAFOE [12,13,14] or ONTOLEARN [15]. Their drawback lies in semantic validation, a qualitative step that verifies the relevance of the ontology hierarchy: these methods perform the validation with dedicated software tools, without the systematic intervention of the specialists who use the modeled knowledge daily (there is no collaborative communication between the different actors involved in the conception of the ontology). (b) Manual construction methods, such as the "OntoClean" methodology proposed by [17] and the "OntoSpec" methodology proposed by [18], favor conceptual modeling of the domain. Their advantage is that they require collaborative work between the different actors of the domain and make it possible to return to previous steps when anomalies are found.
These methodologies represent knowledge in computer systems at three levels: (i) informal, which identifies the knowledge specific to the domain model from the analysis of textual corpora; (ii) semi-formal, which represents the knowledge of the previous level as a taxonomy or a semantic graph composed of concepts and relations between concepts; and (iii) formal, which transforms the taxonomy of the second level into a formal knowledge representation language.
Our contribution is an ontology construction method that represents the domain knowledge of Electricity and will be adapted and integrated within the PERO2 system [16]. We choose a hybrid method based on the methodology of [17], which requires manual modeling during the ontology construction phase.
The implementation of such an ontology requires two phases (Figure 1). The first is the construction of the original taxonomy, with the following steps: (1) specification of the domain knowledge based on textual corpus analysis, (2) conceptualization, (3) internal structuring of concepts, (4) definition of the extensional relations of concepts and (5) instantiation, with ontological refinement ensured throughout these steps. The second is the semantic validation phase, with (6) normalization of the semantic meaning lent to the concepts by using meta-properties [19], and implementation of this normalization in order to ensure modularity, and (7) formalization of the ontology in a formal language in order to guarantee the relevant criteria during the construction process.
The knowledge domain to be modeled is that of Electricity exercises on electrical circuits, which serve as our use case. The next section addresses the conception phases of the domain ontology "OntoPhyScEx".

III. OUR HYBRID CONSTRUCTION METHODOLOGY
Our domain ontology construction process follows two phases: the extraction of an initial taxonomy of concepts from a valid textual corpus through the common basic steps of ontology construction [20], and the validation of this taxonomy according to [21,22] and [23]. The result is a hybrid method that meets the requirements of a conceptual domain ontology.
A. The initial taxonomy extraction phase of our ontology concepts
This phase is devoted to the specification of the domain knowledge, data conceptualization, internal structuring of concepts, definition of the extensional relations and, finally, instantiation. These steps form the basis for constructing an initial taxonomy that is not yet formally or semantically validated. The domain we seek to model is the domain of Electricity. Along the way, we identify the ontology concepts related to the physical and mathematical knowledge (e.g. laws, theorems, etc.) involved in solving the exercises.

1) Specification of the knowledge domain
To formalize the goal of our ontology, we can set competency questions, which are tangible examples of questions that the ontology must answer:
-Which other disciplines are covered by our ontology?
-What is our ontology going to be used for?
-Is it able to help solve physical science exercises?
-Which kinds of exercises must our ontology help us solve?
-Who are the users of the ontology? On the one hand, learners who seek solutions to exercises matching their education level; on the other hand, domain experts (teachers) who propose exercises.

2) Conceptualization
The ontology should conform to the set of specifications defined beforehand through interviews with the domain actors (learners and teachers) and through analysis of the resources in the domain corpus (educational books, tutorial series and physical science websites). It is important to establish an exhaustive list of the knowledge extracted from the corpus and to clarify its conceptual nature (concepts, relations, properties, theorems, laws, rules, lemmas, constraints, etc.). In this list, we identified the most salient concepts at the top level of the domain (top ontology level = Mechanics, Electricity and Magnetism, Light, Vision, Sound, Hearing, Relativity, Astrophysics, Quantum, Nuclear Physics, Condensed Matter, Heat, Thermodynamics). These concepts are then specialized following a top-down approach [24]: we start with the definition of the most general or most important domain concepts and continue with their specialization. Figure 2 shows, on the left, a brief example of the extraction of domain terms of physical science and, on the right, an initial hierarchy of these concepts.

3) Internal structuring of Concepts
A concept is equipped with a referential semantics imposed by its extension (the set of objects covered by the concept) and a differential semantics imposed by its intension (expressed in terms of properties, attributes, rules and constraints) [25]. These properties are of two types: (1) Data Properties connect individuals (instances) of concepts to data values (e.g. string, number, Boolean, enumeration) and (2) Object Properties link instances of concepts to other individuals (e.g. composition, aggregation, association, etc.).
Attributes can have several facets describing the value type, the allowed values, the number of values (cardinality) and other characteristics of the values they may take. Minimum and maximum cardinalities specify more precisely how many values an attribute may have. An attribute whose value is an instance of a concept is often called an extended attribute.
Figure 3 shows an example of the internal structure of the "Resistor" object. In most cases, a resistor carries colored rings (bands), and each color corresponds to a digit. The mapping between digits and band colors is called the Resistor Color Code; it is used to determine the value and the type ('4-band', '5-band' or '6-band') of a "Resistor".
Other properties can be specified, such as 'Tolerance', 'Temperature-Coefficient' and the physical size 'Shape'.
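As an illustration of how the data properties of "Resistor" determine its value, the following minimal Python sketch decodes the standard resistor color code. The class layout, the property names and the restriction to 4-band and 5-band resistors are assumptions made for this example; they are not taken from OntoPhyScEx.

```python
from dataclasses import dataclass

# Standard digit values of the resistor color code.
DIGITS = {
    "black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
    "green": 5, "blue": 6, "violet": 7, "grey": 8, "white": 9,
}
# Multiplier band: digit colors give 10**digit; gold and silver give fractions.
MULTIPLIERS = {**{c: 10.0 ** d for c, d in DIGITS.items()}, "gold": 0.1, "silver": 0.01}
# Common tolerance bands, in percent.
TOLERANCES = {"brown": 1.0, "red": 2.0, "gold": 5.0, "silver": 10.0}


@dataclass(frozen=True)
class Resistor:
    """Hypothetical internal structure of the "Resistor" concept."""
    bands: tuple  # data property with cardinality 4..5 in this sketch

    def __post_init__(self):
        if len(self.bands) not in (4, 5):
            raise ValueError("this sketch handles 4-band and 5-band resistors only")

    @property
    def band_type(self) -> str:
        return f"{len(self.bands)}-band"

    @property
    def resistance_ohms(self) -> float:
        # Leading bands are digits, then one multiplier band, then the tolerance band.
        *digits, multiplier, _tolerance = self.bands
        value = 0
        for color in digits:
            value = value * 10 + DIGITS[color]
        return value * MULTIPLIERS[multiplier]

    @property
    def tolerance_percent(self) -> float:
        return TOLERANCES[self.bands[-1]]


r = Resistor(("yellow", "black", "black", "gold"))
print(r.band_type, r.resistance_ohms, r.tolerance_percent)
```

The band tuple plays the role of a data property with a minimum and maximum cardinality, while the resistance and tolerance are derived from it rather than stored.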

4) Define the extensional relations of concepts
A concept has a referential semantics imposed by its extension [10], which connects it by reference to other domain concepts using set-theory operations (union, intersection, complement, ...), laws concerning relations (symmetry, reflexivity, transitivity, ...) and the laws of logical axioms. The most important relation, which carries the semantic commitment of subsumption, is the binary relation 'is-a': a concept is subsumed by another if and only if its extension is included in that of its parent. A concept c1 subsumes a concept c2 if c2 is more specific than c1; every instance of c2 is then an instance of c1, whereas only some of the instances of c1 are instances of c2.
Figure 4 shows the Electrical Circuit use case: the parent concept "Electric Circuit" has two child concepts, "Direct Electric Circuit" and "Alternative Electric Circuit", and each of them has its own sub-concepts. The concepts are structured hierarchically within a network and are linked by conceptual properties of type 'is-a' and by semantic relations. To simplify the example, the sign '^' indicates that the concept is subsumed by other concepts.
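The relation between subsumption and inclusion of extensions can be sketched in a few lines of Python. The placement of "RLC AC Electric Circuit" under "Alternative Electric Circuit" and the instance names are assumptions made for this example, since Figure 4 is not reproduced here.

```python
# child -> parent 'is-a' links; the third link is an assumption for illustration.
IS_A = {
    "Direct Electric Circuit": "Electric Circuit",
    "Alternative Electric Circuit": "Electric Circuit",
    "RLC AC Electric Circuit": "Alternative Electric Circuit",
}

# Hypothetical instances asserted directly on a concept.
DIRECT_INSTANCES = {
    "Direct Electric Circuit": {"dc_circuit_1"},
    "RLC AC Electric Circuit": {"rlc_circuit_1"},
}


def subsumes(c1: str, c2: str) -> bool:
    """True if c1 subsumes c2, i.e. c2 reaches c1 by following 'is-a' links.

    The relation is treated as reflexive: every concept subsumes itself.
    """
    current = c2
    while current is not None:
        if current == c1:
            return True
        current = IS_A.get(current)
    return False


def extension(concept: str) -> set:
    """All instances of a concept: its own plus those of every concept it subsumes."""
    concepts = set(IS_A) | set(IS_A.values())
    return {
        inst
        for c in concepts
        if subsumes(concept, c)
        for inst in DIRECT_INSTANCES.get(c, set())
    }


print(sorted(extension("Electric Circuit")))
print(extension("Alternative Electric Circuit") <= extension("Electric Circuit"))
```

Every instance of a child concept appears in the extension of its parent, while the parent may hold instances that the child does not, which is the asymmetry stated in the definition of 'is-a'.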

5) Instantiation
Instances, also known as individuals, are the basic units of an ontology; they are the things that the ontology describes or could describe. Instances can model concrete objects. They therefore constitute a formal part of the ontology and are a way to describe the entities of interest. Figure 5 shows the conception of our use case, an RLC electrical circuit in alternating current, in our domain ontology:
-Concept "RLC AC Electrical Circuit" has instance "RLC AC Electrical Circuit_1" and an extended Data Property (AC-Voltage = instance-of (Class AC-Voltage)).
-Concept "Resistance/Impedance" has instance "R = 40Ω", where R is the value of the Data Property (Has-Symbol: String) of concept "Resistance".

6) Ontological progressive refinement
This process integrates the steps of construction of the original taxonomy in order to improve it, taking into account the domain being modeled, the applications and uses of the ontology, and the quality of the ontology itself.
Applying the five steps outlined above produces a taxonomy in the form of a conceptual graph corresponding to a sub-model of the ontology. The nodes of this graph are the concepts of Electricity and the arcs are the relations between concepts. Figure 6 illustrates part of our ontology. We assume at this stage that all relations between concepts are 'is-a' relations, and we seek to correct these relations in the validation phase discussed in the next section.
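The instantiation step of the RLC use case can be sketched as plain Python objects. The property names "Has-Value-Ohm" and "Has-Resistance" and the instance name "AC-Voltage_1" are hypothetical; only "Has-Symbol", "AC-Voltage" and the two concept instances come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """An instance of a concept, with its property values."""
    name: str
    concept: str
    data_properties: dict = field(default_factory=dict)    # property -> literal value
    object_properties: dict = field(default_factory=dict)  # property -> Individual


# Instance of "Resistance": the symbol comes from the text, the numeric
# property "Has-Value-Ohm" is a hypothetical addition.
resistance = Individual(
    name="R = 40Ω",
    concept="Resistance",
    data_properties={"Has-Symbol": "R", "Has-Value-Ohm": 40.0},
)

# The extended attribute AC-Voltage points to an instance of class AC-Voltage.
voltage = Individual(name="AC-Voltage_1", concept="AC-Voltage")

circuit = Individual(
    name="RLC AC Electrical Circuit_1",
    concept="RLC AC Electrical Circuit",
    object_properties={"AC-Voltage": voltage, "Has-Resistance": resistance},
)


def instances_of(concept: str, individuals: list) -> list:
    """Names of the individuals asserted as instances of the given concept."""
    return [i.name for i in individuals if i.concept == concept]


print(instances_of("RLC AC Electrical Circuit", [resistance, voltage, circuit]))
```

Literal values stay in the data properties, whereas an extended attribute is stored as a link to another individual.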
The teaching domain ontology should conform to the set of specifications defined beforehand through interviews with the actors of the domain (learners and secondary school teachers) and through analysis of the data resources in the corpus of the application domain (books, tutorials and web resources on secondary school physical science).

B. Semantic validation phase (Normalization)
Normalization of ontologies originates in the normalization of information systems for databases. In the relational case, a database is often developed from a conceptual model represented in a modeling language (such as Merise or UML) and expressed in terms of the classes, properties, relations, instances and constraints of a problem domain. An analogous normalization has been discussed in [17, 23, 26, 27]. Its goal is to add constraints to the construction of the ontology so that the source ontology meets the five criteria proposed by [22]: (1) accuracy of the domain, (2) reuse, (3) modularity, (4) maintainability and (5) scalability. The semantic validation proceeds in two phases:

1) Normalization of semantic meaning lent to concepts
This normalization depends on the human interpretation of the meanings of the terms to be represented in the ontology: the same concept may be given different meanings by different domain experts. The aim is to make explicit the meaning given to a concept by associating with it an interpretation that is independent of its context of use. To ensure this normalization, we choose the "OntoClean" methodology proposed by [26], based on the definition of meta-properties of concepts (identity, rigidity, unity, dependency), in order to structure the hierarchy and test its coherence by imposing constraints on the use of these meta-properties:

Identity : a concept carries an identity property if there exists an Identity Condition (IC) for this concept that allows one to decide whether two instances of the concept are identical. For example, the concept "student" carries an identity property linked to the "number" of the student, two students being identical if they have the same number.
Rigidity : a concept is rigid if it is essential to all of its instances, i.e. an instance cannot stop being an instance of the concept without ceasing to exist. For example, the concept "Person" is rigid, but "Student" is not.

Dependency : a concept C1 is dependent on a concept C2 if, for any instance of C1, there exists an instance of C2 that is neither a part nor a constituent of the instance of C1. For example, "parent" is a concept dependent on "child" (and vice versa), because the existence of a parent implies that of a child.
a) The adjustment of the meaning lent to the concepts of our ontology takes place as follows.
Assigning meta-properties to concepts: Figure 7 shows the assignment of these meta-properties to the initial taxonomy of Figure 6. Beside each concept we write the meta-property notations as bold letters preceded by the sign "+", "-" or "~", meaning respectively that the concept carries the meta-property, does not carry it, or carries the anti meta-property [27]. These notations are assigned by simple and natural intensional reasoning. The assignment of meta-properties discussed by [17] requires combining these meta-properties (which are not independent). The combination produces eight types of properties that help to structure the taxonomy hierarchy and that are classified into:
Sortals: "Types", "Quasi-Types", "Mixins", "Material Roles", "Phased Sortals".
Non-Sortals: "Attributions", "Formal Roles", "Categories".
(Table 1) below details the choice of the meta-properties attributed to each concept of our domain ontology.
b) Checking the constraints imposed by [17] on the hierarchy of the taxonomy through the assigned meta-properties helps the designer to detect modeling inconsistencies in the hierarchy. The verification of these constraints on the taxonomy of Figure 7 reveals the following anomalies:
Violation of Unity constraint (~U can't subsume +U) :
The concepts "Resistor", "Inductor" and "Capacitor", which have the Unity Property +U, are subsumed by "Electric Circuit", which carries the Unity Property ~U. Likewise, the "Electric Circuit" concept, with Unity Property ~U, subsumes the concepts "Electric Field" and "Magnetic Field", which have the Unity Property +U. This means that a specific relation of constitution has been confused with the subsumption relation 'is-a'. In other words, a "Resistor" is not a kind of "Electric Circuit"; rather, an "Electric Circuit" consists of "Resistor" components.
The concepts "DC Electric Circuit" and "AC Electric Circuit" cannot subsume "Ohm's Law" and "Kirchhoff's Law" respectively, because these links violate the Unity constraint (~U can't subsume +U). A law is a generalized description of a model applied to use cases, so it is appropriate to use a more specific relation than a simple 'is-a' relation.

Violation of Dependency constraint (+D can't subsume -D) :
The concept "Electricity", which carries the Dependency Property +D, subsumes "Amount of Electric Charge", which carries -D. This means that the subsumption relation has been confused with a specific relation of constitution. In other words, "Amount of Electric Charge" is not a kind of "Electricity" but a component of "Electricity". Likewise, "Electricity" cannot subsume "Electric Circuit", "Capacitor", "Inductor" and "Resistor" because +D can't subsume -D; a more specific relation is more appropriate, as explained for "Amount of Electric Charge". The same explanation applies to "Magnetism" and its relation to "Magnet".
Violation of Identity constraint (+I can't subsume -I) : The link associating "AC Electric Circuit" or "RLC AC Electric Circuit" with the "Impedance" property was removed because of incompatible ICs (+I can't subsume -I); the same applies to the link between "Ideal DC Electric Circuit" and "Resistance".
Violation of constraint (Properties with incompatible ICs) : "Ohm's Law" and "Kirchhoff's Law" are disjoint, as are "Electricity" and "Magnetism", and they have inconsistent identity conditions. It is therefore better to add a new concept "Electric & Magnetic Laws", assigned the combination (+O+U-D+R), which provides its own identity condition (IC) and subsumes "Ohm's Law" and "Kirchhoff's Law". Finally, the concepts "Capacitor", "Resistor" and "Inductor" (+I-O+R-D+U) should be subsumed by an "Electric Components" concept, assigned the combination (+O+R-D+U) and providing the required identity condition, instead of being directly subsumed by "Electricity".
Figure 8 shows the taxonomy of Figure 7 after correction under the constraints of the "OntoClean" methodology. This taxonomy identifies a set of disjoint and structured (rigid) concepts called primitive concepts, and defined (non-rigid) concepts, which represent the backbone taxonomy.
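The tag-based checks discussed in this section lend themselves to a mechanical test. The sketch below is a minimal illustration in Python: it encodes only the three tag constraints named in the text and only the meta-property tags that the text states explicitly; the data layout and function name are assumptions, and the check on incompatible identity conditions is not covered.

```python
# Known meta-property tags, taken from the anomalies discussed in the text.
# I = identity, U = unity, D = dependency; tags not stated in the text are omitted.
TAGS = {
    "Electric Circuit": {"U": "~", "D": "-"},
    "Resistor": {"I": "+", "U": "+", "D": "-"},
    "Electricity": {"D": "+"},
    "Amount of Electric Charge": {"D": "-"},
}

# (child, parent) 'is-a' links of a fragment of the initial taxonomy.
IS_A = [
    ("Resistor", "Electric Circuit"),
    ("Electric Circuit", "Electricity"),
    ("Amount of Electric Charge", "Electricity"),
]

# (meta-property, parent tag, child tag): the parent tag can't subsume the child tag.
FORBIDDEN = [("U", "~", "+"), ("D", "+", "-"), ("I", "+", "-")]


def find_violations(is_a, tags):
    """Return (parent, child, rule) for each 'is-a' link that breaks a constraint."""
    found = []
    for child, parent in is_a:
        for meta, parent_tag, child_tag in FORBIDDEN:
            if (tags.get(parent, {}).get(meta) == parent_tag
                    and tags.get(child, {}).get(meta) == child_tag):
                rule = f"{parent_tag}{meta} can't subsume {child_tag}{meta}"
                found.append((parent, child, rule))
    return found


for parent, child, rule in find_violations(IS_A, TAGS):
    print(f"{parent} -> {child}: {rule}")
```

Each reported link is a candidate for replacement by a more specific relation (for example constitution), which remains a decision for the designer rather than for the script.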

2) Implementation
The previous normalization step provides an explicit analysis of the ontology that is independent of any implementation tool. [23] has proposed a methodology for implementing this normalization in a formal language. Its purpose is to achieve modularity, i.e. to decompose the ontology taxonomy into a set of homogeneous, disjoint hierarchies (modules). This hierarchical decomposition must meet the following four normalization criteria:
-The branches of each hierarchy should form homogeneous disjoint trees, i.e. no domain concept should have more than one primitive parent.
-Each branch of the hierarchy of primitive concepts in the domain taxonomy must be uniform and logical, namely the principle of specialization should be subsumption and should be based on identical or progressively narrower criteria throughout.
-The hierarchy of primitive concepts should clearly distinguish "Self-Standing" concepts, which cover all types of concepts that represent the physical and conceptual world (for example ideas, processes, human beings, organizations, etc.), from "Refining" concepts, which represent types of values, whether quantitative or qualitative (e.g. "small, medium, large, mild, moderate, severe, etc.").
-The axioms and the 'range' and 'domain' constraints should never imply that any primitive domain concept is subsumed by more than one other primitive domain concept.
The consequence of such a decomposition is to support the evolution and update of the ontology when requirements change (e.g. the context of use changes or the domain knowledge is expanded). Such changes should lead to updates in only a small number of modules.
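The first criterion above (at most one primitive parent per concept) and the resulting decomposition into modules can be sketched as follows. The function names and the dictionary layout are assumptions made for this example; the concept names are those used in the correction of the taxonomy.

```python
from collections import defaultdict


def concepts_with_several_parents(parents: dict) -> dict:
    """Concepts that violate the 'at most one primitive parent' criterion."""
    return {c: sorted(ps) for c, ps in parents.items() if len(ps) > 1}


def modules(parents: dict) -> dict:
    """Group child concepts by the root of their tree.

    Assumes every concept has exactly one parent and that there are no cycles.
    """
    def root(concept):
        while parents.get(concept):
            (concept,) = parents[concept]  # unpack the single parent
        return concept

    grouped = defaultdict(list)
    for concept in parents:
        grouped[root(concept)].append(concept)
    return {r: sorted(cs) for r, cs in grouped.items()}


# Before correction, "Resistor" is subsumed by two primitive concepts.
before = {"Resistor": {"Electric Circuit", "Electricity"}}

# After correction, each concept has a single primitive parent.
after = {
    "Resistor": {"Electric Components"},
    "Capacitor": {"Electric Components"},
    "Inductor": {"Electric Components"},
    "Ohm's Law": {"Electric & Magnetic Laws"},
    "Kirchhoff's Law": {"Electric & Magnetic Laws"},
}

print(concepts_with_several_parents(before))
print(modules(after))
```

Once the criterion holds, each root defines one module, so a change to the laws, for instance, touches only the "Electric & Magnetic Laws" tree.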
Figure 9 shows the implementation of the normalization, which decomposes our ontology into two hierarchies: a hierarchy of Self-Standing concepts, which correspond in the "OntoClean" methodology to "Types, Quasi-Types", "Categories" and some of the concepts used to build the representation of the types "Formal Roles" and "Material Roles"; and a hierarchy of Refining concepts, which match the types "Attributions" and "Mixins".
Figure 9. Result of implementation of Normalization phase according to [23,26].
The taxonomy skeletons of Self-Standing and Refining concepts are linked by definitions (indicated by "⊂, ≣") and restrictions (indicated by "→"). (Table 2) below shows an example:

3) Formalization
After normalizing our ontology, we end this work by transcribing the concepts into a formal and operational knowledge representation language. The resulting ontology is operational in the sense that it can support reasoning mechanisms. The format we consider for the implementation of this method is OWL 2. This language offers designers a vocabulary consisting of a set of symbols, variables, functions, constructors and predicates to define the concepts of a domain in terms of classes, properties and relations.
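To give an idea of what the formalized taxonomy may look like, the following Python sketch serializes a few 'is-a' links as axioms in OWL 2 functional-style syntax. The ontology IRI, the class identifiers without spaces and the choice to declare the three components as disjoint are assumptions made for this example, not the actual OntoPhyScEx file.

```python
def to_owl_functional(iri: str, subclass_of: dict, disjoint_groups: list) -> str:
    """Serialize a small taxonomy as an OWL 2 functional-style syntax document."""
    lines = [f"Prefix(:=<{iri}#>)", f"Ontology(<{iri}>"]
    classes = sorted(set(subclass_of) | set(subclass_of.values()))
    # One declaration per class, then the 'is-a' links, then disjointness axioms.
    lines += [f"  Declaration(Class(:{c}))" for c in classes]
    lines += [f"  SubClassOf(:{c} :{p})" for c, p in sorted(subclass_of.items())]
    lines += [
        f"  DisjointClasses({' '.join(':' + c for c in group)})"
        for group in disjoint_groups
    ]
    lines.append(")")
    return "\n".join(lines)


# Hypothetical IRI; the class names mirror the corrected taxonomy.
document = to_owl_functional(
    "http://example.org/OntoPhyScEx",
    {
        "Resistor": "ElectricComponents",
        "Capacitor": "ElectricComponents",
        "Inductor": "ElectricComponents",
    },
    [["Capacitor", "Inductor", "Resistor"]],
)
print(document)
```

Generating the axioms from the normalized taxonomy keeps the formal file aligned with the validated hierarchy; in practice an ontology editor or a dedicated library would normally produce this serialization.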

IV. INTEGRATION OF "OntoPhyScEx" WITHIN PERO2
The architecture of PERO2, an intelligent system for reasoning and solving mathematical problems applied to physical science, is based on an AMASystem (Adaptive Multi-Agent System) incorporating our domain ontology "OntoPhyScEx". It consists of two layers: the Adaptive Multi-Agent layer and the Ontology Integration layer.
The Adaptive Multi-Agent System layer : It allows intelligent agents to interact, share formative educational resources and cooperate in order to dynamically produce solutions to the various exercises of physical science. Each agent has its own behavior and communicates with its environment by sending messages. This layer comprises four agents:
-Planner Agent: plans and organizes the solutions for the learner.
-Explainer Agent: generates the explanation corresponding to each state of the solution plans.
-Indexer Agent: adds indexing to the solution of the exercise.
-Mediator Agent: an intermediate agent between the learner (the user) and the other agents of the system; it receives requests from users and interacts with the other agents to supply a response.
The Ontology Integration layer : To increase the intelligence of our system and explore its potential, we added a semantic layer, which consists of a formal representation of the declarative knowledge based on the ontology, coupled with an inference mechanism to analyze and reason on the ontology. This ontology enables direct interaction when querying the system, instead of the delayed mode of a database. It also supports communication between agents, following the inference rules defined in the ontology. For more details, please refer to [2].
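The message flow between the four agents can be pictured with the following minimal Python sketch. All class names, method names and message contents are hypothetical placeholders: the sketch only illustrates the mediator pattern described above and does not reproduce the actual PERO2 implementation, its adaptive behavior or its ontology queries.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    receiver: str
    content: object


class PlannerAgent:
    name = "planner"

    def handle(self, msg: Message) -> Message:
        # Placeholder plan: a real planner would derive the states from the ontology.
        plan = [f"state {i} of '{msg.content}'" for i in (1, 2)]
        return Message(self.name, msg.sender, plan)


class ExplainerAgent:
    name = "explainer"

    def handle(self, msg: Message) -> Message:
        # One explanation per state of the solution plan.
        return Message(self.name, msg.sender, [f"explanation of {s}" for s in msg.content])


class IndexerAgent:
    name = "indexer"

    def handle(self, msg: Message) -> Message:
        # Index each (state, explanation) pair by its position in the solution.
        return Message(self.name, msg.sender, dict(enumerate(msg.content, start=1)))


class MediatorAgent:
    name = "mediator"

    def __init__(self, planner, explainer, indexer):
        self.planner, self.explainer, self.indexer = planner, explainer, indexer

    def request(self, exercise: str) -> dict:
        """Relay a learner request through the other agents and return the response."""
        plan = self.planner.handle(Message(self.name, "planner", exercise)).content
        notes = self.explainer.handle(Message(self.name, "explainer", plan)).content
        pairs = list(zip(plan, notes))
        return self.indexer.handle(Message(self.name, "indexer", pairs)).content


mediator = MediatorAgent(PlannerAgent(), ExplainerAgent(), IndexerAgent())
answer = mediator.request("RLC exercise")
print(answer)
```

Routing every exchange through the mediator keeps the learner-facing interface in one place, so the other agents can change without affecting how requests are submitted.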

V. CONCLUSION AND FUTURE SCOPE
The outcome of the current research is an ontological model that semantically represents the system's knowledge of the Electricity domain, starting from the various pedagogic and practical concepts involved in the exercises and their solutions. The benefit is not limited to a semantic representation of knowledge: integrating the ontological model also makes it possible to exploit its full potential through analysis and reasoning carried out by the system's algorithms.
This work also opens future directions. On the one hand, we intend to extend the representation to other subdomains of physical science. We consider each subdomain as an instance of a defined ontology-based meta-model, while ensuring the modularity criterion across the several ontology model instances.
On the other hand, we will ensure the integration via a semantic layer based on the ontology models and on machine learning algorithms dedicated to analyzing and reasoning on data, in order to support the decision making of the PERO2 system and thus supply learners with appropriate explanations of the solutions to exercises.