Vocabulary standardization and semantic interoperability in education

Semantic content is an essential constituent of models of innovative teaching methods. It is built gradually on the normative efforts of various actors in the technology industry, telecommunications, information technology, language engineering, information science and documentation, etc. Semantic networking is in fact an advanced step in a long process of digital information analysis, from computer coding to semantic tagging and data retrieval. This process is now consolidated by a wide range of standards that aim to provide high levels of technical, organizational and semantic interoperability.

The e-Learning community has already initiated a long process of standardization with the aim of achieving optimum levels of adaptation, integration and convergence of its assets and educational resources. Its objective is to establish a general operational framework for interoperability (actors, tools, services and content) and to set up worldwide resource-sharing mechanisms. These standardization efforts affect many facets of educational technologies. Semantics is one of the facets on which several international organizations and standardization bodies are working. Normative progress on the interoperability of e-Learning systems has already been achieved by regional and international bodies such as the W3C, ISO and CEN. Education is in the process of developing its own references, which will produce a form of global ICT-based educational governance.

Among the most important normative references in use or in development are those developed and adopted by standardization bodies such as ISO/TC37 for the standardization of vocabularies, or ISO/IEC JTC1 SC36 for the standardization of e-Learning and of ITLET (Information Technology for Learning, Education and Training) terminology. These bodies continue to work on improving semantically interoperable standards for education.

Vocabulary as main entry for semantics

Standardized terminology is indeed one of the foundations of what is being developed in semantic networks. In the context of e-Learning, the standardized terminology of ITLET is essential and is an unavoidable choice for developers of metadata schemas and application profiles. A controlled (standardized) vocabulary solves the problems caused by the diversity of sources used to represent the data values of specific metadata elements. Apart from predefined lists of possible values for metadata elements in an application profile, some elements are increasingly associated with terminology records and registries from which they take their values. Vocabulary is also important for content developers. They are increasingly involved in the SEO (Search Engine Optimization) of their scientific or educational productions, using appropriate terminology to match common conceptual and semantic values. To cope with the exponential opening of networks and information systems, and to face the risk of dispersion of resources, the standardization of terminology acts as a unifying agent for separate initiatives and models. The e-Learning community has been working for some time to develop its own vocabulary tools in order to build specialized ontologies to be used in the construction of a worldwide semantic e-Learning network.
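As a minimal illustration of how a controlled vocabulary constrains the values of a metadata element, the following Python sketch checks a metadata record against a predefined list of allowed values. The element name educational_level, the allowed values and the function validate_record are hypothetical; they are chosen for illustration and are not taken from any ITLET standard or application profile.

```python
# Hypothetical controlled vocabulary: one metadata element with its allowed values.
CONTROLLED_VOCABULARIES = {
    "educational_level": {"primary", "secondary", "higher", "vocational"},
}


def validate_record(record, vocabularies):
    """Return the (element, value) pairs whose value is outside the controlled vocabulary.

    Elements that have no controlled vocabulary (free-text elements) are not checked.
    """
    errors = []
    for element, value in record.items():
        allowed = vocabularies.get(element)
        if allowed is not None and value not in allowed:
            errors.append((element, value))
    return errors


# A record using a controlled value passes; a locally coined value is reported.
valid_record = {"title": "Introduction to algebra", "educational_level": "secondary"}
invalid_record = {"title": "Introduction to algebra", "educational_level": "high school"}

print(validate_record(valid_record, CONTROLLED_VOCABULARIES))
print(validate_record(invalid_record, CONTROLLED_VOCABULARIES))
```

In a real application profile the allowed values would typically be read from a terminology registry rather than hard-coded as they are here.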

However, before moving towards a more prescriptive analysis of the semantics and linguistic diversity of e-Learning, our question here concerns the broad epistemological outline of terminology.

Terminology between vocabulary and concepts

Before going into the details of the terminology field, it is useful to define terminology and to characterize the currents of thought that govern its development. Unless one is an expert in linguistics, one often tends to confuse terminology with vocabulary or lexicon. One also tends to confuse the contexts of use of expressions such as "Word", "Term", "Concept" and "Notion", though the difference between them is fundamental.

According to the Oxford dictionary, a vocabulary is "The body of words used in a particular language". The Cambridge dictionary online defines vocabulary as "all the words known and used by a particular person". In general, a "word" is a sequence of sounds or graphic characters forming a semantic unit that can be distinguished by a separator (a typographic white space in writing or a pause in speech). A vocabulary is a list of "vocables" (the term is not common) or "words" selected for immediate information or communication needs. A vocabulary may, however, be defined and controlled by a community for practical needs without the meanings of its words, or the relationships between them, being defined. Once it is organized and structured in a hierarchy, a vocabulary becomes specialized and turns into a taxonomy with clear links between "words".

If we start from the definition that terminology is the study of both specialized vocabularies and vocabularies that also occur in everyday language, a first form of relationship between "terminology" and "vocabulary" can be pictured: terminology is a science, and vocabulary is one of its forms of application.

In terminology, a term is the inseparable combination of a denomination and an idea (scientifically called a concept) that represents the meaning. A term is therefore a word in a language that refers to an underlying concept. Terms and their definitions are collected in terminological registries that preserve the relationships connecting one word with another. The most conventional relationships are synonyms, antonyms (two words of opposite meanings), heteronyms (words with identical spelling but different pronunciations and meanings) and "See Also" relationships (a term referring to another). In heterogeneous information systems, data conflicts can exist between different areas (administration, health, trade, etc.). The most frequent cases of this kind of conflict are the presence of synonyms (two different words with the same meaning), of homonyms (words with the same spelling and the same pronunciation but different meanings), and of hypernyms and hyponyms (designating hierarchy levels through the notions of "more general" and "more specific"). In an information system, terminological standards often take several forms: nomenclatures, thesauri, data dictionaries, business reference data, etc. They make it easier to link one term to another and to move from one document to another through hyperlinks.
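The synonym and homonym conflicts described above can be detected mechanically once each term is recorded together with the concept it designates. The following Python sketch is one possible illustration under that assumption; the registry entries, concept identifiers and function names are hypothetical and do not come from an actual terminological registry.

```python
from collections import defaultdict

# Hypothetical registry entries: (term, concept identifier, domain).
ENTRIES = [
    ("learner", "C1", "education"),
    ("student", "C1", "education"),
    ("course", "C2", "education"),  # a unit of teaching
    ("course", "C3", "health"),     # a course of treatment
]


def find_synonyms(entries):
    """Return concepts designated by more than one term (synonymy)."""
    terms_by_concept = defaultdict(set)
    for term, concept, _domain in entries:
        terms_by_concept[concept].add(term)
    return {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}


def find_homonyms(entries):
    """Return terms that designate more than one concept (homonymy)."""
    concepts_by_term = defaultdict(set)
    for term, concept, _domain in entries:
        concepts_by_term[term].add(concept)
    return {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}


print(find_synonyms(ENTRIES))
print(find_homonyms(ENTRIES))
```

Keeping the concept identifier separate from the term is what makes both checks possible: a list of bare words could not distinguish the two senses of "course".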

The "Notion" or the "Concept" is at a higher level of abstraction than the "Term" or the "Word". In linguistic terms, this is the relationship between "signifier" and "signified", or between "signified" and "referent" (Pottier, 1992). According to the Oxford English Dictionary online, "concept" generally covers "Senses relating to thought or understanding". From a philosophical standpoint it is "a general idea or notion, a universal; a mental representation of the essential or typical properties of something, considered without regard to the peculiar properties of any specific instance or example". A concept is thus, in general, an abstract idea or mental construct that differs both from the object being represented and from the word or verbal statement used to represent it.

Teresa Cabré, director of the Institut Universitari de Lingüística Aplicada of the University Pompeu Fabra in Barcelona, provides the following definition: "The concepts or mental representations of objects are the product of a choice of relevant characteristics that define a class of objects and not individual objects" (Cabré 1998 & 1999). According to ISO 704:2000, in natural language, concepts can take the form of words, names, definitions or other linguistic forms. In artificial language, they can take the form of codes or formulas. In graphic form, they can take the form of icons, images, texts, diagrams or other graphic representations. Concepts can also be expressed with the human body, as in sign language, facial expressions or body movements. The "concept" is a fundamental notion shared by terminology and ontology.

Ontology is not easy to define generally and definitively because it is rooted in very different contexts such as philosophy, psychology, linguistics, computer science and artificial intelligence. However, we can start from the very common definition, following Gruber, that identifies an ontology as an explicit and formal specification of a shared conceptualization (Gruber 1993). The conceptualization takes shape in a formal vocabulary of concepts, their relationships and the assumptions specific to a particular area. An ontology provides a conceptual base shared by one or more communities on the basis of their common understanding of one or more areas. Unlike a vocabulary, words or terms, an ontology seeks to represent the meaning of concepts and the relationships between them in a semantic network.

Indeed, concepts do not exist as isolated units of thought. They are always connected to each other in what is commonly called a "conceptual system". Our thought processes constantly create and refine relationships between concepts, whether or not these relationships are formally recognized. In organizing a scheme of concepts in any discipline, it is necessary to keep in mind the area of knowledge that gave rise to the concept and to examine the expectations and goals of the users concerned by this discipline. Following Rastier (Rastier, 1995), "every scientific discipline has an ontological function" from the moment it produces its own "conceptual system" through specialized terminology.

Since the advent of the Semantic Web, ontologies have experienced a resurgence of interest that has resulted in a proliferation of domain ontologies. The heterogeneity of these specialized ontologies has created interoperability problems among information systems based on different ontologies. Information integration technologies are now one of the solutions: they offer unified views of local ontological sources through a comprehensive ontology schema (Benhlima & Chiadmi, 2006).

Semantic networking is broadly defined by the relations between the words that constitute an ontology. It should be recalled here that semantics differs from terminology in that it is concerned with the relationship between the denomination and the signified, whereas terminology is primarily interested in the relationship between the real object and the concept that represents it. Relations between two terms representing concepts generally depend on the interpretation that a human can make of them.

In current digital semantic networks, however, formal semantics enables automatic checks of the consistency of the information recorded to describe knowledge. Relationships such as "more general than" or "more specific than" can be formally established by comparing the properties of different terms. This general-to-specific relationship is the one terminology uses to describe links between terms. It presupposes the existence of a "top" term (a "summit") for each family. In practice, knowledge and relationships are now organized in taxonomies, thesauri or classification schemas. They take the form of conceptual graphs or semantic networks.
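One simple way to derive "more general than" by comparing properties, offered here only as a sketch, is to treat each term as a set of characteristics and to consider a term more general when its characteristics are a strict subset of another's. The characteristics listed below and the function names are hypothetical; real formal semantics (for example description logics) is considerably richer than this set comparison.

```python
# Hypothetical characteristics attached to each term (its intension).
PROPERTIES = {
    "educational institution": {"provides instruction"},
    "school": {"provides instruction", "teaches children"},
    "primary school": {"provides instruction", "teaches children", "first level"},
    "university": {"provides instruction", "awards degrees"},
}


def is_more_general(a, b, properties):
    """True if every characteristic of a also belongs to b, and b has more of them."""
    return properties[a] < properties[b]  # strict subset test on sets


def top_terms(properties):
    """Return the terms that have no more general term in the family (the "summits")."""
    return sorted(
        term
        for term in properties
        if not any(is_more_general(other, term, properties) for other in properties)
    )


print(is_more_general("school", "primary school", PROPERTIES))
print(is_more_general("university", "school", PROPERTIES))
print(top_terms(PROPERTIES))
```

With this data the family has a single top term; a result containing several top terms would signal either separate families or missing characteristics, which is the kind of consistency check the paragraph above refers to.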

Vocabulary standards

In the development of international standards, it is therefore essential to work from concepts defined by consensus, so that expressing these concepts as terms in many languages is not a simple translation from a source language into a target language.

A frequently cited example is the designation of successive educational levels, which is far from universal: kindergarten, primary school, secondary school (whether or not split into lower and upper secondary), university or vocational education… Clearly, intercultural and multilingual terminology is much more than a one-to-one correspondence.

It is only by confronting the diverse ways in which concepts are segmented, governed by differences in cultures and languages, that one can be confident of writing interoperable international standards capable of responding to the worldwide diversity of uses. It is above all through this approach that it becomes possible to localize these standards by translation or adaptation, so that national actors can then use them, each in its own cultural and linguistic context. This method follows an onomasiological approach (from the concept to the term), in contrast to a semasiological approach (from the term to the concept) to vocabulary construction.
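The concept-first (onomasiological) approach can be sketched as a data structure in which the concept, with its definition, is the primary record and the terms of each language are attached to it. The Python example below is only an illustration under that assumption: the concept identifiers, definitions and the choice of English equivalents are hypothetical, while "collège" and "lycée" are the usual French designations of lower and upper secondary schools.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept defined independently of any language, with terms attached per language."""
    identifier: str
    definition: str
    terms: dict = field(default_factory=dict)  # language code -> list of terms


# Hypothetical concept records for two educational levels.
CONCEPTS = [
    Concept(
        "lower-secondary",
        "first stage of secondary education",
        {"fr": ["collège"], "en": ["middle school", "junior high school"]},
    ),
    Concept(
        "upper-secondary",
        "second stage of secondary education",
        {"fr": ["lycée"], "en": ["high school"]},
    ),
]


def equivalents(term, source_lang, target_lang, concepts):
    """Go from a term to its concept(s), then to the terms of the target language."""
    result = []
    for concept in concepts:
        if term in concept.terms.get(source_lang, []):
            result.extend(concept.terms.get(target_lang, []))
    return result


print(equivalents("collège", "fr", "en", CONCEPTS))
```

Because the lookup passes through the concept, one source term may yield several target terms or none at all, which a direct word-for-word translation table could not express.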

In the world of education, the development of terminology standards concerning technologies, descriptions of institutions, disciplines, certification modes (degrees, levels), teaching styles, legal contexts, etc. is more and more necessary for the international circulation of educational and training resources. The constant spread of new modes of teaching through ITLET has made teaching, training and learning universal values of development. The implementation of multilingual terminology resources, themselves open educational resources (OER) and a real "public good", is therefore essential in a context of respect for the languages used in the world. However, at present, global standardized interoperability is ensured, in part, by one language: English. The current supply of terminologies is inadequate: only a few languages are covered, because the methods used are term-oriented rather than concept-oriented. Yet these resources are useful to all those who need to know, fully and reliably, the state of the supply of technologies and digital training programs in the world. One of the functions of SC36 (full name ISO/IEC JTC1 SC36) is therefore not to provide an entire corpus of multilingual e-Learning terminology, but to put in place standard procedures for designing and developing such corpora, and to support their appropriation by international experts in each context, language and culture.