Automatic Generation of Ontologies from Tabular Structures on the Web
Efficient automatic information handling has become increasingly important in the information society. Most information on the Web is presented in the form of semi-structured or unstructured documents, encoded as a mixture of loosely structured natural language text and template units. The lack of metadata that would precisely annotate the structure and semantics of documents, together with the ambiguity of natural language, makes automatic computer processing very complex. The Semantic Web aims to overcome this bottleneck.
The central contribution of the dissertation is a novel method for the automatic generation of knowledge models, such as ontologies, from arbitrary tabular structures found on the Web. The method is implemented in a system named TARTAR (Transforming ARbitrary TAbles into fRames), which is a component of the multi-agent system OntoGeMS (Ontology Generation Multi-agent System). The method is based on a grounded cognitive table model introduced by Hurst and is instantiated in four steps. In the first step, a table is transformed into regular matrix form. In the following two steps, the table is handled from a structural and a functional point of view, and in the last step from a semantic point of view. The outcome of the method is threefold: a knowledge frame, an ontology, and a knowledge base, all encoded in the F-Logic representation language. The frame makes explicit the meaning of cell contents, the functional dimension of the table (which is comparable to a relational schema), and the meaning of the table based on its structure. In the ontology, the concepts are arranged into a directed acyclic graph, where the arcs represent relations among concepts as well as the types of those relations. The table content is formalized according to the frame and stored in the knowledge base.
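As an illustration of the first step only, the transformation into regular matrix form can be sketched as expanding cells that span several rows or columns so that every grid position holds a value. This is a minimal sketch and not the TARTAR implementation: the `Cell` tuple layout, the function name `to_regular_matrix`, and the choice to copy the text of a spanning cell into every position it covers are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical cell representation: (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def to_regular_matrix(rows: List[List[Cell]]) -> List[List[str]]:
    """Expand spanning cells so the table becomes a rectangular matrix."""
    grid: Dict[Tuple[int, int], str] = {}
    n_cols = 0
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Copy the cell text into every position the cell covers.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
            n_cols = max(n_cols, c)
    n_rows = max((r for r, _ in grid), default=-1) + 1
    # Positions never covered by any cell are filled with an empty string.
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


# Invented example table from the tourist domain.
table = [
    [("Hotel", 2, 1), ("Price", 1, 2)],
    [("Single", 1, 1), ("Double", 1, 1)],
    [("Astoria", 1, 1), ("60", 1, 1), ("80", 1, 1)],
]
matrix = to_regular_matrix(table)
for line in matrix:
    print(line)
# ['Hotel', 'Price', 'Price']
# ['Hotel', 'Single', 'Double']
# ['Astoria', '60', '80']
```

Duplicating the spanning text keeps each header aligned with the data columns beneath it, which is one plausible starting point for the later structural and functional analysis.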
The empirical evaluation is performed from four perspectives. The efficiency of the method is measured as the proportion of correctly transformed tables from two domains, tourist and geopolitical, which allows us to demonstrate the domain independence of the approach. Usability is demonstrated by the syntactic and semantic correctness of the generated frames, which are compared with manually annotated frames. Applicability is shown in two ways. First, by querying the content of tables encoded in the knowledge base, it is shown that the returned answers are true and complete in all cases. The querying is enabled by the inference engine OntoBroker. Second, the automatically generated ontologies are used for the automatic construction of wrappers. Ontologies generated in this way can replace the hand-crafted heuristics that are commonly used as a foundation for wrapper construction. The benefits are better adaptability, easier extensibility, and domain independence.
The present research opens a number of possibilities for further research in information handling and for the promotion of the Semantic Web.
- ontology learning/generation
- semantic web
- tabular structure
- information extraction
- intelligent agent