


U-Compare Type System

The U-Compare type system is designed both to be shared across components and to support comparison and evaluation.

The U-Compare type system descriptor file can be obtained as follows. Launch UCLoader once, then unarchive the u-compare.jar file in the [your.user.home]/.U-Compare/jars directory. A jar file is zip-compatible, so any unzip tool can unarchive it. (If you need the development version, unarchive u-compare-devel.jar instead.)

In the unarchived directory, /org/u_compare/U_compareTypeSystem.xml is the type system definition file. Note that the XML file must be located under the /org/u_compare/ path on your classpath, i.e. descriptor files should refer to it as <import name="org.u_compare.U_compareTypeSystem">.
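The extraction step described above can also be scripted. The following is a minimal Python sketch, assuming the jar sits in the default location named in the text; the function name extract_type_system and the output directory are illustrative choices, not part of U-Compare.

```python
import zipfile
from pathlib import Path

# Entry name taken from the text above; entries inside a jar have no leading slash.
TYPE_SYSTEM_ENTRY = "org/u_compare/U_compareTypeSystem.xml"


def extract_type_system(jar_path, dest_dir):
    """Extract the type system descriptor from the jar into dest_dir.

    Returns the path of the extracted file. Raises KeyError if the jar
    does not contain the expected entry.
    """
    with zipfile.ZipFile(jar_path) as jar:
        # extract() recreates the org/u_compare/ directories under dest_dir.
        extracted = jar.extract(TYPE_SYSTEM_ENTRY, path=dest_dir)
    return Path(extracted)


if __name__ == "__main__":
    # Default location described in the text; adjust if yours differs.
    jar = Path.home() / ".U-Compare" / "jars" / "u-compare.jar"
    # Hypothetical output directory chosen for this example.
    print(extract_type_system(jar, Path.cwd() / "u-compare-unpacked"))
```

Because the directory layout is preserved, adding the output directory to the classpath keeps the file under org/u_compare/, which is what the import name expects.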

Currently the type system is this single file, but we plan to divide it into several sub type system files. Once that happens, you will find <import> declarations in the main U_compareTypeSystem.xml file that refer to the dependent sub type systems. This matters only when browsing the type system; to use it, specifying the main type system as above is enough.

Shared, But Not Common, Type System

We do not plan to make our type system a single common type system. Creating a single common type system appears to be impossible, because the concept or category that a type represents can differ to some degree, even from one individual to another. However, it is equally clear that a lack of type system compatibility largely spoils the interoperability that UIMA could provide.

The solution, then, is to create type system converters. Because creating a new converter every time a type system is created or updated is not realistic, a shared type system that bridges local type systems is the better approach.

Some information loss is inevitable when converting between type systems, but the less the better. We have designed the U-Compare shareable type system to keep such loss as small as possible. First, by making types as hierarchical as possible, we can use intermediate types to share mid-level category information. Second, category labels are represented by UniqueLabel and its descendant types (such as the Penn Treebank tagset) rather than by a text string feature, which ensures a higher level of uniqueness.

Type System for Comparison and Evaluation

One of the central features of U-Compare is comparing similar annotations produced by different components, or evaluating them against gold standard data. Because only the types and the type system, not instance information, are available at the descriptor level, U-Compare determines which annotations to compare from the types and their position in the type system hierarchy. This in turn requires the type system to be designed for comparison, i.e. hierarchical enough and organized properly so that related types share ancestor types.
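To make the idea of matching annotations through shared ancestor types concrete, here is a small Python sketch. It is not U-Compare's actual matching logic, only an illustration under stated assumptions: the hierarchy is a single-inheritance tree, two types are treated as comparable when they share an ancestor more specific than the root, and every type name except uima.tcas.Annotation is made up for this example.

```python
# Hypothetical single-inheritance hierarchy: type name -> supertype name.
# Only uima.tcas.Annotation is a real UIMA type; the rest are invented.
ROOT = "uima.tcas.Annotation"
SUPERTYPE = {
    "example.SyntacticAnnotation": ROOT,
    "example.Token": "example.SyntacticAnnotation",
    "example.ToolAToken": "example.Token",
    "example.ToolBToken": "example.Token",
    "example.SemanticAnnotation": ROOT,
    "example.NamedEntity": "example.SemanticAnnotation",
}


def ancestry(type_name):
    """Return [type_name, its supertype, ..., ROOT]."""
    chain = [type_name]
    while chain[-1] in SUPERTYPE:
        chain.append(SUPERTYPE[chain[-1]])
    return chain


def nearest_shared_ancestor(type_a, type_b):
    """Return the most specific type that both arguments are or descend from."""
    seen = set(ancestry(type_a))
    for candidate in ancestry(type_b):
        if candidate in seen:
            return candidate
    return None


def comparable(type_a, type_b):
    """Assumed rule: comparable if the shared ancestor is below the root."""
    shared = nearest_shared_ancestor(type_a, type_b)
    return shared is not None and shared != ROOT


if __name__ == "__main__":
    print(nearest_shared_ancestor("example.ToolAToken", "example.ToolBToken"))
    print(comparable("example.ToolAToken", "example.NamedEntity"))
```

In this sketch the two tool-specific token types meet at example.Token, so their annotations could be compared, while a token type and a named entity type meet only at the root and are not. A flatter hierarchy would leave fewer such meeting points, which is why the text asks for a sufficiently hierarchical design.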

Currently an Apache UIMA type system must form a tree structure, but Apache UIMA is expected to provide ECore-compatible multiple inheritance in the future, in line with the UIMA specification. That will allow a more appropriate type system hierarchy.

Although the basic direction is clear, expressing real-world concepts in such a hierarchy is very difficult, as research on ontology building shows. In fact, it would be impossible to create a single unique type system. Even how deep the type system should be is a major problem, and there seems to be no clear solution.

Using and Extending U-Compare Types

First, find a suitable supertype in the U-Compare type system that expresses the type of your instance. We then recommend creating your own type by extending that supertype in your own type system descriptor file. This lets you change the supertype later without modifying much code, and also lets you add your own features. The U-Compare type system will be expanded continuously, so a more suitable supertype may appear in the future.
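As an illustration of the recommendation above, the following Python sketch holds a small type system descriptor as a string and reads back what it declares. The descriptor follows the standard UIMA type system descriptor layout and imports the U-Compare type system by name as described earlier. The names MyTypeSystem, com.example.MyToken, myScore, and org.u_compare.HypotheticalSupertype are placeholders for this example; replace the supertype with a real type found in U_compareTypeSystem.xml.

```python
import xml.etree.ElementTree as ET

# Namespace used by UIMA descriptor files.
NS = {"u": "http://uima.apache.org/resourceSpecifier"}

# All type and feature names below are placeholders, not real U-Compare types.
DESCRIPTOR = """<?xml version="1.0" encoding="UTF-8"?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>MyTypeSystem</name>
  <imports>
    <import name="org.u_compare.U_compareTypeSystem"/>
  </imports>
  <types>
    <typeDescription>
      <name>com.example.MyToken</name>
      <description>Token produced by my tool (hypothetical).</description>
      <supertypeName>org.u_compare.HypotheticalSupertype</supertypeName>
      <features>
        <featureDescription>
          <name>myScore</name>
          <description>Tool-specific feature (hypothetical).</description>
          <rangeTypeName>uima.cas.Double</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
"""


def imported_names(descriptor_xml):
    """Return the by-name imports declared in the descriptor."""
    root = ET.fromstring(descriptor_xml.encode("utf-8"))
    return [imp.get("name") for imp in root.findall("u:imports/u:import", NS)]


def declared_supertypes(descriptor_xml):
    """Map each type declared in the descriptor to its supertype name."""
    root = ET.fromstring(descriptor_xml.encode("utf-8"))
    return {
        td.findtext("u:name", namespaces=NS): td.findtext("u:supertypeName", namespaces=NS)
        for td in root.findall("u:types/u:typeDescription", NS)
    }


if __name__ == "__main__":
    print(imported_names(DESCRIPTOR))
    print(declared_supertypes(DESCRIPTOR))
```

With this layout, switching to a more suitable supertype later means editing only the supertypeName element, while code written against com.example.MyToken stays largely as it is.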

Covered Field

Currently the U-Compare type system covers basic syntactic, semantic, and document annotation types. If you plan to contribute a component whose annotation types are not covered by our type system, please contact us.

For an explanation of each type, please refer to our paper (Kano et al., 2009) presented at the NAACL SETQA workshop. The paper, available at http://www.aclweb.org/anthology/W/W09/W09-1504.pdf, includes diagrams of the main part of the type system and an overview description of each type.
