EnhancementStructure

From IKS Project

FISE Enhancement Structure

FISE Enhancement

Each enhancement in FISE is a description that contains the following metadata:

  • rdf:type fise:Enhancement: specifies that this description is a FISE enhancement
  • dc:creator: the Enhancement Engine creating the enhancement
  • dc:created: the local system time at which the annotation was created
  • fise:extracted-from: the content item this enhancement is part of. This links to the ID of the content item as assigned by FISE.
  • dc:type: the type of the enhancement (e.g. heading, section, sentence, Language, Location, Person, Entity ...). The intention is to define some frequently used types within the FISE specification, but to keep the list open for extension by enhancers. Values should be URIs defined in a controlled vocabulary. If an ontology is also available for such types, they may additionally be declared as rdf:types.
  • dc:requires <fise:Enhancement>: specifies that this enhancement depends on another annotation. This reflects that the referred annotation had a major impact on the calculation of this enhancement.
  • dc:relation <fise:Enhancement>: specifies that this enhancement is related to the referred annotation. This indicates that this enhancement remains valid if the referred annotation is changed or removed.
  • fise:confidence <float>: the level of confidence, in the range from 0 to 1
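The properties above can be pictured as a small record type. The following is only a minimal sketch in Python, not part of the FISE specification: the class name `Enhancement`, its field names, and the example URNs are assumptions made for illustration, and the range check simply mirrors the stated 0 to 1 range of fise:confidence.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Enhancement:
    """Illustrative stand-in for a fise:Enhancement description (hypothetical class)."""
    creator: str                    # dc:creator, the engine that created the enhancement
    extracted_from: str             # fise:extracted-from, ID of the content item
    created: datetime = field(default_factory=datetime.now)  # dc:created, local system time
    dc_type: Optional[str] = None   # dc:type, a URI from a controlled vocabulary
    requires: List[str] = field(default_factory=list)   # dc:requires targets
    relation: List[str] = field(default_factory=list)   # dc:relation targets
    confidence: Optional[float] = None                   # fise:confidence

    def __post_init__(self):
        # The text above states that confidence lies in the range from 0 to 1.
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("fise:confidence must lie in the range 0 to 1")


# Example URNs are placeholders, not identifiers assigned by FISE.
example = Enhancement(
    creator="urn:example:nlp-engine",
    extracted_from="urn:example:content-item-1",
    confidence=0.8,
)
```

Keeping dc:requires and dc:relation as separate lists reflects the distinction drawn above: a required annotation influenced the result, while a related one can change without invalidating it.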

FISE Text Annotation

This type makes it possible to add metadata describing annotations of parts of the text. It is intended to be used in addition to the FISE enhancement type when an enhancement is based on a part of the content.

  • rdf:type fise:TextAnnotation: specifies that this description is a FISE text annotation
  • fise:start: the character position of the start of the selection. If start is not defined, the selection is assumed to start at the beginning of the document.
  • fise:end: the character position of the end of the selection. If end is not defined, the selection is assumed to end at the end of the document.
  • fise:selected-text: the text selected by the enhancement. This property is optional, because it makes no sense to add it if major parts of the document are selected (e.g. an enhancement for the language of a document might select the whole document, so this property would duplicate the text of the document).
  • fise:selection-context: the context of the selected text. This property is also optional. It makes it possible to specify the context used to extract things like persons, organizations, locations ... from natural language documents. Note that this could also be realized by defining the context as another TextAnnotation and adding a dc:requires relation.
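The defaulting rule for fise:start and fise:end can be sketched as follows. This is an illustration only: the function name `resolve_selection` is hypothetical, and the sketch assumes zero-based character positions with an exclusive end, which the text above does not specify.

```python
from typing import Optional


def resolve_selection(text: str, start: Optional[int] = None,
                      end: Optional[int] = None) -> str:
    """Return the selected text, applying the defaults for a missing start or end."""
    # A missing start means the selection begins at the start of the document.
    begin = 0 if start is None else start
    # A missing end means the selection runs to the end of the document.
    stop = len(text) if end is None else end
    # Assumption: positions are zero-based and the end position is exclusive.
    return text[begin:stop]


content = "In May I will travel to Paderborn"
print(resolve_selection(content, start=24, end=33))  # prints "Paderborn"
print(resolve_selection(content) == content)         # prints "True"
```

The second call shows why fise:selected-text is optional: with neither position set, the selection is the whole document and storing it would duplicate the content.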

Entity Annotation

This kind of enhancement refers to a named entity that was recognized within the text. This type is intended to be used together with the FISE enhancement type.

  • rdf:type fise:EntityAnnotation: Specifies that this description is a FISE entity annotation
  • fise:entity-reference: This refers to the URI identifying the Entity
  • fise:entity-label: The label(s) of the referred entity
  • fise:entity-type: This property can be used to specify the type of the entity (Optional)

The occurrences of the entity within the content (the exact positions within the text where this entity is referred to) are determined by outgoing dc:relation links.
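Following the dc:relation links to find the occurrences might look like the sketch below. It assumes, purely for illustration, that enhancements are held as Python dictionaries keyed by property name and indexed by an "id" value; the function name `occurrences` and all URNs are hypothetical.

```python
def occurrences(entity_annotation, enhancements_by_id):
    """Return (start, end) pairs of the text annotations linked via dc:relation."""
    spans = []
    for target_id in entity_annotation.get("dc:relation", []):
        target = enhancements_by_id.get(target_id)
        # Only linked text annotations carry a position within the content.
        if target is not None and "fise:TextAnnotation" in target.get("rdf:type", []):
            spans.append((target.get("fise:start"), target.get("fise:end")))
    return spans


# Hypothetical data: one text annotation and one entity annotation pointing at it.
text_annotation = {
    "id": "urn:example:text-1",
    "rdf:type": ["fise:Enhancement", "fise:TextAnnotation"],
    "fise:start": 24,
    "fise:end": 33,
}
entity_annotation = {
    "id": "urn:example:entity-1",
    "rdf:type": ["fise:Enhancement", "fise:EntityAnnotation"],
    "fise:entity-reference": "http://dbpedia.org/resource/Paderborn",
    "dc:relation": ["urn:example:text-1"],
}
index = {text_annotation["id"]: text_annotation}
print(occurrences(entity_annotation, index))  # prints [(24, 33)]
```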

List of FISE enhancement types

This is the list of FISE enhancement types used as values for dc:type. Note that enhancers can define their own types.

TODO: This list is only for brainstorming. The usage of the dc:type property within FISE enhancements needs to be discussed in more detail.

Language specific (types typically used by natural language processing frameworks)

TODO: See some NLP frameworks for typical types

  • sentence
  • person
  • organization
  • location
  • time

Content structure specific (types referring to the structure of content)

TODO: See standards like NITF, HTML5, ... for typical types

  • title
  • section-title
  • section
  • introduction
  • header
  • footer
  • ...

Entity Annotations

This should define a basic typology of entity types. Note that for EntityAnnotations the fise:entity-type property is used to store all the types of the referred entity. Currently, all enhancement engines that create EntityAnnotations use concepts defined by the DBpedia ontology.

The idea is to store the types of the referred entity in the fise:entity-type property and to use the dc:type property to align the type of the entity with a set of types commonly used by FISE enhancement engines.
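The alignment idea could be sketched as a simple lookup from entity types to common FISE types. The table and the function name `align_dc_type` are assumptions for illustration; the short names on the right are placeholders taken from the brainstorming list, whereas real dc:type values would be URIs from a controlled vocabulary.

```python
# Hypothetical alignment table from DBpedia ontology classes to common FISE types.
DBPEDIA_TO_FISE_TYPE = {
    "http://dbpedia.org/ontology/Person": "person",
    "http://dbpedia.org/ontology/Organisation": "organization",
    "http://dbpedia.org/ontology/Place": "location",
}


def align_dc_type(entity_types):
    """Return the first common type matching any fise:entity-type value, else None."""
    for entity_type in entity_types:
        aligned = DBPEDIA_TO_FISE_TYPE.get(entity_type)
        if aligned is not None:
            return aligned
    return None


# fise:entity-type keeps every type of the entity; dc:type holds the aligned one.
entity_types = [
    "http://dbpedia.org/ontology/City",
    "http://dbpedia.org/ontology/Place",
]
print(align_dc_type(entity_types))  # prints "location"
```

Returning None for unknown types leaves it to the engine whether to omit dc:type or to fall back to a type of its own.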

Example

Content: "In May I will travel to Paderborn"

The example assumes that three enhancements are created by two enhancement engines. "Enhancement_1" is created by an NLP enhancer that recognizes, based on the sentence, that Paderborn should be some kind of location. "Enhancement_2" and "Enhancement_3" are created by an enhancement engine that uses the enhancements of the NLP engine and tries to link them to named entities as defined on dbpedia.org. This engine creates two enhancements representing two possible matches.
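The scenario can be sketched with plain Python dictionaries. This is a hypothetical reconstruction and not the original listing of the three enhancements: all URNs, the dc:type value, the two candidate entity URIs, the confidence values, and the helper `entity_candidate` are assumptions chosen for illustration.

```python
content = "In May I will travel to Paderborn"

# Enhancement_1: the NLP enhancer marks "Paderborn" as some kind of location.
enhancement_1 = {
    "id": "urn:example:enhancement-1",
    "rdf:type": ["fise:Enhancement", "fise:TextAnnotation"],
    "dc:creator": "urn:example:nlp-engine",
    "dc:type": "urn:example:fise-type:location",  # placeholder type URI
    "fise:start": 24,
    "fise:end": 33,  # assumed to be exclusive
    "fise:selected-text": content[24:33],
    "fise:selection-context": content,
}


def entity_candidate(number, entity_uri, confidence):
    """Build one possible entity match that points back to Enhancement_1."""
    return {
        "id": f"urn:example:enhancement-{number}",
        "rdf:type": ["fise:Enhancement", "fise:EntityAnnotation"],
        "dc:creator": "urn:example:linking-engine",
        "dc:relation": [enhancement_1["id"]],
        "fise:entity-reference": entity_uri,
        "fise:confidence": confidence,  # illustrative value
    }


# Enhancement_2 and Enhancement_3: two possible matches from the linking engine.
enhancement_2 = entity_candidate(2, "http://dbpedia.org/resource/Paderborn", 0.9)
enhancement_3 = entity_candidate(3, "http://dbpedia.org/resource/Paderborn_(district)", 0.4)

best = max([enhancement_2, enhancement_3], key=lambda e: e["fise:confidence"])
print(best["fise:entity-reference"])  # prints "http://dbpedia.org/resource/Paderborn"
```

A consumer that wants a single suggestion could pick the candidate with the highest fise:confidence, as the last two lines do, while still keeping the alternative match available.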

Enhancement_1

Enhancement_2

Enhancement_3

Proposals

This section covers suggestions for extensions.

Annotation of Categorizations

A Category Annotation assigns the parsed content (or parts of the parsed content) to a category of some categorization system.

Current State:

With the currently available enhancement structure, one would use an Entity Annotation to link to the category. However, the intention of Entity Annotations is to annotate entities recognized within the content, not to describe the assignment of the content to a categorization.

Possible Solutions:

  1. Use the dc:type of an EntityAnnotation to indicate that this entity annotation represents a categorization of the content
  2. Define a dedicated fise:CategoryAnnotation class that describes this kind of content enhancement
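To compare the two options from a consumer's point of view, the following sketch checks whether an enhancement is a categorization under either proposal. It is illustrative only: the dictionary layout, the function name `is_categorization`, and the dc:type value `CATEGORY_DC_TYPE` are assumptions, since neither proposal has been specified in detail.

```python
# Hypothetical dc:type value that option 1 would need to define.
CATEGORY_DC_TYPE = "urn:example:fise-type:category"


def is_categorization(enhancement):
    """Detect a categorization expressed by either of the two proposed solutions."""
    types = enhancement.get("rdf:type", [])
    # Option 2: a dedicated class marks the enhancement directly.
    if "fise:CategoryAnnotation" in types:
        return True
    # Option 1: an EntityAnnotation whose dc:type flags it as a categorization.
    return "fise:EntityAnnotation" in types and enhancement.get("dc:type") == CATEGORY_DC_TYPE


option_1 = {
    "rdf:type": ["fise:Enhancement", "fise:EntityAnnotation"],
    "dc:type": CATEGORY_DC_TYPE,
}
option_2 = {"rdf:type": ["fise:Enhancement", "fise:CategoryAnnotation"]}
plain_entity = {"rdf:type": ["fise:Enhancement", "fise:EntityAnnotation"]}

print(is_categorization(option_1), is_categorization(option_2), is_categorization(plain_entity))
# prints "True True False"
```

Under option 1 every consumer has to inspect dc:type to tell categories apart from recognized entities, while option 2 makes the distinction visible in rdf:type alone.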