OpenNLP-NamedEntityExtractionEnhancementEngine

From IKS Project

Overview

This engine processes the text content of Content Items with media type "text/plain" written in English. It uses the sentence detector and name finder tools of the OpenNLP project, bundled with statistical models trained to detect occurrences of names of persons, places and organisations.

Whenever an occurrence of one of those types is found, the engine creates a fise:TextAnnotation with dc:type set to one of the following URIs from the DBpedia ontology:

  1. http://dbpedia.org/ontology/Person
  2. http://dbpedia.org/ontology/Place
  3. http://dbpedia.org/ontology/Organisation

The extracted fise:TextAnnotation also holds the text of the name of the entity as it occurs in the content item, its start and end positions as numbers of characters from the beginning of the text, and the text of the three enclosing sentences.
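
The fields described above can be illustrated with a minimal Python sketch. It is not the engine's implementation: the function name make_text_annotation, the category labels ("person", "location", "organization") and the use of a plain dictionary are assumptions made for illustration only, and the name is located with a simple substring search instead of a statistical model.

```python
# Hypothetical mapping from name categories to the DBpedia types listed above.
# The category labels are an assumption for this sketch.
DBPEDIA_TYPES = {
    "person": "http://dbpedia.org/ontology/Person",
    "location": "http://dbpedia.org/ontology/Place",
    "organization": "http://dbpedia.org/ontology/Organisation",
}


def make_text_annotation(content, name, category, context):
    """Build a dictionary that mirrors the fields of a fise:TextAnnotation.

    The name is located by substring search (first occurrence only);
    a ValueError is raised if it does not occur in the content.
    """
    start = content.index(name)
    return {
        "dc:type": DBPEDIA_TYPES[category],
        "fise:selected-text": name,
        "fise:selection-context": context,
        "fise:start": start,             # zero-based offset of the first character
        "fise:end": start + len(name),   # offset just past the last character
    }


content = "The IKS community Workshop will take place in Vienna"
annotation = make_text_annotation(content, "Vienna", "location", content)
print(annotation["fise:start"], annotation["fise:end"])
```

Slicing the content with the two offsets (content[start:end]) returns the selected text again, which is a convenient sanity check for consumers of these annotations.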

Output

Here is an example of such a TextAnnotation, selecting the text "Vienna" from the content "The IKS community Workshop will take place in Vienna":

urn:enhancement:text-enhancement:id1
     a       fise:TextAnnotation , fise:Enhancement ;
     dc:type
             dbpedia:Place ;
     fise:selected-text
             "Vienna"^^xsd:string ;
     fise:selection-context
             "The IKS community Workshop will take place in Vienna"^^xsd:string ;
     fise:start
             "46"^^xsd:int ;
     fise:end
             "52"^^xsd:int ;
     fise:confidence
             "0.9773640902587215"^^xsd:double ;
     fise:extracted-from
             urn:content-item:id1 .

The selected text is annotated first by specifying the start and end positions of the selection within the content. Positions are zero-based character offsets, and fise:end points just past the last selected character: in the example, "Vienna" occupies characters 46 to 51. In addition, the selected text and the context of the extraction are recorded. This information can be used to recalculate the position within the content even if the character offsets are no longer valid. Confidence levels range from 0 to 1.
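
One way to recalculate a position from the selected text and its context is sketched below in Python. This is only an illustration under stated assumptions: the function name relocate is hypothetical, the context is assumed to survive unchanged in the edited content, and only the first occurrence of the name inside the context is considered.

```python
def relocate(content, selected_text, context):
    """Return the (start, end) offsets of selected_text in content, or None.

    The context is searched first so that the right occurrence is picked
    when the name appears several times in the content. If the context is
    no longer present, the first occurrence of the selected text is used.
    """
    context_pos = content.find(context)
    if context_pos != -1:
        offset = context.find(selected_text)
        if offset != -1:
            start = context_pos + offset
            return start, start + len(selected_text)
    start = content.find(selected_text)
    if start == -1:
        return None
    return start, start + len(selected_text)


original = "The IKS community Workshop will take place in Vienna"
# Text was prepended, so the stored offsets 46 and 52 are no longer valid.
edited = "Update: " + original
position = relocate(edited, "Vienna", original)
print(position)
```

If the name occurs more than once inside the context itself, this sketch cannot tell the occurrences apart; a more careful version could also store the offset of the selection relative to the context.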

These extractions can then be used further by the geonames.org-LocationEnhancementEngine or the EntityMentionEnhancementEngine.

The following examples use the sample sentence "Dr. Patrick Marshall (1869 - November 1950) was a geologist who lived in New Zealand and worked at the University of Otago." to demonstrate all types of TextAnnotations currently created by this engine.

First, a TextAnnotation for the country "New Zealand":

urn:enhancement:text-annotation:id1
     a       fise:TextAnnotation, fise:Enhancement ;
     dc:type
             dbpedia:Place ;
     dc:creator
             "eu.iksproject.fise.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine"^^xsd:string ;
     dc:created
             "2010-06-21T23:38:52.266+02:00"^^xsd:dateTime ;
     fise:extracted-from
             urn:content-item:id1 ;
     fise:start
             "74"^^xsd:int ;
     fise:end
             "85"^^xsd:int ;
     fise:selected-text
             "New Zealand"^^xsd:string ;
     fise:selection-context
             "Dr. Patrick Marshall (1869 - November 1950) was a  geologist who lived in New Zealand 
              and worked at the University of Otago."^^xsd:string ;
     fise:confidence
             "0.9791822129956813"^^xsd:double .

Second, a TextAnnotation for the organisation "University of Otago":

urn:enhancement:text-annotation:id2
     a       fise:TextAnnotation, fise:Enhancement ;
     dc:type
             dbpedia:Organisation ;
     dc:creator
             "eu.iksproject.fise.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine"^^xsd:string ;
     dc:created
             "2010-06-21T23:38:52.247+02:00"^^xsd:dateTime ;
     fise:extracted-from
             urn:content-item:id1 ;
     fise:start
             "104"^^xsd:int ;
     fise:end
             "123"^^xsd:int ;
     fise:selected-text
             "University of Otago"^^xsd:string ;
     fise:selection-context
             "Dr. Patrick Marshall (1869 - November 1950) was a  geologist who lived in New Zealand 
              and worked at the University of Otago."^^xsd:string ;
     fise:confidence
             "0.6517408806512693"^^xsd:double .

Finally, a TextAnnotation for the person Dr. Patrick Marshall (the selected text does not include the title "Dr."):

urn:enhancement:text-annotation:id3
     a       fise:TextAnnotation, fise:Enhancement ;
     dc:type
             dbpedia:Person ;
     dc:creator
             "eu.iksproject.fise.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine"^^xsd:string ;
     dc:created
             "2010-06-21T23:38:52.247+02:00"^^xsd:dateTime ;
     fise:extracted-from
             urn:content-item:id1 ;
     fise:start
             "4"^^xsd:int ;
     fise:end
             "20"^^xsd:int ;
     fise:selected-text
             "Patrick Marshall"^^xsd:string ;
     fise:selection-context
             "Dr. Patrick Marshall (1869 - November 1950) was a  geologist who lived in New Zealand 
              and worked at the University of Otago."^^xsd:string ;
     fise:confidence
             "0.9964792789738544"^^xsd:double .

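The offsets in the three listings can be checked with a short Python sketch. It assumes that the line break inside each fise:selection-context value stands for a single space and that the two spaces after "was a" are part of the analysed content; with a single space there, the offsets of "New Zealand" and "University of Otago" would be off by one. The function name check_offsets is hypothetical.

```python
# Content as printed in the fise:selection-context values above
# (note the two spaces after "was a").
content = (
    "Dr. Patrick Marshall (1869 - November 1950) was a  geologist "
    "who lived in New Zealand and worked at the University of Otago."
)

# (selected text, fise:start, fise:end) taken from the three listings.
annotations = [
    ("New Zealand", 74, 85),
    ("University of Otago", 104, 123),
    ("Patrick Marshall", 4, 20),
]


def check_offsets(content, annotations):
    """Return one boolean per annotation: does content[start:end] match the text?"""
    return [content[start:end] == text for text, start, end in annotations]


results = check_offsets(content, annotations)
print(results)
```

A check of this kind is useful before relying on stored offsets, since any whitespace normalisation of the content shifts all positions that follow it.
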
Configuration

  1. eu.iksproject.fise.engines.opennlp.models.path: a class loading path to a folder that holds the name finder and sentence detector models, following the folder structure of the OpenNLP default models (http://opennlp.sourceforge.net/models.html).