Geonames.org-LocationEnhancementEngine

From IKS Project

Jump to: navigation, search

This engine creates fise:EntityAnnotations based on the http://geonames.org dataset. It does not directly work on the parsed content, but processes named entities extracted by some NLP (natural language processing) engine. This engine creates EnityAnnotations for Features found for named entities in the geonames.org data set. In addition it adds EntityAnnotations for the continent, country and administrative regions for entities with an high confidence level.

Contents

Processed Annotations (Input)

This engine consumes fise:TextAnnotations of type dbpedia:Place. More concrete it filters for enhancements that confirm to the following two requirements and consumes the text selected by the TextAnnotations:

 ?textAnnotation rdf:type fise:TextAnnotation .
 ?textAnnotation dc:type dbpedia:Place
 ?textAnnotation fise:selected-text ?text


Here an example for such an TextAnnotations selecting the text "Vienna" form the content "The IKS community Workshop will take place in Vienna".

urn:enhancement:text-enhancement:id1
     a       fise:TextAnnotation , fise:Enhancement ;
     dc:type
             dbpedia:Place ;
     fise:selected-text
             "Vienna"^^xsd:string ;
     fise:selection-context
             "The IKS community Workshop will take place in Vienna"^^xsd:string ;
     fise:start
             "46"^^xsd:int ;
     fise:end
             "52"^^xsd:int ;
     fise:confidence
             "0.9773640902587215"^^xsd:double ;
     fise:extracted-from
             urn:content-item:id1 .

Typically such enhancements are created by engines that provide named entity extraction such as the openNLP-NamedEntityExtractionEnhancementEngine.

Created Enhancements (Output)

The LocationEnhancementEngine creates two types of EntityAnnotations. First it suggests Entities for processed TextAnnotations and second it creates EntityAnnotations for the hierarchy of regions the suggested Entities are located in. Suggested Entities are connected with the "dc:relation" attribute to the TextAnnotation they enhance. EntityAnnotations representing the hierarchy define a dc:requires attribute to the EntityAnnotation.

Entity Suggestions

Entity suggestions are EntityEnhancements that suggest Features of the geonames.org dataset for an processed TextAnnotation. This suggestions are currently only calculated based on the fise:selected-text of the TextAnnotation. The following example shows three EntityAnnotations for the TextAnnotation used in the above example. See the fise:relation statements at the end of each of the two EntityAnnotations.

TODO: link to the used WebService

The first Entity found in the geonames.orf dataset is the capital city in Austria with an confidence level of 1.0:

urn:enhancement:entity-enhancement:id1
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "1.0"^^xsd:double ;
     fise:entity-label
             "Vienna"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/2761369/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPLC ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:relation
             urn:enhancement:text-enhancement:id1 .

With lower confidence levels there are a lot of other populated places with the name "Vienna" found in the geonames.org dataset.

urn:enhancement:entity-enhancement:id2
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Vienna"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/4496671/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:relation
             urn:enhancement:text-enhancement:id1 .
urn:enhancement:entity-enhancement:id3
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Vienna"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/4825976/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place , dbpedia:Settlement , dbpedia:PopulatedPlace , geonames:P.PPL ;
     fise:extracted-from
             urn:content-item:id1 ;
    fdc:relation
             urn:enhancement:text-enhancement:id1 .

Entity Hierarchy Enhancements

Entity Hierarchy Enhancements describe the regions that contain suggested Features based on the geonames.org dataset. Enhancements describing this hierarchy are added for all suggested entities with a confidence level above the value of "eu.iksproject.fise.engines.geonames.locationEnhancementEngine.min-hierarchy-score". The default value for this property is 0.8. The hierarchy web service provided by geonames.org is used to calculate the regions:

The following example shows the entity hierarchy enhancements for the suggested entity for Vienna (Autria). Please note the dc:requires relation to this EntityAnnotation at the end of each of the following enhancement.

First the enhancement for the continent Europe:

urn:enhancement:entity-hierarchy-enhancement:id1
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Europe"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/6255148/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place, geonames:L.CONT ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:requires
             urn:enhancement:entity-enhancement:id1 .

Next the enhancement for the country "Austria", classified as an independent political entry within geonames.org

urn:enhancement:entity-hierarchy-enhancement:id2
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Austria"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/2782113/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.PCLI ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:requires
             urn:enhancement:entity-enhancement:id1 .

Now three enhancement describing the different hierarchies of administrative regions within Austria. First the "Bundesland", next the "Stadtteil" and last the "Gemeindebezirk".

urn:enhancement:entity-hierarchy-enhancement:id3
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Vienna"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/2761367/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM1 ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:requires
             urn:enhancement:entity-enhancement:id1 .
urn:enhancement:entity-hierarchy-enhancement:id4
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Politischer Bezirk Wien (Stadt)"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/2761333/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM2 ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:requires
             urn:enhancement:entity-enhancement:id1 .
urn:enhancement:entity-hierarchy-enhancement:id5
     a      fise:EntityAnnotation , fise:Enhancement ;
     fise:confidence
             "0.42163702845573425"^^xsd:double ;
     fise:entity-label
             "Gemeindebezirk Innere Stadt"^^xsd:string ;
     fise:entity-reference
             http://sws.geonames.org/2775259/ ;
     fise:entity-type
             geonames:Feature , dbpedia:Place, dbpedia: AdministrativeRegion, geonames:A.ADM3 ;
     fise:extracted-from
             urn:content-item:id1 ;
    dc:requires
             urn:enhancement:entity-enhancement:id1 .

The last two hierarchy levels are no longer valid for the meaning of "Vienna" as selected by the TextAnnotation, but added, because the geonames.org dataset locations the Feature of cities exactly in the center. However if the TextAnnotation would describe a precise address such hierarchy levels would completely make sense.

Configuration

The LocationEnhancementEngine provides currently three configurations

  1. min-score (default = 0.5): The minimum score (confidence) that is required for entity suggestions
  2. max-location-enhancements" (default = 5): The maximum numbers of entity suggestions added (regardless if there would be more results with a score > min-score.
  3. min-hierarchy-score (default = 0.8): The minimum score (confidence) that is required that hierarchy enhancements are added for an suggested entity. To add hierarchy enhancements for all suggested entities min-hierarchy-score needs to be set to a value smaller equals than min-score.