ZemantaEnhancementEngine
From IKS Project
This engine is a work in progress. This page is currently used to discuss how the enhancements of http://www.zemanta.com/ can be mapped to the EnhancementStructure of FISE.
Zemanta Enhancements
Recognition
First, Zemanta defines a URI for the Recognition:
<http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634#Recognition6>
    a <http://s.zemanta.com/ns#Recognition> ;
    <http://s.zemanta.com/ns#anchor> "protein" ;
    <http://s.zemanta.com/ns#confidence> "0.4727" ;
    <http://s.zemanta.com/ns#doc> <http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634> ;
    <http://s.zemanta.com/ns#object> <http://s.zemanta.com/obj/8a740c0db3a6d9ecfbf24c761d45d1a7> .
The object relation links to the identified entity:
<http://s.zemanta.com/obj/8a740c0db3a6d9ecfbf24c761d45d1a7>
    a <http://s.zemanta.com/ns#Object> ;
    <http://s.zemanta.com/ns#target>
        <http://rdf.freebase.com/ns/en/protein> ,
        <http://en.wikipedia.org/wiki/Protein> ,
        <http://dbpedia.org/resource/Protein> ;
    <http://www.w3.org/2002/07/owl#sameAs>
        <http://rdf.freebase.com/ns/en/protein> ,
        <http://dbpedia.org/resource/Protein> .
Note that Zemanta also provides information about the entities referenced by the target attribute:
<http://en.wikipedia.org/wiki/Protein>
    a <http://s.zemanta.com/ns#Target> ;
    <http://s.zemanta.com/ns#targetType> <http://s.zemanta.com/targets#wikipedia> ;
    <http://s.zemanta.com/ns#title> "Protein" .
and
<http://rdf.freebase.com/ns/en/protein>
    a <http://s.zemanta.com/ns#Target> ;
    <http://s.zemanta.com/ns#targetType> <http://s.zemanta.com/targets#rdf> ;
    <http://s.zemanta.com/ns#title> "Protein" .
In contrast to FISE entity enhancements, which also include the type of the referenced entity, Zemanta provides only the title.
Category
Zemanta uses http://www.dmoz.org/ to provide categories for the submitted text:
<http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634#Category7>
    a <http://s.zemanta.com/ns#Category> ;
    <http://s.zemanta.com/ns#confidence> "0.0711" ;
    <http://s.zemanta.com/ns#doc> <http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634> ;
    <http://s.zemanta.com/ns#target> <http://d.zemanta.com/cats/dmoz/Top/Business/Materials/Metals/Zinc> .
The target relation links to a node that provides more information about the category:
<http://d.zemanta.com/cats/dmoz/Top/Business/Materials/Metals/Zinc>
    a <http://s.zemanta.com/ns#Target> ;
    <http://s.zemanta.com/ns#categorization> <http://s.zemanta.com/cat/dmoz> ;
    <http://s.zemanta.com/ns#targetType> <http://s.zemanta.com/targets#category> ;
    <http://s.zemanta.com/ns#title> "Top/Business/Materials/Metals/Zinc" .
Note that http://s.zemanta.com/cat/dmoz is an individual defined in the Zemanta ontology.
Related
Zemanta also provides enhancements for related documents:
<http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634#Related6>
    a <http://s.zemanta.com/ns#Related> ;
    <http://s.zemanta.com/ns#confidence> "0.0002" ;
    <http://s.zemanta.com/ns#doc> <http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634> ;
    <http://s.zemanta.com/ns#target> <http://www.neatorama.com/2010/05/20/scientist-claims-to-have-created-first-synthetic-life-form/> .
As with category enhancements, Zemanta provides additional information about the target:
<http://www.neatorama.com/2010/05/20/scientist-claims-to-have-created-first-synthetic-life-form/>
    a <http://s.zemanta.com/ns#Target> ;
    <http://s.zemanta.com/ns#published_datetime> "2010-05-20T22:42:04Z" ;
    <http://s.zemanta.com/ns#targetType> <http://s.zemanta.com/targets#article> ;
    <http://s.zemanta.com/ns#title> "Scientist Claims to Have Created First Synthetic Life Form" ;
    <http://s.zemanta.com/ns#zemified> "0" .
Keyword
This type of enhancement provides keywords that can be used, for example, to tag the text submitted to Zemanta:
<http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634#Keyword5>
    a <http://s.zemanta.com/ns#Keyword> ;
    <http://s.zemanta.com/ns#confidence> "0.0656" ;
    <http://s.zemanta.com/ns#doc> <http://d.zemanta.com/rid/774cfe98-c030-4f7f-8cfd-635953062634> ;
    <http://s.zemanta.com/ns#name> "Biology" ;
    <http://s.zemanta.com/ns#scheme> "general" .
Zemanta FISE Mappings
Mapping of Recognitions
In FISE, a recognition is typically represented by
- a TextAnnotation that defines a part of the text
- an EntityAnnotation that links to the entity and provides some information about the entity including the name and the type.
Zemanta uses a similar design. It represents recognitions by
- a Recognition that defines the text anchor. It contains only the selected text, with no start/end positions and no surrounding context. If the text contains the same word several times, different meanings of that word cannot be distinguished.
- an Object that acts as a mediator between the Recognition and one or more identified entities. It appears that an Object links to only a single entity. However, Zemanta can include links to several individuals describing that entity if they are connected by owl:sameAs, as shown in the example above.
- a Target representing metadata about the linked element. This metadata includes the title as well as the target type.
Mappings:
- Recognitions can be mapped to fise:TextAnnotations. For that, the start/end position(s) of the anchor within the text need to be calculated. The context should not be defined, because Zemanta does not use one. Note: a separate fise:TextAnnotation needs to be created for each occurrence of the anchor within the text.
- Objects can be mapped to fise:EntityAnnotations. For each fise:TextAnnotation created for the Recognition, a dc:relation reference needs to be added to the fise:EntityAnnotation. The target relations defined by the Object need to be added as fise:entity-reference relations to the EntityAnnotation. The title attribute of the Zemanta Target linked from the Object needs to be added as fise:entity-label to the EntityAnnotation. Zemanta does not provide information about the type of the referenced entity, so such information would need to be retrieved directly by following the Linked Data URIs. Note also that Zemanta may refer to more than one entity. If an owl:sameAs relation is defined between the entities, then all of them can be added as fise:entity-reference values to the fise:EntityAnnotation.
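Because Zemanta returns only the anchor string, the offsets have to be recovered by searching the submitted text. The following is a minimal sketch of that search, not the engine's actual code. The function name find_anchor_spans is hypothetical, and the sketch assumes an exact, case-sensitive match, non-overlapping occurrences, and an exclusive end offset (which agrees with the example on this page, where 290 - 283 equals the length of "protein").

```python
from typing import List, Tuple


def find_anchor_spans(text: str, anchor: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets for every occurrence of anchor in text.

    Assumptions (not confirmed by Zemanta or FISE): exact case-sensitive
    matching, non-overlapping occurrences, end offset is exclusive.
    """
    spans: List[Tuple[int, int]] = []
    if not anchor:
        return spans  # an empty anchor would otherwise match everywhere
    pos = text.find(anchor)
    while pos != -1:
        end = pos + len(anchor)
        spans.append((pos, end))
        pos = text.find(anchor, end)  # continue after the current match
    return spans


# One fise:TextAnnotation would be created per returned span.
spans = find_anchor_spans("protein folds; a protein binds", "protein")
```

Each returned pair would supply the fise:start and fise:end values of one TextAnnotation.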
Example:
This example is generated by the current implementation of the ZemantaEnhancementEngine and shows the FISE enhancements for the recognition of "protein" used in the Zemanta recognition example above.
First, the fise:TextAnnotation. Note that the start and end positions are calculated by searching for the anchor. If the anchor appears several times in the content, one fise:TextAnnotation is generated for each occurrence. If another engine has already created a TextAnnotation for an occurrence, the existing annotation is used instead of creating a new one.
<urn:enhancement-42b5bdc9-b5d5-a88f-8dd5-ed0d73a0bd80>
a <http://fise.iks-project.eu/ontology/TextAnnotation> , <http://fise.iks-project.eu/ontology/Enhancement> ;
<http://fise.iks-project.eu/ontology/confidence>
"0.466"^^<http://www.w3.org/2001/XMLSchema#double> ;
<http://fise.iks-project.eu/ontology/end>
"290"^^<http://www.w3.org/2001/XMLSchema#int> ;
<http://fise.iks-project.eu/ontology/extracted-from>
<urn:eu.iksproject.fise:test:engines.zemanta:content-item-cedd19e5-f719-ce04-5510-f1cda95e4aa2> ;
<http://fise.iks-project.eu/ontology/selected-text>
"protein"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://fise.iks-project.eu/ontology/start>
"283"^^<http://www.w3.org/2001/XMLSchema#int> ;
<http://purl.org/dc/terms/created>
"2010-07-06T08:18:10.950+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
<http://purl.org/dc/terms/creator>
"eu.iksproject.fise.engines.zemanta.impl.ZemantaEnhancementEngine"^^<http://www.w3.org/2001/XMLSchema#string> .
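The reuse of existing TextAnnotations described above can be sketched as a get-or-create lookup. This is only an illustration: the function name, the dictionary layout, and the choice to treat two annotations as the same occurrence when their start and end offsets are equal are all assumptions, not the behavior of the actual engine.

```python
import uuid
from typing import Dict, Tuple

Span = Tuple[int, int]


def get_or_create_text_annotation(
    existing: Dict[Span, dict], start: int, end: int, selected_text: str
) -> dict:
    """Return the annotation for (start, end), creating it only if missing.

    Assumption: annotations created by other engines are indexed in
    `existing` by their (start, end) offsets.
    """
    key = (start, end)
    annotation = existing.get(key)
    if annotation is None:
        annotation = {
            "id": f"urn:enhancement-{uuid.uuid4()}",  # hypothetical id scheme
            "start": start,
            "end": end,
            "selected-text": selected_text,
        }
        existing[key] = annotation
    return annotation


annotations: Dict[Span, dict] = {}
first = get_or_create_text_annotation(annotations, 283, 290, "protein")
second = get_or_create_text_annotation(annotations, 283, 290, "protein")
```

In this sketch the second call returns the annotation created by the first, so the fise:EntityAnnotation can point to it through dc:relation without duplicating it.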
Second, the fise:EntityAnnotation that represents the entities recognized by Zemanta. Please note that multiple values of the fise:entity-reference property are only added if Zemanta provides an owl:sameAs mapping between those entities. Entities are only added if the zemanta:targetType equals http://s.zemanta.com/targets#rdf.
<urn:enhancement-d7ad76ce-4410-56ee-6a5d-ea5d9278a6c4>
a <http://fise.iks-project.eu/ontology/EntityAnnotation> , <http://fise.iks-project.eu/ontology/Enhancement> ;
<http://fise.iks-project.eu/ontology/confidence>
"0.466"^^<http://www.w3.org/2001/XMLSchema#double> ;
<http://fise.iks-project.eu/ontology/entity-label>
"Protein"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://fise.iks-project.eu/ontology/entity-reference>
<http://rdf.freebase.com/ns/guid/9202a8c04000641f800000000002ef58> , <http://dbpedia.org/resource/Protein> ;
<http://fise.iks-project.eu/ontology/extracted-from>
<urn:eu.iksproject.fise:test:engines.zemanta:content-item-cedd19e5-f719-ce04-5510-f1cda95e4aa2> ;
<http://purl.org/dc/terms/created>
"2010-07-06T08:18:10.952+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
<http://purl.org/dc/terms/creator>
"eu.iksproject.fise.engines.zemanta.impl.ZemantaEnhancementEngine"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://purl.org/dc/terms/relation>
<urn:enhancement-42b5bdc9-b5d5-a88f-8dd5-ed0d73a0bd80> .
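The selection rule for fise:entity-reference values stated before this listing can be read as a small filter. The sketch below is one possible interpretation under stated assumptions, not the engine's code: the function name and the input shapes (a list of (URI, targetType) pairs and a set of owl:sameAs URIs taken from the Zemanta Object) are hypothetical, and falling back to the first RDF target when no owl:sameAs link exists is a guess.

```python
from typing import Iterable, List, Set, Tuple

RDF_TARGET_TYPE = "http://s.zemanta.com/targets#rdf"


def select_entity_references(
    targets: Iterable[Tuple[str, str]], same_as: Set[str]
) -> List[str]:
    """Pick the URIs to emit as fise:entity-reference values.

    targets: (uri, targetType) pairs of the Zemanta Object (assumed shape).
    same_as: URIs linked from the Object via owl:sameAs (assumed shape).
    """
    # Only targets typed as RDF resources are considered at all.
    rdf_uris = [uri for uri, target_type in targets if target_type == RDF_TARGET_TYPE]
    if len(rdf_uris) <= 1:
        return rdf_uris
    # Several candidates: keep only those connected by owl:sameAs.
    linked = [uri for uri in rdf_uris if uri in same_as]
    # Assumed fallback: without sameAs links, emit a single reference.
    return linked if linked else rdf_uris[:1]


refs = select_entity_references(
    [
        ("http://rdf.freebase.com/ns/en/protein", RDF_TARGET_TYPE),
        ("http://en.wikipedia.org/wiki/Protein", "http://s.zemanta.com/targets#wikipedia"),
        # The targetType of the DBpedia target is not shown on this page; rdf is assumed.
        ("http://dbpedia.org/resource/Protein", RDF_TARGET_TYPE),
    ],
    {"http://rdf.freebase.com/ns/en/protein", "http://dbpedia.org/resource/Protein"},
)
```

With these inputs the Wikipedia page is dropped because its target type is not rdf, and the two remaining URIs are kept because both are linked through owl:sameAs.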
Mapping of Categories
Zemanta category enhancements do not refer to a specific occurrence in the submitted text. The annotated entity would represent the category within dmoz. However, dmoz categories are currently not available as linked data, so Zemanta uses a workaround and describes dmoz categories with non-dereferenceable URLs (this mail discusses the issue: http://www.mail-archive.com/public-lod@w3.org/msg01358.html).
A pragmatic solution would be to remove the leading "Top/" and prepend the dmoz base URL "http://www.dmoz.org/" to the literal provided by the zemanta:title attribute of the zemanta:Target description, if the zemanta:targetType equals http://s.zemanta.com/targets#category and the categorization equals http://s.zemanta.com/cat/dmoz. This would result in the URL of the category page on the dmoz website.
In general, a Zemanta category annotation should be mapped to a fise:EntityAnnotation. As an alternative, we could create a dedicated annotation type within the FISE enhancement structure. A corresponding suggestion has already been added to EnhancementStructure#Annotation_of_Categorizations.
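The URL rewriting proposed here can be sketched in a few lines. The function name dmoz_category_url is hypothetical, and returning None for targets that are not dmoz categories is an assumption made for this illustration.

```python
from typing import Optional

DMOZ_BASE_URL = "http://www.dmoz.org/"
CATEGORY_TARGET_TYPE = "http://s.zemanta.com/targets#category"
DMOZ_CATEGORIZATION = "http://s.zemanta.com/cat/dmoz"
TOP_PREFIX = "Top/"


def dmoz_category_url(
    title: str, target_type: str, categorization: str
) -> Optional[str]:
    """Build the dmoz page URL from a zemanta:title literal.

    Returns None when the target is not a dmoz category (assumed behavior).
    """
    if target_type != CATEGORY_TARGET_TYPE or categorization != DMOZ_CATEGORIZATION:
        return None
    # Drop the leading "Top/" segment if present, then prepend the base URL.
    path = title[len(TOP_PREFIX):] if title.startswith(TOP_PREFIX) else title
    return DMOZ_BASE_URL + path


url = dmoz_category_url(
    "Top/Business/Materials/Metals/Zinc",
    CATEGORY_TARGET_TYPE,
    DMOZ_CATEGORIZATION,
)
```

For the Zinc category used on this page, the result is http://www.dmoz.org/Business/Materials/Metals/Zinc.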
Example:
As an example, the following listing contains the fise:EntityAnnotation representing the Zemanta Category annotation used in the example above. Currently the dc:type fise:Category is added to the annotation to indicate that this entity annotation represents a categorization. The fise:entity-reference property contains the URL of the corresponding page on the dmoz website.
<urn:enhancement-71c8f6bc-e341-1946-6c30-ba4cddebdd44>
a <http://fise.iks-project.eu/ontology/EntityAnnotation> , <http://fise.iks-project.eu/ontology/Enhancement> ;
<http://fise.iks-project.eu/ontology/confidence>
"0.0711"^^<http://www.w3.org/2001/XMLSchema#double> ;
<http://fise.iks-project.eu/ontology/entity-label>
"Top/Business/Materials/Metals/Zinc"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://fise.iks-project.eu/ontology/entity-reference>
<http://www.dmoz.org/Business/Materials/Metals/Zinc> ;
<http://fise.iks-project.eu/ontology/entity-type>
<http://s.zemanta.com/ns#Category> ;
<http://fise.iks-project.eu/ontology/extracted-from>
<urn:eu.iksproject.fise:test:engines.zemanta:content-item-32b2b272-1511-125b-99c3-976a3205b8ee> ;
<http://purl.org/dc/terms/created>
"2010-07-20T12:34:10.429+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
<http://purl.org/dc/terms/creator>
"eu.iksproject.fise.engines.zemanta.impl.ZemantaEnhancementEngine"^^<http://www.w3.org/2001/XMLSchema#string> ;
<http://purl.org/dc/terms/type>
<http://fise.iks-project.eu/ontology/Category> .
Mapping of Keywords
A mapping of Keywords is currently not planned.

