FISE GOSS iCM Integration Validation and Online Demo

This page describes the ongoing validation and demonstration of the integration of IKS FISE into the GOSS iCM CMS.

Overview

The GOSS Semantic Demo is a demonstration of how iCM can be integrated with the IKS/FISE entity extraction and enhancement framework to enhance articles with semantic data. This information is then used to generate RDFa mark-up for those pages containing the article text. As well as FISE it also makes use of the Jena/SDB triple store, the Joseki SPARQL query interface and Pubby to publish information about the entities we create.

The demo adds a new tab, labelled "Semantic", to the iCM article editor (tabbed view only). To retrieve entities from FISE, the user expands the tree in the left-hand pane. This causes the combined article text (including summary etc.) to be passed to the FISE server, which extracts and enhances entities (people, places and organisations) and returns them to iCM. The expanded tree then shows the entities grouped into people, places and organisations. Within each group, the appearance of an entity reflects its properties as follows:

  • Entities that are already known to iCM are shown with a solid black icon representing the type of the entity. They can be expanded to show any associated resources. If they have already been related to the article, their text is shown in bold.
  • Entities that are not already known to iCM are shown with a grey icon. If they are enhanced entities, they can be expanded to show the external URIs they link to. A context menu allows external URIs to be previewed.
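The round trip to the FISE server described above could be sketched roughly as follows. This is a minimal illustration, not the iCM implementation: the host, the endpoint path, the RDF/XML response format and the function names are all assumptions, and FISE and Stanbol builds have exposed the enhancer under different paths.

```python
import urllib.request

# Assumption: a Stanbol-style enhancer endpoint on a local server.
# The real host and path depend on the FISE/Stanbol build in use.
ENHANCER_URL = "http://localhost:8080/enhancer"  # hypothetical


def build_enhance_request(article_parts, url=ENHANCER_URL):
    """Combine the article fields and wrap them in a POST request."""
    # Join the non-empty parts (heading, summary, body, ...) into one
    # plain-text document, as the demo does with the combined article text.
    text = "\n\n".join(p.strip() for p in article_parts if p and p.strip())
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Assumption: the enhancements are requested as RDF/XML.
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def fetch_enhancements(request, timeout=30):
    """Send the request and return the response body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_enhance_request(
    ["Council opens new library", "", "The new library in Leeds opens on Monday."]
)
```

Building the request separately from sending it keeps the text-combining step easy to check without a server. Calling `fetch_enhancements(request)` would then return the enhancement graph for parsing.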

An entity can be double-clicked or dragged and dropped onto a node in the right-hand pane to associate it with the article. Double-clicking a currently associated entity breaks the link. Entities that are known to iCM are added to the article immediately. When a new entity is added, a dialog box allows the user to specify the category for the new entity and which external URIs should be associated with it. Selecting specific external URIs lets the user resolve ambiguity about which entity is being referred to.

Newly added entities are immediately added to the iCM metadata system. An event handler adds newly created entities to the triple store, typically within about 15 seconds. The actual association between entities and articles is not recorded until the article is submitted. At that point the association between the metadata representing the entities and the article is committed to the main iCM database. A few seconds later, another event handler adds information about the article (including its links to the associated entities and its Dublin Core metadata) to the triple store.
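As a rough illustration of the kind of statements the second event handler might write, the sketch below serialises an article's Dublin Core metadata and its entity links as N-Triples. The URIs, the function names and the use of `dcterms:subject` to link an article to an entity are assumptions made for the example; the predicates the demo actually uses are not stated here.

```python
DCTERMS = "http://purl.org/dc/terms/"


def _literal(value):
    """Escape a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def article_triples(article_uri, title, creator, entity_uris):
    """Return N-Triples lines describing an article and its linked entities."""
    lines = [
        f"<{article_uri}> <{DCTERMS}title> {_literal(title)} .",
        f"<{article_uri}> <{DCTERMS}creator> {_literal(creator)} .",
    ]
    # Assumption: dcterms:subject stands in for whatever predicate the
    # demo uses to link an article to an associated entity.
    for entity_uri in entity_uris:
        lines.append(f"<{article_uri}> <{DCTERMS}subject> <{entity_uri}> .")
    return lines


triples = article_triples(
    "http://example.org/article/1",       # hypothetical article URI
    'Say "hi"',
    "A. Editor",
    ["http://example.org/entity/leeds"],  # hypothetical entity URI
)
```

The resulting lines could be loaded into a store such as Jena/SDB with any standard N-Triples loader.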

Once the article is published, it can be viewed in the context of the site. Selecting "view source" on the article page reveals a hidden <div> on the page containing RDFa markup. Pointing a tool such as the W3C RDFa extractor (http://www.w3.org/2007/08/pyRdfa/) or the Sindice inspector (http://inspector.sindice.com/) at the page shows the extracted information in a formatted form. The tool also provides links to the URIs that iCM has generated for entities; clicking on one of these returns a page of information about the entity, generated by Pubby. It is also possible to query the triple store directly using SPARQL.
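To make the hidden RDFa block more concrete, here is a small sketch of how such a <div> could be rendered. It is not the markup iCM emits: the attribute layout, the `dcterms` vocabulary choice, the example URIs and the function name are assumptions for illustration.

```python
from html import escape


def rdfa_block(article_uri, title, entity_uris):
    """Render a hidden <div> carrying RDFa statements about an article."""
    # One empty link per associated entity; rel/href form an RDFa triple
    # whose subject is the enclosing "about" URI.
    links = "".join(
        f'<a rel="dcterms:subject" href="{escape(uri, quote=True)}"></a>'
        for uri in entity_uris
    )
    return (
        '<div xmlns:dcterms="http://purl.org/dc/terms/" '
        f'about="{escape(article_uri, quote=True)}" style="display:none">'
        f'<span property="dcterms:title">{escape(title)}</span>'
        f"{links}</div>"
    )


markup = rdfa_block(
    "http://example.org/article/1",       # hypothetical article URI
    "Fish & chips",
    ["http://example.org/entity/leeds"],  # hypothetical entity URI
)
```

An RDFa extractor pointed at a page containing this fragment would be expected to report one title statement and one subject link for the article.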

Demo

A YouTube video showing the integration in action can be seen here.

To access the online demo system please send an email to tom.cooke or gary.ratcliffe at gossinteractive.com. We will provide you with a user account and access details.

Validation

A copy of the validation report can be downloaded from here.

The report documents the results obtained when using IKS FISE to identify entities, and external resources relating to those entities, in real-world data from various UK-based organisations:

  • Local government in England, Scotland and Wales.
  • Non-profit central government organisations.
  • Utility company.
  • Police force.

It also compares the performance of different entity extraction engines.

Lessons learned

  • Modular approach is good.
  • Build process could be improved to provide a more 'cut-down' distribution, for example without the UI.
  • Straightforward to interface with service via REST interface.
  • Steep learning curve for implementing new engines. Maybe a tutorial or similar could be provided.
  • Dependency on potentially unreliable external resources can be a problem. Entity hub is starting to address this.
  • Real challenge is the availability of good quality domain specific external resources.

Software components used

The initial development work started with the FISE system. The validation report was written based on a build from the Apache Stanbol (incubation) project.

A number of additional extraction and enhancement engines, written by GOSS, were used during the evaluation:

  • OpenCalais Named Entity Extraction
  • OpenNLP-1.5 Named Entity Extraction
  • JNet-1.5 Named Entity Extraction
  • DBPedia Lookup Enhancement
  • Freebase Enhancement
  • OpenCyc Enhancement

In addition, a triple store and SPARQL endpoint outside of FISE/Stanbol were used.
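For readers who want to query such an external endpoint, the sketch below builds a SPARQL protocol GET request asking which entities are linked to an article. The endpoint URL, the JSON results format, the `dcterms:subject` predicate and the article URI are assumptions made for the example, not details of the GOSS installation.

```python
import urllib.parse
import urllib.request

# Assumption: a SPARQL endpoint (for example one served by Joseki)
# reachable at this hypothetical URL.
SPARQL_ENDPOINT = "http://localhost:2020/sparql"


def entities_for_article_query(article_uri):
    """Build a SELECT query listing the entities linked to an article."""
    return (
        "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
        "SELECT ?entity WHERE {\n"
        f"  <{article_uri}> dcterms:subject ?entity .\n"
        "}"
    )


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Encode the query as a SPARQL protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


query = entities_for_article_query("http://example.org/article/1")  # hypothetical URI
sparql_request = build_sparql_request(query)
```

Passing `sparql_request` to `urllib.request.urlopen` would send the query to a running endpoint.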

Industrial Validation Metrics

Editing in progress.

| Validation Questions | Yes / Strongly Agree | Agree | Disagree | No / Strongly Disagree | Don't know / n.a. | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Do I understand what IKS FISE is? | Yes | | | | | |
| Does IKS FISE add value to my product? | | Yes | | | | |
| Is that added value demonstrable/sellable to my customers? | | Yes | | | | A compelling message about the benefits of semantic markup is still needed. |
| Can I run IKS FISE alongside or inside my product? | Yes | | | | | OSGi adds a steep learning curve if you wish to extend it. |
| Is the impact of IKS FISE on runtime infrastructure requirements acceptable? | | Yes | | | | UI components are not required in a production environment where the user's interaction with FISE/Stanbol is always via the CMS. |
| How good is the IKS FISE API when it comes to integrating with my product? | | Yes | | | | We didn't encounter any problems. |
| Is IKS FISE robust and functional enough to be used in production at the enterprise level? | | | | | Yes | Very domain dependent. The current extraction and enhancements would not give a high enough relevant entity hit rate to satisfy many of our clients. |
| Is the IKS FISE test suite good enough as a functionality and non-regression "quality gate"? | | | | | Yes | We've not gone through enough FISE/Stanbol releases to judge yet. The FISE to Stanbol move introduced quite a bit of change during our integration work. |
| Is the IKS FISE licence (both copyright and patents) acceptable to me? | | Yes | | | | |
| Can I participate in IKS FISE's development and influence it in a fair and balanced way? | | Yes | | | | We've seen nothing to suggest otherwise. |
| Do I know who I should talk to for support and future development of IKS FISE? | | Yes | | | | Also dependent on the successful transition from incubation to full Apache project. |
| Am I confident that IKS FISE is still going to be available and maintained once the IKS funding period is over? | | | | | Yes | It all depends on getting (CMS) developers to support the Apache project. That includes us. Hopefully yes. |
| Does IKS help in retrieving relevant information fast and efficiently for decision making / to solve problems? | | | | | | |
| Does IKS help in creation of business relevant information that can be shared fast and efficiently / making implicit knowledge explicit to increase competitive advantage? | | | | | | |
| Does IKS help managing the business processes to increase the flexibility for changing customer needs or processes? | | | | | | |
| Does IKS help to increase contacts with potential customers to acquire new customers or to increase customer loyalty? | | | | | | |
| Does IKS help in selling complex products, which require individual, fast and efficient configuration? | | | | | | |
| Does IKS help in communication of events to attract new customers and to inform/take care of existing ones? | | | | | | |
| Does IKS help establishing personalized customer relationship management? | | | | | | |