IKS Blog  The Semantic CMS Community

Zaizi Alfresco FISE Integration Validation and Online Demo

From IKS Project

Overview

This page describes the integration of IKS FISE with Alfresco ECM by Zaizi.

All documentation, including links to the online demo, is available at http://fise.zaizi.com/.

The code is available for download at http://code.google.com/p/alfresco-fise/.

Introduction

Our first integration between IKS FISE (later Apache Stanbol) and Alfresco focuses on the use of the enhancement engines. It shows how content within Alfresco can be semantically enriched and how this semantic information can be used to improve document organization, classification and search.

Features

  • All content uploaded into Alfresco via the Web client, CIFS, IMAP, FTP, WebDAV, etc. is converted to text and posted to a standalone FISE server.
  • When viewing content in Alfresco, the entities extracted by FISE are shown.
  • Selecting an entity lists all other content classified with that entity.
  • A graph of related entities and documents can be displayed.
  • The following document formats are handled: MS Office, Open Office, Adobe PDF, HTML, plain text, etc.
  • All requests to the FISE (Stanbol) server are performed using a custom REST Java client that is easy to integrate within Alfresco.
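
As a rough illustration of the last point, the request such a client sends could look like the sketch below. It is written in Python only to keep it short, and it is not the Zaizi Java client. The base URL, the `/enhancer` path (the path depends on the FISE or Stanbol version in use), the `Accept` type and the function names are assumptions made for illustration.

```python
import urllib.request


def build_enhance_request(text, base_url="http://localhost:8080", path="/enhancer"):
    """Build (but do not send) a POST request carrying plain text to the enhancer.

    base_url and path are assumptions; adjust them to the target server.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Ask for the enhancement results as RDF/XML (assumed to be supported).
            "Accept": "application/rdf+xml",
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the text to the enhancer and return the raw RDF response body."""
    request = build_enhance_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from sending makes the client easy to check without a running server; only `enhance` touches the network.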

Architecture

  • The FISE integration code listens for new content creation in Alfresco.
  • When content is created, it starts a background thread to convert the document to text and post it to FISE.
  • Content creation / upload needs to be fast, which is why we use a background thread.
  • This asynchronous approach means we cannot assume that we can update the content once FISE returns the extracted entities, as the content may by then be locked, checked out or versioned.
  • So content is stored in FISE, and the extracted entities are not stored in Alfresco.
  • Every time content is accessed, we call FISE to get the entities.
  • We run a SPARQL query on FISE to get the related content IDs for an entity. These are filtered in Alfresco down to the content the user has read access to.
  • Only the content text is sent to FISE. No metadata extracted from documents is sent.
  • Administrators can select which content they want enhanced by FISE by applying an aspect via Alfresco content rules.
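
The related-content lookup described above can be sketched as follows, in Python for brevity. This is not the query used in the integration: the `fise:` namespace and the `entity-reference` / `extracted-from` property names are assumptions about the enhancement vocabulary, and `can_read` is a hypothetical stand-in for Alfresco's permission check.

```python
FISE_NS = "http://fise.iks-project.eu/ontology/"  # assumed vocabulary namespace


def related_content_query(entity_uri):
    """Return a SPARQL query for the content items annotated with entity_uri."""
    # Reject characters that would break out of the <...> IRI in the query.
    if any(ch in entity_uri for ch in '<>"{} \n'):
        raise ValueError("invalid entity URI: %r" % entity_uri)
    return (
        f"PREFIX fise: <{FISE_NS}>\n"
        "SELECT DISTINCT ?content WHERE {\n"
        "  ?annotation fise:entity-reference <" + entity_uri + "> ;\n"
        "              fise:extracted-from ?content .\n"
        "}"
    )


def filter_readable(content_ids, can_read):
    """Keep only the IDs the current user may read.

    Order is preserved and duplicates are dropped. can_read is a
    hypothetical callable(content_id) -> bool.
    """
    seen = set()
    readable = []
    for content_id in content_ids:
        if content_id not in seen and can_read(content_id):
            readable.append(content_id)
        seen.add(content_id)
    return readable
```

Filtering on the Alfresco side keeps access control in one place: FISE only ever sees content IDs and text, never permissions.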

Issues:

  • FISE does not support concurrent creation of content yet. The Alfresco thread sleeps for a few seconds between content posts.
  • FISE stores content in memory, so content is lost after a restart. This is a problem when entities are not stored in Alfresco.
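
One way to serialize posts as described in the first issue is a single worker thread fed by a queue, so uploads return immediately while documents reach FISE one at a time. The Python sketch below is illustrative only and does not address the second issue. The class name, the `post` callable and the pause length are hypothetical.

```python
import queue
import threading
import time


class EnhancementWorker:
    """Posts queued documents one at a time, pausing between posts."""

    def __init__(self, post, pause_seconds=2.0):
        self._post = post              # hypothetical callable(content_id, text)
        self._pause = pause_seconds    # assumed pause between consecutive posts
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, content_id, text):
        """Called from the upload path; returns without waiting for FISE."""
        self._queue.put((content_id, text))

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:           # sentinel placed by close()
                return
            try:
                self._post(*item)
            except Exception as exc:
                # A failed post must not kill the worker; report and continue.
                print(f"enhancement failed for {item[0]}: {exc}")
            time.sleep(self._pause)
```

Because only one thread ever calls `post`, the server never sees concurrent content creation, at the cost of a growing backlog during bulk uploads.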

Roadmap / Ideas

  • Synchronous integration and creation of entities in Alfresco.
  • CMIS tracker in FISE to track and get new content from CMIS compliant repositories.
  • FISE support for metadata.


Validation

We integrated the Stanbol Enhancement Engine with Alfresco to provide semantic enrichment of documents in this ECM. Following the IKS stack specification, we consumed Stanbol's enhancer REST API. Because Alfresco is built entirely in Java, we decided to develop a Java client for the Stanbol enhancement services. This client can consume any enhancer or enhancement chain service configured in the target Stanbol instance: it issues REST requests and converts the responses into ready-to-use Java objects. The Stanbol client has recently been extended to also support the EntityHub and ContentHub services. It is open source and hosted on GitHub: https://github.com/zaizi/apache-stanbol-client.

Like many other Early Adopters in the first stages, we only integrated the Content Enhancement component. Our purpose was to develop a proof of concept showing our potential customers how entities and structured content can be extracted from unstructured text, and what benefits semantic technologies bring to Enterprise Content Management.

Here we found the typical trade-off between accuracy and performance. Using the current OpenNLP models on large documents, we got too many irrelevant and incorrect entities. This could be improved by applying higher thresholds to the engines. Also, to release a production-ready integration, we should move to a "suggestions model" in which the user filters and refines the extracted entities through a custom user interface within Alfresco Share. VIE is a good candidate for such a UI but, as far as we know, refining the extracted entities is not compatible with ContentHub (because enhancement runs automatically when a document is stored), and ContentHub is necessary for semantic search.
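
To illustrate the threshold idea, the Python sketch below drops low-confidence entities on the client side after enhancement. This is a swapped-in simplification: the text above suggests raising thresholds in the engines themselves, which is a server-side setting not shown here. The dictionary keys (`label`, `confidence`) and the default threshold are hypothetical.

```python
def filter_by_confidence(annotations, threshold=0.5):
    """Keep annotations at or above the threshold, most confident first.

    Each annotation is assumed to be a dict with a numeric "confidence";
    entries without one are treated as confidence 0.0 and dropped
    whenever the threshold is positive.
    """
    kept = [a for a in annotations if a.get("confidence", 0.0) >= threshold]
    return sorted(kept, key=lambda a: a.get("confidence", 0.0), reverse=True)


# Made-up sample data, for illustration only.
sample = [
    {"label": "Paris", "confidence": 0.92},
    {"label": "Table", "confidence": 0.18},
    {"label": "Alfresco", "confidence": 0.71},
]
suggestions = filter_by_confidence(sample, threshold=0.6)
```

In a "suggestions model", a list like `suggestions` would be what the user reviews and refines, rather than something applied automatically.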

Lessons learned

With Apache Stanbol we could create richer interfaces and features that improve the user experience. Our ambition is to provide an ECM that lets the user “Do less. Get more”. If the user experience is good, users will be more productive.

Zaizi sees potential in Apache Stanbol, even though we consider that it is not production ready, at least not for big systems. It works very well in closed domains, but it is not very accurate at extracting entities from some texts. If you want to build a system to do things for you, it must be reliable.

The product has improved in the last two years, and lately the community has been very active. We expect to see a really good and robust product very soon, so we are already presenting it to our customers, and the feedback is very positive.

Software components used

  • Alfresco ECM
  • IKS FISE and later Apache Stanbol
  • Apache Solr

Industrial Validation Metrics

Validation Questions

Answer scale: Yes / Strongly Agree, Agree, Disagree, No / Strongly Disagree, Don't know / n.a.

  • Do I understand what IKS FISE is? Yes / Strongly Agree
  • Does IKS FISE add value to my product? Agree
  • Is that added value demonstrable/sellable to my customers? Agree
  • Can I run IKS FISE alongside or inside my product? Yes / Strongly Agree
  • Is the impact of IKS FISE on runtime infrastructure requirements acceptable? Agree
  • How good is the IKS FISE API when it comes to integrating with my product? Agree
  • Is IKS FISE robust and functional enough to be used in production at the enterprise level? Disagree
  • Is the IKS FISE test suite good enough as a functionality and non-regression "quality gate"? Agree
  • Is the IKS FISE licence (both copyright and patents) acceptable to me? Yes / Strongly Agree
  • Can I participate in IKS FISE's development and influence it in a fair and balanced way? Yes / Strongly Agree
  • Do I know who I should talk to for support and future development of IKS FISE? Yes / Strongly Agree
  • Am I confident that IKS FISE will still be available and maintained once the IKS funding period is over? Agree
  • Does IKS help in retrieving relevant information fast and efficiently for decision making / to solve problems? Agree
  • Does IKS help in creation of business relevant information that can be shared fast and efficiently / making implicit knowledge explicit to increase competitive advantage? Agree
  • Does IKS help managing the business processes to increase the flexibility for changing customer needs or processes? Yes / Strongly Agree
  • Does IKS help to increase contacts with potential customers to acquire new customers or to increase customer loyalty? Agree
  • Does IKS help in selling complex products, which require individual, fast and efficient configuration? Agree
  • Does IKS help in communication of events to attract new customers and to inform/take care of existing ones? Agree
  • Does IKS help establishing personalized customer relationship management? X

