Validation report 2 - Industrial Use Cases
Be aware: This is not the latest version of D6.2, the current version is available from within the IKS intranet for review purposes. As soon as the final version is available, it will be made available here again.
On this page, the results of IKS Task 6.2 "Validation of IKS through industrial use case application developers" are reported. The overall objective of this task is described in the Description of Work (p. 76) as follows: "To produce a robust and scalable architecture, the reference implementation of the Interactive Knowledge Stack will be tested through large scale real life field studies. In order to guarantee that an industry-strength Semantic CMS Stack is created, the IKS components will be tested and validated through industrial use case applications. The use cases will be selected among the ones which can be operated by knowledge workers with clear organisational needs and goals but without sophisticated knowledge of advanced formalisms or technologies."
As a result of the current developments within IKS, the project has produced several reusable components with clearly defined objectives. Details can be found in the following four public IKS deliverables (cf. http://wiki.iks-project.eu/index.php/Deliverables):
- D5.1 Presentation & Interaction Environment - Design and Implementation
- D5.2 Business Logic Modelling Environment - Design and Implementation
- D5.3 Semantic CMS wrappers - Design and Implementation
- D5.4 Semantic CMS reasoning and data persistence components
Next, the methodology of the validation procedure is described before an exemplary integration and validation story is presented. Then, the aggregated validation results of five industrial CMS companies are provided. Afterwards, the results are discussed and, finally, an outlook is provided with regard to the next steps of IKS component development.
Method and Validation Instrument
In order to validate the IKS stack, industry partners were encouraged to send feedback on their integration approaches and experience with selected IKS components, i.e. components that were relevant to them at the time of validation. The following components were the targets of the evaluation:
- Stanbol Enhancer and Enhancement Engines (doc: http://incubator.apache.org/stanbol/docs/trunk/enhancer.html and http://incubator.apache.org/stanbol/docs/trunk/engines.html)
- Stanbol EntityHub (doc: http://incubator.apache.org/stanbol/docs/trunk/entityhub.html)
- Stanbol Ontology Manager (maybe: http://wiki.iks-project.eu/index.php/KReS or http://localhost:8080/ontonet)
- Stanbol Rules (on local stanbol instance: http://localhost:8080/rules)
- Stanbol Reengineer
- Stanbol Reasoning
- Stanbol Factstore (on local instance: http://localhost:8080/factstore)
- Stanbol Contenthub (doc: http://incubator.apache.org/stanbol/docs/trunk/contenthub.html or on local instance: http://localhost:8080/contenthub)
- Stanbol CMS Adapter (doc: http://incubator.apache.org/stanbol/docs/trunk/cmsadapter.html)
- VIE Vienna IKS Editables (doc: https://github.com/IKS/VIE)
- annotate.js - Annotation frontend widget (doc: https://github.com/IKS/annotate.js)
Any form of validation feedback was allowed with regard to those components. However, the following questions were provided as a guide for writing the feedback and for structuring the validation results:
1. Do I understand the goal of this component?
2a. Does the website describe this component and its benefits in a way I understand properly?
2b. Is the documentation on how to use this component easy to understand?
3a. Does this component add value to my product?
3b. If this component does not add value, why not?
3c. Is the added value demonstrable / sellable to my customers?
3d. If this added value is demonstrable / sellable, please provide a simple use case.
4. Is the impact of this component on runtime infrastructure requirements acceptable?
5. Can this component be easily integrated with my product?
6. Is this component robust and functional enough to be used in production at the enterprise level?
7. Is the component's test suite good enough as a functionality and non-regression "quality gate"?
8. Can I participate in the development of this component and influence it in a fair and balanced way?
9. Do I know who I should talk to for support and future development of this component?
10. Am I confident that this component is still going to be available and maintained once the IKS funding period (Dec. 2012) is over?
In addition to those component-related questionnaire items, the following two questions were added in order to get feedback regarding the perceived acceptance of licenses used by the validated components, i.e. about details of copyright and patents:
- Is the license of the Apache Stanbol components (both copyright and patents) acceptable to me? (cf. http://www.apache.org/licenses/LICENSE-2.0)
- Is the license of the VIE components (both copyright and patents) acceptable to me? (cf. https://github.com/IKS/VIE/blob/master/LICENSE)
Those questions were made available on the IKS website as a reference for the industrial partners (cf. the Adoption Questionnaire: http://wiki.iks-project.eu/index.php/Adoption-Questionare). The validation period started in November 2011 and lasted until mid-February 2012. During this period, the following industrial CMS companies participated:
- Adobe, USA / Switzerland
- Alkacon, Germany
- Nuxeo, France
- Pisano / CIC, Germany
- Polymedia, Italy
- InsideOut10, Italy
The individual feedback with regard to selected IKS components is available in the Appendix of this report. With the exception of InsideOut10, where a phone interview was conducted by the IKS team, the validation feedback was received electronically. The interview with InsideOut10 is made available in the next section as an exemplary integration story of IKS components. The remaining feedback of the validation task was aggregated by component and discussed in detail (cf. Section Validation Feedback).
An Integration Story
The following validation interview with Andrea Volpini from InsideOut10 (cf. http://blog.iks-project.eu/tag/insideout10/) describes the integration of the Stanbol Enhancer with default engines and the Refactor Component of Stanbol Rules on the WordLift project, a WordPress plugin (cf. http://www.slideshare.net/cyberandy/wordlift-20-pitch-at-jboye11-in-aarhus). The interview was conducted by Reto Bachmann-Gmür in February 2012 and exemplifies the application of the validation instrument above.
- Reto: Do you understand what the Stanbol Enhancer is?
- Andrea: Yes.
- Reto: Does the Stanbol Enhancer add value to your product?
- Andrea: Yes, definitely. However, we couldn't really compare the results with commercial products like AlchemyAPI or OpenCalais: most of the results from the Stanbol Enhancer are based on dbpedia, while the commercial services rely on datasets other than dbpedia. We tried using OpenCalais and Zemanta within the Stanbol Enhancer: the processing takes significantly longer, and we saw no competitive advantage in using them in the enhancement chain rather than using them directly. The performance using the Enhancer without OpenCalais and Zemanta is quite good. Requests are often handled in less than a second. We are wondering if the Enhancement Engines could be improved by getting terms from sources other than dbpedia (Freebase, for instance). It would be valuable to get related concepts using an open index of the web or an existing corpus of documents as a source. For example, if I enter David's name (David Riccitelli), AlchemyAPI will also find Rupert, as we appear together on some websites; since there are no dbpedia pages about us, the Stanbol Enhancer will not suggest this association (and this deeply impacts the overall service). An option might be to integrate CommonCrawl (or Freebase). This could be an additional source alongside dbpedia and make the results of the Enhancer more comparable with those of commercial services.
- Reto: Is that added value demonstrable/sellable to your customers?
- Andrea: Yes, to a certain extent. We sold our configuration of WordLift (a WordPress plugin adding schema.org to posts and pages) to the Italian energy provider Enel. We must say that Enel has been an extremely supportive customer with a strong interest in fostering an open source initiative and they have been helping us in the whole project.
- Reto: Can you run the Stanbol Enhancer alongside or inside your product?
- Andrea: The Enhancer runs alongside our product which interacts with it using the REST API. However we had to add other OSGi components to extend the REST API and to fit our needs.
- Reto: Is the impact of Stanbol Enhancer on runtime infrastructure requirements acceptable?
- Andrea: We had to get a quite large machine. A bit higher than our usual standards but still reasonable; our production instance of Stanbol runs on RackSpace.
- Reto: How good is the Stanbol Enhancer API when it comes to integrating with your product?
- Andrea: There are two levels. The Stanbol OSGi infrastructure is very flexible: you can tailor almost everything and use the very latest features. It is easy to create components inside the OSGi infrastructure. The REST API is harder: you cannot choose another chain, and you have to submit text, as no URL is supported, so we had to add our own components. The REST API is easy to use but less powerful. A practical example: the responses from REST are not optimized for a production environment. We ran tests with news article pages with a large number of comments. Through the REST API you would get roughly 1MB of results. It will return more than one annotation for the same entity; it just isn't optimized to be fast and performant. By adding an OSGi component we could tailor the response to suit our needs: we just select the triples that matter to us. A 1MB response for a news page (including comments) is not acceptable.
- Reto: Is the Stanbol Enhancer robust and functional enough to be used in production at the enterprise level?
- Andrea: We had to work on it, especially from a security point of view. We are running a Stanbol instance in the cloud which interacts with the infrastructure of our customer, so we had to secure our Stanbol instance. We experienced stability issues related to dbpedia: running the same query multiple times yielded different results. We are now using the Entityhub with a local index, which solved the problem, but it was a nasty problem as it occurred in production, with the client's editorial team working on it and getting different results or no results at all.
- Reto: Is the Stanbol Enhancer test suite good enough as a functionality and non-regression "quality gate"? In other words: Are you confident that as long as the bar is green the functionality is still there?
- Andrea: The tests are not always up to date, so we have to do testing when deploying updates. Recently we found that the Enhancer wasn't working and we were surprised to be the first to find this out. So we rely on our final integration test, as the tests that come with Stanbol are incomplete. Passing the Stanbol tests is a preliminary achievement; further tests are needed to guarantee that things work as we expect them to. When we find an issue we mail it to the list. If it takes longer than half a day we roll back; that's what we have our version management of binaries for.
- Reto: Is the Apache Stanbol licence (both copyright and patents) acceptable to you?
- Andrea: Most flexible license ever. It's nice to work in an open source project where people are contributing without boundaries of bureaucracy and license; everybody is working towards a common objective. Our WordLift 2.0 plugin will be MIT licensed.
- Reto: Can you participate in the Enhancer's development and influence it in a fair and balanced way?
- Andrea: Yes absolutely.
- Reto: Do you know who you should talk to for support and future development of Stanbol Enhancer?
- Andrea: We always talk to Rupert and he talks to himself ;) Kidding aside Rupert's support is literally amazing and we are really thankful for his endless effort in supporting our project.
- Reto: Are you confident that the Enhancer is still going to be available and maintained once the IKS funding period is over?
- Andrea: This depends on the involvement within the Apache Foundation. The process of getting out of the borders is still too incomplete to make us feel confident. None of the industrial partners has yet taken the lead in building a sustainable community. It is yet to be seen how the project develops within Apache. We will certainly be involved in the future development but we are afraid that the IKS core team will not be fully replaced by others.
- Reto: What about the Refactor Engine?
- Andrea: Same as above apart from the use case. Here, WordLift calls the enhancer, the triples are then refactored to the schema.org ontology using the Refactor Engine.
- Reto: What about stability and performance?
- Andrea: We had a bit more stability issues and the CNR team is currently working to enhance the performance of the refactor ensuring it will not degrade the overall performances; we recently opened a ticket and we're directly working with the CNR team to fix the issues.
- Reto: ... and support?
- Andrea: We also have direct interaction with CNR.
- Reto: Any final thoughts?
- Andrea: We would really like the mentioned functionality of guessing relations from unstructured knowledge on the web. CommonCrawl might be part of a solution for this; Freebase also could bring a great advantage. The performance of the Enhancer is already valid but results are too narrow. Using the web (or LOD) would be a great improvement compared with using just the dbpedia index.
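The REST interaction described in the interview (plain text submitted to the Enhancer, RDF returned) could be prepared roughly as follows. This is only a minimal sketch: the endpoint path `/enhancer` on a local instance and the `application/rdf+xml` response format are assumptions that may differ between Stanbol releases, and `build_enhancer_request` is a hypothetical helper, not part of any Stanbol client library. The request is built but not sent, so the snippet runs without a Stanbol instance.

```python
import urllib.request

# Assumption: a local Stanbol instance; the actual path may differ by release.
ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=ENHANCER_URL, accept="application/rdf+xml"):
    """Prepare (but do not send) a POST of plain text to the Enhancer.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # assumed RDF serialization for the response
        },
        method="POST",
    )


request = build_enhancer_request("Enel is an Italian energy provider.")

# Sending the request requires a running Stanbol instance:
# with urllib.request.urlopen(request) as response:
#     rdf = response.read().decode("utf-8")
```

Because only text can be submitted this way, a client that wants to enhance a web page would first have to fetch the page and extract its text itself.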
Most use cases that were mentioned in the feedback use the Stanbol Enhancer and Enhancement Engines. In one case VIE is used without Stanbol components.
- Suggesting links and images
- Enriching content
- Adding annotations for search engine optimization (using Stanbol Rules for conversion to the schema.org ontology)
- Answer to: which relevant 'use case' was in mind before implementation? (we should reuse the wording/ point to findings from D1.3 (cf. intranet) if they fit to indicate 'progress')
- a vertical view on how IKS components have been integrated
- Links to Online-DEMOs if available, otherwise screenshots, screencasts, etc. (cf. http://wiki.iks-project.eu/index.php/Validation)
An overview summarizing all the answers can be found here: http://wiki.iks-project.eu/images/a/ae/Aggregated-overview.pdf
Looking at adoption and experienced benefit, two Stanbol components have results that are notably different from the others. On one side is the Enhancer, which is the best understood component with the highest number of mentioned use cases; on the other side is the Factstore, which is neither understood nor adopted.
The GUI components are also described as bringing value to the products or as being expected to bring value.
Documentation and Website
- There is a goal problem: the webpages for the Stanbol components are written from a very technical point of view. That makes it hard to get a picture of what a single component does. Most of the main goals I understand because I'm in touch (discussions) with the members of the IKS project. To understand the business use cases and benefits the webpages are not really good: No selling arguments.
- Multiple reports also criticize the documentation for not introducing relevant terms.
- Many components lack examples or tutorials; others lack a high-level description of the benefits of the component.
Groups of Components
To give a better overview of the answers we grouped the components into three categories:
- Annotation Backend Components
- Extended Semantic Components
- Front-end Components
Annotation Backend Components
Entityhub and Enhancer are mostly used together. The results for the Entityhub are mostly similar to those for the Enhancer, except for the documentation, which is considered significantly worse. This is consistent with the fact that in common usage scenarios the client code interacts with the Enhancer and the Entityhub is only used indirectly.
The problems mentioned also include the data format: RDF formats cannot yet be easily integrated because CMS support is lacking.
The REST API was criticized several times for not offering enough features, namely for the inability to filter the results. The Java API offers more features, so InsideOut10 could work around the limitation of the REST API by adding OSGi components to the framework and extending the REST API with them.
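The kind of result filtering that was missed can be illustrated with a small client-side sketch. It assumes the enhancement results have already been parsed into simple dictionaries with `entity`, `label` and `confidence` keys; this flat shape, the sample values and the function name are hypothetical and do not reflect the actual RDF structure returned by Stanbol. InsideOut10 implemented its filtering server-side as an OSGi component; the sketch only shows the selection logic, i.e. keeping one annotation per entity.

```python
def best_annotation_per_entity(annotations, min_confidence=0.0):
    """Keep only the highest-confidence annotation for each entity URI.

    Annotations below min_confidence are dropped entirely.
    """
    best = {}
    for annotation in annotations:
        if annotation["confidence"] < min_confidence:
            continue
        current = best.get(annotation["entity"])
        if current is None or annotation["confidence"] > current["confidence"]:
            best[annotation["entity"]] = annotation
    return list(best.values())


# Made-up sample data in the assumed shape, for illustration only.
sample = [
    {"entity": "http://dbpedia.org/resource/Paris", "label": "Paris", "confidence": 0.62},
    {"entity": "http://dbpedia.org/resource/Paris", "label": "Paris", "confidence": 0.91},
    {"entity": "http://dbpedia.org/resource/Enel", "label": "Enel", "confidence": 0.40},
]

trimmed = best_annotation_per_entity(sample, min_confidence=0.5)
all_entities = best_annotation_per_entity(sample)
```

Filtering on the client does not reduce the size of the transferred response, which is why a server-side filter was preferred in the reported integration.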
Another limitation lies in the DBpedia index. DBpedia data is considered to be more or less useless for most customer projects (Alkacon). The documentation on how to create and use custom indexes is considered hard to follow. InsideOut10 also considers the DBpedia basis too narrow and suggests that CommonCrawl might be part of a solution; Freebase could also bring a great advantage. They consider using the web (or LOD) to relate entities to be a great improvement compared with using just the DBpedia index.
Extended Semantic Components
Ontology Manager, Rules, Reengineering and Reasoning provide the capabilities to further process, enhance and modify RDF data.
Use case not currently needed; might be of use once we have more semantic data in the CMS.
- aggregated feedback per module / component > the raw results
- which components have been adapted and why
- why some components haven't yet been adapted
- Key findings / lessons learnt
The understanding and the adoption of the Enhancer as well as the GUI components is quite high. They are considered to add value to several products, even though this value is so far only demonstrable and not yet sellable. So the integration is considered promising but still in progress. This interpretation explains the lower adoption and understanding of the extended semantic components, as they presuppose existing RDF data and address issues that result from dealing with especially large sets of such data.
The Factstore is a small component that has not been under development recently. As no other component seems to depend on it, discontinuing the module should be evaluated.
While there is likewise no positive estimate for ongoing development after the funding period, the situation for the Contenthub is different from that of the Factstore. The Contenthub is being used by the CMS Adapter, which wasn't yet available when the evaluation started and was thus not part of the questionnaire. Nevertheless, Nuxeo gave feedback on that component as well. The component is also now integrated in the Adobe integration demo SlingStanbol. It is possible that the Contenthub will be in use mainly as a backend component similarly to the Entityhub, i.e. a component that is needed even though it is not typically directly accessed by the client code.
Four industrial members of the consortium and one industrial early adopter have given feedback on the software developed by IKS. Most industrial partners embrace the enhancement facilities of Stanbol and the client-side semantic library VIE. The feedback provides some concrete suggestions, both for increasing the competitiveness of the components and for improving accessibility and documentation.
As a next step, the concrete suggestions should be converted into issues added to the issue tracking systems of the respective software modules.
Appendix - Validation Results
Adobe, USA / Switzerland
Evaluation Feedback: http://wiki.iks-project.eu/images/d/d7/Feedback-adobe.pdf
Alkacon, Germany
Evaluation Feedback: http://wiki.iks-project.eu/images/c/ca/Feedback-alkacon.pdf
Pisano / CIC, Germany
Evaluation Feedback: http://wiki.iks-project.eu/images/e/eb/Feedback-pisano.pdf