Early Adoption and Validation Proposal
netlabs.org is working on a framework that facilitates interfacing with RDF-based data. We rely heavily on ontologies in our framework: we cache them and do a fair amount of reasoning on top of them to figure out how data can be interfaced and shown in the optimal way. This includes figuring out relationships between ontology classes and attributes. Right now this reasoning is fairly minimal and mainly done in code, which means we analyze the triples ourselves.
The goal of the Early Adopter's project is to implement the reasoning parts on top of Stanbol based RESTful services. This will allow us to use the power of the Stanbol rules and reasoning services to infer additional relationships between ontologies and also use the ontology manager to cache commonly used ontologies with Stanbol to speed up internal services in our framework.
We will use Stanbol to:
- cache ontologies: Right now we fetch them from the various official sites which are often very slow and unreliable
- reason about/infer (and cache the inferred knowledge about) relationships between ontologies: This includes figuring out which attribute belongs to which class, which class belongs to which superclass, etc. We can do much smarter matching with Stanbol than we do now in code, which will improve our interfaces
- implement several strategies to figure out matches between RDF data: This again improves our user interface. As an example, an interface could figure out that the attribute foaf:based_near points to a spatial thing and thus can be shown by any interface class that can show a map.
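The attribute-to-class and class-to-superclass lookups listed above can be sketched in plain Python over a handful of triples. This only illustrates the kind of reasoning involved; it does not call Stanbol, the triples are a small hand-written excerpt of FOAF rather than a complete copy, and the helper names (classes_for_property, superclasses) are hypothetical.

```python
# Minimal sketch: TRIPLES is a hand-written excerpt for illustration,
# not a complete copy of the FOAF or WGS84 vocabularies.
RDFS_DOMAIN = "rdfs:domain"
RDFS_SUBCLASS = "rdfs:subClassOf"

TRIPLES = {
    ("foaf:based_near", RDFS_DOMAIN, "geo:SpatialThing"),
    ("foaf:knows", RDFS_DOMAIN, "foaf:Person"),
    ("foaf:Person", RDFS_SUBCLASS, "foaf:Agent"),
    ("foaf:Person", RDFS_SUBCLASS, "geo:SpatialThing"),
}


def classes_for_property(prop, triples):
    """Return the classes a property is declared on (its rdfs:domain)."""
    return {o for s, p, o in triples if s == prop and p == RDFS_DOMAIN}


def superclasses(cls, triples):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS and o not in found:
                found.add(o)
                todo.append(o)  # follow the chain upwards
    return found
```

With these triples, classes_for_property("foaf:knows", TRIPLES) yields foaf:Person, and superclasses("foaf:Person", TRIPLES) yields foaf:Agent and geo:SpatialThing. A reasoner would derive the same facts from the ontology itself instead of from hand-written loops.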
UC 1 - Framework/data perspective
- Create class trees and property trees for ontologies: The goal is to find out how classes and properties are related to each other. In our framework, UI widgets are matched to RDF properties. We cannot and do not want to implement a class for every property, so the relationships between properties can help the system figure out which widget might be the best choice to present a given piece of information, even if the widget designer did not necessarily think of that up front.
- Clean up consumed RDF data: RDF-based data is often incomplete. As an example, many RDF datasets available in the LOD cloud do not assign the proper classes to URIs but just use the attributes. This makes it difficult for our framework to figure out what a certain URI is really about, and thus difficult to match it to the proper UI widgets as well. RDF classes are also used by our framework to match available views to certain data, so cleaning up the data would considerably improve this matching as well.
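The clean-up of missing classes can be illustrated with the standard RDFS domain rule: if a property has an rdfs:domain, every subject using that property can be typed with that class. The following is a minimal sketch in plain Python, not Stanbol code; the function name infer_missing_types and the ex: resources are hypothetical.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_missing_types(data, ontology):
    """Return rdf:type triples implied by rdfs:domain but absent from data."""
    # Collect the declared domain classes for each property.
    domains = {}
    for s, p, o in ontology:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    inferred = set()
    for s, p, o in data:
        for cls in domains.get(p, ()):
            triple = (s, RDF_TYPE, cls)
            if triple not in data:
                inferred.add(triple)
    return inferred


# Hypothetical example data: ex:alice uses foaf:knows but has no class.
ontology = {("foaf:knows", RDFS_DOMAIN, "foaf:Person")}
data = {("ex:alice", "foaf:knows", "ex:bob")}
missing = infer_missing_types(data, ontology)
```

Here missing contains the single triple stating that ex:alice is a foaf:Person, which is the class information the dataset left out.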
UC 2 - End users (using applications implemented on our framework)
- Much smarter user interfaces that can adapt to the selected (RDF-based) data and choose the best representation for that data on a particular device (the representation on a smartphone might differ from the one in a desktop web browser)
- Faster experience because we cache the ontologies and additional inferred knowledge in Stanbol
- Can handle incomplete or partially wrong RDF data better, which makes consuming LOD more convenient
- All ontologies used must be available on the Internet and dereferenceable by our framework. This is true for most commonly used ontologies, but there seem to be a few exceptions to the rule.
- Poorly made ontologies will be of little or no use for reasoning. While we have not extensively analyzed commonly used ontologies, we have already spotted flaws in some referenced ontologies. It will be interesting to see how well our approach scales in the (current) RDF real world.
- While many ontologies remain pretty static, it is still important to handle caching properly. We will run tests to see whether Stanbol handles caching correctly from both a client and a server perspective while caching ontologies.
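What proper caching means on the client side can be sketched with a conditional HTTP GET: the client remembers the ETag of an ontology and asks the server to send the body again only if it has changed. This is a generic HTTP sketch using the Python standard library; the class name OntologyCache is hypothetical, and it assumes nothing about the headers Stanbol actually sends, which is exactly what the tests mentioned above would have to establish.

```python
import urllib.error
import urllib.request


class OntologyCache:
    """Tiny in-memory cache that revalidates entries with an ETag."""

    def __init__(self, opener=urllib.request.urlopen):
        self._opener = opener  # injectable so the cache can be tested offline
        self._entries = {}     # url -> (etag, body)

    def get(self, url):
        headers = {"Accept": "application/rdf+xml"}
        cached = self._entries.get(url)
        if cached and cached[0]:
            # Ask the server to answer 304 if our copy is still current.
            headers["If-None-Match"] = cached[0]
        request = urllib.request.Request(url, headers=headers)
        try:
            with self._opener(request) as response:
                body = response.read()
                etag = response.headers.get("ETag")
        except urllib.error.HTTPError as error:
            if error.code == 304 and cached:
                return cached[1]  # server confirmed our copy is current
            raise
        self._entries[url] = (etag, body)
        return body
```

The first call to get() for a URL downloads the ontology; later calls send If-None-Match and reuse the stored body when the server answers 304 Not Modified. A server that never sends an ETag makes this cache fall back to a full download every time.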
Potential Further Enhancements
The following ideas could be addressed after the initial scope of development for this Early Adopter's proposal:
- It might be useful to have some kind of meta-ontology that describes and matches relationships between classes and attributes across different ontologies in the semantic web. This could improve inference and make matching between similar things easier. We have some ideas about how this could be implemented, and it would be interesting to see how Stanbol can help with that.
- There are several large RDF-based knowledge bases available, such as YAGO2 and UMBEL, which can also be used as ontologies. It will be interesting to see how these knowledge bases can be integrated into our framework using Stanbol to gain additional knowledge.
- The same applies to any vocabulary based on ontologies like SKOS. While it is not completely clear yet how such vocabularies could be used, it would be interesting to experiment with them.
- We received encouraging feedback on the Stanbol mailing list about this proposal. During this project we would work closely with the developers of the various Stanbol components mentioned, to ensure a successful outcome.
- Our framework will be released as Open Source software under a liberal license later in 2012 (we hope). We will work closely with the group of current developers, who have themselves been experienced open source software contributors for many years.
- All work done on top of Stanbol will be documented and released to the public at the end of the project in September 2012. This could be done in the official Stanbol documentation if requested by the developers.
- Adrian Gschwend will do the major work on this particular project and report both success and issues back to the Stanbol developers.
- netlabs.org is involved in an FP7 SME project which starts in July 2012. The outcome of this proposal would be directly applied there to see how much we benefit from a client perspective. There are also two ongoing projects with the Berne University of Applied Sciences on which we would validate the prototype.
The validation of the Stanbol integration will initially take place on our FreeBSD jail servers. If necessary, we can and will provide access via OpenVPN to Stanbol developers and, at a later stage in the project, to a wider community for evaluation and feedback.
Currently, caching in our framework is done purely in our code. We will use Stanbol's ontology caching services to evaluate whether we can improve the user experience. Our internal unit test framework can be used to measure the performance of the Stanbol-based implementation of the services.
Step 0: Project inception: Stanbol familiarization - required technical setup of Stanbol installations, documentation overview, API overview, sample REST queries, rules and reasoner samples. First contact with the development team in case of open questions.
Step 1: Stanbol bootstrapping: Setup of Stanbol in our FreeBSD jail, first RESTful interaction with our own Stanbol instance via curl and web interface, loading and unloading of ontologies in the ontology manager.
Step 2: First inference: Implement basic inference samples on top of Stanbol modules: attribute to class mapping and class to superclass mapping. This is the base to replace the current code-only inference used in our framework.
Step 3: Ontology caching: Extend the dereferencing engine used in our framework to get ontologies via Stanbol RESTful services instead of fetching them directly. Make sure ontology caching is done properly from an HTTP caching perspective in both internal and Stanbol-based RESTful services.
Step 4: Accessing Stanbol from our framework: Replace the internal inference services in our framework with Stanbol based RESTful services to do the same mapping (implemented in step 2). This provides the technical base for more advanced mapping later.
Step 5: First performance tests: Compare the new Stanbol based inference and caching services with the former internal services. First optimizations in case of performance degradation due to external RESTful calls. This could be done by pre-fetching some commonly used ontologies and relationships from Stanbol based on usage statistics.
Step 6: UI widget selection: Up to step 5 the enhancements are purely on a data/ontology level, which means that there is no direct visual benefit for the user yet; they solely provide the base functionality in the back-end of the framework. In this step we will extend the UI widget matching on the front-end, which means we will be able to use rules to figure out, for example, that the value of foaf:based_near is a WGS84 SpatialThing. This will require enhancements on the UI level of our framework and, again, integration of Stanbol-based rules services.
Step 7: Stanbol Rule Deep Dive: Based on the first basic rules we will dive deep into the inference possibilities and compare the options Stanbol provides, like SPARQL Construct, SWRL and Jena Rules. To provide some useful examples we will apply this to some more complex (and sometimes wrong) data in the LOD cloud to show how this can be used on real-world data. An additional goal of this step is to compare the complexity and power of the different rule syntaxes from an engineer's (not a researcher's :) perspective.
Step 8: UI verbosity: Based on some more advanced rules gathered in step 7 we will show how a minimal UI widget implementation can be extended with powerful widgets for particular data. The goal is to show how UI/widget programmers can easily add new functionality to the UI without bothering about the rest of the framework. This should also prove that it is possible to add a widget for a particular ontology class/attribute which can be re-used outside of its original scope by the framework with the help of Stanbol.
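The widget matching described in steps 6 and 8 can be sketched as a lookup that walks up the class tree until it finds a class with a registered widget, so that a widget written for one class is re-used for its subclasses and for properties pointing to it. This is a minimal sketch under stated assumptions: the widget names (MapWidget, PlainTextWidget) and function names are hypothetical, the tables are hand-written, and in the planned setup the class and range information would come from Stanbol rather than from Python dictionaries.

```python
# Hand-written excerpt for illustration only.
SUBCLASS_OF = {
    "foaf:Person": ["foaf:Agent", "geo:SpatialThing"],
}
PROPERTY_RANGE = {
    "foaf:based_near": "geo:SpatialThing",
}
# Hypothetical widget registry: one widget registered for one class.
WIDGETS = {
    "geo:SpatialThing": "MapWidget",
}
DEFAULT_WIDGET = "PlainTextWidget"


def widget_for_class(cls):
    """Walk up the class tree breadth-first until a class has a widget."""
    queue, seen = [cls], set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        if current in WIDGETS:
            return WIDGETS[current]
        queue.extend(SUBCLASS_OF.get(current, []))
    return DEFAULT_WIDGET  # minimal fallback when nothing matches


def widget_for_property(prop):
    """Pick a widget for a property based on the class of its values."""
    value_class = PROPERTY_RANGE.get(prop)
    if value_class is None:
        return DEFAULT_WIDGET
    return widget_for_class(value_class)
```

In this sketch the map widget is registered once, for geo:SpatialThing, and is then selected both for foaf:based_near and for foaf:Person without the widget author having anticipated either case. Properties without a known range fall back to the plain text widget.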
Tasks will be carried out beginning July 9 2012 and completing by September 30 2012 (12 weeks).
Performance of Contract
The terms of the contract are:
- Start of contract 9 July 2012
- Demo preview available 17 September 2012
- Demo system available 30 September 2012
- Validation interview in September 2012
- End of Contract 30 September 2012
- Total remuneration for this contract is 6500 Euro.
- The validation phase for the Stanbol integration will initially take place internally on our FreeBSD jail servers. The integration will subsequently be deployed to a public-facing running instance, accessible on the web to a wider community for evaluation and feedback.
- The work will be presented at an IKS workshop in XYZ 2012