Semantic-search-benchmark
From IKS Project
Experimental Task: Semantic Search
Business Need
Retrieving relevant information fast and efficiently for decision making / to solve problems.
Procedure for this challenge for a CMS provider
- Analyse the search application of your CMS according to the following schema by marking (as yes/no) those elements in the columns "features", "functionality" and "interface components" which your CMS supports.
- Select functionalities and interface components which are not yet supported and, depending on your business focus, choose some of them for prototypical implementation in the benchmark challenge.
- Implement the most appropriate functionality / feature. The implementation effort should not exceed 1 week of work.
- Provide IKS with access to a test system, demonstrating and documenting the new features that you implemented.
- Report your experiences and further requirements to IKS, so we can consider the functionality for a future reference implementation in the Stack.
Features and Metrics for Searching
The following tables are based on the Semantic Search Survey (1) and have been extended by Wernher Behrendt and Andreas Gruber with the phase "Interaction with Results/Query Refinements". Each table describes one search phase with its features, functionalities and interface components.
Phase: Query construction
| Feature | Functionality | Interface components |
|---|---|---|
| Free text input | | |
| Operators | | |
| Controlled terms | | |
| User feedback | | |
Phase: Search algorithm
| Feature | Functionality | Interface components |
|---|---|---|
| Syntactic matching | | |
| Semantic matching | | |
Phase: Presentation of results
| Feature | Functionality | Interface components |
|---|---|---|
| Data selection | | |
| Ordering | | |
| Organization | | |
| User feedback | | |
Phase: Interaction with results and query refinement
| Feature | Functionality | Interface components |
|---|---|---|
| User model | | |
| Output/device model | | |
| Domain content model | | |
| Truth maintenance | | |
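
The yes/no analysis described in the procedure can be kept as plain data. The following is only a sketch: the phase and feature names come from the tables above, while the `SURVEY` structure, the helper names `unsupported` and `coverage`, and every True/False mark are invented placeholders, not an assessment of any real CMS.

```python
# Hypothetical self-assessment of one CMS. Phase and feature names follow the
# tables above; the True/False marks are invented placeholders.
SURVEY = {
    "Query construction": {
        "Free text input": True,
        "Operators": True,
        "Controlled terms": False,
        "User feedback": False,
    },
    "Search algorithm": {
        "Syntactic matching": True,
        "Semantic matching": False,
    },
    "Presentation of results": {
        "Data selection": True,
        "Ordering": True,
        "Organization": False,
        "User feedback": False,
    },
    "Interaction with results and query refinement": {
        "User model": False,
        "Output/device model": False,
        "Domain content model": False,
        "Truth maintenance": False,
    },
}


def unsupported(survey):
    """Return (phase, feature) pairs that are marked as not supported."""
    return [
        (phase, feature)
        for phase, features in survey.items()
        for feature, supported in features.items()
        if not supported
    ]


def coverage(survey):
    """Return the fraction of supported features for each phase."""
    return {
        phase: sum(features.values()) / len(features)
        for phase, features in survey.items()
    }


if __name__ == "__main__":
    # Unsupported features are the candidates for a prototypical implementation.
    for phase, feature in unsupported(SURVEY):
        print(f"candidate: {phase} / {feature}")
```

The list returned by `unsupported` corresponds to the second step of the procedure: the elements from which a provider would pick something to prototype.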
Related User Stories
- I have a collection of 30'000 documents, and I want to find the five documents that talk about, or were edited by, John Smith. The problem is that there are three John Smiths in my company, and the two others appear in lots of documents.
- When visiting a house rental website, I can formulate queries like “recent pages that talk about houses to rent in the French part of Switzerland” and the website search engine understands them.
- I'm working with a digital asset management system, and I want to find images that are similar to the one I'm looking at, either in terms of the real-world objects that the images represent, or in terms of graphical similarity (colors, shapes, etc.).
Timeline
- Start July 2009
- End August 2009
Intro for the CMS providers in IKS: "Semantically enhanced" CMS, what is it?
(by Wernher Behrendt)
IKS is about improving the technological capabilities of CMS platforms to make them "semantically" more powerful.
"Semantics" is, originally, the study of meaning, a subfield of linguistics.
"Semantics" in the way it is used for web-based information systems is a new hype word for anything that is considered "smart", "intelligent", "logic-based", "rule-based", "ontologically sound" etc.
Since "semantics" is about meaning, it follows that any software which does some useful job obviously has some encoding of "meaning". In other words: the functionality provided by the CMSs of our industrial partners has (a lot of) encoded semantics. But there are two kinds of semantics: implicit and explicit. Suppose your CMS allows the definition of workflows and the definition of user roles, so that you can define different kinds of content publishing workflows. A copy-editor may be allowed to open anybody's web pages and improve the wording. A simple author may be allowed to create single pages in one category only, and may not change other authors' texts. If your CMS has some "configuration language" to define these roles and workflows, then we would consider it to be more explicit / declarative than a CMS where each workflow is defined as bespoke PHP code (or Java code) and where changing the workflow requires re-programming.
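To make the contrast concrete, here is a minimal sketch of the explicit / declarative style, using the copy-editor and author example from the paragraph above. Everything in it is an assumption made for illustration: the `WORKFLOW_CONFIG` format, the rule values "any" and "own", the category name "news" and the function `is_allowed` do not come from any particular CMS.

```python
import json

# Hypothetical declarative role definition. Changing who may do what means
# editing this data, not re-programming the CMS.
WORKFLOW_CONFIG = {
    "roles": {
        # A copy-editor may open and improve anybody's pages.
        "copy-editor": {"edit": "any"},
        # An author may create pages in one category and edit only own pages.
        "author": {"create": {"categories": ["news"]}, "edit": "own"},
    }
}


def is_allowed(config, role, action, *, user, owner=None, category=None):
    """Generic interpreter: decide from the configuration alone."""
    rule = config["roles"].get(role, {}).get(action)
    if rule is None:
        return False  # action not granted to this role
    if rule == "any":
        return True
    if rule == "own":
        return owner == user
    if isinstance(rule, dict):
        return category in rule.get("categories", [])
    return False


if __name__ == "__main__":
    print(is_allowed(WORKFLOW_CONFIG, "copy-editor", "edit", user="alice", owner="bob"))
    print(is_allowed(WORKFLOW_CONFIG, "author", "edit", user="alice", owner="bob"))
    # The configuration is plain data, so it can be written out and read back.
    print(json.loads(json.dumps(WORKFLOW_CONFIG)) == WORKFLOW_CONFIG)
```

In the implicit style, the same rules would be scattered through `if` statements in the CMS code itself. Here they live in one data structure that can be inspected, changed and serialised without touching `is_allowed`.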
When the description of an editing workflow from CMS 1 can be imported (and re-used) to define the same workflow in CMS 2, these two systems are "semantically interoperable". Achieving such interoperability is the purpose of standards. The more any two systems can interoperate, particularly if they complement each other, the more degrees of freedom we give to the users. Sure, the fear is that it may mean "freedom to choose another CMS supplier", but much more so, it means the freedom to re-use what is mine anyway (the content I built using your system) in another context in which I NEED to use my content. Now: if YOUR system does not support that context, then your system becomes an inhibitor to my business process! Many of you would like to sell your systems to organisations, but the user organisations are reluctant to buy because they fear user lock-in.
This is where "semantics-based interoperation" forces you (the CMS providers) to make a strategic decision:
If you want to base your business model on "user lock-in", then you will want to keep (proprietary) fences around your system. If instead you want to base your business model on "selling the capability to do X, Y and Z", then your system may only provide "Y", but because it is interoperable with the other systems that provide "X" and "Z", your system will still be bought: you are providing value rather than inhibiting my capabilities.
We suggest that the semantic benchmarks for IKS take this kind of perspective: "IKS wants to specify useful functionality in a way that enables maximum capability for customers to create, exchange, enhance content and re-use it in many different business and other usage settings". What does this mean for our approach to designing "Semantic Benchmarks for CMSs"?
First of all, we should start by asking "what functionality would users like to have?" and group our benchmarks so that we cover a good range of current and likely future requirements for content and knowledge management. We should NOT start by asking: "do you use RDF, OWL, ontology editors, etc." Instead, we should ask: "can I define a workflow and roles in your CMS?"; "can I define different kinds of content in your CMS (e.g. the basic structure of a technical report or the agenda of a meeting)?".
Once the answer is "yes" to any such functionality, we should ask for implicitness or explicitness of the functionality. Here is, on an ordinal scale, a grading of semantic explicitness, from "poor" to "excellent", for any sort of functionality:
(--) "we have to program this function in the implementation language of the CMS"
(-) "we have a proprietary configuration module with which you can parameterise the functionality"
(+) "we use an existing standard to encode the information about this functionality"
(++) "we use an explicit model which conforms to a standard and which is interoperable with at least one more explicit model provided by some other supplier (possibly following the standards which hold for this other system)".
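The grading above could be recorded per functionality as an ordered enumeration. This is a sketch only: the four levels mirror the scale above, but the class name `Explicitness`, the member names, the helper `weakest` and the example assessment are hypothetical.

```python
from enum import IntEnum


class Explicitness(IntEnum):
    """Ordinal grading of semantic explicitness, from poor to excellent."""
    PROGRAMMED = 0           # (--) coded in the implementation language
    PROPRIETARY_CONFIG = 1   # (-)  proprietary configuration module
    STANDARD_ENCODING = 2    # (+)  encoded using an existing standard
    INTEROPERABLE_MODEL = 3  # (++) standard-conformant, interoperable model


SYMBOLS = {
    Explicitness.PROGRAMMED: "--",
    Explicitness.PROPRIETARY_CONFIG: "-",
    Explicitness.STANDARD_ENCODING: "+",
    Explicitness.INTEROPERABLE_MODEL: "++",
}


def weakest(assessment):
    """Return the (functionality, grade) pair with the lowest grade."""
    return min(assessment.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Invented example answers for two functionalities named in the text.
    assessment = {
        "workflow and roles": Explicitness.PROPRIETARY_CONFIG,
        "content types": Explicitness.STANDARD_ENCODING,
    }
    name, grade = weakest(assessment)
    print(f"least explicit: {name} ({SYMBOLS[grade]})")
```

Because the scale is ordinal, the sketch only compares grades and takes a minimum. Averaging the numeric values would suggest equal distances between levels, which the scale does not claim.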
(1) An analysis of search-based user interaction on the Semantic Web; M. Hildebrand, J.R. van Ossenbruggen, L. Hardman; 2007, INS-E0706, ISSN 1386-368, http://db.cwi.nl/rapporten/abstract.php?abstractnr=2098.

