Apache Stanbol is hosted at http://incubator.apache.org/stanbol/index.html
Apache Stanbol graduated to full Apache project status in September 2012. It is an open source modular software stack of components for semantic content management.
Apache Stanbol components are meant to be accessed over RESTful interfaces to provide semantic services for content management. The current code is written in Java and based on the OSGi component framework.
Applications include extending existing content management systems with (internal or external) semantic services, and creating new types of content management systems with semantics at their core.
Software Licensing: Apache License, Version 2.0
Apache Stanbol's main components are:
1. Content Enhancement
Services that add semantic information to “non-semantic” pieces of content.
- The Enhancer component, together with its Enhancement Engines, lets you post content to Apache Stanbol and get suggestions for possible entity annotations in return. The enhancements are produced through natural language processing, metadata extraction and linking named entities to public or private entity repositories. Furthermore, Apache Stanbol provides machinery to process this data further and to add knowledge and links by applying rules and reasoning. Technically, the enhancements are stored in a triple graph that is maintained by Apache Clerezza.
- The SPARQL endpoint gives access to the semantic enhancements from the Apache Stanbol Enhancer.
- The Entityhub is the component that lets you cache and manage local indexes of repositories such as DBpedia, as well as custom data (e.g. product descriptions, contact data, specialized topic thesauri).
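As a rough illustration of "posting content to the Enhancer", the sketch below builds an HTTP request that sends plain text and asks for the enhancements as Turtle-serialized RDF. The server address and the /enhancer path are assumptions about a locally running launcher, and the class and method names are made up for this example. The request is only built, not sent, so the code runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch: builds (but does not send) a request that posts plain text to an Enhancer endpoint. */
final class EnhancerRequestSketch {

    // Assumption: a Stanbol launcher running locally on port 8080 with the
    // Enhancer mounted at /enhancer. Adjust this to your own deployment.
    static final String ASSUMED_ENHANCER_URL = "http://localhost:8080/enhancer";

    /** Builds a POST request carrying the text to analyze and asking for Turtle output. */
    static HttpRequest buildEnhanceRequest(String enhancerUrl, String text) {
        return HttpRequest.newBuilder(URI.create(enhancerUrl))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .header("Accept", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request =
                buildEnhanceRequest(ASSUMED_ENHANCER_URL, "Paris is the capital of France.");
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running server, something like:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The response body would be an RDF graph of enhancement suggestions; parsing it is left out here because it depends on the RDF library you choose.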
2. Reasoning
Services that are able to retrieve additional semantic information about the content based on the semantic information retrieved via content enhancement.
- The Rules component provides you with the means to re-factor knowledge graphs, e.g. for supporting the schema.org vocabulary for Search Engine Optimization.
- The Reasoners can be used to automatically infer additional knowledge. They are used to obtain new facts in the knowledge base: for example, if your enhanced content tells you about a shop located in "Montparnasse", you can infer via a "located-in" relation that the same shop is located in "Paris", in the "Île-de-France" and in "France".
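The "located-in" example can be pictured as following a transitive relation. The sketch below is plain Java and does not use the Stanbol Reasoners API. It assumes each place has at most one direct container, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: infers all containing places by following direct "located-in" assertions. */
final class LocatedInSketch {

    // Direct assertions only: place -> the place that immediately contains it.
    private final Map<String, String> directContainer = new LinkedHashMap<>();

    void assertLocatedIn(String place, String container) {
        directContainer.put(place, container);
    }

    /** Returns every place that transitively contains the given place, nearest first. */
    List<String> inferContainers(String place) {
        List<String> inferred = new ArrayList<>();
        String current = directContainer.get(place);
        // Stop at the top of the chain, or if the data accidentally contains a cycle.
        while (current != null && !current.equals(place) && !inferred.contains(current)) {
            inferred.add(current);
            current = directContainer.get(current);
        }
        return inferred;
    }

    public static void main(String[] args) {
        LocatedInSketch kb = new LocatedInSketch();
        kb.assertLocatedIn("shop", "Montparnasse");
        kb.assertLocatedIn("Montparnasse", "Paris");
        // The Unicode escape keeps this source file ASCII-only.
        kb.assertLocatedIn("Paris", "\u00CEle-de-France");
        kb.assertLocatedIn("\u00CEle-de-France", "France");
        System.out.println(kb.inferContainers("shop"));
    }
}
```

Only the four direct assertions are stored; the facts that the shop is in Paris, Île-de-France and France are derived at query time. A real reasoner works on ontologies and rules rather than a single hard-coded relation.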
3. Knowledge Models
Services that are used to define and manipulate the data models (e.g. ontologies) that are used to store the semantic information.
- The Ontology Manager is the facility that manages your ontologies. Ontologies are used to define the knowledge models that describe the metadata of content. Additionally, the semantics of your metadata can be defined through an ontology.
4. Persistence
Services that store (or cache) semantic information, i.e. enhanced content, entities, facts, and make it searchable.
- The Contenthub is the component that provides a persistent document store whose back-end is Apache Solr. On top of the store, it offers semantic indexing during text-based document submission, and semantic search with faceted search over the documents.
- The FactStore is a component that lets you store relations between entities identified by their URIs. Such a relation between two entities is called a fact.
- The CMS Adapter component acts as a bridge between JCR/CMIS compliant content management systems and Apache Stanbol. It can be used to map existing node structures from JCR/CMIS content repositories to RDF models or vice versa. It also provides services for managing content repository items as Content Items within the Contenthub.
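To make the notion of a fact more concrete, here is a minimal in-memory sketch of "a relation between two entities identified by their URIs". It is not the FactStore's REST API; the class names, the relation name and the example.org URIs are placeholders chosen for this illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Sketch: an in-memory store of facts, i.e. named relations between two entity URIs. */
final class FactStoreSketch {

    /** One fact: a named relation between two entities, each identified by its URI. */
    static final class Fact {
        final URI subject;
        final String relation;
        final URI object;

        Fact(URI subject, String relation, URI object) {
            this.subject = subject;
            this.relation = relation;
            this.object = object;
        }
    }

    private final List<Fact> facts = new ArrayList<>();

    /** Records that the subject entity stands in the given relation to the object entity. */
    void store(String subjectUri, String relation, String objectUri) {
        facts.add(new Fact(URI.create(subjectUri), relation, URI.create(objectUri)));
    }

    /** Returns the URIs of all entities the subject is related to via the given relation. */
    List<URI> objectsOf(String subjectUri, String relation) {
        URI subject = URI.create(subjectUri);
        List<URI> result = new ArrayList<>();
        for (Fact fact : facts) {
            if (fact.subject.equals(subject) && fact.relation.equals(relation)) {
                result.add(fact.object);
            }
        }
        return result;
    }
}
```

For example, after `store("http://example.org/alice", "worksFor", "http://example.org/acme")`, calling `objectsOf("http://example.org/alice", "worksFor")` returns a list containing the URI of the second entity.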
Usage Scenarios, Demos and Documentation
- Basic Content Enhancement: Analyze textual content, enhance it with named entities (person, place, organization), suggest links to open data sources.
- Working with "local" Entities: Use locally defined entities (e.g. thesaurus concepts) from an organization's context.
- Working with multiple languages: Get enhancements for textual content in multiple languages (EN, DE, SV, DA, PT and NL).
- Semantic Search in Portals: Store/index enhancements and content items. For a portal this would facilitate semantic search applications.
- Refactoring Enhancements for SEO: Refactor the enhancement result, its property names and ontology types according to your target ontology.
- Transforming CMS repository structures into ontologies.
- Provide repository structures as thesaurus or domain ontology, e.g. categories.
Documentation for CMS Developers
Documentation for Apache Stanbol Contributors
- Java API for developers
- How to contribute to software development
- How to work and update documentation
- Proposals for further development
- How to build an engine
IKS Software Lab
The IKS back-end components FISE, KReS, RICK, CMS Adaptor and FactStore were the core components that now form the Apache Stanbol stack. The original work behind these components has been preserved in the links below. Please note that this is archival information only; the current components available in the Apache Stanbol project are listed above.
- FISE: the Furtwangen IKS Semantic Engine, created during the IKS Semantic Engine Hackathon in March 2010. It implements a simple OSGi-based RESTful engine that can enhance textual content, using pluggable enhancement engines.
- KReS: the Knowledge Representation and ReaSoning layer of IKS. The goal of KReS is to exploit semantic web standards and technologies for extending CMS systems with semantic capabilities. KReS is composed of three subsystems: the Ontology Network Manager (ONM) and the Rule manager and inference engine (R&I), which implement the core part, and Semion, which provides reengineering and refactoring capabilities.
- RICK: Reference Infrastructure for Content and Knowledge. RICK provides an infrastructure to manage referenced Sites. RICK comes with out of the box support for commonly used protocols such as Linked Data but also allows extensions to work with sites that do not support standards. RICK allows the use of local caches. Such caches can store some/all information of referenced sites and are typically used if one needs to work offline, to increase query performance or to support queries that would not be possible/feasible by directly using the services provided by the referenced site.
- CMS Adaptor: CMS Adapter (a.k.a. OntoGen) provides services for defining semantic relationships between content items of a CMS and transforms those relationships to ontology elements. In this manner, semantics already available in a CMS can be extracted and stored in a knowledge base. CMS Adapter also aims to keep the generated ontology synchronized with the content repository by providing notification services to submit updates on the content repository.
- FactStore: The IKS FactStore is designed to store semantic relations in terms of facts about entities and their relationships. Additionally, the FactStore specifies a simple SQL-like query language expressed in JSON-LD to search for semantic relations and to reason through the use of joined relationships. This specification defines required interfaces for the FactStore in terms of RESTful API interfaces and a concept for a possible implementation.
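The local caches described for RICK follow a familiar pattern: answer from the cache when possible and fall back to the referenced site otherwise. The sketch below illustrates only that pattern and is not RICK's actual implementation. The remote site is modelled as a plain function from entity URI to an optional description, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Sketch: a local cache placed in front of a (possibly slow or offline) referenced site. */
final class ReferencedSiteCacheSketch {

    private final Map<String, String> localCache = new HashMap<>();
    private final Function<String, Optional<String>> referencedSite;
    private int remoteLookups = 0;

    /** The function stands in for whatever protocol the referenced site speaks. */
    ReferencedSiteCacheSketch(Function<String, Optional<String>> referencedSite) {
        this.referencedSite = referencedSite;
    }

    /** Returns the cached description if present; otherwise asks the site and caches the answer. */
    Optional<String> getEntity(String entityUri) {
        String cached = localCache.get(entityUri);
        if (cached != null) {
            return Optional.of(cached);
        }
        remoteLookups++;
        Optional<String> fetched = referencedSite.apply(entityUri);
        fetched.ifPresent(description -> localCache.put(entityUri, description));
        return fetched;
    }

    /** Number of requests that actually reached the referenced site. */
    int remoteLookups() {
        return remoteLookups;
    }
}
```

Repeated requests for the same entity reach the referenced site only once, which is the query-performance benefit the description mentions. A cache that is filled in advance with all of a site's data would also cover the offline case.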