GzEvD Proposal

From IKS Project


DBpedia Spotlight [4] is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia. DBpedia Spotlight detects phrases in text that could be entities or other more general concepts (a step we call Spotting), finds possible meanings for those mentions according to DBpedia (Candidate Selection), and ranks those meanings according to the context in which they were mentioned (Disambiguation). Through background knowledge provided by DBpedia, users can select which types of annotations to show, including arbitrary "types" defined through SPARQL queries.
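The three stages can be illustrated with a toy sketch. It is not DBpedia Spotlight's implementation: the in-memory LEXICON, the function names, and the word-overlap score are all hypothetical stand-ins, chosen only to show how Spotting, Candidate Selection, and Disambiguation hand results to one another.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface form -> candidate resources, each with a
# few typical context words. The real system derives such data from DBpedia.
LEXICON = {
    "washington": {
        "dbpedia:George_Washington": {"president", "general", "army"},
        "dbpedia:Washington,_D.C.": {"city", "capital", "congress"},
    },
    "berlin": {
        "dbpedia:Berlin": {"city", "germany", "capital"},
    },
}


@dataclass
class Annotation:
    surface_form: str
    offset: int
    resource: str
    score: int


def spot(text):
    """Spotting: find (surface form, offset) pairs by naive substring search."""
    lowered = text.lower()
    spots = []
    for surface in LEXICON:
        start = lowered.find(surface)
        while start != -1:
            spots.append((surface, start))
            start = lowered.find(surface, start + 1)
    return sorted(spots, key=lambda s: s[1])


def candidates(surface):
    """Candidate Selection: look up the possible meanings of a surface form."""
    return LEXICON.get(surface, {})


def disambiguate(text, surface, offset):
    """Disambiguation: rank candidates by word overlap with the whole text."""
    context = set(re.findall(r"\w+", text.lower()))
    ranked = sorted(
        ((len(words & context), uri) for uri, words in candidates(surface).items()),
        reverse=True,
    )
    score, uri = ranked[0]
    return Annotation(surface, offset, uri, score)


def annotate(text):
    """Run the three stages in sequence."""
    return [disambiguate(text, s, o) for s, o in spot(text)]
```

Calling annotate("General Washington led the army.") selects dbpedia:George_Washington in this toy setup, while a sentence that mentions a capital city and Congress selects dbpedia:Washington,_D.C. instead.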


Based on our experience so far [1][3][4], we strongly believe that there is no one-size-fits-all annotation solution. Annotation tools should be generic and adaptable to the particularities of each use case. Providing Apache Stanbol Enhancement Engines based on DBpedia Spotlight allows flexible mix-and-match of components and strategies to adapt to different task-specific needs.

However, as the number of available components increases, it becomes hard for users to keep track of the strengths and weaknesses of each method. Moreover, non-technical users need a default engine that performs reasonably well for known use cases.


We plan to integrate Stanbol and DBpedia Spotlight by offering a DBpedia Spotlight Enhancement Chain. Defining such a chain ensures that typical users can use Spotlight without needing to know its inner workings. Users would just need to send enhancement requests to "http://{host}:{port}/enhancer/chain/dbpedia", assuming that the DBpedia Spotlight chain is called "dbpedia".
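As an illustration, a client request to such a chain might be built as in the sketch below. The URL pattern comes from the text above; the default host and port, the plain-text POST body, the Accept value, and the function names are assumptions made for this example and should be checked against the deployed Stanbol instance.

```python
import urllib.request


def build_enhancement_request(text, host="localhost", port=8080,
                              chain="dbpedia", accept="application/rdf+xml"):
    """Build a POST request that submits plain text to a named chain.

    host, port, and accept are illustrative defaults (assumptions);
    chain="dbpedia" follows the chain name assumed in the proposal.
    """
    url = f"http://{host}:{port}/enhancer/chain/{chain}"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the request and return the response body as text.

    Needs a running server; the response is assumed to be UTF-8 encoded.
    """
    request = build_enhancement_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Keeping request construction separate from the network call makes it possible to inspect the URL and headers without a server running.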

Moreover, we will offer EnhancementEngines based on individual components in DBpedia Spotlight such as Spotters and Disambiguators. This will allow the mix-and-match of DBpedia Spotlight components with existing Stanbol components.

We will evaluate several chains, and propose the best combination of strategies as the default enhancement chain so that requests to "/enhancer" would be processed by it.


We have collected a number of evaluation data sets that have been used in content enhancement research, covering wikification, entity linking/disambiguation, entity extraction, etc. These data sets contain thousands of manually verified occurrences of entities and other concepts. For the evaluation, we will use a subset of existing data sets such as:

  • M&W Wikify (Milne & Witten, 2008) [5]
  • CSAW (Kulkarni et al. 2009) [6]
  • NYT10 (Mendes et al., 2011) [4]

We will use the Stanbol Benchmarking tool in the evaluations.
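The proposal does not name the evaluation metrics. Assuming the usual precision, recall, and F1 over sets of annotations, a comparison of chains against a gold standard could be sketched as follows. The annotation tuples, chain names, and function names are hypothetical, and the sketch does not reflect the input format of the Stanbol Benchmarking tool.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted annotations against gold ones by exact match.

    Each annotation is assumed to be a hashable tuple such as
    (offset, surface form, resource URI).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rank_chains(gold, outputs_by_chain):
    """Return (chain name, F1) pairs sorted from best to worst F1."""
    scored = [
        (name, precision_recall_f1(gold, predicted)[2])
        for name, predicted in outputs_by_chain.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, a chain that returns one correct and one incorrect annotation for a text with two gold annotations scores 0.5 on all three measures under this scheme.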


  • Start of contract: 01 April 2012
  • Stanbol components for validation are: Apache Stanbol
  • Demo system will be available: 01 June 2012
  • End of contract: 12 June 2012
  • Total Contract: 7000 Euro

Planned Tasks

  • Expose /spot REST endpoint from DBpedia Spotlight
  • Standardize input/output format for endpoints according to Stanbol’s needs
  • Creation of EnhancementEngines for /spot, /candidates, /disambiguate
  • Creation of a DBpedia Spotlight EnhancementChain
  • Evaluation using existing corpora
  • Propose a new Stanbol default Enhancement Chain

Online Demo

(not yet available) http://spotlight.dbpedia.org/stanbol/


DBpedia is one of the largest cross-domain knowledge bases on the Web. Since its inception in 2006, it has developed into a nucleus for interconnecting hundreds of other sources on the Web of Data. Providing specialized engines within Stanbol for linking to this prominent source will likely cater to a large audience of researchers and practitioners in the area of Linked Data.

DBpedia Spotlight was recently accepted into the Google Summer of Code 2012 (GSoC2012), a program that supports students in open source software development. We expect that a number of enhancements will be made to DBpedia Spotlight during this program. Through the integration with Stanbol, many of these enhancements will also be available to Stanbol users. We will publicize our GSoC2012 and EAP results to our fast-growing user base.


[1] Mendes, P.N., Daiber, J., Rajapakse, R., Sasaki, F., Bizer, C. Evaluating the Impact of Phrase Recognition on Concept Tagging. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2012, 21-27 May 2012, Istanbul, Turkey. (to appear)

[2] Mendes, P.N., Jakob, M., Bizer, C. DBpedia for NLP: A Multilingual Cross-domain Knowledge Base. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2012, 21-27 May 2012, Istanbul, Turkey. (to appear)

[3] Mendes, P.N., Daiber, J., Jakob, M., Bizer, C. Evaluating DBpedia Spotlight for the TAC-KBP Entity Linking Task. Proceedings of the Text Analysis Conference, TAC 2011. 14-15 November 2011, Gaithersburg, Maryland USA. (to appear)

[4] Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C. DBpedia Spotlight: Shedding Light on the Web of Documents. In the Proceedings of the 7th International Conference on Semantic Systems (I-Semantics 2011). Graz, Austria, September 2011. (best paper award) [DOI 10.1145/2063518.2063519]

[5] D. Milne and I. H. Witten. Learning to link with wikipedia. In Proceeding of the 17th ACM conference on Information and knowledge management, CIKM ’08, pages 509–518, New York, NY, USA, 2008. ACM.

[6] S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti. Collective annotation of wikipedia entities in web text. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’09, pages 457–466, New York, NY, USA, 2009. ACM.