FactStore Specification

From IKS Project

Specification Proposal

Abstract

The IKS analysis architecture contains functional boxes for the reasoning capabilities of the IKS. The past versions of the IKS reference implementation (IKS-RI), namely IKS-Alpha and IKS-3.0, do not expose this functionality in an easy-to-use way. IKS-Alpha and IKS-3.0 focus on an entity-centric view of the IKS and lack the ability to define and store semantic relationships between entities. The IKS FactStore is designed to store semantic relations in terms of facts about entities and their relationships. Additionally, the FactStore specifies a simple SQL-like query language expressed in JSON-LD to search for semantic relations and to reason through the use of joined relationships. This specification defines the required interfaces for the FactStore as RESTful APIs and outlines a concept for a possible implementation.

Introduction

According to the defined IKS requirements, the IKS needs the functionality to reason over entities and their semantic relationships. In the following, we will refer to semantic relationships as facts about entities. For example, the relation 'employeeOf' may be a fact about two entities, one of type person and one of type organization. Facts are n-ary, meaning that the number of participating entities is not limited.
The FactStore implements a store for facts plus the ability to query for single facts and for combinations of facts, which corresponds to the required IKS reasoning capability. In summary, the FactStore implements:

  • Persistent storage for n-ary facts about entities
  • Query language to query for a single fact
  • Query language to query for combinations of facts (reasoning)

In the following, we will define the required interfaces for the FactStore plus the required query language.

Note: Interfaces will be defined as RESTful service APIs. The payload of service calls is specified using JSON-LD (Specification version 20110507).

Note: The FactStore does not provide any SPARQL endpoint so far. This could be part of an extended version.

Store Interface

The store interface allows clients to put new fact schemata and corresponding facts (instances of those schemata) into the FactStore.

Publish a New Fact Schema

Description: Allows clients to publish new fact schemata to the FactStore. Each fact is an n-tuple where each element of that tuple defines a certain type of entity. A fact schema defines which types of entities and their roles are part of instances of that fact.
Path: /factstore/facts/{fact-schema-name}
Method: PUT with data type application/json returns HTTP 201 (created) on success.
Data: The fact schema is sent as the PUT payload in JSON-LD format as a JSON-LD profile. The name of the fact is given by the URL. The elements of the schema are defined in the "#types" section of the JSON-LD "@context". Each element is specified using a unique role name for that entity plus the entity type specified by a URN.
Example 1: PUT /factstore/facts/http%3A%2F%2Fiks-project.eu%2Font%2FemployeeOf
with the following data
{
 "@context" :
 {
  "iks"     : "http://iks-project.eu/ont/",
  "#types"  :
  {
    "person"       : "iks:person",
    "organization" : "iks:organization"
  }
 }
}

will create the new fact schema for "employeeOf" at the given URL which is in decoded representation: /factstore/facts/http://iks-project.eu/ont/employeeOf

Alternatively, one can use the cURL tool. Store the fact schema in a JSON file and then use this command:

curl http://localhost:8080/factstore/facts/http%3A%2F%2Fiks-project.eu%2Font%2FemployeeOf -T spec-example1.json
Example 2: PUT /factstore/facts/http%3A%2F%2Fwww.schema.org%2FEvent.attendees
with the following data
{
 "@context" :
 {
  "sorg"       : "http://www.schema.org/",
  "#types"     :
  {
    "event"    : "sorg:Event",
    "attendee" : ["sorg:Person","sorg:Organization"]
  }
 }
}

will create the new fact schema for "attendees" at the given URL which is in decoded representation: /factstore/facts/http://www.schema.org/Event.attendees.

Note: This fact schema uses the ability to define more than one possible type for a role. The role 'attendee' can be of type http://www.schema.org/Person or http://www.schema.org/Organization.
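The request path and payload used in both examples can be derived mechanically from the schema IRI and its roles. The following Python sketch illustrates this; the helper names schema_path and schema_payload are hypothetical and not part of the specification, and the sketch only builds the request, it does not send it.

```python
import json
from urllib.parse import quote

BASE_PATH = "/factstore/facts/"  # path prefix taken from the specification


def schema_path(fact_schema_iri: str) -> str:
    """Return the request path for a fact schema (hypothetical helper)."""
    # safe="" forces ":" and "/" to be percent-encoded, as in the examples.
    return BASE_PATH + quote(fact_schema_iri, safe="")


def schema_payload(prefixes: dict, roles: dict) -> str:
    """Build the JSON-LD profile: prefix definitions plus the "#types" section."""
    context = dict(prefixes)
    context["#types"] = roles
    return json.dumps({"@context": context}, indent=1)


path = schema_path("http://iks-project.eu/ont/employeeOf")
body = schema_payload(
    {"iks": "http://iks-project.eu/ont/"},
    {"person": "iks:person", "organization": "iks:organization"},
)
print("PUT", path)
print(body)
```

A role with several possible types, as in Example 2, is passed as a list value in the roles dictionary.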

Status: Implemented in Apache Stanbol

Get Fact Schema

Description: Allows clients to get the definition of an existing fact schema.
Path: /factstore/facts/{fact-schema-name}
Method: GET with data type application/json returns HTTP 200 on success.
Data: The fact schema is returned as a JSON-LD profile.
Example: GET /factstore/facts/http%3A%2F%2Fiks-project.eu%2Font%2FemployeeOf
will return the following data:
{
 "@context" :
 {
  "#types"  :
  {
    "person"       : "http://iks-project.eu/ont/person",
    "organization" : "http://iks-project.eu/ont/organization"
  }
 }
}
Status: Implemented in Apache Stanbol
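The returned schema lists fully expanded type IRIs, whereas the schema in the publish example used the prefixed form "iks:person". The following Python sketch shows one way such an expansion could work. It is an assumption for illustration only: the specification does not state how the returned form is produced, and the function names are hypothetical.

```python
def expand_iri(value: str, context: dict) -> str:
    """Expand "prefix:suffix" using a prefix defined in the context."""
    prefix, sep, suffix = value.partition(":")
    base = context.get(prefix)
    # Only expand when the prefix is actually defined as a string in the context;
    # full IRIs such as "http://..." are returned unchanged.
    if sep and isinstance(base, str):
        return base + suffix
    return value


def expand_types(profile: dict) -> dict:
    """Return a profile whose "#types" section contains expanded IRIs only."""
    context = profile["@context"]
    expanded = {}
    for role, types in context["#types"].items():
        if isinstance(types, list):  # a role may allow several types
            expanded[role] = [expand_iri(t, context) for t in types]
        else:
            expanded[role] = expand_iri(types, context)
    return {"@context": {"#types": expanded}}


published = {
    "@context": {
        "iks": "http://iks-project.eu/ont/",
        "#types": {"person": "iks:person", "organization": "iks:organization"},
    }
}
print(expand_types(published))
```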

Publish New Facts

Description: Allows clients to publish new facts according to a defined fact schema that was previously published to the FactStore. Each new fact is an n-tuple according to its schema where each tuple element identifies an entity using its unique IRI.
Path: /factstore/facts
Method: POST with data type application/json returns HTTP 201 (created) on success.
Data: The facts are sent as the POST payload in JSON-LD format referring to the defined JSON-LD profile. The name of the fact is given in the "@profile" element of the JSON-LD object. The JSON-LD object contains a list of facts under the attribute "facts" where each element of that list is an n-tuple of entity instances according to the fact schema. The instance of an entity can be specified either by its unique IRI or by specifying the instance by example.

Using the instance by example variant requires the FactStore to resolve the entity in an EntityHub. An entity by example is specified by defining attributes and required values of the searched entity. A fact can only be stored if all entities can be uniquely identified either by their IRI or by example.

Example 1: POST /factstore/facts

with the following data

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/",
   "upb" : "http://upb.de/persons/"
 },
 "@profile"     : "iks:employeeOf",
 "person"       : { "@iri" : "upb:bnagel" },
 "organization" : { "@iri" : "http://uni-paderborn.de"}
}

creates a new fact of type http://iks-project.eu/ont/employeeOf specifying that the person http://upb.de/persons/bnagel is an employee of the organization defined by the IRI http://uni-paderborn.de.

Example 2: POST /factstore/facts

with the following data to create several facts of the same type at once

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/",
   "upb" : "http://upb.de/persons/"
 },
 "@profile"     : "iks:employeeOf",
 "@" : [
   { "person"       : { "@iri" : "upb:bnagel" },
     "organization" : { "@iri" : "http://uni-paderborn.de" }
   },
   { "person"       : { "@iri" : "upb:fchrist" },
     "organization" : { "@iri" : "http://uni-paderborn.de" }
   }
 ]
}

creates two new facts of type http://iks-project.eu/ont/employeeOf specifying that the persons http://upb.de/persons/bnagel and http://upb.de/persons/fchrist are employees of the organization defined by the IRI http://uni-paderborn.de.

Example 3: POST /factstore/facts

with the following data to create several facts of different type

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/",
   "upb" : "http://upb.de/persons/"
 },
 "@" : [
   { "@profile"     : "iks:employeeOf",
     "person"       : { "@iri" : "upb:bnagel" },
     "organization" : { "@iri" : "http://uni-paderborn.de" }
   },
   { "@profile"     : "iks:friendOf",
     "person"       : { "@iri" : "upb:bnagel" },
     "friend"       : { "@iri" : "upb:fchrist" }
   }
 ]
}

creates two new facts. The first is of type http://iks-project.eu/ont/employeeOf and specifies that the person http://upb.de/persons/bnagel is an employee of the organization defined by the IRI http://uni-paderborn.de. The second is of type http://iks-project.eu/ont/friendOf and specifies that http://upb.de/persons/fchrist is a friend of http://upb.de/persons/bnagel.

Status: Implemented in Apache Stanbol
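The three payload shapes shown in the examples (a single fact, several facts of one type, and several facts of different types) can be produced by one small builder. The following Python sketch is illustrative only; the function facts_payload and its input format of (profile, roles) pairs are assumptions, not part of the specification. Entities are referenced by IRI only; the instance by example variant is not covered.

```python
import json


def facts_payload(prefixes: dict, facts: list) -> dict:
    """Build a POST payload from (profile, {role: iri}) pairs (hypothetical helper)."""
    if not facts:
        raise ValueError("at least one fact is required")

    def body(roles: dict) -> dict:
        # Every entity is referenced by its IRI.
        return {role: {"@iri": iri} for role, iri in roles.items()}

    payload = {"@context": dict(prefixes)}
    profiles = {profile for profile, _ in facts}
    if len(facts) == 1:
        # Shape of Example 1: the roles sit next to "@profile".
        profile, roles = facts[0]
        payload["@profile"] = profile
        payload.update(body(roles))
    elif len(profiles) == 1:
        # Shape of Example 2: shared "@profile", facts listed under "@".
        payload["@profile"] = facts[0][0]
        payload["@"] = [body(roles) for _, roles in facts]
    else:
        # Shape of Example 3: each fact carries its own "@profile".
        payload["@"] = [{"@profile": p, **body(roles)} for p, roles in facts]
    return payload


prefixes = {"iks": "http://iks-project.eu/ont/", "upb": "http://upb.de/persons/"}
mixed = facts_payload(prefixes, [
    ("iks:employeeOf", {"person": "upb:bnagel", "organization": "http://uni-paderborn.de"}),
    ("iks:friendOf", {"person": "upb:bnagel", "friend": "upb:fchrist"}),
])
print(json.dumps(mixed, indent=1))
```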

Query Interface

The query interface allows clients to query for facts and combinations of facts (reasoning). The JSON-LD query structure is inspired by SQL using SELECT FROM [JOIN] WHERE constructs. Depending on the implementation, the JSON-LD queries may be transformed directly into valid SQL queries.

Query for Facts of a Certain Type

Description: Allows clients to query stored facts of a specific type defined by the fact's schema. The clients specify the desired fact plus an arbitrary number of entities that play some role in the fact.
Path: /factstore/query
Method: POST with data type application/json returns application/json
Data: The query is specified by a JSON-LD object in the payload of the request. The query defines a "select" to specify the desired type of result to be returned in the result set. The "from" part specifies the fact type to query and the "where" clause specifies constraints to be fulfilled.

Note: For the moment constraints only support the equals "=" relation. There may be more relations like ">" in future versions of this specification. If there is more than one constraint all constraints are concatenated by "AND".

Example 1: POST /factstore/query

with the following data

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/"
 },
 "select" : [ "person" ],
 "from"   : "iks:employeeOf",
 "where"  : [
   {
     "="  : {
       "organization" : { "@iri" : "http://uni-paderborn.de" }
     }
   }
 ]
}

returns the list of all persons participating in the fact of type http://iks-project.eu/ont/employeeOf where the organization is http://uni-paderborn.de. The result is sent back in JSON-LD format with the result set specified by the select clause.

{
 "@subject" : [
   {
     "@subject" : "_bnode1",
     "PERSON"   : "http://upb.de/persons/gengels"
   },
   {
     "@subject" : "_bnode2",
     "PERSON"   : "http://upb.de/persons/ssauer"
   },
   {
     "@subject" : "_bnode3",
     "PERSON"   : "http://upb.de/persons/bnagel"
   },
   {
     "@subject" : "_bnode4",
     "PERSON"   : "http://upb.de/persons/fchrist"
   }
 ]
}

If there is only one entry in the result set, this would be returned as follows.

{
  "PERSON"   : "http://upb.de/persons/fchrist"
}
Status: Example 1 is implemented and should work in latest Apache Stanbol versions.
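Because the result of Example 1 has two shapes (a list under "@subject" for several entries, a bare object for a single entry), a client may want to normalize both into one list of rows. The following Python sketch shows one possible approach; the function name result_rows is hypothetical, and treating an empty object as "no rows" is an assumption, since the specification does not show an empty result.

```python
def result_rows(result: dict) -> list:
    """Return the result entries as a list of rows, for either result shape."""
    subjects = result.get("@subject")
    if isinstance(subjects, list):
        # Several entries: drop the blank-node identifier of each entry.
        return [
            {key: value for key, value in entry.items() if key != "@subject"}
            for entry in subjects
        ]
    # Single entry: the object itself is the only row.
    row = {key: value for key, value in result.items() if key != "@subject"}
    return [row] if row else []


many = {
    "@subject": [
        {"@subject": "_bnode1", "PERSON": "http://upb.de/persons/gengels"},
        {"@subject": "_bnode2", "PERSON": "http://upb.de/persons/ssauer"},
    ]
}
one = {"PERSON": "http://upb.de/persons/fchrist"}

print(result_rows(many))
print(result_rows(one))
```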
Example 2: GET /factstore/query?q=

with the following data as the request parameter "q"

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/"
 },
 "select" : [
   "person.name", "person.email"
 ],
 "from" : "iks:employeeOf",
 "where" : [
   {
     "=" : {
       "organization" : { "@iri" : "http://upb.de" }
     }
   }
 ]
}

returns a list with names and e-mail addresses of all persons participating in the fact of type http://iks-project.eu/ont/employeeOf where the organization is http://upb.de. The result is sent back in JSON-LD format with the result set specified by the select clause.

{
 "resultset": [
   { "person.name" : "Gregor Engels",
     "person.email": "engels@upb.de"  },
   { "person.name" : "Stefan Sauer",
     "person.email": "sauer@upb.de"   },
   { "person.name" : "Benjamin Nagel",
     "person.email": "nagel@upb.de"   },
   { "person.name" : "Fabian Christ",
     "person.email": "christ@upb.de"  }
 ]
}
Status: Implementation in progress.

Query for Combinations of Facts

Description: Allows clients to query for combinations of facts.
Path: /factstore/query?q=
Method: GET with data type application/json returns application/json
Data: The query is specified by a JSON-LD object in request parameter "q" of the request. The query defines a "select" to specify the desired type of result to be returned in the result set. Instead of using a "from" part this type of query supports joins over facts using the "join" field. The "join" field specifies which facts are joined by specifying the elements of the facts that are evaluated to be equal during the join. The "where" clause specifies constraints over the join to be fulfilled.

Note: For the moment constraints only support the equals "=" relation. There may be more relations like ">" in future versions of this specification. If there is more than one constraint all constraints are concatenated by "AND".

Example: GET /factstore/query?q=

with the following request data in request parameter "q"

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/"
 },
 "select": [
   "iks:friendOf.friend.name"
 ],
 "join" : {
   "iks:employeeOf.person" : "iks:friendOf.person"
 },
 "where" : [
   {
     "=" : {
       "iks:employeeOf.organization" : {
         "@iri" : "http://upb.de"
       }
     }
   },
   {
     "=" : {
       "iks:friendOf.friend.city" : "Paderborn"
     }
   }
 ]
}

will return the names of those friends of University of Paderborn employees who live in Paderborn. The result in JSON-LD format would look like the following.

{
 "@context" : {
   "iks" : "http://iks-project.eu/ont/"
 },
 "resultset": [
   { "iks:friendOf.friend.name" : "Schmidt"   },
   { "iks:friendOf.friend.name" : "Meier"     },
   { "iks:friendOf.friend.name" : "Schneider" },
   { "iks:friendOf.friend.name" : "Schuster"  }
 ]
}
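To make the meaning of the join concrete, the following Python sketch evaluates the same query over in-memory data. All facts and entity attributes in it are hypothetical sample data, the function friends_in_city is not part of the specification, and the entities dictionary merely stands in for an attribute lookup in an EntityHub.

```python
# Hypothetical sample facts; the role names follow the schemata used in the examples.
employee_of = [
    {"person": "upb:a", "organization": "http://upb.de"},
    {"person": "upb:b", "organization": "http://example.org"},
]
friend_of = [
    {"person": "upb:a", "friend": "ex:f1"},
    {"person": "upb:a", "friend": "ex:f2"},
    {"person": "upb:b", "friend": "ex:f3"},
]
# Stand-in for an EntityHub: entity IRI -> attributes (assumed data).
entities = {
    "ex:f1": {"name": "Schmidt", "city": "Paderborn"},
    "ex:f2": {"name": "Meier", "city": "Berlin"},
    "ex:f3": {"name": "Schneider", "city": "Paderborn"},
}


def friends_in_city(organization: str, city: str) -> dict:
    """Join employeeOf.person with friendOf.person, then apply both constraints."""
    rows = []
    for employment in employee_of:
        if employment["organization"] != organization:  # first "where" constraint
            continue
        for friendship in friend_of:
            if friendship["person"] != employment["person"]:  # join condition
                continue
            friend = entities.get(friendship["friend"], {})
            if friend.get("city") == city:  # second "where" constraint
                rows.append({"iks:friendOf.friend.name": friend["name"]})
    return {"resultset": rows}


print(friends_in_city("http://upb.de", "Paderborn"))
```

Both constraints must hold for a row to be returned, which mirrors the rule that multiple constraints are concatenated by "AND".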

Implementation Concept

The FactStore specification is written with a certain kind of implementation in mind. Although this implementation is not prescribed by the specification, it might be useful to have a look at this simple implementation concept.

Store Implementation

The store implementation is based on the well-known concept of relational databases. Each fact schema is a table in a relational database. Creating a new fact schema is equivalent to creating a new table with one String attribute per role in the schema; the attributes are Strings because they hold IRIs. For performance reasons the attributes should be indexed. The store only needs to be able to create new schemata. The specification does not provide for altering a schema over time; this could be a future improvement.
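As an illustration of this concept, the following Python sketch creates one indexed table per fact schema using SQLite. The choice of SQLite, the table naming scheme, and the function names are assumptions made for this sketch; the specification prescribes none of them. Note that the naming scheme shown here can map two different IRIs to the same table name, so a real store would need a collision-free mapping.

```python
import re
import sqlite3

_ROLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def table_name(fact_schema_iri: str) -> str:
    """Derive a plain SQL identifier from the schema IRI (assumed naming scheme)."""
    return "fact_" + re.sub(r"[^A-Za-z0-9]", "_", fact_schema_iri)


def create_fact_table(conn: sqlite3.Connection, fact_schema_iri: str, roles: list) -> str:
    """Create the table for a fact schema with one indexed text column per role."""
    for role in roles:
        # Role names end up in SQL text, so only plain identifiers are accepted.
        if not _ROLE.fullmatch(role):
            raise ValueError(f"unsupported role name: {role!r}")
    table = table_name(fact_schema_iri)
    columns = ", ".join(f'"{role}" TEXT NOT NULL' for role in roles)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    for role in roles:
        conn.execute(f'CREATE INDEX "{table}_{role}_idx" ON "{table}" ("{role}")')
    return table


conn = sqlite3.connect(":memory:")
table = create_fact_table(conn, "http://iks-project.eu/ont/employeeOf", ["person", "organization"])
# Fact values (IRIs) are always passed as parameters, never formatted into the SQL.
conn.execute(
    f'INSERT INTO "{table}" ("person", "organization") VALUES (?, ?)',
    ("http://upb.de/persons/bnagel", "http://uni-paderborn.de"),
)
print(table)
```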

Query Implementation

The JSON-LD query structure is designed to be mapped directly to valid SQL statements. If the store is implemented in a relational database, all queries can be transformed into SQL queries against this database. For security reasons, the transformation from JSON-LD to SQL must guard against attacks such as SQL injection. As seen in the examples, queries may use attributes of entities to formulate the request. However, the FactStore stores only the IRIs of the entities, not the entities with their attributes. Therefore, the FactStore needs an EntityHub to resolve entities specified by their attributes. The EntityHub must be able to query for entities by example.

Note: Depending on the number of entities returned by the EntityHub for a given request, this architecture may lead to performance problems. The performance limits of this approach still have to be evaluated. However, the assumption is that in many (or most) scenarios this will not become a problem. If it does, the type of allowed queries may be restricted, e.g. by disallowing queries that use entity attributes in the "where" clause, to avoid performance or memory problems.
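The mapping from a JSON-LD query to SQL can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function to_sql and the tables mapping (fact schema IRI to table name) are hypothetical, only "select", "from" and "=" constraints on roles are handled, and attribute paths such as "person.name" are rejected because they would require an EntityHub lookup. To guard against SQL injection, column names are checked against a strict pattern, table names come only from the store's own mapping, and all values are passed as bound parameters.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(value: str, context: dict) -> str:
    """Expand "prefix:suffix" when the prefix is defined in the context."""
    prefix, sep, suffix = value.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return value


def column(name: str) -> str:
    """Quote a role name after checking that it is a plain identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported column: {name!r}")
    return f'"{name}"'


def to_sql(query: dict, tables: dict) -> tuple:
    """Translate a single-fact query into (sql, params)."""
    context = query.get("@context", {})
    fact_iri = expand(query["from"], context)
    if fact_iri not in tables:
        raise ValueError(f"unknown fact schema: {fact_iri}")
    columns = ", ".join(column(name) for name in query["select"])
    clauses, params = [], []
    for constraint in query.get("where", []):
        for relation, operands in constraint.items():
            if relation != "=":
                raise ValueError(f"unsupported relation: {relation!r}")
            for role, value in operands.items():
                clauses.append(f"{column(role)} = ?")
                # IRIs are expanded; plain literals are passed through unchanged.
                params.append(expand(value["@iri"], context) if isinstance(value, dict) else value)
    sql = f'SELECT {columns} FROM "{tables[fact_iri]}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)  # constraints are combined with AND
    return sql, params


example_query = {
    "@context": {"iks": "http://iks-project.eu/ont/"},
    "select": ["person"],
    "from": "iks:employeeOf",
    "where": [{"=": {"organization": {"@iri": "http://uni-paderborn.de"}}}],
}
known_tables = {"http://iks-project.eu/ont/employeeOf": "fact_employeeOf"}
print(to_sql(example_query, known_tables))
```

The returned statement and parameter list can be handed to any database driver that supports "?" placeholders, for example the execute method of a sqlite3 connection.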