Tuesday, June 26, 2018

Training and Test Data for Knowledge Graph Completion


Positive Examples:

Positive examples for a predicate P are all entity pairs (x,y) in DBpedia such that the triple (x,P,y) is in the KG. For example, the positive examples for the spouse relation are pairs of people who are married to each other, so both x and y are of type Person.

An example SPARQL query that generates positive examples for the spouse relation is given below:

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?subject ?object
FROM <http://dbpedia.org>
WHERE {
  ?object rdf:type dbpedia-owl:Person .
  ?subject rdf:type dbpedia-owl:Person .
  ?subject ?targetRelation ?object .
  FILTER (?targetRelation = dbpedia-owl:spouse)
}
ORDER BY RAND() LIMIT 10000



The query above selects distinct entity pairs (subject, object) that are connected by the target relation, spouse, and restricts both the subject and the object to type Person. The last line returns the pairs in random order and limits the output to 10,000, so each run produces a different random sample. The result is up to 10,000 pairs of Person entities connected by the spouse relation in the KG.
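
The same query can be generated for other relations. The following is a minimal sketch, not the original implementation: the function name build_positive_query and its parameters are hypothetical, and it writes the target relation directly into the triple pattern instead of using a FILTER, which should select the same pairs.

```python
def build_positive_query(relation="spouse", subject_type="Person",
                         object_type="Person", limit=10000):
    """Return a SPARQL query string that samples positive examples.

    All parameter names are illustrative; the values are local names
    in the DBpedia ontology namespace (dbpedia-owl:).
    """
    lines = [
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>",
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT DISTINCT ?subject ?object",
        "FROM <http://dbpedia.org>",
        "WHERE {",
        # Type restrictions on both ends of the pair.
        f"  ?subject rdf:type dbpedia-owl:{subject_type} .",
        f"  ?object rdf:type dbpedia-owl:{object_type} .",
        # The pair must be connected by the target relation.
        f"  ?subject dbpedia-owl:{relation} ?object .",
        "}",
        # Random order plus LIMIT gives a random sample on each run.
        f"ORDER BY RAND() LIMIT {int(limit)}",
    ]
    return "\n".join(lines)


print(build_positive_query())
```

Calling build_positive_query("birthPlace", object_type="Place") would, under the same assumptions, produce the corresponding query for another relation with a different object type.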


Negative Examples:

For negative examples, any entity z that is not connected to x by P could in principle be paired with x as a negative example. In fact, if there are n entities in the KG, there are at most n * (n-1) candidate pairs. To narrow down the scope, negative examples are restricted to the entity pairs (x,y) that satisfy the following conditions, as described in RuDiK:

RuDiK: codebase, paper

  1. (x,P,y) is not in the KG;
  2. either there is some y' != y such that (x,P,y') is in the KG, or there is some x' != x such that (x',P,y) is in the KG;
  3. there is some P' != P such that (x,P',y) is in the KG.

The first condition ensures that x and y are not connected by P in the KG. The third condition limits negative examples to pairs that are connected by at least one relation other than P. The second condition further limits them to pairs in which x or y takes part in P with some other entity.
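
To make the three conditions concrete, here is a small sketch that applies them to a toy KG held in memory as a set of (subject, predicate, object) triples. The entity names, the colleague relation and the function negative_examples are made up for illustration, and the Person type restriction is left out.

```python
# Toy KG: a set of (subject, predicate, object) triples (made-up data).
TOY_KG = {
    ("alice", "spouse", "bob"),
    ("bob", "spouse", "alice"),
    ("alice", "colleague", "bob"),
    ("alice", "colleague", "carol"),
    ("dave", "colleague", "erin"),
}


def negative_examples(kg, p):
    """Return the pairs (x, y) that satisfy the three conditions for p."""
    subjects_with_p = {s for s, r, _ in kg if r == p}
    objects_with_p = {o for _, r, o in kg if r == p}
    negatives = set()
    for x, r, y in kg:
        # Condition 3: x and y are linked by some relation other than p.
        if r == p:
            continue
        # Condition 1: (x, p, y) itself must not be in the KG.
        if (x, p, y) in kg:
            continue
        # Condition 2: x has p with some y', or some x' has p with y.
        # Because (x, p, y) is absent, y' != y and x' != x hold automatically.
        if x in subjects_with_p or y in objects_with_p:
            negatives.add((x, y))
    return negatives


print(sorted(negative_examples(TOY_KG, "spouse")))
# prints [('alice', 'carol')]
```

Here (alice, bob) is rejected by condition 1, and (dave, erin) is rejected by condition 2 because neither entity takes part in any spouse triple.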

The above conditions are expressed in the following SPARQL query for the spouse relation:

PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?subject ?object
FROM <http://dbpedia.org>
WHERE {
  ?object rdf:type dbpedia-owl:Person .
  ?subject rdf:type dbpedia-owl:Person .
  { { ?subject ?targetRelation ?realObject . } UNION { ?realSubject ?targetRelation ?object . } }
  ?subject ?otherRelation ?object .
  FILTER (?targetRelation = dbpedia-owl:spouse)
  FILTER (?otherRelation != dbpedia-owl:spouse)
  FILTER NOT EXISTS { ?subject dbpedia-owl:spouse ?object . }
}
ORDER BY RAND() LIMIT 10000

As in the positive query, the subject and the object are restricted to type Person. The query selects a random sample of entity pairs that are not connected by the spouse relation, are connected by some other relation, and in which at least one of the two entities has a spouse relation with some other entity.
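
As a rough sketch, the negative-example query can also be built for an arbitrary relation. The function build_negative_query and its parameters are hypothetical and not taken from the original implementation; the target relation is written directly into the patterns rather than bound through ?targetRelation, which should be equivalent.

```python
def build_negative_query(relation="spouse", entity_type="Person", limit=10000):
    """Return a SPARQL query string that samples negative examples.

    Parameter names are illustrative; the values are local names in the
    DBpedia ontology namespace (dbpedia-owl:).
    """
    target = f"dbpedia-owl:{relation}"
    lines = [
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>",
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT DISTINCT ?subject ?object",
        "FROM <http://dbpedia.org>",
        "WHERE {",
        f"  ?subject rdf:type dbpedia-owl:{entity_type} .",
        f"  ?object rdf:type dbpedia-owl:{entity_type} .",
        # Condition 2: subject or object takes part in the target relation.
        f"  {{ ?subject {target} ?realObject . }} UNION"
        f" {{ ?realSubject {target} ?object . }}",
        # Condition 3: some other relation links subject and object.
        "  ?subject ?otherRelation ?object .",
        f"  FILTER (?otherRelation != {target})",
        # Condition 1: the pair itself is not linked by the target relation.
        f"  FILTER NOT EXISTS {{ ?subject {target} ?object . }}",
        "}",
        f"ORDER BY RAND() LIMIT {int(limit)}",
    ]
    return "\n".join(lines)


print(build_negative_query())
```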

You can find the implementation at the following link:


Dependencies:
pip install sparql-client
argparse and csv ship with the Python standard library (argparse since Python 2.7), so they do not need to be installed with pip.
To install a local version of DBpedia: http://kbreasoning.blogspot.com/2017/12/setting-up-local-dbpediawikidata-with.html
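
The original script is not reproduced here. The following is only a hypothetical sketch of how these three modules might fit together: the function names, the command-line flags and the default endpoint URL are assumptions, and fetch_pairs assumes that sparql-client provides sparql.query and sparql.unpack_row as described in its documentation.

```python
import argparse
import csv
import io


def fetch_pairs(query, endpoint="http://dbpedia.org/sparql"):
    """Run a SELECT ?subject ?object query and yield (subject, object) pairs.

    Assumption: sparql-client exposes sparql.query(endpoint, query) and
    sparql.unpack_row(row). For a local DBpedia, pass its endpoint URL.
    """
    import sparql  # third-party; imported lazily so the rest runs without it

    for row in sparql.query(endpoint, query):
        subject, obj = sparql.unpack_row(row)
        yield subject, obj


def write_pairs(pairs, handle):
    """Write (subject, object) pairs as CSV rows and return the row count."""
    writer = csv.writer(handle)
    writer.writerow(["subject", "object"])
    count = 0
    for subject, obj in pairs:
        writer.writerow([subject, obj])
        count += 1
    return count


def make_parser():
    """Command-line options; every flag name here is hypothetical."""
    parser = argparse.ArgumentParser(description="Sample entity pairs from DBpedia")
    parser.add_argument("--relation", default="spouse")
    parser.add_argument("--limit", type=int, default=10000)
    parser.add_argument("--output", default="examples.csv")
    return parser


# Offline usage example with made-up pairs instead of a live query.
buffer = io.StringIO()
written = write_pairs([("dbr:A", "dbr:B"), ("dbr:C", "dbr:D")], buffer)
print(written, "pairs written")
```

In a real run, the pairs passed to write_pairs would come from fetch_pairs with one of the queries above, and the handle would be a file opened with newline="" as the csv module recommends.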
