“Lifted Relational Random Walks” has been integrated into BoostSRL to obtain random walks in relational domains. A relational data schema can often be represented as a lifted graph in which the nodes represent entity types and the edges represent relations between entities. A random walk on such a graph may uncover interesting structure present in the relational schema. For example, the following random walk can be interpreted as “a person taught a course that has a level”:
personid -> taught -> courseid -> courseLevel -> levelid
This random walk can be converted into clausal form as:
taught(personid,courseid) ^ courseLevel(courseid,levelid)
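The conversion from a walk to a clause can be sketched in a few lines of Python. This is only an illustration of the mapping shown above, not BoostSRL's own code; the function name walk_to_clause is hypothetical, and the sketch assumes a walk always alternates entity types and relations and that entity type names double as variable names, as in the example.

```python
def walk_to_clause(walk):
    """Turn an alternating [type, relation, type, ...] list into a conjunction."""
    if len(walk) < 3 or len(walk) % 2 == 0:
        raise ValueError("walk must alternate entity types and relations, "
                         "starting and ending with an entity type")
    literals = []
    # Relations sit at the odd positions; their neighbours are the argument types.
    for i in range(1, len(walk), 2):
        source, relation, target = walk[i - 1], walk[i], walk[i + 1]
        literals.append(f"{relation}({source},{target})")
    return " ^ ".join(literals)


walk = "personid -> taught -> courseid -> courseLevel -> levelid"
clause = walk_to_clause([part.strip() for part in walk.split("->")])
print(clause)  # taught(personid,courseid) ^ courseLevel(courseid,levelid)
```

Because type names are reused as variable names, a walk that visits the same entity type twice would need distinct variables; that case is left out of this sketch.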
Random walks can be obtained by running the following command in BoostSRL:
java -cp edu.iu.cs.RelationalRandomWalks.RunRelationalRandomWalks -rw -train "./facts.txt" -startentity "personid" -endentity "personid" -maxRWlen 6
As shown above, the following flags need to be set:
-rw: Perform lifted relational random walks.
-startentity: Set the entity type from which every random walk should start (e.g. personid in the above example).
-endentity: Set the entity type at which every random walk should end (e.g. levelid in the example walk above; the example command uses personid).
-maxRWlen: Set the maximum length (number of relations) of any random walk.
-train: Set the path to the schema file.
The input file (‘facts.txt’) contains the schema of the relational dataset. An example schema is shown below:
courseLevel(courseid,levelid)|NoBF
student(personid,studentype)|NoBF
professor(personid,professortype)|NoBF
inPhase(personid,phaseid)|NoBF
yearsInProgram(personid,yearid)|NoBF
hasPosition(personid,positiontype)|NoBF
B_taughtBy(courseid,personid)|NoTwin|NoBB
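To make the schema format concrete, here is a minimal Python sketch of how one such line could be split into a relation name, its argument types, and its flags. It is an assumption about the format based only on the example above, not BoostSRL's parser; Relation, KNOWN_FLAGS, and parse_schema_line are hypothetical names, and the flag names are the ones this page describes in the next section.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    name: str
    arg_types: tuple
    flags: frozenset


# Flag names taken from this page; matching is case-sensitive.
KNOWN_FLAGS = {"NoTwin", "NoBB", "NoFF", "NoFB", "NoBF"}

# name(arg1,arg2) followed by zero or more |Flag suffixes.
_LINE = re.compile(r"^\s*(\w+)\(([^)]*)\)((?:\|\w+)*)\s*$")


def parse_schema_line(line):
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not a schema line: {line!r}")
    name, args, flag_text = match.groups()
    arg_types = tuple(arg.strip() for arg in args.split(","))
    flags = frozenset(flag for flag in flag_text.split("|") if flag)
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flag(s) {sorted(unknown)} in {line!r}")
    return Relation(name, arg_types, flags)


for text in ["courseLevel(courseid,levelid)|NoBF",
             "B_taughtBy(courseid,personid)|NoTwin|NoBB"]:
    print(parse_schema_line(text))
```

Rejecting unknown flags up front is a design choice of this sketch: a misspelled or wrongly cased flag is reported instead of being silently ignored.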
Setting Flags in Input File
As the examples above show, flags can be set in the schema file for each relation, after vertical bars (|). For more information on the importance of these flags, please refer to Lao and Cohen (see the reference below). This code supports the following flags:
This code allows an inverse relation for every relation present in the schema file, represented by putting an underscore (_) character in front of the relation name. For example, the inverse of courseLevel(courseid,levelid) is represented as _courseLevel(levelid,courseid), so that courseLevel and _courseLevel are two distinct relations.
NoTwin: Disallow the inverse of a relation from being present in random walks.
NoBB: Prevent an inverse relation from immediately following itself in a random walk.
NoFF: Prevent a non-inverse relation from immediately following itself in a random walk.
NoFB: Prevent an inverse relation from immediately following its non-inverse counterpart in a random walk.
NoBF: Prevent a non-inverse relation from immediately following its inverse counterpart in a random walk.
Caution: these flags are case-sensitive, so set them carefully.
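The effect of the flags can be illustrated with a small Python sketch that enumerates walks over a lifted graph. Everything here is an assumption made for illustration: the exhaustive depth-first search, the function name enumerate_walks, the tuple layout of the schema, and the flags attached to the two relations are not taken from BoostSRL, which may generate walks differently.

```python
def enumerate_walks(schema, start, end, max_len):
    """List relation sequences from `start` to `end` using at most `max_len` relations.

    `schema` holds (name, source_type, target_type, flags) tuples (hypothetical layout).
    """
    # Each edge: (base name, is_inverse, from_type, to_type, flags).
    edges = []
    for name, source, target, flags in schema:
        edges.append((name, False, source, target, flags))
        if "NoTwin" not in flags:
            # The inverse relation runs in the opposite direction.
            edges.append((name, True, target, source, flags))

    def allowed(prev, cur):
        # The pairwise flags only constrain a relation following itself or its twin.
        if prev is None or prev[0] != cur[0]:
            return True
        pair = ("B" if prev[1] else "F") + ("B" if cur[1] else "F")
        return ("No" + pair) not in cur[4]

    walks = []

    def extend(node, path, prev):
        if path and node == end:
            walks.append(list(path))
        if len(path) == max_len:
            return
        for edge in edges:
            if edge[2] == node and allowed(prev, edge):
                path.append(("_" if edge[1] else "") + edge[0])
                extend(edge[3], path, edge)
                path.pop()

    extend(start, [], None)
    return walks


# With NoFB, _taught may not directly follow taught.
restricted = [("taught", "personid", "courseid", {"NoFB"}),
              ("courseLevel", "courseid", "levelid", {"NoTwin"})]
# The same schema without the NoFB restriction.
unrestricted = [("taught", "personid", "courseid", set()),
                ("courseLevel", "courseid", "levelid", {"NoTwin"})]

print(enumerate_walks(restricted, "personid", "levelid", 4))
print(enumerate_walks(unrestricted, "personid", "levelid", 4))
```

In the restricted schema the only walk found is taught followed by courseLevel; dropping NoFB additionally admits the longer walk taught, _taught, taught, courseLevel, which backtracks through the inverse relation.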
The output will be stored in the file ‘RWRPredicates.txt’, in the same folder as the input file.
- Ni Lao and William W. Cohen, “Relational Retrieval Using a Combination of Path-Constrained Random Walks”, ECML 2011.