Installation

We recommend installing rnlp inside an isolated Python environment. Refer to the venv or Conda documentation for more information.

Stable versions of the rnlp library are hosted on the Python Package Index (PyPI):

pip install rnlp

Development versions are on GitHub. These may contain partially-implemented features.

git clone https://github.com/starling-lab/rnlp.git
cd rnlp
python setup.py develop

Quick-Start

rnlp can be used either as a command-line tool or as an imported Python package.

Command line:

$ python -m rnlp -f files/doi.txt
Reading corpus from file(s)...
Creating background file...
100%|████████| 18/18 [00:00<00:00, 38it/s]

Imported:

from rnlp.corpus import declaration
import rnlp

doi = declaration()
rnlp.converter(doi)

A Relational View of Text

Text is converted into relational facts built around three basic units: Words, Sentences, and Blocks.

Words are individual units of text, such as the words you are currently reading. Sentences are a collection of Words. Blocks are a collection of Sentences.
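The hierarchy can be illustrated with a minimal sketch. This is not rnlp's own tokenizer: the function name `split_into_blocks`, the regex-based splitting, and the `sentences_per_block` parameter are all assumptions made for illustration.

```python
import re


def split_into_blocks(text, sentences_per_block=2):
    """Split raw text into blocks -> sentences -> words (illustrative only)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters, digits, or apostrophes; punctuation is dropped.
    tokenized = [re.findall(r"[A-Za-z0-9']+", s) for s in sentences]
    # Group consecutive sentences into fixed-size blocks (hypothetical choice).
    return [
        tokenized[i:i + sentences_per_block]
        for i in range(0, len(tokenized), sentences_per_block)
    ]


blocks = split_into_blocks("Thank you all. We are here. This is the end.")
for block_number, block in enumerate(blocks, start=1):
    for sentence in block:
        print(block_number, sentence)
```

Each block is a list of sentences, and each sentence is a list of word strings, which mirrors the Word, Sentence, and Block nesting described above.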

This package encodes text in this relational format so that relational learning methods (such as BoostSRL) can learn its structure.

Encoded Facts

  • Sentence’s Relative Position in Block:

    • earlySentenceInBlock: Sentence occurs within the first third of a block’s length.
    • midWaySentenceInBlock: Sentence occurs within the middle third of a block’s length.
    • lateSentenceInBlock: Sentence occurs within the last third of a block’s length.
  • Word’s Relative Position in Sentence:

    • earlyWordInSentence: Word occurs within the first third of a sentence.
    • midWayWordInSentence: Word occurs within the middle third of a sentence.
    • lateWordInSentence: Word occurs within the last third of a sentence.
  • Relative Position Between Items:

    • nextWordInSentence: Pointer from a word to the word that follows it.
    • nextSentenceInBlock: Pointer from a sentence to the sentence that follows it.
  • Existential Semantics:

    • sentenceInBlock: Sentence occurs in a particular block.
    • wordInSentence: Word occurs in a particular sentence.
  • Low-Level Information About Words:

    • wordString: A string representation of a word.
    • partOfSpeechTag: The word’s part of speech.
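
As a rough sketch of how the word-level facts listed above could be produced for a single sentence, consider the following. The identifier scheme (`s1_w1`), the argument order, the boundary handling at exactly one third and two thirds, and the helper names `position_label` and `sentence_facts` are all assumptions for illustration; the files rnlp actually writes may differ.

```python
def position_label(index, length):
    """Return 'early', 'midWay', or 'late' for a 0-based index in a sequence."""
    # Integer arithmetic avoids floating-point comparisons at the boundaries.
    if 3 * (index + 1) <= length:
        return "early"
    if 3 * (index + 1) <= 2 * length:
        return "midWay"
    return "late"


def sentence_facts(words, sentence_id):
    """Build word-level facts for one sentence (hypothetical output format)."""
    facts = []
    for i, word in enumerate(words):
        word_id = f"{sentence_id}_w{i + 1}"
        # Existential fact: the word belongs to this sentence.
        facts.append(f"wordInSentence({word_id}, {sentence_id}).")
        # Low-level fact: the string form of the word.
        facts.append(f'wordString({word_id}, "{word}").')
        # Relative position of the word within the sentence.
        label = position_label(i, len(words))
        facts.append(f"{label}WordInSentence({word_id}, {sentence_id}).")
        # Pointer to the following word, if there is one.
        if i + 1 < len(words):
            facts.append(f"nextWordInSentence({word_id}, {sentence_id}_w{i + 2}).")
    return facts


facts = sentence_facts(["Thank", "you", "all"], "s1")
print("\n".join(facts))
```

The sentence-level facts (such as earlySentenceInBlock and nextSentenceInBlock) would follow the same pattern, with sentences indexed inside a block instead of words inside a sentence.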

Example

Basic file structure for the Cora dataset which BoostSRL assumes for most operations.

A toy classification task where the goal is to predict whether a sentence contains the word “you”.

At the root of the tree, we see that the presence of wordString(b, "you") is the best predictor. More interestingly, the model also shows that if both “a” and “b” occur early in the sentence and “anon12035” is “Thank”, then the sentence is also likely to contain “you”.

The model was able to learn that the word “you” often occurs with the word “Thank” in the same sentence when “Thank” appears early in that sentence.