Fusing high-level symbolic reasoning with lower-level signal-based reasoning has become an important research question given the recent progress of deep learning. We recently developed RePReL, which combines a higher-order symbolic reasoner (a planner) with a signal-based RL system to construct abstractions that accelerate learning in structured domains, i.e., domains with several interacting objects that cannot be represented efficiently by a fixed-length vector. RL in structured domains is inherently difficult, and only a small number of solutions exist [1]. RePReL takes a first step toward combining (relational) planning and RL for structured problems: the planner defines a smaller set of abstract state-action spaces, which allows the lower-level RL agent to learn efficiently. The key strength of RePReL is its ability to generalize to a varying number of objects.
The RePReL architecture, shown below, consists of three stacked modules: Symbolic Planner, Abstraction Reasoner, and RL agents.
RePReL Architecture
- Symbolic Planner: The symbolic planner uses the high-level planning domain description to decompose the goal into a sequence of temporally extended actions. Essentially, the planner decomposes the GRMDP into small sub-goal RMDPs.
- Abstraction Reasoner: The abstraction reasoner generates a task-specific abstract state representation using the dynamic first-order conditional influence statements provided by a domain expert.
- RL Agents: Finally, multiple reinforcement learners at the lowest level learn separate RL policies for each option in the abstract state space.
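To make the three-module flow concrete, here is a minimal sketch on a toy taxi-style domain. It is not the RePReL implementation: the `plan` function is a hand-written stand-in for a real symbolic planner, the `RELEVANT` table is a hypothetical stand-in for expert-provided influence statements, and all state variable names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

State = Dict[str, object]
SubGoal = Tuple[str, str]  # (operator, object), e.g. ("pickup", "p1")


def plan(passengers: List[str]) -> List[SubGoal]:
    """Hypothetical stand-in for the symbolic planner: one pickup and one
    drop sub-goal per passenger, in order."""
    steps: List[SubGoal] = []
    for p in passengers:
        steps.append(("pickup", p))
        steps.append(("drop", p))
    return steps


# Hypothetical relevance table standing in for the expert-provided
# influence statements: which state variables matter for each operator.
RELEVANT = {
    "pickup": ["taxi_at", "at({obj})", "in_taxi({obj})"],
    "drop": ["taxi_at", "dest({obj})", "in_taxi({obj})"],
}


def abstract_state(state: State, subgoal: SubGoal) -> tuple:
    """Keep only the variables relevant to the current sub-goal."""
    op, obj = subgoal
    keys = [template.format(obj=obj) for template in RELEVANT[op]]
    return tuple(state[k] for k in keys)


def make_state(passengers: List[str]) -> State:
    """Build a toy ground state with one block of variables per passenger."""
    state: State = {"taxi_at": (0, 0)}
    for i, p in enumerate(passengers):
        state[f"at({p})"] = (i, 1)
        state[f"dest({p})"] = (i, 4)
        state[f"in_taxi({p})"] = False
    return state


# One learner per operator; empty Q-tables stand in for the RL agents.
q_tables: Dict[str, dict] = {op: {} for op in RELEVANT}

small = make_state(["p1", "p2"])
large = make_state(["p1", "p2", "p3", "p4"])
first = plan(["p1", "p2"])[0]
print(abstract_state(small, first) == abstract_state(large, first))
```

In this toy setup the abstract state for a sub-goal has the same shape no matter how many passengers the ground state contains, which is the property that lets a policy learned per operator carry over to tasks with a different number of objects.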
Although RePReL was successful, it relied on an important assumption: the underlying features are discrete and homogeneous. The discreteness assumption prevents RePReL from exploiting the power of deep RL, and the homogeneity assumption prevents its use in scenarios where data arrive from multiple sources. In this work, we propose an extension to the RePReL architecture that readily adapts to hybrid data, i.e., a mix of varied data types coming from different sources.
Hybrid Deep RePReL
In the Hybrid Deep RePReL architecture, shown below, we introduce two additional modules, an input preprocessing module and a merge module, to handle the combination of structured and unstructured information.
Hybrid Deep RePReL Architecture
The unstructured part of the state space is passed through the input preprocessing module, which generates latent state embeddings for the unstructured data. The input preprocessing module can be a convolutional neural network for image data, a transformer for text data, or a combination of both. The relevant state variables obtained from the abstraction reasoner and the latent predicates obtained from the input preprocessing module are combined by the merge module and provided to the deep RL agents for learning.
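The data flow through the two additional modules can be sketched as follows. This is an illustrative sketch only: the text above does not specify how the merge module combines its inputs, so plain concatenation is assumed here, and `encode_image` is a hypothetical average-pooling stand-in for a learned encoder such as a CNN.

```python
from typing import List, Sequence


def encode_image(pixels: Sequence[Sequence[float]], dim: int = 4) -> List[float]:
    """Hypothetical stand-in for the input preprocessing module.

    Flattens the image and averages it into `dim` equal-sized buckets.
    Values that do not fit evenly into the buckets are ignored.
    """
    flat = [v for row in pixels for v in row]
    size = max(1, len(flat) // dim)
    chunks = [flat[i * size:(i + 1) * size] for i in range(dim)]
    return [sum(c) / len(c) if c else 0.0 for c in chunks]


def merge(abstract_vars: Sequence[float], latent: Sequence[float]) -> List[float]:
    """Assumed merge step: concatenate the structured abstract state
    variables with the latent embedding of the unstructured input."""
    return list(abstract_vars) + list(latent)


# Structured part: relevant variables selected by the abstraction reasoner
# (values here are made up for illustration).
abstract_vars = [1.0, 0.0]

# Unstructured part: a tiny 2x4 "image".
image = [[0.0, 0.0, 1.0, 1.0],
         [2.0, 2.0, 3.0, 3.0]]

latent = encode_image(image, dim=4)
agent_input = merge(abstract_vars, latent)
print(agent_input)
```

The resulting fixed-length vector is what a deep RL agent would consume in this sketch. A learned encoder and a learned merge layer could replace either function without changing the overall flow.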
An implementation of this work is available on our lab's GitHub here.
