What?

We frame human-machine collaborative problem solving as a planning task coupled with natural language communication. To study it, we propose the task of collaboratively building target structures in a Minecraft environment. Two players, an architect (played by a human) and a builder (played by a machine), collaborate and communicate in natural language through a chat interface.

The architect (shown as a human icon above) has access to the target structure and can see the current state of the build region. The builder (Steve from Minecraft) can move around the build region and place or remove blocks, but does not have access to the target structure. The architect therefore has to describe the structure to the builder through the chat interface.
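The asymmetry between the two roles can be pictured with a small Python sketch. It is illustrative only and not the framework's code: the names `BuildRegion`, `architect_view`, and `is_complete`, the `(x, y, z)` cell convention, and the use of a string block type are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Position = Tuple[int, int, int]  # (x, y, z) cell in the build region (assumed convention)
Blocks = Dict[Position, str]     # cell -> block type, e.g. a colour (assumed)


@dataclass
class BuildRegion:
    """Current state of the build region, visible to both players."""
    blocks: Blocks = field(default_factory=dict)

    # The builder's only ways of changing the world: place or remove a block.
    def place(self, pos: Position, block_type: str) -> None:
        self.blocks[pos] = block_type

    def remove(self, pos: Position) -> None:
        self.blocks.pop(pos, None)


def architect_view(region: BuildRegion, target: Blocks) -> Tuple[Blocks, Blocks]:
    """What only the architect can work out, since only the architect sees the target.

    Returns the blocks still to be placed and the blocks that should be removed.
    """
    to_place = {p: b for p, b in target.items() if region.blocks.get(p) != b}
    to_remove = {p: b for p, b in region.blocks.items() if target.get(p) != b}
    return to_place, to_remove


def is_complete(region: BuildRegion, target: Blocks) -> bool:
    """The task is solved when the build region matches the target exactly."""
    return region.blocks == target
```

The builder in this sketch never receives `target`; anything it learns about the structure has to arrive through the architect's chat messages.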


Why?

Human-machine collaborative planning and problem solving is challenging because it requires a shared perception of the world, sophisticated language understanding, fluent execution, bi-directional communication, and contextual understanding.

To construct the target structure successfully, the architect must decompose it into smaller structures that the builder knows how to construct. The builder must interpret each instruction in the context of the current world and plan a sequence of actions. The challenges posed by our building task are:

  1. the communication between the architect and the builder is inherently bi-directional, as seen in the image above;
  2. the builder should be able to seek clarifications as required;
  3. both players must share some initial structures in the vocabulary and expand the vocabulary with experience.

Our task highlights the key challenges of the collaborative planning problem: bi-directional communication, contextual understanding, a composable vocabulary, and the ability to induce new, rich concepts from limited interaction and experience.
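A minimal Python sketch of a composable vocabulary, under stated assumptions, may make these challenges more concrete. Everything here is hypothetical: the primitives `row` and `tower`, the helpers `teach` and `build`, and the `NeedsClarification` exception are invented for illustration, and the hand-written composition in `teach` merely stands in for concept induction, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int, int]
# A concept maps an origin and a size to the cells that must be filled.
Concept = Callable[[Position, int], List[Position]]


class NeedsClarification(Exception):
    """Raised when the builder meets a structure name it does not know."""


def row(origin: Position, n: int) -> List[Position]:
    x, y, z = origin
    return [(x + i, y, z) for i in range(n)]


def tower(origin: Position, n: int) -> List[Position]:
    x, y, z = origin
    return [(x, y + i, z) for i in range(n)]  # y is treated as "up" (assumption)


def teach(vocab: Dict[str, Concept], name: str,
          parts: List[Tuple[str, Position]]) -> None:
    """Add a new concept defined as known concepts placed at relative offsets."""
    unknown = [part for part, _ in parts if part not in vocab]
    if unknown:
        raise NeedsClarification(f"What is {unknown[0]!r}?")

    def composed(origin: Position, n: int) -> List[Position]:
        cells: List[Position] = []
        for part, (dx, dy, dz) in parts:
            shifted = (origin[0] + dx, origin[1] + dy, origin[2] + dz)
            for cell in vocab[part](shifted, n):
                if cell not in cells:  # overlapping parts share cells
                    cells.append(cell)
        return cells

    vocab[name] = composed


def build(vocab: Dict[str, Concept], name: str,
          origin: Position, n: int) -> List[Position]:
    """Expand a named structure into cells, or ask the architect about it."""
    if name not in vocab:
        raise NeedsClarification(f"What is {name!r}?")
    return vocab[name](origin, n)
```

Starting from `{"row": row, "tower": tower}`, a request for an `"L"` raises `NeedsClarification`. Once the architect teaches it with `teach(vocab, "L", [("tower", (0, 0, 0)), ("row", (0, 0, 0))])`, the same request succeeds, and `"L"` can itself be reused as a part of later concepts.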


How?

Our framework consists of three main components that interact with the Minecraft simulator: a natural language engine that parses language utterances into a formal representation and vice versa, a concept learner that induces generalized concepts for plans from limited interactions with the user, and a planner that solves the task based on human interaction. More details on each of these components can be found in the paper.
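The way the three components might hand work to one another can be sketched in Python. This is a toy illustration under loud assumptions, not the interfaces from the paper: the class names, the `Goal` schema, the regular-expression "parser", and the fixed table of known concepts are all hypothetical stand-ins for the real components.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Position = Tuple[int, int, int]
Action = Tuple[str, Position]  # e.g. ("place", (0, 1, 0)); hypothetical action format


@dataclass(frozen=True)
class Goal:
    """Formal representation of one instruction (hypothetical schema)."""
    concept: str
    size: int


class LanguageEngine:
    """Toy stand-in: utterance -> Goal, and a question -> utterance."""
    PATTERN = re.compile(r"build an? (\w+) of (?:size|length|height) (\d+)")

    def parse(self, utterance: str) -> Optional[Goal]:
        match = self.PATTERN.search(utterance.lower())
        return Goal(match.group(1), int(match.group(2))) if match else None

    def ask_about(self, concept: str) -> str:
        return f"What is '{concept}'? Can you describe it?"


class ConceptLearner:
    """Toy stand-in: a fixed table instead of induction from interaction."""

    def __init__(self) -> None:
        # Each known concept is reduced to a unit step between blocks.
        self.known: Dict[str, Position] = {"row": (1, 0, 0), "tower": (0, 1, 0)}

    def lookup(self, concept: str) -> Optional[Position]:
        return self.known.get(concept)


class Planner:
    """Toy stand-in: turns a goal into a sequence of block placements."""

    def plan(self, goal: Goal, step: Position) -> List[Action]:
        dx, dy, dz = step
        return [("place", (i * dx, i * dy, i * dz)) for i in range(goal.size)]


def respond(utterance: str, nl: LanguageEngine, learner: ConceptLearner,
            planner: Planner) -> Union[str, List[Action]]:
    """One builder turn: either a chat reply or a list of actions to execute."""
    goal = nl.parse(utterance)
    if goal is None:
        return "Sorry, I did not understand that."
    step = learner.lookup(goal.concept)
    if step is None:
        return nl.ask_about(goal.concept)  # bi-directional: the builder asks back
    return planner.plan(goal, step)
```

With these stand-ins, `respond("Build a tower of height 2", ...)` yields two `place` actions, while an instruction that mentions an unknown concept yields a clarification question instead of a plan.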


The following video demonstrates our framework.

Citation

If you find this work useful, please consider using the following reference or BibTeX.

Harsha Kokel, Mayukh Das, Rakibul Islam, Julia Bonn, Jon Cai, Soham Dan, Anjali Narayan-Chen, Prashant Jayannavar, Janardhan Rao Doppa, Julia Hockenmaier, Sriraam Natarajan, Martha Palmer, and Dan Roth. (2021) Human-guided Collaborative Problem Solving: A Natural Language based Framework. In: ICAPS 2021.

@inproceedings{KokelDIBCDNJDHNPR21,
  author = {Harsha Kokel and Mayukh Das and Rakibul Islam and Julia Bonn and Jon Cai and Soham Dan and Anjali Narayan-Chen and Prashant Jayannavar and Janardhan Rao Doppa and Julia Hockenmaier and Sriraam Natarajan and Martha Palmer and Dan Roth},
  title = {Human-guided Collaborative Problem Solving: A Natural Language based Framework},
  year = {2021},
  booktitle = {Thirty First International Conference on Automated Planning and Scheduling ({ICAPS})}
}

Acknowledgements

We gratefully acknowledge the support of CwC Program Contract W911NF-15-1-0461 with the US Defense Advanced Research Projects Agency (DARPA) and the Army Research Office (ARO). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, ARO, or the US government.