Ongoing Theses/Internships

The DFKI MLT Talking Robots group is currently supervising the following thesis and internship topics:

Exploring mention representations for coreference in dialogue

In this project, we will investigate different ways of extracting and representing mentions for coreference resolution using English dialogue data. We will experiment with a variety of features (e.g., embeddings, animacy, position in the document) and measure their impact on coreference resolution. Our group already has baseline models (e.g., the Workspace Coreference System) that can be used for this task.
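As a rough illustration of how the features mentioned above could be combined into one mention representation, here is a minimal sketch. It is not the group's baseline: the `Mention` class, the `mention_features` helper, and the toy embedding table are hypothetical names introduced only for this example, and the animacy flag is assumed to come from an external annotation or lexicon.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Mention:
    tokens: Sequence[str]  # surface tokens of the mention
    start: int             # index of the first mention token in the document
    is_animate: bool       # assumed to come from an annotation or lexicon


def mean_embedding(tokens: Sequence[str],
                   table: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the token vectors; unknown tokens contribute zeros."""
    total = [0.0] * dim
    for tok in tokens:
        vec = table.get(tok.lower(), [0.0] * dim)
        total = [t + v for t, v in zip(total, vec)]
    n = max(len(tokens), 1)
    return [t / n for t in total]


def mention_features(mention: Mention,
                     table: Dict[str, List[float]],
                     dim: int,
                     doc_len: int) -> List[float]:
    """Concatenate embedding, animacy flag, and relative document position."""
    emb = mean_embedding(mention.tokens, table, dim)
    animacy = [1.0 if mention.is_animate else 0.0]
    position = [mention.start / max(doc_len - 1, 1)]  # scaled to [0, 1]
    return emb + animacy + position


# Toy 2-dimensional embeddings, purely illustrative.
TOY_EMBEDDINGS = {"the": [0.0, 1.0], "robot": [1.0, 0.0], "she": [0.5, 0.5]}

mention = Mention(tokens=["the", "robot"], start=4, is_animate=False)
features = mention_features(mention, TOY_EMBEDDINGS, dim=2, doc_len=9)
print(features)
```

In a real system the averaged toy vectors would likely be replaced by contextual embeddings from a pretrained encoder, and each feature group could be switched on or off to measure its individual impact.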

Related projects: Cora4NLP, IMPRESS 

Supervisors: Tatiana Anikina, Cennet Oguz 

Prerequisites: good programming skills, familiarity with neural networks and PyTorch; background in linguistics is an advantage. 

Candidate Researcher: Annalena Kohnert

Thesis Type: MSc

One-shot or Few-shot learning methods for coreference resolution

The goal of the project is to investigate the impact of meta-learning methods, from one-shot to few-shot learning architectures, on coreference resolution.

Related project: IMPRESS 

Supervisors: Cennet Oguz, Natalia Skachkova

Prerequisites: ML and neural networks 

Candidate Researcher: Urs Peter

Thesis Type: MSc

Modelling features relevant for discourse deixis resolution in dialogue

Discourse deixis is defined as a reference to a discourse entity, such as a proposition, description, event, or speech act, that is realized in a certain discourse segment, i.e., a chunk of linguistic text such as a sequence of clauses or utterances (Webber, 1991). Discourse deixis is especially prominent in dialogue. The exact boundaries of discourse segments are often hard to define even for humans. Few frameworks focus on the identification and resolution of discourse deixis mentions. Many of them rely on a large number of mostly syntax-based hand-crafted features (e.g., Müller, 2008), and some are designed to work for a very specific domain (e.g., Byron, 2002). The vast majority of the early approaches are purely rule-based (e.g., Navarretta, 2000; Eckert and Strube, 2000), and not all of them were actually implemented. Later systems (e.g., Strube and Müller, 2003; Müller, 2008; Marasovic et al., 2017; Kobayashi et al., 2021; Anikina et al., 2021) employ machine learning methods, including recent neural network models such as the model by Lee et al. (2018), which was originally designed for coreference resolution. Still, the best discourse deixis identification and resolution model submitted to the CODI-CRAC Anaphora Resolution Shared Task 2021 achieved an F1 score of only 42.7%. Clearly, discourse deixis resolution remains a challenging NLP task.

We therefore offer a BSc or MSc thesis topic on discourse deixis resolution with a strong focus on research and data analysis. The topic involves re-implementing the state-of-the-art discourse deixis resolution model and (particularly for MSc) extending it with additional features. Building your own model from scratch is also possible.

Related project: IMPRESS 

Supervisor: Natalia Skachkova 

Prerequisites: interest in the analysis and systematization of linguistic phenomena; experience with machine learning, in particular with (re-)implementing and training neural-network-based language models.

Candidate Researcher: Qiankun Zheng

Type: Internship

Exploration of context incorporation strategies for the NLU task (Slot Filling) on dialogues

In this project, we will investigate different ways to incorporate contextual information in the Slot Filling task for dialogues in German and, potentially, other languages (English and/or Polish). For a given utterance of speaker A, the options include (but are not limited to) the following: a) use the full preceding turn of speaker B as context and give it as input to the model; b) assign a specific label to the preceding turn of speaker B (e.g., “question_about_name”, “question_about_location”) and use this label as context information, and hence as model input; c) obtain a fixed-size internal representation (vector) for the preceding turn of speaker B (e.g., through a specially trained encoder) and use this representation as context. You are invited to suggest and test any other methods for context incorporation (e.g., keeping track of previously shared information in the dialogue, or using memory networks). The results will be compared with each other to identify the most advantageous strategy. Investigating the effect of different context incorporation strategies on diverse model architectures is another possible research direction (we are currently working with Transformer-based architectures as well as RNN-based encoder-decoder models for Slot Filling). You will work with simulated dialogues (both written and spoken data are available) in the domain of emergency calls (112), identifying information crucial for dispatching emergency forces. You will need to deal with the constraints of a real-world application: the final system has to be time- and memory-efficient and must work in real time.
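To make options a) and b) more concrete, the sketch below builds the two kinds of model input as plain strings. It is only an illustration under stated assumptions: the `[SEP]` separator, the function names, and the keyword-based `toy_labeler` are hypothetical stand-ins and not part of the project's code. A real system would use the tokenizer's own separator and a trained turn classifier. Option c) requires a trained encoder and is not shown.

```python
from typing import Callable

SEP = "[SEP]"  # assumed separator; the actual token depends on the model


def input_with_full_turn(utterance: str, previous_turn: str) -> str:
    """Option a): prepend the full preceding turn of speaker B."""
    return f"{previous_turn} {SEP} {utterance}"


def input_with_turn_label(utterance: str,
                          previous_turn: str,
                          labeler: Callable[[str], str]) -> str:
    """Option b): prepend only a coarse label for the preceding turn."""
    return f"<{labeler(previous_turn)}> {SEP} {utterance}"


def toy_labeler(turn: str) -> str:
    """Hypothetical keyword rules standing in for a trained classifier."""
    lowered = turn.lower()
    if "name" in lowered:
        return "question_about_name"
    if "where" in lowered or "location" in lowered:
        return "question_about_location"
    return "other"


previous_turn = "Where are you right now?"    # speaker B (call taker)
utterance = "I am at the main station."       # speaker A (caller)

input_a = input_with_full_turn(utterance, previous_turn)
input_b = input_with_turn_label(utterance, previous_turn, toy_labeler)
print(input_a)
print(input_b)
```

Option b) yields a much shorter input than option a), which may matter for the real-time constraint, at the cost of discarding the wording of the preceding turn.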

Related project: NotAs

Supervisor: Anastasiia Kysliak

Prerequisites: good knowledge of German (desired as more data are available in German) or English, good programming skills, familiarity with neural networks and PyTorch.  

Thesis Type: MSc

Benchmarking Human-Robot Teamwork in Search and Rescue Operations

The broader aim of the A-DRZ project is to bundle and further develop experience in robot-assisted disaster response. Within the A-DRZ project, researchers and end users from the fire brigade in Dortmund work closely together to explore how robots can assist them by making their job easier, especially for tasks that could be dangerous for humans. The T-AP5 section of the project deals specifically with teamwork support, which consists of three major components: a speech processing unit (extracts mission-relevant information using state-of-the-art ASR and NLU systems), a mission knowledge manager (gathers information from various units of T-AP5, including the speech processor, creates a semantic representation of the current state of the mission, performs semantic reasoning over it, and creates logs that are easily accessible to other units), and a process assistance system (provides information such as available units and their status). The purpose of this thesis is not to test how well the robots work or how well the human-to-robot interpreter works. The objective is to evaluate how well the overall teamwork support of T-AP5 works as a whole in facilitating human-robot teamwork in A-DRZ. Since this project has its own set of objectives and priorities, the evaluation of human-robot teamwork using T-AP5 needs to be custom-made. Moreover, benchmarking human-robot teamwork is fairly new and particularly task-dependent. This work therefore aims to create a set of performance benchmarks for human-robot teamwork in the T-AP5 section of the A-DRZ project.

Related project: A-DRZ

Supervisor: Ivana Kruijff-Korbayová

Prerequisites: Knowledge of ontologies, human-robot teamwork and benchmarking.

Thesis Type: MSc