Exploring mention representations for coreference in dialogue

Research questions:

– What is the best way to represent mentions? Is a concatenation of different embeddings sufficient? If so, which embeddings work best for which types of mentions?

– Can we improve the results by encoding/embedding other (linguistic) features?

– To what extent does context affect the choice of markables?

– How should a span of multiple tokens be represented: sum, average, or concatenation?
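The span-representation options raised in the questions above can be compared on toy vectors. The following is a minimal pure-Python sketch, not a proposed implementation: all function names are hypothetical, the vectors are made up, and a real model would compute them with a neural network. Token vectors are built by concatenating embeddings from different sources (for example a word embedding plus a linguistic-feature embedding), and a span is then pooled by sum, mean, endpoint concatenation, or a softmax-weighted sum. The attention variant only loosely follows the span representation of Lee et al. (2017): there the scoring weights are learned and span width is a learned embedding, whereas here `w` is fixed and the width is a raw token count, purely for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def concat_embeddings(*parts: Sequence[float]) -> Vector:
    """Token representation: concatenate vectors from different sources."""
    out: Vector = []
    for part in parts:
        out.extend(part)
    return out


def span_sum(tokens: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum; its magnitude grows with span length."""
    return [sum(dims) for dims in zip(*tokens)]


def span_mean(tokens: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average; same scale regardless of span length."""
    n = len(tokens)
    return [x / n for x in span_sum(tokens)]


def span_endpoints(tokens: Sequence[Sequence[float]]) -> Vector:
    """Concatenate first and last token vectors (fixed size for any span length)."""
    return concat_embeddings(tokens[0], tokens[-1])


def span_attention(tokens: Sequence[Sequence[float]], w: Sequence[float]) -> Vector:
    """Softmax-weighted sum of token vectors; w is a (hypothetical) scoring vector."""
    scores = [sum(wi * xi for wi, xi in zip(w, tok)) for tok in tokens]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(tokens[0])
    return [sum(a * tok[d] for a, tok in zip(weights, tokens)) for d in range(dim)]


def span_combined(tokens: Sequence[Sequence[float]], w: Sequence[float]) -> Vector:
    """Endpoints + attention-pooled vector + raw width (simplified, Lee-style)."""
    return concat_embeddings(
        span_endpoints(tokens), span_attention(tokens, w), [float(len(tokens))]
    )


# Toy span of three tokens: a 2-d "word" embedding plus a 1-d "feature" embedding.
span = [
    concat_embeddings([1.0, 0.0], [0.0]),
    concat_embeddings([0.0, 1.0], [1.0]),
    concat_embeddings([1.0, 1.0], [1.0]),
]

print(span_sum(span))        # [2.0, 2.0, 2.0]
print(span_endpoints(span))  # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(span_combined(span, [0.0, 0.0, 0.0]))
```

Note that concatenating *all* token vectors gives a span vector whose length depends on the number of tokens, which is why fixed-size alternatives such as endpoint concatenation (optionally combined with a pooled vector and a width feature) are commonly used instead. With an all-zero `w`, the attention weights are uniform and the attention-pooled vector reduces to the mean.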

Data: we can use (part of) the data from the CODI-CRAC Anaphora Resolution Shared Task 2021 and/or the OneCommon dataset (https://github.com/Alab-NII/onecommon/tree/master/aaai2020).

Paper suggestions:

Improving Coreference Resolution by Learning Entity-Level Distributed Representations, Clark and Manning, 2016. Link: https://arxiv.org/pdf/1606.01323

End-to-end Neural Coreference Resolution, Lee et al., 2017. Link: https://arxiv.org/pdf/1707.07045

Integrating Knowledge Graph Embeddings to Improve Mention Representation for Bridging Anaphora Resolution, Pandit et al., 2020. Link: https://aclanthology.org/2020.crac-1.7.pdf

CorefQA: Coreference Resolution as Query-based Span Prediction, Wu et al., 2020. Link: https://aclanthology.org/2020.acl-main.622/

Pre-training Mention Representations in Coreference Models, Varkel and Globerson, 2020. Link: https://aclanthology.org/2020.emnlp-main.687.pdf

Improving Span Representation for Domain-adapted Coreference Resolution, Gandhi et al., 2021. Link: https://aclanthology.org/2021.crac-1.13.pdf