The ShARC Leaderboard

Currently, we are running a competition on the end-to-end task of conversational question answering based on ShARC. Planning to submit a model for the end-to-end task? Submit your code on CodaLab.

In the future, we will also run competitions for the subtasks (answer classification, follow-up question generation, and scenario resolution). Stay tuned!

ShARC: End-to-end Task

| #  | Model / Reference      | Affiliation                          | Date     | Micro Accuracy [%] | Macro Accuracy [%] | BLEU-1 | BLEU-4 |
|----|------------------------|--------------------------------------|----------|--------------------|--------------------|--------|--------|
| 1  | DGM                    | Shanghai Jiao Tong University        | Jan 2021 | 77.4               | 81.2               | 63.3   | 48.4   |
| 2  | Discern (single model) | The Chinese University of Hong Kong  | May 2020 | 73.2               | 78.3               | 64.0   | 49.1   |
| 3  | EMT                    | Salesforce Research & CUHK           | Nov 2019 | 69.4               | 74.8               | 60.9   | 46.0   |
| 4  | EMT + entailment       | Salesforce Research & CUHK           | Mar 2020 | 69.1               | 74.6               | 63.9   | 49.5   |
| 5  | UrcaNet (ensemble)     | IBM Research AI                      | Dec 2019 | 69.0               | 74.6               | 56.7   | 42.0   |
| 6  | E3                     | University of Washington             | Feb 2019 | 67.6               | 73.3               | 54.1   | 38.7   |
| 7  | BiSon (single model)   | NEC Laboratories Europe              | Aug 2019 | 66.9               | 71.6               | 58.8   | 44.3   |
| 8  | UrcaNet (single model) | IBM Research AI                      | Aug 2019 | 65.1               | 71.2               | 60.5   | 46.1   |
| 9  | [Anonymous]            | [Anonymous]                          | Jul 2019 | 64.9               | 71.4               | 55.3   | 38.9   |
| 10 | BERT-QA                | University of Washington             | Feb 2019 | 63.6               | 70.8               | 46.2   | 36.3   |
| 11 | Baseline-CM            | Bloomsbury AI                        | May 2018 | 61.9               | 68.9               | 54.4   | 34.4   |
| 12 | Baseline-NMT           | Bloomsbury AI                        | May 2018 | 44.8               | 42.8               | 34.0   | 7.8    |
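
For readers unfamiliar with the four columns, the sketch below shows one common way such metrics are computed: micro accuracy over all decision predictions, macro accuracy averaged per class, and BLEU over generated follow-up questions. This is a minimal illustration, not the official ShARC evaluation script; the class labels ("yes", "no", "irrelevant", "more") and the use of NLTK's corpus_bleu are assumptions made for the example.

```python
# Minimal sketch of the leaderboard metrics (illustrative only; not the
# official ShARC scorer). Class names and NLTK BLEU are assumptions.
from collections import defaultdict
from nltk.translate.bleu_score import corpus_bleu

CLASSES = ("yes", "no", "irrelevant", "more")  # assumed decision labels

def micro_accuracy(gold, pred):
    """Fraction of all examples whose predicted decision class is correct."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_accuracy(gold, pred):
    """Per-class accuracy, averaged over the classes that appear in gold."""
    correct, total = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        total[g] += 1
        correct[g] += (g == p)
    seen = [c for c in CLASSES if total[c]]
    return sum(correct[c] / total[c] for c in seen) / len(seen)

def bleu_scores(gold_questions, pred_questions):
    """BLEU-1 and BLEU-4 over generated follow-up questions."""
    refs = [[q.split()] for q in gold_questions]  # one reference per example
    hyps = [q.split() for q in pred_questions]
    bleu1 = corpus_bleu(refs, hyps, weights=(1.0, 0, 0, 0))
    bleu4 = corpus_bleu(refs, hyps, weights=(0.25, 0.25, 0.25, 0.25))
    return bleu1, bleu4
```

Micro accuracy weights every example equally, while macro accuracy weights every class equally, which is why a model can rank differently on the two columns when the decision classes are imbalanced.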