ShARC is a Conversational Question Answering dataset focussing on question answering from texts containing rules. To make the dataset and the task easier to understand, we provide explanations and visualisations of the data.
In this section we describe all the attributes of an instance in the dataset.
| Attribute | Description |
|---|---|
| utterance_id | Unique identification code for an instance in the dataset. |
| tree_id | A tree_id specifies a unique combination of a snippet and a question. Several instances can share the same tree_id, because the path of the conversation or the final answer can vary depending on the answers that a user provides to follow-up questions. |
| source_url | The URL of the document containing the rule snippet. |
| snippet | Input support document, usually a paragraph that contains some rules. |
| question | A question that may or may not be answerable from the snippet. |
| scenario | Describes the context of the question. |
| history | The conversation history, i.e. a list of follow-up questions and their corresponding answers. |
| evidence | A list of relevant information that the system should extract from the user's scenario. This information should not be included in the input. |
| answer | The desired output of a prediction model. |

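
Because several instances can share a tree_id, it can help to group them before inspecting a conversation tree. The following is a minimal sketch, assuming the instances have already been loaded into a list of Python dictionaries keyed by the attribute names above; the helper name `group_by_tree` and the toy identifiers `u1`, `u2` and `t1` are hypothetical.

```python
from collections import defaultdict


def group_by_tree(instances):
    """Map each tree_id to the list of instances that share it."""
    trees = defaultdict(list)
    for instance in instances:
        trees[instance["tree_id"]].append(instance)
    return dict(trees)


# Two toy instances (most fields omitted, ids shortened) that share one tree_id.
instances = [
    {"utterance_id": "u1", "tree_id": "t1", "history": [],
     "answer": "Do you live in Cyprus?"},
    {"utterance_id": "u2", "tree_id": "t1",
     "history": [{"follow_up_question": "Do you live in Cyprus?",
                  "follow_up_answer": "Yes"}],
     "answer": "No"},
]

trees = group_by_tree(instances)
print(len(trees["t1"]))
```
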
NOTE: The input to a system should include snippet, question, scenario and history ONLY. Evidence and answer should not be included in the input to a prediction model.
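
The restriction in the note above can be sketched as a small filtering step. This is only an illustration, assuming an instance is available as a Python dictionary; the names `INPUT_FIELDS` and `build_model_input` are hypothetical and not part of the dataset or any official tooling.

```python
# Fields that may be shown to a prediction model, as stated in the note.
INPUT_FIELDS = ("snippet", "question", "scenario", "history")


def build_model_input(instance):
    """Keep only the permitted input fields; evidence and answer are dropped."""
    return {field: instance[field] for field in INPUT_FIELDS}


instance = {
    "utterance_id": "107a420832a89ea5bc62c7d63fce4fd26d7e2ef6",
    "tree_id": "9ae3d25b11bc3a4be81f5884371de3b7c145ad48",
    "source_url": "https://www.gov.uk/winter-fuel-payment/eligibility",
    "snippet": "You can't get the payment if you live in Cyprus, France, Gibraltar or Greece",
    "question": "Can I get Winter Fuel Payment?",
    "scenario": "I am a 25 year old man from Cyprus and I currently live in Cyprus.",
    "history": [],
    "evidence": [{"follow_up_question": "Do you live in Cyprus?",
                  "follow_up_answer": "Yes"}],
    "answer": "No",
}

model_input = build_model_input(instance)
target = instance["answer"]  # kept separately, used only as the training label
```
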
Below are examples of different types of instances, each with a short explanation.
In this example, the user's question is under-specified, so an answer cannot be given yet. The system utterance (the answer) is therefore a follow-up question.
"utterance_id": "a115eadd97c4857abf76c07dc08bffd4c3b73976", "tree_id": "9ae3d25b11bc3a4be81f5884371de3b7c145ad48", "source_url": "https://www.gov.uk/winter-fuel-payment/eligibility", "snippet": "You can't get the payment if you live in Cyprus, France, Gibraltar or Greece", "question": "Can I get Winter Fuel Payment?", "scenario": "", "history": [], "evidence": [] "answer": "Do you live in Cyprus?"
In this example, the user's question is under-specified. However, the conversation history contains enough information to derive an answer.
"utterance_id": "dd692dd2648dedb1de063241fc3323bb481d4842", "tree_id": "9ae3d25b11bc3a4be81f5884371de3b7c145ad48", "source_url": "https://www.gov.uk/winter-fuel-payment/eligibility", "snippet": "You can't get the payment if you live in Cyprus, France, Gibraltar or Greece", "question": "Can I get Winter Fuel Payment?", "scenario": "", "history": [ { "follow_up_question": "Do you live in Cyprus?", "follow_up_answer": "Yes" } ] "answer": "No"
In this example, more information is required to answer the question, even given the history, i.e. the previous system questions and user responses.
"utterance_id": "c8720a4c4314d24b7a2351a172622e6a6002f8ec", "tree_id": "9ae3d25b11bc3a4be81f5884371de3b7c145ad48", "source_url": "https://www.gov.uk/winter-fuel-payment/eligibility", "snippet": "You can't get the payment if you live in Cyprus, France, Gibraltar or Greece", "question": "Can I get Winter Fuel Payment?", "scenario": "", "history": [ { "follow_up_question": "Do you live in Cyprus?", "follow_up_answer": "No" } ], "evidence": [] "answer": "Do you live in France?"
In this example, the user's scenario provides enough context to answer the question. The information extracted from the scenario, expressed as questions and answers, forms the evidence.
"utterance_id": "107a420832a89ea5bc62c7d63fce4fd26d7e2ef6", "tree_id": "9ae3d25b11bc3a4be81f5884371de3b7c145ad48", "source_url": "https://www.gov.uk/winter-fuel-payment/eligibility", "snippet": "You can't get the payment if you live in Cyprus, France, Gibraltar or Greece", "question": "Can I get Winter Fuel Payment?", "scenario": "I am a 25 year old man from Cyprus and I currently live in Cyprus.", "history": [], "evidence": [ { "follow_up_question": "Do you live in Cyprus?", "follow_up_answer": "Yes" } ], "answer": "No"
In this example, the user provided the scenario to the operator only after answering the two questions in the history. This is why some of the information that can be inferred from the scenario appears in the history rather than in the evidence. This replicates many real-world cases in which a user provides relevant but redundant information in a QA conversation.
"utterance_id": "05784a5789aaf4e1ac12293bc16e79c38e9ec40c", "tree_id": "9ae3d25b11bc3a4be81f5884371de3b7c145ad48", "source_url": "https://www.gov.uk/winter-fuel-payment/eligibility", "snippet": "You can't get the payment if you live in Cyprus, France, Gibraltar or Greece.", "question": "Can I get Winter Fuel Payment?", "scenario": "I want to apply and right now I'm living in the UK.", "history": [ { "follow_up_question": "Do you live in Cyprus?", "follow_up_answer": "No" }, { "follow_up_question": "Do you live in France?", "follow_up_answer": "No" } ], "evidence": [ { "follow_up_question": "Do you live in Gibraltar?", "follow_up_answer": "No" }, { "follow_up_question": "Do you live in Greece?", "follow_up_answer": "No" } ], "answer": "Yes"