test
Search publications, data, projects and authors

Thesis

French

ID: <

10670/1.lej2dz

>

Where these data come from
Open domain question-answering : relevant document selection geared to the question

Abstract

This thesis aims at defining a unified adaptation of the document selection and answer extraction strategies, based on the document and question types, in a Question-Answering (QA) context. The solution is integrated in RITEL (a LIMSI QA system) to assess the contribution. We develop and investigate a method based on an Information Retrieval approach for the selection of relevant documents in QA. The method is based on a language model and a binary model of textual classification in relevant or irrelevant category. It is used to filter unusable documents for answer extraction by matching lists of a priori relevant documents to the question type automatically. First, we present the method along with its underlying models and we evaluate it on the QA task with RITEL in French. The evaluation is done on a corpus of 500,000 unsegmented web pages with factoid questions provided by the Quaero program (i.e. evaluation at the document level or D-level). Then, we evaluate the methodon segmented web pages (i.e. evaluation at the segment level or S-level). The idea is that information content is more consistent with segments, which facilitates answer extraction. D-filtering brings a small improvement over the baseline (no filtering). S-filtering outperforms both the baseline and D-filtering but not significantly. Finally, we study at the S-level the links between RITEL’s performances and the key parameters of the method. In order to apply the method on segments, we created a system of web page segmentation. We present and evaluate it on the QA task with the same corpora used to evaluate our document selection method. This evaluation follows the former hypothesis and measures the impact of natural web page variability (in terms of size and content) on RITEL in its task. In general, the experimental results we obtained suggest that our IR-based method helps a QA system in its task, however further investigations should be conducted – especially with larger corpora of questions – to make them significant.

Your Feedback

Please give us your feedback and help us make GoTriple better.
Fill in our satisfaction questionnaire and tell us what you like about GoTriple!