Conference

French

ID: 10670/1.i36doi

Construction of multilingual corpora: state of the art

Abstract

National audience.

Multilingual corpora are used in several branches of natural language processing. This article surveys work on the automatic construction of such corpora. We first give an overview of the different notions of comparability. We then review the main approaches to similarity computation, corpus construction and evaluation developed in the field. We observe that textual similarity is generally computed from corpus statistics, from the structure of ontological resources, or from a combination of the two. In a multilingual setting, relying on a multilingual dictionary or a machine translation system raises many problems; exploiting a multilingual ontological resource appears to be a solution. For classification, the question of adding documents to the initial database without degrading the quality of the clusters remains open.
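The abstract notes that textual similarity is usually computed from corpus statistics, and that a multilingual dictionary is one way to bridge languages. As a minimal illustration (not the method of this article or of any work it surveys), the sketch below translates a French document word by word with a tiny dictionary, then compares it with English documents using TF-IDF weighting and cosine similarity. The dictionary `FR_EN`, the example sentences, and the function names `translate`, `tfidf_vectors` and `cosine` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter

# Hypothetical toy French-to-English dictionary (illustration only).
FR_EN = {"le": "the", "la": "the", "chat": "cat", "mange": "eats", "souris": "mouse"}

def translate(tokens, dictionary):
    """Word-by-word translation; out-of-vocabulary tokens are dropped."""
    return [dictionary[t] for t in tokens if t in dictionary]

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Relative term frequency times inverse document frequency.
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

fr_doc = "Le chat mange la souris".lower().split()
en_docs = ["the cat eats the mouse".split(), "stock markets fell sharply".split()]

# Map the French document into English, then weight all documents with shared statistics.
vectors = tfidf_vectors([translate(fr_doc, FR_EN)] + en_docs)
sim_close = cosine(vectors[0], vectors[1])  # translated French vs. matching English text
sim_far = cosine(vectors[0], vectors[2])    # translated French vs. unrelated English text
print(f"{sim_close:.2f} {sim_far:.2f}")
```

Here words missing from the dictionary are silently dropped, and each French word maps to exactly one English word. Coverage gaps and lexical ambiguity of this kind are among the problems the abstract attributes to dictionary-based approaches.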