Search publications, data, projects and authors

Free full text available

Conference

English

ID: <

50|dedup_wf_001::91ad6311177f632d61f1d2ad10f757d1

>

Where these data come from

Date old texts: an approach to abrupt documents in French


Abstract

International audience; Automatic dating of ancient documents is a very important area of research for digital humanities applications. Many documents available via digital libraries do not have any dating or dating that is uncertain. Document dating is not only useful by itself but it also helps to choose the appropriate NLP tools (lemmatizer, POS tagger. . .) for subsequent analysis. This paper provides a dataset with thousands of ancient documents in French and present methods and evaluation metrics for this task. We compare character-level methods with token-level methods on two different datasets of two different time periods and two different text genres. Our results show that character-level models are more robust to noise than classical token-level models. The experiments presented in this article focused on documents written in French but we believe that the ability of character-level models to handle noise properly would help to achieve comparable results on other languages and more ancient languages in particular.

Your Feedback

Please give us your feedback and help us make GoTriple better.
Fill in our satisfaction questionnaire and tell us what you like about GoTriple!