The RefLex project: documenting and exploring lexical resources in Africa


Contributors

Langage, Langues et Cultures d'Afrique (LLACAN); Institut National des Langues et Civilisations Orientales (Inalco)-Centre National de la Recherche Scientifique (CNRS)

Dynamique Du Langage (DDL); Université Lumière - Lyon 2 (UL2)-Centre National de la Recherche Scientifique (CNRS)

Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)


Publisher

HAL CCSD

Abstract

International audience

The RefLex project aims to test a set of fundamental hypotheses about the structure and evolution of African languages that are often mentioned in the literature but whose validity has never been demonstrated empirically. These hypotheses share the peculiarity that they can only be tested quantitatively, which in turn presupposes comprehensive documentation. The more than 2,200 languages spoken in Africa are characterized by great typological diversity, but they also share some characteristics, at every level of linguistic analysis, that cut across linguistic phyla and areas. It has so far been impossible to study these characteristics (e.g., logophoric pronouns, labiovelar consonants) in depth, mainly because data are lacking for the majority of African languages. RefLex solves this problem by fully exploiting the existing lexical documentation, which is in fact much larger than the grammatical documentation and yet is often ignored, especially in typological studies.

One of the goals of RefLex is to make this scattered and hard-to-find lexical documentation available to interested researchers. The lexical corpus of African languages, available online to the whole scientific community, gives immediate access to a considerable wealth of data: as of June 2013, 460,000 lexical units for more than 370 languages, with more than 1,000,000 entries representing 1,000 languages expected within the next two years. This corpus will allow dramatic progress in several domains: typology, phylogeny, lexical semantics, lexical spread, and areal linguistics. RefLex will be the largest online comparative database worldwide.
Moreover, the database will differ from other existing databases in two crucial respects: (i) direct online access to the original documents on which the digital data are based, which makes this corpus a true reference corpus, allowing corrections, checking, argued feedback, replication, and even falsification; (ii) a library of computational tools for the scientific use of the data, designed to facilitate research, retrieval, and comparison. The RefLex project thus belongs to the emerging field of quantitative approaches to complex linguistic issues. It is one of the very few projects based on data from various languages, and the only one that enables easy manipulation of, and experimentation with, the data itself. A set of statistical tools makes it possible to measure all kinds of combinatory distributions, including, but not restricted to, phonological correlations. Another set of tools is dedicated to phonological and lexical reconstruction, enabling the management of cognate sets and correspondence sets. Our talk presents the project and the existing tools, as well as their future developments.
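As a rough illustration of what measuring a combinatory distribution can involve, the following Python sketch compares observed and expected counts of first and second consonants (C1, C2) in word forms. It is not RefLex code: the functions `c1_c2` and `oe_ratios`, the toy five-vowel inventory, the toy forms, and the assumption that each character is one segment (which would mishandle digraphs such as labiovelar *kp* or *gb*) are all hypothetical simplifications. A ratio above 1 means a pair occurs more often than independent combination would predict; a ratio below 1 means it occurs less often.

```python
from collections import Counter
from typing import Optional

VOWELS = set("aeiou")  # assumption: a toy five-vowel inventory


def c1_c2(form: str) -> Optional[tuple[str, str]]:
    """Return the first two consonants of a form.

    Simplification: every character is treated as one segment, so
    digraphs like 'kp' or 'gb' would be split into two consonants.
    """
    consonants = [ch for ch in form if ch not in VOWELS]
    if len(consonants) < 2:
        return None
    return consonants[0], consonants[1]


def oe_ratios(pairs: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Observed/expected ratio for every (C1, C2) combination.

    Expected count under independence:
    E(a, b) = N * P(C1 = a) * P(C2 = b) = count(C1 = a) * count(C2 = b) / N.
    """
    n = len(pairs)
    observed = Counter(pairs)
    first = Counter(c1 for c1, _ in pairs)
    second = Counter(c2 for _, c2 in pairs)
    return {
        (a, b): observed[(a, b)] / (first[a] * second[b] / n)
        for a in first
        for b in second
    }


# Hypothetical toy forms, not RefLex data.
forms = ["bada", "bidu", "kapa", "kupi"]
pairs = [p for p in map(c1_c2, forms) if p is not None]
ratios = oe_ratios(pairs)
print(ratios[("b", "d")])  # 2.0: (b, d) occurs twice where 1 was expected
print(ratios[("b", "p")])  # 0.0: (b, p) never occurs
```

On real data, such ratios would also need a significance test and a proper segmenter before any phonological correlation could be claimed.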
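The relation between cognate sets and correspondence sets can likewise be sketched in Python. Assuming each cognate set is already aligned segment by segment (with a gap symbol such as `-` where a language has no segment), every aligned position yields one tuple of segments across languages, and grouping identical tuples gives candidate correspondence sets together with the cognate sets that support them. The data layout, the language labels `LangA` and `LangB`, and the function `correspondence_sets` are hypothetical and do not describe RefLex's actual implementation.

```python
from collections import defaultdict

# Hypothetical layout: cognate-set id -> {language: aligned list of segments}.
CognateSets = dict[str, dict[str, list[str]]]
Correspondences = dict[tuple[str, ...], list[tuple[str, int]]]


def correspondence_sets(cognate_sets: CognateSets, languages: list[str]) -> Correspondences:
    """Group aligned segments into correspondence sets.

    Each key is a tuple of segments (one per language, in `languages` order);
    each value lists the (cognate-set id, position) pairs that attest it.
    """
    table = defaultdict(list)
    for set_id, forms in cognate_sets.items():
        # Skip cognate sets that lack a reflex in one of the compared languages.
        if any(lang not in forms for lang in languages):
            continue
        length = len(forms[languages[0]])
        if any(len(forms[lang]) != length for lang in languages):
            raise ValueError(f"cognate set {set_id!r} is not aligned")
        for pos in range(length):
            key = tuple(forms[lang][pos] for lang in languages)
            table[key].append((set_id, pos))
    return dict(table)


# Hypothetical aligned cognate sets for two hypothetical languages.
cognates: CognateSets = {
    "1": {"LangA": ["k", "a", "p"], "LangB": ["tʃ", "a", "p"]},
    "2": {"LangA": ["k", "i"], "LangB": ["tʃ", "i"]},
}
sets = correspondence_sets(cognates, ["LangA", "LangB"])
for correspondence, support in sets.items():
    print(" : ".join(correspondence), support)
```

Here k : tʃ is attested in two cognate sets while the other correspondences are attested once; in practice, recurrence across many cognate sets is what makes a correspondence a candidate for a regular sound change.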