Conference

French

ID: <http://hdl.handle.net/2078.1/134874>

Analysis of lexical differences between corpora: chi-square test or chi-square distance?

Abstract

Pearson's chi-square test is probably the most widely used statistical test in corpus linguistics, especially when the aim is to highlight linguistic variation between corpora. For a number of years, its use has been challenged because of the large number of rejections of the null hypothesis it produces when applied to large corpora. Oakes and Farrow (Literary and Linguistic Computing, 2007, 22, 85-99) proposed various adaptations of the test to make it more appropriate. Using resampling procedures, this research demonstrates the severity of the problem and the inadequacy of the proposed remedies. This negative conclusion highlights the benefits of correspondence analysis, which is probably the most classic approach in textual data analysis for dealing with such questions.
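
The sensitivity to corpus size described in the abstract can be illustrated with a minimal sketch. It is not the resampling procedure used in the study, nor any of the adaptations proposed by Oakes and Farrow; it simply applies the standard Pearson chi-square statistic to a 2x2 table (one word versus all other words, in two corpora). The word counts and corpus sizes are hypothetical, and the function names are chosen for illustration only. The same relative difference in frequency (1.0% versus 1.1%) is tested at two corpus sizes.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.
    For df = 1 the statistic is a squared standard normal, so
    P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def compare(word_a, size_a, word_b, size_b):
    """Test whether a word's frequency differs between corpus A and corpus B."""
    stat = chi2_2x2(word_a, size_a - word_a, word_b, size_b - word_b)
    return stat, p_value_df1(stat)


# Hypothetical counts: the word makes up 1.0% of corpus A and 1.1% of corpus B.
chi2_small, p_small = compare(100, 10_000, 110, 10_000)
chi2_large, p_large = compare(10_000, 1_000_000, 11_000, 1_000_000)

print(f"10,000 tokens per corpus:    chi2 = {chi2_small:.3f}, p = {p_small:.3g}")
print(f"1,000,000 tokens per corpus: chi2 = {chi2_large:.3f}, p = {p_large:.3g}")
```

With the proportions held fixed, the statistic grows in direct proportion to corpus size, so a difference that is not significant in the smaller pair of corpora leads to a clear rejection of the null hypothesis in the larger pair.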
