The courses will be taught in English.
Instructor: Sacha Beniamine (University of Surrey)
This course is an introduction to data management for quantitative linguists, applied to inflectional morphology. The core of session 1 applies to the curation of machine-readable data in linguistics in general: What are good data practices? Why are they needed? The next two sessions zoom in on inflected lexicons, or collections of morphological paradigms.
Instructor: Gabriel Thiberge (Paris Cité University)
The first session will introduce the theoretical framework(s) in which most of these themes became relevant, with a quick definition of the objects, goals and stakes of experimental sociolinguistics, as influenced by psycholinguistics. In line with a classical distinction in psycholinguistics, the second and third sessions will focus on both offline (questionnaires, judgments, matched-guise...) and online (eye-tracking, EEG...) methodologies used in experimental sociolinguistics. They will include an overview of recent innovative methods developed in the field to study language as a situation-specific interactive phenomenon, and a presentation of issues and challenges yet to be addressed and overcome.
Instructor: Dr. Natalia Levshina (Radboud University, The Netherlands)
This hands-on course introduces key statistical methods for data exploration and hypothesis testing in linguistic typology. The focus is on practical application, with methods illustrated through case studies based on cross-linguistic databases and corpora. Participants will work with provided R scripts and gain experience applying the techniques to real typological data.
Instructor: Julie Marsault (University of Paris 3 & Lacito)
This class will present issues with data gathering and annotation of an under-described and critically endangered language, Umóⁿhoⁿ (also Omaha; Siouan, United States).
Instructor: Aleksandra Miletić (CNRS/MoDyCo)
Estimates suggest that 35-42% of the world’s languages remain undocumented. This is in part due to the high cost of manually processing the collected data, the bottleneck often being the very first step: transcribing audio into text. However, recent developments in NLP have led to impressive improvements across numerous tasks, including automatic speech recognition (ASR). In this course, we will explore the impact these advances have on the documentation of less studied languages. First, we will examine the notion of less resourced languages and the variety of sociolinguistic situations in which they evolve. Next, we will explore the promises and pitfalls of recent (sometimes massively) multilingual ASR tools (Whisper, Omnilingual ASR) through hands-on experience with model fine-tuning, evaluation, and error analysis. Finally, we will examine the role of the “language-as-data” paradigm in the current NLP landscape and possible alternative ways forward.
Instructor: Philippe Muller (University of Toulouse, IRIT & GDR TAL)
This class will introduce the fundamentals of recent pretrained language models (PLMs) and explore their relationship with traditional linguistic levels of analysis.
We will examine the language capabilities of PLMs and discuss how they can support linguistic research, e.g. by generating data, producing automated annotations, or serving as experimental models of language performance. In addition, we will address issues and challenges in using or studying these models, notably multilinguality and representativeness.
The class will feature practical lab exercises, some of which can be informed by participants' use cases.
Instructor: Christophe Parisse (CNRS/MoDyCo & HumaNum CORLI consortium)
Instructor: Céline Pozniak (University of Paris 8 & SFL)
This course will introduce you to the principles and methods of psycholinguistics. We will explore how empirical approaches allow us to test hypotheses about language processing, using both offline and online methods.