Centre for Research on Bilingualism in Theory & Practice
Fredrik Karlsson and Ineke Mennen
Contact: i.mennen@bangor.ac.uk
This corpus was created as part of an ESRC-funded project ‘The ups and downs of learner intonation: a cross-language and longitudinal investigation of the intonation systems of L2 learners’, in collaboration with the Max Planck Institute for Psycholinguistics.
The e-LiLT corpus includes intonationally labeled speech from free conversations of (i) two groups of speakers with structurally different source languages (SL), Punjabi and Italian, learning the same target language (TL), English, and (ii) two groups with the same SL (Italian) learning different TLs (English and German). The original media files used in this corpus were extracted from the ESF Second Language Base corpus, which contains spontaneous second language acquisition data from 5 different TLs with 6 different SLs.
Annotations include an orthographic transcription of each utterance; segmentation into intonational phrases, words, and syllables; information about intonational function (statement, wh-question, yes-no-question, clarification, tag) and longitudinal moment (first or last 10-month cycle of recordings); intonational phonology, i.e. pitch accents (H*L, !H*L, H*, !H*, H*LH, L*!HL, L*H, and H*!H) and boundary tones (H%, L%, %); and, for accented syllables, the number of phonemes in the onset and coda and the onset and coda type (sonorant vs. voiced obstruent vs. voiceless obstruent).
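For users who export the annotations and want to check their own copies of the labels, the inventories listed above can be written down as a small Python sketch. This is only an illustration: the label sets are taken from the description above, but the names (PITCH_ACCENTS, check_phrase, and so on) are hypothetical and are not part of the corpus or of any MPI tool, and the corpus files themselves may store these labels in a different form.

```python
# Label inventories as listed in the corpus description.
# All names here are hypothetical, chosen for illustration only.
PITCH_ACCENTS = {"H*L", "!H*L", "H*", "!H*", "H*LH", "L*!HL", "L*H", "H*!H"}
BOUNDARY_TONES = {"H%", "L%", "%"}
FUNCTIONS = {"statement", "wh-question", "yes-no-question", "clarification", "tag"}


def check_phrase(function, accents, boundary):
    """Return a list of problems found in one intonational phrase.

    function: intonational function label (string)
    accents:  pitch accent labels in the phrase (list of strings)
    boundary: final boundary tone label (string)
    """
    problems = []
    if function not in FUNCTIONS:
        problems.append(f"unknown function: {function}")
    for accent in accents:
        if accent not in PITCH_ACCENTS:
            problems.append(f"unknown pitch accent: {accent}")
    if boundary not in BOUNDARY_TONES:
        problems.append(f"unknown boundary tone: {boundary}")
    return problems
```

A phrase labeled as a statement with accents H* and H*L and boundary tone L% would pass this check with an empty list of problems, whereas a label outside the inventories above would be reported.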
We are grateful to our collaborator Aoju Chen at the Max Planck Institute for Psycholinguistics Nijmegen for making the e-LiLT corpus available as part of their browsable corpora.
The corpus can be found at http://www.mpi.nl/resources/data. From that website, select the link ‘Browsable corpora at the MPI’, then follow the nodes in the left-hand column to MPI corpora, Acquisition, L2 acquisition, and finally e-LiLT. Please note that you will need to have Java installed and cookies enabled to use this website.
The corpus may currently be password protected, but this restriction should be lifted in the near future. Please contact me at i.mennen@bangor.ac.uk if you have difficulty accessing the corpus.
The corpus is freely downloadable to the public as long as it is used for non-commercial purposes. Please let us know when you make use of our e-LiLT corpus and make sure you include an acknowledgement in all reporting of work (in whichever form or format) that has made use of the e-LiLT corpus. Please contact i.mennen@bangor.ac.uk for an appropriate reference.