Cornell Language Acquisition Lab: Building an Electronic Library of Words of the World's Children

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

HATCH

Reporting Frequency

Annual

Accession No.

0191071

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

Oct 1, 2001

Project End Date

Sep 30, 2005

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
CORNELL UNIVERSITY
(N/A)
ITHACA,NY 14853

Performing Department
HUMAN DEVELOPMENT

Non Technical Summary
A large amount of data on language development exists in the Cornell Language Acquisition Lab and can be made widely available once it is structured and digitized. The purpose of this project is to create a new Open Archive electronic library built on a large cross-linguistic relational database of language development data, including data from both English-speaking and Spanish-speaking children.

Animal Health Component

(N/A)

Research Effort Categories

Basic

70%

Applied

(N/A)

Developmental

30%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
802	6099	3070	70%
903	7410	3030	30%

Knowledge Area
903 - Communication, Education, and Information Delivery; 802 - Human Development and Family Well-Being;

Subject Of Investigation
6099 - People and communities, general/other; 7410 - General technology;

Field Of Science
3070 - Psychology; 3030 - Information and communication;

Goals / Objectives
A new open archive of language development data and related materials will be established through the creation of a new electronic library of Words of the World's Children by the Cornell Language Acquisition Lab (Cornell Department of Human Development, College of Human Ecology) in conjunction with Cornell Libraries. Language samples collected from children at varied points of language development will be structured, copied and converted to electronic format so that they become accessible through the World Wide Web and form a permanent resource that can be continually expanded by current research in child language acquisition. In the four years described in this project, coded natural speech samples from more than 200 children acquiring English in the US and from children acquiring Spanish in Peru (and other Spanish-speaking countries) will be preserved, inventoried, digitized and linked to multimedia forms of Spanish and English child language development; these will establish a prototype for the subsequent archiving of data from many hundreds of monolingual and multilingual children in nearly 20 countries and languages around the world. These data exist in the CLAL now and are continually being expanded through research there. A set of Cornell Language Acquisition Lab Manuals will also be developed and made available electronically to allow researchers, students and teachers, as well as the general public, to both contribute to and benefit from scientific investigation of children's language acquisition. A structure for linking the resulting electronic database to an Open Archiving system, maintained and disseminated by Cornell Mann Library, will be developed, allowing national and international access to the database and materials in a form conforming to current Library developments.

Project Methods
The Cornell Language Acquisition Lab has begun to develop procedures for archiving and disseminating a large set of data and materials acquired (over more than 20 years) regarding child language acquisition in English and Spanish, as well as other languages. These procedures include the development of a web-based interface for the inventory, transcription and analysis of language data and language development, and the development of multimedia-based modules displaying these data. These new procedures and the software on which they are based will be developed in order to accomplish the objectives of this proposal. In turn, these procedures will be integrated with procedures being developed by Mann Library for Open Archiving of digital holdings, and a conduit for exchange of information with the Cornell Library will be established. The large amount of language acquisition data involved in this project now exists in the form of both natural speech samples and experimental results from testing children's language production and comprehension at various developmental levels of language acquisition. The data derive from children acquiring language in many conditions: in cities (e.g., English in New York City, Spanish in Madrid) or villages (e.g., English in Ithaca, New York or Spanish in rural areas of Peru). The data take several media forms: audio tapes, sometimes video tapes, and written transcripts (in various formats). Development of a metadata inventory system will allow search and comparison according to factors such as language, location and media form. Students and faculty will collaborate throughout this project. Students will be hired to participate in the data inventory, transcription and coding processes involved in the project. They will be taught to use the new software, i.e., the Data Transcription and Analysis (DTA) tool, in order to capture and structure the language of children for scientific study.
Basic equipment necessary for processing language data (audiovisual and CD-ROM equipment) has been provided by a recent National Science Foundation grant for Instrumentation and Lab Improvement at Cornell, whose purpose is to engage undergraduate students in scientific research involving language knowledge and acquisition. Students will use this available equipment. Students will also evaluate the new lab manuals. All records on all data will be located, copied, recorded and stored electronically. Existing audio tapes will be copied both onto analogue tapes and onto CD-ROM discs. Natural speech samples from children acquiring English and Spanish will be transcribed on computer using the CLAL Data Transcription and Analysis Tool so that these data can be entered into the general relational database. Maria Blume, a native speaker of Spanish and co-author of many lab materials, will be appointed as postdoctoral research associate and will participate in all aspects of the project, focusing on the Spanish component.
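As a purely illustrative sketch of how a metadata inventory could support search and comparison by factors such as language, setting and media form, the short Python example below may help. It is not the CLAL inventory system or the DTA tool; the record fields, the `search` helper and the sample records are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryRecord:
    """One inventoried recording session (all field names are hypothetical)."""
    child_id: str
    language: str   # e.g. "English" or "Spanish"
    location: str   # where the child was recorded
    setting: str    # "city" or "village"
    media: tuple    # media forms held, e.g. ("audio", "transcript")


def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        record for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


# Invented sample records, for illustration only.
RECORDS = [
    InventoryRecord("E001", "English", "Ithaca, New York", "village", ("audio", "transcript")),
    InventoryRecord("S001", "Spanish", "Madrid", "city", ("audio", "video", "transcript")),
    InventoryRecord("S002", "Spanish", "rural Peru", "village", ("audio",)),
]

# Compare holdings across factors: Spanish sessions recorded in villages.
spanish_village = search(RECORDS, language="Spanish", setting="village")
```

Keeping each factor as a separate field, rather than folding it into a free-text description, is what makes this kind of filtering possible.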

Progress 10/01/01 to 09/30/05

Outputs
Through this HATCH grant, the Cornell Language Acquisition Lab (CLAL) has inventoried its primary data in the area of language acquisition of English (over 900 audio tapes), covering all basic steps of language development. It has also inventoried much of its holdings of Spanish language development in children (from three countries: Peru, Puerto Rico and Spain). In Spanish, language from 52 children between 1 and 5 years of age, sampled in 72 sessions, has been inventoried. Inventory records are being entered in terms of a metadata structure being developed with Cornell's Mann Library. They involve a new web interface being developed by the Cornell Language Acquisition Lab for Data Transcription and Analysis. A manual for data digitization and archiving has been prepared. Prototypes of data digitization and archiving have been prepared for both English and Spanish child language acquisition. In conjunction with the electronic library of Words of the World's Children, the CLAL is also preparing materials which explicate best practices in the scientific study of child language and in the archiving of data in this area. A Research Manual, 'Scientific Methods for the Study of Language Acquisition', is now in preparation (B. Lust, M. Blume and T. Ogden). The scientific research methods now being developed in the CLAL led to a Cornell University Faculty Innovation in Teaching Grant (FITG) through which multimedia materials are being prepared to supplement Cornell courses in language acquisition; they will also support distance learning in this area: 'Integrating Digital Multimedia Resources in Two Interdisciplinary Language Development Courses' (B. Lust with Maria Blume). The materials being developed in this FITG project include a research methods Manual and audiovisual modules teaching scientific methods of assessing children's language production and comprehension.
The CLAL, with Cornell Mann Library, also attained a Small Grant for Exploratory Research from NSF to further develop infrastructure for handling data in the field of language acquisition from a long-term, general-access perspective: 'Planning Information Infrastructure Through New Library-Research Partnership' (B. Lust with Janet McCue, director of Mann Library). This includes developing infrastructure for the long-term preservation, access and distribution of data in the area of language acquisition, and for structuring data preservation in Research Lab-University collaborations in general. Data from the CLAL, developed through our current HATCH grant, provide prototypes for English and Spanish and for the associated procedures. The CLAL also attained a small planning grant from NSF to develop infrastructure for an international and interdisciplinary Virtual Center for Language Acquisition, based on the materials created in the CLAL.

Impacts
This project has two forms of impact. (1) The research on language acquisition helps to explain the normal course of language development in the child and can inform educational and medical interventions in that development, while research on multilingualism and its potential cognitive advantages in children can inform school and social policies regarding bilingualism. (2) The work building infrastructure between the Cornell Language Acquisition Lab and the University's Mann Library can provide a model for other scientific fields regarding research lab-university library collaborations in the permanent management and dissemination of research data.

Publications

No publications reported this period

Progress 01/01/04 to 12/31/04

Outputs
Through the current HATCH grant, the Cornell Language Acquisition Lab (CLAL) has inventoried its primary data in the area of language acquisition of English (over 900 audio tapes), covering all basic steps of language development. It has also inventoried much of its holdings of Spanish language development in children (from three countries: Peru, Puerto Rico and Spain). In Spanish, language from 52 children between 1 and 5 years of age, sampled in 72 sessions, has been inventoried. Inventory records are being entered in terms of a metadata structure being developed with Cornell's Mann Library. They involve a new web interface being developed by the Cornell Language Acquisition Lab for Data Transcription and Analysis. In addition, 20 sessions of child language data have been transcribed from the Spanish corpus. A manual for data digitization and archiving has been prepared. Prototypes of data digitization and archiving have been prepared for both English and Spanish child language acquisition. In conjunction with the electronic library of Words of the World's Children, the CLAL is also publishing materials which explicate 'best practices' in the scientific study of child language and in the archiving of data in this area. A Research Manual, 'Scientific Methods for the Study of Language Acquisition', is now in preparation. The scientific research methods now being developed in the CLAL have led to a current Cornell University Faculty Innovation in Teaching Grant (FITG) through which multimedia materials are being prepared to supplement Cornell courses in language acquisition; they will also support distance learning in this area: 'Integrating Digital Multimedia Resources in Two Interdisciplinary Language Development Courses' (B. Lust with Maria Blume). The materials being developed in this FITG project include a research methods Manual and audiovisual modules teaching scientific methods of assessing children's language production and comprehension.
The CLAL, with Cornell Mann Library, attained a Small Grant for Exploratory Research to develop infrastructure for handling data in the field of language acquisition from a long-term, general-access perspective: 'Planning Information Infrastructure Through New Library-Research Partnership' (B. Lust with Janet McCue, director of Mann Library). The CLAL is working with the Cornell Mann Library staff (including metadata specialists and ontology web information specialists) through this current Small Grant for Exploratory Research (SGER) from the National Science Foundation SEIII program (Science and Engineering Information Integration and Informatics) to develop an infrastructure for the long-term preservation, access and distribution of data in the area of language acquisition, and for structuring data preservation in Research Lab-University collaborations in general. The CLAL, in conjunction with Mann Library, has submitted a grant proposal to the National Endowment for the Humanities to support preservation and archiving of its holdings in the area of acquisition of South Asian languages: Hindi, Sinhala, Tulu, Malayalam and Oriya.

Impacts
The current Cornell Language Acquisition Lab FABIT (Faculty Innovation in Teaching) award and National Science Foundation awards depended crucially on the materials and data being prepared through the current HATCH grant to the Cornell Language Acquisition Lab. These awards allow the electronic library being created to be extended both to educational purposes within the University and to international distance learning sites for education and collaborative research. Cornell's progress in the area of lab-library collaboration and electronic library creation provided a model for other institutions at both the Rutgers and ALLC presentations given in 2003.

Publications

Lust, B., & Foley, C. (Eds.). 2004. Language Acquisition: The Essential Readings. Blackwell.
Lust, B. 2004. Facing Plato's Cave: Viewing the Shadows of Grammatical Competence. Commentary on Learnability and Linguistic Performance by Ken Drozd. Journal of Child Language. 31 (2). 484-488.

Progress 01/01/03 to 12/31/03

Outputs
Over 900 audiotapes of children acquiring English have been entered into a master inventory. Metadata on subjects and sessions have begun to be entered into a central relational database. Transcripts from 212 children acquiring English and 60 children acquiring Spanish have been entered into this database and are now being subjected to reliability checks and phonetic editing. Design of this database (Cornell Data Transcription and Analysis (DTA) tool) and its execution as an interactive web interface are being developed. Design of metadata components of this database is proceeding in collaboration with Cornell A. Mann Library, and in line with Open Language Archives Community (OLAC) standards. A formal link has been made between OLAC and the Cornell Language Acquisition Lab. A 14-step process of data creation and preservation has been created. Two prototypes of data (English and Spanish child language; audio, video and written transcripts) are being prepared for access and dissemination by the Cornell A. Mann Library. A digitization system is being standardized, and digitization of English and Spanish data is proceeding gradually. A Manual reviewing best practice methods is under construction (Lust, B., Blume, M., & Ogden, T. (in prep). Cornell University Virtual Linguistics Lab (VLL) Research Methods Manual: Scientific Methods for Study of Language Acquisition). A Cornell University FABIT (Faculty Innovation in Teaching) Award, 'Integrating Digital multimedia resources in two interdisciplinary language development courses' (with Maria Blume), has been attained (2003) to allow extension to educational uses of multimedia data being created in the CLAL electronic library of language acquisition data and materials. A small planning grant from the National Science Foundation has just been supplemented, allowing development of a Virtual Center for the Study of Language Acquisition and a first international extension for it (South Korea).
Collaborative research on child multilingualism has begun (Sujin Yang, Cornell, 2003). Progress was reported in an invited address (November 6, 2003), 'The Web as Enabler: A Virtual Center Links Language Acquisition Researchers across Time and Place', given at Rutgers University, Livingston College, in their Global Futures Symposia. Cornell's unique collaboration between the CLAL and A. Mann Library was reported jointly with Blume, M., Gair, J., & Westbrooks, E. (May 29-June 2, 2003): 'Creating a Virtual Center as an International Web-based Interactive Infrastructure for Research and Teaching in the Language Sciences: A New Research and Library Collaboration'. Association for Computers and the Humanities (ACH) and Association for Literary and Linguistic Computing (ALLC), Web X: A Decade of the World Wide Web, at the University of Georgia, Athens.
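To illustrate what entering subject and session metadata into a central relational database can look like, here is a minimal Python sketch using the standard-library `sqlite3` module. It is an assumption-laden illustration only: the table names, column names and sample rows are hypothetical and do not reflect the actual schema of the Cornell DTA tool or the CLAL database.

```python
import sqlite3

# In-memory database; the schema below is hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    language   TEXT NOT NULL,   -- language being acquired
    country    TEXT
);
CREATE TABLE session (
    session_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    age_months INTEGER,         -- child's age at recording
    medium     TEXT             -- e.g. 'audio' or 'video'
);
""")

# Invented sample rows: one English-acquiring and one Spanish-acquiring child.
conn.executemany(
    "INSERT INTO subject VALUES (?, ?, ?)",
    [(1, "English", "US"), (2, "Spanish", "Peru")],
)
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?, ?)",
    [(1, 1, 24, "audio"), (2, 2, 30, "audio"), (3, 2, 36, "video")],
)


def sessions_per_language(connection):
    """Count recorded sessions for each language by joining sessions to subjects."""
    rows = connection.execute(
        """
        SELECT subject.language, COUNT(*)
        FROM session
        JOIN subject ON subject.subject_id = session.subject_id
        GROUP BY subject.language
        ORDER BY subject.language
        """
    ).fetchall()
    return dict(rows)


counts = sessions_per_language(conn)
```

Storing subjects and sessions in separate, linked tables means a child's details are recorded once even when that child was sampled in several sessions.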

Impacts
The current Cornell Language Acquisition Lab FABIT (Faculty Innovation in Teaching) award and National Science Foundation awards depended crucially on the materials and data being prepared through the current HATCH grant to the Cornell Language Acquisition Lab. These awards allow the electronic library being created to be extended both to educational purposes within the University and to international distance learning sites for education and collaborative research. Cornell's progress in the area of lab-library collaboration and electronic library creation provided a model for other institutions at both the Rutgers and ALLC presentations given in 2003.

Publications

Lust, B., & Foley, C. (Eds.). 2004. Language Acquisition: The Essential Readings. Blackwell.
Chien, Y.-C., Lust, B., & Chiang, C.-P. 2003. Chinese Children's Acquisition of Classifiers and Measure Words. Journal of East Asian Linguistics, 12, 91-120.
Foley, C., Nunez del Prado, Z., Barbier, I., & Lust, B. 2003. Knowledge of Variable Binding in VP Ellipsis: Language Acquisition Research and Theory Converge. Syntax, 6 (1), 52-83.
Dye, C., Foley, C., Blume, M. and Lust, B. 2003. Mismatches between morphology and syntax in first language acquisition suggest a syntax-first model. Boston University Child Language Development, #28. Boston, Mass. (October 31, 2003).

Progress 01/01/02 to 12/31/02

Outputs
Progress has been made on an archiving project intended to preserve and study the large amount of data, from various stages and ages of children's language acquisition, which exists in the Cornell Language Acquisition Lab. English and Spanish data from the lab's recordings of children's language are being entered into a metadata-based inventory system which is being developed in collaboration with Mann Library and an Open Library Archiving Community. Hundreds of speech samples from both languages are being digitized from audio tape and preserved on CD, and the metadata surrounding them have been entered into a database. A cross-linguistic transcription system for these language samples is being developed in conjunction with new software to guide the student and researcher through the archiving and general research process. A general Lab Manual describing scientific research methods for the study of language acquisition is in preparation.

Impacts
The new archive being created in collaboration with Cornell's Mann Library, the Library of Words of the World's Children, is preserving and preparing for dissemination large amounts of child language data across many languages, on which researchers across many labs in many countries can collaborate. It will thus affect the field broadly. These data provide a central resource for the planned Virtual Center for the Study of Language Acquisition, which is now being developed in the Cornell Language Acquisition Lab. The metadata system for the inventory of language data, being developed in collaboration with Cornell Mann Library, is establishing a prototype for Cornell and for the field at large. The Spanish and English child language data currently being archived will be central to current studies of childhood bilingualism and second language acquisition, as well as to investigations of normal and delayed language development.

Publications

Flynn, S. and Lust, B. 2002. A Minimalist Approach to L2 Solves a Dilemma of UG. In Cook, V. (ed), Portraits of the L2 User. Multilingual Matters, Ltd. Clevedon, U.K. pp. 93-120.
Santelmann, L., Berk, S., Somashekar, S., Austin, J. and Lust, B. 2002. Dissociating Movement and Inflection: Continuity and Development in the Acquisition of Subject-Aux Inversion. Journal of Child Language, 29 (4), 813-842.