Mapit: Using Machine Learning And Text Mining Techniques To Develop A Mapping Tool For Veterinary Medicine Curriculum


Sponsoring Institution

National Institute of Food and Agriculture

Project Status

ACTIVE

Funding Source

OTHER GRANTS

Reporting Frequency

Annual

Accession No.

1031619

Grant No.

2024-70003-41453

Cumulative Award Amt.

$150,000.00

Proposal No.

2023-05509

Multistate No.

(N/A)

Project Start Date

Dec 1, 2023

Project End Date

Nov 30, 2026

Grant Year

2024

Program Code

[ER]- Higher Ed Challenge

Recipient Organization
IOWA STATE UNIVERSITY
2229 Lincoln Way
AMES, IA 50011

Performing Department
(N/A)

Non Technical Summary
Preparing well-rounded veterinarians equipped with the required skill sets is essential for our society, as veterinarians play a critical role in animal health, environmental protection, food safety, and public health. A veterinary medicine curriculum requires close coordination across courses so that students can build coherent mental models of the knowledge and skills they gain during their education and graduate as practice-ready professionals. However, faculty rarely know in adequate detail what students have encountered in other courses. This results in lost opportunities to build on prior knowledge, and excessive gaps or redundancies can also lead to student disengagement. The overall objective of this proposal is to bring innovation to veterinary curriculum analysis by using artificial intelligence (AI) approaches, which will not only provide a much-needed platform for analyzing the curriculum as actually taught, but will do so efficiently and cost-effectively. To meet our overall objective, we will pursue two specific objectives: 1) develop an online searchable database of the veterinary curriculum, and 2) develop an online mapping tool for curriculum analysis. The deliverable for Objective 1 will be an online database of the veterinary medicine curriculum in which users can search for any phrase and identify which course(s) mention that phrase across all relevant course materials. The deliverable for Objective 2 will be an online mapping tool that will use AI approaches to categorize the content of the curriculum and map it against competencies and program objectives.

Animal Health Component

50%

Research Effort Categories

Basic

Applied

50%

Developmental

50%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
903	7299	3020	100%

Knowledge Area
903 - Communication, Education, and Information Delivery;

Subject Of Investigation
7299 - Research equipment and methods, general/other;

Field Of Science
3020 - Education;

Keywords

veterinary medicine education

artificial intelligence

curriculum mapping

curriculum review

machine learning

Goals / Objectives
The overall objective of this project is to bring innovation to curriculum mapping by using artificial intelligence (AI) approaches, including machine learning and text mining, which will not only provide a much-needed platform for analyzing the curriculum as actually taught, but will do so efficiently and cost-effectively. Our long-term goal is to create an automated, sustainable system that hosts an online searchable database of the veterinary medicine curriculum built from real classroom data. Users will be able to search for any phrase to check whether it is covered in any course in the curriculum. To meet our overall objective, we will pursue two specific objectives:
1) Develop an online searchable database of the veterinary curriculum.
2) Develop an online mapping tool for curriculum analysis.

Project Methods
Methodology to reach Specific Objective # 1: Develop an online searchable database of the veterinary curriculum. The goal of the research under this specific objective is to use real classroom data to develop a searchable online database of the veterinary curriculum that is available to faculty, students, and curriculum administrators. Our working hypothesis, based on previous literature and our preliminary work, is that the transcripts of lecture recordings can be used to create a database of lecture content that links back to the actual recordings. Users can then search for a keyword, see in which courses that keyword was mentioned, and access the related materials (e.g., lecture video, course syllabus). We will follow a human-centered design approach and involve our stakeholders in testing and refining the searchable database to develop a user-friendly and functional website. The rationale for this objective is that an online searchable database will make the curriculum more transparent. It will enable students to search for concepts covered in courses across the curriculum and refresh their knowledge base. Faculty and curriculum administrators, on the other hand, will be able to identify which courses cover similar content and how that content is presented. They can then make evidence-based assumptions about the prerequisite knowledge students bring to specific courses rather than guessing what students have learned previously.
To achieve the first objective of this project, we will apply human-centered design to the design and development of the online searchable database of the veterinary medicine curriculum, focusing on its potential users and creating a solution tailored to their needs (Abras et al., 2004). This will entail an iterative process, as displayed in Figure 2.
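The keyword lookup at the core of this database works as follows: a user searches a phrase, sees which courses mention it, and jumps to the point in the recording. The sketch below is a minimal illustration, not the project's implementation. It assumes transcripts have already been split into timestamped segments. The `Segment` record, its field names, and the `search_phrase` function are hypothetical, and a production site would more likely rely on a full-text search engine than on a linear scan.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timestamped piece of a lecture transcript (hypothetical schema)."""
    course: str    # course identifier
    lecture: str   # lecture title or recording identifier
    start_s: int   # offset into the recording, in seconds
    text: str      # transcript text for this segment


def search_phrase(segments, phrase):
    """Map each course to the (lecture, start_s) pairs whose text contains phrase.

    Matching is case-insensitive, tolerates extra whitespace between words,
    and respects word boundaries, so "cat" does not match "cats".
    """
    words = phrase.split()
    if not words:
        return {}
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b", re.IGNORECASE
    )
    hits = defaultdict(list)
    for seg in segments:
        if pattern.search(seg.text):
            hits[seg.course].append((seg.lecture, seg.start_s))
    return dict(hits)


if __name__ == "__main__":
    demo = [
        Segment("COURSE-A", "Lecture 3", 125, "Antimicrobial resistance is a growing concern."),
        Segment("COURSE-A", "Lecture 7", 40, "We revisit antimicrobial   resistance in swine."),
        Segment("COURSE-B", "Lecture 1", 600, "Vaccination schedules for cats."),
    ]
    print(search_phrase(demo, "antimicrobial resistance"))
```

Returning the start offset with each hit is what allows a search result to be tied back to the exact moment in the lecture recording.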
We will start by retrieving all course-related materials already available in digital form on the course management site, including lecture recordings, transcripts, course syllabi, and course information. Next, we will create a quick prototype using a limited number of course materials to test whether the system works. Once we have a working prototype, we will ask a small number of users (students and faculty) to test the website and provide feedback. Based on that feedback, we will keep iterating, testing, and refining until the searchable database is fine-tuned. Finally, we will make the database available for use. To assess whether the searchable database meets its objective, we will collect user traffic data (number of clicks, downloads, and accesses to resources). Additionally, we will survey students and faculty members to examine how they used the database and whether it helped them understand how the various components of the curriculum fit together.
The expected result of this specific objective is to bring the complete four-year veterinary medicine curriculum to the fingertips of faculty members and students. Students will be able to visualize how concepts and principles build upon each other across courses and use the tool to refresh their knowledge base in a time-efficient manner. Faculty members will be able to locate where prerequisite knowledge is covered in the curriculum and make informed decisions about improvements at the course and curriculum level. In turn, this will improve communication among instructors and help them create a cohesive curriculum that successfully prepares students for an ever-changing veterinary practice. Additionally, because the curriculum will be easily accessible, curriculum assessment will be a continuous effort rather than a one-time exercise.
Methodology to reach Specific Objective # 2: Develop an online mapping tool for curriculum analysis.
The goal of the research under this specific objective is to develop, implement, and evaluate a natural language processing (NLP) system that will automatically map curricular materials onto program objectives and outcomes. A corpus (a collection of texts) of the curriculum will be created from all available digital curricular materials (lecture transcripts, course syllabi, and course information documents), which will be extracted from the searchable database developed in Specific Objective #1. Three faculty members from ISU CVM (see support letters) will serve as content experts. In collaboration with them, we will develop a taxonomy of domain concepts representing program objectives and outcomes. The taxonomy will be based on AVMA COE Standard 9 (Curriculum) and Standard 11 (Outcomes Assessment), as well as the Program Objectives adopted by ISU CVM. A representative sample of the curricular corpus will be built using standard corpus-linguistic methods to ensure that the sample is balanced and representative of the full population of curricular texts. Portions of the corpus from disciplines related to a single domain concept can be used for model training directly. However, most disciplines overlap multiple domain concepts. Therefore, the corpus will be manually annotated for the identified domain concepts, and inter-annotator reliability will be measured and controlled to ensure high-quality annotation.
From the annotated corpus, two types of text representations will be derived: (1) lexical representations, with raw and stemmed n-grams and skip-grams as features; and (2) deep-learning-based semantic representations (e.g., using Bidirectional Encoder Representations from Transformers [Devlin et al., 2018]). These representations will be combined into an engineered feature set to be used as input to a feature-based machine-learning classifier (e.g., a Support Vector Machine [Zhang et al., 2008]).
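The lexical representations listed in (1) can be illustrated with a short sketch. It extracts contiguous n-grams and k-skip-bigrams (token pairs separated by at most k intervening tokens) from a transcript segment. This is a hypothetical, standard-library-only example:

- The tokenizer is deliberately simple.
- Stemming is omitted. A real pipeline would add stemmed variants using a stemmer such as Porter's.
- The `ng:` and `sk:` feature prefixes are arbitrary naming choices.

In the full system, these counts would be vectorized and combined with the semantic embeddings before a classifier such as an SVM is trained.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of letters/digits (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Contiguous n-grams, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, k):
    """Ordered token pairs with at most k intervening tokens (k-skip-bigrams).

    k = 0 reproduces ordinary bigrams.
    """
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + k + 2, len(tokens))):
            pairs.append(f"{left}_{tokens[j]}")
    return pairs


def lexical_features(text, max_n=2, k=1):
    """Counts of 1..max_n-grams plus k-skip-bigrams for one text segment."""
    tokens = tokenize(text)
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(f"ng:{g}" for g in ngrams(tokens, n))
    feats.update(f"sk:{p}" for p in skip_bigrams(tokens, k))
    return feats


if __name__ == "__main__":
    print(lexical_features("Renal failure in cats"))
```

Skip-grams let "renal failure" and "renal acute failure" share the feature `sk:renal_failure`. Plain bigrams would treat these two phrases as unrelated.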
The project will follow a supervised machine-learning approach, with the manually annotated corpus as the gold standard. The classifier will be trained to use the engineered feature set to recognize segments of curricular materials that correspond to particular domain concepts. Due to the inherent ambiguity of natural language, classification uncertainties (i.e., instances in which a text segment can be interpreted as corresponding to several domain concepts at once) are expected. In these cases, the classifier will yield a confidence metric for each predicted concept. Standard cross-validation procedures will be used to assess the performance of the classifier on portions of the manually annotated gold-standard corpus that were not used to train it. Precision, recall, and F-score will guide the refinement of the classifier.
The baseline classifier may not deliver acceptable performance. If so, we will experiment with several machine-learning algorithms, fine-tune the engineered feature sets and learning-algorithm parameters, identify any domain concepts that are easily confused, and clarify their definitions with the content experts. This experimental work will continue until the classifier reaches acceptable performance. We will then follow a user-focused participatory design approach to develop a way of visualizing the classifier's output for end users of the final system (instructors and students of veterinary medicine). The visualization will work both at the level of individual lectures and in terms of the horizontal and vertical relationships across courses.
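The evaluation loop described above can be sketched in plain Python. This is a minimal, hypothetical sketch with several assumptions:

- `train_fn` and `predict_fn` stand in for whatever classifier is eventually chosen.
- Each segment's gold and predicted labels are modeled as sets of domain concepts.
- The 0.5 confidence threshold, used to let an ambiguous segment receive several concepts, is an arbitrary value.

Precision, recall, and F-score are micro-averaged here, meaning counts are pooled over all concepts. Macro-averaging per concept is an equally common choice.

```python
import random


def kfold_indices(n_items, k, seed=0):
    """Split range(n_items) into k disjoint folds after a seeded shuffle."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def predicted_concepts(confidences, threshold=0.5):
    """Keep every concept whose confidence meets the (hypothetical) threshold,
    so one ambiguous segment can be assigned several concepts."""
    return {c for c, score in confidences.items() if score >= threshold}


def micro_prf(gold, pred):
    """Micro-averaged precision, recall and F1 over parallel lists of label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(segments, gold, train_fn, predict_fn, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold; return one (P, R, F1) per fold.

    train_fn(train_segments, train_labels) -> model   (hypothetical interface)
    predict_fn(model, segment) -> set of concepts      (hypothetical interface)
    """
    scores = []
    for held_out in kfold_indices(len(segments), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(segments)) if i not in held]
        model = train_fn([segments[i] for i in train_idx], [gold[i] for i in train_idx])
        preds = [predict_fn(model, segments[i]) for i in held_out]
        scores.append(micro_prf([gold[i] for i in held_out], preds))
    return scores
```

Because every fold is scored only on segments the model never saw, the per-fold scores estimate performance on new lectures rather than on memorized training data.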

Progress 12/01/23 to 11/30/24

Outputs
Target Audience: The target audience for this project is faculty members and students in the Doctor of Veterinary Medicine program at Iowa State University. Several faculty members and students used the tool during this reporting period.

First, five faculty members served as usability participants to evaluate preliminary versions of the curriculum search tool. They provided feedback and made recommendations to improve the tool's functionality and usability. After the revisions were made, the tool was made available to all faculty members in the college (approximately 200). Several of these faculty members used the tool to identify what had been taught in other courses, or which courses covered certain concepts. The tool was also used to identify overlaps in the curriculum. As a result, one course was voted out of the curriculum because it overlapped heavily with a similar course.

Three DVM students were invited to review the tool and to provide feedback on how useful it was for making connections across courses and subject areas. These test users indicated that the tool would help them quickly identify where a concept was taught, so that they could go back to their own notes to review and refresh their knowledge base. Based on this feedback, the tool was also made available to all VM4 students (approximately 160). Faculty members preferred that students in years 1 through 3 not have access to the tool and the lecture recordings it indexes. This avoids premature access to content those students have not yet learned. Therefore, it was decided that only VM4 students would have access, so that they could use the tool to review the content learned in the first three years of the curriculum. Fourth-year students also used it to study for the NAVLE (North American Veterinary Licensing Exam).
We presented the tool's functionality to curriculum administrators at the University of Minnesota. They were impressed with the tool and would like to adapt it to their own context. Once the version is finalized, we will publish the source code on GitHub so that it can be easily downloaded and adapted.
Changes/Problems: Thus far, the project has progressed as scheduled and planned, so there are no major changes to report. However, the university recently made changes to its multi-factor authentication system, which is likely to affect how the search tool functions. We might have to take the tool offline until the authentication migration is successfully completed.
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest?
We published one paper in the Journal of Veterinary Medical Education. We have also presented the tool at informal curriculum meetings with other veterinary schools. The University of Minnesota has shown interest in obtaining the source code and developing its own curricular search tool. We created a Teaching Tip video on how to use the tool and distributed it to all faculty in the college via e-mail.
What do you plan to do during the next reporting period to accomplish the goals?
We plan to revise and update the search tool based on the feedback we collect from students and faculty. We also plan to continue developing the mapping tool and to test its accuracy with faculty and curriculum administrators.

Impacts
What was accomplished under these goals?
During this reporting period, the first specific objective and part of the second specific objective were met. The curriculum search tool has been developed and is functional. It is available to users on a password-protected website (https://curriculum.vetmed.iastate.edu) that uses the university-supported multi-factor authentication. Readers interested in seeing how the tool functions can watch the tool's tutorial video. Even though the tool is currently functional, we would like to continue making improvements based on user input, per human-centered design principles.

We are currently working on the second specific objective. This includes developing a curriculum mapping tool that uses Retrieval-Augmented Generation (RAG) to align educational content with the Iowa State University College of Veterinary Medicine (ISU CVM) program objectives. RAG combines information retrieval with generative AI models to provide accurate, context-aware mappings of lecture transcripts to predefined accreditation labels. The methodology involves retrieving relevant documents and lecture transcripts from a pre-indexed database using embedding similarity (e.g., cosine similarity), followed by reranker models. The augmented query is then fed into a generative AI model. Mistral-7B-Instruct-v0.2 is used for summarization, and BGE-M3 is used to produce the embeddings. The model generates responses that combine its internal knowledge with the newly retrieved information, with the aim of improving both accuracy and relevance.

The implementation has three main parts:
1) Summarizing and embedding the entire transcript to create a comprehensive overview.
2) Scoring each chunk of the transcript individually for a detailed, context-aware analysis.
3) Combining these scores using a weighted average to produce a final relevance score for each label.
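The final scoring step might look like the sketch below. In it, a transcript-level score is blended with chunk-level scores through a weighted average. The sketch assumes the embeddings have already been computed (for example, with BGE-M3) and are passed in as plain lists of floats. It omits the retrieval, reranking, and generation stages. Several choices are illustrative assumptions, not the project's settings:

- the function names;
- the `summary_weight` default of 0.5;
- using the best-matching chunk as the chunk-level score.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_relevance(summary_vec, chunk_vecs, label_vecs, summary_weight=0.5):
    """Blend a whole-transcript score with a chunk-level score for every label.

    summary_vec    : embedding of the transcript summary
    chunk_vecs     : list of embeddings, one per transcript chunk
    label_vecs     : {label: embedding of that label's description}
    summary_weight : hypothetical weight given to the summary score (0..1)
    """
    scores = {}
    for label, label_vec in label_vecs.items():
        summary_score = cosine(summary_vec, label_vec)
        # Chunk-level score: the best-matching chunk (an assumption; a mean or
        # top-k mean over chunks would also be reasonable).
        chunk_score = max((cosine(c, label_vec) for c in chunk_vecs), default=summary_score)
        scores[label] = summary_weight * summary_score + (1 - summary_weight) * chunk_score
    return scores
```

Taking the best-matching chunk keeps a label that is covered in only one part of a long lecture from being diluted. Averaging over all chunks would instead favor labels that run through the whole lecture. Labels can then be ranked with `sorted(scores, key=scores.get, reverse=True)`.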
These relevance scores are then used to map the educational content to the ISU CVM program objectives, addressing challenges such as transcript length and context loss during chunking. To check the accuracy of the mapping, we compare the system's labels with labels assigned manually by faculty. Currently, we have noticed some discrepancies between what faculty believe their courses address and how the system maps the lecture content. In particular, areas involving professional skills, such as communication, teamwork, and collaboration, seem to be missed by the system. We are still optimizing the system and improving the accuracy of the mapping results.
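The comparison with faculty labels can be sketched as a set difference for each lecture, tallied across lectures. The tally surfaces program objectives that the system tends to miss. This is an illustrative sketch only. The threshold that turns relevance scores into labels is hypothetical, and any label names passed in are placeholders rather than actual ISU CVM program objectives.

```python
from collections import Counter


def compare_with_faculty(system_scores, faculty_labels, threshold=0.5):
    """Split one lecture's labels into agreed, missed and extra.

    system_scores : {label: relevance score} produced by the mapping system
    faculty_labels: set of labels the instructor assigned to the lecture
    threshold     : hypothetical cut-off for turning scores into labels
    """
    system_labels = {label for label, score in system_scores.items() if score >= threshold}
    return {
        "agreed": system_labels & faculty_labels,
        "missed": faculty_labels - system_labels,  # faculty yes, system no
        "extra": system_labels - faculty_labels,   # system yes, faculty no
    }


def missed_counts(lectures, threshold=0.5):
    """Count, per label, how many lectures the system missed relative to faculty.

    lectures is an iterable of (system_scores, faculty_labels) pairs.
    """
    counts = Counter()
    for system_scores, faculty_labels in lectures:
        counts.update(compare_with_faculty(system_scores, faculty_labels, threshold)["missed"])
    return counts
```

A label that dominates `missed_counts` across many lectures points to a systematic gap in the mapping rather than to a one-off disagreement.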

Publications

Type: Peer Reviewed Journal Articles Status: Published Year Published: 2024 Citation: Karabulut-Ilgu, A. & Demir, S. (2024). Bringing the Veterinary Medicine Curriculum to the Fingertips of Faculty and Students: A Novel Curriculum Search and Analysis Tool. Journal of Veterinary Medical Education. My Role: Design and interpretation of results; drafting manuscript