[CS.AI] Cross-Dataset Bloom Question Classification: A Co...

Abstract

Automatic Bloom's taxonomy classification of assessment questions can substantially reduce instructor workload, but labeling is subjective and teacher-dependent. Prior machine learning (ML) and deep learning (DL) approaches reported strong within-dataset results, yet were rarely evaluated in cross-dataset settings, leaving real-world generalizability unclear; meanwhile, LLM effectiveness for Bloom question classification has not been systematically studied.

We evaluated the cross-dataset generalization of existing ML/DL methods and assessed LLMs with multiple prompting strategies on five datasets. The best prompting strategy combined in-context examples with course-specific action verbs. Supervised ML/DL models degraded substantially on unseen datasets, whereas LLMs were more stable, suggesting that prompted LLMs are a robust alternative across diverse educational contexts.
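To make the best-performing strategy concrete, here is a minimal sketch of a prompt that combines in-context examples with course-specific action verbs. It is not the paper's actual prompt, which the abstract does not give. The level names (taken here from the revised Bloom taxonomy), the prompt wording, and the helpers `build_prompt` and `parse_level` are all assumptions made for illustration, and the call to an LLM is deliberately left out.

```python
from dataclasses import dataclass

# Assumption: the six levels of the revised Bloom taxonomy. The paper's
# label set may differ.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


@dataclass
class Example:
    """A labeled question used as an in-context example (hypothetical type)."""
    question: str
    level: str


def build_prompt(question, examples, action_verbs):
    """Assemble a few-shot prompt: task description, per-level action verbs,
    labeled examples, then the question to classify."""
    lines = [
        "Classify the question into exactly one Bloom's taxonomy level.",
        "Levels: " + ", ".join(BLOOM_LEVELS),
        "",
        "Action verbs typical of each level in this course:",
    ]
    for level in BLOOM_LEVELS:
        verbs = action_verbs.get(level, [])
        if verbs:  # skip levels for which the course lists no verbs
            lines.append(f"- {level}: {', '.join(verbs)}")
    lines += ["", "Examples:"]
    for ex in examples:
        lines += [f"Question: {ex.question}", f"Level: {ex.level}", ""]
    lines += [f"Question: {question}", "Level:"]
    return "\n".join(lines)


def parse_level(reply):
    """Return the first level, in taxonomy order, whose name appears in the
    model's reply, or None if no level name is found."""
    lowered = reply.lower()
    for level in BLOOM_LEVELS:
        if level.lower() in lowered:
            return level
    return None


# Hypothetical course data, for illustration only.
verbs = {"Remember": ["define", "list"], "Analyze": ["compare", "contrast"]}
shots = [Example("Define a binary tree.", "Remember")]
prompt = build_prompt("Compare quicksort and mergesort.", shots, verbs)
```

The resulting `prompt` string would be sent to whichever LLM is in use, and `parse_level` maps the free-text reply back to a label. Because the verb lists and examples are passed in as data, the same code can be reused for a different course without retraining anything, which fits the stability across datasets that the abstract reports.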

Based on the best prompting strategy, we also presented a lightweight UI that supports instructors in automatically classifying large question banks; a usability study indicated low workload and high usability.

Blogger's Review: This study highlights the potential of LLMs in education, particularly for highly subjective tasks such as Bloom-level labeling. Compared with traditional supervised models, the LLMs were more stable across datasets, which makes them a promising tool for educators. The lightweight UI could further reduce instructor workload; the usability study reported low workload and high usability.

[CS.AI] Cross-Dataset Bloom Question Classification: A Comparative Study of Supervised Models and Prompted LLMs

Abstract