PRIME Technical Report 1: The Development and Validation of the Computer Science Concepts Assessment for Undergraduate Students (UG-CSCA): Preliminary Results
Background, Purposes and the Intended Uses of the Assessment
The Undergraduate Computer Science Concept Assessment (UG-CSCA) is intended to assess STEM undergraduate students’ understanding of basic computer science and programming concepts: variables, conditionals, loops, and algorithms. The validation process for this assessment was guided by the Focal Knowledge, Skills, and Abilities (FKSAs) framework proposed by Grover and Basu (2017) and by the K–12 Computer Science Framework (K–12 Computer Science Framework, 2016). Every item in the UG-CSCA uses block-based programming as its context. Several studies suggest that block-based programming is an effective and appropriate way to introduce novices to programming (Grover, Pea, & Cooper, 2015; Weintrop & Wilensky, 2015), which aligns with the intended audience of this assessment.
The current version of the UG-CSCA was written for undergraduate students who are novices in computer science and programming. We believe that this assessment will be useful to instructors who teach introductory computer science and programming courses, as well as to computer science education researchers. The instrument was designed for use in pre/post-intervention or longitudinal studies and as a diagnostic tool. The assessment consists of 26 multiple-choice questions, and we suggest allowing students 30–35 minutes to complete it.
The Development Process of the Assessment
The UG-CSCA was developed based on the FKSAs framework (Grover & Basu, 2017) and on our prior validation study of the Middle Grades Computer Science Concepts Assessment (MG-CSCA; Rachmatullah et al., 2020). The initial version of the UG-CSCA consisted of 30 multiple-choice questions. Ten of these items were taken from the MG-CSCA and served as a baseline for developing the other 20. These ten MG-CSCA items were the most difficult ones for middle-grades students, and we included them on the assumption that they would anchor the easy end of the difficulty scale for undergraduate students. The 20 new items were written to target medium and high difficulty levels, so that the instrument would cover a wider spread of difficulty.
The validation process was conducted in three phases. First, the original 30 items were piloted with 74 STEM undergraduate students enrolled in introductory computer programming classes. We used these data to determine the range of item difficulty and to identify potentially problematic items. Second, we conducted cognitive interviews with six of these students, focusing on their cognitive processes while solving the problematic items and on how they interpreted the directions, and used these interviews to revise the problematic items. In the final phase, we administered the refined version of the UG-CSCA to 594 undergraduate students enrolled in one of two introductory classes offered over four semesters (Spring 2019, Summer 2019, Fall 2019, and Spring 2020). This data set was used for further validation of the assessment.
We used a combination of classical test theory (CTT) and item response theory with a two-parameter logistic model (IRT-2PL) to validate the revised version of the assessment. These analyses identified four problematic items, which we removed, leaving 26 items in the final version of the UG-CSCA.
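The report does not specify which CTT statistics were computed. As a minimal sketch, assuming the two most common CTT item statistics (the proportion of students answering an item correctly, used as a difficulty index, and the corrected item-total correlation, used as a discrimination index), the Python below computes both from a 0/1 response matrix. The function names and the toy matrix are hypothetical and are not UG-CSCA data.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable has no variance
    return sxy / math.sqrt(sxx * syy)


def ctt_item_stats(responses: List[List[int]]) -> List[Dict[str, float]]:
    """Classical item analysis for a 0/1 matrix (rows = students, columns = items).

    For each item returns:
      p    - proportion correct (CTT difficulty index; higher = easier)
      r_it - corrected item-total correlation (item vs. total of the OTHER items)
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]  # total score excluding item j
        stats.append({"p": sum(item) / len(item), "r_it": pearson(item, rest)})
    return stats


# Made-up responses: 5 students x 4 items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
for j, s in enumerate(ctt_item_stats(toy), start=1):
    print(f"item {j}: p = {s['p']:.2f}, r_it = {s['r_it']:.2f}")
```

Correcting the item-total correlation by excluding the item from the total avoids inflating the correlation with the item's own score, which matters most for short tests.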
Evidence of Construct Validity and Reliability
We used IRT-2PL and Cronbach’s alpha to gather evidence of construct validity and reliability, respectively. We assumed that the assessment is unidimensional, measuring a single construct: conceptual understanding of computer science. IRT-2PL was run first. IRT-2PL is a statistical model that estimates students’ ability on a latent construct using two parameters for each item: difficulty and discrimination. Difficulty refers to how hard an item is to answer correctly, and discrimination refers to how well an item differentiates between low- and high-ability students (Baker & Kim, 2017). Baker and Kim (2017) suggest prioritizing the discrimination parameter because it provides rich information about the item itself. We therefore based decisions on whether to retain or remove items on their discrimination values, using 0.65, the lower bound of the “moderate” discrimination category, as the cutoff (Baker & Kim, 2017). We then ran a reliability test on the remaining items using Cronbach’s alpha. The IRT-2PL analysis was performed with the “ltm” package (Rizopoulos, 2006) in RStudio (RStudio Team, 2018), and the reliability test was run in IBM SPSS Statistics 26 (IBM Corp, 2019).
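To make the two item parameters concrete, the sketch below writes out the 2PL item response function, P(θ) = 1 / (1 + exp(−a(θ − b))), where a is discrimination and b is difficulty. It assumes the plain logistic form without the 1.7 scaling constant that some texts include. The helper name is hypothetical, and the parameter values are made up rather than estimated from the UG-CSCA.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model (hypothetical helper).

    theta: student ability on the latent scale
    a:     item discrimination (slope of the curve at theta = b)
    b:     item difficulty (ability at which P = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Made-up parameters: two items with the same difficulty, different discrimination.
steep = {"a": 2.0, "b": 0.0}  # high discrimination
flat = {"a": 0.5, "b": 0.0}   # low discrimination

for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(p_correct_2pl(theta, **steep), 3),
          round(p_correct_2pl(theta, **flat), 3))
```

At θ = b every item gives a 0.5 chance of a correct answer, so b locates the item on the ability scale. A larger a makes the curve steeper around b, which is what lets the item separate students just below b from students just above it.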
The first round of analysis identified four items with discrimination values below 0.65, ranging from 0.24 to 0.59. These items (I12, I18, I24, and I30) were removed, and the analysis was re-run. In the second round, all remaining items had discrimination values above the cutoff, indicating that they differentiate at least moderately between low- and high-ability students. Table 1 presents the difficulty and discrimination values of the retained items.
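The remove-and-refit step can be sketched as a loop. This sketch assumes a hypothetical `fit` callable that refits the 2PL model on a given item set and returns each item's discrimination; in the report that role was played by the ltm package in R, which this Python sketch does not call. The loop stops once no remaining item falls below the cutoff, mirroring the two rounds described above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: given the items to keep, refit the IRT-2PL model and
# return {item_name: discrimination}. A real version would wrap an IRT library.
FitFn = Callable[[List[str]], Dict[str, float]]


def prune_items(items: List[str], fit: FitFn, cutoff: float = 0.65,
                max_rounds: int = 10) -> Tuple[List[str], List[List[str]]]:
    """Drop items whose discrimination is below `cutoff`, refitting after each
    removal round, until every remaining item meets the cutoff.

    Returns (retained items, items removed in each round).
    """
    kept = list(items)
    removed_per_round: List[List[str]] = []
    for _ in range(max_rounds):
        discrimination = fit(kept)  # re-estimate on the current item set
        low = [name for name in kept if discrimination[name] < cutoff]
        if not low:
            break  # every remaining item meets the cutoff
        removed_per_round.append(low)
        kept = [name for name in kept if name not in low]
    return kept, removed_per_round


# Toy stand-in for a real fit: fixed, made-up discrimination values.
# (A real refit would re-estimate every parameter after items are removed.)
toy_values = {"A": 1.20, "B": 0.40, "C": 0.90, "D": 0.50}


def toy_fit(kept: List[str]) -> Dict[str, float]:
    return {name: toy_values[name] for name in kept}


retained, removed = prune_items(["A", "B", "C", "D"], toy_fit)
print(retained, removed)
```

Refitting after each removal round matters because every item's estimated parameters depend on the rest of the item set, so an item near the cutoff can move above or below it once other items are dropped.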
Figure 1. (a) Item Information Curves (IIC) and (b) Test Information Function
We also used item information curves (IICs) to examine which items provide the most information about the latent ability, and at which ability levels. The higher an item’s discrimination, the more information it provides, but that information is concentrated in a narrower band of ability around the item’s difficulty. Figure 1a shows the IICs for the final set of UG-CSCA items. For example, I8 provides the most information for lower-ability students, around θ = -1, but almost no information for high-ability students. By contrast, I11 has a lower discrimination value but covers a broader range of abilities. The test information function in Figure 1b shows that the assessment as a whole provides the most information for students of slightly below-average ability, around θ = -1. Although the items span a broad range of discrimination values, the assessment still lacks items that provide high information for high-ability students. We plan to develop additional difficult items for the next version of the UG-CSCA.
For reliability, the final version of the UG-CSCA had a Cronbach’s alpha of .868, indicating satisfactory internal consistency (DeVellis, 2017).
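Cronbach’s alpha is α = k/(k − 1) · (1 − Σ s_j² / s_X²), where k is the number of items, s_j² is the variance of item j, and s_X² is the variance of students’ total scores. The report computed alpha in SPSS. The Python below is only a minimal illustrative sketch of the same formula, using sample variances throughout (the ratio is unchanged if population variances are used consistently). The function name and toy matrix are hypothetical and are not UG-CSCA data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = students, columns = items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up 0/1 responses: 5 students x 4 items.
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(toy), 3))  # 0.8
```

Alpha rises when items covary strongly, because the total-score variance then exceeds the sum of the item variances by more.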
Table 1. Psychometric properties of the UG-CSCA