This dataset provides a cleaned and processed subset of the OECD PISA student data for the years 2000-2022. The original data are sourced from https://www.oecd.org/en/about/programmes/pisa/pisa-data.html and have been prepared for analysis. For each year, a sample of 50 students per country (for OECD countries) is included. The data curation and sampling process is documented in https://github.com/kevinwang09/learningtower_masonry/blob/master/Code/student_bind_rows.Rmd
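
The per-country sampling described above can be illustrated with a minimal base R sketch. This is not the code from the linked Rmd; the data frame toy, its row counts, and the helper sample_per_country() are hypothetical and exist only for illustration.

```r
set.seed(1)  # make the illustrative draw reproducible

# Made-up stand-in for one year of student records (not real PISA data).
toy <- data.frame(
  country    = rep(c("AUS", "NZL", "ISL"), times = c(120, 80, 30)),
  student_id = seq_len(230)
)

# Hypothetical helper: draw up to n rows per country without replacement.
# Countries with fewer than n rows keep all of their rows.
sample_per_country <- function(df, n = 50) {
  groups <- split(df, df$country)
  sampled <- lapply(groups, function(g) {
    g[sample(nrow(g), min(n, nrow(g))), , drop = FALSE]
  })
  out <- do.call(rbind, sampled)
  rownames(out) <- NULL
  out
}

toy_subset <- sample_per_country(toy, n = 50)
table(toy_subset$country)
```

Capping the draw at the group size is a design choice of this sketch, so that small countries are kept in full rather than causing an error.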

Format

A tibble with the following variables:

  • year: Year of the PISA data. Integer.

  • country: Three-character country code. Note that some regions/territories are coded as "country" for ease of input. Factor.

  • school_id: Unique school identifier for each country and year. Character.

  • student_id: Unique student identifier within each school. Integer.

  • mother_educ: Mother's highest level of education, from "less than ISCED1" to "ISCED 3A". Factor.

  • father_educ: Father's highest level of education, from "less than ISCED1" to "ISCED 3A". Factor.

  • gender: Gender of the student. Only "male" and "female" are recorded. Factor. Note that this variable is called gender rather than sex because gender is the term used in the OECD PISA database.

  • computer: Possession of computer. Only "yes" and "no" are recorded. Factor.

  • internet: Access to internet. Only "yes" and "no" are recorded. Factor.

  • math: Simulated score in mathematics. Numeric.

  • read: Simulated score in reading. Numeric.

  • science: Simulated score in science. Numeric.

  • stu_wgt: Final survey weight for the student. Numeric.

  • desk: Possession of desk to study at. Only "yes" and "no" are recorded. Factor.

  • room: Possession of a room of your own. Only "yes" and "no" are recorded. Factor.

  • dishwasher: Possession of a dishwasher. Only "yes" and "no" are recorded. Factor. Note that in 2015 and 2018, all entries are missing.

  • television: Number of televisions. "0", "1", "2" code for no, one and two TVs in the house. "3+" codes for three or more TVs. Factor. Note that in 2003, all entries are missing.

  • computer_n: Number of computers. "0", "1", "2" code for no, one and two computers in the house. "3+" codes for three or more computers. Factor. Note that in 2003, all entries are missing.

  • car: Number of cars. "0", "1", "2" code for no, one and two cars in the house. "3+" codes for three or more cars. Factor. Note that in 2003, all entries are missing.

  • book: Number of books. Factor. Note that the encoding in the years 2000 and 2003 differs from that in all other years. Evaluate table(student$book, student$year) for a demo.

  • wealth: Index of family wealth. Numeric. Note that in 2003, all entries are missing.

  • escs: Index of economic, social and cultural status. Numeric.
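
Because stu_wgt is a survey weight, summaries of the score columns are usually computed as weighted rather than plain means. The sketch below shows one way to do this in base R with stats::weighted.mean(). The data frame toy holds made-up values, and the helper weighted_math() is hypothetical; with the package data, a data frame with the same columns could be passed instead. It produces point estimates only and does not address standard errors.

```r
# Made-up rows that mimic the documented columns (not real PISA data).
toy <- data.frame(
  year    = c(2000L, 2000L, 2000L, 2003L, 2003L),
  country = c("AUS", "AUS", "NZL", "AUS", "AUS"),
  math    = c(500, 520, 480, 510, NA),
  stu_wgt = c(10, 30, 5, 20, 20)
)

# Hypothetical helper: weighted mean of math per year and country.
# Rows with a missing score or weight are dropped first.
weighted_math <- function(df) {
  ok <- !is.na(df$math) & !is.na(df$stu_wgt)
  df <- df[ok, , drop = FALSE]
  groups <- split(df, list(df$year, df$country), drop = TRUE)
  out <- lapply(groups, function(g) {
    data.frame(
      year       = g$year[1],
      country    = g$country[1],
      math_wmean = weighted.mean(g$math, g$stu_wgt),
      n          = nrow(g)
    )
  })
  out <- do.call(rbind, out)
  rownames(out) <- NULL
  out
}

res <- weighted_math(toy)
res
```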

Examples

library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
data(student_subset_2000)
data(student_subset_2003)
dplyr::bind_rows(
  student_subset_2000,
  student_subset_2003
)
#> # A tibble: 3,100 × 22
#>     year country school_id student_id mother_educ father_educ gender computer
#>    <int> <fct>   <chr>          <int> <fct>       <fct>       <fct>  <fct>   
#>  1  2000 AUS     4187            2901 NA          NA          male   NA      
#>  2  2000 AUS     2120             999 NA          NA          male   NA      
#>  3  2000 AUS     2197            1278 NA          NA          female NA      
#>  4  2000 AUS     4008            2315 NA          NA          female NA      
#>  5  2000 AUS     3060            1660 NA          NA          male   NA      
#>  6  2000 AUS     8004            4845 NA          NA          female NA      
#>  7  2000 AUS     1108             200 NA          NA          male   NA      
#>  8  2000 AUS     5039            3202 NA          NA          female NA      
#>  9  2000 AUS     3114            1858 NA          NA          female NA      
#> 10  2000 AUS     3158            1915 NA          NA          male   NA      
#> # ℹ 3,090 more rows
#> # ℹ 14 more variables: internet <fct>, math <dbl>, read <dbl>, science <dbl>,
#> #   stu_wgt <dbl>, desk <fct>, room <fct>, dishwasher <fct>, television <fct>,
#> #   computer_n <fct>, car <fct>, book <fct>, wealth <dbl>, escs <dbl>
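
Several variables are documented as entirely missing in some years, and book is encoded differently across years, so it can help to check both before combining years in an analysis. The sketch below is a minimal base R example on made-up data; the data frame toy, its book labels, and the helper na_share_by_year() are hypothetical. With the package data, the bound tibble from the example above could be passed instead.

```r
# Made-up rows for illustration; the book labels are invented, not the real encoding.
toy <- data.frame(
  year   = c(2000L, 2000L, 2003L, 2003L),
  book   = factor(c("0-10", "11-50", NA, "more than 500")),
  wealth = c(0.1, -0.3, NA, NA)
)

# Hypothetical helper: share of missing values per variable, one row per year.
na_share_by_year <- function(df) {
  cols <- setdiff(names(df), "year")
  by_year <- split(df[cols], df$year)
  do.call(rbind, lapply(by_year, function(g) colMeans(is.na(g))))
}

na_share <- na_share_by_year(toy)
na_share  # a value of 1 means the variable is entirely missing in that year

# Cross-tabulate book categories by year, keeping NA as its own row.
book_by_year <- table(toy$book, toy$year, useNA = "ifany")
book_by_year
```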