A sample subset of student-level data, containing scores and other information from the triennial OECD PISA testing of 15-year-olds around the globe. The original data are available from https://www.oecd.org/pisa/data/.

Format

A tibble with the following variables:

  • year: Year of the PISA data. Factor.

  • country: Three-character country code. Note that some regions and territories are coded as countries for ease of input. Factor.

  • school_id: The school identification number, unique within each combination of country and year. Factor.

  • student_id: The student identification number, unique within each combination of school, country and year. Factor.

  • mother_educ: Highest level of mother's education. Ranges from "less than ISCED1" to "ISCED 3A". Factor. Note that in 2000, all entries are missing.

  • father_educ: Highest level of father's education. Ranges from "less than ISCED1" to "ISCED 3A". Factor. Note that in 2000, all entries are missing.

  • gender: Gender of the student. Only "male" and "female" are recorded. Factor. Note that this variable is named gender rather than sex because gender is the term used in the OECD PISA database.

  • computer: Possession of a computer. Only "yes" and "no" are recorded. Factor.

  • internet: Access to the internet. Only "yes" and "no" are recorded. Factor.

  • math: Simulated score in mathematics. Numeric.

  • read: Simulated score in reading. Numeric.

  • science: Simulated score in science. Numeric.

  • stu_wgt: The final survey weight for the student. Numeric.

  • desk: Possession of a desk to study at. Only "yes" and "no" are recorded. Factor.

  • room: Possession of a room of one's own. Only "yes" and "no" are recorded. Factor.

  • dishwasher: Possession of a dishwasher. Only "yes" and "no" are recorded. Factor. Note that in 2015 and 2018, all entries are missing.

  • television: Number of televisions. "0", "1" and "2" code for no, one and two TVs in the house, and "3+" codes for three or more TVs. Factor. Note that in 2003, all entries are missing.

  • computer_n: Number of computers. "0", "1" and "2" code for no, one and two computers in the house, and "3+" codes for three or more computers. Factor. Note that in 2003, all entries are missing.

  • car: Number of cars. "0", "1" and "2" code for no, one and two cars in the house, and "3+" codes for three or more cars. Factor. Note that in 2003, all entries are missing.

  • book: Number of books. Factor. Note that the encoding in 2000 and 2003 differs from that used in all other years. Evaluate table(student$book, student$year) for a demonstration.

  • wealth: Family wealth. Numeric. Note that in 2003, all entries are missing.

  • escs: Index of economic, social and cultural status. Numeric.
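The year-dependent coding of book, and the years in which some variables are entirely missing, can be checked by tabulating against year. The sketch below assumes the package providing these datasets is installed and attached, so that data() finds student_subset_2000 and student_subset_2003 as in the Examples section. The object name missing_by_year and its column names are illustrative only. Only two years are combined here; further yearly subsets can be bound in the same way to compare the book coding with later years.

```r
library(dplyr)

# Assumes the package shipping these datasets is installed and attached,
# so that data() can find them (as in the Examples section).
data(student_subset_2000)
data(student_subset_2003)

# Combine the two years; the name `student` matches the table() hint
# given for the book variable.
student <- dplyr::bind_rows(student_subset_2000, student_subset_2003)

# Cross-tabulate book categories against year.
table(student$book, student$year)

# Proportion of missing values per year for two variables documented as
# entirely missing in some years (mother_educ in 2000, wealth in 2003).
# ungroup() drops the existing grouping by country before regrouping by year.
missing_by_year <- student %>%
  dplyr::ungroup() %>%
  dplyr::group_by(year) %>%
  dplyr::summarise(
    mother_educ_missing = mean(is.na(mother_educ)),
    wealth_missing = mean(is.na(wealth))
  )
missing_by_year
```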

Examples

library(dplyr)
data(student_subset_2000)
data(student_subset_2003)
dplyr::bind_rows(
  student_subset_2000,
  student_subset_2003
)
#> # A tibble: 4,200 × 22
#> # Groups:   country [49]
#>    year  country school_id studen…¹ mothe…² fathe…³ gender compu…⁴ inter…⁵  math
#>    <fct> <fct>   <fct>     <fct>    <fct>   <fct>   <fct>  <fct>   <fct>   <dbl>
#>  1 2000  ALB     2016      771      NA      NA      female NA      NA        NA 
#>  2 2000  ALB     1093      428      NA      NA      male   NA      no        NA 
#>  3 2000  ALB     11074     2805     NA      NA      male   NA      no       211.
#>  4 2000  ALB     25058     4366     NA      NA      female NA      no       267.
#>  5 2000  ALB     5112      1606     NA      NA      female NA      no       676.
#>  6 2000  ALB     2006      769      NA      NA      female NA      no       508.
#>  7 2000  ALB     2130      872      NA      NA      male   NA      no        NA 
#>  8 2000  ALB     1161      673      NA      NA      male   NA      no       343.
#>  9 2000  ALB     23143     4067     NA      NA      female NA      no        NA 
#> 10 2000  ALB     1173      721      NA      NA      male   NA      no       388.
#> # … with 4,190 more rows, 12 more variables: read <dbl>, science <dbl>,
#> #   stu_wgt <dbl>, desk <fct>, room <fct>, dishwasher <fct>, television <fct>,
#> #   computer_n <fct>, car <fct>, book <fct>, wealth <dbl>, escs <dbl>, and
#> #   abbreviated variable names ¹​student_id, ²​mother_educ, ³​father_educ,
#> #   ⁴​computer, ⁵​internet
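Because stu_wgt is a survey weight, summaries of the simulated scores by country are typically computed as weighted means. The following is a minimal sketch using base R's weighted.mean(). It gives point estimates only and does not attempt PISA's full variance methodology (replicate weights, multiple plausible values). As above, it assumes the datasets can be loaded with data(); the object name math_by_country and the column math_wmean are illustrative.

```r
library(dplyr)

# Assumes the package shipping these datasets is installed and attached.
data(student_subset_2000)
data(student_subset_2003)

student <- dplyr::bind_rows(student_subset_2000, student_subset_2003)

# Survey-weighted mean math score per country and year.
# Rows with a missing score or weight are dropped first, because
# weighted.mean(na.rm = TRUE) only removes NAs in the scores, not the weights.
math_by_country <- student %>%
  dplyr::ungroup() %>%
  dplyr::filter(!is.na(math), !is.na(stu_wgt)) %>%
  dplyr::group_by(country, year) %>%
  dplyr::summarise(
    math_wmean = weighted.mean(math, w = stu_wgt),
    n = dplyr::n(),
    .groups = "drop"
  )
math_by_country
```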