In this section, we explore some basic objects in both languages. Some of the objects are comparable with each other while some are unique to a language.
R comes with a vector class, which covers both numeric and character vectors. A vector is defined with the c() function.
x = c(1, 3, 5)
class(x)
## [1] "numeric"
print(x)
## [1] 1 3 5
y = c("a", "b", "c")
class(y)
## [1] "character"
print(y)
## [1] "a" "b" "c"

In Python, the closest counterpart to numeric and character vectors is written with [...], which creates a list object.
x = [1, 3, 5]
type(x)
## <class 'list'>
print(x)
## [1, 3, 5]
y = ["a", "b", "c"]
type(y)
## <class 'list'>
print(y)
## ['a', 'b', 'c']

Each element of an R vector must be of the same class. For example, x = c(1, "a") gives a character vector, because character is considered more general than numeric. In Python, however, elements of different types can co-exist: in x = [1, "a"], type(x[0]) is int but type(x[1]) is str. Of course, R also has a list class, which can accommodate a different class for each element; see below.
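The contrast above can be checked with a minimal sketch. The variable names (mixed, element_types, as_character) are illustrative only, and the explicit str() conversion is just one assumed way to mimic R's coercion to character.

```python
# A Python list keeps each element's own type.
mixed = [1, "a"]
element_types = [type(v).__name__ for v in mixed]
print(element_types)  # ['int', 'str']

# To mimic R's c(1, "a"), which coerces everything to character,
# the conversion has to be requested explicitly.
as_character = [str(v) for v in mixed]
print(as_character)  # ['1', 'a']
```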
The biggest difference between R and Python vectors is that R indexes from 1 and Python indexes from 0.
As a result, a Python slice such as x[0:3] starts at position 0 and excludes its end point, so it returns three elements and can be shortened to x[:3]. One can think of this as “getting the first 3 elements of the list”.

Interestingly, Python does not operate on lists element-wise as you would expect of vectors in R. To me, this makes Python slower to write when one needs to write code for data analytics.
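To make the point about element-wise operations concrete, here is a small sketch using only built-in lists. The variable names are illustrative.

```python
x = [1, 2, 3]

# Multiplying a list repeats it; it does not scale each element.
repeated = x * 2
print(repeated)  # [1, 2, 3, 1, 2, 3]

# Adding two lists concatenates them.
joined = x + [10, 20, 30]
print(joined)  # [1, 2, 3, 10, 20, 30]

# Element-wise arithmetic needs an explicit loop or comprehension.
doubled = [v * 2 for v in x]
print(doubled)  # [2, 4, 6]

summed = [a + b for a, b in zip(x, [10, 20, 30])]
print(summed)  # [11, 22, 33]
```

NumPy arrays, which are not used in this sketch, do support element-wise arithmetic and are the usual choice when R-like vectorised behaviour is wanted.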
x = c(1, 2, 3, 4, 5)
x[1]
## [1] 1
x[2]
## [1] 2
x[2:4]
## [1] 2 3 4
x[1:3]
## [1] 1 2 3
## No equivalent way in R (for the Python shorthand x[:3])
x[c(1, 3, 5)]
## [1] 1 3 5
x[x > 3]
## [1] 4 5

x = [1, 2, 3, 4, 5]
x[0]
## 1
x[1]
## 2
x[1:4]
## [2, 3, 4]
x[0:3]
## [1, 2, 3]
x[:3] ## Equivalent to x[0:3]
## [1, 2, 3]
b = [0, 2, 4]
## Using list comprehension
[x[i] for i in b]
## [1, 3, 5]
## Using pandas
import pandas as pd
list(pd.Series(x)[b])
## [1, 3, 5]
print(list(filter(lambda y: y > 3, x)))
## [4, 5]

Negative indices carry very different meanings in the two languages. In R, a negative index drops the element at that position; in Python, it counts from the end of the list.
Because of this behaviour, extracting elements from the end of a vector/list is easier in Python than in R.
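The Python side of this behaviour can be sketched as follows. The comprehension that mimics R's x[-3] is an assumed workaround, not a built-in equivalent.

```python
x = [1, 2, 3, 4, 5]

last = x[-1]        # last element
last_two = x[-2:]   # last two elements
print(last, last_two)  # 5 [4, 5]

# R's x[-3] returns the vector without its third element and leaves x alone.
# A non-destructive Python counterpart skips zero-based position 2.
without_third = [v for i, v in enumerate(x) if i != 2]
print(without_third)  # [1, 2, 4, 5]
print(x)              # [1, 2, 3, 4, 5]
```

Unlike del x[2], the comprehension builds a new list and leaves x unchanged, which is closer to what x[-3] does in R.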
x[-3] ## Dropping the third element (x itself is unchanged)
## [1] 1 2 4 5
x[(length(x) - 1):length(x)]
## [1] 4 5

del x[2]
print(x)
## [1, 2, 4, 5]
x[-2:] ## Extracting from the second last element to the end of the list
## [4, 5]

A list in R can be very powerful. Each element of a list can be literally any object of any class.
x = list(
  numeric = 1,
  character = "cats",
  data_frame = data.frame(x = 1:3, 
                          y = 2:4))
print(x)
## $numeric
## [1] 1
## 
## $character
## [1] "cats"
## 
## $data_frame
##   x y
## 1 1 2
## 2 2 3
## 3 3 4
class(x)
## [1] "list"
lapply(x, class)
## $numeric
## [1] "numeric"
## 
## $character
## [1] "character"
## 
## $data_frame
## [1] "data.frame"

As we have seen above, a list in Python behaves more or less like a vector, except that its elements may be of different types.
x = [1, "cats"]
print(x)
## [1, 'cats']
type(x)
## <class 'list'>
print(list(map(type, x)))
## [<class 'int'>, <class 'str'>]

A tuple behaves like a list, but it is immutable.
y = tuple(x)
x[0] = 10
print(x)
## [10, 'cats']
y[0] = 10
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: 'tuple' object does not support item assignment
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
print(y)
## (1, 'cats')

To generate a sequence of integers, Python again requires an offset: range() excludes its upper bound, so the end point must be one larger than in R.
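1:8
## [1] 1 2 3 4 5 6 7 8
list(range(1, 9))
## [1, 2, 3, 4, 5, 6, 7, 8]

It is very important to note that in Python, the = assignment only binds another name to the same object, so a change made to a mutable object (such as a list) through one name is visible through the other. In R, by contrast, assignment behaves as if it makes an independent copy, so modifying the new variable leaves the original untouched. For a single number, as in the example below, reassigning y leaves x unchanged in both languages.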
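The reference behaviour of = in Python only becomes visible with a mutable object such as a list. The following is a minimal sketch with illustrative variable names; list.copy() is one of several ways to obtain an independent (shallow) copy.

```python
x = [1, 2, 3]
y = x            # y is a second name for the same list object
y[0] = 99
print(x)         # [99, 2, 3]
print(x is y)    # True

z = x.copy()     # an independent (shallow) copy
z[0] = -1
print(x)         # [99, 2, 3]
print(x is z)    # False
```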
x = 1
y = x
print(y)
## [1] 1
y = 2
print(y)
## [1] 2

x = 1
y = x
print(y)
## 1
y = 2
print(y)
## 2

In R, the class() function shows the class of an object, and almost all functions that convert an object from one class to another take the form as.xyz().
x = 1
print(x)
## [1] 1
class(x)
## [1] "numeric"
x_chr = as.character(x)
print(x_chr) ## Notice the quotes
## [1] "1"
class(x_chr)
## [1] "character"
x_int = as.integer(x_chr)
print(x_int)
## [1] 1
class(x_int)
## [1] "integer"

In Python, the type() function shows the type (class) of an object. Conversion between types is done by calling the target type itself, such as str() or int().
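One practical difference is that R's as.xyz functions are vectorised, whereas Python's int() and str() convert one value at a time. A minimal sketch, with illustrative variable names, of converting a whole list and of what happens when a conversion is not possible:

```python
values = ["1", "2", "3"]

# Convert every element explicitly; int() does not accept a list.
as_int = [int(v) for v in values]
print(as_int)  # [1, 2, 3]

as_str = [str(v) for v in as_int]
print(as_str)  # ['1', '2', '3']

# An impossible conversion raises ValueError, whereas R's as.integer("cats")
# returns NA with a warning.
conversion_failed = False
try:
    int("cats")
except ValueError as err:
    conversion_failed = True
    print("conversion failed:", err)
```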
x = 1
print(x)
## 1
type(x)
## <class 'int'>
x_chr = str(x)
print(x_chr)
## 1
type(x_chr)
## <class 'str'>
x_int = int(x_chr)
print(x_int)
## 1
type(x_int)
## <class 'int'>

Suppose we have matching pairs of information in the form of two vectors, and we wish to extract elements of one vector based on the other. This can of course be done through the usual manipulations on vectors, for loops and so on. However, R and Python offer different solutions to this seemingly simple task.
In R, no additional class of object is needed. A vector can be given a vector of names, and the subsetting can be done using the names themselves.
x = c("alpha", "bravo", "charlie")
names(x) = c("a", "b", "c")
x[c("a", "c")]
##         a         c 
##   "alpha" "charlie"

In Python, a very useful object class is dict (dictionary). A dictionary stores key-value pairs, and the keys behave much like the names() of a vector in R.
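When the keys and values start out as two matched lists, as in the R example with names(), one way to build the dictionary is with zip(). This is a sketch with illustrative names; using dict.get() for missing keys is an assumed design choice, not the only option.

```python
keys = ["a", "b", "c"]
values = ["alpha", "bravo", "charlie"]

# Pair each key with its value, then build the dictionary.
d = dict(zip(keys, values))
print(d)  # {'a': 'alpha', 'b': 'bravo', 'c': 'charlie'}

# d[k] raises KeyError for a missing key; d.get(k) returns None instead.
wanted = ["a", "c", "z"]
subset = [d.get(k) for k in wanted]
print(subset)  # ['alpha', 'charlie', None]
```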
d = {
    "a": "alpha",
    "b": "bravo",
    "c": "charlie"
}
d["a"]
## 'alpha'
[d[i] for i in ["a", "c"]]
## ['alpha', 'charlie']

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] reticulate_1.20-9000
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        here_1.0.1        lattice_0.20-44   png_0.1-7        
##  [5] rprojroot_2.0.2   digest_0.6.27     grid_4.1.0        jsonlite_1.7.2   
##  [9] magrittr_2.0.1    evaluate_0.14     stringi_1.6.2     rlang_0.4.11     
## [13] Matrix_1.3-3      rmarkdown_2.8     tools_4.1.0       stringr_1.4.0    
## [17] xfun_0.23         yaml_2.2.1        compiler_4.1.0    htmltools_0.5.1.1
## [21] knitr_1.33

## python:         /home/runner/.virtualenvs/r-reticulate/bin/python
## libpython:      /home/runner/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome:     /home/runner/.virtualenvs/r-reticulate:/home/runner/.virtualenvs/r-reticulate
## version:        3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01)  [GCC 9.3.0]
## numpy:          /home/runner/.virtualenvs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.19.5
## 
## NOTE: Python version was forced by use_python function