In this section, we explore some basic objects in both languages. Some of the objects are comparable with each other while some are unique to a language.
R comes with a vector class (covering both numeric and character vectors). A vector is defined using the c() function.
x = c(1, 3, 5)
class(x)
## [1] "numeric"
print(x)
## [1] 1 3 5
y = c("a", "b", "c")
class(y)
## [1] "character"
print(y)
## [1] "a" "b" "c"
In Python, numeric and character sequences are defined with square brackets, [...], which creates a list object.
x = [1, 3, 5]
type(x)
## <class 'list'>
print(x)
## [1, 3, 5]
y = ["a", "b", "c"]
type(y)
## <class 'list'>
print(y)
## ['a', 'b', 'c']
Each element in an R vector must be of the same class. For example, x = c(1, "a") gives a character vector (because character is considered to be more general than numeric). However, in Python these elements can co-exist: for example, in x = [1, "a"], type(x[0]) is an int but type(x[1]) is a str. Of course, R also has a list class, which can accommodate different classes for different elements; see below.
The biggest difference between R and Python vectors is that R indexes from 1 and Python indexes from 0.
Because Python indexes from 0 and a slice excludes its end position, x[0:3] returns the first three elements and can be written as x[:3]. One can think of this as “getting 3 elements from the vector”. Interestingly, Python lists do not operate element-wise as you would expect from vectors in R. To me, this makes Python slower to write when one needs to write code for data analytics.
x = c(1, 2, 3, 4, 5)
x[1]
## [1] 1
x[2]
## [1] 2
x[2:4]
## [1] 2 3 4
x[1:3]
## [1] 1 2 3
## R has no slice shorthand like Python's x[:3]; head(x, 3) is the closest
x[c(1, 3, 5)]
## [1] 1 3 5
x[x > 3]
## [1] 4 5
x = [1, 2, 3, 4, 5]
x[0]
## 1
x[1]
## 2
x[1:4]
## [2, 3, 4]
x[0:3]
## [1, 2, 3]
x[:3] ## Equivalent to x[0:3]
## [1, 2, 3]
b = [0, 2, 4]
## Using list comprehension
[x[i] for i in b]
## [1, 3, 5]
## Using pandas
import pandas as pd
list(pd.Series(x)[b])
## [1, 3, 5]
print(list(filter(lambda y: y > 3, x)))
## [4, 5]
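The remark above about element-wise operations can be sketched as follows. This is only an illustration using plain lists: arithmetic operators on a Python list do not act on each element, and a comparison such as x > 3 between a list and a number raises a TypeError, so a comprehension is used instead.

```python
x = [1, 2, 3, 4, 5]

# Multiplying a list repeats it; it does not double each element.
repeated = x * 2
print(repeated)  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

# Element-wise arithmetic needs an explicit loop or comprehension.
doubled = [v * 2 for v in x]
print(doubled)  # [2, 4, 6, 8, 10]

# The comprehension form of R's x[x > 3].
above_three = [v for v in x if v > 3]
print(above_three)  # [4, 5]
```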
Negative indices carry very different meanings in the two languages. In R, the element at the negated position is dropped, whereas in Python, the position is counted from the end of the list and that element is extracted.
Extracting elements from the end of a vector/list is easier in Python than in R because of this behaviour.
x[-3] ## Deleting the third element
## [1] 1 2 4 5
x[(length(x) - 1):length(x)]
## [1] 4 5
del x[2]
print(x)
## [1, 2, 4, 5]
x[-2:] ## Extracting the second last element to the end of the vector
## [4, 5]
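One difference worth spelling out: R's x[-3] returns a shortened copy and leaves x untouched, while del x[2] changes the Python list in place. The sketch below shows negative indexing together with one possible non-mutating way to drop an element; the variable names are illustrative.

```python
x = [1, 2, 3, 4, 5]

# Negative indices count from the end of the list.
last = x[-1]
last_two = x[-2:]

# R's x[-3] returns a copy without the third element and leaves x alone.
# A non-mutating Python counterpart joins the slices on either side.
without_third = x[:2] + x[3:]

print(last)           # 5
print(last_two)       # [4, 5]
print(without_third)  # [1, 2, 4, 5]
print(x)              # [1, 2, 3, 4, 5]
```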
A list in R can be very powerful. Each element of a list can be literally any object of any class.
x = list(
  numeric = 1,
  character = "cats",
  data_frame = data.frame(x = 1:3,
                          y = 2:4))
print(x)
## $numeric
## [1] 1
##
## $character
## [1] "cats"
##
## $data_frame
##   x y
## 1 1 2
## 2 2 3
## 3 3 4
class(x)
## [1] "list"
lapply(x, class)
## $numeric
## [1] "numeric"
##
## $character
## [1] "character"
##
## $data_frame
## [1] "data.frame"
As we have seen above, a list in Python behaves more or less like a vector.
x = [1, "cats"]
print(x)
## [1, 'cats']
type(x)
## <class 'list'>
print(list(map(type, x)))
## [<class 'int'>, <class 'str'>]
A tuple behaves like a list, but it is immutable.
y = tuple(x)  ## y is a new tuple holding the same elements as x
x[0] = 10
print(x)
## [10, 'cats']
y[0] = 10
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: 'tuple' object does not support item assignment
##
## Detailed traceback:
## File "<string>", line 1, in <module>
print(y)
## (1, 'cats')
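The snippet below is a small sketch of what immutability means in practice: the failed assignment can be caught as a TypeError, a tuple of hashable items can serve as a dictionary key, and a changed version has to be built as a new tuple. The names lookup and y_new are made up for this illustration.

```python
x = [1, "cats"]
y = tuple(x)

# Item assignment on a tuple raises TypeError, which can be caught.
try:
    y[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)  # True

# Being immutable (and holding hashable items), a tuple can be a dict key;
# a list cannot.
lookup = {y: "first record"}
print(lookup[(1, "cats")])  # first record

# A modified version has to be built as a new tuple.
y_new = (10,) + y[1:]
print(y_new)  # (10, 'cats')
```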
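Generating a sequence of integers also differs. R's 1:8 includes both endpoints, whereas Python's range() excludes the stop value, so the upper bound needs an offset of one.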
1:8
## [1] 1 2 3 4 5 6 7 8
list(range(1, 9))
## [1, 2, 3, 4, 5, 6, 7, 8]
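A few more forms of range() are sketched below to show how the start, stop and step arguments interact. The comparison with seq() in R is given only as a rough analogy.

```python
# range(start, stop) includes start but excludes stop.
one_to_eight = list(range(1, 9))
print(one_to_eight)  # [1, 2, 3, 4, 5, 6, 7, 8]

# With a single argument the sequence starts at 0.
zero_to_seven = list(range(8))
print(zero_to_seven)  # [0, 1, 2, 3, 4, 5, 6, 7]

# A third argument sets the step, similar to seq(1, 8, by = 2) in R.
odd_numbers = list(range(1, 9, 2))
print(odd_numbers)  # [1, 3, 5, 7]
```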
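It is very important to note that in Python, the = assignment does not copy an object; it only binds another name to the same object. For immutable values such as the numbers below this makes no visible difference, but a change made to a mutable object (such as a list) through one name is visible through the other. In R, by contrast, objects behave as if they were copied on assignment (copy-on-modify), so modifying y never changes x.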
x = 1
y = x
print(y)
## [1] 1
y = 2
print(y)
## [1] 2
x = 1
y = x
print(y)
## 1
y = 2
print(y)
## 2
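With numbers the two languages look identical, so here is a sketch using a mutable list, where the reference behaviour in Python becomes visible. The use of copy.copy is one of several ways to obtain an independent (shallow) copy.

```python
import copy

x = [1, 2, 3]
y = x            # y is another name for the same list object
y[0] = 100
print(x)         # [100, 2, 3]
print(y is x)    # True

# An explicit copy gives an independent object.
z = copy.copy(x)
z[0] = -1
print(x)         # [100, 2, 3]
print(z)         # [-1, 2, 3]

# Rebinding a name to a new object never affects the other name.
y = [0]
print(x)         # [100, 2, 3]
```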
In R, the class function shows the class of the object, and almost all functions that convert an object from one class to another take the form as.xyz.
x = 1
print(x)
## [1] 1
class(x)
## [1] "numeric"
x_chr = as.character(x)
print(x_chr) ## Notice the quotes
## [1] "1"
class(x_chr)
## [1] "character"
x_int = as.integer(x_chr)
print(x_int)
## [1] 1
class(x_int)
## [1] "integer"
In Python, the type function shows the class of the object. Conversion between classes is done by calling the target class itself as a function, for example str() or int().
x = 1
print(x)
## 1
type(x)
## <class 'int'>
x_chr = str(x)
print(x_chr)
## 1
type(x_chr)
## <class 'str'>
x_int = int(x_chr)
print(x_int)
## 1
type(x_int)
## <class 'int'>
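One practical difference appears when a conversion cannot succeed. The sketch below assumes nothing beyond built-in functions: Python raises a ValueError, whereas R's as.integer("abc") returns NA with a warning.

```python
# Conversion succeeds when the text is a valid literal for the target type.
as_int = int("42")
as_float = float("1.5")
print(as_int, as_float)  # 42 1.5

# Invalid text raises ValueError rather than producing a missing value.
try:
    int("abc")
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)  # True

# isinstance() is the usual way to test the type of an object.
print(isinstance(as_int, int))  # True
```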
Suppose we have matching pairs of information in the form of two vectors, and we wish to extract elements of one vector based on the other. This can of course be done through the usual manipulations on vectors, for loops, etc. However, R and Python each provide a different solution to this seemingly simple task.
In R, no additional class of object is needed. A vector can be given a vector of names, and the subsetting can be done using the names themselves.
x = c("alpha", "bravo", "charlie")
names(x) = c("a", "b", "c")
x[c("a", "c")]
## a c
## "alpha" "charlie"
In Python, a very interesting object class is dict (dictionary). A dictionary stores key-value pairs, and each value is looked up by its key. The keys behave much like the names() of a vector in R.
d = {
"a": "alpha",
"b": "bravo",
"c": "charlie"
}
d["a"]
## 'alpha'
[d[i] for i in ["a", "c"]]
## ['alpha', 'charlie']
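Since the task starts from two matching vectors, a dictionary can also be built directly from two lists. The following is a minimal sketch; the names keys, values, missing and subset are illustrative.

```python
keys = ["a", "b", "c"]
values = ["alpha", "bravo", "charlie"]

# zip() pairs the two lists element by element; dict() builds the mapping.
d = dict(zip(keys, values))
print(d["b"])  # bravo

# A missing key raises KeyError with d["z"]; get() returns a default instead.
missing = d.get("z", "unknown")
print(missing)  # unknown

# Keep the keys alongside the values, similar to R's named subsetting.
subset = {k: d[k] for k in ["a", "c"]}
print(subset)  # {'a': 'alpha', 'c': 'charlie'}
```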
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] reticulate_1.20-9000
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 here_1.0.1 lattice_0.20-44 png_0.1-7
## [5] rprojroot_2.0.2 digest_0.6.27 grid_4.1.0 jsonlite_1.7.2
## [9] magrittr_2.0.1 evaluate_0.14 stringi_1.6.2 rlang_0.4.11
## [13] Matrix_1.3-3 rmarkdown_2.8 tools_4.1.0 stringr_1.4.0
## [17] xfun_0.23 yaml_2.2.1 compiler_4.1.0 htmltools_0.5.1.1
## [21] knitr_1.33
## python: /home/runner/.virtualenvs/r-reticulate/bin/python
## libpython: /home/runner/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome: /home/runner/.virtualenvs/r-reticulate:/home/runner/.virtualenvs/r-reticulate
## version: 3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01) [GCC 9.3.0]
## numpy: /home/runner/.virtualenvs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version: 1.19.5
##
## NOTE: Python version was forced by use_python function