In this section, we explore some basic objects in both languages. Some of the objects are comparable with each other while some are unique to a language.


Vectors

R

R has a built-in vector class that covers both numeric and character data. A vector is created with the c() function.

x = c(1, 3, 5)
class(x)
## [1] "numeric"
print(x)
## [1] 1 3 5
y = c("a", "b", "c")
class(y)
## [1] "character"
print(y)
## [1] "a" "b" "c"

Python

Python has no built-in vector class. The closest built-in object is a list, written with square brackets [...], which can hold both numbers and strings.

x = [1, 3, 5]
type(x)
## <class 'list'>
print(x)
## [1, 3, 5]
y = ["a", "b", "c"]
type(y)
## <class 'list'>
print(y)
## ['a', 'b', 'c']

A note of difference

Each element in an R vector must be of the same class. For example, x = c(1, "a") gives a character vector, because character is considered more general than numeric. In Python, however, elements of different types can co-exist in a list: in x = [1, "a"], type(x[0]) is int but type(x[1]) is str. R also has a list class, which can hold elements of different classes; see below.
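The following is a minimal sketch of the Python side of this difference, using only built-ins. The variable names element_types and coerced are illustrative, and the last step imitates R's coercion by hand; Python does not do this on its own.

```python
# A Python list keeps each element's own type.
x = [1, "a"]
element_types = [type(v).__name__ for v in x]
print(element_types)

# Imitating R's c(1, "a") by hand: convert every element to the
# more general type (str). This is an explicit step in Python.
coerced = [str(v) for v in x]
print(coerced)
```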


Subsetting vectors

The biggest difference between R and Python vectors is that R indexes from 1 and Python indexes from 0.

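As a result, the first element is x[1] in R but x[0] in Python. In addition, a Python slice x[i:j] includes position i but excludes position j, whereas an R range i:j includes both ends.

Python lists also do not operate element-wise as R vectors do. To me, this makes Python slower to write when one needs to write code for data analytics.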
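To illustrate the remark about element-wise operations, here is a small sketch using plain Python lists only. The names repeated, doubled and sums are made up for the example.

```python
x = [1, 2, 3, 4, 5]

# `*` on a list repeats the list; it does not multiply each element.
repeated = x * 2
print(repeated)

# Element-wise arithmetic needs an explicit loop or comprehension.
doubled = [v * 2 for v in x]
print(doubled)

# An element-wise operation on two lists: pair them up with zip().
y = [10, 20, 30, 40, 50]
sums = [a + b for a, b in zip(x, y)]
print(sums)
```

In R, the corresponding expressions would simply be x * 2 and x + y.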

R

x = c(1, 2, 3, 4, 5)
x[1]
## [1] 1
x[2]
## [1] 2
x[2:4]
## [1] 2 3 4
x[1:3]
## [1] 1 2 3
## R has no shorthand like Python's x[:3]; write x[1:3]

Extracting multiple elements

x[c(1, 3, 5)]
## [1] 1 3 5

Filtering using logical

x[x > 3]
## [1] 4 5

Python

x = [1, 2, 3, 4, 5]
x[0]
## 1
x[1]
## 2
x[1:4]
## [2, 3, 4]
x[0:3]
## [1, 2, 3]
x[:3] ## Equivalent to x[0:3]
## [1, 2, 3]

Extracting multiple elements

b = [0, 2, 4]

## Using list comprehension
[x[i] for i in b]
## [1, 3, 5]

## Using pandas
import pandas as pd
list(pd.Series(x)[b])
## [1, 3, 5]

Filtering using logical

print(list(filter(lambda y: y > 3, x)))
## [4, 5]
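A list comprehension is a common alternative to filter() for this task. The sketch below also builds an explicit logical mask, which is closer in spirit to R's x > 3. The names kept, mask and masked are illustrative.

```python
x = [1, 2, 3, 4, 5]

# Keep the elements that satisfy the condition (same result as filter()).
kept = [v for v in x if v > 3]
print(kept)

# An explicit logical mask, one True/False per element.
mask = [v > 3 for v in x]
print(mask)

# Apply the mask by pairing each element with its flag.
masked = [v for v, keep in zip(x, mask) if keep]
print(masked)
```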

Negative indices

R

x[-3] ## Returns x without the third element; x itself is unchanged
## [1] 1 2 4 5
x[(length(x) - 1):length(x)]
## [1] 4 5

Python

del x[2] ## Removes the third element in place
print(x)
## [1, 2, 4, 5]
x[-2:] ## Extracting from the second-last element to the end of the list
## [4, 5]
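Note that del modifies the list in place, while a negative index in R returns a new vector. A non-destructive Python version can be sketched with slicing; the names without_third, last and last_two are illustrative.

```python
x = [1, 2, 3, 4, 5]

# Build a new list without the third element; x itself is left untouched.
without_third = x[:2] + x[3:]
print(without_third)
print(x)

# Negative indices count from the end of the list.
last = x[-1]
last_two = x[-2:]
print(last, last_two)
```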

List and tuples

R

A list in R can be very powerful. Each element of a list can be any object of any class.

x = list(
  numeric = 1,
  character = "cats",
  data_frame = data.frame(x = 1:3, 
                          y = 2:4))

print(x)
## $numeric
## [1] 1
## 
## $character
## [1] "cats"
## 
## $data_frame
##   x y
## 1 1 2
## 2 2 3
## 3 3 4
class(x)
## [1] "list"
lapply(x, class)
## $numeric
## [1] "numeric"
## 
## $character
## [1] "character"
## 
## $data_frame
## [1] "data.frame"

Python

As we have seen above, a list in Python behaves more or less like a vector, except that its elements may be of different types.

x = [1, "cats"]

print(x)
## [1, 'cats']
type(x)
## <class 'list'>
print(list(map(type, x)))
## [<class 'int'>, <class 'str'>]

A tuple behaves like a list, but it is immutable.

y = tuple(x)

x[0] = 10
print(x)
## [10, 'cats']
y[0] = 10
## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: 'tuple' object does not support item assignment
## 
## Detailed traceback:
##   File "<string>", line 1, in <module>
print(y)
## (1, 'cats')
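In a plain Python session (outside reticulate), the failed assignment surfaces as a TypeError that can be caught. The sketch below also shows one way to get a "modified" tuple, which is to build a new one; the names message and z are illustrative.

```python
x = [1, "cats"]
y = tuple(x)

# Item assignment on a tuple raises TypeError.
try:
    y[0] = 10
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

# To "change" a tuple, build a new one from the pieces.
z = (10,) + y[1:]
print(z)
print(y)  # the original tuple is unchanged
```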

Consecutive numbers

Python's range() excludes its end point, so the upper bound must be offset by one.

R

1:8
## [1] 1 2 3 4 5 6 7 8

Python

list(range(1, 9))
## [1, 2, 3, 4, 5, 6, 7, 8]
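A short sketch of how the end point behaves in range(), with illustrative variable names. The comparison to R's seq() in the last comment is an analogy only.

```python
start, stop = 1, 8

# range(start, stop) ends before `stop`, so add 1 to include it.
inclusive = list(range(start, stop + 1))
print(inclusive)

# The same rule means range(n) yields exactly n values: 0, 1, ..., n - 1.
first_three = list(range(3))
print(first_three)

# A step argument is also available, similar to R's seq(1, 8, by = 2).
odds = list(range(1, 9, 2))
print(odds)
```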

Variable assignment

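In both languages, = binds a name to a value, and for simple values such as numbers the two behave identically: in the examples below, reassigning y leaves x unchanged. The difference appears with mutable objects. In Python, y = x makes y a second reference to the same object, so modifying a list in place through y also changes what x sees. In R, modifying y produces a modified copy (copy-on-modify) and x is left untouched.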

R

x = 1
y = x
print(y)
## [1] 1
y = 2
print(y)
## [1] 2

Python

x = 1
y = x
print(y)
## 1
y = 2
print(y)
## 2
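With a number, as above, reassigning y has no effect on x in either language. The reference behaviour in Python only becomes visible with a mutable object such as a list. A minimal sketch, using the standard-library copy module and illustrative names:

```python
import copy

x = [1, 2, 3]
y = x             # y is a second name for the same list object
y[0] = 100        # mutating through y ...
print(x)          # ... is visible through x as well
same_object = x is y
print(same_object)

z = copy.copy(x)  # shallow copy: a new list holding the same elements
z[0] = -1         # changing the copy leaves x alone
print(x)
print(z)
```

In R, the equivalent y = x; y[1] = 100 would leave x unchanged.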

Conversion functions

R

In R, the class function shows the class of an object. Almost all functions that convert an object from one class to another take the form as.xyz.

x = 1
print(x)
## [1] 1
class(x)
## [1] "numeric"
x_chr = as.character(x)
print(x_chr) ## Notice the quotes
## [1] "1"
class(x_chr)
## [1] "character"
x_int = as.integer(x_chr)
print(x_int)
## [1] 1
class(x_int)
## [1] "integer"

Python

In Python, the type function shows the class of an object. Conversion is done by calling the target type itself, for example str() or int().

x = 1
print(x)
## 1
type(x)
## <class 'int'>
x_chr = str(x)
print(x_chr)
## 1
type(x_chr)
## <class 'str'>
x_int = int(x_chr)
print(x_int)
## 1
type(x_int)
## <class 'int'>
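One practical difference is what happens when a conversion fails. The sketch below uses only built-ins; the variable names are illustrative, and the comments about R describe its usual behaviour for comparison.

```python
as_float = float("3.5")
as_int = int(as_float)   # truncates toward zero, like R's as.integer(3.5)
print(as_float, as_int)

# A string that is not a number raises ValueError in Python,
# whereas R's as.integer("cats") returns NA with a warning.
try:
    int("cats")
    failed = False
except ValueError:
    failed = True
print(failed)
```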

Named vector and dictionary

Suppose we have matching pairs of information stored in two vectors, and we wish to extract elements of one vector based on the other. This can of course be done through the usual vector manipulations, for loops, etc. However, R and Python each provide a more direct solution to this seemingly simple task.

R

In R, no additional class of object is needed. A vector can be given a vector of names, and the subsetting can be done using the names themselves.

x = c("alpha", "bravo", "charlie")
names(x) = c("a", "b", "c")
x[c("a", "c")]
##         a         c 
##   "alpha" "charlie"

Python

In Python, a very useful object class is dict (dictionary). A dictionary stores key-value pairs. The keys behave much like the names() of a vector in R.

d = {
"a": "alpha",
"b": "bravo",
"c": "charlie"
}
d["a"]
## 'alpha'
[d[i] for i in ["a", "c"]]
## ['alpha', 'charlie']
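When the keys and values start out as two matching lists, as in the R example, the dictionary can be built with zip(). This is a small sketch with illustrative names; the "unknown" default is an arbitrary choice for the example.

```python
keys = ["a", "b", "c"]
values = ["alpha", "bravo", "charlie"]

# Build the dictionary from two matching lists.
d = dict(zip(keys, values))
print(d)

# Look up several keys at once.
subset = [d[k] for k in ["a", "c"]]
print(subset)

# .get() returns a default instead of raising KeyError for a missing key.
missing = d.get("z", "unknown")
print(missing)
```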

Session info

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] reticulate_1.20-9000
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        here_1.0.1        lattice_0.20-44   png_0.1-7        
##  [5] rprojroot_2.0.2   digest_0.6.27     grid_4.1.0        jsonlite_1.7.2   
##  [9] magrittr_2.0.1    evaluate_0.14     stringi_1.6.2     rlang_0.4.11     
## [13] Matrix_1.3-3      rmarkdown_2.8     tools_4.1.0       stringr_1.4.0    
## [17] xfun_0.23         yaml_2.2.1        compiler_4.1.0    htmltools_0.5.1.1
## [21] knitr_1.33
## python:         /home/runner/.virtualenvs/r-reticulate/bin/python
## libpython:      /home/runner/.local/share/r-miniconda/envs/r-reticulate/lib/libpython3.6m.so
## pythonhome:     /home/runner/.virtualenvs/r-reticulate:/home/runner/.virtualenvs/r-reticulate
## version:        3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01)  [GCC 9.3.0]
## numpy:          /home/runner/.virtualenvs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.19.5
## 
## NOTE: Python version was forced by use_python function