An introduction to working with R and Python (2024)

This article is an introduction to working with R from within Python.

When I was a university student, the statistics courses (Survival Analysis, Multivariate Analysis, etc.) were taught in R. Nevertheless, when I wanted to learn Data Science, I chose Python because it seemed “spooky” to me.

Working only with Python, I stumbled upon the need to implement statistical techniques such as Grubbs' test for outliers, Markov Chain Monte Carlo for simulations, or Bayesian Networks for synthetic data. This article is therefore an introductory guide to incorporating R into your workflow as a Python Data Scientist. If you would rather integrate Python into your workflow as an R Data Scientist, the reticulate package is useful; see [1].

We choose the rpy2 framework because it runs an embedded R; other options are pyRserve and pypeR. In other words, Python and R objects communicate through rpy2.robjects; we’ll see a concrete example later, when converting a pandas DataFrame to an R data frame. If you get stuck in any of the steps below, read the official documentation [3] or the other references.

We’ll cover three steps needed to start working with R within Python, then work through a practical example and look at further functionality that the rpy2 package offers.

  1. Install R packages.
  2. Importing packages and functions from R.
  3. Converting pandas DataFrame to R data frame and vice-versa.
  4. Practical example (Running a Bayesian Network).

But first, we should install the rpy2 package.

# Jupyter Notebook option
!pip install rpy2
# Terminal option
pip install rpy2

1. Install R packages

In R, installing packages is performed by downloading them from CRAN mirrors and then installing them locally. In a similar way to Python modules, the packages can be installed and then loaded.

# Choosing a CRAN Mirror
import rpy2.robjects.packages as rpackages
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
# Installing required packages
from rpy2.robjects.vectors import StrVector
packages = ('bnlearn',)  # add any other desired packages to this tuple
utils.install_packages(StrVector(packages))

Selecting ind=1 in chooseCRANmirror ensures an automatic redirection to the server nearest to our location.
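
Reinstalling packages on every run is slow, so one option is to install only what is missing. The following is a minimal sketch, assuming rpy2's rpackages.isinstalled helper is available in your version; select_missing and install_missing are hypothetical names introduced here for illustration.

```python
def select_missing(wanted, is_installed):
    """Return the names in `wanted` for which `is_installed(name)` is false."""
    return [name for name in wanted if not is_installed(name)]


def install_missing(wanted):
    """Install only the R packages that are not yet available locally."""
    # Imported lazily so that select_missing can be used without an R installation.
    import rpy2.robjects.packages as rpackages
    from rpy2.robjects.vectors import StrVector

    missing = select_missing(wanted, rpackages.isinstalled)
    if missing:
        utils = rpackages.importr('utils')
        utils.chooseCRANmirror(ind=1)
        utils.install_packages(StrVector(missing))
    return missing
```

Calling install_missing(('bnlearn',)) would then download bnlearn only when it is not already installed, and return the list of packages it had to install.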

2. Importing packages and functions

Here, we’re going to import the libraries and functions required to build a Bayesian Network in the practical example.

# Import packages
from rpy2.robjects.packages import importr
base, bnlearn = importr('base'), importr('bnlearn')
# Import Functions
bn_fit, rbn = bnlearn.bn_fit, bnlearn.rbn
hpc, rsmax2, tabu = bnlearn.hpc, bnlearn.rsmax2, bnlearn.tabu

To find out which functions can be imported, it is convenient to inspect the ‘_rpy2r’ key in the dictionary of the package. For example, to see the functions available in bnlearn we run:

bnlearn.__dict__['_rpy2r']

Output:
...
...
'bn_boot': 'bn.boot',
'bn_cv': 'bn.cv',
'bn_cv_algorithm': 'bn.cv.algorithm',
'bn_cv_structure': 'bn.cv.structure',
'bn_fit': 'bn.fit',
'bn_fit_backend': 'bn.fit.backend',
'bn_fit_backend_continuous': 'bn.fit.backend.continuous',
...
...

For more information on how to import functions, check out [4] or [5].
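
The mapping in the output above follows a simple pattern: R names that contain dots are exposed under Python-friendly names with underscores. Here is a rough sketch of that pattern in plain Python. to_python_name is a hypothetical helper, not part of rpy2, and rpy2's real translation handles more cases (name clashes, for instance).

```python
def to_python_name(r_name):
    """Approximate how an R symbol such as 'bn.fit' appears as a Python attribute."""
    return r_name.replace('.', '_')


# A few R names taken from the bnlearn output shown in this section
r_names = ['bn.boot', 'bn.cv', 'bn.fit', 'bn.fit.backend']
mapping = {to_python_name(name): name for name in r_names}

for py_name, r_name in mapping.items():
    print(f"{py_name!r}: {r_name!r}")
```

This is why the R function bn.fit is imported as bnlearn.bn_fit in the code above.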

3. Converting pandas DataFrame to R data frame and vice-versa

Personally, I think this functionality is what lets you combine scalability (Python) with statistical tools (R). As a personal example, while I was using the multiprocessing Python library for parallel computation, I also wanted to try the auto.arima() function from the forecast R library for forecasting, alongside the functions of the statsmodels Python package. The robjects.conversion module is what lets you merge the best of the two programming languages.

# Allow conversion
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()
# Convert to R dataframe
r_dt = ro.conversion.py2rpy(dt) # dt is a pd.DataFrame object
# Convert back to pandas DataFrame
pd_dt = ro.conversion.rpy2py(r_dt)

When the pandas conversion is activated (pandas2ri.activate()), many conversions between R and pandas objects are done automatically. For explicit conversion, we call the py2rpy or rpy2py functions.

4. Practical example with a Bayesian Network

Besides Monte Carlo methods, Bayesian Networks are an option for simulating data. However, as of this writing I found no Python library suited to this task. So I opted for the bnlearn package, which lets you learn the graphical structure of Bayesian networks and perform inference from them.

In the example below, we use a hybrid algorithm (rsmax2) to learn the structure of the network because it allows any combination of constraint-based and score-based algorithms. Depending on the nature of the problem, however, you should choose the right heuristic; for the complete list of available algorithms see [7]. Once the network is learned, we simulate n random samples from the Bayesian network with the rbn function. Finally, we wrap the calls in a try-except structure to handle a particular type of error.

import rpy2.robjects as ro
from rpy2.rinterface_lib.embedded import RRuntimeError

# Convert the pandas DataFrame (imputados) to an R data frame
r_imputados = ro.conversion.py2rpy(imputados)

try:
    # Learn structure of Network
    structure = rsmax2(r_imputados, restrict='hpc', maximize='tabu')

    # Fit the network parameters by maximum likelihood
    fitted = bn_fit(structure, data=r_imputados, method="mle")

    # Generate n number of observations
    r_sim = rbn(fitted, n=10)

except RRuntimeError:
    print("Error while running R methods")

An RRuntimeError is raised when an R call fails while running. We catch it specifically so that the user is told the failure came from R and not from some other kind of error (for Python's built-in exceptions, see [9]). As an illustration, I got an error about hybrid.pc.filter not being found while running the rsmax2 function.

Further Functionalities

There is much more you can do with the rpy2 low-level and high-level interfaces. For instance, you can call Python functions from R. Let’s see how to find the minimum of the four-dimensional Colville function with the conjugate gradient method.

from rpy2.robjects.vectors import FloatVector
from rpy2.robjects.packages import importr
import rpy2.rinterface as ri
stats = importr('stats')

# Colville f: R^4 ---> R
def Colville(x):
    x1, x2, x3, x4 = x[0], x[1], x[2], x[3]

    return 100*(x1**2-x2)**2 + (x1-1)**2+(x3-1)**2 + 90*(x3**2-x4)**2 + 10.1*((x2-1)**2 + (x4-1)**2) + 19.8*(x2-1)*(x4-1)

# Expose function to R
Colville = ri.rternalize(Colville)

# Initial point
init_point = FloatVector((3, 3, 3, 3))

# Optimization Function
res = stats.optim(init_point, Colville, method="CG")
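
Before handing the function to R, it can help to sanity-check it in plain Python. The sketch below re-declares the function under the hypothetical name colville_py (so it does not clash with the rternalized Colville) and evaluates it at the initial point and at (1, 1, 1, 1), where the Colville function attains its global minimum of 0.

```python
def colville_py(x):
    """Plain-Python Colville function, f: R^4 -> R."""
    x1, x2, x3, x4 = x
    return (100 * (x1**2 - x2)**2 + (x1 - 1)**2 + (x3 - 1)**2
            + 90 * (x3**2 - x4)**2
            + 10.1 * ((x2 - 1)**2 + (x4 - 1)**2)
            + 19.8 * (x2 - 1) * (x4 - 1))


print(colville_py((3, 3, 3, 3)))  # value at the initial point
print(colville_py((1, 1, 1, 1)))  # value at the known minimiser
```

If the optimisation converges, the parameters returned by optim (which should be readable with res.rx2('par'), assuming res is the R list that optim returns) would be expected to lie close to that point.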

References:

[1] Matt Brown. “Run Python code from R using the reticulate package”. R-pubs. URL: https://rstudio-pubs-static.s3.amazonaws.com/407460_396f867ce3494d479fd700960879e22c.html

[2] Ajay Ohri. “Using Python and R together: 3 main approaches”. KDnuggets. URL: https://www.kdnuggets.com/2015/12/using-python-r-together.html

[3] Rpy2 official documentation. URL: https://rpy2.github.io/doc/latest/html/index.html

[4] https://stackoverflow.com/questions/59462337/importing-any-function-from-an-r-package-into-python/59462338#59462338

[5] https://stackoverflow.com/questions/49776568/calling-functions-from-within-r-packages-in-python-using-importr

[6] https://stackoverflow.com/questions/47306899/how-do-i-catch-an-rpy2-rinterface-rruntimeerror-in-python

[7] Bnlearn Official Documentation. http://www.bnlearn.com/documentation/man/structure.learning.html

[8] Daniel Oehm. “Bayesian Network Example with the bnlearn Package”. URL: http://gradientdescending.com/bayesian-network-example-with-the-bnlearn-package/

[9] Python 3.8 Built-In Exceptions. URL: https://docs.python.org/3.8/library/exceptions.html#RuntimeError

[10] Robert, Christian; Casella, George. Introducing Monte Carlo Methods with R. Springer. 2010

[11] Nagarajan, Radhakrishnan; Scutari, Marco; Lèbre, Sophie. Bayesian Networks in R. Springer. 2013

Author: Trent Wehner