What’s new in R 4.4.0? (2024)

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers].

R 4.4.0 (“Puppy Cup”) was released on the 24th April 2024 and it is a beauty. In time-honoured tradition, here we summarise some of the changes that caught our eyes. R 4.4.0 introduces some cool features (one of which is experimental) and makes one of our favourite {rlang} operators available in base R. There are a few things you might need to be aware of regarding handling NULL and complex values.

The full changelog can be found at the r-release ‘NEWS’ page and if you want to keep up to date with developments in base R, have a look at the r-devel ‘NEWS’ page.


A tail-recursive tale

Years ago, before I’d caused my first stack overflow, my Grandad used to tell me a daft tale:

It was on a dark and stormy night,
And the skipper of the yacht said to Antonio,
"Antonio, tell us a tale",
So Antonio started as follows...
It was on a dark and stormy night,
And the skipper of the yacht .... [ad infinitum]

The tale carried on in this way forever. Or at least it would until you were finally asleep.

At around the same age, I was toying with BASIC programming and could knock out classics such as

    >10 PRINT "Ali stinks!"
    >20 GOTO 10

Burn! Infinite burn!

Those were two examples of processes that demonstrate recursion. Antonio’s tale quotes itself recursively, and my older brother will be repeatedly mocked unless someone intervenes.

Recursion is an elegant approach to many programming problems, and it usually takes the form of a function that can call itself. You would use it when you know how to get closer to a solution, but not necessarily how to get directly to that solution. And unlike the un-ending examples above, when we write recursive solutions to computational problems, we include a rule for stopping.

An example from mathematics would be finding zeros for a continuous function. The sine function provides a typical example:

[Figure: plot of the sine function, sin(x)]

We can see that when x = π, there is a zero for sin(x), but the computer doesn’t know that.

One recursive solution to finding the zeros of a function, f(), is the bisection method, which iteratively narrows a range until it finds a point where f(x) is close enough to zero. Here’s a quick implementation of that algorithm. If you need to perform root-finding in R, please don’t use the following function. stats::uniroot() is much more robust…

    bisect = function(f, interval, tolerance, iteration = 1, verbose = FALSE) {
      if (verbose) {
        msg = glue::glue(
          "Iteration {iteration}: Interval [{interval[1]}, {interval[2]}]"
        )
        message(msg)
      }
      # Evaluate 'f' at either end of the interval and return
      # any endpoint where f() is close enough to zero
      lhs = interval[1]; rhs = interval[2]
      f_left = f(lhs); f_right = f(rhs)
      if (abs(f_left) <= tolerance) {
        return(lhs)
      }
      if (abs(f_right) <= tolerance) {
        return(rhs)
      }
      stopifnot(sign(f_left) != sign(f_right))
      # Bisect the interval and rerun the algorithm
      # on the half-interval where y=0 is crossed
      midpoint = (lhs + rhs) / 2
      f_mid = f(midpoint)
      new_interval = if (sign(f_mid) == sign(f_left)) {
        c(midpoint, rhs)
      } else {
        c(lhs, midpoint)
      }
      bisect(f, new_interval, tolerance, iteration + 1, verbose)
    }

We know that π is somewhere between 3 and 4, so we can find the zero of sin(x) as follows:

    bisect(sin, interval = c(3, 4), tolerance = 1e-4, verbose = TRUE)
    #> Iteration 1: Interval [3, 4]
    #> Iteration 2: Interval [3, 3.5]
    #> Iteration 3: Interval [3, 3.25]
    #> Iteration 4: Interval [3.125, 3.25]
    #> Iteration 5: Interval [3.125, 3.1875]
    #> Iteration 6: Interval [3.125, 3.15625]
    #> Iteration 7: Interval [3.140625, 3.15625]
    #> Iteration 8: Interval [3.140625, 3.1484375]
    #> Iteration 9: Interval [3.140625, 3.14453125]
    #> Iteration 10: Interval [3.140625, 3.142578125]
    #> Iteration 11: Interval [3.140625, 3.1416015625]
    #> [1] 3.141602

It takes 11 iterations to get to a point where sin(x) is within 10⁻⁴ of zero. If we tightened the tolerance, had a more complicated function, or had a less precise starting range, it might take many more iterations to approximate a zero.
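For comparison, here is a minimal sketch of the same search using stats::uniroot(), the function recommended above for real root-finding. One difference worth keeping in mind: the `tol` argument of uniroot() is a tolerance on the position of the root, whereas the `tolerance` argument of our bisect() is a tolerance on the value of f(x), so the two results need not agree digit for digit.

```r
# Find a zero of sin(x) in the interval [3, 4] with base R's root finder
result = uniroot(sin, interval = c(3, 4), tol = 1e-4)

result$root    # approximate location of the zero (close to pi)
result$f.root  # value of sin() at that location
result$iter    # number of iterations that were needed
```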

Importantly, this is a recursive algorithm - in the last statement of the bisect() function body, we call bisect() again. The initial call to bisect() (with interval = c(3, 4)) has to wait until the second call to bisect() (interval = c(3, 3.5)) completes before it can return (which in turn has to wait for the third call to return). So we have to wait for 11 calls to bisect() to complete before we get our result.

Those function calls get placed on a computational object named the call stack. For each function call, this stores details about how the function was called and where from. While waiting for the first call to bisect() to complete, the call stack grows to include the details about 11 calls to bisect().

Imagine our algorithm didn’t just take 11 function calls to complete, but thousands, or millions. The call stack would get really full and this would lead to a “stack overflow” error.
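The growth of the call stack can be observed directly with base R's sys.nframe(), which reports how many function calls are currently on the stack. The sketch below is only an illustration; stack_depth() is a hypothetical helper, not part of the original example.

```r
# Recurse n times, then report how deep the call stack is at that point
stack_depth = function(n) {
  if (n == 0) {
    return(sys.nframe())
  }
  stack_depth(n - 1)
}

# Depth with no recursion at all (depends on where the code is run from)
base_depth = stack_depth(0)

# Each recursive call adds one frame, so ten extra calls add ten frames
stack_depth(10) - base_depth
#> [1] 10
```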

We can demonstrate a stack-overflow in R quite easily:

    blow_up = function(n, max_iter) {
      if (n >= max_iter) {
        return("Finished!")
      }
      blow_up(n + 1, max_iter)
    }

The recursive function behaves nicely when we only use a small number of iterations:

    blow_up(1, max_iter = 100)
    #> [1] "Finished!"

But the call stack gets too large and the function fails when we attempt to use too many iterations. Note that R raises an error about the size of the call stack before we actually reach its limit, so the R process can continue after exploding the call stack.

    blow_up(1, max_iter = 1000000)
    # Error: C stack usage 7969652 is too close to the limit
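One conventional way to sidestep this problem, in any version of R, is to replace the recursion with a loop, so that only a single function call sits on the stack. A minimal sketch follows; count_up() is a hypothetical name used here for illustration.

```r
# Loop-based equivalent of blow_up(): the counter is updated in place,
# so the call stack never grows
count_up = function(n, max_iter) {
  while (n < max_iter) {
    n = n + 1
  }
  "Finished!"
}

count_up(1, max_iter = 1000000)
#> [1] "Finished!"
```

The cost is that the code no longer mirrors the recursive description of the problem, which is where tail calls come in.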

In R 4.4, we are getting (experimental) support for tail-call recursion. This allows us (in many situations) to write recursive functions that won’t explode the size of the call stack.

How can that work? In our bisect() example, we still need to make 11 calls to bisect() to get a result that is close enough to zero, and those 11 calls will still need to be put on the call stack.

Remember the first call to bisect()? It called bisect() as the very last statement in its function body. So the value returned by the second call to bisect() was returned to the user without modification by the first call. So we could return the second call’s value directly to the user, instead of returning it via the first bisect() call; indeed, we could remove the first call to bisect() from the call stack and put the second call in its place. This would prevent the call stack from expanding with recursive calls.

The key to this (in R) is to use the new Tailcall() function. That tells R “you can remove me from the call stack, and put this cat on instead”. Our final line in bisect() should look like this:

    bisect = function(...) {
      ... snip ...
      Tailcall(bisect, f, new_interval, tolerance, iteration + 1, verbose)
    }

Note that you are passing the name of the recursively-called function into Tailcall(), rather than a call to that function (bisect rather than bisect(...)).

To illustrate that the stack no longer blows up when tail-call recursion is used, let’s rewrite our blow_up() function:

    # R 4.4.0
    blow_up = function(n, max_iter) {
      if (n >= max_iter) {
        return("Finished!")
      }
      Tailcall(blow_up, n + 1, max_iter)
    }

We can still successfully use a small number of iterations:

    blow_up(1, 100)
    #> [1] "Finished!"

But now, even a million iterations of the recursive function can be performed:

    blow_up(1, 1000000)
    #> [1] "Finished!"

Note that the tail-call optimisation only works here because the recursive call was made as the very last step in the function body. If your function needs to modify the value after the recursive call, you may not be able to use Tailcall().
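When the result of the recursive call does need further work (for example, `n + sum_to(n - 1)`), one common workaround is to carry the partial result along in an extra argument, so that the recursive call becomes the last step again. The sketch below assumes R >= 4.4.0; sum_to() and its `acc` argument are illustrative names, not something from the R documentation.

```r
# Requires R >= 4.4.0, where the experimental Tailcall() is available.
# Sums the integers 1..n, carrying the running total in 'acc'.
sum_to = function(n, acc = 0) {
  # Evaluate the running total now, so that lazily evaluated
  # arguments don't pile up across the recursive calls
  force(acc)
  if (n == 0) {
    return(acc)
  }
  # Nothing is left to do after this call, so it is a genuine tail call
  Tailcall(sum_to, n - 1, acc + n)
}

# 1 + 2 + ... + 100000 = 5000050000
sum_to(100000)
```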

Rejecting the NULL

Missing values are everywhere.

In a typical dataset you might have missing values encoded as NA (if you’re lucky) and invalid numbers encoded as NaN, you might have implicitly missing rows (for example, a specific date missing from a time series) or factor levels that aren’t present in your table. You might even have empty vectors, or data-frames with no rows, to contend with. When you write functions and data-science workflows where the input data may change over time, programming defensively and handling these kinds of edge cases means your code will throw up fewer surprises in the long run. You don’t want a critical report to fail because a mathematical function you wrote couldn’t handle a missing value.

When programming defensively with R, there is another important form of missingness to be cautious of …

The NULL object.

NULL is an actual object. You can assign it to a variable, combine it with other values, index into it, pass it into (and return it from) a function. You can also test whether a value is NULL.
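As a small illustration of that defensive style, here is a sketch of a summary function that copes with NA, NaN and empty input. safe_mean() is a hypothetical helper, and returning NA for empty input is just one reasonable choice.

```r
# A mean that tolerates missing values and empty input
safe_mean = function(x) {
  # is.na() is TRUE for both NA and NaN, so this drops both
  x = x[!is.na(x)]
  # With nothing left to average, return NA rather than NaN
  if (length(x) == 0) {
    return(NA_real_)
  }
  mean(x)
}

safe_mean(c(1, NA, 3, NaN))
#> [1] 2
safe_mean(numeric(0))
#> [1] NA
```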

    # Assignment
    my_null = NULL
    my_null
    #> NULL

    # Use in functions
    my_null[1]
    #> NULL
    c(NULL, 123)
    #> [1] 123
    c(NULL, NULL)
    #> NULL
    toupper(NULL)
    #> character(0)

    # Testing NULL-ness
    is.null(my_null)
    #> [1] TRUE
    is.null(1)
    #> [1] FALSE
    identical(my_null, NULL)
    #> [1] TRUE

    # Note that the equality operator shouldn't be used to
    # test NULL-ness:
    NULL == NULL
    #> logical(0)

R functions that are solely called for their side-effects (write.csv() or message(), for example) often return a NULL value. Other functions may return NULL as a valid value - one intended for subsequent use. For example, list-indexing (which is a function call, under the surface) will return NULL if you attempt to access an undefined value:

    config = list(user = "Russ")

    # When the index is present, the associated value is returned
    config$user
    #> [1] "Russ"

    # But when the index is absent, a `NULL` is returned
    config$url
    #> NULL

Similarly, you can end up with a NULL output from an incomplete stack of if / else clauses:

    language = "Polish"
    greeting = if (language == "English") {
      "Hello"
    } else if (language == "Hawaiian") {
      "Aloha"
    }
    greeting
    #> NULL
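If a NULL result is not wanted here, one simple way to rule it out is to finish the chain with a plain else clause. In the sketch below, falling back to the English greeting is an arbitrary choice made for illustration; raising an error with stop() would be an equally valid fallback.

```r
language = "Polish"
greeting = if (language == "English") {
  "Hello"
} else if (language == "Hawaiian") {
  "Aloha"
} else {
  # Fallback for any language that isn't handled above
  "Hello"
}
greeting
#> [1] "Hello"
```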

A common use for NULL is as a default argument in a function signature. A NULL default is often used for parameters that aren’t critical to function evaluation. For example, the function signature for matrix() is as follows:

    matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

The dimnames parameter isn’t really needed to create a matrix, but when a non-NULL value for dimnames is provided, the values are used to label the row and column names of the created matrix.

    matrix(1:4, nrow = 2)
    #>      [,1] [,2]
    #> [1,]    1    3
    #> [2,]    2    4
    matrix(1:4, nrow = 2, dimnames = list(c("2023", "2024"), c("Jan", "Feb")))
    #>      Jan Feb
    #> 2023   1   3
    #> 2024   2   4
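The same pattern works in your own functions: give the optional argument a NULL default and branch on is.null(). The function below is a hypothetical example written for this purpose, not something from base R.

```r
# 'label' is optional: by default the summary is returned unlabelled
describe = function(x, label = NULL) {
  summary_text = paste("mean =", mean(x))
  if (is.null(label)) {
    return(summary_text)
  }
  paste0(label, ": ", summary_text)
}

describe(c(1, 2, 3))
#> [1] "mean = 2"
describe(c(1, 2, 3), label = "Jan")
#> [1] "Jan: mean = 2"
```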

R 4.4 introduces the %||% operator to help when handling variables that are potentially NULL. When working with variables that could be NULL, you might have written code like this:

    # Remember there is no 'url' field in our `config` list
    # Set a default value for the 'url' if one isn't defined in
    # the config
    my_url = if (is.null(config$url)) {
      "https://www.jumpingrivers.com/blog/"
    } else {
      config$url
    }
    my_url
    #> [1] "https://www.jumpingrivers.com/blog/"

Assuming config is a list:

  • when the url entry is absent from config (or is itself NULL), then config$url will be NULL and the variable my_url will be set to the default value;
  • but when the url entry is found within config (and isn’t NULL) then that value will be stored in my_url.

That code can now be rewritten as follows:

    # R 4.4.0
    my_url = config$url %||% "https://www.jumpingrivers.com/blog"
    my_url
    #> [1] "https://www.jumpingrivers.com/blog"

Note that the left-hand value must evaluate to NULL for the right-hand side to be evaluated, and that empty vectors aren’t NULL:

    # R 4.4.0
    NULL %||% 1
    #> [1] 1
    c() %||% 1
    #> [1] 1
    numeric(0) %||% 1
    #> numeric(0)

This operator has been available in the {rlang} package for eight years and is implemented in exactly the same way. So if you have been using %||% in your code already, the base-R version of this operator should work without any problems, though you may want to wait until you are certain all your users are using R >= 4.4 before switching from {rlang} to the base-R version of %||%.
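If your code has to run on both older and newer versions of R and you would rather not depend on {rlang}, one option is to define the operator yourself only when base R doesn’t provide it. This is a sketch based on the behaviour described above (return the right-hand side only when the left-hand side is NULL), not a copy of the base R source.

```r
# Define %||% only when base R doesn't already have it (R < 4.4.0)
if (!exists("%||%", envir = baseenv(), inherits = FALSE)) {
  `%||%` = function(x, y) if (is.null(x)) y else x
}

config = list(user = "Russ")

# 'url' is absent from the list, so the default is used
config$url %||% "https://www.jumpingrivers.com/blog/"
#> [1] "https://www.jumpingrivers.com/blog/"

# 'user' is present, so the default is ignored
config$user %||% "anonymous"
#> [1] "Russ"
```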

Any other business

A shorthand hexadecimal format (common in web-programming) for specifying RGB colours has been introduced. So, rather than writing the 6-digit hexcode for a colour “#112233”, you can use “#123”. This only works for those 6-digit hexcodes where the digits are repeated in pairs.

Parsing and formatting of complex numbers has been improved. For example, as.complex("1i") now returns the complex number 0 + 1i; previously it returned NA.

There are a few other changes related to handling NULL that have been introduced in R 4.4. The changes highlight that NULL is quite different from an empty vector. Empty vectors contain nothing, whereas NULL represents nothing. For example, whereas an empty numeric vector is considered to be an atomic (unnestable) data structure, NULL is no longer atomic. Also, NCOL(NULL) (the number of columns in a matrix formed from NULL) is now 0, whereas it was formerly 1.

sort_by() is a new function for sorting objects based on values in a separate object. This can be used to sort a data.frame based on its columns (they should be specified as a formula):
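To make the relationship between the two forms concrete, here is a sketch of the expansion written with a regular expression, which also lets code that must run on older versions of R accept the shorthand. expand_hex() is a hypothetical helper, not part of base R.

```r
# Expand a 3-digit hex colour by doubling each digit;
# anything that isn't in 3-digit form is returned unchanged
expand_hex = function(colour) {
  sub(
    "^#([0-9A-Fa-f])([0-9A-Fa-f])([0-9A-Fa-f])$",
    "#\\1\\1\\2\\2\\3\\3",
    colour
  )
}

expand_hex("#123")
#> [1] "#112233"
expand_hex("#112233")
#> [1] "#112233"

# The expanded form is accepted by colour functions in any version of R
grDevices::col2rgb(expand_hex("#123"))
```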
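The distinction can be checked interactively. In the sketch below, the comments on the last two lines of each pair describe the behaviour under R 4.4.0 as summarised above; older versions of R give the former results.

```r
empty = numeric(0)

# Both have length zero ...
length(empty)
length(NULL)

# ... but only one of them is NULL
is.null(empty)   # FALSE
is.null(NULL)    # TRUE

# An empty numeric vector is atomic
is.atomic(empty) # TRUE
is.atomic(NULL)  # FALSE from R 4.4.0 (formerly TRUE)

# Column counts
NCOL(empty)      # 1
NCOL(NULL)       # 0 from R 4.4.0 (formerly 1)
```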

    mtcars |> sort_by(~ list(cyl, mpg)) |> head()
    ##                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    ## Volvo 142E    21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    ## Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    ## Datsun 710    22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    ## Merc 230      22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    ## Merc 240D     24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    ## Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
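For readers who are not yet on R 4.4, roughly the same ordering can be produced with order(), which is available in every version of R. This is a sketch of an equivalent approach, not a description of how sort_by() is implemented.

```r
# Sort mtcars by number of cylinders, then by miles per gallon
sorted = mtcars[order(mtcars$cyl, mtcars$mpg), ]
head(sorted)
```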

Try the latest version out for yourself

To take away the pain of installing the latest development version of R, you can use Docker. To use the devel version of R, you can use the following commands:

    docker pull rstudio/r-base:devel-jammy
    docker run --rm -it rstudio/r-base:devel-jammy

Once R 4.4 is the released version of R and the r-docker repository has been updated, you should use the following command to test out R 4.4.

    docker pull rstudio/r-base:4.4-jammy
    docker run --rm -it rstudio/r-base:4.4-jammy
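Once an R session is running inside the container, a quick check like the one below confirms that the new features are available. The variable names are arbitrary.

```r
# TRUE when the running version is at least 4.4.0
is_r44 = getRversion() >= "4.4.0"
is_r44

# TRUE when the experimental Tailcall() function exists in base R
has_tailcall = exists("Tailcall", envir = baseenv(), inherits = FALSE)
has_tailcall
```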

See also

The R 4.x versions have introduced a wealth of interesting changes. These have been summarised in our earlier blog posts:

For updates and revisions to this article, see the original post


