
Etienne Bacher

@etiennebacher.bsky.social

PhD in economics from LISER, Luxembourg, now looking for research software engineer or data science positions. Mostly here to talk about #rstats https://github.com/etiennebacher

199 Followers  |  68 Following  |  126 Posts  |  Joined: 14.12.2024

Posts by Etienne Bacher (@etiennebacher.bsky.social)

A screenshot with three panels. The two top panels show two R files where an R function "expr_uses_col_from_dots" is defined (therefore two functions are given the same name). The bottom panel shows the output of "jarl check . --select duplicated_function_definition", highlighting one of the "expr_uses_col_from_dots" with the message:

`expr_uses_col_from_dots` is defined more than once in this package.
help: Other definition at R/utils-expr.R:885:1


This will be available in the next version of Jarl (0.5.0):
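Why this matters: in R, a later assignment to the same name silently replaces the earlier one, so only one of the two definitions is ever used. A minimal sketch in plain R, roughly mimicking two package files being loaded one after the other into the same environment (the function name `helper` and the file names are hypothetical):

```r
# A shared environment standing in for the package namespace.
env <- new.env()

# Contents of a first (hypothetical) file, e.g. R/utils-a.R
eval(quote(helper <- function(x) x + 1), envir = env)

# Contents of a second (hypothetical) file, e.g. R/utils-b.R, reusing the name
eval(quote(helper <- function(x) x * 10), envir = env)

# Only the last definition survives; the first one is silently discarded.
env$helper(2)
```

No error or warning is raised here, which is why a dedicated check is useful.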

23.02.2026 21:14

Alt+enter

23.02.2026 18:37

Yes, it's a complement, not a replacement. They don't do the same thing.

18.02.2026 13:10
Jarl – jarl

And a bit of self promotion 😄: if you use R, you could try out my new linter Jarl

jarl.etiennebacher.com

18.02.2026 07:29

A linter often plays well with a formatter, whose job is to automatically format the code to match some rules in terms of spacing, indentation, etc.

18.02.2026 07:29

A linter could also detect correctness issues in the code, for example to find code that can never run because it comes after a return() in a function.

Another example is detecting code that we know will error if it runs.
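A minimal illustration of the first case in plain R (the function name `double_it` is hypothetical): `return()` exits the function immediately, so the line after it is dead code that never runs.

```r
double_it <- function(x) {
  return(x * 2)
  x <- x + 1  # unreachable: never evaluated because return() exits first
}

double_it(3)
```

R itself raises no warning here; the extra line simply never executes.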

18.02.2026 07:29

A linter checks for several patterns of code that could / should be fixed to improve it.

For instance, some code might produce the correct output but could be slow or hard to read because it doesn't use the most appropriate function for the job. A linter could detect that and recommend a fix.
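For instance, Jarl has rules named `any_is_na` and `lengths`, which appear to target patterns like `any(is.na(x))` and `sapply(x, length)` (an assumption based on the rule names and on the lintr linters of the same names). This sketch only checks, in plain R, that the dedicated base functions give the same result; it is not Jarl's fix logic.

```r
x <- c(1, NA, 3)
lst <- list(1:3, letters, NULL)

# Verbose pattern vs. the dedicated base R function: same result
stopifnot(identical(any(is.na(x)), anyNA(x)))
stopifnot(identical(sapply(lst, length), lengths(lst)))

anyNA(x)      # TRUE
lengths(lst)  # 3 26 0
```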

18.02.2026 07:29

Thank you, it's always nice to read that :)

13.02.2026 23:17

Well deserved, congrats!

12.02.2026 17:11
Changelog

#rstats tidypolars 0.17.0 is available!

tidypolars provides the tidyverse syntax while using polars for better perf.

In this release:

- support new functions from dplyr 1.2.0 (filter_out, when_any...)
- pivot_wider with lazyframe
- bug fixes

and more

News: tidypolars.etiennebacher.com/news/

12.02.2026 12:57
Enable "Fix on save" in editors · Issue #160 · etiennebacher/jarl
Ruff has a code action "Fix all" that can run on save: https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff

Not for now. This is something that may be implemented later, but I have some concerns, see github.com/etiennebache...

05.02.2026 15:29

Great timing, I had already planned to release today ^^

05.02.2026 15:06
This screenshot shows the terminal after running `jarl check . --statistics`:

> jarl check . --statistics
   92 [ ] true_false_symbol
   30 [ ] implicit_assignment
    9 [*] numeric_leading_zero
    8 [ ] duplicated_arguments
    3 [*] seq
    3 [*] lengths
    2 [ ] unreachable_code
    2 [*] outer_negation
    1 [*] any_is_na
    1 [*] class_equals
    1 [*] length_levels

Rules with `[*]` have an automatic fix.


To avoid filling the terminal with tons of diagnostics, there is now a command-line option `--statistics` to quickly show the summary of diagnostics reported by Jarl.
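The rules marked `[*]` in the summary above come with an automatic fix. Below is a sketch of plausible before/after pairs, assuming these rules behave like the lintr linters with similar names; the exact rewrites Jarl applies may differ. Each pair is checked to give the same result on this toy input.

```r
x <- c(2, 5, 8)
f <- factor(c("a", "b", "a"))
df <- data.frame(a = 1)
flags <- c(FALSE, TRUE)

# numeric_leading_zero: .5 -> 0.5 (same value, easier to read)
stopifnot(identical(.5, 0.5))

# seq: 1:length(x) -> seq_along(x)
stopifnot(identical(1:length(x), seq_along(x)))

# class_equals: class(df) == "data.frame" -> inherits(df, "data.frame")
stopifnot(class(df) == "data.frame", inherits(df, "data.frame"))

# length_levels: length(levels(f)) -> nlevels(f)
stopifnot(identical(length(levels(f)), nlevels(f)))

# outer_negation: all(!flags) -> !any(flags), any(!flags) -> !all(flags)
stopifnot(identical(all(!flags), !any(flags)))
stopifnot(identical(any(!flags), !all(flags)))

# Why seq_along() is safer: with an empty vector, 1:length() counts down
empty <- numeric(0)
1:length(empty)   # 1 0
seq_along(empty)  # integer(0)
```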

(not trying to throw shade on dplyr of course, just an example ;-) )

05.02.2026 13:30
This screenshot is in two parts. 

1. the left part shows a code example where some lines should be reported by Jarl but are ignored because they have a `# jarl-ignore` comment

# The comment below only applies to `any(is.na(x1))`.
# jarl-ignore any_is_na: <reason>
any(is.na(x1))
any(is.na(x2))

# The comment below applies to the entire function definition, including the
# two `any(is.na(...))` calls.
# jarl-ignore any_is_na: <reason>
f <- function(x1, x2) {
  any(is.na(x1))
  any(is.na(x2))
}

2. the right part shows the terminal, where Jarl only reports the line that doesn't have this special comment.


0.4.0 brings a new system for suppression comments. Suppression comments allow you to ignore diagnostics on specific pieces of code. Jarl used to have some (brittle) compatibility with `lintr` comments "# nolint".

This is not the case anymore and Jarl only supports "# jarl-ignore" comments.

05.02.2026 13:30
This screenshot is in two parts:

1. the left part shows a function where a print() statement is unreachable because it comes after an if/else where all branches return early or error.

f <- function(x) {
  if (x > 5) {
    return("greater than five")
  } else if (x < 5) {
    return("lower than five")
  } else {
    stop("x must be greater or lower than five")
  }
  print("end of function")
}

2. the right part shows the output of `jarl check test.R` in the terminal, highlighting that this code is unreachable:

warning: unreachable_code
 --> _posts/2026-02-03-jarl-0.4.0/test.R:9:3
  |
9 |   print("end of function")
  |   ------------------------ This code is unreachable because the preceding if/else terminates in all branches.
  |

Found 1 error.


This is very similar to the first image, but here a line of code is unreachable because it comes after a `next` in a for loop:

f <- function(x) {
  for (i in names(x)) {
    if (i == "foo") {
      next
      print("Found name 'foo', skipping")
    }
    print(toupper(i))
  }
}


Jarl is now able to find unreachable code, meaning code that will never run because it's after a stop(), a return(), or a `next` in a `for` loop for example.

This can also happen if the code comes after an `if` statement where all branches return early or error, and Jarl can reliably detect that.
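For the `next` example in the second screenshot, one possible fix (a sketch, not taken from the original post) is to print the message before skipping, so that it actually runs:

```r
f <- function(x) {
  for (i in names(x)) {
    if (i == "foo") {
      # Moved before `next`, so this line is now reachable
      print("Found name 'foo', skipping")
      next
    }
    print(toupper(i))
  }
}

f(list(foo = 1, bar = 2))
```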

05.02.2026 13:30
Etienne Bacher: Jarl 0.4.0
Find unreachable code, ignore diagnostics, show summary statistics of diagnostics, and more.

#rstats I'm very happy to announce Jarl 0.4.0!

Jarl is a very fast R linter, written in Rust. This release brings lots of improvements and fixes.

See the blog post: www.etiennebacher.com/posts/2026-0...

And the full changelog: jarl.etiennebacher.com/changelog

🧵 to highlight some features below

05.02.2026 13:30
A screenshot with two parts: 

1. the left side shows code with a nested loop:

for (x in names(mtcars)) {
  x <- substr(x, 1, 3)
  for (x in 1:3) {
    print(x)
  }
}

2. the right part shows the output of `jarl check test.R` in the terminal:

warning: for_loop_dup_index
 --> test.R:3:8
  |
3 |   for (x in 1:3) {
  |        -------- This index variable is already used in a parent `for` loop.
  |
  = help: Rename this index variable to avoid unexpected results.

Found 1 error.


It will make it into the update:
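Why reusing a loop index is risky in R: the outer loop still walks through its own sequence, but within each outer iteration the inner loop overwrites the shared variable, so code after the inner loop sees the inner index instead. A small sketch (the vectors `res` and `res2` are made up for illustration):

```r
res <- character()
for (x in c("a", "b")) {
  for (x in 1:2) {}                # inner loop reuses `x`
  res <- c(res, as.character(x))   # `x` is now 2, not "a" or "b"
}
res   # "2" "2"

# Renaming the inner index keeps the outer value intact
res2 <- character()
for (x in c("a", "b")) {
  for (j in 1:2) {}
  res2 <- c(res2, x)
}
res2  # "a" "b"
```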

04.02.2026 19:47

That's a good idea, I'll try to include it in Jarl before I release 0.4.0

04.02.2026 18:26

No idea, I've never used it. Maybe @gmcd.bsky.social can answer that

01.02.2026 00:24
GitHub - grantmcdermott/dbreg: Fast regressions on database backends
Fast regressions on database backends. Contribute to grantmcdermott/dbreg development by creating an account on GitHub.

You might be interested in dbreg, it looks like there's an overlap in functionalities: github.com/grantmcdermo...

01.02.2026 00:17

You can see some benchmarks here, with the usual caveat that benchmarks never capture all use cases:
duckdblabs.github.io/db-benchmark/

I would just suggest giving both polars and duckdb a try.

23.01.2026 07:01

Regarding polars vs duckdb, both are super performant. The one that suits your needs best will likely depend on the use case. I like the data frame interface of polars in both R and Python since I'm not super familiar with writing SQL.

23.01.2026 07:01

Parquet is a file format while polars and duckdb are data processing libraries, so there's no comparison to be made between parquet and polars. Polars is very, very fast at processing parquet files.

23.01.2026 07:01
Changelog

#rstats tidypolars 0.16.0 is available!

tidypolars provides the tidyverse syntax while using polars for better perf.

This release:
- support for unnest and separate functions (tidyr)
- new interface to export partitioned output
- and more

News: www.tidypolars.etiennebacher.com/news/#tidypo...

22.01.2026 18:46
Disable cutesy, encouraging messages? · Issue #804 · r-lib/testthat
Is there a way to disable the cute messages when tests fail? (a la "No one is perfect" et al.?) It can get to be cumbersome during repeated unit tests. > test_file('myscript.R') ✔ | OK F W S | Cont...

This is the only thing I found in the issue tracker: github.com/r-lib/testth...

20.01.2026 23:02
Handling large data with R and Python

#rstats I gave a workshop a few days ago on handling large data (think tens to hundreds of millions of rows) with Polars in Python and in R.

Here are my introductory slides: brussels-large-data-r-python.etiennebacher.com

And the associated repo: github.com/etiennebache...

20.01.2026 09:07
changelog – jarl

#rstats Jarl 0.3.0 is available!

Jarl is a very fast R linter, able to check and fix thousands of lines in milliseconds.

New since 0.2.0:
- 6 new rules
- ignore automatically generated files by default
- bug fixes and perf improvements

All changes: jarl.etiennebacher.com/changelog

17.12.2025 21:58

Take a look at the R packages fuzzyjoin and zoomerjoin (zoomerjoin is extremely fast if you have millions of observations)
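For small inputs, the idea behind a fuzzy join can be sketched in base R with `utils::adist()` (generalized Levenshtein distance): match each key to its closest candidate and keep pairs within a distance threshold. This is only an illustration with a hypothetical helper `fuzzy_match()`; it does not reproduce the API of fuzzyjoin or zoomerjoin, and the full distance matrix grows with `length(x) * length(y)`, so it will not scale to millions of rows.

```r
# Hypothetical helper: match each string in `x` to its closest string in `y`,
# keeping only pairs whose edit distance is at most `max_dist`.
fuzzy_match <- function(x, y, max_dist = 1) {
  d <- adist(x, y)                          # length(x) x length(y) distances
  best <- apply(d, 1, which.min)            # index of the closest `y` per `x`
  best_dist <- d[cbind(seq_along(x), best)]
  keep <- best_dist <= max_dist
  data.frame(x = x[keep], y = y[best[keep]], dist = best_dist[keep],
             stringsAsFactors = FALSE)
}

fuzzy_match(c("apple", "banana", "cherry"), c("aple", "bananna", "grape"))
```

Here "cherry" has no candidate within one edit, so it is dropped, much like an inner join.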

17.12.2025 18:57
Introduction roxytest

roxytest? (Never used)

mikldk.github.io/roxytest/art...

08.12.2025 17:35

janitor::clean_names() for column names, almost always
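As a rough idea of what such a cleaning step does, here is a hypothetical base R helper, `clean_names_base()`, that converts column names to unique snake_case. It is a simplified approximation written for illustration, not janitor's implementation; `janitor::clean_names()` handles many more cases.

```r
# Hypothetical helper: a simplified take on cleaning column names.
clean_names_base <- function(x) {
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)  # split camelCase: "firstName" -> "first_Name"
  x <- gsub("[^A-Za-z0-9]+", "_", x)            # runs of other characters -> "_"
  x <- gsub("^_+|_+$", "", x)                   # trim leading/trailing "_"
  make.unique(tolower(x), sep = "_")            # lower case, then de-duplicate
}

df <- data.frame(`First Name` = "a", `Total $ Amount` = 1, check.names = FALSE)
names(df) <- clean_names_base(names(df))
names(df)  # "first_name" "total_amount"
```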

04.12.2025 16:23