Nic Crane

@niccrane.bsky.social

Independent R consultant. Apache Arrow PMC Member & #rstats 📦 maintainer. Arrow course launching early 2026: https://big-data-r.thinkific.com/ More of my stuff at https://niccrane.com/

2,510 Followers 158 Following 108 Posts Joined Sep 2023
3 days ago

This is fantastic news! Heather is such a positive force in the #rstats community and is doing vital work for the long-term sustainability of R and its community.

49 8 2 0
3 days ago
Maintaining open source in the age of generative AI: Recommendations for maintainers and contributors
AI-generated contributions to open source projects are surging. Adrin Jalali and Cailean Osborne share best practices and recommendations for maintainers.

I really enjoyed reading this article from scikit-learn maintainers about the specific impacts of AI-generated open source contributions, recommendations for maintainers, and the potential for positives where AI use can be helpful.

blog.probabl.ai/maintaining-...

#ai #opensource #llms

2 1 0 0
3 days ago

Ah, cheers!

0 0 0 0
3 days ago

OMG, same, like the AI shame is so real, even when it's inevitable we're going to make mistakes with something so new!

0 0 0 0
3 days ago

That's so cool! What does the extension do?

0 0 0 0
5 days ago

Thanks! I still am impressed by how easy it makes it to do these kinds of things!

0 0 1 0
5 days ago
LLM-Assisted Issue Triage for Open Source Maintainers Nic Crane

I built a GitHub issue classifier for Apache Arrow issue language using {ellmer} - super simple and almost 100% accuracy. Blog post: niccrane.com/posts/llm-issue-triage/

#rstats #ai #llms

12 4 1 0
6 days ago

In the shower thinking "wouldn't it be cool to combine LLM tool calls and have them run code but in a constrained way" & then "it needs some kind of intermediate representation; how would we validate whatever it produces?" & then realised my idea wasn't novel & just the motivation for text-to-sql 😅

3 0 0 0
1 week ago

I remember at posit::conf last year there was mention of posit::conf Europe 2026 - anyone know if this is still a thing? #rstats #positconf #posit

2 0 1 0
2 weeks ago

Huge thanks to the organisational team for putting on such an excellent event! 💜🌈

5 0 0 0
2 weeks ago
Schedule – rainbowR conference

Excited for all of the talks tomorrow, check out the schedule here if you haven't seen it! conference.rainbowr.org/schedule.html

3 0 1 0
2 weeks ago

Whew, and it's done! Thanks to everyone who came to my RainbowR workshop on LLMs for Data Analysis in #rstats! First time with that content in front of an audience, so I appreciate the excellent questions folks asked (and double thanks to everyone who filled in the feedback forms!)

18 1 1 0
2 weeks ago

"Working with agents is a lot more productive, but a lot less fun." Charlie Marsh on the weird world of building software right now. Full conversation on The Test Set.

19 5 0 2
2 weeks ago

Sounds interesting, how well does it work for R code?

1 0 1 0
2 weeks ago



It's still experimental, so potentially some rough edges, but I think it's a great example of making sure the LLM benefits are tempered with what actually makes sense for *people*.

1 0 1 0
2 weeks ago

Instead of generating a load of comments, you get suggestions one at a time, which you can then choose to accept or reject, before it moves on to the next suggestion. It generates suggestions as it goes, so if you accept some changes but reject others, its suggestions change on the basis of the code.

1 0 1 0
2 weeks ago

There's promise in using LLMs for code review, but it's tricky to make sure it's not overwhelming.

I was looking at this new experimental package by Simon Couch and I really love how it allows you to review code iteratively. #rstats #ai #llms

github.com/simonpcouch/...

26 6 3 0
1 month ago
How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt
This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far. Cognitive debt, a term gaining traction recently, instead communicates the notion that …

Short musings on "cognitive debt" - I'm seeing this in my own work, where excessive unreviewed AI-generated code leads me to lose a firm mental model of what I've built, which then makes it harder to confidently make future decisions simonwillison.net/2026/Feb/15/...

465 88 41 20
1 month ago

Should be there shortly!

1 0 0 0
1 month ago

Let's talk contributors! This release saw 44 contributors to the codebase! 38 worked on the C++ library, 3 on the R 📦, & 3 on both. 23 people made their first contribution! 🎉

Thanks to everyone who was involved!

2 0 0 0
1 month ago

Writing partitioned datasets on S3 no longer requires ListBucket permissions; useful if you have write-only access to a bucket.
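
For anyone who hasn't written a partitioned dataset before, here's a minimal sketch of what that looks like. It writes to a local temp directory so it runs without credentials; the `mtcars` data, the `cyl` partition column, and the commented-out bucket name are assumptions for illustration only, not taken from the release notes.

```r
library(arrow)

# Write the built-in mtcars data as a dataset partitioned by `cyl`.
# A local temp directory stands in for the bucket here.
out_dir <- file.path(tempdir(), "cars_by_cyl")
write_dataset(mtcars, out_dir, partitioning = "cyl")

# Hive-style partitioning creates one directory per distinct value.
partition_dirs <- list.files(out_dir)

# On S3 the same call would take a URI instead of a local path
# (hypothetical bucket name, needs valid credentials):
# write_dataset(mtcars, "s3://my-write-only-bucket/cars", partitioning = "cyl")
```

Each partition directory holds its own Parquet file(s), so a write only needs to put new objects into the bucket.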

1 0 1 0
1 month ago
The following reproducible example:
library(arrow)
library(dplyr)
library(stringr)

df <- arrow_table(x = c("Apache", "Arrow", "23.0.0"))
df |>
  filter(str_ilike(x, "ARROW")) |>
  collect()
#> # A tibble: 1 × 1
#>   x
#>   <chr>
#> 1 Arrow

We've added support for stringr::str_ilike() for case-insensitive pattern matching.

0 0 1 0
1 month ago
Changelog

We're excited to announce the release of {arrow} 23.0.0 🏹📦

Here's a roundup of the new features and changes in a 🧵

Full details can be found at arrow.apache.org/docs/r/news/

#rstats #apachearrow

26 3 2 0
1 month ago

I mean, you could say the same thing about any R function; just a toy example - feel free to replace it with something more useful! 😉

1 0 0 0
1 month ago

Yeah, there's some irony in the fact that I randomly chose that specific example, and then the results even showed the new features including the web fetch thing making my example redundant! 😆 I shall have to think up a new example for when I'm teaching, but YAY, awesome new feature! 🎉

2 0 0 0
1 month ago
First part of code - full code can be found at https://gist.github.com/thisisnic/eae09dbd4594e2cff75d156a8bab3f59

Tool calling lets LLMs run R functions. In this example, I let an LLM ask my R session to check the latest {ellmer} updates by scraping the news page; when I ask the LLM "what's new in ellmer?", it works with what comes back.

{ellmer} website: ellmer.tidyverse.org

#rstats #llms #ai #datascience

4 0 2 0
1 month ago
Line chart showing percent correct on the y-axis and three conditions on the x-axis: Baseline, Intuitive, and Mocked. Three lines represent GPT-5.2, Claude Opus 4.5, and Gemini 2.5 Pro. All three models score between 93-98% on baseline, then drop on intuitive and mocked conditions. All three perform the worst on the mocked condition.

More on LLMs and plot interpretation: they do fine in normal conditions, but struggle when the plot conflicts strongly with their priors.

@simonpcouch.com and I investigated why and what might help: posit.co/blog/llm-plo...

24 5 1 0
1 month ago
Code in which text from a Wikipedia article is passed into the chat_structured method to extract dates and events

I love "structured output" as a way of extracting data from text as a data frame. 🎯

The image shows the {ellmer} package in use, and how type_array(type_object(...)) automatically returns a data frame in R 🔧
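
For those who can't see the image, here's a rough sketch of the pattern rather than the exact code shown. The field names `date` and `event`, the helper `extract_events()`, and the choice of `chat_openai()` are assumptions for illustration, and a real run needs an API key for whichever provider you use.

```r
library(ellmer)

# Assumed schema: an array of objects, one per event, each with two
# string fields. An array of objects is what comes back as a data frame.
event_type <- type_array(
  items = type_object(
    date = type_string("Date of the event as written in the text"),
    event = type_string("Short description of what happened")
  )
)

# Hypothetical helper: `chat` is any ellmer chat object.
extract_events <- function(chat, text) {
  chat$chat_structured(text, type = event_type)
}

# Usage (provider choice is an assumption; needs OPENAI_API_KEY set):
# chat <- chat_openai()
# events <- extract_events(chat, article_text)
```

Keeping the type spec separate from the call makes it easy to reuse the same schema across different texts or providers.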

{ellmer} website: ellmer.tidyverse.org

#rstats #llms #ai #datascience

15 2 0 0
1 month ago
Posit::conf(2026) Call for Talks - Posit
posit::conf(2026) is coming September 14-16 to Houston, TX, and we're looking for talks!

posit::conf(2026) call for talks is now open! If you're an #RStats or #Python user, have a great DS workflow to share, or have some lessons learned, we'd love to hear from you.

🔗 posit.co/blog/posit-c...

10 3 1 0