David Robinson

@drob.bsky.social

Director of Engineering at Heap. #rstats fan. Dad x2. He/him

6,610 Followers  |  85 Following  |  17 Posts  |  Joined: 17.05.2023  |  1.6647

Latest posts by drob.bsky.social on Bluesky

He doesn’t believe that “brekkie” is real; fair enough I barely believe it myself

17.02.2025 02:04 | 👍 7    🔁 0    💬 3    📌 0

My son came up with a silly language where you add -ie to the end of each word. Like a shirt is a “shirtie” or milk is “milkie”

So I told him about Australia and he absolutely lost it

17.02.2025 02:01 | 👍 43    🔁 1    💬 3    📌 0

Blown away

Using OpenAI’s Deep Research is like collaborating with a PhD student

(It told me it would get right on it then ghosted me)

04.02.2025 01:54 | 👍 88    🔁 6    💬 2    📌 1

Thanks for sharing- I might get back into them!

01.02.2025 20:44 | 👍 4    🔁 0    💬 1    📌 0

By convention, I use all caps for any function or infix operator I want to be passed to SQL

Because that avoids the possibility of conflicting with an R function, which will lead to an error when dbplyr finds it and tries to apply it

(E.g. lowercase extract() would have had a conflict from tidyr)

01.02.2025 20:09 | 👍 5    🔁 0    💬 2    📌 0

That’s right, %FROM% isn’t from a package; dbplyr turns any unrecognized infix operator directly into SQL (much like it does with variable names)

Fun fact; %FrOm% would work too

01.02.2025 20:07 | 👍 3    🔁 1    💬 2    📌 0

Being able to read is OP for a dad

I take my son around the Natural History Museum, he asks about anything, and I rattle off what the plaque says

He thinks I’m a goddamn genius

01.02.2025 19:50 | 👍 64    🔁 1    💬 1    📌 1
YouTube video by Posit PBC: David Robinson | The unreasonable effectiveness of public work | RStudio (2019)

Here is a talk by @drob.bsky.social at @posit.co's conf 2019. The ideas shaped and voiced there are priceless. I've been suggesting this talk to my @datavizartskill.ikashnitsky.phd students ever since, and hopefully some of them found it as useful and motivating /3
youtu.be/th79W4rv67g?...

04.01.2025 23:41 | 👍 16    🔁 2    💬 2    📌 0
Apple Music Replay '24

#1 Taylor Swift: 25,520
#

"How did you spend 2024?"
"I'll tell you how I spent 5% of it"

05.12.2024 14:51 | 👍 10    🔁 0    💬 1    📌 0
library(tidyverse)
library(adventdrob)

input <- advent_input(2, 2024)
x <- input$x

is_safe <- function(report) {
  d <- diff(report)
  return((all(d > 0) || all(d < 0)) && all(between(abs(d), 1, 3)))
}

is_safe_part2 <- function(report) {
  return(is_safe(report) ||
           any(map_lgl(seq_along(report), \(i) is_safe(report[-i]))))
}

input$x %>%
  str_split(" ") %>%
  map(as.numeric) %>%
  map_lgl(is_safe_part2) %>%
  sum()

My #rstats solution to Day 2 of #adventofcode

* I feel like half of Advent of Code puzzles need a diff(), especially in the early days!
* Didn't use much tidyverse today (except map_lgl and between, but those could have easily been replaced)

02.12.2024 05:26 | 👍 20    🔁 0    💬 0    📌 1
input %>%
  separate(x, c("first", "second"), convert = TRUE) %>%
  summarize(sum(abs(sort(second) - sort(first))))

When I woke up I realized my Part 1 could have been WAY shorter with sort 🤦‍♂️

01.12.2024 15:44 | 👍 5    🔁 0    💬 0    📌 0
library(tidyverse)
library(adventdrob)

input <- advent_input(1, 2024)

separated <- input %>%
  separate(x, c("first", "second"), convert = TRUE)

# Part 1
separated %>%
  gather(type, value) %>%
  group_by(type) %>%
  mutate(rank = rank(value, ties.method = "first")) %>%
  ungroup() %>%
  spread(type, value) %>%
  summarize(sum(abs(second - first)))

# Part 2
separated %>%
  count(first = second, sort = TRUE) %>%
  inner_join(separated, by = "first") %>%
  summarize(sum(first * n))

My #rstats solution to Day 1 of #adventofcode

* Fun use of gather and spread (I know I'm supposed to be using pivot_longer and pivot_wider, but old-dog-new-tricks)
* One step I got stuck on was setting a ties.method in rank()

01.12.2024 05:12 | 👍 42    🔁 2    💬 2    📌 0

Who is doing #rstats Advent of Code this year? ❄️🎄

01.12.2024 02:47 | 👍 41    🔁 5    💬 10    📌 1
Upfront (allupfront.com) is fixing the childcare industry through accurate, complete data so the government, parents, and providers can all make decisions and operate with optimal results. Our SaaS platform cleans, validates, and provides vital insights on childcare data (price, hours, location, availability, etc.) and serves as a central hub for every stakeholder. A Techstars portfolio company, Upfront has seen rapid growth with customers such as the states of Maryland, Arizona, and North Carolina. 

We are looking for an engineer to maintain our system, ingest data from our clients, clean and enrich that data, and integrate it into our production system.

Responsibilities:

Build a system for ingesting daily batches of data from a client’s API
Develop and maintain ETL scripts for processing and enriching that data
Manage data quality and accuracy, such as developing automated tests


We expect you to have:

The ability to architect a data ingestion platform from the ground up
Extensive experience with at least one platform for scheduled ETL pipelines 
Extensive experience in dbt, AWS and Snowflake
Intermediate to advanced proficiency in Python
Attention to detail and proactivity when it comes to data quality


Extra points for:

Skill at visualizing and drawing insights from data
Experience pulling data from public websites
Scrappy mindset- we're a small, but smart team and nothing is above or below our job title

My wife Dana is hiring a full-time Data Engineer at her company!

Great role for someone with strong experience in Python, dbt, and Snowflake who wants to join a growing startup in the government data space

Please forward to strong data folks you know!

www.linkedin.com/jobs/view/40...

26.11.2024 17:51 | 👍 16    🔁 1    💬 1    📌 0

Was there a period where you were using the early tools personally / in teaching before you uploaded them to CRAN?

Did that change over the course of reshape, reshape2, ggplot, ggplot2?

25.11.2024 17:19 | 👍 22    🔁 0    💬 1    📌 0

`rm -rf` ❌
• remove rf??? what does that even mean???? (nothing)
• boring
• hard to remember

`rm -fr` ✅
• remove forreal 💅🏼💅🏼
• makes u smile every time
• will never forget

13.11.2024 21:51 | 👍 137    🔁 24    💬 6    📌 3

#rstats is actually fighting about Base versus Tidyverse again on this platform. We are so back

14.11.2024 23:34 | 👍 209    🔁 21    💬 12    📌 2

ggplot(data, aes(x,y)) +
geom_jitter(velocity = units(17000, "mph"))

help, my data is stuck in LEO

#RStats #ggplot #dataviz

12.11.2024 15:41 | 👍 100    🔁 9    💬 3    📌 2
The Tyranny of the Marginal User: why consumer software gets worse, not better, over time

I think about this piece at least once a week: nothinghuman.substack.com/p/the-tyrann...

11.11.2024 17:42 | 👍 9    🔁 3    💬 1    📌 1

Tired: P-hacking
Wired: Querymandering

10.11.2024 22:56 | 👍 37    🔁 5    💬 0    📌 0

I’ve been thinking about this too!!

I think one important shift in the last ten years is that data analysts are much more likely to use SQL + scripting, so “analysts that can program” is no longer a niche that gets its own title

10.11.2024 22:47 | 👍 4    🔁 1    💬 1    📌 0

Question for #databs folks:

I am searching for a recent write up of how data careers/titles are evolving. Has anyone written or read something that resonated on this lately?

I’m hoping for a boots on the ground point-of-view of basically “where have all the data scientists gone” 🤠

09.11.2024 18:41 | 👍 55    🔁 7    💬 17    📌 0

Remember in Squid Game where the contestants in mortal danger barely managed to reach safety through a popular vote

And then later the dull banality of regular life drove them to voluntarily re-enter mortal danger

Dunno what made me think of that

06.11.2024 05:22 | 👍 136    🔁 13    💬 5    📌 0

My fav starter packs so far, a thread:

stats: go.bsky.app/Ki7PjpS
stats: go.bsky.app/7TBN5rX
causal inference: go.bsky.app/FdemGAZ
package devs: go.bsky.app/N1569Qh
data peeps: go.bsky.app/8TdEfdK
medical stats: go.bsky.app/ArqEz36
bioinformatics: go.bsky.app/Ha64Gmv
r-ladies: go.bsky.app/Vgxwa2F

26.10.2024 19:23 | 👍 280    🔁 134    💬 19    📌 30
Positron: A next-generation data science IDE

We've got a brand new, baby website for Positron! Take a look if you are interested in getting started, and please let us know how it goes:
positron.posit.co

28.10.2024 16:40 | 👍 121    🔁 44    💬 4    📌 3

I miss when bluesky was good

25.05.2023 14:18 | 👍 209    🔁 28    💬 2    📌 1

Twitter is San Francisco: still tech leader, but experiencing a doom loop

Mastodon is Boise: had a big wfh surge, but nothing actually there

Substack is Cambridge MA: go when you want to learn

This place is Miami

09.05.2023 22:07 | 👍 13    🔁 4    💬 5    📌 0