What a banger is skrub @skrub-data.bsky.social !
Big thumbs up for the sklearn team & the maintainer of this package
@mersault.bsky.social
Professor at BIFOLD & TU Berlin, research on data engineering for ML. Previously at UvA, NYU, Amazon, Twitter. Opinions are my own. https://deem.berlin
I don't know what to say. You dream about it for so long and then when it finally happens you're in shock. I'm so proud of you Larry. www.theguardian.com/technology/2...
10.09.2025 19:52
It looks like a data frame, but Skrub stores the whole transformation pipeline in the magic skb attribute!
02.09.2025 19:10
CAN YOU BUILD AI MODELS that give you (verifiable) uncertainty estimates in their outputs? Cool talk on ML, classifiers, + calibration www.youtube.com/watch?v=SI6b... by scikit-learn architect @gaelvaroquaux.bsky.social
*with ninja-level modeling of variance you probably didn't know existed!
✨ Excited to present our workshop paper at DataWorld at #ICML2025 tomorrow 🇨🇦
We introduce the problem of detecting cross-modal errors in tabular data that originate from other modalities.
Visit our poster:
Saturday, July 19, 10:05 AM - 11:20 AM
West Meeting Room 208-209
Thanks to my supervisor
@mersault.bsky.social!
Paper: openreview.net/pdf?id=JJYHb...
Code: github.com/OlgaOvcharen...
On Saturday, @oovcharenko.bsky.social will present a poster on "Towards Cross-Modal Error Detection with Tables and Images" at the DataWorld workshop, which focuses on finding errors in tables by inspecting corresponding image data:
olgaovcharenko.github.io/_pages/MERIT...
(3/3)
On Thursday, @oovcharenko.bsky.social will present her research on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data". This paper is joint work with ETH Zurich and was selected as a spotlight poster:
icml.cc/virtual/2025...
(2/3)
The DEEM Lab is at ICML this week for the first time, with two contributions!
(1/3)
Our paper "Towards Cross-Modal Error Detection with Tables and Images" was accepted for the DataWorld workshop at ICML'25! 🥳
Thanks to @mersault.bsky.social!
Our demo "mlidea: Interactively Improving ML Data Preparation Code via 'Shadow Pipelines'" was accepted at VLDB! 🥳
We demo suggestions for ML pipelines, similar to IntelliJ code inspections or Grammarly suggestions
youtu.be/ePGm1J6S2qk
Joint work w/ @mersault.bsky.social @p-groth.bsky.social
We are hosting a DuckDB meetup in Berlin during the week of the SIGMOD conference.
The meetup will take place on June 26 (Thursday) south of the Tiergarten and will feature talks by Amine Mhedhbi, David Justen and dltHub!
If you plan to attend, please register at duckdb.org/events/2025/...
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation pipelines that optimize ML models along responsibility objectives.
This is a fully-funded position at @bifold.berlin, co-supervised by Julia Stoyanovich from NYU.
Details: deem.berlin#jobs-17725
One more week to apply to this exciting position... and another position on #CausalRepresentationLearning and #ReinforcementLearning for learning provably correct #concepts from raw data opening up soon!
12.05.2025 09:21
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation for ML/AI systems.
This is a fully-funded position with salary level E13 at the newly founded DEEM Lab, as part of @bifold.berlin .
Details available at deem.berlin#jobs-2225
Our extended benchmark on self-supervised learning for single-cell data, scSSL-Bench 🧬, is now accepted at ICML (spotlight)!
Thanks to all collaborators from @bifold.berlin and @ethzurich.bsky.social!
I was invited to review for the "Journal of Pipeline Systems Engineering and Practice", seems our work on ML pipelines is finally recognised by other communities as well ;D
19.04.2025 09:19
Thank you!
10.04.2025 18:23
@recsys.bsky.social Quick question, is the full list of accepted workshops already published somewhere? I am looking for a target venue for the early work of a student of mine. Thx.
08.04.2025 08:09
We have openings for student assistants in the DEEM Lab at @bifold.berlin. This is a great opportunity to work with PhD students, implement cool stuff, gather research experience and become a co-author of scientific publications :)
deem.berlin#jobs-193487
Thanks!
24.03.2025 16:49
Still interviewing. Are you thinking about a second PhD? ;)
24.03.2025 09:32
Our vision "Towards Regaining Control over Messy ML Pipelines" was accepted for the DAIS workshop at ICDE! 🥳
Initial experiments show LLMs are promising for extracting declarative query plans from messy ML code.
Joint work w/ @guangchen811.bsky.social @oovcharenko.bsky.social @mersault.bsky.social
The Data Management for End-to-End Machine Learning workshop (@deem-workshop.bsky.social) will be back at #SIGMOD2025! ✨
Check out the CfP: deem-workshop.github.io
Submission deadline: March 21
Notifications: April 25
Join us for the 9th edition in Berlin!
#DEEM2025
We have a **Postdoc opening** in Berlin on Responsible Data Engineering!
This is a fully-funded position with salary level E14 at the newly founded DEEM Lab, as part of @bifold.berlin .
Details available at deem.berlin#jobs-57624
We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on efficiently creating, maintaining and evaluating datasets and pipelines for ML use cases.
This is a fully-funded position at the newly founded DEEM Lab, as part of @bifold.berlin .
deem.berlin#jobs-2225