Sebastian

@mersault.bsky.social

Professor at BIFOLD & TU Berlin, research on data engineering for ML. Previously at UvA, NYU, Amazon, Twitter. Opinions are my own. https://deem.berlin

429 Followers  |  550 Following  |  23 Posts  |  Joined: 26.12.2023

Latest posts by mersault.bsky.social on Bluesky

What a banger is skrub @skrub-data.bsky.social !

Big thumbs up for the sklearn team & the maintainer of this package

01.10.2025 08:23 | 👍 14  🔁 4  💬 1  📌 0

Imagine A World Without Lawyers (YouTube video by Wutaii1 Nostalgia)

www.youtube.com/watch?v=--vL...

01.10.2025 17:11 | 👍 3  🔁 0  💬 0  📌 0

Larry Ellison overtakes Elon Musk as world's richest person: Oracle co-founder's shares rose by 40% in early trading, valuing his fortune at $393bn, just ahead of Musk's $384bn

I don't know what to say. You dream about it for so long and then when it finally happens you're in shock. I'm so proud of you Larry. www.theguardian.com/technology/2...

10.09.2025 19:52 | 👍 17  🔁 2  💬 0  📌 1

It looks like a data frame, but Skrub stores the whole transformation pipeline in the magic skb attribute!

02.09.2025 19:10 | 👍 7  🔁 2  💬 0  📌 0

🔥 CAN YOU BUILD AI MODELS that give you (verifiable) uncertainty estimates in their outputs? Cool talk on ML, classifiers, + calibration www.youtube.com/watch?v=SI6b... by scikit-learn architect @gaelvaroquaux.bsky.social
*with ninja-level modeling of variance you probably didn't know existed !

01.09.2025 00:09 | 👍 34  🔁 5  💬 1  📌 0

✨ Excited to present our workshop paper at DataWorld at #ICML2025 tomorrow 🇨🇦

We introduce the problem of detecting cross-modal errors in tabular data that originate from other modalities.

Visit our poster:
📅 Saturday, July 19, 10:05 AM - 11:20 AM
📍 West Meeting Room 208-209

18.07.2025 17:24 | 👍 6  🔁 2  💬 1  📌 0

Thanks to my supervisor
@mersault.bsky.social!

Paper: openreview.net/pdf?id=JJYHb...
Code: github.com/OlgaOvcharen...

18.07.2025 17:24 | 👍 2  🔁 1  💬 0  📌 0

On Saturday, @oovcharenko.bsky.social will present a poster on "Towards Cross-Modal Error Detection with Tables and Images" at the DataWorld workshop. The work focuses on finding errors in tables by inspecting corresponding image data:

olgaovcharenko.github.io/_pages/MERIT...

(3/3)

14.07.2025 06:12 | 👍 5  🔁 1  💬 0  📌 0

On Thursday, @oovcharenko.bsky.social will present her research on "scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data". This paper is joint work with ETH Zuerich and was selected as a spotlight poster:

icml.cc/virtual/2025...

(2/3)

14.07.2025 06:11 | 👍 4  🔁 3  💬 2  📌 0

The DEEM Lab is at ICML this week for the first time, with two contributions!

(1/3)

14.07.2025 06:10 | 👍 5  🔁 3  💬 1  📌 0

Our paper "Towards Cross-Modal Error Detection with Tables and Images" was accepted for the DataWorld workshop at ICML'25! 🥳

Thanks to @mersault.bsky.social!

10.06.2025 14:10 | 👍 6  🔁 1  💬 0  📌 0

Our demo "mlidea: Interactively Improving ML Data Preparation Code via 'Shadow Pipelines'" was accepted at VLDB! 🥳

We demo suggestions for ML pipelines, similar to IntelliJ code inspections or Grammarly suggestions

youtu.be/ePGm1J6S2qk

Joint work w/ @mersault.bsky.social @p-groth.bsky.social

30.05.2025 19:09 | 👍 12  🔁 3  💬 0  📌 0

📢 We are hosting a DuckDB meetup in Berlin during the week of the SIGMOD conference.

📍 The meetup will take place on June 26 (Thursday) south of the Tiergarten and will feature talks by Amine Mhedhbi, David Justen and dltHub!

📝 If you plan to attend, please register at duckdb.org/events/2025/...

28.05.2025 15:15 | 👍 17  🔁 8  💬 0  📌 1

We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation pipelines that optimize ML models along responsibility objectives.

This is a fully-funded position at @bifold.berlin, co-supervised by Julia Stoyanovich from NYU.

Details: deem.berlin#jobs-17725

26.05.2025 04:05 | 👍 7  🔁 6  💬 0  📌 0

Post image

24.05.2025 16:24 | 👍 2  🔁 0  💬 0  📌 0

One more week to apply to this exciting position... and another position on #CausalRepresentationLearning and #ReinforcementLearning for learning provably correct #concepts from raw data opening up soon!

12.05.2025 09:21 | 👍 7  🔁 3  💬 0  📌 0

We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on data preparation for ML/AI systems.

This is a fully-funded position with salary level E13 at the newly founded DEEM Lab, as part of @bifold.berlin .

Details available at deem.berlin#jobs-2225

12.05.2025 03:33 | 👍 16  🔁 12  💬 0  📌 0

Post image

04.05.2025 20:15 | 👍 2  🔁 0  💬 0  📌 0

📢 Our extended benchmark on self-supervised learning for single-cell data, scSSL-Bench 🧬, is now accepted at ICML (spotlight)!

Thanks to all collaborators from @bifold.berlin and @ethzurich.bsky.social!

01.05.2025 22:34 | 👍 7  🔁 4  💬 0  📌 0

I was invited to review for the "Journal of Pipeline Systems Engineering and Practice", seems our work on ML pipelines is finally recognised by other communities as well ;D

19.04.2025 09:19 | 👍 16  🔁 2  💬 0  📌 0

Thank you!

10.04.2025 18:23 | 👍 0  🔁 0  💬 0  📌 0

@recsys.bsky.social Quick question, is the full list of accepted workshops already published somewhere? I am looking for a target venue for the early work of a student of mine. Thx.

08.04.2025 08:09 | 👍 0  🔁 0  💬 1  📌 0

We have openings for student assistants in the DEEM Lab at @bifold.berlin. This is a great opportunity to work with PhD students, implement cool stuff, gather research experience and become a co-author of scientific publications :)

deem.berlin#jobs-193487

08.04.2025 07:18 | 👍 8  🔁 6  💬 0  📌 0

Thanks!

24.03.2025 16:49 | 👍 2  🔁 0  💬 1  📌 0

Still interviewing. Are you thinking about a second PhD? ;)

24.03.2025 09:32 | 👍 2  🔁 0  💬 1  📌 0

Our vision "Towards Regaining Control over Messy ML Pipelines" was accepted for the DAIS workshop at ICDE! 🥳

Initial experiments show LLMs are promising for extracting declarative query plans from messy ML code.

Joint work w/ @guangchen811.bsky.social @oovcharenko.bsky.social @mersault.bsky.social

07.03.2025 13:56 | 👍 11  🔁 5  💬 0  📌 0

The Data Management for End-to-End Machine Learning workshop (@deem-workshop.bsky.social) will be back at #SIGMOD2025! ✨

🔗 Check out the CfP: deem-workshop.github.io
📝 Submission deadline: March 21
📢 Notifications: April 25

Join us for the 9th edition in Berlin!

#DEEM2025

07.02.2025 20:58 | 👍 7  🔁 4  💬 1  📌 2

We have a **Postdoc opening** in Berlin on Responsible Data Engineering!

This is a fully-funded position with salary level E14 at the newly founded DEEM Lab, as part of @bifold.berlin .

Details available at deem.berlin#jobs-57624

05.02.2025 08:31 | 👍 6  🔁 2  💬 0  📌 0

We have a PhD opening in Berlin on "Responsible Data Engineering", with a focus on efficiently creating, maintaining and evaluating datasets and pipelines for ML use cases.

This is a fully-funded position at the newly founded DEEM Lab, as part of @bifold.berlin .

deem.berlin#jobs-2225

03.02.2025 08:08 | 👍 15  🔁 5  💬 1  📌 3
