Florian Huber

@me-datapoint.bsky.social

Professor for data science at HSD, @zdd-hsd.bsky.social | ML fan & critic | current research mostly #datascience, #machinelearning, #cheminformatics #dataviz #nlp | ✨ #openscience #openaccess #rse | living data point 🚲

2,057 Followers  |  588 Following  |  45 Posts  |  Joined: 08.09.2024  |  1.8617

Latest posts by me-datapoint.bsky.social on Bluesky

Please stop saying “The Tanimoto similarity is” – RDKit blog. A simple tip to explain what you actually did

Today's #RDKit blog post is a heartfelt plea for clearer communication.
greglandrum.github.io/rdkit-blog/p...

17.07.2025 11:22 | 👍 30  🔁 7  💬 2  📌 1

Great post!

We also noted the same thing, which triggered us to point out some pitfalls of various fingerprints --> www.biorxiv.org/content/10.1...

17.07.2025 11:40 | 👍 5  🔁 0  💬 0  📌 0
BREAKING NEWS: AI coding may not be helping as much as you think Coding has been the strongest use case. But a new study from METR just dropped.

BREAKING NEWS: #AI coding may not be helping as much as you think

"But for now, the disconnect between what coders thought they would get out of the tools efficiency-wise and what they actually did get out of them is cause for reevaluation." ~ @garymarcus.bsky.social

garymarcus.substack....

10.07.2025 23:13 | 👍 39  🔁 12  💬 2  📌 0
Paris cycling numbers double in one year thanks to massive investment and it's not stopping The report delves into the nuances of Parisian cycling culture, exploring the vibrant community of riders who navigate the city's streets

Paris cycling numbers double in one year thanks to massive investment and it’s not stopping.
A visionary urban policy led by @annehidalgo.bsky.social 💫🫶🏻🙏🏻
momentummag.com/paris-cyclin...

12.07.2025 08:26 | 👍 389  🔁 125  💬 7  📌 18

Here in @duesseldorf.bsky.social, for now they would still rather defend every single parking space...

(and sadly not only here)

03.07.2025 21:08 | 👍 0  🔁 0  💬 0  📌 0

I don’t think anyone is prepared for what they just did w/ ICE.

This is not a simple budget increase. It is an explosion - making ICE bigger than the FBI, US Bureau of Prisons, DEA & others combined.

It is setting up to make what’s happening now look like child’s play. And people are disappearing.

03.07.2025 18:58 | 👍 98572  🔁 38511  💬 4581  📌 2735
Shaping the administration of digitalization: new working group launches! | D64 – Zentrum für digitalen Fortschritt. On 17 July 2025 we are founding a new working group on the digitalization of public administration. Here we bring together digital expertise and policymaking.

Hey, public-administration digitalizers! On 17 July we are launching a new working group (AG) on the digitalization of public administration. Your expertise from the public sector is needed! Together we will shape the future of digital public administration 💪

d-64.org/veranstaltun...

01.07.2025 11:38 | 👍 3  🔁 2  💬 0  📌 1
Effective data visualization strategies in untargeted metabolomics Covering: 2014 to 2023 for metabolomics, 2002 to 2023 for information visualization LC-MS/MS-based untargeted metabolomics is a rapidly developing research field spawning increasing numbers of…

🔓 Read in our MS Metabolomics themed collection: an #OpenAccess review from Kevin Mildau, Henry Ehlers, @jjjvanderhooft.bsky.social et al. at @w-u-r.bsky.social @tuwien.at covering effective data visualization strategies in untargeted metabolomics #natprod

Find it here 🔽

26.06.2025 11:39 | 👍 9  🔁 2  💬 0  📌 1

@jorainer.bsky.social and @philouail.bsky.social gave a great overview of the ecosystem around #RforMassSpectrometry and #XCMS!

#MetSoc25
I am super glad they now also provide options to combine with #Python and #matchms (thanks 🙏)

26.06.2025 09:32 | 👍 11  🔁 6  💬 1  📌 0

📢 Poster 1001 at #MetSoc2025: Marilyn De Graeve on our #SpectriPy #rstats package to integrate #python and #rstats packages for #MassSpec data analysis. TODAY

23.06.2025 11:09 | 👍 31  🔁 6  💬 1  📌 0

Hi, in case your phone didn't pick up the QR code to the slides of my Hitch-Hikers Guide to Computational Metabolomics talk this morning at #Metabolomics2025, featuring #xcms, #massbank, not #metfrag but #CASMI and #MetFamily, please find them at doi.org/10.5281/zeno...

25.06.2025 09:15 | 👍 16  🔁 8  💬 1  📌 0
Slide from presentation of Steffen Neumann

Great keynote by @sneumann.bsky.social at #MetSoc25, strongly advocating for #opensource, data sharing, and making things interoperable.

Glad to also spot #matchms in this universe :)

25.06.2025 07:35 | 👍 18  🔁 4  💬 2  📌 0

Proud of Niek de Jonge who did a fantastic job in presenting his work on cross-ion mode spectral similarity scoring! 😎 👏
Work with Florian Huber @me-datapoint.bsky.social

#metabolomics #CompMetabolomics #MetSoc25 #MS2DeepScore

23.06.2025 21:46 | 👍 20  🔁 4  💬 0  📌 0
Chemical Space Visualizations using UMAP and various molecular fingerprints.

4/4
We also highlight options for count fingerprints, such as log-counts and IDF-weighted counts. The latter can be used to adjust the bit importance to a dataset of your choice.

One example use case is chemical-space visualization.

Preprint: www.biorxiv.org/content/10.1...

23.06.2025 09:22 | 👍 2  🔁 1  💬 0  📌 0
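A minimal sketch of the two count transformations mentioned in 4/4, in plain Python. It assumes a count fingerprint is stored as a dict mapping feature IDs to counts; the `log1p` scaling and the IDF formula `log(N / df)` (borrowed from text mining) are my assumptions, not necessarily the exact weighting used in the preprint, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical representation: a count fingerprint as {feature_id: count}.
Fingerprint = dict[int, int]


def log_counts(fp: Fingerprint) -> dict[int, float]:
    """Dampen large counts with log(1 + count); zero counts are dropped."""
    return {bit: math.log1p(c) for bit, c in fp.items() if c > 0}


def idf_weights(dataset: list[Fingerprint]) -> dict[int, float]:
    """Per-bit inverse document frequency on a reference dataset.

    Assumed formula: idf(bit) = log(N / df(bit)), where N is the dataset size and
    df(bit) is the number of fingerprints in which the bit has a non-zero count.
    A bit present in every fingerprint gets weight 0.
    """
    n = len(dataset)
    df = Counter(bit for fp in dataset for bit, c in fp.items() if c > 0)
    return {bit: math.log(n / d) for bit, d in df.items()}


def idf_weighted(fp: Fingerprint, idf: dict[int, float]) -> dict[int, float]:
    """Scale each count by its bit's IDF (assumption: bits unseen in the reference set are dropped)."""
    return {bit: c * idf[bit] for bit, c in fp.items() if bit in idf}


# Toy reference dataset: bit 1 occurs in every entry, bit 7 in only one.
reference = [{1: 3, 2: 1}, {1: 1, 7: 2}, {1: 2, 2: 4}]
idf = idf_weights(reference)
print(log_counts(reference[1]))
print(idf_weighted(reference[1], idf))
```

Computing the IDF weights on a reference set is what lets you tune bit importance to "a dataset of your choice": here bit 1, which every toy entry shares, gets weight 0, while the rare bit 7 is up-weighted.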

3/4
Bit collisions are a huge issue.
Fingerprints with a high bit occupation (RDKit, MAP4) often lead to (1) arbitrary misinterpretations, (2) shifts to high Tanimoto scores, (3) very different handling of small and large molecules.

--> Consider using sparse fingerprints!
--> Morgan >> MAP4 / RDKit

23.06.2025 09:22 | 👍 2  🔁 1  💬 1  📌 0
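A toy illustration of how bit collisions can inflate Tanimoto scores. Random 32-bit integers stand in for hashed substructure IDs, and modulo folding stands in for mapping them onto a fixed-length bit vector; both are simplifying assumptions (no RDKit, no real molecules). The two hypothetical "molecules" share no features, so their unfolded (sparse) Tanimoto is 0.

```python
import random


def fold(features: set[int], n_bits: int) -> set[int]:
    """Map unbounded feature IDs onto n_bits positions via modulo (returns the on-bits)."""
    return {f % n_bits for f in features}


def tanimoto(a: set[int], b: set[int]) -> float:
    """Binary Tanimoto: |A & B| / |A | B| (defined as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


rng = random.Random(0)
# Two hypothetical "molecules" with ~300 features each and nothing in common,
# i.e. a dense, high-occupancy setting.
mol_a = set(rng.sample(range(2**32), 300))
mol_b = set(rng.sample(range(2**32), 300)) - mol_a

print("unfolded", tanimoto(mol_a, mol_b))  # → unfolded 0.0
for n_bits in (65536, 2048, 1024, 512):
    # Fewer bits means higher occupancy and more accidental shared on-bits.
    print(n_bits, round(tanimoto(fold(mol_a, n_bits), fold(mol_b, n_bits)), 3))
```

As `n_bits` shrinks, the score for two unrelated feature sets climbs steadily above 0, which is the "shift to high Tanimoto scores" in miniature; keeping the unfolded, sparse features avoids it.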
Benchmarking plot on fingerprint duplications.

2/4
We focused on weaknesses of the fingerprints.
Many show frequent duplicates, i.e., the same fingerprint for different compounds. Most problematic: this can include *very* different compounds ending up with identical fingerprints.

- MAP4 >> Morgan-type >> daylight
- count >> binary

#cheminformatics

23.06.2025 09:22 | 👍 0  🔁 1  💬 1  📌 0
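The "count >> binary" point can be shown with a toy example. The feature counts below are hypothetical (think of a homologous series where a longer chain only repeats the same substructures more often), and the helper functions are illustrative, not taken from the preprint's code.

```python
from collections import Counter


def to_binary(counts: dict[int, int]) -> frozenset[int]:
    """Keep only which features are present."""
    return frozenset(f for f, c in counts.items() if c > 0)


def to_count(counts: dict[int, int]) -> frozenset[tuple[int, int]]:
    """Keep features together with how often they occur (hashable, so it can be counted)."""
    return frozenset((f, c) for f, c in counts.items() if c > 0)


def duplicate_fraction(fps: list) -> float:
    """Fraction of entries whose fingerprint is shared with at least one other entry."""
    freq = Counter(fps)
    return sum(n for n in freq.values() if n > 1) / len(fps)


# Hypothetical feature counts for four different compounds.
compounds = {
    "A": {10: 2, 20: 1},
    "B": {10: 3, 20: 1},
    "C": {10: 4, 20: 1},
    "D": {10: 1, 30: 2},
}
binary = [to_binary(c) for c in compounds.values()]
count = [to_count(c) for c in compounds.values()]
print(duplicate_fraction(binary), duplicate_fraction(count))  # → 0.75 0.0
```

Three of the four compounds collapse onto the same binary fingerprint, while the count version keeps all four apart.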
Sketch of count/binary fingerprints and weighting options.

New preprint out!
1/4

@julianpollmann.bsky.social and I went down several rabbit holes to assess some commonly used molecular fingerprints.

Bottom line: For large datasets, make an effort to select suitable settings. "We used Tanimoto" is not good enough.

--> www.biorxiv.org/content/10.1...

23.06.2025 09:22 | 👍 5  🔁 1  💬 1  📌 0
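Why "We used Tanimoto" is underspecified: a minimal plain-Python sketch comparing the classic binary Tanimoto with the generalized min/max (count) Tanimoto on the same pair of hypothetical feature-count vectors. The function names are illustrative.

```python
def tanimoto_binary(a: dict[int, int], b: dict[int, int]) -> float:
    """Classic Tanimoto on presence/absence: |A & B| / |A | B|."""
    sa = {k for k, v in a.items() if v > 0}
    sb = {k for k, v in b.items() if v > 0}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def tanimoto_count(a: dict[int, int], b: dict[int, int]) -> float:
    """Generalized (min/max) Tanimoto on counts: sum of minima / sum of maxima."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 1.0


# Hypothetical feature counts for two compounds: same features, different counts.
x = {1: 4, 2: 1, 3: 1}
y = {1: 1, 2: 1, 3: 1}
print(tanimoto_binary(x, y))  # → 1.0
print(tanimoto_count(x, y))   # → 0.5
```

Same pair, both called "Tanimoto", scores of 1.0 versus 0.5. Stating the fingerprint type, its settings, and the similarity variant removes that ambiguity.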

Good start for me at #metabolomics2025 with a hands-on workshop on MS2LDA by Jonas Dietrich, Rosina Torres Ortega and @jjjvanderhooft.bsky.social.

23.06.2025 08:11 | 👍 6  🔁 3  💬 0  📌 0
Elbe river seen from a train somewhere after Dresden.

Went by train to #Prague for #metabolomics2025.

These are the kind of moments that remind me how great the European project is. No border controls, no visas. Just a train following a river to the neighboring country.

22.06.2025 14:02 | 👍 16  🔁 0  💬 0  📌 0

Orwell’s 1984, but with LLMs

22.06.2025 02:07 | 👍 237  🔁 51  💬 14  📌 8
Results

All the results here: fahrradklima-test.adfc.de/ergebnisse

(Especially from the Ruhrpott to Köln, it is unfortunately pretty bleak)

17.06.2025 19:35 | 👍 0  🔁 0  💬 0  📌 0
Screenshot of the ADFC Fahrradklima-Test 2024 for Düsseldorf.

No matter how often the mayor @duesseldorf.bsky.social conjures up the "Fahrradhauptstadt" ("cycling capital") (🥹🤭😭)... it still takes a bit more than a few dabs of paint.

#Düsseldorf still steady at a 4- (German school grade) in the #ADFC Klimatest. Going great. @adfcnrw.bsky.social @adfc-duesseldorf.de

17.06.2025 19:35 | 👍 0  🔁 0  💬 1  📌 0

At #ASMS2025, we presented a next-gen workflow pairing the #timsMetabo MoRE (Mobility Range Enhancement) with #mzmine and #DreaMS

Compared to conventional LC-MS2, the data delivered:

- 84% more detected features
- 5.7× more MS² spectra
- 3× more spectral matches

Read the full poster to learn MoRE

09.06.2025 20:10 | 👍 11  🔁 3  💬 0  📌 1

When you prepare lesson material while being hungry...

(added some text edits and more sketches/figures to the NLP chapters of the "Hands-on Introduction to #DataScience with #Python" textbook)

florian-huber.github.io/data_science...

#OpenScience #Teaching #CCBY

06.06.2025 11:38 | 👍 4  🔁 0  💬 0  📌 0

Look at the gender breakdown of who speaks in popular films!

26.05.2025 14:50 | 👍 385  🔁 128  💬 10  📌 19

Loving this: "The Copilot Delusion"

https://deplet.ing/the-copilot-delusion/

23.05.2025 11:58 | 👍 34  🔁 11  💬 1  📌 2

OpenAI just updated ChatGPT to be able to use RDKit, a cheminformatics Python package.

OpenAI's president says this makes ChatGPT "useful for scientific work across health, biology, and chemistry," but it is hilariously still not good at chemistry (🧵)

#chemsky #AI ⚗️🧪🖥️

23.05.2025 19:52 | 👍 81  🔁 27  💬 8  📌 10

New release of my "Hands-on Introduction to Data Science with Python" textbook!

It contains many text edits and figure updates, for instance in the sections on Clustering and Machine Learning.

All fully #opensource and #openaccess. Figures are #CCBY.

--> florian-huber.github.io/data_science...

14.05.2025 20:10 | 👍 23  🔁 6  💬 1  📌 0
Translating community-wide spectral library into actionable chemical knowledge: a proof of concept with monoterpene indole alkaloids - Journal of Cheminformatics With over 3000 representatives, the monoterpene indole alkaloids (MIAs) class is among the most diverse families of plant natural products. The MS/MS spectral space exploration of these complex compou...

It is with tremendous emotion that I share our recent work in @jcheminf.bsky.social (rb.gy/gynwlf), which resulted in an update of the MIADB and in valuable spectrometric signatures that can be used as #MassQL queries 🙏
S. Szwarc @univparissaclay.bsky.social @adafede.bsky.social

28.04.2025 21:42 | 👍 9  🔁 3  💬 2  📌 1
Sage Journals: Discover world-class research Subscription and open access journals from Sage, the world's leading independent academic publisher.

Great thing about working in "Data Science" is working on so many different fun projects and topics!

Til Hunke, a master's student supervised by Jochen Steffens and me, used NLP tools to analyze the lyrics of pop songs in the German charts from 1954 to 2022.
--> journals.sagepub.com/doi/10.1177/...

28.04.2025 07:15 | 👍 1  🔁 0  💬 0  📌 0
