Aneesa Valentine

@aneesavalentine.bsky.social

scientist | algorithms for single-cell -omics | enabling the next generation of scientists: stem careers, sci-comm & community blog: medium.com/@aneesav/

40 Followers  |  104 Following  |  22 Posts  |  Joined: 24.11.2024

Latest posts by aneesavalentine.bsky.social on Bluesky

Post image

Yes.

12.01.2025 01:54 | 👍 0    🔁 0    💬 0    📌 0

But I could chalk that up to your point on threads of independent reasoning.

Appreciate you coming down this rabbit hole with me lol.

03.01.2025 17:20 | 👍 1    🔁 0    💬 1    📌 0

Yup, that exactly. I suppose sometimes I wonder at what point do preserved high-level trends become a product of excellent reproducibility, vs an artifact of an elbow plot that I decided should be cut off at PC15 vs another group’s PC4.

03.01.2025 17:19 | 👍 2    🔁 0    💬 1    📌 0
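
The PC cut-off question can be made concrete with a minimal Python sketch. It assumes a hypothetical list of explained-variance ratios (made up for illustration, not taken from any dataset mentioned in this thread) and applies two common heuristics to the same values. The function names and the 0.9 threshold are illustrative assumptions, not a standard API.

```python
def cutoff_by_cumulative(variance_ratios, threshold=0.9):
    """Smallest number of PCs whose cumulative explained variance reaches `threshold`."""
    total = 0.0
    for n_pcs, ratio in enumerate(variance_ratios, start=1):
        total += ratio
        if total >= threshold:
            return n_pcs
    return len(variance_ratios)


def cutoff_by_largest_drop(variance_ratios):
    """Number of PCs kept before the largest drop between consecutive PCs (a crude 'elbow')."""
    drops = [
        variance_ratios[i] - variance_ratios[i + 1]
        for i in range(len(variance_ratios) - 1)
    ]
    return drops.index(max(drops)) + 1


# Hypothetical explained-variance ratios, invented for this example.
ratios = [0.25, 0.22, 0.20, 0.08, 0.05, 0.04, 0.04, 0.03, 0.03, 0.02, 0.02, 0.02]

elbow_pcs = cutoff_by_largest_drop(ratios)
cumulative_pcs = cutoff_by_cumulative(ratios)
print(elbow_pcs, cumulative_pcs)
```

On these made-up values the two rules disagree widely (3 PCs for the largest-drop rule, 8 for the 90% cumulative rule), which is the kind of analyst-dependent choice the post describes.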

I find there’s so much left to chance: random seed, nuances in software/tooling, personal preference of parameterization etc.

03.01.2025 16:44 | 👍 1    🔁 0    💬 1    📌 0
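
As a small illustration of the random-seed point, here is a minimal Python sketch using only the standard library. The data and the `subsample_mean` helper are hypothetical stand-ins for any stochastic analysis step. It shows that passing an explicit seed to a local generator makes the step repeatable, while a different seed will generally give a different answer.

```python
import random


def subsample_mean(values, k, seed):
    """Mean of a random subsample of size k, drawn from a generator with an explicit seed."""
    rng = random.Random(seed)  # local generator, so no hidden global state is involved
    return sum(rng.sample(values, k)) / k


# Hypothetical stand-in for per-cell measurements.
values = list(range(1000))

run_a = subsample_mean(values, 50, seed=0)
run_b = subsample_mean(values, 50, seed=0)  # same seed: identical subsample
run_c = subsample_mean(values, 50, seed=1)  # different seed: usually a different subsample

print(run_a == run_b)
print(run_a, run_c)
```

Recording the seed alongside the result makes a single run repeatable. It does not settle whether a conclusion holds across seeds, which is the harder question raised in this thread.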

I do think there’s a distinction to be made between reproducing code, and reproducing biological insights. To your point, reproducing code is relatively simple. But reproducing analyses to yield similar insight?

03.01.2025 16:43 | 👍 1    🔁 0    💬 1    📌 0

Solid point. Often reproducibility is challenging not even between independent research groups, but simply iterating on a single experiment within a lab.

03.01.2025 16:41 | 👍 1    🔁 0    💬 1    📌 0

Right. Confirming accuracy of things is somewhat easier in a wet lab. If your cells die, you probably did something wrong. The qualitative confirmation is usually explicit, and re-tracing your steps is structured.

What methods do you take to support deterministic results when doing deep learning?

03.01.2025 16:01 | 👍 1    🔁 0    💬 1    📌 0
GitHub - aneesav/singlecell: Reproducing figures in reputable scientific papers with open-source data

How many people actually try to reproduce figures in scientific papers?

How many authors publish clear, robust methods?

And how do you reliably infer accuracy of reproduction in a computational space where “random seed” reigns?

github.com/aneesav/sing...

03.01.2025 15:11 | 👍 5    🔁 0    💬 2    📌 0

You don’t have 1000+ citations in <24hrs? You’re clearly not locked in.

01.01.2025 16:01 | 👍 1    🔁 0    💬 1    📌 0
Post image Post image Post image

Everybody’s got an “unwrapped” now. First Spotify, then Apple, now LinkedIn.

Here’s to a dynamic 2025: more writing, coding, teaching, learning.

More doing.

Happy New Year’s Eve!

31.12.2024 14:44 | 👍 0    🔁 0    💬 0    📌 0
Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft The project’s leader says that allowing everyone to access the collection of public-domain books will help “level the playing field” in the AI industry.

Why release another open-source model that’s actually not as accessible as folks might think? How about releasing resources to enable training instead?

Good stuff, Harvard.

www.wired.com/story/harvar...

12.12.2024 12:59 | 👍 0    🔁 0    💬 0    📌 0
A Scientist’s Lament: Do You Hate Command Line Too? “Science doesn’t happen in a notebook.” Here’s why.

“Science doesn’t happen in a notebook.”

That little black box was the bane of my existence in grad school. GUIs >>>.

But if you find command line intimidating like I did, hopefully this piece I wrote can help you get started: medium.com/@aneesav/a-s...

11.12.2024 23:32 | 👍 2    🔁 0    💬 0    📌 0
New report highlights the scientific impact of open source software Two of the scientists who won this year’s Nobel Prize for cracking the code of proteins’ intricate structures relied, in part, on a series of computing

Then, are the 1% of scientists that sit at the bleeding edge of scientific discovery truly the sole contributors to said discovery?

It would seem that those pioneering new methods are piggy-backing off the work of their predecessors.

Science is a team effort.

www.statnews.com/sponsor/2024...

04.12.2024 16:00 | 👍 0    🔁 0    💬 0    📌 0

Listened to an interview last week that made me think.

What considerations does one need to implement from ideation, when building a biology product that uses ML, vs an ML product for biological use?

Naturally, these are 2 very different goals. But I wonder how do you approach dev for either?

02.12.2024 15:03 | 👍 0    🔁 0    💬 0    📌 0
Video thumbnail

Our new study shows that SARS-CoV-2 spike protein accumulates & persists in the body for years after infection, especially in the skull-meninges-brain axis, potentially driving long COVID. mRNA vaccines help but cannot stop it 🔬🧠🦠🧵 Your weekend read 👇
@cellpress.bsky.social
cell.com/cell-host-mi...

29.11.2024 16:01 | 👍 743    🔁 381    💬 56    📌 63

Fascinating. I wonder what level of effort assembling an analogous training “pangenome” dataset (sampled across every demographic) might require. Certainly is sorely needed, both in pure AI and AI for better patient outcomes.

Thanks for sharing Tim.

29.11.2024 16:59 | 👍 1    🔁 0    💬 1    📌 0

Not obsolete at all, and won’t be for the foreseeable future. We need more representative reference genomes.

But I do hope the “AGI 2025” folks see this 😅

29.11.2024 16:37 | 👍 1    🔁 0    💬 1    📌 0

Not obsolete at all, and won’t be for the foreseeable future. We need more representative reference genomes.

But I do hope the “AGI 2025” folks see this 😅

29.11.2024 16:34 | 👍 0    🔁 0    💬 0    📌 0
Post image

After this I gave it additional context by saying, “I’m a woman, try again.” This was the output:

29.11.2024 16:17 | 👍 1    🔁 0    💬 2    📌 0
Post image

I asked ChatGPT to generate an image of what it thinks I look like, based on everything it knows about me.

Needless to say, we’ve got some work to do still.

#ai #bias

29.11.2024 16:16 | 👍 2    🔁 0    💬 1    📌 0

I made a starter pack for algorithmic genomics. It's certainly incomplete, but already has a ton of awesome peeps. Let me know if you know people I should add (with a focus on algorithms and data structures in genomics)

go.bsky.app/TRWCnZs

12.11.2024 14:03 | 👍 141    🔁 65    💬 25    📌 2

Ok, I tried to create my own list of people working on developing statistical or machine learning models applied to omics data. I am sure I missed a lot of cool people. If you'd like to be added, let me know. #Stats #ML #Omics
go.bsky.app/73rcuJn

24.11.2024 07:50 | 👍 95    🔁 36    💬 38    📌 4

TechBio Transformers (founded by @dr-alphalyrae.bsky.social) is a growing global community of folks who want to (+)ly impact the field of technology & biology, while also intentionally building healthy dialogue and community

Follow along, & even consider joining the group!
bsky.app/starter-pack...

25.11.2024 18:15 | 👍 26    🔁 9    💬 2    📌 1

Such cool renderings. Do you host these anywhere? Would love to share 1 or 2 but want to be able to give you the appropriate credit.

26.11.2024 19:34 | 👍 0    🔁 0    💬 0    📌 0
Video thumbnail

I sometimes forget how flexible proteins are. It becomes really apparent when you use NMR states for an animation. That poor cofactor is getting pushed around a lot 😅

#sciart #blender3d #biocatalysis

25.11.2024 17:15 | 👍 477    🔁 82    💬 13    📌 9

Here's a starter pack of Black Scientists and Organizations (currently STEMM disciplines)! go.bsky.app/Ao3Qt9a

12.11.2024 19:03 | 👍 181    🔁 91    💬 26    📌 3
Cookiecutter for Computational Molecular Sciences: A Best Practices Ready Python Project Generator Scientific software development takes far more than good programming abilities and scientific reasoning. Concepts such as version control, continuous integration, packaging, deployment, automatic documentation compiling, licensing, and even file structure are not traditionally taught to scientific programmers. The skill gap leads to inconsistent code quality and difficulty deploying products to the broader audience. Most of the implementation of these skills however can be constructed at project inception. The Cookiecutter for Computational Molecular Sciences generates ready-to-go Python projects that incorporate all of the concepts above from a single command. The final product is then a software project which lets developers focus on the science and minimizes worries about nonscientific and nonprogramming concepts because the best practices, as established by the Molecular Sciences Software Institute, have already been incorporated for them. This is a community driven project with widespread adoption across the computational molecular sciences. The Molecular Sciences Software Institute and Computational Molecular Sciences community also continually contribute and update the Cookiecutter for Computational Molecular Science, ensuring that the project is responsive to community needs and tool updates. All are welcome to suggest changes and contribute to making this the best starting point for Python-based scientific code.

Students aren’t taught software dev best practices in their formal comp bio training.

Cookiecutter helps by generating a directory on a user’s computer with all of the configurations and file structures needed.

pubs.acs.org/doi/10.1021/...

25.11.2024 20:02 | 👍 1    🔁 0    💬 0    📌 0

Humans share 70% of our DNA with zebrafish. So when you're having difficulty getting anything done, it's usually because a zebrafish is using the DNA.

12.02.2024 22:02 | 👍 2125    🔁 572    💬 24    📌 26
Deep learning-based predictions of gene perturbation effects do not yet outperform simple linear methods Advanced deep-learning methods, such as transformer-based foundation models, promise to learn representations of biology that can be employed to predict in silico the outcome of unseen experiments, su...

Shocker.

Deep learning is cool and all, but simpler methods still do the job. And quite well it seems.

www.biorxiv.org/content/10.1...

24.11.2024 21:49 | 👍 3    🔁 0    💬 0    📌 0

@aneesavalentine is following 20 prominent accounts