@miangoar.bsky.social
Biologist who navigates the oceans of diversity through space-time. Protein evolution, metagenomics, AI/ML/DL. Website: https://miangoaren.github.io/
Proteins are dynamic structures, but structural biology often shows them as static snapshots. Inspired by long-exposure photography and generative art, I built ProteinCHAOS, an artistic tool inspired by molecular dynamics to capture protein flexibility over time, much like long-exposure images.
23.11.2025 23:18
A table showing profit margins of major publishers. A snippet of text related to this table is below. 1. The four-fold drain 1.1 Money Currently, academic publishing is dominated by profit-oriented, multinational companies for whom scientific knowledge is a commodity to be sold back to the academic community who created it. The dominant four are Elsevier, Springer Nature, Wiley and Taylor & Francis, which collectively generated over US$7.1 billion in revenue from journal publishing in 2024 alone, and over US$12 billion in profits between 2019 and 2024 (Table 1A). Their profit margins have always been over 30% in the last five years, and for the largest publisher (Elsevier) always over 37%. Against many comparators, across many sectors, scientific publishing is one of the most consistently profitable industries (Table S1). These financial arrangements make a substantial difference to science budgets. In 2024, 46% of Elsevier revenues and 53% of Taylor & Francis revenues were generated in North America, meaning that North American researchers were charged over US$2.27 billion by just two for-profit publishers. The Canadian research councils and the US National Science Foundation were allocated US$9.3 billion in that year.
A figure detailing the drain on researcher time. 1. The four-fold drain 1.2 Time The number of papers published each year is growing faster than the scientific workforce, with the number of papers per researcher almost doubling between 1996 and 2022 (Figure 1A). This reflects the fact that publishers' commercial desire to publish (sell) more material has aligned well with the competitive prestige culture in which publications help secure jobs, grants, promotions, and awards. To the extent that this growth is driven by a pressure for profit, rather than scholarly imperatives, it distorts the way researchers spend their time. The publishing system depends on unpaid reviewer labour, estimated to be over 130 million unpaid hours annually in 2020 alone (9). Researchers have complained about the demands of peer-review for decades, but the scale of the problem is now worse, with editors reporting widespread difficulties recruiting reviewers. The growth in publications involves not only the authors' time, but that of academic editors and reviewers who are dealing with so many review demands. Even more seriously, the imperative to produce ever more articles reshapes the nature of scientific inquiry. Evidence across multiple fields shows that more papers result in "ossification", not new ideas (10). It may seem paradoxical that more papers can slow progress until one considers how it affects researchers' time. While rewards remain tied to volume, prestige, and impact of publications, researchers will be nudged away from riskier, local, interdisciplinary, and long-term work. The result is a treadmill of constant activity with limited progress whereas core scholarly practices – such as reading, reflecting and engaging with others' contributions – is de-prioritized. What looks like productivity often masks intellectual exhaustion built on a demoralizing, narrowing scientific vision.
A table of profit margins across industries. The related text is the same Section 1.1 (Money) passage quoted with the publisher profit-margin table above.
The costs of inaction are plain: wasted public funds, lost researcher time, compromised scientific integrity and eroded public trust. Today, the system rewards commercial publishers first, and science second. Without bold action from the funders we risk continuing to pour resources into a system that prioritizes profit over the advancement of scientific knowledge.
We wrote the Strain on scientific publishing to highlight the problems of time & trust. With a fantastic group of co-authors, we present The Drain of Scientific Publishing:
a 🧵 1/n
Drain: arxiv.org/abs/2511.04820
Strain: direct.mit.edu/qss/article/...
Oligopoly: direct.mit.edu/qss/article/...
Source
Protein sequence-to-structure learning: Is this the end(-to-end revolution)?
onlinelibrary.wiley.com/doi/abs/10.1...
I strongly recommend making cat-based diagrams to illustrate complex topics in protein science: "Figure 4 considers [...] invariance and equivariance with respect to translations and rotations in 3D. For illustration purposes, the figure includes a series of cat cartoons in 2D."
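A minimal sketch of the distinction the quoted figure illustrates, in 2D like the cat cartoons. The "cat" coordinates, the rotation angle, the shift and all function names below are made up for illustration and are not taken from the paper. Pairwise distances are invariant (unchanged by a rotation plus translation), while the centroid is equivariant (it moves exactly as the points do).

```python
import math

def rotate(p, theta):
    """Rotate a 2D point about the origin by theta radians."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)

def transform(points, theta, shift):
    """Apply a rotation followed by a translation to every point."""
    rotated = (rotate(p, theta) for p in points)
    return [(rx + shift[0], ry + shift[1]) for rx, ry in rotated]

def pairwise_distances(points):
    """Invariant feature: distances do not change under rigid motions."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]

def centroid(points):
    """Equivariant feature: the centroid moves along with the points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def close(u, v, tol=1e-9):
    """Compare two equally long sequences of floats within a tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))

# Hypothetical 2D "cat" outline (toy coordinates, illustration only).
cat = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.5), (1.0, 2.5), (0.0, 1.5)]
theta, shift = math.pi / 3, (4.0, -1.0)
moved = transform(cat, theta, shift)

# Invariance: f(T(x)) == f(x)
invariant_ok = close(pairwise_distances(cat), pairwise_distances(moved))
# Equivariance: f(T(x)) == T(f(x))
equivariant_ok = close(centroid(moved), transform([centroid(cat)], theta, shift)[0])
print(invariant_ok, equivariant_ok)
```

Roughly speaking, a structure-prediction model wants invariant inputs or internal features (so the answer does not depend on how the protein is oriented) and equivariant coordinate outputs (so rotating the input rotates the predicted structure).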
24.10.2025 05:48
I just want to create hype and say that I made a 10-class course to introduce people to AI-driven protein design. It's around 750 slides and will be freely available for anyone who wants to use them and, most importantly, improve them. Stay tuned :)
09.10.2025 17:59
I'm not sure. LinkedIn makes me cringe, it feels so inorganic to me. On the other hand, my Twitter algorithm recommends really good stuff about proteins, microbes, and AI. In contrast, the algorithm of 🦋 is bad :( and the good content (i.e., science) mostly comes from reposts by our colleagues.
17.09.2025 19:53
Happy to share Piecing Together the History of Protein Folds From a Fragmented Evolutionary Record 🧪
It appears as part of a Special Section in @genomebiolevol.bsky.social organized by @cpuentelelievre.bsky.social @proteinmechanic.bsky.social and J. Douglas
doi.org/10.1093/gbe/...
🫡
12.09.2025 01:10
This is a very cool ancestral reconstruction study by @krishnareddy.bsky.social et al. that I recommend reading! @rachellegaudet.bsky.social and I thought it was so interesting that we wrote a News & Views about it, check it out: rdcu.be/eCfyl
26.08.2025 04:28
If you're an undergrad and want to intern with me, this is where you need to apply!
13.09.2025 10:42
Structural bioinformatics is incredibly powerful on its own or when paired with theory or experiment. One of the PDB's superpowers isn't from one structure, but comparing many to uncover folds, binding sites, and subtle conformational shifts. chemrxiv.org/engage/chemr...
11.09.2025 14:28
This is a breakthrough for protein science 🔥 AFAIK this is the largest protein DB, with >100B seqs (3B clustered at 50%). New biology will come from LOGAN: new folds, topologies, etc. You can also improve your AlphaFold models by building better MSAs. Future AI models will also use LOGAN for training
05.09.2025 16:52 β π 4 π 1 π¬ 0 π 0ππ©βπ¬ For 15+ years biology has accumulated petabytes (million gigabytes) ofπ§¬DNA sequencing data𧬠from the far reaches of our planet.π¦ ππ΅
Logan now democratizes efficient access to the worldβs most comprehensive genetics dataset. Free and open.
doi.org/10.1101/2024...
13/13
BindCraft
nature.com/articles/s41...
A recent conference about BindCraft
youtube.com/watch?v=qQih...
Boltzdesign1
www.biorxiv.org/content/10.1...
12/13 BindCraft started as a binder design tutorial for the Boston Protein Design and Modeling Club, and it evolved into one of the most promising tools in AI-based protein design. And importantly, it is open-source!
Congrats to all the authors!
11/13 The authors have gone a step further and are currently developing BoltzDesign1, which instead of designing binders, focuses on biomolecular interactions between proteins and small molecules. However, one of the main limitations of both AIs is their high computational cost.
27.08.2025 19:54
10/13 BindCraft's capabilities have also been validated by other labs and, perhaps most notably, in an international binder design competition organized by a company called AdaptyvBio, where BindCraft won.
adaptyvbio.com/blog/po104
9/13 The most important result IMO was the determination of atomic structures of four binders, where in all cases the computational designs were highly consistent with the experimentally determined ones.
27.08.2025 19:54
8/13 They designed binders targeting:
*proteins with no known binding sites
*membrane proteins, which are much harder than intra/extra-cellular proteins
*proteins lacking evolutionary information
*proteins that interact with DNA/RNA
*medically relevant proteins such as those causing allergies
7/13 Then it uses ProteinMPNN to optimize for solubility, increasing the chances of experimental success. Finally, it uses AF2 to predict the structure. To demonstrate BindCraft's utility, the authors carried out many wet-lab experiments, something not as common as I would like.
27.08.2025 19:54
6/13 BindCraft takes advantage of this by first proposing a random seq and predicting its structure to assess how well it interacts with the target protein. It then uses info from each interaction, successful or not, to optimize the seqs until it arrives at a credible interaction
27.08.2025 19:54
5/13 BindCraft builds on AlphaFold2, specifically AF-Multimer, which predicts the structure of protein complexes. Having been trained on thousands of structures, AF-Multimer learned to identify which sites are most likely to form protein–protein interactions.
27.08.2025 19:54
4/13 BindCraft designs both the sequence and structure of binders, achieving a success rate between 10% and 100%; the range is wide because large or complex binders are more challenging to design. This is enormous, considering that our previous best physics/biochemistry-based methods reached about 0.1%.
27.08.2025 19:54
3/13 We have learned how to design PPI so that one protein, called a binder, can bind to another and regulate it; e.g., some cancer drugs are binders. However, designing binders requires yrs of research and detailed biomolecular knowledge. So, what if we teach an AI to design binders?
27.08.2025 19:54
2/13 Proteins carry out many functions on their own, but when they interact with each other, they generate a diversity of mechanisms that expand and regulate those functions. PPI arose over millions of years of evolution, giving rise to processes as complex as metabolism.
27.08.2025 19:54
1/13 🧵 Today, BindCraft, one of the most famous AIs in biology for designing protein–protein interactions (PPI), was published in @nature.com. In my opinion, BindCraft represents one of the most important advances in the post-AlphaFold2 era.
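The propose, score, optimize loop described in posts 5/13 to 7/13 can be illustrated with a toy sketch. This is not BindCraft's code and does not call AlphaFold2 or ProteinMPNN: `toy_interface_score` is a hypothetical stand-in for a structure predictor's confidence in the complex, the target string is invented, and plain random-mutation hill climbing is swapped in for whatever optimizer the real tool uses.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_interface_score(binder, target):
    """Hypothetical stand-in for a predictor's confidence in the complex.

    Rewards opposite charges or two hydrophobic residues facing each other.
    A real pipeline would score a predicted 3D complex instead.
    """
    hydrophobic, positive, negative = set("AVILMFWY"), set("KR"), set("DE")
    score = 0
    for b, t in zip(binder, target):
        if (b in positive and t in negative) or (b in negative and t in positive):
            score += 2
        elif b in hydrophobic and t in hydrophobic:
            score += 1
    return score

def design_binder(target, steps=2000, seed=0):
    """Start from a random sequence and keep mutations that do not lower the score."""
    rng = random.Random(seed)
    binder = [rng.choice(AMINO_ACIDS) for _ in target]
    start = "".join(binder)
    best = toy_interface_score(binder, target)
    for _ in range(steps):
        pos = rng.randrange(len(binder))
        old = binder[pos]
        binder[pos] = rng.choice(AMINO_ACIDS)   # propose a point mutation
        new = toy_interface_score(binder, target)
        if new >= best:
            best = new                          # accept: not worse than before
        else:
            binder[pos] = old                   # reject: revert the mutation
    return start, "".join(binder), best

# Invented "target surface" for illustration, not a real protein.
target = "DEKRAVILDEKRMFWY"
start, designed, score = design_binder(target)
print(start, "->", designed, "score:", score)
```

In the workflow the thread describes, the optimized candidate would then be passed to ProteinMPNN for solubility-oriented sequence optimization and re-predicted with AF2 before any wet-lab test; both steps are omitted here.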
"We find that the ECOD and CATH provide the most extensive structural coverage of the PDB. ECOD and SCOPe have the most consistent domain boundary conditions, whereas CATH and SCOP2 both differ significantly."
pubmed.ncbi.nlm.nih.gov/34179613/
However, that was before the AF2 explosion
AFAIK PFAM was recently harmonized with ECOD
pubmed.ncbi.nlm.nih.gov/39565196/
pubmed.ncbi.nlm.nih.gov/39540428/
pubmed.ncbi.nlm.nih.gov/39197652/
And CATH has been massively expanded with the Encyclopedia of Domains (~7k putative novel folds)
pubmed.ncbi.nlm.nih.gov/39565206/
Does anyone know of a recent comparison of the main structural classification schemes of proteins and guidance on when to choose one? Something like this but including ECOD and perhaps seq-based schemes like Pfam, SUPERFAMILY and CDD.
Img source (2020)
pubmed.ncbi.nlm.nih.gov/32302382/