Fig. 1 | Schematic of the OrthologTransformer model and downstream selection. a Input: a coding DNA sequence from Species A (source), prepended with a source_species token ssrc, is encoded. Output: the decoder, conditioned by a target_species token stgt, generates an orthologous coding sequence for Species B, permitting synonymous and conservative non-synonymous substitutions and indels where supported by ortholog supervision. The model features a 20-layer encoder-decoder structure, with each layer equipped with Add & Normalization layers and Multi-head Attention mechanisms. Species tokens (ssrc, stgt) are prepended to the input sequence, enabling species-specific sequence conversion. b OrthologTransformer employs a two-stage learning approach consisting of pretraining and fine-tuning. In the pretraining phase, the model learns general sequence conversion patterns from many-to-many orthologous relationships across multiple species. In the fine-tuning phase, the model is specialized for specific one-to-one species pair conversions using targeted training data. c During candidate selection, a multi-objective Monte Carlo Tree Search (MCTS) routine jointly optimizes GC content and mRNA secondary-structure stability (MFE).
Fig. 5 | Predicted structures, global and local structural conservation, and sequence-level properties of AI-designed PETase variants. a Predicted tertiary structures of twelve different PETase variants (AI-S1–AI-L5) generated by OrthologTransformer with various degrees of sequence modifications. The wild-type PETase structure (PDB entry 5XJH) is shown on the left for reference. The four numbers below each structure denote the counts of the modifications introduced in each variant in the following order: insertions/deletions/synonymous substitutions/non-synonymous substitutions. b Global and local structural conservation of AI-designed PETase variants. TM-score (global fold similarity), predicted structural stability, backbone RMSD, and per-residue pLDDT are shown for AI-designed variants (AI-S1–AI-L5), wild-type (WT), and codon-optimized (CO). The AI-designed variants, particularly those trained on broader datasets, achieved a favorable balance across these measures, indicating preservation of the PETase fold while permitting small, evolution-consistent modifications and highlighting the benefit of multi-objective optimization relative to conventional codon optimization.
c Sequence-level properties. GC content and RNA secondary-structure free energy (ΔG) among AI-S1–AI-L5, WT, and CO are shown. The AI-designed variants converge toward the GC composition of B. subtilis (target host), whereas the wild-type I. sakaiensis PETase gene is substantially more GC-rich (~66.7%). The AI-designed sequences also exhibit favorable mRNA secondary-structure energetics. Source data for (b, c) is available in the Source Data file.
Cross-species gene redesign leveraging ortholog information and generative modeling
doi.org/10.1038/s414...
06.03.2026 20:51
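For readers who want to check the GC-content comparison in the caption above on their own sequences, here is a minimal sketch. The function name `gc_content` and the toy sequence are illustrative assumptions and are not taken from the paper; ambiguous bases such as N are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


# Hypothetical 12-nt toy sequence, not a real PETase fragment.
toy = "ATGGCCGCGTAA"
print(f"GC content: {gc_content(toy):.1%}")
```

Comparing this value for a redesigned gene against the typical GC content of the target host is the kind of check the caption describes; the MFE term would need an RNA folding tool and is not sketched here.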
The ISCB platform is an excellent place to advertise and find positions in computational biology!
careers.iscb.org/jobs
04.03.2026 14:31
This looks absolutely great. For those of us interested in pangenomes, I am sure this will be a super place to get data and the interface is very clean (plotly). Congrats to the authors (I don't know if they are on bsky): pangbank.genoscope.cns.fr
04.03.2026 10:50
Excited to share our pre-print on the curation of a new bioactivity dataset for metabolic transformations. 🧪🧑‍💻 We were surprised to find that roughly a quarter of our drug-metabolite-target combinations contain metabolites with retained or increased bioactivity relative to the parent drugs! #chemsky
02.03.2026 10:47
Fig. 1 | FANTASIA pipeline overview. Input proteomes are preprocessed to remove sequences based on length and sequence similarity if needed. Then, embeddings are computed, and embedding similarity is calculated against the reference database (using either of two distance metrics, at the user's choice). Optionally, the pipeline converts the standard GOPredSim output file to the input file format for topGO to facilitate its application in a wider biological workflow.
FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life
www.nature.com/articles/s42...
01.03.2026 17:58
A very good list of Computational Biology Conferences and their deadlines!
databio.org/conferences/
27.02.2026 14:14
Now out in @natcomms.nature.com :
versions 2.0 of both BiG-SCAPE and BiG-SLiCE! With significant speed and accuracy increases, as well as new interactive functionalities.
Read the full paper here #openaccess:
www.nature.com/articles/s41...
26.02.2026 12:18
Figure 1: Summary of TF-MoDISco
Figure 3: Continuous Jaccard similarity is preferable to cross-correlation for matching seqlets. Green checkmarks indicate matching positions.
TF-MoDISco: Transcription Factor Motif Discovery from
Importance Scores (2017) arxiv.org/abs/1811.00416
YouTube: youtube.com/watch?v=fXPGVJg956E
GitHub: github.com/kundajelab/t...
23.02.2026 12:54
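To make the "continuous Jaccard similarity" in the Figure 3 caption above concrete, here is a rough sketch of one signed generalization of Jaccard similarity to real-valued score vectors: the sum of element-wise minima of the absolute values (negated where the signs disagree) divided by the sum of element-wise maxima. This formula, the function name `continuous_jaccard`, and the flat-list input are assumptions for illustration; the exact definition and the alignment over offsets that TF-MoDISco uses should be checked against the paper.

```python
def continuous_jaccard(x, y):
    """Signed continuous Jaccard similarity of two equal-length score vectors.

    Assumed definition (illustrative, check against the paper):
        sum(min(|x_i|, |y_i|) * sign(x_i) * sign(y_i)) / sum(max(|x_i|, |y_i|))
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(x, y):
        # Positions with opposite signs count against the match.
        agreement = 1.0 if a * b >= 0 else -1.0
        numerator += agreement * min(abs(a), abs(b))
        denominator += max(abs(a), abs(b))
    # Two all-zero vectors carry no signal; return 0.0 by convention here.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical flattened importance scores for two seqlets.
print(continuous_jaccard([1.0, 2.0, 0.0], [1.0, 1.0, 0.0]))
```

Unlike a raw cross-correlation, this score is bounded between -1 and 1 and does not grow simply because one seqlet has larger scores overall.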
Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...
21.02.2026 17:13
Structural Variants ESEB Special Topic Network
🧬 We are launching STRiVE, a @eseb.bsky.social Special Topic Network on the evolutionary role of structural genomic variation.
Std:
29/04: Online seminar w/ L. Rieseberg
8-10/07: Kick-off in Porto
Join us: structuralvariantsstn.github.io #Evolution #Genomics #StructuralVariants #Biology #PopGen
20.02.2026 11:49
Vacancies
My university (Chalmers University of Technology in 🇸🇪) is recruiting an assistant professor in data-driven cell & molecular biology, funded by the DDLS program @scilifelab.se #chemsky #facultychemjobs
The position comes with a nice start-up package
www.chalmers.se/en/about-cha...
19.02.2026 14:37
Come join us again for the next round of this massive online open science community effort!
Sign up using the link in the thread.
It's great fun, and really helps the scientific community. What more can you ask?
20.02.2026 05:20
kache-hash: A dynamic, concurrent, and cache-efficient hash table for streaming k-mer operations https://www.biorxiv.org/content/10.64898/2026.02.13.705625v1
17.02.2026 05:47
COMBINE-lab - The skeptic's guide to generative AI assisted coding
I've written a post about my recent experiences (successes) with AI coding models; the experiences that caused me to re-evaluate my initial judgements, the surprise I had at what can be accomplished, & some fears I have about these tools. Discussion welcome! combine-lab.github.io/blog/2026/02...
15.02.2026 04:31
Come to Ascona and attend talks from
Maria Brbic
Charlotte Bunne
Faisal Mahmood
Dana Pe'er
Barbara Engelhardt
Caroline Uhler
Julien Gagneur
Marinka Zitnik
Julie Josse (INRIA)
Basile Wicky
Fabian Fröhlich
with a beautiful view of the lake in the Swiss Alps! ascona2026.sciencesconf.org
16.01.2026 11:00
If you are a scientist, working on biology, wondering where to submit your manuscript given the current issues with the academic publishing system, check out wheretopublish.github.io!
We did this thinking change is possible. Let's make it happen!
11.02.2026 20:31
Figure 1. The genome is enriched with active promoters relative to random DNA.
(A) We cloned the random library of 150 bp N-mer sequences (n=17,129, purple), and the genomic library of 100-300 bp sequences (n=91,866, magenta) into the dual-reporter plasmid MR1 (pMR1), which drives the expression of green fluorescent protein (GFP, teal) from inserts on the top DNA strand, and that of red fluorescent protein (RFP, orange) on the bottom strand. We transformed E. coli cells with the plasmid libraries. (B) We sorted the bacterial libraries into fluorescence bins at four fluorescence strengths: none, weak, moderate, and strong for both GFP and RFP (eight bins total) with a cell-sorter. We bulk-sequenced the library inserts from each bin and calculated fluorescence scores in arbitrary units (a.u.) ranging between one (none) and four (strongest) (Methods). (C) The probability that a DNA sequence in the random (purple) and genomic (magenta) libraries is a promoter relative to its AT-content. (D) For 102 position-weight matrices (PWMs) for transcription factors and sigma (σ) factors, we plot the percentage of sequences in each library (purple: random, magenta: genome) that encode at least one putative factor binding site (vertical axis) against the respective PWM's information content in bits. We test for equality of the frequency distributions between the random and genome libraries with a paired t-test (p=7.48×10⁻¹²).
De-novo promoters emerge more readily from random DNA than from genomic DNA
www.biorxiv.org/content/10.1...
10.02.2026 11:17
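Panel D of the caption above plots against each PWM's information content in bits. As a hedged sketch of how that quantity is commonly computed for DNA motifs: each position contributes 2 bits minus its Shannon entropy, assuming a uniform 0.25 background over A, C, G and T. The function name, the row-per-position layout and the uniform background are assumptions here; the preprint may use a different background or pseudocounts.

```python
import math


def pwm_information_content(pwm):
    """Total information content in bits of a DNA position-probability matrix.

    `pwm` is a list of positions, each a list of four probabilities (A, C, G, T).
    Assumes a uniform background, so each position contributes 2 - entropy bits.
    """
    total_bits = 0.0
    for column in pwm:
        if len(column) != 4 or abs(sum(column) - 1.0) > 1e-6:
            raise ValueError("each position needs four probabilities summing to 1")
        # Shannon entropy of the position; zero-probability bases contribute nothing.
        entropy = -sum(p * math.log2(p) for p in column if p > 0)
        total_bits += 2.0 - entropy
    return total_bits


# Hypothetical 3-position motif: one fixed base, one two-way choice, one uninformative.
toy_pwm = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]
print(pwm_information_content(toy_pwm))
```

Low-information motifs match many sequences by chance, which is why the percentage of sequences with at least one putative site is plotted against this value.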
Home | Timothy Fuqua
I'm looking for a Swiss department to host me for an SNSF Starting Grant.
I research how gene expression evolves and emerges by combining wet lab + computational work in a variety of model systems (E. coli, Drosophila, yeast). More: timothyfuqua.com
If your department might be a match, let's chat!
10.02.2026 10:42
Introducing The Structural History of Eukarya (SHE): The first proteome-scale phylogeny constructed entirely from 3D structure.
We computed 300 trillion alignments across 1,542 species to map the tree of life. 🧵 (1/5)
07.02.2026 08:50
Compbio Asia
Please spread the word:
We invite applications to a two-week Computational Biology workshop in Singapore, June 14-27.
This NSF-funded workshop brings together 16-20 US grad students with international peers.
Apply by March 21: compbioasia.net
🧵 Details below:
05.02.2026 17:22
Biodiversity Bioinformatics Summer School
This School is co-organized by SIB/ELIXIR Switzerland and de.NBI/ELIXIR Germany
#biodiversity #bioinformatics Summer School announcement ... 21-26 June in Siegen, Germany co-organised by @sib.swiss & @denbi.bsky.social www.sib.swiss/training/cou...
🟢 eDNA & ecosystems
🟣 pangenome diversity
🔵 population genetics
🟡 comparative genomics
07.02.2026 12:07
🚨🧪 Announcing our #ICLR2026 Workshop, Generative AI in Genomics (Gen2): Barriers and Frontiers! @iclr-conf.bsky.social
📣 Call for: Full workshop papers (5-8 pages) and Tiny papers (2-4 pages)
📅 Submission deadline: 7 February 2026 AoE
Learn more: genai-in-genomics.github.io
(1/7)
12.01.2026 03:15
Kipoi
Join us for our next Kipoi Seminar with Jun Cheng, DeepMind
AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model
📅 Wed Feb 4, 5:30pm CET
🧬 kipoi.org/seminar
@kipoizoo.bsky.social
23.01.2026 12:20
Does the noncoding genome actually carry more genetic information than coding seqs? Motivated by this question we mutated every bp in the 10kb MYC locus. Results are even more exciting: Decoding the MYC locus reveals a druggable ultraconserved RNA element www.biorxiv.org/content/10.6...
31.01.2026 01:13
Figure showing the workflow for GLADE. GLADE takes an OrthoFinder results folder as input, infers and maps evolutionary events (gains, losses, duplications), and reconstructs ancestral gene content.
GLADE takes a fully phylogenetic approach.
It uses orthogroups, gene trees, and the species tree to infer gains, losses, and duplications, and to map each event onto the phylogeny.
(4/10)
29.01.2026 12:10