Sina Majidian

@sinamajidian.bsky.social

Hiring PhDs/Postdocs CGRLab.github.io | Incoming DDLS Assistant Professor, Data Science and AI division, Chalmers University | Pangenomics for studying genome variation and evolution | Formerly at JHU-LangmeadLab, UNIL-DessimozLab, WUR

1,408 Followers  |  1,306 Following  |  161 Posts  |  Joined: 17.11.2023

Posts by Sina Majidian (@sinamajidian.bsky.social)

Multi-context seeds enable fast and high-accuracy read mapping - Genome Biology A key step in sequence similarity search is to identify shared seeds between a query and a reference sequence. A well-known tradeoff is that longer seeds offer fast searches but reduce sensitivity in ...

1/ Our paper on Multi-Context Seeds is now out, with @tolyan.bsky.social spearheading the work and contributions from Nicolas and @marcelm.net. We introduce a new seeding concept that improves read alignment accuracy while maintaining speed.
link.springer.com/article/10.1...

09.03.2026 12:22 - 👍 14    🔁 11    💬 1    📌 0
Fig. 1 | Schematic of the OrthologTransformer model and downstream selection. a Input: a coding DNA sequence from Species A (source), prepended with a source_species token ssrc, is encoded. Output: the decoder, conditioned by a target_species token stgt, generates an orthologous coding sequence for Species B, permitting synonymous and conservative non-synonymous substitutions and indels where supported by ortholog supervision. The model features a 20-layer encoder-decoder structure, with each layer equipped with Add & Normalization layers and Multi-head Attention mechanisms. Species tokens (ssrc, stgt) are prepended to the input sequence, enabling species-specific sequence conversion. b OrthologTransformer employs a two-stage learning approach consisting of pretraining and fine-tuning. In the pretraining phase, the model learns general sequence conversion patterns from many-to-many orthologous relationships across multiple species. In the fine-tuning phase, the model is specialized for specific one-to-one species pair conversions using targeted training data. c During candidate selection, a multi-objective Monte Carlo Tree Search (MCTS) routine jointly optimizes GC content and mRNA secondary-structure stability (MFE).

Fig. 5 | Predicted Structures, global and local structural conservations, and sequence-level properties of AI-designed PETase variants. a Predicted tertiary structures of twelve different PETase variants (AI-S1–AI-L5) generated by OrthologTransformer with various degrees of sequence modifications. The wild-type PETase structure (PDB entry 5XJH) is shown on the left for reference. The four numbers below each structure denote the counts of the modifications introduced in each variant in the following order: insertions/deletions/synonymous substitutions/non-synonymous substitutions. b Global and local structural conservation of AI-designed PETase variants. TM-score (global fold similarity), predicted structural stability, backbone RMSD, and per-residue pLDDT are shown for AI-designed variants (AI-S1–AI-L5), wild-type (WT), and codon-optimized (CO). The AI-designed variants, particularly those trained on broader datasets, achieved a favorable balance across these measures, indicating preservation of the PETase fold while permitting small, evolution-consistent modifications and highlighting the benefit of multi-objective optimization relative to conventional codon optimization.
c Sequence-level properties. GC content and RNA secondary-structure free energy (ΔG) among AI-S1–AI-L5, WT, and CO are shown. The AI-designed variants converge toward the GC composition of B. subtilis (target host), whereas the wild-type I. sakaiensis PETase gene is substantially more GC-rich (~66.7%). The AI-designed sequences also exhibit favorable mRNA secondary-structure energetics. Source data for (b, c) is available in the Source Data file.

Cross-species gene redesign leveraging ortholog information and generative modeling
doi.org/10.1038/s414...

06.03.2026 20:51 - 👍 1    🔁 0    💬 1    📌 0

The ISCB platform is an excellent place to advertise and find positions in computational biology!
careers.iscb.org/jobs

04.03.2026 14:31 - 👍 3    🔁 1    💬 0    📌 0
Home

This looks absolutely great. For those of us interested in pangenomes, I am sure this will be a super place to get data and the interface is very clean (plotly). Congrats to the authors (I don't know if they are on bsky): pangbank.genoscope.cns.fr

04.03.2026 10:50 - 👍 8    🔁 4    💬 1    📌 0

Excited to share our pre-print on the curation of a new bioactivity dataset for metabolic transformations. 🧪🧑‍💻 We were surprised to find that roughly a quarter of our drug-metabolite-target combinations contain metabolites with retained or increased bioactivity relative to the parent drugs! #chemsky

02.03.2026 10:47 - 👍 10    🔁 3    💬 1    📌 0
Fig. 1 | FANTASIA pipeline overview. Input proteomes are preprocessed to remove sequences based on length and sequence similarity if needed. Then, embeddings are computed, and distance embedding similarity is calculated against the reference database (using two metrics at will). Optionally, it converts the standard GOPredSim output file to the input file format for topGO20 to facilitate its application in a wider biological workflow.

FANTASIA leverages language models to decode the functional dark proteome across the animal tree of life
www.nature.com/articles/s42...

01.03.2026 17:58 - 👍 2    🔁 0    💬 0    📌 0
Paralog interference contributes to the preservation of genetic redundancy Duplicated self-interacting proteins can interact and interfere with each other’s function. Cisneros, Mattenberger, et al. show that selection against interfering loss-of-function alleles extends the ...

New paper alert: Paralog interference contributes to the preservation of genetic redundancy www.cell.com/current-biol...

28.02.2026 02:00 - 👍 45    🔁 28    💬 0    📌 0

A very good list of Computational Biology Conferences and their deadlines!
databio.org/conferences/

27.02.2026 14:14 - 👍 11    🔁 0    💬 0    📌 0

Now out in @natcomms.nature.com :
versions 2.0 of both BiG-SCAPE and BiG-SLiCE! With significant speed and accuracy increases, as well as new interactive functionalities.
Read the full paper here #openaccess:
www.nature.com/articles/s41...

26.02.2026 12:18 - 👍 37    🔁 18    💬 1    📌 1
Assistant/Associate/Full Professor Computational Biology

Sounds like a great opportunity for a professor position in computational biology in the Netherlands careers.universiteitleiden.nl/job/Assistan...

25.02.2026 15:28 - 👍 13    🔁 13    💬 0    📌 0
Figure 1: Summary of TF-MoDISco

Figure 3: Continuous Jaccard similarity is preferable to cross-correlation for matching seqlets. Green checkmarks indicate matching positions.

TF-MoDISco: Transcription Factor Motif Discovery from
Importance Scores (2017) arxiv.org/abs/1811.00416

YouTube: youtube.com/watch?v=fXPGVJg956E
GitHub: github.com/kundajelab/t...

23.02.2026 12:54 - 👍 2    🔁 0    💬 0    📌 0

Can we simulate realistic evolutionary trajectories and “replay the tape of life”? In this work, we propose a flexible, generalizable deep learning framework for modeling how the entire protein sequence evolves over time while capturing complex interactions across sites. 1/n
doi.org/10.64898/202...

21.02.2026 17:13 - 👍 83    🔁 35    💬 3    📌 1
Structural Variants ESEB Special Topic Network

🧬 We are launching STRiVE, a @eseb.bsky.social Special Topic Network on the evolutionary role of structural genomic variation.

🗓️ Std:
29/04: Online seminar w/ L. Rieseberg
8-10/07: Kick-off in Porto

Join us: structuralvariantsstn.github.io #Evolution #Genomics #StructuralVariants #Biology #PopGen

20.02.2026 11:49 - 👍 14    🔁 9    💬 0    📌 0
Vacancies

My university (Chalmers University of Technology in 🇸🇪) is recruiting an assistant professor in data-driven cell & molecular biology, funded by the DDLS program @scilifelab.se #chemsky #facultychemjobs

The position comes with a nice start-up package

www.chalmers.se/en/about-cha...

19.02.2026 14:37 - 👍 11    🔁 6    💬 1    📌 0

Come join us again in the next round of this massive online open science community effort! 💪
Sign up using the link in the thread.
It’s great fun, and really helps the scientific community. What more can you ask? 🙂

20.02.2026 05:20 - 👍 13    🔁 7    💬 0    📌 0

kache-hash: A dynamic, concurrent, and cache-efficient hash table for streaming k-mer operations https://www.biorxiv.org/content/10.64898/2026.02.13.705625v1

17.02.2026 05:47 - 👍 10    🔁 7    💬 0    📌 0
Annotating genomes at increased scale and resolution Nature Reviews Genetics - In this Review, Ji et al. overview how rapidly advancing experimental and computational methods are enabling improved and automated annotation of gene structure and...

Our new review on genome annotation just appeared in @naturerevgenet.bsky.social, with a particular focus on the human genome, with Hayden Ji and Mihaela Pertea: rdcu.be/e4mI1

17.02.2026 12:46 - 👍 24    🔁 12    💬 0    📌 0
COMBINE-lab - The skeptic’s guide to generative AI assisted coding An easy-to-use, flexible website template for labs, with automatic citations, GitHub tag imports, pre-built components, and more.

I’ve written a post about my recent experiences (successes) with AI coding models; the experiences that caused me to re-evaluate my initial judgements, the surprise I had at what can be accomplished, & some fears I have about these tools. Discussion welcome! combine-lab.github.io/blog/2026/02...

15.02.2026 04:31 - 👍 51    🔁 15    💬 8    📌 5

Come to Ascona and attend talks from

Maria Brbic
Charlotte Bunne
Faisal Mahmood
Dana Pe’er
Barbara Engelhardt
Caroline Uhler
Julien Gagneur
Marinka Zitnik
Julie Josse (INRIA)
Basile Wicky
Fabian Fröhlich

with a beautiful view of the lake in the Swiss Alps! ascona2026.sciencesconf.org

16.01.2026 11:00 - 👍 3    🔁 1    💬 0    📌 0

If you are a scientist, working on biology, wondering where to submit your manuscript given the current issues with the academic publishing system, check out wheretopublish.github.io!
We did this thinking change is possible. Let’s make it happen!

11.02.2026 20:31 - 👍 7    🔁 6    💬 0    📌 0
Figure 1. The genome is enriched with active promoters relative to random DNA.
(A) We cloned the random library of 150 bp N-mer sequences (n=17,129, purple), and the genomic library of 100-300 bp sequences (n=91,866, magenta) into the dual-reporter plasmid MR1 (pMR1), which drives the expression of green fluorescent protein (GFP, teal) from inserts on the top DNA strand, and that of red fluorescent protein (RFP, orange) on the bottom strand. We transformed E. coli cells with the plasmid libraries. (B) We sorted the bacterial libraries into fluorescence bins at four fluorescence strengths: none, weak, moderate, and strong for both GFP and RFP (eight bins total) with a cell-sorter. We bulk-sequenced the library inserts from each bin and calculated fluorescence scores in arbitrary units (a.u.) ranging between one (none) and four (strongest) (Methods). (C) The probability that a DNA sequence in the random (purple) and genomic (magenta) libraries is a promoter relative to its AT-content. (D) For 102 position-weight matrices (PWMs) for transcription factors and sigma (σ) factors, we plot the percentage of sequences in each library (purple: random, magenta: genome) that encode at least one putative factor binding site (vertical axis) against the respective PWM’s information content in bits. We test for equality of the frequency distributions between the random and genome libraries with a paired t-test (p=7.48×10^-12).

De-novo promoters emerge more readily from random DNA than from genomic DNA
www.biorxiv.org/content/10.1...

10.02.2026 11:17 - 👍 13    🔁 5    💬 0    📌 0
Home | Timothy Fuqua

I'm looking for a Swiss department to host me for an SNSF Starting Grant.

I research how gene expression evolves and emerges by combining wet lab + computational work in a variety of model systems (E. coli, Drosophila, yeast). More: timothyfuqua.com

If your department might be a match, let’s chat!

10.02.2026 10:42 - 👍 4    🔁 4    💬 0    📌 0

Introducing The Structural History of Eukarya (SHE): The first proteome-scale phylogeny constructed entirely from 3D structure.
We computed 300 trillion alignments across 1,542 species to map the tree of life. 🧵👇 (1/5)

07.02.2026 08:50 - 👍 84    🔁 40    💬 2    📌 0
Compbio Asia

Please spread the word:

We invite applications to a two-week Computational Biology workshop in Singapore, June 14-27.

This NSF-funded workshop brings together 16-20 US grad students with international peers.
Apply by March 21: compbioasia.net
🧵 Details below:

05.02.2026 17:22 - 👍 3    🔁 9    💬 2    📌 1
Tenure-Track Assistant Professor / Associate Professor in Bioinformatics and/or Computational Biology at Aarhus University, Denmark - Vacancy at Aarhus University Vacancy at Department of Molecular Biology and Genetics - BiRC - Bioinformatics Research Center, Aarhus University

Aarhus University is seeking a Tenure-Track Assistant/Associate Professor in Bioinformatics/Co… https://nat.au.dk/en/about-the-faculty/vacant-positions-and-career/job/tenure-track-assistant-professor-associate-professor-in-bioinformatics-and-or-computational-biology-at-aarhus-university-denmark #job

07.02.2026 23:11 - 👍 19    🔁 33    💬 0    📌 0
Biodiversity Bioinformatics Summer School This School is co-organized by SIB/ELIXIR Switzerland and de.NBI/ELIXIR Germany Overview Biodiversity is fundamental to ecosystem functioning, yet it

#biodiversity #bioinformatics Summer School announcement ... 21-26 June in Siegen, Germany co-organised by @sib.swiss & @denbi.bsky.social www.sib.swiss/training/cou...
🟢 eDNA & ecosystems
🟣 pangenome diversity
🔵 population genetics
🟡 comparative genomics

07.02.2026 12:07 - 👍 9    🔁 9    💬 0    📌 0

🚨🧪 Announcing our #ICLR2026 Workshop, Generative AI in Genomics (Gen2): Barriers and Frontiers! @iclr-conf.bsky.social

📣 Call for: Full workshop papers (5-8 pages) and Tiny papers (2-4 pages)
📅 Submission deadline: 7 February 2026 AoE
🌐 Learn more: genai-in-genomics.github.io
(1/7)

12.01.2026 03:15 - 👍 4    🔁 3    💬 1    📌 0
Kipoi

Join us for our next Kipoi Seminar with Jun Cheng, DeepMind

👉 AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model

📅 Wed Feb 4, 5:30pm CET
🧬 kipoi.org/seminar
🦋 @kipoizoo.bsky.social

23.01.2026 12:20 - 👍 0    🔁 2    💬 0    📌 1

Does the noncoding genome actually carry more genetic information than coding seqs? Motivated by this question we mutated every bp in the 10kb MYC locus. Results are even more exciting: Decoding the MYC locus reveals a druggable ultraconserved RNA element www.biorxiv.org/content/10.6...

31.01.2026 01:13 - 👍 128    🔁 47    💬 4    📌 6
Figure showing the workflow for GLADE. GLADE uses an orthofinder results folder, and infers and maps evolutionary events (gains, losses, duplications), as well as reconstructing ancestral gene content

GLADE takes a fully phylogenetic approach.

It uses orthogroups, gene trees, and the species tree to infer gains, losses, and duplications, and to map each event onto the phylogeny.

(4/10)

29.01.2026 12:10 - 👍 6    🔁 3    💬 1    📌 0