Thanks to my amazing collaborators: @samsja19.bsky.social, Johannes Hagemann, @shangshang-wang.bsky.social, Jason Wiemels, Jeff Kaufman, and @willieneis.bsky.social
Special shout-out to the Nucleic Acid Observatory for the sequencing data, and @PrimeIntellect for compute support.
06.01.2025 17:04
We're sharing METAGENE-1's:
Paper: metagene.ai/metagene-1-p...
Website: metagene.ai
🤗Model weights: huggingface.co/metagene-ai
🧵7/
06.01.2025 17:04
💡Tailored for detection, not design. We scoped METAGENE-1 to minimize risks while maximizing potential for public health and biosurveillance. Responsible open-sourcing matters. With open weights, we aim to drive progress in interpretability and safe genomics research.
🧵6/
06.01.2025 17:04
METAGENE-1 achieves state-of-the-art results in:
- Pathogen detection
- Genomic embedding benchmarks
- Generalization to multi-species tasks
It already shows promise in public health and biosurveillance, and we are collaborating with experts to unlock its full impact.
🧵5/
06.01.2025 17:04
The METAGENE-1 model is a 7B-parameter Llama-style transformer 🦙, pretrained and optimized for anomaly detection, embedding, and multi-species genomics. Fully compatible with 🤗Hugging Face (huggingface.co/metagene-ai): ready to use like any of your favorite LLMs!
🧵4/
06.01.2025 17:04
The data behind METAGENE-1:
- Brand-new dataset collected with experts from Southern California & Missouri
- 1.5 trillion base pairs from diverse wastewater samples
- Short reads (100–300 bp), deep sequencing at scale
- Byte-Pair Encoding customized for genomic sequences
🧵3/
06.01.2025 17:04
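The thread mentions Byte-Pair Encoding customized for genomic sequences but does not say how it was customized. As a rough illustration only, here is a generic BPE training loop over nucleotide strings: it starts from single-base tokens and repeatedly merges the most frequent adjacent pair. This is a minimal sketch under stated assumptions, not METAGENE-1's actual tokenizer; `bpe_train`, `bpe_encode`, and `merge_pair` are hypothetical names, and ties between equally frequent pairs are broken by whichever pair was counted first.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with one merged token."""
    merged = pair[0] + pair[1]
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def bpe_train(sequences, num_merges):
    """Learn up to `num_merges` merge rules from raw nucleotide strings (hypothetical helper)."""
    # Assumption: the initial vocabulary is the individual bases (A, C, G, T, ...).
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count adjacent token pairs across all reads.
        pairs = Counter()
        for tokens in corpus:
            pairs.update(zip(tokens, tokens[1:]))
        if not pairs:
            break  # every read is already a single token
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        corpus = [merge_pair(tokens, best) for tokens in corpus]
    return merges, corpus


def bpe_encode(seq, merges):
    """Tokenize a new read by replaying the learned merges in training order."""
    tokens = list(seq)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    merges, corpus = bpe_train(["ACGTACGT", "ACGTTT"], num_merges=3)
    print(merges)  # [('A', 'C'), ('AC', 'G'), ('ACG', 'T')]
    print(bpe_encode("ACGTTACGT", merges))  # ['ACGT', 'T', 'ACGT']
```

Replaying the merges in the order they were learned, as `bpe_encode` does, keeps tokenization of new reads consistent with training.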
Why is METAGENE-1 special? We trained it on wastewater metagenomics, capturing the human-adjacent microbiome across the US for the past 12 months. This unlocks powerful capabilities for early pathogen detection and understanding microbial ecosystems. 🌱🦠
Website: metagene.ai
🧵2/
06.01.2025 17:04
Introducing METAGENE-1🧬, an open-source 7B-parameter metagenomics foundation model pretrained on 1.5 trillion base pairs. Built for pandemic monitoring, pathogen detection, and biosurveillance, with SOTA results across many genomics tasks.
🧵1/
06.01.2025 17:04
Landed in Vancouver to attend #NeurIPS :-) Excited to chat about multimodal models, AI4Science, decision making, and more!
10.12.2024 00:28
Let's go! We are releasing SmolVLM, a smol 2B VLM built for on-device inference that outperforms all models at similar GPU RAM usage and token throughput.
SmolVLM can be fine-tuned on Google Colab and run on a laptop! Or process millions of documents with a consumer GPU!
26.11.2024 15:57
nlp@usc student. thanks!
25.11.2024 03:56
tfw you realize that this isn't an alt twitter for academic posting but an alt insta for cute doggos.
this is doodle, our border collie pup who often serves as an adversarial example for image classification models (they classify him as a corgi :-)
18.11.2024 14:44
yes please if there's still space left :-P
18.11.2024 05:40
our border collie pup doodle absolutely wants nothing from that plate of banana :-P
18.11.2024 04:24
Dynamical Systems and Deep Learning Research @ UIUC
Prev: CompSci and Economics @ UC Berkeley
using computational methods to understand the linguistic mechanisms of social problems | NLP, socioling, discourse-pragmatics | asst prof at UC Davis Linguistics
https://robvoigt.faculty.ucdavis.edu/
Computer Science PhD Student at the University of Chicago. Genome-scale language models. AI steered molecular dynamics. AI4Science.
Research Fellow @flatironinstitute.org @simonsfoundation.org
Formerly @csail.mit.edu @msftresearch.bsky.social @uconn.bsky.social
Computational systems x structure biology | he/him | https://samsl.io | 👨🏼‍💻
Principal Researcher in BioML at Microsoft Research. He/him/他. 🇹🇼 yangkky.github.io
AI for Science, deep generative models, inverse problems. Professor of AI and deep learning @universitedeliege.bsky.social. Previously @CERN, @nyuniversity. https://glouppe.github.io
PhD student @ University of Zurich | Remote Sensing | Computer Vision | Sustainability | Ex intern at Google DeepMind
Kaggle.com - Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.
@_angie_chen at the other place
PhD student @NYU, formerly at @Princeton
Interested in LLMs/NLP, pastries, and running. She/her.
CS PhD @Harvard • Pre-doc @GoogleDeepMind • Anything `science', ~cosmos, and Oxford commas
PhD student in NLP at Cambridge | ELLIS PhD student
https://lucasresck.github.io/
Assistant Professor @ UChicago CS/DSI (NLP & HCI) | Writing with AI ✍️
https://minalee-research.github.io/
ML PhD @ CMU | Prev: research intern @ Microsoft Research New England, Amazon, Bosch.
Research Scientist at Flatiron Institute CCM. Prev Meta AI, NYU, Inria, Quora.
http://alberto.bietti.me
AI&Science | Research Fellow at @FlatironCCM | Member of @PolymathicAI | ex PhD student at @ENS_Ulm
Senior Research Scientist at NVIDIA | Computer Vision | AI for science
Professor and Head of Machine Learning Department at Carnegie Mellon. Board member OpenAI. Chief Technical Advisor Gray Swan AI. Chief Expert Bosch Research.
Prof (CS @Stanford), Co-Director @StanfordHAI, Cofounder/CEO @theworldlabs, CoFounder @ai4allorg #AI #computervision #robotics #AI-healthcare
Assistant professor of Linguistics and Data Science at Boston University. NLP, computational linguistics, interpretability, social bias and fairness. she/her. https://www.notaphonologist.com/