"The science of today is the technology of tomorrow."
– Edward Teller
@gwencheni.bsky.social
Building stealth AI+bio. Prev @KhoslaVentures @indbio @sosv @ucsf @jpmorgan @GoldmanSachs @yale @UChicago @LMU_Muenchen
Code and paper on GitHub: github.com/deepseek-ai/...
21.01.2025 02:20

Emergent properties:
Thinking time steadily improved throughout the training process.
Uses Group Relative Policy Optimization (GRPO) instead of Proximal Policy Optimization (PPO): it forgoes the critic model (which would be the same size as the policy model) and instead estimates the baseline from group scores, using the average reward of multiple sampled outputs, which reduces memory use.
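A minimal sketch of the group-relative baseline idea (my illustration; the function name and normalization details are assumptions, not DeepSeek's exact objective):

```python
import numpy as np

def group_relative_advantages(rewards):
    """For one prompt, sample a group of completions and score each.
    The group mean replaces the learned critic's value estimate, and the
    group std normalizes the scale, so no critic model is needed."""
    rewards = np.asarray(rewards, dtype=float)
    baseline = rewards.mean()            # group average = baseline
    return (rewards - baseline) / (rewards.std() + 1e-8)

# Example: 4 sampled completions for one prompt, scored by a rule-based reward
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # [ 1., -1., -1.,  1.]
```

The full GRPO objective still keeps a PPO-style clipped ratio and a KL penalty to a reference model; the sketch only shows where the critic goes away.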
21.01.2025 02:19

The secret sauce is the rewards: ground truth computed by hardcoded rules. Learned reward models can easily be hacked by RL.
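For intuition, a toy rule-based reward might look like this (the <answer> tag and the exact-match rule are my assumptions for illustration, not DeepSeek's published reward format):

```python
import re

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Non-learned reward: parse the model's final answer and compare it to
    known ground truth. Because the rule is hardcoded, there is no learned
    reward model for the policy to exploit."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0                       # unparsable output earns nothing
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

print(rule_based_reward("reasoning ... <answer>42</answer>", "42"))  # 1.0
```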
21.01.2025 02:18

In addition to being open source, DeepSeek-R1 is significant because it's trained with pure reinforcement learning (RL), with no supervised fine-tuning (SFT) "cold start". Reminiscent of AlphaZero (which mastered Go, Shogi, and Chess from scratch, without playing against human grandmasters).
21.01.2025 02:18

DeepSeek-R1: pure reinforcement learning (RL), no supervised fine-tuning (SFT), no annotated chain-of-thought (CoT) data. #1minPapers
21.01.2025 02:18

13. Janet Woodcock (former FDA): potential to look at prospective studies for certain rare indications, instead of only randomized controlled trials.
17.01.2025 02:05

12. Scott Gottlieb @scottgottliebmd.bsky.social: 50% of oncology INDs at the FDA are from China.
11. Bob Nelson: the market is provisionally open. If you already have a strong shareholder base and the book is ready, then the market's open. Biotech IPOs are funding events: ARCH doesn't view IPOs as exits and will stay past the IPO for 3-4 years until a clinical milestone.
17.01.2025 02:03

10. Big pharmas acquiring AI model teams is rare (Prescient Design was a one-off); partnerships are more common, and pharmas claim to have their own teams.
17.01.2025 02:03

9. Org structure matters in decision making, e.g. Merck is organized into Research vs. Development, while J&J is organized along indication areas. Do you invest on risk, or on inflection points?
17.01.2025 02:03

8. Every pharma is interested in obesity, but also careful because there are already three players and it's hard to differentiate.
17.01.2025 02:03

7. Pharmas need to look over their shoulders prior to billion-dollar acquisitions in case generics come out of China in a few years with the same MOA. One pharma CEO: "We have to get the cost of R&D down to be competitive."
17.01.2025 02:03

6. Large deals tend to result in cost cuts, not topline growth, and this industry trades on topline growth rate. Bolt-ons plus mega-billion-dollar deals (a barbell strategy) may be the pattern in 2025.
17.01.2025 02:03

5. 2023 was a record M&A year at $130bn. 2024 was a digestion year: not horrible for the number of deals, but mostly private deals because the capital markets were closed. Scale is important in pharma; it drives how much R&D is allocated. The previous administration was against large deals; the new administration is not.
17.01.2025 02:02

4. The IRA shifted focus to bigger cancers, and that may be here to stay. Biologics and small-molecule timelines are not aligned: 13 vs. 9 years. Small molecules have challenges with tox and only have 9 years to recoup the investment. There could hopefully be bipartisan support to even out this 9 vs. 13 gap.
17.01.2025 02:02

3. Saw a lot of fast following over the last few years: with 3-4 drugs on the same MOA, it is hard to get a return. Do VCs shift to lower-risk, lower-reward investments instead?
17.01.2025 02:02

2. Of last year's IPOs, 80% are under water, so capitalize your company such that you aren't dependent on an IPO. Have optionality. Is M&A the goal? If you are taking a drug to market, you may not have any option but to IPO.
17.01.2025 02:02

Takeaways from JPM Healthcare Conf 2025 #JPM2025. Having survived the past two years of biotech winter and the current political uncertainties, the crowd was cautiously optimistic for dealflow to recover. And yes, the word "agentic" AI should have been a drinking game.
17.01.2025 02:01

Paper on arXiv: arxiv.org/abs/2501.04519
12.01.2025 16:48

An SLM serves as a process preference model (PPM) to predict reward labels for each reasoning step. The Q-values can reliably distinguish positive (correct) steps from negative ones; using preference pairs with a pairwise ranking loss, rather than the Q-values directly, eliminates the inherent noise. 6/n
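Roughly what a pairwise ranking objective looks like (a generic Bradley-Terry-style sketch, not necessarily rStar-Math's exact loss):

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(score_preferred, score_rejected):
    """Only requires that the PPM scores the correct step above the rejected
    one; the noisy Q-value magnitudes never enter the objective."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Toy PPM scores for two (preferred, rejected) step pairs
preferred = torch.tensor([0.8, 0.3])
rejected = torch.tensor([0.1, -0.2])
print(pairwise_ranking_loss(preferred, rejected))
```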
12.01.2025 16:47

The SLM samples candidate nodes, each generating a CoT step and corresponding Python code. Only nodes whose code executes successfully are retained. MCTS automatically assigns (self-annotates) a Q-value to each intermediate step based on its contribution: steps that appear in more correct trajectories get a higher Q. 5/n
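A toy version of the self-annotation idea (my simplification: Q = fraction of rollouts through a step that reach the correct answer; not the paper's exact update rule):

```python
from collections import defaultdict

def self_annotate_q_values(trajectories):
    """Each trajectory is (list_of_steps, reached_correct_answer).
    A step's Q is the fraction of trajectories through it that end in the
    correct final answer, so steps contributing to more correct rollouts
    get higher Q."""
    visits = defaultdict(int)
    wins = defaultdict(int)
    for steps, correct in trajectories:
        for step in steps:
            visits[step] += 1
            wins[step] += int(correct)
    return {step: wins[step] / visits[step] for step in visits}

trajectories = [
    (["s1", "s2"], True),
    (["s1", "s3"], False),
    (["s1", "s2"], True),
]
print(self_annotate_q_values(trajectories))  # s2 gets a higher Q than s3
```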
12.01.2025 16:47

Process reward modeling (PRM) provides fine-grained feedback on intermediate steps, because incorrect intermediate steps significantly decrease data quality in math. 4/n
12.01.2025 16:47

Result: "4 rounds of self-evolution with millions of synthesized solutions for 747k math problems ... it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%." 3/n
12.01.2025 16:47

"Unlike solutions relying on superior LLMs for data synthesis, rStar-Math leverages smaller language models (SLMs) with Monte Carlo Tree Search (MCTS) to establish a self-evolutionary process, iteratively generating higher-quality training data." 2/n
12.01.2025 16:46

#1minPapers MSFT's rStar-Math small language model self-improves and generates its own training data - the second time in recent months that a small model has performed as well as (or better than) the billion-parameter large models.
12.01.2025 16:46

Full interview here: www.youtube.com/watch?v=w9WE...
10.01.2025 03:02

Speculation on how o1 works: it searches over possible chains-of-thought. By backtracking and editing, keeping the branches that work better, it ends up with a natural-language program that adapts to novelty. It is clearly doing search in chain-of-thought space at test time: the telltale sign is that compute and latency go up. 11/n
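A hedged sketch of what "search in chain-of-thought space" could look like as best-first search; expand and score are hypothetical stand-ins for an LLM sampler and a value model, and none of this is o1's actual mechanism:

```python
import heapq

def cot_tree_search(prompt, expand, score, is_final, max_nodes=50, n_branches=3):
    """Best-first search over partial chains-of-thought.
    expand(prompt, cot, n): propose n continuations (hypothetical LLM call).
    score(prompt, cot): value of a partial chain (hypothetical value model).
    Backtracking is implicit: weaker branches simply stop being popped."""
    frontier = [(-score(prompt, ""), "")]
    while frontier and max_nodes > 0:
        _, cot = heapq.heappop(frontier)
        if is_final(cot):
            return cot                   # first completed chain that wins the heap
        for nxt in expand(prompt, cot, n_branches):
            heapq.heappush(frontier, (-score(prompt, nxt), nxt))
            max_nodes -= 1
    return min(frontier)[1] if frontier else ""

# Toy stand-ins so the sketch runs end to end
steps = ["consider case A", "consider case B", "final answer"]
expand = lambda p, cot, n: [f"{cot} -> {s}" for s in steps[:n]]
score = lambda p, cot: cot.count("->") + 2.0 * cot.endswith("final answer")
is_final = lambda cot: cot.endswith("final answer")
print(cot_tree_search("toy problem", expand, score, is_final))
```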
10.01.2025 03:02

Some recombination patterns of the building blocks will occur more often in certain contexts; extract these as a reusable form (a higher-level abstraction fitted to the problem) and add it back to the building blocks, so that next time you solve the problem in fewer steps. 10/n
10.01.2025 03:02