
Flaviu Cipcigan

@flaviucipcigan.bsky.social

Building AIs for scientific discovery. Discovered antibiotics and materials for carbon capture. Tango dancer. See more at flaviucipcigan.com. Opinions my own.

6,429 Followers  |  465 Following  |  351 Posts  |  Joined: 06.10.2023

Latest posts by flaviucipcigan.bsky.social on Bluesky

Preview
GitHub - marcomaroni-github/twitter-to-bluesky: Import all tweets exported from X/Twitter to a Bluesky account.

There's a bunch of scripts that can migrate old posts through the API, such as this one: github.com/marcomaroni-...

I don't know if timestamps would migrate, but I've seen folks with posts timestamped before Bluesky launched.

26.02.2025 11:24 — 👍 0    🔁 0    💬 0    📌 0

Super interesting application of program search

Goals are mapped to programs which are embedded in a latent space.

A fitness metric is assigned to the programs and program search is done to synthesise new human-like goals.
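I don't know the paper's exact setup, so here's a minimal stdlib-only sketch of the general idea: programs as token sequences, a fitness metric scoring how well a program matches a goal behaviour, and hill-climbing search over mutations (the latent-space embedding is omitted).

```python
import random

# Toy program search (illustrative only; the actual paper's method
# and representation are assumptions, not reproduced here).
# A "program" is a flat list like ["*", 2, "+", 3], read as ((x*2)+3).
random.seed(0)

OPS = ["+", "-", "*"]

def run(prog, x):
    # Evaluate the chain of (operator, constant) pairs left to right.
    val = x
    for op, arg in zip(prog[::2], prog[1::2]):
        val = {"+": val + arg, "-": val - arg, "*": val * arg}[op]
    return val

def fitness(prog, goal=lambda x: 2 * x + 3):
    # Negative squared error against the goal behaviour; 0 is a perfect match.
    return -sum((run(prog, x) - goal(x)) ** 2 for x in range(-3, 4))

def mutate(prog):
    # Replace one (operator, constant) pair at random.
    p = list(prog)
    i = random.randrange(0, len(p), 2)
    p[i] = random.choice(OPS)
    p[i + 1] = random.randint(-3, 3)
    return p

best = ["+", 0, "+", 0]
for _ in range(5000):
    cand = mutate(best)
    if fitness(cand) > fitness(best):
        best = cand
print(best, fitness(best))
```

Hill climbing typically recovers a program matching the goal here; the paper's search over a learned latent space is of course far richer than this token-level mutation.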

22.02.2025 11:53 — 👍 3    🔁 0    💬 0    📌 0

Thanks! Not sure, I'll try it 🤔

21.02.2025 08:30 — 👍 0    🔁 0    💬 0    📌 0
Preview
Discovery of novel reticular materials for carbon dioxide capture using GFlowNets Artificial intelligence holds promise to improve materials discovery. GFlowNets are an emerging deep learning algorithm with many applications in AI-assisted discovery. Using GFlowNets, we generate po...

Using GFlowNets to discover new materials for carbon capture

20.02.2025 21:09 — 👍 1    🔁 0    💬 1    📌 0

One of my big motivations is accelerating science with AI.

Every discovery project had a beautiful aha moment, such as the structure of antibiotics emerging in the latent space of a model or a GFlowNet proposing new carbon capture materials.

Here are some of the threads I've written on this topic.

20.02.2025 21:09 — 👍 7    🔁 0    💬 1    📌 0
Support for the YAML 1.2 Core and JSON schemas [Take 2] by perlpunk Β· Pull Request #555 Β· yaml/pyyaml Supersedes #512 This is a draft and subject to discussion. See also #486 (For #512: Thanks to @SUSE for another hackweek! I had four days of work time dedicated to an open source project of my choi...

Seems like they're fixing this in PyYAML 7.0

17.02.2025 16:49 — 👍 3    🔁 0    💬 0    📌 0
{'lol': ['5.0E6',
  '5.0e6',
  '5.E6',
  '5.e6',
  '5E6',
  '5e6',
  5e-06,
  5e-06,
  5e-06,
  5e-06,
  '5E-6',
  '5e-6',
  5000000.0,
  5000000.0,
  5000000.0,
  5000000.0,
  '5E+6',
  '5e+6']}


Wanna try to guess which of those gets parsed as a string and which as a number? Answer in alt text.

YAML parsing in Python is weird.
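For context, here's a stdlib-only sketch of the scientific-notation branch of PyYAML's YAML 1.1 float resolver (simplified; the real resolver has more branches). It's the rule behind the surprises above: a scalar only resolves to a float if the mantissa has a decimal point AND the exponent carries an explicit sign.

```python
import re

# Simplified scientific-notation branch of the YAML 1.1 float rule
# as PyYAML applies it: dot required in the mantissa, sign required
# in the exponent. Anything else stays a string.
YAML11_FLOAT = re.compile(r'^[-+]?[0-9][0-9_]*\.[0-9_]*(?:[eE][-+][0-9]+)?$')

def resolves_to_float(scalar: str) -> bool:
    return bool(YAML11_FLOAT.match(scalar))

scalars = ["5.0E6", "5e6", "5E-6", "5.0e+6", "5.e-6"]
print({s: resolves_to_float(s) for s in scalars})
```

So `5e6` and `5E-6` load as strings, while `5.0e+6` and even `5.e-6` load as floats, which is exactly the split in the output above.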

17.02.2025 16:49 — 👍 4    🔁 0    💬 2    📌 0

Interesting idea to generate responses using diffusion rather than left-to-right auto-regressive models
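The contrast can be sketched in a toy way: autoregressive decoding commits to one token at a time, left to right, while a diffusion-style decoder starts fully masked and unmasks batches of positions over a few refinement steps. The "model" below is a stand-in that just reveals a known target string; a real diffusion LM predicts all masked positions jointly at each step.

```python
import random

random.seed(1)
TARGET = list("hello world")

def autoregressive():
    # One token per step, strictly left to right.
    out = []
    for tok in TARGET:
        out.append(tok)
    return "".join(out)

def diffusion_unmask(steps=4):
    # Start fully masked, then unmask a batch of positions per step,
    # in a random order rather than left to right.
    seq = ["_"] * len(TARGET)
    order = list(range(len(TARGET)))
    random.shuffle(order)
    per_step = -(-len(order) // steps)  # ceiling division
    for s in range(steps):
        for i in order[s * per_step:(s + 1) * per_step]:
            seq[i] = TARGET[i]
    return "".join(seq)

print(autoregressive())    # 11 sequential steps
print(diffusion_unmask())  # 4 coarse-to-fine steps
```

The practical appeal is that the diffusion decoder's step count is fixed and small, independent of sequence length.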

17.02.2025 12:31 — 👍 6    🔁 1    💬 0    📌 0
From here https://www.youtube.com/watch?v=dZQ7x0-MZcI

Supercomputers (large compute clusters) allow you to work a number of years ahead.

Creating the GUI at PARC seemed like a "waste of FLOPs" but revolutionized computing.

15.02.2025 12:56 — 👍 1    🔁 0    💬 0    📌 0

Where do large compute clusters come into play in this case?

Alan Kay talked about playing the "Wayne Gretzky game", named after the hockey player famous for saying he skates to where the puck is going to be.

15.02.2025 12:56 — 👍 0    🔁 0    💬 1    📌 0

Similarly, the benchmark scores of a model with a given number of parameters increase each generation due to better data and training algorithms, caveated by dataset leakage.

15.02.2025 12:56 — 👍 0    🔁 0    💬 1    📌 0

For each generation, at a fixed parameter count, the time to train and run inference decreases due to hardware and software advances, like flash attention and multi-head latent attention.

At each generation, larger and larger parameter counts can be run locally.

15.02.2025 12:56 — 👍 0    🔁 0    💬 1    📌 0

My first computer used a processor in the Intel 8086 generation, which had about 29k transistors.

Today, an Apple M4 has 28B transistors, meaning I experienced a scale-up of 1,000,000x in my lifetime.

I expect a similar scale-up for language models.
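As a quick sanity check on the arithmetic (both transistor counts are approximate):

```python
# Approximate scale-up from an Intel 8086 (~29,000 transistors)
# to an Apple M4 (~28 billion transistors).
ratio = 28e9 / 29e3
print(f"{ratio:,.0f}x")  # roughly a million-fold
```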

15.02.2025 12:56 — 👍 1    🔁 0    💬 1    📌 0

What is large for a language model? Is it 400B, 70B or maybe 1T?

I think focusing on the raw number of parameters is a less useful frame than thinking about inference speed, cost, and where inference runs (on-device vs cloud).

15.02.2025 12:56 — 👍 1    🔁 0    💬 1    📌 1

Follow where curiosity leads. It's the most durable source of motivation in research.

13.02.2025 18:20 — 👍 1    🔁 0    💬 1    📌 0

More open reasoning datasets and distilled models.

It's great to see the community energy that's been unleashed by open models that generate chains of thought!

13.02.2025 15:58 — 👍 2    🔁 0    💬 0    📌 0

ColabFit Exchange is another great dataset curation effort that I'd like to boost.

Great work by @stemartiniani.bsky.social and team to curate the most diverse materials database in the world!

13.02.2025 13:53 — 👍 1    🔁 1    💬 0    📌 0

Neat idea! Fine-tuning using majority voting and length filtering generalises a model's capabilities.

Models generalise to slightly harder versions of a problem, and the correct answers are used to bootstrap the next model and the next one and so on.
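A minimal sketch of what such a filter could look like (the function name and length threshold are illustrative, not from the paper): for each problem, keep only the samples whose answer agrees with the majority vote and whose chain of thought isn't excessively long.

```python
from collections import Counter

def filter_samples(samples, max_len=200):
    # samples: list of (chain_of_thought, answer) pairs for one problem.
    # Majority vote over the final answers picks the consensus answer.
    majority, _ = Counter(a for _, a in samples).most_common(1)[0]
    # Keep agreeing samples whose reasoning is under the length cap.
    return [(cot, a) for cot, a in samples
            if a == majority and len(cot.split()) <= max_len]

samples = [("think step 1", "42"), ("longer reasoning", "42"),
           ("wrong path", "17"), ("x " * 300, "42")]
kept = filter_samples(samples)
print(len(kept))  # 2: the short wrong answer and the overlong trace are dropped
```

The surviving (reasoning, answer) pairs then become fine-tuning data for the next model in the bootstrap loop.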

13.02.2025 13:17 — 👍 4    🔁 1    💬 0    📌 0

Link to the initial data, more to come 👇

13.02.2025 10:18 — 👍 0    🔁 0    💬 0    📌 0
