Sweta Karlekar's Avatar

Sweta Karlekar

@swetakar.bsky.social

Machine learning PhD student @ Blei Lab at Columbia University. Working in mechanistic interpretability, NLP, causal inference, and probabilistic modeling! Previously at Meta for ~3 years on the Bayesian Modeling & Generative AI teams. 🔗 www.sweta.dev

2,556 Followers  |  1,193 Following  |  31 Posts  |  Joined: 18.11.2024

Latest posts by swetakar.bsky.social on Bluesky


Hello!

We will be presenting "Estimating the Hallucination Rate of Generative AI" at NeurIPS. Come if you'd like to chat about epistemic uncertainty for In-Context Learning, or uncertainty more generally. :)

Location: East Exhibit Hall A-C #2703
Time: Friday @ 4:30
Paper: arxiv.org/abs/2406.07457

12.12.2024 18:13 · 👍 22  🔁 4  💬 0  📌 1

Fun @bleilab.bsky.social x OATML collab

Come chat with Nicolas, @swetakar.bsky.social, Quentin, Jannik, and me today

13.12.2024 17:26 · 👍 10  🔁 1  💬 1  📌 0

Check out our new paper from the Blei Lab on probabilistic predictions with conditional diffusions and gradient boosted trees! #Neurips2024

02.12.2024 23:02 · 👍 34  🔁 6  💬 0  📌 0

Check out our new paper about hypothesis testing the circuit hypothesis in LLMs! This work previously won a top paper award at the ICML mechanistic interpretability workshop, and we're excited to share it at #Neurips2024

10.12.2024 19:07 · 👍 7  🔁 1  💬 0  📌 0

For anyone interested in fine-tuning or aligning LLMs, I'm running this free and open course called smol course. It's not a big deal, it's just smol.

🧵>>

03.12.2024 09:21 · 👍 329  🔁 64  💬 9  📌 4

Very happy to share some recent work by my colleagues @velezbeltran.bsky.social, @aagrande.bsky.social and @anazaret.bsky.social! Check out their work on tree-based diffusion models (especially the website; it's quite superb 😊)!

02.12.2024 22:49 · 👍 13  🔁 1  💬 1  📌 0
GitHub - andrewyng/aisuite: Simple, unified interface to multiple Generative AI providers

Just learned about @andrewyng.bsky.social's new tool, aisuite (github.com/andrewyng/ai...) and wanted to share! It's a standardized wrapper around chat completions that lets you easily switch between querying different LLM providers, including OpenAI, Anthropic, Mistral, HuggingFace, Ollama, etc.
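For a sense of what that standardized wrapper looks like in practice, here is a minimal sketch assuming aisuite's OpenAI-style chat-completions client; the model identifier strings are illustrative placeholders, so check the repo README for the exact names your providers expose:

    # Minimal provider-switching sketch (assumes aisuite's OpenAI-style interface;
    # the model identifiers below are illustrative placeholders).
    import aisuite as ai

    client = ai.Client()  # provider API keys are typically read from environment variables

    messages = [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain in-context learning in one sentence."},
    ]

    # Switching providers is just a change of the "provider:model" string.
    for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:
        response = client.chat.completions.create(model=model, messages=messages)
        print(model, "->", response.choices[0].message.content)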

29.11.2024 20:25 · 👍 22  🔁 3  💬 1  📌 0
Announcing the NeurIPS 2024 Test of Time Paper Awards – NeurIPS Blog

Test of Time Paper Awards are out! 2014 was a wonderful year with lots of amazing papers. That's why we decided to highlight two papers: GANs (@ian-goodfellow.bsky.social et al.) and Seq2Seq (Sutskever et al.). Both papers will be presented in person 😍

Link: blog.neurips.cc/2024/11/27/a...

27.11.2024 15:48 · 👍 110  🔁 14  💬 1  📌 2

Sorry John, that isn't my area of expertise!

25.11.2024 00:44 · 👍 0  🔁 0  💬 1  📌 0

This is very interesting! Do you have any intuition as to whether or not this phenomenon happens only with very simple “reasoning” steps? Does relying on retrieval increase as you progress from simple math to more advanced prompts like GSM8K or adversarially designed prompts (like adding noise)?

24.11.2024 16:29 · 👍 3  🔁 0  💬 1  📌 0
Many circles of different sizes, representing a visualization of inequality

The Gini coefficient is the standard way to measure inequality, but what does it mean, concretely? I made a little visualization to build intuition:
www.bewitched.com/demo/gini
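For readers who want the number itself and not just the picture, here is a small self-contained sketch (my own illustration, not code from the linked demo) that computes the Gini coefficient with the mean-absolute-difference formula:

    # Gini coefficient G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean(x)).
    # G = 0 means perfect equality; values near 1 mean one person holds nearly everything.
    def gini(values):
        n = len(values)
        mean = sum(values) / n
        total_abs_diff = sum(abs(a - b) for a in values for b in values)
        return total_abs_diff / (2 * n * n * mean)

    print(gini([1, 1, 1, 1]))    # 0.0  -> everyone has the same amount
    print(gini([0, 0, 0, 100]))  # 0.75 -> one person out of four holds everything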

23.11.2024 15:35 · 👍 199  🔁 57  💬 10  📌 8

Interested in machine learning in science?

Timo and I recently published a book, and even if you are not a scientist, you'll find useful overviews of topics like causality and robustness.

The best part is that you can read it for free: ml-science-book.com

15.11.2024 09:45 · 👍 131  🔁 30  💬 7  📌 4
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers – Large pretrained language models have shown surprising in-context learning (ICL) ability. With a few demonstration input-label pairs, they can predict the label for an unseen input without parameter u...

Learning doesn't have to mean explicit weight changes; ICL can be viewed as temporary implicit finetuning (arxiv.org/abs/2212.10559) or like a “state” change to the model instead of a weight change, akin to how learning happens in fast RL vs slow RL (www.cell.com/trends/cogni...).

22.11.2024 22:31 · 👍 8  🔁 0  💬 0  📌 0
A statistical approach to model evaluations – A research paper from Anthropic on how to apply statistics to improve language model evaluations

New paper from Anthropic on LLM evaluation recommendations

www.anthropic.com/research/sta...
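To give a flavor of the kind of recommendation in the paper, here is a small sketch (my own illustration, not Anthropic's code) of reporting a CLT-based standard error and 95% confidence interval on eval accuracy rather than a bare point estimate:

    # Report eval accuracy with a standard error, treating per-question scores as
    # independent draws and using the Central Limit Theorem for the interval.
    import math

    def accuracy_with_ci(scores, z=1.96):
        """scores: per-question 0/1 outcomes; returns (mean, stderr, 95% CI)."""
        n = len(scores)
        mean = sum(scores) / n
        var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
        stderr = math.sqrt(var / n)
        return mean, stderr, (mean - z * stderr, mean + z * stderr)

    scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # toy eval results
    mean, se, (lo, hi) = accuracy_with_ci(scores)
    print(f"accuracy = {mean:.2f} +/- {se:.2f} (95% CI {lo:.2f} to {hi:.2f})")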

22.11.2024 12:47 · 👍 13  🔁 1  💬 0  📌 0
Book outline

Over the past decade, embeddings – numerical representations of machine learning features used as input to deep learning models – have become a foundational data structure in industrial machine learning systems. TF-IDF, PCA, and one-hot encoding have always been key tools in machine learning systems as ways to compress and make sense of large amounts of textual data. However, traditional approaches were limited in the amount of context they could reason about with increasing amounts of data. As the volume, velocity, and variety of data captured by modern applications has exploded, creating approaches specifically tailored to scale has become increasingly important. Google's Word2Vec paper made an important step in moving from simple statistical representations to semantic meaning of words. The subsequent rise of the Transformer architecture and transfer learning, as well as the latest surge in generative methods has enabled the growth of embeddings as a foundational machine learning data structure. This survey paper aims to provide a deep dive into what embeddings are, their history, and usage patterns in industry.
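To make the contrast in the abstract concrete, here is a tiny sketch (my own toy example, not from the book) of a sparse one-hot representation versus a dense embedding lookup, where the made-up embedding values let geometry encode similarity:

    # Sparse, count-based representation vs. a learned dense embedding (toy illustration).
    import numpy as np

    vocab = ["cat", "dog", "car", "truck"]
    index = {word: i for i, word in enumerate(vocab)}

    # One-hot: each word is a sparse vector the size of the vocabulary, and every
    # pair of distinct words is equally dissimilar.
    def one_hot(word):
        vec = np.zeros(len(vocab))
        vec[index[word]] = 1.0
        return vec

    # Dense embeddings: a small learned table (values here are made up) where
    # geometry can encode similarity, e.g. "cat" closer to "dog" than to "truck".
    embedding_table = np.array([
        [0.90, 0.10],  # cat
        [0.85, 0.15],  # dog
        [0.10, 0.95],  # car
        [0.12, 0.90],  # truck
    ])

    def embed(word):
        return embedding_table[index[word]]

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(one_hot("cat"), one_hot("dog")))  # 0.0 for any two distinct words
    print(cosine(embed("cat"), embed("dog")))      # high similarity
    print(cosine(embed("cat"), embed("truck")))    # low similarity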

Cover image

Just realized Bluesky allows sharing valuable stuff cause it doesn't punish links. 🤩

Let's start with "What are embeddings" by @vickiboykis.com

The book is a great summary of embeddings, from history to modern approaches.

The best part: it's free.

Link: vickiboykis.com/what_are_emb...

22.11.2024 11:13 · 👍 654  🔁 102  💬 22  📌 6

(Shameless) plug for David Blei's lab at Columbia University! People in the lab currently work on a variety of topics, including probabilistic machine learning, Bayesian stats, mechanistic interpretability, causal inference and NLP.

Please give us a follow! @bleilab.bsky.social

20.11.2024 20:41 · 👍 20  🔁 3  💬 1  📌 0

Hi! Our lab does Bayesian stuff :) Could you add Dave Blei's lab to this pack as well if it's not already full? @bleilab.bsky.social

20.11.2024 15:38 · 👍 1  🔁 0  💬 1  📌 0

Could you add Dave Blei's lab to this pack as well if it's not already full? @bleilab.bsky.social

20.11.2024 15:37 · 👍 0  🔁 0  💬 0  📌 0

Could you add Dave Blei's lab to this pack as well if it's not already full? @bleilab.bsky.social

20.11.2024 15:36 · 👍 1  🔁 0  💬 0  📌 0

Could you add Dave Blei's lab to this pack as well if it's not already full? @bleilab.bsky.social

20.11.2024 15:36 · 👍 0  🔁 0  💬 1  📌 0

We created an account for the Blei Lab! Please drop a follow 😊

@bleilab.bsky.social

20.11.2024 15:34 · 👍 3  🔁 1  💬 0  📌 0
GitHub - thu-ml/SageAttention: Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Almost 3x faster than FlashAttention2: github.com/thu-ml/SageA...

20.11.2024 15:06 · 👍 6  🔁 1  💬 0  📌 0

Oh, I've been meaning to check out that YouTube series, thanks! Also sadly, there's no class website, but I can share the "super quick intro to mech interp" presentation I made. It's somewhat rough, but hopefully it gets the main points across! sweta.dev/files/intro_...

20.11.2024 15:08 · 👍 1  🔁 0  💬 1  📌 0
Mailing list contact information – Information to be added to the post-Bayes mailing list.

📢 Post-Bayesian online seminar series coming! 📢
To stay posted, sign up at
tinyurl.com/postBayes
We'll discuss cutting-edge methods for posteriors that no longer rely on Bayes' theorem.
(e.g., PAC-Bayes, generalised Bayes, Martingale posteriors, ...)
Pls circulate widely!

19.11.2024 20:22 · 👍 16  🔁 6  💬 0  📌 3
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 – AI Alignment Forum – This post represents my personal hot takes, not the opinions of my team or employer. This is a massively updated version of a similar list I made two…

Adding Neel Nanda's favorite paper list as well: www.alignmentforum.org/posts/NfFST5...
(6/n)

20.11.2024 14:55 · 👍 0  🔁 0  💬 0  📌 0

Can't believe I forgot about this paper, thanks so much!!

20.11.2024 13:55 · 👍 1  🔁 0  💬 0  📌 0

I haven't read the first one but it looks very informative, thank you!! We also had a separate unit on transformer architecture; I'm going to add this to that paper list as well!

20.11.2024 13:55 · 👍 2  🔁 0  💬 0  📌 0

Starter packs I found:
AI (*about* AI, not *for* an AI) go.bsky.app/SipA7it
Spoken Language Processing bsky.app/starter-pack...
Diversify Tech's pack bsky.app/starter-pack...
Women in Tech bsky.app/starter-pack...
Great UK Commentators bsky.app/starter-pack...
Linguistics bsky.app/starter-pack...

19.11.2024 22:04 · 👍 19  🔁 3  💬 3  📌 0

Ooo this is a really cool resource; I didn't know about it before, thank you for sharing the link!

19.11.2024 21:47 · 👍 1  🔁 0  💬 0  📌 0

Would love to be added!

19.11.2024 16:44 · 👍 1  🔁 0  💬 0  📌 0
