Amir Hossein Kargaran

@kargaranamir.bsky.social

PhD Student at @cislmu.bsky.social Multilingual NLP and LLMs Twitter: https://x.com/amir_nlp Homepage: https://kargaranamir.github.io

35 Followers  |  91 Following  |  6 Posts  |  Joined: 25.11.2024  |  1.5248

Latest posts by kargaranamir.bsky.social on Bluesky

Are you working on multilingual, multicultural #LLM? Interested in diverse & inclusive language modeling?

😎 Stay tuned at our MELT workshop at #COLM2025

🔗 melt-workshop.github.io

We welcome 2p (EA), 4p (short), 8p (long) papers as well as talented reviewers:

🔗 forms.gle/MYcXED7RLJDS...

05.06.2025 08:39 - 👍 1    🔁 0    💬 0    📌 0
GitHub - cisnlp/code-specific-neurons: 💻🔍 How Programming Concepts and Neurons Are Shared in Code Language Models

This work has been accepted as a Findings paper at ACL 2025 (@aclmeeting), in collaboration with Yihong Liu, @yvofr.bsky.social, and Hinrich Schütze. Code available at: github.com/cisnlp/code-specific-neurons

03.06.2025 17:50 - 👍 0    🔁 0    💬 0    📌 0
We observe both similarities and differences in how LLMs represent natural languages versus programming languages.

03.06.2025 17:50 - 👍 0    🔁 0    💬 1    📌 0
New paper: How does pretraining on programming languages + English shape LLMs' concept space?
πŸ” Do LLMs use English or a programming language as a kind of pivot language?
🧠 Are neurons language-specific or shared across programming languages and English?
πŸ”— arxiv.org/abs/2506.01074

03.06.2025 17:22 - 👍 6    🔁 1    💬 1    📌 0

Thanks to everyone who stopped by to see our work! I'll be at the conference until the closing night and would love to meet and connect with more people. Feel free to DM me here or on the Whova app.

13.12.2024 04:18 - 👍 1    🔁 0    💬 0    📌 0
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages The need for large text corpora has increased with the advent of pretrained language models and, in particular, the discovery of scaling laws for these models. Most available corpora have sufficient d...

🇨🇦 I'll be in Montreal December 4–8, then Vancouver for NeurIPS to present our work on pretraining data for minority languages (arxiv.org/abs/2410.23825). Looking forward to reconnecting and meeting new people. DM me if you want to meet in the upcoming days! :)

01.12.2024 21:18 - 👍 0    🔁 1    💬 0    📌 1