We're having a (human) language acquisition meetup at #ACL2025. RSVP on Whova for updates!
28.07.2025 15:08
@antonisa.bsky.social: Assistant Prof at GMU. NLP, CompLing, ML, and other things language+humans
Proud to work with John Pavlopoulos and @antonisa.bsky.social on this publication!
Check out the data and code here: github.com/andhmak/rule...
4/4
I'm attending #ACL in Vienna this week. We're running a BoF on Language Technologies for Crisis Response and Preparedness, co-hosted w/ Will Lewis
Wed. 30th, 11am. Room 1.33.
You can join us virtually too. DM me if you're interested
@wildlewis.bsky.social @antonisa.bsky.social
Looking forward to this year's edition! With great speakers: Ryan McDonald, Yulan He, @vn-ml.bsky.social, @antonisa.bsky.social, Raquel Fernandez, @annarogers.bsky.social, Preslav Nakov, @mohitbansal.bsky.social, @eunsol.bsky.social, Marie-Catherine de Marneffe!
06.06.2025 09:10
It's official! Save the Date!
The #AthNLP Summer School is coming!
4-10 September 2025
Athens, Greece
@athnlp.bsky.social, a top NLP summer school, offers a week of lectures, workshops, and networking.
athnlp.github.io/2025/index.h...
#AthNLP2025 #NLP #AI #SummerSchool
I am looking for a postdoc for the next academic year!
(Due to the funding source, US persons preferred)
Interested in multimodal LLMs and their application to education domains (as well as multilinguality, cross-lingual, and low-resource learning)?
If yes, send me a message here or an email!
Usually on an iPad these days...
I think I just hate having to write notes in the middle of a line in the 1-col papers, so it's probably about how close the space is to the text as opposed to how abundant it is
I find the 2-col format easier for reviewing/note-taking/suggesting edits, because the info is spread out vertically and I have more margin space for notes closer to the actual text.
But for just reading, agreed, we should just produce dynamic pubs that people can customize to their preferences.
A recently discharged Division 252 officer describes the arbitrary nature of this boundary: "For the division, the kill zone extends as far as a sniper can see." But the issue goes beyond geography. "We're killing civilians there who are then counted as terrorists," he says. "The IDF spokesperson's announcements about casualty numbers have turned this into a competition between units. If Division 99 kills 150 [people], the next unit aims for 200." These accounts of indiscriminate killing and the routine classification of civilian casualties as terrorists emerged repeatedly in Haaretz's conversations with recent Gaza veterans.
Fucking hell... Seriously, there's no going back from this. If it were not from Haaretz, nobody would believe it. Worst part? Nobody fucking cares.
archive.ph/NVG4p#select...
Another example of a point that supports this argument: more than 50% of the "facts" available in Wikipedia/Wikidata are only available or retrievable in a _single_ language. The observation is hidden somewhere in this paper
aclanthology.org/2020.emnlp-m...
Hm, I think I have a couple of examples.
The first is the linked PNAS paper, titled "Language extinction triggers the loss of unique medicinal knowledge" www.pnas.org/doi/epdf/10....
Excited to announce the launch of our ML-SUPERB 2.0 challenge @interspeech.bsky.social 2025! Join us in pushing the boundaries of multilingual ASR and LID!
multilingual.superbbenchmark.org
In the above examples, some people had a problem, a computer scientist stepped in to help produce a solution, and they wrote a paper about it so that if anyone else has a similar problem in the future, there's a guide to solving it. How's that not enough of a contribution?
05.12.2024 04:09
Next thing you know, you realize that you need, and you start building, a simplification dataset for the contact language (let's pick Rioplatense Spanish for this example), or NER tools that can handle the specific regional orthographic variations of Cypriot Greek.
Or they might result from the specific needs of a scientific (or not) team. Hypothetical example: a sociologist teams up with a meteorologist and a computer scientist to figure out how to best convey changing climate threats to an indigenous community, and ...
Or the leaders of a different community might actually want an LLM to ensure their language has the same perceived prestige and tool access as a more dominant language that might be threatening theirs.
A lot of the "narrow"-focus datasets on otherwise underserved languages might be the result of the specific needs of the community: a community might not need an LLM, but they might need a morphosyntactic analyser that they can deploy in a classroom to teach their language.
Sure, if your space of scientific questions is only "how can I train a model to do X?", then the slight variation of "how can I train a model to do X in language Y?" is not too interesting in and of itself (although there might be arguments just for that, see above).
The extent to which we understand them, in our current setting, is measured by the datasets that you complain about.
Yes, we might _believe_ that model X will be able to perform task Y in some language Z and context W, but we don't _know_, not until we actually try (and often find "...Not quite").
Human beings have come up with 7000+ ways to communicate. And each of these modes encodes unique sociocultural, historical, (and potentially more types of *al) information. So being able to understand (or have machines understand) them ensures that we don't lose part of our collective knowledge.
Late to the party, but after reading most subthreads, I'll bite.
I think your whole premise suggests a very narrow view of what is science or scientific contributions.
These are good tips!
03.12.2024 18:22
interspeech2025.org challenge: MultiLingual Speech processing Universal PERformance Benchmark. Organizers: Antonios Anastasopoulos, Martijn Bartelds, William Chen, Dan Jurafsky, Hung-yi Lee, Karen Livescu, Chutong Meng, Jiatong Shi, Hsiu-Hsuan Wang, Shih-Heng Wang, Shinji Watanabe
ML-SUPERB 2.0 Challenge at #Interspeech2025: Push the boundaries of cross-lingual speech processing!
154 languages & 200+ accents/dialects
Live leaderboard & online evaluation! Join now: multilingual.superbbenchmark.org
We had a great discussion with @robertlemos.bsky.social from Dark Reading about our new paper "Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks"(arxiv.org/abs/2410.20911). Mantis turns the hardness of dealing with prompt injections into an opportunity!
21.11.2024 13:22
The multi-number phonebook retrieval task entails retrieving several phone numbers from a phonebook at once, given names. Hawk trained using Birdie strongly outperforms Hawk trained using Next Token Prediction on this task. Hawk trained using Next Token Prediction performs just above random guessing when retrieving 1 to 4 phone numbers, and falls to random performance when retrieving more than 4. In contrast, Hawk trained using Birdie gets 100% accuracy when retrieving 1 phone number, and that score slowly decays to about 80% accuracy when retrieving up to 32 phone numbers simultaneously. Two Transformers are included, one trained using Birdie and the other using Next Token Prediction; both always achieve about 100% accuracy, even when retrieving 32 phone numbers.
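For readers who want a concrete picture of the task described above, here is a minimal sketch of how a synthetic multi-number phonebook retrieval example could be constructed. The exact phonebook format, prompt wording, and name/number conventions used in the Birdie paper are not given here, so all of those details (the function name, six-letter names, the number format) are assumptions for illustration only.

```python
import random
import string

def make_phonebook_task(num_entries=50, num_queries=4, seed=0):
    """Build one synthetic phonebook retrieval example:
    a context of name->number lines, a query asking for several
    numbers at once, and the gold answers (illustrative format)."""
    rng = random.Random(seed)
    # Generate unique made-up names (six uppercase letters each).
    names = set()
    while len(names) < num_entries:
        names.add("".join(rng.choices(string.ascii_uppercase, k=6)))
    names = sorted(names)
    # Assign each name a random phone number like "123-456-789".
    book = {n: "-".join(str(rng.randint(100, 999)) for _ in range(3))
            for n in names}
    # Pick several names to query simultaneously.
    targets = rng.sample(names, num_queries)
    context = "\n".join(f"{n}: {book[n]}" for n in names)
    question = "What are the phone numbers for: " + ", ".join(targets) + "?"
    answer = [book[t] for t in targets]
    return context, question, answer

context, question, answer = make_phonebook_task()
print(question)
print(answer)
```

Scaling `num_queries` from 1 up to 32 reproduces the difficulty axis the evaluation sweeps over: the model must copy more and more exact strings from the context in a single response.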
Meet Birdie!
Our EMNLP 2024 paper boosts SSMs like Mamba and Hawk on long-range, context-heavy tasks, closing the gap with Transformers.
Proud to work with @jimmysmith1919.bsky.social, @antonisa.bsky.social, & Amarda Shehu.
Paper: arxiv.org/abs/2411.01030
Code: github.com/samblouir/bi...
Are there any resources for helping introverted students navigate the craziness that is conference attendance, so that they make the most of it?
I mean, I can tell my students to "go talk to people", but I'm looking for something more comprehensive.
I (extrovert advisor) am struggling.
This!
the other place has been not-fun for a while -- I'll try to post here as well!