Antonios Dimakis

@antoniosdimakis.bsky.social

PhD fellow at Archimedes Unit, Athena Research Center | PhD student at the National and Kapodistrian University of Athens | Interested in NLP for low-resource languages/terms, tokenization, and linguistics

8 Followers | 15 Following | 4 Posts | Joined: 25.07.2025

Latest posts by antoniosdimakis.bsky.social on Bluesky

GitHub - andhmak/rule_dialnorm: Code and datasets associated with the paper titled "Dialect Normalization using Large Language Models and Morphological Rules"

Proud to work with John Pavlopoulos and @antonisa.bsky.social on this publication!

Check out the data and code here: github.com/andhmak/rule...

4/4

25.07.2025 17:52 - 👍 2    🔁 1    💬 0    📌 0
Regions clustered based on the embeddings of their proverbs. Normalized proverbs produce much more meaningful groupings.

We implement our method for Greek and experiment on a proverb dataset. This very cheaply extends the NLU coverage of models pre-trained only on the standard variety to almost every Greek dialect.

After normalizing we even find cultural insights which were previously obscured!

3/4

25.07.2025 17:50 - 👍 1    🔁 0    💬 1    📌 0
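
The region grouping mentioned in post 3/4 can be illustrated roughly as follows. This is only a sketch under stated assumptions: the posts do not say which embedding model or clustering algorithm was used, so the example averages toy two-dimensional vectors per region and compares the region centroids by cosine similarity. The region names, vectors, and helper functions are all hypothetical.

```python
import math

def centroid(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy "proverb embeddings" per region (not real data).
region_embeddings = {
    "region_a": [[1.0, 0.0], [0.9, 0.1]],
    "region_b": [[0.8, 0.2], [1.0, 0.1]],
    "region_c": [[0.0, 1.0], [0.1, 0.9]],
}

# One centroid per region summarizes that region's proverbs.
centroids = {name: centroid(vecs) for name, vecs in region_embeddings.items()}

def nearest_region(region):
    """Return the other region whose centroid is most similar."""
    others = [name for name in centroids if name != region]
    return max(others, key=lambda name: cosine(centroids[region], centroids[name]))

print(nearest_region("region_a"))
```

In practice the vectors would come from a sentence-embedding model applied to the normalized proverbs, and the pairwise similarities could feed any standard clustering routine.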
Table showing normalization quality for different setups, with the full setup obtaining good scores.

"Dialect Normalization using Large Language Models and Morphological Rules"

By applying rule-based, linguistically informed transformations to the input before passing it to an LLM, with targeted few-shot prompting, we can obtain high-quality normalized outputs.

2/4

25.07.2025 17:48 - 👍 0    🔁 0    💬 1    📌 0
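
The two-stage idea in post 2/4 (rules first, then a few-shot LLM prompt) might look roughly like the sketch below. It is not the authors' implementation: the rewrite rules are toy English placeholders rather than Greek morphological rules, the prompt wording is invented, and `apply_rules` and `build_prompt` are hypothetical helper names. Applying the rules to the few-shot examples as well is also an assumption. The actual code is in the linked repository.

```python
import re

# Hypothetical toy rewrite rules (pattern, replacement).
# These are placeholders, NOT the morphological rules from the paper.
TOY_RULES = [
    (re.compile(r"\bze\b"), "the"),  # toy function-word mapping
    (re.compile(r"in'"), "ing"),     # toy suffix restoration
]

def apply_rules(text, rules=TOY_RULES):
    """Apply each rewrite rule in order to the input text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

def build_prompt(sentence, examples):
    """Build a few-shot prompt from rule-preprocessed text.

    `examples` is a list of (dialectal, standard) pairs. The dialectal
    side is preprocessed with the same rules (an assumption) so that the
    examples match what the model sees for the target sentence.
    """
    parts = ["Rewrite each dialectal sentence in the standard variety."]
    for dialectal, standard in examples:
        parts.append(f"Dialectal: {apply_rules(dialectal)}\nStandard: {standard}")
    parts.append(f"Dialectal: {apply_rules(sentence)}\nStandard:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "ze cat is sleepin'",
    [("ze bird is singin'", "the bird is singing")],
)
print(prompt)  # this string would then be sent to an LLM of your choice
```

The rules handle the regular, predictable differences cheaply, which leaves the LLM to resolve only what the rules cannot.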
Example of a dialectal sentence being normalized incorrectly when using a base LLM, and the same sentence normalized correctly using our method.

How can we make models understand dialectal input, even in dialects with very little data available?

Our work indicates that Rule-Based Normalization can significantly help.

If you're at #ACL2025, check out our poster on Monday at 6pm! aclanthology.org/2025.finding...

1/4

25.07.2025 17:46 - 👍 2    🔁 0    💬 1    📌 0