
David Dukić

@ddaviddukic.bsky.social

PhD in NLP | TakeLab 🇭🇷 | Information extraction, representation learning & analysis | Making LLMs better one step at a time

14 Followers  |  43 Following  |  18 Posts  |  Joined: 24.06.2025

Latest posts by ddaviddukic.bsky.social on Bluesky


We already know prompt repetition is a handy hack to improve a decoder-only LM's performance, as it allows the model to "see" bidirectionally, an ability otherwise suppressed by the causal mask.

But what happens if we increase the number of repetitions? 🤔🧵 @eaclmeeting.bsky.social #EACL2026

02.02.2026 12:04 — 👍 5    🔁 4    💬 1    📌 1
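The repetition trick above amounts to plain input construction: concatenate the prompt with itself so that tokens in a later copy can causally attend over a complete earlier copy. A minimal sketch (the separator and example prompt are my assumptions, not from the paper):

```python
def repeat_prompt(prompt: str, n: int, sep: str = "\n\n") -> str:
    """Concatenate n copies of the prompt. Tokens in the last copy can
    attend to a full earlier copy under the causal mask, approximating
    bidirectional context without changing the model."""
    return sep.join([prompt] * n)

# hypothetical prompt; with n=2 the model effectively re-reads it
inp = repeat_prompt("Classify the sentiment of: 'Great news for Croatia!'", 2)
```

The resulting string is then fed to the LM as-is; no weights or attention masks are modified.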

👋🌊🇭🇷

18.08.2025 10:51 — 👍 0    🔁 0    💬 0    📌 0
Preview
GitHub - dd1497/cro-diachronic-emb: Code for the paper Characterizing Linguistic Shifts in Croatian News via Diachronic Word Embeddings, accepted at the 10th Workshop on Slavic Natural Language Processing 2025 (SlavicNLP 2025)

Check out our work at the Slavic NLP workshop at ACL 2025 & our code/embeddings on GitHub: github.com/dd1497/cro-d...

Feel free to reach out for any questions ✌️

Thanks to all my co-authors! @prshootana.bsky.social @camuljak.bsky.social @chatruncata.bsky.social @mtutek.bsky.social

15.07.2025 12:14 — 👍 1    🔁 0    💬 0    📌 0
Post image

So, news becomes more positive as the years go by. Or does it? We trained sentiment classifiers on STONE & 24sata, then analyzed sentiment over the five periods of the TakeLab Retriever corpus. We find that positivity rises at the expense of neutrality, but negativity in news headlines also increases.

15.07.2025 12:14 — 👍 0    🔁 0    💬 1    📌 0
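The per-period sentiment analysis boils down to tallying classifier outputs per period and comparing label shares. A minimal sketch with made-up predictions (the period labels and counts below are hypothetical, not the paper's STONE/24sata numbers):

```python
from collections import Counter

def label_shares(predictions):
    """Share of each sentiment label per period, given (period, label)
    pairs produced by a sentiment classifier."""
    counts = {}
    for period, label in predictions:
        counts.setdefault(period, Counter())[label] += 1
    return {period: {lab: n / sum(c.values()) for lab, n in c.items()}
            for period, c in counts.items()}

# hypothetical classifier outputs for two periods
preds = [
    ("2000-2004", "neutral"), ("2000-2004", "neutral"),
    ("2000-2004", "positive"), ("2000-2004", "negative"),
    ("2020-2024", "positive"), ("2020-2024", "positive"),
    ("2020-2024", "neutral"), ("2020-2024", "negative"),
]
shares = label_shares(preds)
```

Comparing `shares` across periods is what reveals a rise in positivity at the expense of neutrality.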
Post image

We detect sentiment shift by swapping embeddings across periods. Using later-period embeddings in earlier periods results in increased positive sentiment. Using earlier-period embeddings in later periods results in decreased positive sentiment.

15.07.2025 12:14 — 👍 0    🔁 0    💬 1    📌 0
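The swap probe above can be sketched in a few lines: score the same text twice, once with each period's word vectors, under a fixed classifier. Here a toy linear probe (projection onto a "positive" axis) stands in for the trained classifier, and all words, vectors, and numbers are illustrative:

```python
def positivity(words, embeddings, direction):
    """Project the mean word vector onto a fixed 'positive' direction;
    a toy stand-in for a trained sentiment classifier."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    mean = [sum(xs) / len(vecs) for xs in zip(*vecs)]
    return sum(m * d for m, d in zip(mean, direction))

# hypothetical 2-d vectors for the same words in two periods
emb_early = {"reforma": [0.1, 0.9], "rast": [0.2, 0.8]}
emb_late  = {"reforma": [0.7, 0.3], "rast": [0.9, 0.1]}
axis = [1.0, 0.0]  # toy positive-sentiment axis

headline = ["reforma", "rast"]
score_early = positivity(headline, emb_early, axis)  # early-period vectors
score_late  = positivity(headline, emb_late, axis)   # later vectors swapped in
```

A higher score with later-period vectors on the same text is the kind of signal the swap experiment looks for.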

We wondered if the trained embeddings could tell us something about the shift in sentiment. Can we detect changes in positivity and negativity just using the trained embeddings? The answer is yes!

15.07.2025 12:14 — 👍 0    🔁 0    💬 1    📌 0
Post image

We identify the words that change the most by their cumulative cosine distance scores over the last 25 years. For these words, we trace the change in meaning through their five nearest neighbors per period. The words group into three major topics: EU, technology, and COVID.

15.07.2025 12:14 — 👍 0    🔁 0    💬 1    📌 0
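Picking nearest neighbors per period is a straightforward cosine-distance ranking within one period's embedding space. A small self-contained sketch (the toy 2-d vectors and Croatian words are hypothetical, not the paper's data):

```python
import math

def cosine_distance(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - num / den

def nearest_neighbors(word, emb, k=5):
    """The k words closest to `word` by cosine distance in one
    period's embedding space."""
    dists = [(cosine_distance(emb[word], vec), w)
             for w, vec in emb.items() if w != word]
    return [w for _, w in sorted(dists)[:k]]

# hypothetical single-period embedding
emb = {"korona": [1.0, 0.05], "virus": [1.0, 0.0],
       "pandemija": [0.9, 0.1], "pivo": [0.0, 1.0]}
```

Running this per period for a shifting word shows its neighborhood drifting between topics.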
Post image

We train embeddings using the skip-gram with negative sampling (SGNS) method from Word2Vec, align embeddings between different periods with Procrustes alignment, and validate embedding quality on two word similarity datasets.

15.07.2025 12:14 — 👍 1    🔁 0    💬 1    📌 0
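Orthogonal Procrustes alignment has a closed-form SVD solution (Schönemann's), which is the standard way diachronic embedding spaces are rotated into a common coordinate system. A minimal sketch, assuming the classic formulation; the toy rotated matrices below are illustrative, not the paper's embeddings:

```python
import numpy as np

def procrustes_align(A, B):
    """Orthogonal W minimizing ||A @ W - B||_F, mapping one period's
    embedding matrix A into the space of another period's matrix B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

# toy check: a known rotation between "periods" is recovered exactly
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 4))        # earlier-period embedding matrix
theta = 0.7
R = np.eye(4)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
B = A @ R                            # later period = rotated copy
W = procrustes_align(A, B)
```

Because W is constrained to be orthogonal, the alignment changes coordinates without distorting within-period cosine distances.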
TakeLab Retriever

We leverage the TakeLab Retriever 🐕 (retriever.takelab.fer.hr) corpus of 10 million articles from Croatian news outlets, which we split into five equal periods (2000–2024).
Semantic change is measured as the cumulative cosine distance between a word's embeddings in neighboring periods.

15.07.2025 12:14 — 👍 0    🔁 0    💬 1    📌 0
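The cumulative cosine distance described above is just the sum of distances between a word's (aligned) vectors in consecutive periods. A minimal sketch with hypothetical five-period vectors (the words and numbers are illustrative):

```python
import math

def cosine_distance(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - num / den

def cumulative_change(word, periods):
    """Sum of cosine distances between a word's aligned vectors in
    neighboring periods; larger means more semantic change."""
    return sum(cosine_distance(p[word], q[word])
               for p, q in zip(periods, periods[1:]))

# hypothetical aligned vectors over five periods: "mreza" drifts,
# "kruh" stays put
periods = [
    {"mreza": [1.0, 0.0], "kruh": [0.0, 1.0]},
    {"mreza": [1.0, 0.2], "kruh": [0.0, 1.0]},
    {"mreza": [0.8, 0.5], "kruh": [0.0, 1.0]},
    {"mreza": [0.5, 0.8], "kruh": [0.0, 1.0]},
    {"mreza": [0.2, 1.0], "kruh": [0.0, 1.0]},
]
```

Ranking the vocabulary by this score is how the most-changed words are surfaced.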

While traditional diachronic studies use corpora spanning centuries, we find interesting results even when training diachronic embeddings on only 25 years of news data. We detect words from three turbulent topics (EU, technology, and COVID) whose semantics were strongly affected.

15.07.2025 12:14 — 👍 0    🔁 0    💬 1    📌 0
Post image

📣📣 New preprint alert!!

Despite events in the world becoming bleaker, the news is… more positive?

We conduct a diachronic study of word embeddings trained on 10M Croatian news articles spanning 25 years and find some surprising results!

arxiv.org/abs/2506.13569

15.07.2025 12:14 — 👍 2    🔁 2    💬 1    📌 1
