For some reason my account got suspended after posting this. Weird moderation.
Having restored my account, I'm reposting to increase visibility.
@camuljak.bsky.social
The paper is accepted to EACL Findings. See you in Rabat! 🇲🇦
Shoutout to @mlkukic.bsky.social (just started his MS, hire him!), @ddaviddukic.bsky.social, @mtutek.bsky.social, and sensei Jan Šnajder for this cute collaboration.
https://arxiv.org/abs/2601.17585
https://github.com/takelab/repetition-sl
We establish multi-fold repetition with early exiting as a viable strategy for decoder-as-encoder adaptation, one that does not require complex architectural modifications or extensive training.
02.02.2026 12:04
For Mistral-7B, we find that embeddings from layer 24 (out of 32) can even outperform those at the last layer, while matching the processing time of the input sequence with no repetitions.
To counteract the computational overhead, we experiment with early exiting, using representations from the models' intermediate layers.
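As a rough illustration (my own toy sketch, not the paper's implementation), early exiting just means stopping the forward pass at an intermediate layer and using that layer's hidden states as the token embeddings:

```python
# Toy sketch of early exiting (illustrative only, not the paper's code):
# run only the first `exit_layer` layers and use that hidden state.

def encode_with_early_exit(hidden, layers, exit_layer):
    """Apply layers[0:exit_layer] and return the intermediate hidden state."""
    for layer in layers[:exit_layer]:
        hidden = layer(hidden)
    return hidden

# 32 toy "layers" standing in for transformer blocks; each just adds 1.
layers = [lambda h: [x + 1 for x in h] for _ in range(32)]

# Exiting at layer 24 skips the last 8 layers' worth of compute.
print(encode_with_early_exit([0, 0, 0], layers, exit_layer=24))  # [24, 24, 24]
```

In a real model the "layers" would be transformer blocks and the exit point a hyperparameter (the thread's layer-24-of-32 result for Mistral-7B is an example of such a choice).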
However, the performance gains saturate around 4 repetitions. Also, adding many repetitions incurs computational costs.
Indeed, we observe performance gains over SotA baselines such as removing the causal mask in all layers of the model (full unmasking) or only in the middle layers (middle unmasking), as well as over SotA encoder-only models (ModernBERT and RoBERTa).
Therefore, additional repetitions bring the model closer to a balanced ratio of left- and right-context information throughout the entire input sequence.
We demonstrate the utility of increased repetitions on sequence labeling tasks such as NER or aspect-based sentiment analysis.
We focus on token-level tasks as they require bidirectional context at each token, something decoder-only models lack.
Additional repetitions increase the proportion of bidirectional blocks, and with a bit of high school math it is easy to see that this proportion approaches 1 as the number of repetitions grows, so the model increasingly resembles an encoder-only model.
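For concreteness, here is one way to do that math (my own block-counting sketch, not taken from the paper): view the causal mask over k repetitions of a length-n sequence as a k x k grid of n x n blocks. Blocks attending to an earlier repetition are fully visible (bidirectional), diagonal blocks stay causal, and blocks above the diagonal are empty, so the fraction of fully bidirectional blocks among non-empty ones is (k - 1) / (k + 1), which tends to 1:

```python
# Sketch (my own convention, not the paper's code): fraction of fully
# bidirectional n x n blocks in the causal mask over k repetitions.

def bidirectional_block_fraction(n: int, k: int) -> float:
    N = n * k
    visible = [[j <= i for j in range(N)] for i in range(N)]  # causal mask
    full, nonempty = 0, 0
    for r in range(k):          # query-side repetition index
        for s in range(k):      # key-side repetition index
            cells = [visible[r * n + i][s * n + j]
                     for i in range(n) for j in range(n)]
            if any(cells):
                nonempty += 1
                if all(cells):  # whole block visible => bidirectional
                    full += 1
    return full / nonempty

for k in (1, 2, 4, 16):
    # matches the closed form (k - 1) / (k + 1): 0.0, 0.33..., 0.6, 0.88...
    print(k, bidirectional_block_fraction(4, k))
```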
Thus, we wanted to look at what happens when a model is fine-tuned to utilize additional repetitions. In theory, repeating a sequence once creates a bidirectional block in the attention matrix.
Previous work found that performance gains dissipate at higher repetition counts.
We found this phenomenon counterintuitive since additional repetitions effectively increase the processing capacity of the model.
We already know prompt repetition is a handy hack to improve a decoder-only LM's performance, as it allows the model to "see" bidirectionally, an ability otherwise suppressed by the causal mask.
But what happens if we increase the number of repetitions? 🧵 @eaclmeeting.bsky.social #EACL2026
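To make the "seeing bidirectionally" bit concrete, here is a minimal sketch (my own hypothetical helper, not the paper's code): under a causal mask, position i attends only to positions j <= i, so once the sequence is repeated, every token's second occurrence can attend to a complete copy of the sequence.

```python
# Minimal sketch (hypothetical helper, not the paper's code): which original
# tokens a given token can attend to under the causal mask, with repetitions.

def visible_source_tokens(token_idx: int, n: int, copy: int) -> set:
    """Original token indices visible to `token_idx` in 0-based `copy`
    of a length-n sequence, under causal attention (j <= i)."""
    query_pos = copy * n + token_idx
    return {j % n for j in range(query_pos + 1)}

n = 5
print(visible_source_tokens(0, n, copy=0))  # {0}: in the first copy, only the prefix
print(visible_source_tokens(0, n, copy=1))  # all of {0..4}: the second copy sees every token
```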
Very honored to be one of seven outstanding papers at this year's EMNLP :)
Huge thanks to my amazing collaborators @fatemehc.bsky.social, @anamarasovic.bsky.social, and @boknilev.bsky.social: this would not have been possible without them!
Back from #ICML2025, and off to Norrköping 🇸🇪 for #ic2s2
CLAN (cs.au.dk/~clan/) members are presenting 2 papers: 1 spotlight and 1 oral. See 🧵 for posters and summaries.
Reach out to chat about observational studies, causality, LLM agents, human-centered AI, etc.
We did a cool group project exploring diachronic embeddings for Croatian and found that (among other things) embeddings trained on later periods are more positive when plugged into models trained on earlier time periods.
Check out the thread 🧵 & come talk to us in Vienna about this & other works!