Back after a successful #EMNLP2025 conference in Suzhou, China -- some impressions ⤵️
Our papers: www.copenlu.com/news/8-paper...
@apepa.bsky.social @rnv.bsky.social @siddesh.bsky.social @kirekara.bsky.social @shoejoe.bsky.social @zainmujahid.me @lucasresck.bsky.social @copenlu.bsky.social
#NLProc
Attending EMNLP 2025 this week? So is CopeNLU -- come find us there! ⤵️
www.copenlu.com/news/8-paper...
@apepa.bsky.social @rnv.bsky.social @kirekara.bsky.social @shoejoe.bsky.social @dustinbwright.com @zainmujahid.me @lucasresck.bsky.social @iaugenstein.bsky.social
#NLProc #AI #EMNLP2025
The last round of applause goes to the @copenlu.bsky.social lab, @ucph.bsky.social and my amazing colleagues and friends there for the heartwarming, inspiring and fun times we had ♥️ to everyone involved in this journey goes my deepest gratitude ♥️♥️
I also want to thank the fantastic PhD committee,
@barbaraplank.bsky.social , Ivan Titov and
@delliott.bsky.social, for their deep, thought-provoking and insightful questions and analysis.
I defended my PhD at the University of Copenhagen ☺️ What a journey! I want to give massive thanks to my amazing supervisors, @iaugenstein.bsky.social and @neuralnoise.com who were there with me throughout the whole process.
Thesis on: osoblanco.github.io/thesis/
The Arxiv version is coming soon!
@dfdazac.bsky.social was an honor to work with someone as amazing as you.
The line made me teary 🥹🥹♥️♥️
Hello bluesky!
I'm using this first post to share that my PhD thesis is now available online at research.vu.nl/en/publicati...
Thanks to all my collaborators who joined me in this journey!
I think, given the current weird/awful state of how reviewing is handled in major ML venues, we would explicitly need to rank reviewers, even if they remain anonymous. This could help (S)ACs at least internally filter out malicious and unqualified ones.
Will work on smth like this closer to ~ICML.
What I secretly desire is something even stricter than grounding with RAG. Maybe have a big knowledge graph for grounding and use a good neural link predictor to confirm whether the facts are correct. That covers factuality; we would also want deductive and analytic reasoning, similar to a theorem prover.
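A toy sketch of what that grounding step could look like: score a generated triple against KG embeddings with a DistMult-style link predictor and only keep facts above a threshold. Everything here (entity names, hand-set embeddings, the threshold) is illustrative; real embeddings would be learned from the knowledge graph.

```python
# Hedged, toy sketch: ground a generated fact against a knowledge graph
# with a DistMult-style neural link predictor. Embeddings are hand-set
# purely for illustration; in practice they are learned from the KG.
DIM = 16
entities = {
    "copenhagen": [1.0] * DIM,
    "denmark": [1.0] * DIM,
    "france": [-1.0] * DIM,
}
relations = {"capital_of": [1.0] * DIM}

def distmult_score(head, rel, tail):
    """DistMult scores a triple via the tri-linear product <e_h, w_r, e_t>."""
    return sum(h * r * t for h, r, t in zip(entities[head], relations[rel], entities[tail]))

def fact_is_plausible(head, rel, tail, threshold=0.0):
    """Keep a generated fact only if the link predictor scores it above a threshold."""
    return distmult_score(head, rel, tail) > threshold

print(fact_is_plausible("copenhagen", "capital_of", "denmark"))  # True
print(fact_is_plausible("copenhagen", "capital_of", "france"))   # False
```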
The main question in current LLM “reasoning” research is what to do next. Most work goes into synthetic data generation and training on it, maybe with self-refinement, in the hope that the model gets better. I think we are missing controlled task formalization, step-by-step reasoning, and strict per-step verification.
My amazing collaborators will be presenting three papers next week at EMNLP 2024! I wrote a blog post about our EMNLP papers and some of the other projects we're brewing 🚀🙂 neuralnoise.com/2024/nov-res...
The results consistently show that, across all models, traces leading to correct answers had a higher percentage of unique emergent facts and more overlap between the relations used in the code and in the search, while the portion of underutilized relations was lower.🤔🤔
By comparing relations in code with those in search traces, we measure emergent hallucinations and unused relations, highlighting areas of sub-optimal reasoning. We also assess the uniqueness of emergent facts per inference hop, indicating the extent of problem-space exploration.
We found a strong correlation between the faithfulness of the search towards the code and model performance across all of the models.
Using FLARE also allows evaluating the faithfulness of the completed search w.r.t. the defined facts, relations, and search logic (taken from Prolog). We simply compare (via ROUGE-Lsum) the simulated search with the actual code execution, when available.
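The comparison idea can be sketched with a minimal ROUGE-L-style score based on longest common subsequence over tokens. The paper's pipeline uses ROUGE-Lsum (e.g. from the `rouge-score` package); this pure-Python version and the example traces are only illustrative.

```python
# Hedged sketch: compare a simulated search trace with the actual
# execution trace via an LCS-based ROUGE-L F1. Traces are invented.
def lcs_len(a, b):
    """Classic dynamic-programming longest-common-subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over whitespace tokens: harmonic mean of LCS precision/recall."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)

simulated = "path ( copenhagen , denmark ) via capital_of"   # LLM's simulated search
executed = "path ( copenhagen , denmark ) via located_in"    # actual Prolog execution
print(round(rouge_l_f1(executed, simulated), 2))  # 0.88 -- traces differ in one token
```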
The method boosts the performance of various LLMs at different scales (8B -> 100B+) compared to CoT and Faithful CoT on various Mathematical, Multi-Hop, and Relation Inference tasks.
The LLM formalizes the task in Prolog as facts, relations, and search logic, and simulates exhaustive search by iteratively exploring the problem space with backtracking.
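The facts/relations/backtracking idea can be sketched like this. The facts and the grandparent query are invented for illustration; in FLARE itself the LLM emits and "executes" this kind of search in text rather than running Python.

```python
# Hedged sketch: exhaustive search with backtracking over Prolog-style facts.
FACTS = {("alice", "parent", "bob"), ("bob", "parent", "carol")}

def grandparent(x, z):
    """x is a grandparent of z if x is a parent of some y who is a parent of z."""
    for (h, r, t) in sorted(FACTS):        # try each fact in turn
        if r == "parent" and h == x:       # candidate binding: x parent y
            y = t
            if (y, "parent", z) in FACTS:  # y parent z -> proof found
                return [("parent", x, y), ("parent", y, z)]
            # otherwise backtrack and try the next candidate y
    return None  # search space exhausted, no proof

print(grandparent("alice", "carol"))  # proof trace with both parent facts
```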
👋Psst! Want more faithful, verifiable and robust #LLM reasoning than with CoT, but using external solvers is meh? Our FLARE💫uses logic programming with exhaustive simulated search to achieve this.🧵
@pminervini.bsky.social, Patrick Lewis, Pat Verga and @iaugenstein.bsky.social
arxiv.org/abs/2410.11900
At #EMNLP2024 we will present our paper on LLM values and opinions!
We introduce tropes: repeated and consistent phrases which LLMs generate to argue for political stances.
Read the paper to learn more! arxiv.org/abs/2406.19238
Work done at Uni Copenhagen + the Pioneer Center for AI
Hey! 🙂 we analysed what happens during pre-training, and for causal LMs, intra-document causal masking helps quite a bit both in terms of pre-training dynamics and downstream task performance: arxiv.org/abs/2402.13991
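Intra-document causal masking, in a nutshell: when multiple documents are packed into one training sequence, each token may attend only to earlier tokens from its own document, not to everything earlier in the pack. A minimal sketch with illustrative doc ids:

```python
# Hedged sketch: build an intra-document causal attention mask for a
# packed sequence. doc_ids marks which document each token came from.
def intra_doc_causal_mask(doc_ids):
    """mask[i][j] == True iff token i may attend to token j
    (j is not in the future AND both tokens share a document)."""
    n = len(doc_ids)
    return [[j <= i and doc_ids[j] == doc_ids[i] for j in range(n)] for i in range(n)]

# Two documents packed into one 5-token sequence: [A, A, A, B, B]
mask = intra_doc_causal_mask([0, 0, 0, 1, 1])
print(mask[3])  # token 3 (doc B) ignores doc A: [False, False, False, True, False]
print(mask[2])  # token 2 (doc A) is plain causal within doc A: [True, True, True, False, False]
```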