
Joshua Ong

@jong21.bsky.social

BSc @ University of Edinburgh. Natural Language Processing, LLM Reasoning. Actively seeking a PhD position for Spring/Fall 2025 ✨

861 Followers  |  344 Following  |  13 Posts  |  Joined: 18.11.2024

Latest posts by jong21.bsky.social on Bluesky

MMLU-Redux Poster at NAACL 2025


MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

02.05.2025 13:00 — 👍 16    🔁 11    💬 0    📌 0

Thanks @nolovedeeplearning.bsky.social for the picture!!! 🥰

06.12.2024 21:54 — 👍 21    🔁 3    💬 1    📌 1

Very cool work! 👍🚀 Unfortunately, errors in the original dataset will propagate to all new languages 😕

We investigated the issue of existing errors in the original MMLU in
arxiv.org/abs/2406.04127

@aryopg.bsky.social @neuralnoise.com

06.12.2024 13:57 — 👍 4    🔁 2    💬 0    📌 1

For clarity -- great project, but most of the MMLU errors we found (and fixed) in our MMLU Redux paper (arxiv.org/abs/2406.04127) are also present in this dataset. We also provide a curated version of MMLU, so it's easy to fix 😊

06.12.2024 09:26 — 👍 15    🔁 4    💬 1    📌 0

Super cool work from Cohere for AI! 🎉 However, this highlights a concern raised by our MMLU-Redux team (arxiv.org/abs/2406.04127): **error propagation to many languages**. Issues in MMLU (e.g., "rapid intervention to solve ebola") seem to persist in many languages. Let's solve the root cause first?

06.12.2024 09:38 — 👍 9    🔁 3    💬 1    📌 0

Sohee (@soheeyang.bsky.social) in the house! 🚀🚀🚀

05.12.2024 14:38 — 👍 9    🔁 1    💬 0    📌 0
The OLMo 2 models sit at the Pareto frontier of training FLOPs vs model average performance.


Meet OLMo 2, the best fully open language model to date, including a family of 7B and 13B models trained up to 5T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B — As always, we released our data, code, recipes and more 🎁

26.11.2024 20:51 — 👍 151    🔁 36    💬 5    📌 12

This paper's findings about testing LLMs on NLI align with many of my personal thoughts:

1) NLI remains a difficult task for LLMs
2) Having more few-shot examples is helpful (in my view, helping LLMs better understand class boundaries)
3) Incorrect predictions are often a result of ambiguous labels

24.11.2024 16:38 — 👍 27    🔁 3    💬 1    📌 0

Hey John! Thanks for reaching out — I've sent you a DM to discuss this further!

24.11.2024 22:25 — 👍 0    🔁 0    💬 1    📌 0

Hii I'd love to join as well!!! 🙋🏼‍♀️

24.11.2024 03:48 — 👍 0    🔁 0    💬 0    📌 0

Hii I'd love to join as well!!

24.11.2024 03:46 — 👍 1    🔁 0    💬 0    📌 0

Check out our CoMAT: Chain of Mathematically Annotated Thought, which improves mathematical reasoning by converting mathematical questions into structured symbolic representations and performing step-by-step reasoning 🎉 It works across various languages and on challenging benchmarks.

arxiv.org/pdf/2410.103...

20.11.2024 15:29 — 👍 0    🔁 0    💬 1    📌 0

The main question in current LLM "reasoning" research is what to do next. Most efforts go into synthetic data generation and training, maybe with self-refinement, in the hope that the model becomes better. I think we are missing controlled task formalization, step-by-step reasoning, and strict step verification.

19.11.2024 05:34 — 👍 24    🔁 3    💬 5    📌 1

Thanksss!!!!!

20.11.2024 14:50 — 👍 1    🔁 0    💬 0    📌 0

1/ Introducing OpenScholar: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & 45M-paper datastores, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai

19.11.2024 16:30 — 👍 161    🔁 39    💬 6    📌 8

Hi I'd love to be added as well! 🙋🏼‍♀️

20.11.2024 13:40 — 👍 0    🔁 0    💬 1    📌 0

Hey, I'm available! However, I can't send you a DM since it's restricted to followers. If you could send me a message instead, that'd be great!

20.11.2024 13:40 — 👍 0    🔁 0    💬 0    📌 0

I'll be travelling to London from Wednesday to Friday for an upcoming event and would be very happy to meet up! 🚀
I'd love to chat about my recent work (DeCoRe, MMLU-Redux, etc.). DM me if you're around! 👋

DeCoRe: arxiv.org/abs/2410.18860
MMLU-Redux: arxiv.org/abs/2406.04127

18.11.2024 13:48 — 👍 12    🔁 7    💬 0    📌 0

DMed you!

20.11.2024 00:43 — 👍 1    🔁 0    💬 0    📌 0

Added! Thanks!!

18.11.2024 11:04 — 👍 0    🔁 0    💬 0    📌 0

I made a starter pack with the people I could find doing something related to Neurosymbolic AI.

Let me know if I missed you!
go.bsky.app/RMJ8q3i

11.11.2024 15:27 — 👍 91    🔁 36    💬 16    📌 2

Hi I would love to be added as well!!

18.11.2024 09:33 — 👍 1    🔁 0    💬 1    📌 0

Hi, I would love to be added as well!

18.11.2024 09:31 — 👍 1    🔁 0    💬 0    📌 0

Hi, I'd love to be added as well!

18.11.2024 09:26 — 👍 1    🔁 0    💬 0    📌 0

Hi, I'd love to be added, thanks!!!

18.11.2024 08:50 — 👍 0    🔁 0    💬 0    📌 0
