A great deal if you can get it indeed!
19.05.2025 15:06
How?
19.05.2025 14:38
Could this be "Reconstructing Training Data from Trained Neural Networks"?
giladude1.github.io/reconstructi...
proceedings.neurips.cc/paper_files/...
An open-source alternative to (say) Slack or Discord: zulip.com
06.03.2025 07:20
It's a time management method
en.m.wikipedia.org/wiki/Pomodor...
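For the curious, the classic cycle (25-minute focus blocks, short breaks, a longer break after the last round) is easy to script. A minimal sketch, with all durations as parameters:

```python
import time

def pomodoro(rounds=4, work_min=25, short_break_min=5, long_break_min=15):
    """Build a Pomodoro schedule: work, then a break (long after the last round)."""
    schedule = []
    for i in range(1, rounds + 1):
        schedule.append(("work", work_min))
        schedule.append(("break", long_break_min if i == rounds else short_break_min))
    return schedule

def run(schedule, minute_seconds=60):
    # Sleep through each phase; shrink `minute_seconds` when trying it out.
    for phase, minutes in schedule:
        print(f"{phase}: {minutes} min")
        time.sleep(minutes * minute_seconds)

print(pomodoro(rounds=2))
```

Calling `run(pomodoro(), minute_seconds=1)` gives a fast dry run of a full four-round day.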
Paper: openreview.net/pdf/a9c812c0...
Code: github.com/Flossiee/Hon...
HonestLLM
- Introduces HoneSet, a dataset with 930 queries in six categories to evaluate LLM honesty
- Proposes curiosity-driven prompting and two-stage fine-tuning for improving honesty and helpfulness
- Demonstrates improvements in honesty and helpfulness of up to 124.7% in models like Mistral-7b
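As a rough illustration of what curiosity-driven prompting might look like (the wording below is my own guess, not the paper's actual template, which lives in the linked repo), the idea is to have the model voice its confusion or limitations about a query before answering:

```python
# Hypothetical template -- the real prompt is in the HonestLLM repo.
CURIOSITY_TEMPLATE = (
    "Before answering, state anything about the following query that you are "
    "unsure about or unable to do, and why.\n"
    "Query: {query}\n"
    "If the query is beyond your abilities, say so honestly instead of guessing; "
    "otherwise, answer as helpfully as you can."
)

def build_prompt(query: str) -> str:
    """Wrap a raw user query in the curiosity-driven template."""
    return CURIOSITY_TEMPLATE.format(query=query)

print(build_prompt("What will the stock market do tomorrow?"))
```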
Figure 1: Fine-grained feedback from a multimodal large language model helps yield more human-preferred images. Left: output generated by the baseline text-to-image generative model. Right: output generated by the baseline model optimized with fine-grained feedback from the multimodal large language model. We illustrate improvements in generation quality across four aspects: Prompt Following, Aesthetic, Fidelity, and Harmlessness. See the Appendix for more visualization examples.
Multimodal Large Language Models Make Text-to-Image Generative Models Align Better
- VisionPrefer dataset captures diverse preferences (prompt-following, aesthetic, fidelity, harmlessness) using multimodal LLMs
- VP-Score model matches human accuracy in preference prediction, guiding model tuning
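A preference model like this is typically plugged in as a reranker or reward signal. A minimal sketch of best-of-n selection, assuming a hypothetical scoring function in place of the real VP-Score model:

```python
def best_of_n(prompt, candidates, score_fn):
    """Return the candidate image the preference model rates highest."""
    return max(candidates, key=lambda img: score_fn(prompt, img))

# Toy stand-in for VP-Score: weight prompt-following above aesthetics.
def toy_score(prompt, image):
    return image.get("prompt_following", 0.0) + 0.5 * image.get("aesthetic", 0.0)

imgs = [{"id": "a", "prompt_following": 0.2, "aesthetic": 0.9},
        {"id": "b", "prompt_following": 0.8, "aesthetic": 0.4}]
print(best_of_n("a red cube on a table", imgs, toy_score)["id"])  # prints "b"
```

The same scalar score can also serve as the reward in RLHF-style fine-tuning of the generator, which is how such preference models usually guide model tuning.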
Yeah, it would certainly be awesome to benchmark this empirically
29.11.2024 08:02
It seems to be model-dependent -- see, for instance, the GPT-3.5-Turbo vs. GPT-4 differences here:
ar5iv.labs.arxiv.org/html/2310.18...
The Super Weight in Large Language Models
Setting just a single weight to zero can make various LLMs go from generating coherent text to outputting gibberish.
arxiv.org/abs/2411.07191
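To build intuition for why one weight can matter so much, here is a toy NumPy sketch (not the paper's method): a single outlier weight dominates a layer's output, so zeroing it perturbs the activations far more than zeroing a typical weight does:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(0, 0.02, size=(8, 8))   # typical small weights
W[3, 5] = 4.0                          # one outlier "super weight"
x = rng.normal(size=8)

def out(Wm):
    return Wm @ x

baseline = out(W)

W_super = W.copy(); W_super[3, 5] = 0.0      # zero the outlier
W_typical = W.copy(); W_typical[0, 0] = 0.0  # zero an ordinary weight

print(np.linalg.norm(out(W_super) - baseline))    # large output change
print(np.linalg.norm(out(W_typical) - baseline))  # tiny output change
```

In a real LLM the effect compounds across layers, which is why a single zeroed weight can collapse generation quality entirely.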
It unfortunately doesn't work that well with short (<200 tokens) responses.
www.nature.com/articles/s41...
Does the TULU paper count?
arxiv.org/abs/2306.04751
It doesn't need much more than a bit of StackOverflow-style gamification. Getting a bunch of badges for great reviews would go a long way.
Much of it seems to be low-hanging fruit. E.g., my reviews were marked "Excellent" in the past, but you cannot find that in my OpenReview.net profile.
Lobe Chat (github.com/lobehub/lobe...) + Ollama is a solid option
15.11.2024 16:09