michael ginn @mginn - Bluesky Profile

Is linguistically-motivated data augmentation worth it? Data augmentation, a widely-employed technique for addressing data scarcity, involves generating synthetic data examples which are then used to augment available training data. Researchers have seen s...

Find out in our new paper, which my colleague Ray Groshan will present at ACL!!

arxiv.org/abs/2506.03593

05.06.2025 06:53 — 👍 2 🔁 0 💬 0 📌 0

“Linguistically-motivated” techniques for data augmentation sound good on paper, but are they worth the cost?

05.06.2025 06:53 — 👍 0 🔁 0 💬 1 📌 0

Measuring Contextual Informativeness in Child-Directed Text To address an important gap in creating children's stories for vocabulary enrichment, we investigate the automatic evaluation of how well stories convey the semantics of target vocabulary words, a tas...

Excited to be presenting my work with @teaywright.bsky.social at #COLING2025 next week in Abu Dhabi! Find us in poster session 6/E on Jan 22nd (11 AM in the atrium).

Paper: arxiv.org/abs/2412.17427

16.01.2025 23:16 — 👍 10 🔁 3 💬 1 📌 0

14.01.2025 20:41 — 👍 1 🔁 0 💬 0 📌 0

Well isn’t the idea that the entire layer defines a high dimensional space, where each neuron is a dimension?

14.01.2025 19:37 — 👍 0 🔁 0 💬 1 📌 0

There’s no conspiracy to make tech products worse by putting AI in things; AI is just very immediately and clearly productivity-enhancing to the people making tech products in a way that it isn’t necessarily to the people using them.

13.01.2025 01:52 — 👍 110 🔁 7 💬 9 📌 0

Randomly stumbled on an arXiv paper where I’m pretty sure the listed affiliations are false. What would you even get out of that?

08.01.2025 08:02 — 👍 0 🔁 0 💬 0 📌 0

Been reading a lot of old-school finite-state automata papers for a project

It is so refreshing to read an interesting, colorfully-written paper that isn’t hyperoptimized for reviewer preferences

08.01.2025 07:56 — 👍 1 🔁 0 💬 0 📌 0

Probably doesn’t help that there is effectively an online cult promoting all of this

08.01.2025 07:55 — 👍 2 🔁 0 💬 0 📌 0

I have very mixed feelings on the current era in tech—I started a PhD because I thought LLMs were pretty cool, but I absolutely cannot stand the disingenuous hype, insane competitiveness, and slop features that have since come with them

08.01.2025 07:51 — 👍 8 🔁 0 💬 2 📌 0

It’s interesting how they describe patches of bytes that are determined by changes in entropy, without making any reference to morphology…Zellig Harris did basically the same thing 50 years ago

13.12.2024 22:03 — 👍 1 🔁 0 💬 0 📌 0

Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation The data and compute requirements of current language modeling technology pose challenges for the processing and analysis of low-resource languages. Declarative linguistic knowledge has the potential ...

Can RAG+LLM systems help boost small models for rare languages?

Find out in “Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation” by Bhargav Shandilya and @alexispalmer.bsky.social

arxiv.org/abs/2410.00387

02.12.2024 18:43 — 👍 1 🔁 1 💬 1 📌 0

Also shout-out to the morphology reviewers!

27.11.2024 21:35 — 👍 1 🔁 0 💬 0 📌 0

Adding my love letter to

arxiv.org/pdf/2304.01315

Empirical Design in Reinforcement Learning
by
Andrew Patterson, Samuel Neumann, Martha White, Adam White

JMLR 25 (2024) 1-63
#ReinforcementLearning

These aren’t the heroes we deserve, but they are the heroes we need.

23.11.2024 13:40 — 👍 211 🔁 47 💬 7 📌 6

I’m a big proponent of an accumulating reviewer score (complementary to h-index). I think people would absolutely care about optimizing it even with no concrete incentive.

24.11.2024 21:10 — 👍 6 🔁 0 💬 1 📌 0

I feel like reviewers often expect short papers to be long papers condensed into 4 pages. They should really be a venue to showcase focused and incremental work.

24.11.2024 21:04 — 👍 4 🔁 1 💬 0 📌 0

🙋‍♂️

23.11.2024 23:29 — 👍 1 🔁 0 💬 0 📌 0

Interested in ML open source? There’s a great list for you

23.11.2024 06:26 — 👍 3 🔁 0 💬 0 📌 0

Python typing is great until you want to use any package ever

23.11.2024 01:52 — 👍 0 🔁 0 💬 0 📌 0

Hi, unfortunately the pack is now full, however @datatherapist.bsky.social started a third one! go.bsky.app/CUuio7g

23.11.2024 01:48 — 👍 1 🔁 0 💬 0 📌 0

bsky.app/profile/mgin...

23.11.2024 01:47 — 👍 0 🔁 0 💬 0 📌 0

Just got ICL surgery (implantable lenses) and on one hand, modern medicine is incredible, but on the other hand seeing your eye get sliced open is terrifying

22.11.2024 20:53 — 👍 0 🔁 0 💬 0 📌 0

GitHub - CUNY-CL/yoyodyne: Small-vocabulary sequence-to-sequence generation with optional feature conditioning

@adamwiemerslage.bsky.social maintains github.com/CUNY-CL/yoyo...

22.11.2024 16:23 — 👍 3 🔁 0 💬 1 📌 0

arXiv should have a comment section

22.11.2024 05:36 — 👍 18 🔁 0 💬 3 📌 0
