Mark Dredze @mdredze - Bluesky Profile

Mark Dredze named director of Johns Hopkins Data Science and AI Institute

Dredze, a member of JHU’s faculty since 2009, has been selected to lead the university institute dedicated to harnessing the power of AI to translate data-driven discovery into real-world impact.

Congratulations again to John C. Malone Professor of Computer Science @mdredze.bsky.social on this accomplishment!

17.10.2025 17:10 — 👍 7 🔁 2 💬 1 📌 0

Headshots of Mark Dredze, Jason Eisner, Peter Kazanzides, and Tom Lippincott.

Congratulations to CS faculty @mdredze.bsky.social, Jason Eisner, Peter Kazanzides, and @tom-lippincott.bsky.social on their @jhu.edu Nexus Awards! Learn more about their funded projects here: www.cs.jhu.edu/news/compute...

09.09.2025 16:23 — 👍 5 🔁 1 💬 0 📌 0

🚨 You are only evaluating a slice of your test-time scaling model's performance! 🚨

📈 We consider how models’ confidence in their answers changes as test-time compute increases. Reasoning longer helps models answer more confidently!

📝: arxiv.org/abs/2502.13962

20.02.2025 15:14 — 👍 12 🔁 10 💬 1 📌 1
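One way to see what evaluating "only a slice" means: a single accuracy number counts every answer, while a selective evaluation lets the model abstain when its confidence is below a threshold. The sketch below is a generic illustration of threshold-based selective QA scoring, not the paper's exact protocol; the `Prediction` record, the thresholds, and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record for one question.
    correct: bool      # was the final answer right?
    confidence: float  # model's confidence in that answer, assumed in [0, 1]


def selective_metrics(preds: list[Prediction], threshold: float) -> tuple[float, float]:
    """Answer only when confidence >= threshold; abstain otherwise.

    Returns (coverage, selective accuracy). Selective accuracy is set to 0.0
    by convention when the model abstains on every question.
    """
    if not preds:
        return 0.0, 0.0
    answered = [p for p in preds if p.confidence >= threshold]
    coverage = len(answered) / len(preds)
    if not answered:
        return coverage, 0.0
    accuracy = sum(p.correct for p in answered) / len(answered)
    return coverage, accuracy


# Hypothetical toy predictions from a single test-time compute budget.
preds = [Prediction(True, 0.9), Prediction(True, 0.8),
         Prediction(False, 0.4), Prediction(False, 0.7)]
for t in (0.0, 0.75, 0.95):
    print(t, selective_metrics(preds, t))
```

Repeating this sweep for each compute budget gives a coverage/accuracy curve per budget instead of one accuracy number, which is where a shift in confidence would show up.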

David Broniatowski on LinkedIn: NIH has just announced that it will save $4 billion by capping university indirect costs on federal grants. Will this actually save money? Bottom line: No. It…

Please read and share this excellent FAQ on university indirect costs by my friend @broniatowski.bsky.social

He explains why these funds are essential and a critical investment for research in the United States.

www.linkedin.com/posts/david-...

19.02.2025 18:02 — 👍 4 🔁 3 💬 0 📌 0

I know I can improve my ARR reviews, but there really is no need for name calling. 😁

05.02.2025 14:13 — 👍 19 🔁 0 💬 1 📌 0

Helpful
Insightful
Probing
Valuable
Thoughtful
Illuminating
Constructive

In author feedback, these are synonyms for "we hate your review."

30.01.2025 07:24 — 👍 1 🔁 0 💬 0 📌 0

Do reviewers purposely write confusing reviews with typos to demonstrate that the review wasn't written by an LLM?

27.01.2025 23:42 — 👍 2 🔁 0 💬 0 📌 0

Golden idea for an NLP paper: a group of llamas is called a "cria herd".

That would make a great name for an LLM method, model, or paper.

Just remember to acknowledge me in your paper.

You're welcome.

24.01.2025 21:39 — 👍 9 🔁 0 💬 0 📌 0

Idea for a GenAI app: rewrite clickbait headlines as normal headlines in the browser.

Input: you’ll never guess this one company organizing the best deals of the year

Output: Amazon has a modest sale on phone chargers

21.01.2025 19:41 — 👍 23 🔁 2 💬 2 📌 0

Good idea!

20.01.2025 19:10 — 👍 0 🔁 0 💬 0 📌 0

The ARR submission checklist is already pretty extensive, but I suggest we add an additional question:

"I certify that I know the difference between \citet and \citep."

20.01.2025 14:49 — 👍 22 🔁 1 💬 1 📌 1

ARR: Reviews are due today.

Me:

20.01.2025 13:29 — 👍 1 🔁 0 💬 0 📌 0

I feel seen. This is why I always access my API keys from my laptop.

17.01.2025 19:50 — 👍 10 🔁 1 💬 0 📌 1

Do you have any of those fortune cookies that mock academics?

Sure!

14.01.2025 22:19 — 👍 6 🔁 0 💬 1 📌 0

Starting a new year and reflecting on how lucky I am to work at @hopkinsengineer.bsky.social with amazing people @jhucompsci.bsky.social @jhuclsp.bsky.social.

I was promoted to full professor in 2023, and my students presented me with this amazing poster of current and former PhD students.

02.01.2025 17:40 — 👍 12 🔁 2 💬 0 📌 0

AI Ethics and Safety — A Contradiction in Terms? Podcast Episode · On with Kara Swisher · 01/02/2025 · 53m

Listen to @karaswisher.bsky.social's new podcast where she interviews @ruchowdh.bsky.social, @ghadfield.bsky.social and me about AI Ethics and Safety. The podcast was recorded before a live audience at @jhu.edu Bloomberg Center.

podcasts.apple.com/us/podcast/a...

02.01.2025 17:38 — 👍 2 🔁 0 💬 0 📌 0

Examining the generated QA pairs, you can really see the difference. Our generations (bottom) look harder and more interesting.

Want to try our strategy for your synthetic data generation task? Check out our paper, being presented at #ML4H2024.
arxiv.org/abs/2412.04573

22.12.2024 16:01 — 👍 2 🔁 0 💬 0 📌 0

Training a clinical QA system on our data gives big improvements, whether we generate the data with Llama or GPT-4o. The improvements show up both in F1 and in whether there is any overlap between the extracted and true answers.

22.12.2024 16:01 — 👍 1 🔁 0 💬 1 📌 0
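The post does not spell out the metrics. A common choice for extractive QA is SQuAD-style token F1, sketched below together with a simple "any overlap" indicator (true if the predicted and true answers share at least one token). The lowercase-and-whitespace tokenization and the handling of empty answers are assumptions; the paper's exact scoring may differ.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed normalization: lowercase, split on whitespace only.
    return text.lower().split()


def token_f1(prediction: str, gold: str) -> float:
    """SQuAD-style token-level F1 between a predicted and a gold answer."""
    pred_tokens, gold_tokens = tokenize(prediction), tokenize(gold)
    if not pred_tokens or not gold_tokens:
        # Assumed convention: two empty answers (e.g., both say "no answer") match.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def any_overlap(prediction: str, gold: str) -> bool:
    """True if the predicted and gold answers share at least one token."""
    return bool(set(tokenize(prediction)) & set(tokenize(gold)))
```

Any overlap is a looser signal than F1: it credits an extraction that lands on the right region of the record even when the span boundaries are off.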

The generated pairs have several advantages: they don't reuse the language of the report, they include harder questions, and their answers are sometimes not in the report (unanswerable questions). The result? Harder, more diverse, and more realistic QA pairs.

22.12.2024 16:01 — 👍 1 🔁 0 💬 1 📌 0

Second, we use a summarize-then-generate strategy. The LLM first summarizes a given clinical record in a structured format. The summary keeps the key points but loses the details, such as specific terminology and content. We then use the summary to generate a new QA pair.

22.12.2024 16:01 — 👍 0 🔁 0 💬 1 📌 0
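The two steps can be sketched as a small pipeline. This is a minimal sketch, assuming a generic text-in/text-out model: the `LLM` callable, both prompt templates, and the JSON output format are hypothetical placeholders, not the paper's actual prompts.

```python
import json
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]

# Hypothetical prompt wording, for illustration only.
SUMMARY_PROMPT = (
    "Summarize the following clinical record as a structured list of key findings. "
    "Keep the main points but omit specific wording and details.\n\nRecord:\n{record}"
)

QA_PROMPT = (
    "Using only this structured summary, write one question a clinician might ask "
    "about the patient and its answer. The answer may be 'unanswerable'. "
    'Respond as JSON: {{"question": ..., "answer": ...}}\n\nSummary:\n{summary}'
)


def summarize_then_generate(record: str, llm: LLM) -> dict:
    # Step 1: compress the record into a structured summary (details are dropped).
    summary = llm(SUMMARY_PROMPT.format(record=record))
    # Step 2: generate the QA pair from the summary only, so the question
    # cannot simply copy phrasing from the original record.
    raw = llm(QA_PROMPT.format(summary=summary))
    return json.loads(raw)
```

The key design point is that the second call never sees the record itself; only the summary carries information forward.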

We explore two strategies. First, we craft instructions to encourage QA diversity. We formulate these as constraints on the answers to the questions. It helps, but we need more.

22.12.2024 16:01 — 👍 0 🔁 0 💬 1 📌 0

We can ask an LLM to write QA pairs, but they turn out to be too easy and repetitive. They don't come close to what you can get with real data. We need more diverse data! Typical methods (e.g. annealing) don't work. What can we do?

22.12.2024 16:01 — 👍 0 🔁 0 💬 1 📌 0

Paper at #ML4H2024!

Clinical QA can help doctors find critical information in patient records. But where do we get training data for these systems? Generating this data from an LLM is hard. 🧵

22.12.2024 16:01 — 👍 3 🔁 0 💬 1 📌 0

Are Clinical T5 Models Better for Clinical Text? Large language models with a transformer-based encoder/decoder architecture, such as T5, have become standard platforms for supervised tasks. To bring these technologies to the clinical domain, recent...

Takeaways: If you can fine-tune a model on a specific clinical domain, that's great. If you can't, you should probably use models that are better overall, even if they aren't trained on clinical data.

Many more details in the paper!
arxiv.org/abs/2412.05845

22.12.2024 15:58 — 👍 3 🔁 1 💬 0 📌 0

It turns out that when you have just a little supervised data, models trained on more data and more tasks, even out-of-domain ones, do BETTER on the new clinical domain.

22.12.2024 15:58 — 👍 0 🔁 0 💬 1 📌 0

Maybe the real advantage of domain-tuned models lies in the low-resource setting. With lots of supervised data, an out-of-domain model can do well. What about with just a few training examples?

22.12.2024 15:58 — 👍 0 🔁 0 💬 1 📌 0

We try a new clinical task and dataset/domain. In this case, the clinical T5 benefits disappear.

22.12.2024 15:58 — 👍 0 🔁 0 💬 1 📌 0

Comparing 2 clinical with 3 general models on 6 clinical datasets, we find that some clinical models improve. However, these clinical test sets come from the same domain as the clinical training data. Maybe the clinical models are better on THIS clinical data, but not in general?

22.12.2024 15:58 — 👍 0 🔁 0 💬 1 📌 0

T5 models are the workhorse of many clinical text applications (e.g., information extraction). Several clinical T5 models have been trained on clinical data to improve performance on these tasks. Do these models work better than general T5 models?

22.12.2024 15:58 — 👍 0 🔁 0 💬 1 📌 0

Are Clinical T5 Models Better for Clinical Text? That's the question we asked in our #ML4H2024 paper.

Turns out clinical models may not be worth it. 🧵

arxiv.org/abs/2412.05845

22.12.2024 15:58 — 👍 3 🔁 0 💬 1 📌 0
