
Ai2

@ai2.bsky.social

Breakthrough AI to solve the world's biggest problems.
Join us: http://allenai.org/careers
Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm

3,518 Followers  |  108 Following  |  320 Posts  |  Joined: 12.10.2023  |  2.256

Latest posts by ai2.bsky.social on Bluesky

GitHub - allenai/olmocr: Toolkit for linearizing PDFs for LLM datasets/training

💻 Ready to try it? Dive into the repo & docs: buff.ly/bwd8Xfh
💬 Chat about it in our Discord: buff.ly/DUlWmg3

01.08.2025 16:02 — 👍 1    🔁 1    💬 0    📌 0

⚡ Speed boost: up to 3,400 tokens/sec on a single GPU, powered by native FP8 compression and a smarter prompting ↔ retry loop.
🛠️ Reproducibility built‑in: brand‑new trainer code lets you recreate our checkpoints or fine‑tune your own models with just a few commands.

01.08.2025 16:02 — 👍 2    🔁 0    💬 1    📌 0

📊 Accuracy upgrade: +3 pts on the public olmOCR‑Bench means cleaner, more reliable text from your noisiest PDFs.

01.08.2025 16:02 — 👍 2    🔁 0    💬 1    📌 0

📝 olmOCR v0.2.1 has arrived with new models! Our open‑source OCR engine now reads tougher docs with greater precision, and it’s still 100% open. 👇

01.08.2025 16:02 — 👍 14    🔁 4    💬 1    📌 3
Why Ai2? | Nathan Lambert & Kyle Wiggers
YouTube video by Ai2

www.youtube.com/watch?v=T15z...

31.07.2025 20:32 — 👍 0    🔁 0    💬 0    📌 0

We’re committed to cementing Ai2’s role as the trusted non-profit voice in the AI landscape by prioritizing community engagement and championing transparency.

31.07.2025 20:32 — 👍 1    🔁 0    💬 1    📌 0

📏 Benchmarks matter.

At Ai2, we’re driving meaningful change by creating robust evaluation tools. These help others cut through hype and focus on what truly enables breakthroughs. 💡

31.07.2025 20:32 — 👍 0    🔁 0    💬 1    📌 0

🔬 What sets Ai2 apart: We’re #1 on the @hf.co heatmap for a reason—we’re continually releasing new research to advance the field.

31.07.2025 20:32 — 👍 0    🔁 0    💬 1    📌 0

New interview drop! ⚠️ Ai2 Senior Research Scientist @natolambert.bsky.social sat down with @kylelwiggers.bsky.social, Ai2’s new Comms Lead, to talk about what sets Ai2 apart from other frontier AI labs, and our role in the research community. Watch the highlights. 👇

31.07.2025 20:32 — 👍 7    🔁 2    💬 1    📌 1
Honduras Strengthens Conservation with EarthRanger Across Protected Areas
Honduras is deploying EarthRanger across 75 protected areas to improve wildlife monitoring, support rangers, and advance its zero-deforestation strategy.

📝 Learn more here: buff.ly/yocTk60

31.07.2025 17:30 — 👍 1    🔁 0    💬 0    📌 0

🌐 EarthRanger is used by hundreds of teams worldwide. In Latin America, Honduras now joins Paraguay, Panama & Mexico in using the platform nationwide. By supercharging Honduras’ work with real‑time intel, Ai2 supports efforts to safeguard natural resources, today and for generations.

31.07.2025 17:30 — 👍 1    🔁 0    💬 1    📌 0

It’s worth underscoring: this rollout is also about protecting people. ❤️

With EarthRanger, conservationists can now track the movements of their teams, share locations, and flag threats—adding a layer of safety for those on the frontlines of conservation.

31.07.2025 17:30 — 👍 0    🔁 0    💬 1    📌 0

As EarthRanger rolls out across the country, teams are spotting critical patterns like a rise in snake encounters near communities. As habitats shrink, snakes are moving closer to people. Now, teams have the data to raise awareness and reduce risk where it matters most.

31.07.2025 17:30 — 👍 0    🔁 0    💬 1    📌 0

The rollout spans 75 land & marine protected areas and backs Honduras’ bold “Zero Deforestation by 2029” pledge, giving conservationists an instant view of where wildlife is, where threats are, and where to act in places such as Puca.

31.07.2025 17:30 — 👍 0    🔁 0    💬 1    📌 0

🌎🛡️ On #WorldRangerDay, we’re proud to share that Honduras is expanding the use of @EarthRangerTech – our real‑time wildlife‑protection tech platform – to advance zero‑deforestation and safeguard biodiversity. 🧵

31.07.2025 17:30 — 👍 2    🔁 0    💬 1    📌 0

Ai2 is excited to be at #ACL2025 in Vienna, Austria, this week. Come say hello, meet the team, and chat about the future of NLP. See you there! 🤝📚

28.07.2025 17:00 — 👍 9    🔁 3    💬 0    📌 0

➡️ Link: buff.ly/9GnUFu1

23.07.2025 19:30 — 👍 0    🔁 0    💬 0    📌 0

🚀 Ai2’s official subreddit has arrived. Discuss our latest work and happenings in the AI community, from LLMs to bleeding-edge scientific work. We look forward to seeing you there. 👋

23.07.2025 19:30 — 👍 6    🔁 0    💬 1    📌 0

Issues with preference LM benchmarks:

🐡 Data contains cases where the "bad" response is just as good as the chosen one
🐟 Model rankings can feel off (Claude ranks lower than expected)

Led by @cmalaviya.bsky.social, we study underspecified queries and their detrimental effect on model evals. Accepted to TACL 2025.

22.07.2025 17:02 — 👍 14    🔁 4    💬 2    📌 0

Takeaway: evaluations need context to reflect real‑world use and to ensure models serve all users.

📚 Read more in our blog: buff.ly/tFGX95o
💻 Get the code: buff.ly/QACrWjr
📊 Download the data: buff.ly/h4C28PV

22.07.2025 15:03 — 👍 3    🔁 0    💬 0    📌 0

For example, we found that default model answers often align better with users from Western, higher‑income backgrounds—an equity gap that context‑free testing missed. 🌍⚠️

22.07.2025 15:03 — 👍 1    🔁 3    💬 1    📌 0

Our fix: contextualized evaluation. Supplying the missing info…
1️⃣ Boosts evaluator agreement
2️⃣ Sometimes completely flips which model “wins”
3️⃣ Leads to more judgments based on content, not style
4️⃣ Exposes biases in default model responses.

22.07.2025 15:03 — 👍 1    🔁 0    💬 1    📌 0
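To make the "boosts evaluator agreement" and "flips which model wins" points concrete, here is a minimal Python sketch that scores the same pairwise comparisons with and without context. The votes are made-up toy data, not results from the paper, and both metrics (pooled pairwise agreement among evaluators, majority winner) are simple illustrative stand-ins; the paper may measure agreement differently.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy votes: for each query, three evaluators pick response "A" or "B".
no_context = [["A", "A", "B"], ["A", "B", "A"], ["B", "A", "A"]]
with_context = [["B", "B", "B"], ["B", "B", "A"], ["B", "B", "B"]]


def agreement_rate(judgments):
    """Fraction of evaluator pairs, pooled over all queries, that cast the same vote."""
    agree = total = 0
    for votes in judgments:
        for a, b in combinations(votes, 2):
            total += 1
            agree += a == b  # bool counts as 0 or 1
    return agree / total if total else 0.0


def winner(judgments):
    """Majority winner over all votes: "A", "B", or "tie"."""
    counts = Counter(v for votes in judgments for v in votes)
    if counts["A"] == counts["B"]:
        return "tie"
    return "A" if counts["A"] > counts["B"] else "B"


for name, j in [("no context", no_context), ("with context", with_context)]:
    print(f"{name}: winner={winner(j)}, agreement={agreement_rate(j):.2f}")
# no context: winner=A, agreement=0.33
# with context: winner=B, agreement=0.78
```

In this toy, supplying context both raises agreement and reverses the winner, which is the pattern the thread describes.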

We analyzed 3,580 queries randomly sampled from popular language model benchmarks, including Chatbot Arena. We found that underspecification is widely prevalent:
➡️ The vast majority of queries are open-ended (76%)
➡️ Many are also subjective (19%) or incomplete (18%)

22.07.2025 15:03 — 👍 0    🔁 0    💬 1    📌 0

When evaluators get these “underspecified” prompts, they have to guess the backstory. The result? Unstable rankings and shaky conclusions about model quality. ⚠️

22.07.2025 15:03 — 👍 0    🔁 1    💬 1    📌 0

An LLM prompt like “Is coffee good for you?” feels simple, but a helpful answer depends on who’s asking (e.g., someone who’s pregnant versus a person with high blood pressure). Most benchmarks leave that context out.

22.07.2025 15:03 — 👍 2    🔁 0    💬 1    📌 0

In our new paper, “Contextualized Evaluations: Judging Language Model Responses to Underspecified Queries,” we find that adding just a bit of missing context can reorder model leaderboards—and surface hidden biases. 🧵👇

22.07.2025 15:03 — 👍 12    🔁 3    💬 1    📌 3

AutoDS shows how AI can turbo‑charge discovery. 🚀

📚 Read more in the blog: buff.ly/O2SXfHv
📝 Check out the paper: buff.ly/jlBxuZJ
💻 Try AutoDS for yourself: buff.ly/fOl9pJG

18.07.2025 16:32 — 👍 2    🔁 0    💬 0    📌 0

Evaluated across 21 real-world datasets, AutoDS outperformed competing methods by 5–29% at making discoveries that are surprising to an LLM. In a human study that involved more than 500 hypotheses, 67% of the discoveries made by AutoDS were also surprising to the experts. 📊

18.07.2025 16:32 — 👍 1    🔁 0    💬 1    📌 0

Like a tireless researcher, AutoDS spins up its own hypotheses, runs the stats, learns from the outcomes, and then repeats. 🔄 The system can use the results of statistical experiments it generates and conducts to propose new hypotheses. 🧑‍🔬💡

18.07.2025 16:32 — 👍 1    🔁 0    💬 1    📌 0

Great science starts with great questions. 🤔✨ Meet AutoDS—an AI that doesn’t just hunt for answers, it decides which questions are worth asking. 🧵

18.07.2025 16:32 — 👍 7    🔁 0    💬 1    📌 0
