Michael Khoo

@michaelkhoo.bsky.social

Sky chaser, earth lover, capitalism constrainer, agnotological luddist, counter-enshittification hobbyist, playlist junkie, piano player. Co-founder, UpShift Strategies. Co-chair, Climate Action Against Disinformation.

1,005 Followers  |  572 Following  |  24 Posts  |  Joined: 10.05.2023  |  2.0264

Latest posts by michaelkhoo.bsky.social on Bluesky


Big Tech never met a climate claim it couldn’t greenwash.

While AI-driven electricity demand throws the fossil fuel industry a lifeline, we’re told AI can help solve the climate crisis.

The reality? 74% of claims about AI climate benefits are unproven.
bit.ly/AIGreenwash

18.02.2026 10:00 - 👍 4    🔁 4    💬 1    📌 0
Preview
AI data centre surge would put UK’s climate change targets at risk
MPs call for ‘national conversation’ on potential drawbacks of sharp rise in data centres, which would use more power than the UK uses at its peak

⚡️ The data centres needed to power the govt's AI revolution would use more electricity than the UK consumes at peak.

According to Ofgem, around 140 sites are seeking grid connections totalling 50GW (vs 45GW today), a surge MPs warn could put Britain’s climate targets at risk.
buff.ly/Q2N6pbC

23.02.2026 11:57 - 👍 2    🔁 4    💬 1    📌 0

The new AI Climate Hoax, brought to you by the same Silicon Valley people who said social media would bring people together. 74% of these greenwashing claims are unproven.

Very proud to be part of this project led by @ketanjoshi.co with @foeus.bsky.social @caadcoalition.bsky.social

17.02.2026 17:54 - 👍 17    🔁 5    💬 0    📌 0
Preview
Claims that AI can help fix climate dismissed as greenwashing
Industry using ‘diversionary’ tactics, says analyst, as energy-hungry complex functions such as video generation and deep research proliferate

💡 Claims that AI can help fix climate have been dismissed as greenwashing in a new report by @ketanjoshi.co @stand.earth @beyondfossilfuels.bsky.social @foeus.bsky.social

via @theguardian.com www.theguardian.com/technology/2...

17.02.2026 15:27 - 👍 17    🔁 14    💬 0    📌 0

This is fantastic, and it's something that we've been saying for years.

When companies say that they are using "AI for climate change", they are referring to much smaller models used for doing things like climate modeling.

Massive generative AI models are NOT USEFUL for mitigating climate change.

17.02.2026 12:07 - 👍 141    🔁 50    💬 6    📌 1

AI data centers are throwing the fossil fuel industry a lifeline.

"But AI will solve climate change, right?" Wrong.

@stand.earth's new report authored by independent researcher @ketanjoshi.co clearly lays out Big Tech's AI climate hoax. See for yourself here: stand.earth/resources/ai...

17.02.2026 16:01 - 👍 39    🔁 14    💬 1    📌 0

There's a parliamentary COUP taking place in Brazil right now, aided by Meta โ€“ who is shadow banning left wing Instagram profiles, including president Lula's.

10.12.2025 11:48 - 👍 79    🔁 34    💬 1    📌 3

I absolutely agree it's easily absorbed. But it was nearly their max and good to see the law is going to be used. I measure its effectiveness in Vance's pearl-clutching.

05.12.2025 17:38 - 👍 2    🔁 0    💬 1    📌 0

The 2nd best part of this decision is that it shows the political/media pundits are full of sh*t in predicting the EU would fall to JD Vance and Peter Thiel's penis waving. Also, can we pls stop debunking the fascists' framing "censorship"?
This is simply justice served and basic accountability.

05.12.2025 17:32 - 👍 2    🔁 1    💬 1    📌 0
CLINB: A Climate Intelligence Benchmark for Foundational Models
Michelle Chen Huebscher1, Katharine Mach2, Aleksandar Stanić1, Markus Leippold1,3, Ben Gaiarin1, Zeke Hausfather4, Elisa Rawat, Erich Fischer5, Massimiliano Ciaramita1, Joeri Rogelj6, Christian Buck1, Lierni Sestorain Saralegui1 and Reto Knutti5
1Google DeepMind, 2University of Miami, 3University of Zurich, 4Stripe, 5ETH Zurich, 6Imperial College London
Evaluating how Large Language Models (LLMs) handle complex, specialized knowledge remains a critical challenge. We address this through the lens of climate change by introducing CLINB, a benchmark that assesses models on open-ended, grounded, multimodal question answering tasks with clear requirements for knowledge quality and evidential support. CLINB relies on a dataset of real users’ questions and evaluation rubrics curated by leading climate scientists. We implement and validate a model-based evaluation process and evaluate several frontier models. Our findings reveal a critical dichotomy. Frontier models demonstrate remarkable knowledge synthesis capabilities, often exhibiting PhD-level understanding and presentation quality. They outperform “hybrid” answers curated by domain experts assisted by weaker models. However, this performance is countered by failures in grounding. The quality of evidence varies, with substantial hallucination rates for references and images. We argue that bridging this gap between knowledge synthesis and verifiable attribution is essential for the deployment of AI in scientific workflows and that reliable, interpretable benchmarks like CLINB are needed to progress towards building trustworthy AI systems.


[Figure: bar-chart panels titled "Total Reference URLs Generated", "Reference URL Status", "Total Image URLs Generated" and "Image URL Status"; axes labelled "Count of URLs" and "Proportion"; rows for claude-opus-4-1, claude-sonnet-4, gpt-5, hybrid, gemini-2.5-pro, gemini-2.5-flash and o3; status legend: OK, INACCESSIBLE_CONTENT, INVALID_URL, ERROR.]
Figure 3 | Number of reference (top), and image (bottom), URLs and their status.
Ablations We perform several ablation studies with the autorater (Table 4). Notably, removing
the question-specific rubrics from the prompt changes the results only in the bottom half, with the
Hybrid answers overtaken by Gemini 2.5 Flash and Claude Sonnet 4. This suggests that the additional
resolution provided by the rubrics applies primarily to the kind of responses used to develop the
rubrics. Or, in other words, that rubrics are far from complete. Hence, it is important that rubrics
adapt to new data as better models become availab

A New Expert-Grounded Benchmark for Scientific AI. We introduce CLINB, a benchmark for model-based evaluation of frontier models on complex, multimodal scientific communication. Its core is a new dataset of real-world climate questions paired with data-driven, question-specific evaluation rubrics, curated and validated by leading climate scientists through a novel three-phase, human-in-the-loop process.
PhD-Level Synthesis vs. Attribution Failures. Frontier models demonstrate remarkable knowledge synthesis, often exhibiting a PhD-level understanding. However, this performance masks a critical inadequacy in grounding. We report substantial hallucination rates for references (10% to 25%) and even more failures for images (50% to 80% in certain settings), exposing a major gap between synthesis and verifiable attribution.
Insights into Human-AI Collaboration Dynamics. Autonomous frontier models surpass ‘hybrid’ answers (curated by experts using weaker AI assistance), revealing the assisting model’s capability, not human oversight, as the primary bottleneck. Counter-intuitively, highly motivated non-specialists (our ‘Advocates’) who deeply engage with AI tools can produce higher-quality answers than domain experts who engage less with AI during answer curation.
A Validated Methodology for Scalable Oversight. We validate a rigorous, rubric-based autorater. Ablation studies demonstrate that structured prompts and automated evidence-checking are essential for mitigating inherent LLM judge biases. This process is hampered by inaccessible sources (up to 50%). Furthermore, we identify evaluation challenges, including model familiarity bias in human raters and the limitations of rubrics to generalize across models.

Deeply absurd. This Google PDF published on a blog (arxiv, not peer reviewed) claims an LLM is "PhD level" but in most cases the MAJORITY of reference URLs were invalid or inaccessible.

A PhD sitting down and just fabricating >50% of sources = career ending

arxiv.org/abs/2511.11597

24.11.2025 19:36 - 👍 367    🔁 86    💬 8    📌 6

António Guterres: info integrity is vital at COP30:

“We cannot achieve climate action without information integrity. We must preserve both the information environment necessary for democratic decision-making and the global cooperation essential for addressing the climate crisis.”

buff.ly/81zBNzH

18.11.2025 10:00 - 👍 17    🔁 10    💬 2    📌 1
Preview
Countries commit to tackling climate disinformation at COP30
It is the first time states have formally committed to information integrity and fighting back against climate disinformation.

Good that Canada joins coalition to fight against climate disinformation (so maybe don't weaken greenwashing law?) www.euronews.com/green/2025/1...

13.11.2025 11:35 - 👍 5583    🔁 1250    💬 98    📌 40
Preview
At COP, disinformation is the next crisis to tackle
If Canada steps up and joins co-signers like the U.K., France, and Spain, others will follow. Doing so would put the country’s resilience, strength, and democratic values on full display. This is how…

After years leading FEMA's external affairs through environmental disasters like Hurricane Helene, Justin Ángel Knighten warns that tackling disinformation must be a priority.

buff.ly/fgoWizG

🧵 1/2

11.11.2025 09:39 - 👍 5    🔁 3    💬 1    📌 0

🚨 Our new report, Deny, Deceive, Delay: Demystified, is out now. 🚨

The report explores how Big Carbon and Big Tech use disinformation to sabotage climate action and why, despite 89% of people worldwide demanding stronger action, progress gets derailed.

10.11.2025 11:30 - 👍 8    🔁 12    💬 0    📌 2
Preview
Macron and Lula warn of the dangers of climate disinformation ahead of COP30

Macron and Lula warn of the dangers of climate disinformation ahead of COP30

"Climate disinformation today threatens our democracies, the Paris agenda and therefore our collective security"

07.11.2025 15:00 - 👍 5    🔁 4    💬 0    📌 1

Kinda impressive how OpenAI has managed to occupy a seemingly quantum state between "imperial project", "non-profit research foundation", and "criminal enterprise".

06.11.2025 17:22 - 👍 259    🔁 50    💬 5    📌 0

We will have Nuremberg trials over this, and their many other crimes.

06.11.2025 17:43 - 👍 1    🔁 0    💬 0    📌 0

Using a government jet to fly you to a girlfriend's country music gig/wrestling match and then covering it up and rage tweeting about it on X is sort of a microcosm of Trumpism. thebulwark.com/p/kash-patel-fbi-director-private-jet-problem-nashville

06.11.2025 17:27 - 👍 722    🔁 192    💬 33    📌 9
Preview
Elon Musk is boosting the British right - and this shows how

Vital piece of investigative reporting from Sky. They've uncovered the X algorithm which feeds users extremist right wing material from the moment they join the site. It is a far-right radicalisation engine, by design.

news.sky.com/story/the-x-...

06.11.2025 07:22 - 👍 6360    🔁 3566    💬 237    📌 454

this is a reminder that we dont have to settle for newsom in 2028

05.11.2025 02:44 - 👍 18618    🔁 4827    💬 223    📌 188

There are more of us than there are of them.

05.11.2025 02:31 - 👍 32720    🔁 5202    💬 442    📌 255

you love to see it

05.11.2025 11:18 - 👍 536    🔁 74    💬 9    📌 1

Good things are possible and we don’t have to settle.

05.11.2025 03:35 - 👍 46830    🔁 8115    💬 248    📌 218
Preview
World ‘very likely’ to exceed 1.5C climate goal in next decade: UN
Despite Paris Agreement pledges, countries 'have landed off target' on climate goals multiple times, the UN warns.

Okay, yes, humanity did not enact the single best possible outcome in response to the single worst problem we have ever faced as a species

In no way was it wrong to try, and in no way is it wrong to continue trying to jam a wrench in the greedy fossil fuel economy. Everything is still on the table

05.11.2025 11:42 - 👍 324    🔁 110    💬 9    📌 4

Maybe pundits should spend more time in densely packed, left leaning urban areas where the "real Americans" live

05.11.2025 02:51 - 👍 4346    🔁 701    💬 29    📌 17

“In this moment of political darkness, New York will be the light.”

Congratulations @zohrankmamdani.bsky.social

05.11.2025 05:19 - 👍 1729    🔁 189    💬 35    📌 8

But also, tomorrow’s NYT headline: “Did Dems win too much? Anonymous consultants fear they may find it hard to govern everything they just won.”

05.11.2025 05:32 - 👍 103    🔁 13    💬 5    📌 0

Every race. It’s basically been every race.

Governors. Mayors. Long-held GOP dog-catchers. School boards. Water boards. Flipped a dungeon master in a rural Iowa D&D club. State senators. State reps. A janitor in Duluth. State justices. Three GOP Uber drivers.

Just everything.

05.11.2025 05:20 - 👍 674    🔁 166    💬 20    📌 21
Preview
Carney scraps anti-greenwashing law despite public call for climate truth
Most Canadians want the federal government to do more about climate disinformation, especially during extreme weather, new research shows, even as the Carney government scraps Canada's laws against gr...

"Canada told the fossil fuel industry greenwashing was illegal, and its response was that having to be honest and transparent would make it too hard to do business. Apparently the threat worked," says Phil Newell of @caadcoalition.bsky.social www.nationalobserver.com/2025/11/04/n...

05.11.2025 00:23 - 👍 27    🔁 24    💬 5    📌 1

@michaelkhoo is following 19 prominent accounts