Our paper on AI-powered spear phishing, co-authored with @fredheiding.bsky.social, has been accepted at the ICML 2025 Workshop on Reliable and Responsible Foundation Models!
openreview.net/pdf?id=f0uFp...
@simonlermen.bsky.social
I work on AI safety and AI in cybersecurity
Do you think there is anything in China comparable to AI Twitter or Bluesky, where people discuss ideas?
20.04.2025 10:51
Are you working at DeepSeek?
20.04.2025 10:48
Why so mean old man
28.02.2025 16:02
Grok's DeepSearch was launched with Zero safety features, you can ask it about assasslnations, dru*gs. This has been online for a few days now with no changes.
25.02.2025 13:38
I'm mostly interested in not dying
23.01.2025 13:19
If you are trying to understand its reasoning, it seems like a necessary step to have legible chain-of-thought.
22.01.2025 22:41
You should be careful here: huge datacenters with their own power structures are being discussed, as are huge new semiconductor facilities. The situation might change.
openai.com/global-affai...
To be fair, the pre-training and all those mega datacenters do have some significant environmental impact. Buying products from AI labs does fund this. But I agree that individual energy use per reply is about the weakest argument against AI.
15.01.2025 10:41
I published a human study with @fredheiding.bsky.social
We use AI agents built from GPT-4o and Claude 3.5 Sonnet to search the web for available information on a target and use this for highly personalized phishing messages. We achieved click-through rates above 50%.
www.lesswrong.com/posts/GCHyDK...
Has anyone ever tried adding something like "always show your entire reasoning" to constitutional AI? What happens if you ask the model whether it left out steps in its reasoning? Can it verbalize them?
02.01.2025 23:33
They achieve this in part by releasing models such as o3 immediately after training; other companies wait for safety and security evaluations and estimates of societal impact. They also used to wait before releasing, as with GPT-4.
28.12.2024 10:57
sometimes fancy terms just serve to confuse people
27.12.2024 10:00
They have already made billions in revenue, but defining it as profits makes it almost impossible to reach
27.12.2024 09:54
crazy that they use profits instead of revenue. so they can always just hack this by spending a bit more on R&D
27.12.2024 09:53
my guess is he thinks of some sort of conscious experience of wanting here...
27.12.2024 08:19
Its behavior is as if it wants to win; the same will be true of powerful AI agents. Whether it actually wants something in a way that satisfies you doesn't matter.
26.12.2024 23:13
So RL-training the model to achieve some goal, such as with constitutional AI, can't lead to the model having a goal? Do you think AlphaZero wants to win at chess?
26.12.2024 19:25
💯
22.12.2024 11:35
Well, we observe computation in superposition
16.12.2024 21:26
I agree that it doesn't PROVE multiverses. But I don't like the sneering tone. What is superposition? It sure seems like the electron is in many places at once, and all interpretations of that seem a bit crazy. Everett's many-worlds is a common position among physicists, including some I know.
16.12.2024 21:15
The many worlds interpretation is a commonly held view by many physicists. And it is not like other interpretations are less "weird".
16.12.2024 13:49
The many worlds interpretation is a commonly held view by many physicists. And it is not like other interpretations are less "weird"
16.12.2024 13:47
I don't understand why we don't have more conferences in countries with easy visa policies
14.12.2024 16:33
I'll be at the SafeGenAI workshop on Sunday presenting on research I did on safety in AI agents.
I will talk about results from these two blog posts:
www.lesswrong.com/posts/ZoFxTq...
And:
www.lesswrong.com/posts/Lgq2Dc...
I'm very bullish on automated research engineering soon, but even I was surprised that AI agents are twice as good as humans with 5+ years of experience or from a top AGI or safety lab at doing tasks in 2 hours. Paper: metr.org/AI_R_D_Evalu...
22.11.2024 22:21