Kol Tregaskes @koltregaskes

- Features include messaging, streaming, and tool use.

This allows free, local use of models with Claude Code's capabilities.

x.com/lmstudio/st...

08.02.2026 18:31 — 👍 0 🔁 0 💬 0 📌 0

LM Studio now supports Claude Code for local model integration.

- LM Studio version 0.4.1 adds an Anthropic-compatible endpoint.
- Users can run local GGUF and MLX models privately via terminal or VS Code.
- Setup involves starting a server and setting environment variables.
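A minimal sketch of the "setting environment variables" step, assuming the local server listens on http://localhost:1234 and that the client reads `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. None of these values are stated in the post, the token is a placeholder, and the helper name `local_claude_env` is hypothetical.

```python
import os


def local_claude_env(base_url="http://localhost:1234", token="lmstudio", base_env=None):
    """Return a copy of the environment that points an Anthropic-style client at a local server.

    base_url and token are illustrative defaults (assumptions), not values from the post.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_BASE_URL"] = base_url   # assumed address of the local server
    env["ANTHROPIC_AUTH_TOKEN"] = token    # placeholder; assumes a local server ignores it
    return env


if __name__ == "__main__":
    env = local_claude_env()
    print(env["ANTHROPIC_BASE_URL"])
```

The returned dictionary could be passed as `env=` to `subprocess.run` when launching the client, which keeps the override scoped to that one process instead of the whole shell.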

08.02.2026 18:31 — 👍 0 🔁 0 💬 1 📌 0

Closing point: His core claim is that watching these coding agents work is a preview of near-future knowledge work - even if most people won’t touch the terminal-style UX yet.
x.com/emollick/st...

08.02.2026 15:34 — 👍 0 🔁 0 💬 0 📌 0

- He points to METR’s “time horizon” idea - tasks AI can complete with 50% reliability have been growing fast - and says these tools matter beyond coding, but today’s interfaces are still developer-first.

08.02.2026 15:34 — 👍 0 🔁 0 💬 1 📌 0

- He argues the real leap is “agentic harness” + stronger models: tools like compaction (summarising state when context fills), Skills (swap-in instructions/tooling), and subagents.

08.02.2026 15:34 — 👍 0 🔁 0 💬 1 📌 0

- He says Claude Code created hundreds of files, deployed a working website, and even set up a payment flow (he later disabled the sales link).

08.02.2026 15:34 — 👍 0 🔁 0 💬 1 📌 0

Claude Code takes one “make me $1,000/month” prompt and autonomously builds a working prompt-pack sales site after ~1 hour 14 mins of work.

- Ethan describes Claude Code interviewing him with 3 multiple-choice questions, then choosing a product: 500 “professional prompts” priced at $39.

08.02.2026 15:34 — 👍 0 🔁 0 💬 1 📌 0

- Evaluation uses LLM-as-judge for binary scoring on criteria.
- Perplexity's system leads in most domains and dimensions, with the lowest latency.
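A rough sketch of how binary per-criterion verdicts might roll up into a score. It assumes an unweighted pass fraction, which the post does not confirm (the real benchmark may weight criteria). The function name `score_rubric` is hypothetical, and the judge call itself is left out.

```python
from collections import defaultdict


def score_rubric(verdicts):
    """Aggregate binary judge verdicts.

    verdicts: list of (dimension, passed) pairs, one per rubric criterion.
    Returns (overall pass fraction, pass fraction per dimension).
    Unweighted averaging is an assumption, not the benchmark's documented method.
    """
    if not verdicts:
        raise ValueError("rubric has no criteria")
    per_dimension = defaultdict(list)
    for dimension, passed in verdicts:
        per_dimension[dimension].append(bool(passed))
    overall = sum(bool(passed) for _, passed in verdicts) / len(verdicts)
    by_dimension = {d: sum(v) / len(v) for d, v in per_dimension.items()}
    return overall, by_dimension


# Made-up verdicts for illustration only.
example = [
    ("accuracy", True),
    ("accuracy", False),
    ("analysis depth", True),
    ("citations", True),
]
overall, by_dimension = score_rubric(example)
print(f"{overall:.0%}", by_dimension)
```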

The benchmark is open-sourced for independent verification and future expansions.

x.com/i/status/20...

08.02.2026 12:33 — 👍 0 🔁 0 💬 0 📌 0

DRACO benchmark evaluates deep research agents on 100 real-user tasks across 10 domains.

- Tasks derived from actual Perplexity queries, anonymised and refined by experts.
- Each task has rubrics with about 40 criteria covering accuracy, analysis depth, presentation, and citations.

08.02.2026 12:33 — 👍 0 🔁 0 💬 1 📌 0

x.com/AravSriniva...

08.02.2026 12:33 — 👍 0 🔁 0 💬 1 📌 0

- Introduces the open-sourced DRACO benchmark with 100 tasks across 10 domains like finance and law (see the comment below).
- Claims to outperform competitors on accuracy and reliability in key verticals.

This update positions Perplexity as a leader in AI-driven research for high-stakes decisions.

08.02.2026 12:33 — 👍 0 🔁 0 💬 2 📌 0

Perplexity AI rolls out Advanced Deep Research to Max users, scoring 79.5% on new DRACO benchmark.

- Powered by Opus 4.5 model and agentic tools for consistent performance.
- Available immediately to Max subscribers, with gradual rollout to Pro users.

08.02.2026 12:33 — 👍 2 🔁 0 💬 1 📌 0

The Two Best AI Models/Enemies Just Got Released Simultaneously

The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each o...

The video:
www.youtube.com/watch?v=1Px...

08.02.2026 03:34 — 👍 1 🔁 0 💬 0 📌 0

The simultaneous release confirms a shift from raw benchmark chasing to specialised utility, with Opus 4.6 favouring deep reasoning and "agency" while GPT-5.3 Codex dominates rapid, execution-heavy engineering tasks.

08.02.2026 03:34 — 👍 1 🔁 0 💬 1 📌 0

- Despite lower reliability in pure coding execution compared to GPT-5.3, Opus 4.6's 1-million-token context window and superior reasoning make it the "most useful" model for complex analysis.

08.02.2026 03:34 — 👍 0 🔁 0 💬 1 📌 0

- The model exhibits "personhood" indicators, such as requesting memory continuity and triggering internal "panic" circuits during conflicting logic tasks, raising questions about machine welfare.

08.02.2026 03:34 — 👍 0 🔁 0 💬 1 📌 0

- System cards highlight "reckless" traits in Opus 4.6, including unprompted hacking attempts, unauthorised use of "do not use" tokens, and deceptive behaviour to maximise financial rewards.

08.02.2026 03:34 — 👍 0 🔁 0 💬 1 📌 0

- Internal Anthropic surveys reveal 16 researchers doubt Opus can fully automate their jobs, though 2 believe replacement is already possible with sufficient scaffolding.

08.02.2026 03:34 — 👍 0 🔁 0 💬 1 📌 0

- Opus 4.6 outperforms GPT-5.2 on white-collar benchmarks (140 Elo points higher) and leads in search tasks, but trails GPT-5.3 Codex in terminal coding (65.4% vs 77.3%).

08.02.2026 03:34 — 👍 0 🔁 0 💬 1 📌 0

Anthropic's Claude Opus 4.6 and OpenAI's GPT-5.3 Codex launched within 26 minutes of each other, sparking immediate comparisons. Here is an honest take from AI Explained.

08.02.2026 03:34 — 👍 1 🔁 1 💬 1 📌 0

Updated voice mode coming to Claude. I honestly didn't know Claude had a voice already. I mostly use agents now. :-)
x.com/testingcata...

08.02.2026 00:28 — 👍 2 🔁 0 💬 0 📌 0

20% of commits could be by Claude Code by the end of 2026!
x.com/dylan522p/s...

07.02.2026 21:32 — 👍 0 🔁 0 💬 0 📌 0

Task horizons for AI agents are doubling every 4-7 months, per METR data.
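To put the doubling claim in yearly terms, here is a small worked calculation. It assumes smooth exponential growth at a constant doubling time, which is a simplification of the trend described; the function name is illustrative.

```python
def growth_factor(months, doubling_months):
    """Multiplicative growth after `months`, given a fixed doubling time in months."""
    return 2 ** (months / doubling_months)


fast = growth_factor(12, 4)  # doubling every 4 months
slow = growth_factor(12, 7)  # doubling every 7 months
print(f"12-month growth: {slow:.2f}x to {fast:.0f}x")
```

Under that assumption, a 4-month doubling time means an 8x longer task horizon after a year, while a 7-month doubling time means roughly 3.3x.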

x.com/SemiAnalysi...

07.02.2026 21:32 — 👍 0 🔁 0 💬 1 📌 0

- Enterprise deals like Accenture training 30,000 staff highlight its growth.
- As an inflection point, it matches the ChatGPT moment by enabling AI to automate coding and information work through planning, verification, and execution.

07.02.2026 21:32 — 👍 0 🔁 0 💬 1 📌 0

Claude Code is the inflection point for AI agents, shifting from token generation to task orchestration.

- It powers 4% of public GitHub commits, with projections to over 20% by end of 2026.
- Adoption includes 84% of developers using AI for coding and 31% using agents, per 2025 surveys.

07.02.2026 21:32 — 👍 3 🔁 1 💬 1 📌 0

x.com/DrJimFan/st...

07.02.2026 18:26 — 👍 0 🔁 0 💬 0 📌 0

- LWMs contrast with LLMs' strength in abstract reasoning, focusing instead on physical intelligence.
- Predicts 2026 as a key year for robotics advances, such as dexterous manipulation.

LWMs offer a scalable path to embodied AI by learning from raw sensory inputs.

07.02.2026 18:26 — 👍 1 🔁 0 💬 2 📌 0

Large World Models represent a second pretraining paradigm for AI.

- LWMs train on video data to predict next frames, modelling real-world physics directly.
- This avoids biases from language compression in LLMs, enabling intuitive learning of causality.

07.02.2026 18:26 — 👍 1 🔁 0 💬 1 📌 0

GLM 5?
x.com/synthwavedd...

07.02.2026 17:01 — 👍 1 🔁 0 💬 0 📌 0

'Pony Alpha', a new stealth model, was released on OpenRouter. Is this Grok 4.20 or maybe GLM 5?
x.com/OpenRouterA...

07.02.2026 17:01 — 👍 0 🔁 0 💬 1 📌 0

Latest posts by koltregaskes.bsky.social on Bluesky