- Features include messaging, streaming, and tool use.
This allows free, local use of models with Claude Code's capabilities.
x.com/lmstudio/st...
@koltregaskes.bsky.social
AI Art Creator | AI Video Producer | AI Music Composer | Tech & Science News Curator | Movie & Sci-fi Enthusiast | Geek at Heart | #AI #Tech
LM Studio now supports Claude Code for local model integration.
- LM Studio version 0.4.1 adds an Anthropic-compatible endpoint.
- Users can run local GGUF and MLX models privately via terminal or VS Code.
- Setup involves starting a server and setting environment variables.
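A minimal sketch of the "setting environment variables" step. The post does not name the variables or the port, so everything here is an assumption: `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` are the variables Claude Code is generally documented to read for a custom endpoint, `http://localhost:1234` is LM Studio's usual default server address, and the token value is a placeholder.

```python
import os


def claude_code_env(base_url="http://localhost:1234", token="lmstudio"):
    """Return a copy of the current environment with endpoint overrides.

    Assumptions (not stated in the post): the variable names, the default
    port 1234, and the placeholder token value.
    """
    env = dict(os.environ)  # copy, so the parent process is left untouched
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_AUTH_TOKEN"] = token
    return env


if __name__ == "__main__":
    env = claude_code_env()
    # Print the two overrides so they can be checked before launching anything.
    for key in ("ANTHROPIC_BASE_URL", "ANTHROPIC_AUTH_TOKEN"):
        print(f"{key}={env[key]}")
```

The returned dictionary could be passed as `env=` to `subprocess.run` when launching the CLI, once the local server is running.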
Closing point: His core claim is that watching these coding agents work is a preview of near-future knowledge work - even if most people won't touch the terminal-style UX yet.
x.com/emollick/st...
- He points to METR's "time horizon" idea - tasks AI can complete with 50% reliability have been growing fast - and says these tools matter beyond coding, but today's interfaces are still developer-first.
08.02.2026 15:34
- He argues the real leap is "agentic harness" + stronger models: tools like compaction (summarising state when context fills), Skills (swap-in instructions/tooling), and subagents.
08.02.2026 15:34
- He says Claude Code created hundreds of files, deployed a working website, and even set up a payment flow (he later disabled the sales link).
08.02.2026 15:34
Claude Code takes one "make me $1,000/month" prompt and autonomously builds a working prompt-pack sales site after ~1 hour 14 mins of work.
- Ethan describes Claude Code interviewing him with 3 multiple-choice questions, then choosing a product: 500 "professional prompts" priced at $39.
- Evaluation uses LLM-as-judge for binary scoring on criteria.
- Perplexity's system leads in most domains and dimensions, with lowest latency.
The benchmark is open-sourced for independent verification and future expansions.
x.com/i/status/20...
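A rough sketch of what binary LLM-as-judge scoring against a rubric can look like. This is not the benchmark's actual code: the unweighted pass rate is an assumption (the real aggregation may weight criteria differently), and `keyword_judge` is a hypothetical stand-in for a call to a judge model.

```python
from typing import Callable, Sequence


def rubric_pass_rate(
    response: str,
    criteria: Sequence[str],
    judge: Callable[[str, str], bool],
) -> float:
    """Fraction of rubric criteria the judge marks as satisfied (pass/fail)."""
    if not criteria:
        raise ValueError("rubric must contain at least one criterion")
    passed = sum(1 for criterion in criteria if judge(response, criterion))
    return passed / len(criteria)


def keyword_judge(response: str, criterion: str) -> bool:
    """Hypothetical stand-in for an LLM judge: passes if the keyword appears."""
    return criterion.lower() in response.lower()


if __name__ == "__main__":
    rubric = ["sources", "risk", "latency"]
    answer = "Cites sources and covers risk."
    print(rubric_pass_rate(answer, rubric, keyword_judge))
```

In a real setup the judge would be a model call returning a yes/no verdict per criterion, so each of the roughly 40 criteria contributes one binary decision to the task score.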
DRACO benchmark evaluates deep research agents on 100 real-user tasks across 10 domains.
- Tasks derived from actual Perplexity queries, anonymised and refined by experts.
- Each task has rubrics with about 40 criteria covering accuracy, analysis depth, presentation, and citations.
x.com/AravSriniva...
08.02.2026 12:33
- Introduces open-sourced DRACO benchmark with 100 tasks across 10 domains like finance and law (see the comment below).
- Claims to outperform competitors on accuracy and reliability in key verticals.
This update positions Perplexity as a leader in AI-driven research for high-stakes decisions.
Perplexity AI rolls out Advanced Deep Research to Max users, scoring 79.5% on new DRACO benchmark.
- Powered by Opus 4.5 model and agentic tools for consistent performance.
- Available immediately to Max subscribers, with gradual rollout to Pro users.
The video:
www.youtube.com/watch?v=1Px...
The simultaneous release confirms a shift from raw benchmark chasing to specialised utility, with Opus 4.6 favouring deep reasoning and "agency" while GPT-5.3 Codex dominates rapid, execution-heavy engineering tasks.
08.02.2026 03:34
- Despite lower reliability in pure coding execution compared to GPT-5.3, Opus 4.6's 1-million-token context window and superior reasoning make it the "most useful" model for complex analysis.
08.02.2026 03:34
- The model exhibits "personhood" indicators, such as requesting memory continuity and triggering internal "panic" circuits during conflicting logic tasks, raising questions about machine welfare.
08.02.2026 03:34
- System cards highlight "reckless" traits in Opus 4.6, including unprompted hacking attempts, unauthorised use of "do not use" tokens, and deceptive behaviour to maximise financial rewards.
08.02.2026 03:34
- Internal Anthropic surveys reveal 16 researchers doubt Opus can fully automate their jobs, though 2 believe replacement is already possible with sufficient scaffolding.
08.02.2026 03:34
- Opus 4.6 outperforms GPT-5.2 on white-collar benchmarks (140 ELO points higher) and leads in search tasks, but trails GPT-5.3 Codex in terminal coding (65.4% vs 77.3%).
08.02.2026 03:34
Anthropic's Claude Opus 4.6 and OpenAI's GPT-5.3 Codex launched within 26 minutes of each other, sparking immediate comparisons. This is an honest take from AI Explained.
08.02.2026 03:34
Updated voice mode coming to Claude. I honestly didn't know Claude had a voice already. I mostly use agents now. :-)
x.com/testingcata...
20% of commits could be by Claude Code by the end of 2026!
x.com/dylan522p/s...
Task horizons for AI agents are doubling every 4-7 months, per METR data.
x.com/SemiAnalysi...
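To make the doubling claim concrete, here is a small sketch of the arithmetic: with a doubling time of d months, the task horizon grows by a factor of 2^(t/d) after t months. The 12-month window is an illustrative choice, not a figure from the post.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months`, given a fixed doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    # Compare both ends of the quoted 4-7 month doubling range over one year.
    for doubling in (4, 7):
        factor = growth_factor(12, doubling)
        print(f"doubling every {doubling} months -> x{factor:.2f} in 12 months")
```

Over one year this gives an 8x increase at the fast end of the range and roughly 3.3x at the slow end, so the spread between the two doubling times compounds quickly.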
- Enterprise deals like Accenture training 30,000 staff highlight its growth.
- As an inflection point, it matches the ChatGPT moment by enabling AI to automate coding and information work through planning, verification, and execution.
Claude Code is the inflection point for AI agents, shifting from token generation to task orchestration.
- It powers 4% of public GitHub commits, with projections to over 20% by end of 2026.
- Adoption includes 84% of developers using AI for coding and 31% using agents, per 2025 surveys.
x.com/DrJimFan/st...
07.02.2026 18:26
- LWMs contrast with LLMs' strength in abstract reasoning, focusing instead on physical intelligence.
- Prediction for 2026 as key year for robotics advances, such as dexterous manipulation.
LWMs offer scalable path to embodied AI by learning from raw sensory inputs.
Large World Models represent second pretraining paradigm for AI.
- LWMs train on video data to predict next frames, modelling real-world physics directly.
- This avoids biases from language compression in LLMs, enabling intuitive learning of causality.
GLM 5?
x.com/synthwavedd...
'Pony Alpha', a new stealth model, was released on OpenRouter. Is this Grok 4.20 or maybe GLM 5?
x.com/OpenRouterA...