
Derek Lewis

@dlewis.io.bsky.social

CTO & Data Scientist at Silex Data Solutions & CODEHR.ai. Opinions expressed are my own.

217 Followers  |  282 Following  |  38 Posts  |  Joined: 20.11.2024

Latest posts by dlewis.io on Bluesky

Setting up DNS resolution between containers with --name is 💯. Also, the IP-per-container scheme makes networking a piece of cake.

09.07.2025 21:57 — 👍 0    🔁 0    💬 0    📌 0

Native containers in macOS 26 are lightweight & functional. No more Docker or Podman VMs required.

09.07.2025 21:55 — 👍 1    🔁 0    💬 1    📌 0
Evaluating Llama-3.3-70B Inference on NVIDIA H100 and A100 GPUs Large-scale language models quickly expose the limits of yesterday's hardware. To understand how much practical headroom Hopper offers over Ampere in a production-style setting, I profiled llama-3.3-...

Worked with a customer on LLM infra sizing—here's a deep dive on llama-3.3-70b-instruct inference using NVIDIA NIM.

H100 (SXM5) delivered up to 14× more throughput vs A100 (PCIe) with far lower latency.

Full benchmarks + thoughts:

dlewis.io/evaluating-l...

17.04.2025 18:26 — 👍 4    🔁 0    💬 0    📌 0

Had to remind myself today that bfloat16 on Apple Silicon in PyTorch with AMP provides a minimal performance increase for model training or inference. It is very beneficial on NVIDIA GPUs because of Tensor Cores, which PyTorch uses for bfloat16 matmuls.

16.04.2025 22:26 — 👍 1    🔁 0    💬 0    📌 0
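A reminder of why bfloat16 is so cheap to support in hardware: it is essentially a float32 with the bottom 16 mantissa bits dropped, keeping float32's full exponent range but only about three decimal digits of precision. A stdlib-only sketch of the conversion (round-to-nearest-even, NaN handling omitted; all helper names are mine, not from any library):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Convert a Python float to bfloat16's 16 bits via its float32 encoding,
    using round-to-nearest-even (NaN handling omitted for brevity)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # bfloat16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def from_bfloat16_bits(b: int) -> float:
    """Widen bfloat16 bits back to float32 by appending 16 zero bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

# Full dynamic range survives, but pi collapses to ~3 significant digits:
print(from_bfloat16_bits(to_bfloat16_bits(3.141592653589793)))  # 3.140625
```

The truncation is also why NVIDIA Tensor Cores can run bfloat16 matmuls so fast, while hardware without dedicated bfloat16 paths sees little benefit.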
16.04.2025 20:20 — 👍 0    🔁 0    💬 0    📌 0
Recent Experiences Debugging with LLMs I'm frequently asked by clients what my thoughts are on LLMs and coding. Personal experience has informed me that LLMs cannot solve problems of a certain complexity for a number of reasons. One of the...

Wanted to share some of my recent experiences debugging a real-world problem with LLMs. Problem complexity is an issue for some models. Reasoning models fare better. dlewis.io/recent-exper...

16.04.2025 20:17 — 👍 2    🔁 0    💬 1    📌 0

With gpt2-xl you can drop the positional encodings entirely and still get decent output. The smaller the model, the more dependent it is on the positional encodings to generate non-garbage output.

14.04.2025 02:05 — 👍 0    🔁 0    💬 0    📌 0

While fixing a KV Cache generation bug today in the MLX GPT-2 implementation that I submitted last year, I discovered that the gpt2 (128M) model is much more dependent on positional encodings than the larger gpt2-xl (1.5B). Guess that explains why linear positional encoding layers were dropped.

14.04.2025 02:04 — 👍 0    🔁 0    💬 1    📌 0
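The order-sensitivity described in the two posts above falls out of the math: without positional encodings, self-attention is permutation-equivariant, so the model literally cannot tell token order apart. A toy, dependency-free sketch (identity Q/K/V projections and made-up vectors, not GPT-2 code) makes that visible:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(xs):
    """Toy single-head self-attention with identity Q/K/V projections."""
    out = []
    for q in xs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in xs])
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(len(q))])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Without positional information, swapping two input tokens merely swaps the
# corresponding output rows -- order is invisible to the attention layer.
a = self_attention(tokens)
b = self_attention([tokens[1], tokens[0], tokens[2]])
print(a[0] == b[1] and a[1] == b[0])  # True

# Adding even crude positional vectors breaks the symmetry, which is what
# smaller models lean on far more heavily than gpt2-xl does.
positions = [[0.1 * i] * 2 for i in range(3)]
with_pos = self_attention([[t[i] + p[i] for i in range(2)] for t, p in zip(tokens, positions)])
```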

Qwen2.5 models are exceptionally strong at tool calling for their size. Definitely stronger than the Llama 3.1/3.2 models.

18.03.2025 02:21 — 👍 0    🔁 0    💬 0    📌 0
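Tool calling in this setting means the model is prompted with JSON schemas for the available tools and must emit a well-formed call that the client then validates. A hedged sketch in the common OpenAI-style format, with a hypothetical `get_weather` tool (none of these names come from Qwen or Llama themselves):

```python
import json

# Hypothetical tool definition of the kind chat models are prompted with.
get_weather = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# A well-behaved model emits something like this, which the client validates
# before dispatching -- weaker models get the name or argument keys wrong.
raw_call = '{"name": "get_weather", "arguments": {"city": "Raleigh"}}'
call = json.loads(raw_call)
ok = (call["name"] == get_weather["function"]["name"]
      and set(call["arguments"]) >= set(get_weather["function"]["parameters"]["required"]))
print(ok)  # True
```

Models that are "strong at tool calling" are, in practice, models that reliably produce JSON passing exactly this kind of check.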
GitHub - SilexDataTeam/ai-foundry-starter: Production-ready starter kit for building and deploying AI applications powered by LangChain/LangGraph — complete with streaming chat, robust authentication, a Kong-based multi-model gateway, and m...

• Cloud-native infrastructure: Kong Gateway, Tekton Pipelines, Helm Charts

Check it out and contribute today! Feedback and contributions welcome:

github.com/silexdatatea...

12.03.2025 19:38 — 👍 0    🔁 0    💬 0    📌 0

Key Highlights:
• LangChain/LangGraph for powerful, flexible AI agents
• Real-time streaming chat interface with Next.js & SSE
• Multiple RAG patterns with PGVector embeddings
• FastAPI backend, PostgreSQL storage, comprehensive telemetry via OpenTelemetry

12.03.2025 19:37 — 👍 1    🔁 0    💬 1    📌 0
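On the streaming side: Server-Sent Events is plain text, with events separated by blank lines and payloads carried on `data:` lines, so the client half of a streaming chat UI reduces to a tiny parser. A minimal stdlib sketch of that format (an illustration of SSE generally, not code from the starter kit):

```python
def parse_sse(stream_text: str) -> list[str]:
    """Minimal Server-Sent Events parser: events are separated by blank
    lines, and each 'data:' line contributes one line to the event payload."""
    events, data = [], []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            data.append(line[5:].lstrip())
        elif line == "" and data:
            events.append("\n".join(data))
            data = []
    return events

# A chat backend typically streams one token chunk per event, ending with a sentinel:
chunks = "data: Hel\n\ndata: lo\n\ndata: [DONE]\n\n"
print(parse_sse(chunks))  # ['Hel', 'lo', '[DONE]']
```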

We're excited to announce the open sourcing of our AI Foundry Starter Template at Silex Data! This production-ready starter kit empowers you to build and deploy AI apps with LangChainAI/LangGraph, featuring streaming chat, robust Keycloak authentication, Kong's multi-model gateway, and OpenShift.

12.03.2025 19:36 — 👍 1    🔁 0    💬 1    📌 0
Building the IBKR C++ API Client Library Recently, I wanted to use the C++ API client library that Interactive Brokers provides and experiment with some algorithmic trading and monitoring of my positions. I had hoped there would be some pr...

Recently, I wanted to experiment with some algorithmic trading. Building the Interactive Brokers C++ API client library on macOS & Linux/aarch64 had a few more barriers than I anticipated. Wrote up a brief blog post with the steps. dlewis.io/ibkr-cpp-api/

11.02.2025 23:16 — 👍 0    🔁 0    💬 0    📌 0
Microsoft Probing If DeepSeek-Linked Group Improperly Obtained OpenAI Data Microsoft Corp. and OpenAI are investigating whether data output from OpenAI's technology was obtained in an unauthorized manner by a group linked to Chinese artificial intelligence startup DeepSeek, ...

EXCLUSIVE: Microsoft and OpenAI are investigating whether a group linked to China's DeepSeek obtained OpenAI's data.

29.01.2025 03:16 — 👍 176    🔁 41    💬 79    📌 100

Would be nice to see the training code (not just inference code) from DeepSeek for the R1 models. One can hope...

27.01.2025 14:01 — 👍 1    🔁 0    💬 1    📌 0

Some of the training techniques are novel and some have been around for a while now. The combination of FP8/BF16 in training is novel. MLA (multi-head latent attention) and MTP are both interesting, but MTP is only used for training, not inference, though it can be used for speculative decoding (not new).

27.01.2025 13:52 — 👍 0    🔁 0    💬 0    📌 0
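For context on the speculative-decoding remark above: a cheap draft model proposes several tokens, the large target model verifies them in a single pass, and the longest agreeing prefix is kept, so the expensive model runs once per batch of tokens instead of once per token. A toy sketch with made-up stand-in "models" (nothing here is DeepSeek code):

```python
def draft_model(prefix, k):
    """Hypothetical cheap proposer: predicts each next token as last + 1."""
    out = list(prefix)
    for _ in range(k):
        out.append(out[-1] + 1)
    return out[len(prefix):]

def target_model(prefix):
    """Hypothetical target: also last + 1, except it emits 0 after any token >= 5."""
    last = prefix[-1]
    return 0 if last >= 5 else last + 1

def speculative_step(prefix, k=4):
    proposed = draft_model(prefix, k)
    accepted = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_model(ctx)   # in practice all k positions verify in one batched pass
        if tok != expected:
            accepted.append(expected)  # take the target's token and stop
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

print(speculative_step([3]))  # draft proposes 4,5,6,7; target rejects 6 -> [4, 5, 0]
```

The output matches what the target model alone would have produced; the draft model only changes how many target passes it takes to get there.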

DeepSeek has been out for a week now, so why the panic this morning? I can only assume it is because of the app in the App Store rising to #1. Should be a wake-up call to OpenAI/etc., but more efficient models that use less compute at inference time will just drive larger models.

27.01.2025 13:51 — 👍 0    🔁 0    💬 1    📌 0

Whenever code is made more efficient, other code never expands to fill the available compute... That seems to be the logic at play this morning on Wall Street and in some tech circles re: DeepSeek. Great model, a Sputnik moment for foundation model companies, but it isn't over for Nvidia.

27.01.2025 13:51 — 👍 0    🔁 0    💬 1    📌 0

Doing some retro programming: a Cocoa-based LLM streaming chat application on OS X 10.5, circa 2009, making calls to a FastAPI back-end, which streams events from a LangChain/LangGraph agent capable of making tool calls. 10.5 did not have NSJSONSerialization yet.

25.01.2025 17:05 — 👍 0    🔁 0    💬 0    📌 0

Interesting data point on adoption. Apple does have a great ecosystem for passkeys. It just works. Most of the concern that I've heard expressed is around vendor lock-in. Lose your Apple account and you are screwed. The same issue exists for MS and Google, too.

01.01.2025 15:16 — 👍 1    🔁 0    💬 0    📌 0

Gemini 2.0 Flash does become less effective at >50K token lengths. I've had much more success starting over with a new conversation and continuing the task there than carrying on with the same one.

14.12.2024 03:17 — 👍 4    🔁 0    💬 0    📌 0
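One way to operationalize that restart-over-continue habit is a context budget: keep a rough token estimate of the conversation and, once it crosses a threshold, seed a fresh conversation from a summary. A sketch where the 4-characters-per-token heuristic, `summarize`, and every name are my assumptions, not any API:

```python
def estimate_tokens(messages: list[str]) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return sum(len(m) for m in messages) // 4

def next_context(messages: list[str], summarize, budget: int = 50_000) -> list[str]:
    """Return the context to send: the history as-is if under budget,
    otherwise a fresh conversation seeded only with a summary."""
    if estimate_tokens(messages) <= budget:
        return messages
    return [f"Summary of prior work: {summarize(messages)}"]

history = ["x" * 300_000, "continue the task"]
fresh = next_context(history, summarize=lambda ms: f"{len(ms)} earlier messages")
print(fresh)  # ['Summary of prior work: 2 earlier messages']
```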

Gemini 2.0 Flash, from my experience today, is definitely stronger on coding tasks than gpt-4o and slightly better than Sonnet 3.5 and o1. The Google-published benchmarks confirm this as well.

14.12.2024 03:16 — 👍 4    🔁 0    💬 2    📌 0

Gemini 2.0 Flash is pretty good for coding. Just solved a problem I have been working on for the last day with streaming LLM inputs/outputs & tool calls working reliably. Claude Sonnet 3.5 & o1 (not pro) failed.

13.12.2024 19:03 — 👍 1    🔁 0    💬 0    📌 0
OpenAI's Status Page - API, ChatGPT & Sora Facing Issues.

Thorough RCA from OpenAI on last night's outage. It was related to a telemetry rollout that overwhelmed Kubernetes' internal DNS and was exacerbated by the resulting lockout from the Kubernetes admin API. status.openai.com/incidents/ct...

13.12.2024 04:24 — 👍 1    🔁 0    💬 0    📌 0

Getting Siri to use ChatGPT isn't 100% reliable, and the TeX formatting isn't available in the Siri output when ChatGPT is used. Markdown formatting seems to be there at a glance.

12.12.2024 03:57 — 👍 1    🔁 0    💬 0    📌 0

And logged into ChatGPT now on iOS 18.2.

12.12.2024 03:56 — 👍 2    🔁 0    💬 0    📌 0

And that global rate limit got hit:

11.12.2024 23:59 — 👍 1    🔁 0    💬 0    📌 0

And now there's this:

11.12.2024 23:48 — 👍 1    🔁 0    💬 1    📌 0

Looks like OpenAI is getting overloaded after the iOS 18.2 rollout. Trying to sign into my account on my phone after the upgrade to test out the new integration.

11.12.2024 23:44 — 👍 1    🔁 0    💬 1    📌 0

Of course, using dataclasses or Pylance helps a lot here, but there always seems to be some reverse engineering required to understand someone else's Python code.

05.12.2024 15:59 — 👍 2    🔁 0    💬 0    📌 0
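To illustrate the point about dataclasses: a plain dict forces readers to reverse-engineer the data's shape from its usage sites, while a dataclass states it once at the definition and gives tools like Pylance something to check. A small hypothetical example (the `Order` type and field names are made up):

```python
from dataclasses import dataclass

# Untyped dicts: the reader must hunt through call sites to learn the keys.
def cost_untyped(order):
    return order["qty"] * order["unit_price"]

# A dataclass documents the shape at the definition site and is statically checkable.
@dataclass
class Order:
    sku: str
    qty: int
    unit_price: float

def cost(order: Order) -> float:
    return order.qty * order.unit_price

print(cost(Order(sku="A1", qty=3, unit_price=2.5)))  # 7.5
```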
