@dlewis.io.bsky.social
CTO & Data Scientist at Silex Data Solutions & CODEHR.ai. Opinions expressed are my own.

Setting up DNS resolution between containers with --name is 💯. Also, the IP-per-container scheme makes networking a piece of cake.
09.07.2025 21:57 · 👍 0 🔁 0 💬 0 📌 0

Native containers in macOS 26 are lightweight & functional. No more Docker or Podman VMs required.
09.07.2025 21:55 · 👍 1 🔁 0 💬 1 📌 0

Worked with a customer on LLM infra sizing; here's a deep dive on llama-3.3-70b-instruct inference using NVIDIA NIM.
H100 (SXM5) delivered up to 14× more throughput vs A100 (PCIe) with far lower latency.
Full benchmarks + thoughts:
dlewis.io/evaluating-l...

Had to remind myself today that bfloat16 on Apple Silicon in PyTorch with AMP provides only a minimal performance increase for model training or inference. It is very beneficial on NVIDIA GPUs because of Tensor Cores, which PyTorch uses for bfloat16 matmuls.
16.04.2025 22:26 · 👍 1 🔁 0 💬 0 📌 0

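Not the author's code, but a quick way to sanity-check this claim: time a batch of large matmuls with and without bfloat16 autocast. A minimal sketch, assuming a recent PyTorch (MPS autocast support is relatively new) and an MPS or CUDA device; the sizes and iteration count are arbitrary:

```python
import time
import torch

# Pick Apple Silicon (MPS) if present, otherwise fall back to CUDA.
device = "mps" if torch.backends.mps.is_available() else "cuda"

def sync() -> None:
    # Kernels launch asynchronously; synchronize before reading the clock.
    if device == "mps":
        torch.mps.synchronize()
    else:
        torch.cuda.synchronize()

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

def bench(use_bf16: bool) -> float:
    sync()
    start = time.perf_counter()
    for _ in range(50):
        if use_bf16:
            with torch.autocast(device_type=device, dtype=torch.bfloat16):
                _ = a @ b
        else:
            _ = a @ b
    sync()
    return time.perf_counter() - start

# On NVIDIA GPUs the bf16 path gets routed to Tensor Cores; on Apple
# Silicon the gap between the two timings is typically much smaller.
print(f"fp32: {bench(False):.3f}s  bf16 autocast: {bench(True):.3f}s")
```
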
Wanted to share some of my recent experiences debugging a real-world problem with LLMs. Problem complexity is an issue for some models. Reasoning models fare better. dlewis.io/recent-exper...
16.04.2025 20:17 · 👍 2 🔁 0 💬 1 📌 0

With gpt2-xl you can drop the positional encodings entirely and get decent output. The smaller the model, the more dependent it is on the positional encodings to generate non-garbage output.
14.04.2025 02:05 · 👍 0 🔁 0 💬 0 📌 0

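For anyone who wants to poke at the positional-encoding observation above, here's a minimal sketch using Hugging Face transformers rather than the MLX implementation mentioned below; the prompt is arbitrary, and swapping "gpt2-xl" for "gpt2" shows the small model falling apart:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2-xl"  # try "gpt2" to watch the 124M model degrade into garbage
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# GPT-2 stores its learned absolute positional embeddings in transformer.wpe;
# zeroing the weights effectively drops the positional encodings.
with torch.no_grad():
    model.transformer.wpe.weight.zero_()

ids = tok("The history of the Roman Empire", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=40, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0]))
```
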
While fixing a KV Cache generation bug today in the MLX GPT-2 implementation that I submitted last year, I discovered that the gpt2 (124M) model is much more dependent on positional encodings than the larger gpt2-xl (1.5B). Guess that explains why learned positional-encoding layers were dropped.
14.04.2025 02:04 · 👍 0 🔁 0 💬 1 📌 0

Qwen2.5 models are exceptionally strong at tool calling for their size. Definitely stronger than the Llama 3.1/3.2 models.
18.03.2025 02:21 · 👍 0 🔁 0 💬 0 📌 0

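A sketch of how one might verify the tool-calling claim locally. The base_url, model ID, and get_weather tool are assumptions for illustration; any server that exposes an OpenAI-compatible API (vLLM, Ollama, etc.) and serves a Qwen2.5 checkpoint would do:

```python
from openai import OpenAI

# Hypothetical local endpoint serving a Qwen2.5 instruct model.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, for illustration only
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Austin?"}],
    tools=tools,
)
# A model that is strong at tool calling emits a well-formed call here
# instead of answering in prose.
print(resp.choices[0].message.tool_calls)
```
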
• Cloud-native infrastructure: Kong Gateway, Tekton Pipelines, Helm Charts
Check it out and contribute today! Feedback and contributions welcome:
github.com/silexdatatea...

Key Highlights:
• LangChain/LangGraph for powerful, flexible AI agents
• Real-time streaming chat interface with Next.js & SSE
• Multiple RAG patterns with PGVector embeddings
• FastAPI backend, PostgreSQL storage, comprehensive telemetry via OpenTelemetry

We're excited to announce the open sourcing of our AI Foundry Starter Template at Silex Data! This production-ready starter kit empowers you to build and deploy AI apps with LangChain/LangGraph, featuring streaming chat, robust Keycloak authentication, Kong's multi-model gateway, and OpenShift.
12.03.2025 19:36 · 👍 1 🔁 0 💬 1 📌 0

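Not the template's actual code, but a minimal sketch of the SSE streaming pattern called out in the highlights above: a FastAPI endpoint that streams tokens as server-sent events, with a stand-in generator where the LangGraph agent would go:

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def agent_tokens(prompt: str):
    # Stand-in for streamed agent output (e.g. LangGraph's event stream).
    for tok in f"Echoing: {prompt}".split():
        await asyncio.sleep(0.05)
        yield tok

@app.get("/chat")
async def chat(prompt: str) -> StreamingResponse:
    async def event_stream():
        async for tok in agent_tokens(prompt):
            # Each SSE frame is a "data: ..." line terminated by a blank line.
            yield f"data: {tok}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_stream(), media_type="text/event-stream")
```
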
Recently, I wanted to experiment with some algorithmic trading. Building the Interactive Brokers C++ API client library on macOS & Linux/aarch64 had a few more barriers than I anticipated. Wrote up a brief blog post with the steps. dlewis.io/ibkr-cpp-api/
11.02.2025 23:16 · 👍 0 🔁 0 💬 0 📌 0

EXCLUSIVE: Microsoft and OpenAI are investigating whether a group linked to China's DeepSeek obtained OpenAI's data.
29.01.2025 03:16 · 👍 176 🔁 41 💬 79 📌 100

Would be nice to see the training code (not just inference code) from DeepSeek for the R1 models. One can hope...
27.01.2025 14:01 · 👍 1 🔁 0 💬 1 📌 0

Some of the training techniques are novel and some have been around for a while now. The combination of FP8/BF16 at training time is novel. MHLA and MTP are both interesting, but MTP is only used for training, not inference, though it can be used for speculative decoding (not new).
27.01.2025 13:52 · 👍 0 🔁 0 💬 0 📌 0

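Since MTP connects to speculative decoding, here's a toy illustration of the greedy form of that technique. This is not DeepSeek's implementation; it uses two GPT-2 checkpoints purely because they share a tokenizer. A small draft model proposes k tokens, and the large target model verifies them all in one forward pass:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
draft = AutoModelForCausalLM.from_pretrained("gpt2")      # small proposer
target = AutoModelForCausalLM.from_pretrained("gpt2-xl")  # large verifier

@torch.no_grad()
def speculative_step(ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1. The draft model proposes up to k tokens greedily.
    proposal = draft.generate(ids, max_new_tokens=k, do_sample=False,
                              pad_token_id=tok.eos_token_id)
    # 2. The target model scores the whole proposal in a single forward pass.
    logits = target(proposal).logits
    # 3. Accept draft tokens while they match the target's own greedy choice
    #    (logits at position i-1 predict the token at position i).
    accepted = ids.shape[1]
    while (accepted < proposal.shape[1]
           and proposal[0, accepted] == logits[0, accepted - 1].argmax()):
        accepted += 1
    # 4. Take one token from the target: a correction at the first point of
    #    disagreement, or a free bonus token if everything was accepted.
    next_tok = logits[0, accepted - 1].argmax().view(1, 1)
    return torch.cat([proposal[:, :accepted], next_tok], dim=1)

ids = tok("The theory of relativity says", return_tensors="pt").input_ids
for _ in range(8):
    ids = speculative_step(ids)
print(tok.decode(ids[0]))
```
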
DeepSeek has been out for a week now, so why the panic this morning? I can only assume it is because of the app rising to #1 in the App Store. Should be a wake-up call to OpenAI et al., but more efficient models that use less compute at inference time will just drive demand for larger models.
27.01.2025 13:51 · 👍 0 🔁 0 💬 1 📌 0

Whenever code is made more efficient, other code never expands to fill the available compute... This seems to be the logic at play this morning on Wall Street and in some tech circles re: DeepSeek. Great model, a Sputnik moment for foundation model companies, but it isn't over for Nvidia.
27.01.2025 13:51 · 👍 0 🔁 0 💬 1 📌 0

Doing some retro programming: a Cocoa-based LLM streaming chat application on OS X 10.5, circa 2009, making calls to a FastAPI back-end, which streams events from a LangChain/LangGraph agent capable of making tool calls. OS X 10.5 did not have NSJSONSerialization yet.
25.01.2025 17:05 · 👍 0 🔁 0 💬 0 📌 0

Interesting data point on adoption. Apple does have a great ecosystem for passkeys. It just works. Most of the concern that I've heard expressed is around vendor lock-in. Lose your Apple account and you are screwed. The same issue exists for MS and Google, too.
01.01.2025 15:16 · 👍 1 🔁 0 💬 0 📌 0

Gemini 2.0 Flash does become less effective at >50K token lengths. I've had much more success starting a new conversation and continuing the task there than pushing on in the same one.
14.12.2024 03:17 · 👍 4 🔁 0 💬 0 📌 0

Gemini 2.0 Flash, in my experience today, is definitely stronger on coding tasks than gpt-4o and slightly better than Sonnet 3.5 and o1. The Google-published benchmarks confirm this as well.
14.12.2024 03:16 · 👍 4 🔁 0 💬 2 📌 0

Gemini 2.0 Flash is pretty good for coding. It just solved a problem I have been working on for the last day: getting streaming LLM inputs/outputs & tool calls working reliably. Claude Sonnet 3.5 & o1 (not pro) failed.
13.12.2024 19:03 · 👍 1 🔁 0 💬 0 📌 0

Thorough RCA from OpenAI on last night's outage. It was related to a telemetry rollout that overwhelmed K8s internal DNS and was exacerbated by the resulting lockout from the K8s admin API. status.openai.com/incidents/ct...
13.12.2024 04:24 · 👍 1 🔁 0 💬 0 📌 0

Getting Siri to use ChatGPT isn't 100% reliable, and the TeX formatting isn't available in the Siri output when ChatGPT is used. Markdown formatting seems to be there at a glance.
12.12.2024 03:57 · 👍 1 🔁 0 💬 0 📌 0

And logged into ChatGPT now on iOS 18.2.
12.12.2024 03:56 · 👍 2 🔁 0 💬 0 📌 0

And that global rate limit got hit:
11.12.2024 23:59 · 👍 1 🔁 0 💬 0 📌 0

And now there's this:
11.12.2024 23:48 · 👍 1 🔁 0 💬 1 📌 0

Looks like OpenAI is getting overloaded after the iOS 18.2 rollout. Trying to sign into my account on my phone after the upgrade to test out the new integration.
11.12.2024 23:44 · 👍 1 🔁 0 💬 1 📌 0

Of course, someone using dataclasses or Pylance helps here a lot, but there always seems to be some reverse engineering required when trying to understand someone else's Python code.
05.12.2024 15:59 · 👍 2 🔁 0 💬 0 📌 0

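A tiny illustration of that point, with made-up names: the dict version forces the reader to hunt for the payload's shape, while the dataclass makes it explicit and lets Pylance or mypy check the callers:

```python
from dataclasses import dataclass

# Untyped dict: what fields does `order` have? Go reverse engineer the callers.
def total_untyped(order):
    return order["qty"] * order["unit_price"]

# Dataclass: the shape is self-documenting and tooling can verify usage.
@dataclass
class Order:
    sku: str
    qty: int
    unit_price: float

def total(order: Order) -> float:
    return order.qty * order.unit_price

print(total(Order(sku="ABC-123", qty=3, unit_price=9.99)))
```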