Martin Görner

@martin-gorner.bsky.social

AI/ML engineer. Previously at Google: Product Manager for Keras and TensorFlow and developer advocate on TPUs. Passionate about democratizing Machine Learning.

376 Followers  |  754 Following  |  772 Posts  |  Joined: 29.11.2024

Latest posts by martin-gorner.bsky.social on Bluesky

Preview
Evaluating AI Inference Accelerators For Machine Vision Applications — Hot Tech
In a head-to-head battle of AI accelerators, the results are in — and Axelera AI didn’t just win, it ran laps around the competition.

Full report here: www.hottech.com/industry-cov...

16.07.2025 15:51 — 👍 1    🔁 0    💬 0    📌 0
Post image

And check out the full report, which has data about more modern models like YOLOv8-L. For that model, compared to the best NVIDIA card tested, Axelera's Metis is:
- 230% faster
- 330% more power efficient
- about 3x cheaper

16.07.2025 15:42 — 👍 1    🔁 0    💬 1    📌 0
Preview
Axelera AI Accelerators Smoke Competitors In Machine Vision Research Study
Domain-specific accelerators are proving they can compete, and in some cases lead, in the metrics that matter most for real-world deployments.

The secret sauce works!
www.forbes.com/sites/daveal...

16.07.2025 15:32 — 👍 1    🔁 0    💬 1    📌 0
Preview
PCIe AI accelerator card. Powered by 4 quad-core Metis AIPUs | Axelera AI Store
Axelera AI’s PCIe card, powered by 4 Metis AIPUs, offers the highest performance inference acceleration on the market, combining ease of use, power efficiency, and scalability. Key Benefits: The highes...

4-chip Metis accelerator PCIe card coming soon:
store.axelera.ai/collections/...

30.05.2025 14:47 — 👍 1    🔁 0    💬 0    📌 0
Preview
Metis PCIe AI Inference Acceleration Card | Axelera AI
Looking for powerful & energy-efficient AI acceleration hardware that doesn't break budgets? Discover our PCIe AI inference accelerator card.

PCIe and M.2 Metis boards available now:
- PCIe: axelera.ai/ai-accelerat...
- M.2: axelera.ai/ai-accelerat...

30.05.2025 14:46 — 👍 1    🔁 0    💬 1    📌 0

50+ models pre-configured in the model zoo are ready to run:
github.com/axelera-ai-h...

30.05.2025 14:46 — 👍 0    🔁 0    💬 1    📌 0
Preview
Simplifying Model and Pipeline Deployment with the Voyager SDK | Community
Axelera AI’s A-Tang Fan and Doug Watt explain how the Voyager SDK simplifies the complex task of deploying AI-powered video pipelines on edge devices. This blog explores how its model compiler, model ...

Blog post by A-Tang Fan and Doug Watt about Axelera.ai's Voyager SDK: community.axelera.ai/product-upda...

30.05.2025 14:35 — 👍 1    🔁 1    💬 1    📌 0
Axelera AI - Extreme Performance, Excellent Efficiency. Accelerating Inference at the Edge.
Bring data insights to the edge, increasing the performance of your solutions with a cost-effective and efficient inference chip. Axelera’s AI processing unit is designed to seamlessly integrate into ...

I am impressed and humbled by what the Axelera team was able to bring to market, in only three years, with the Metis chip and Voyager SDK. And it's just the beginning. We have an exciting roadmap ahead! axelera.ai

07.05.2025 14:22 — 👍 2    🔁 0    💬 0    📌 0

The explosion of new AI models and capabilities, in advanced vision, speech recognition, language models, reasoning, etc., needs a novel, energy-efficient approach to AI acceleration to deliver truly magical AI experiences, at the edge and in the datacenter.

07.05.2025 14:22 — 👍 2    🔁 0    💬 1    📌 0
Post image

I'm delighted to share that I joined the Axelera team this week to deliver the next-generation AI compute platform. axelera.ai

07.05.2025 14:21 — 👍 3    🔁 0    💬 2    📌 0
TensorFlow and deep reinforcement learning, without a PhD (Google I/O '18)
On the forefront of deep learning research is a technique called reinforcement learning, which bridges the gap between academic deep learning problems and wa...

You'll see it in this form in Karpathy's original "Pong from Pixels" post karpathy.github.io/2016/05/31/rl/ as well as my "RL without a PhD" video from a while ago youtu.be/t1A3NTttvBA.... Both also explore a few basic reward assignment strategies.
Have fun, don't despair, and RL!

11.03.2025 01:40 — 👍 0    🔁 0    💬 0    📌 0
Post image

It is usually written in vector form, using the cross-entropy function. This time, I use 𝛑̅(sᵢₖ) for the *vector* of all move probabilities predicted from game state sᵢₖ, while 𝒙̅ᵢₖ is the one-hot encoded *vector* representing the move actually played at move k of game i.
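
A plausible reconstruction of the attached formula, in standard policy-gradient notation (not copied from the image, so the exact form may differ):

    \tilde{L}(\theta) = \sum_{i}\sum_{k} r_{ik}\; H\!\big(\bar{x}_{ik},\, \bar{\pi}(s_{ik})\big),
    \qquad H(\bar{x}, \bar{\pi}) = -\sum_{j} \bar{x}_j \log \bar{\pi}_j

Since 𝒙̅ᵢₖ is one-hot, the cross-entropy H reduces to -log 𝛑(xᵢₖ), recovering the scalar pseudo-loss.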

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0
Post image

One more thing: In modern autograd libraries like PyTorch or JAX, the RL gradient can be computed from the following “pseudo-loss”. Don’t try to find the meaning of this function, it does not have any. It’s just a function that has the gradient we want.
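
The usual form of such a pseudo-loss (my reconstruction, the attached image may differ in sign or scaling):

    \tilde{L}(\theta) = -\sum_{i}\sum_{k} r_{ik}\, \log \pi_\theta(x_{ik})

Its gradient is minus the estimated gradient of the expected reward, so minimizing it with autograd maximizes the expected reward. Below is a minimal JAX sketch of the idea, assuming a toy linear policy network and placeholder rollout data (the names, shapes and linear policy are mine, not from the original post):

    import jax
    import jax.numpy as jnp

    def pseudo_loss(params, states, played_moves, rewards):
        # Toy linear "policy network": one logit per possible move, per game state.
        logits = states @ params                        # shape (T, num_moves)
        log_probs = jax.nn.log_softmax(logits)          # log pi(move | state)
        # Log-probability of the move actually played at each step.
        played = jnp.take_along_axis(log_probs, played_moves[:, None], axis=1)[:, 0]
        # Reward-weighted sum; the minus sign turns "maximize reward" into a loss.
        return -jnp.sum(rewards * played)

    # Placeholder rollout data: 10 steps, 8-dim states, 4 possible moves.
    k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
    params = jax.random.normal(k1, (8, 4))
    states = jax.random.normal(k2, (10, 8))
    played_moves = jax.random.randint(k3, (10,), 0, 4)
    rewards = jnp.ones(10)

    # The gradient of the pseudo-loss is the policy-gradient estimate for the weights.
    grads = jax.grad(pseudo_loss)(params, states, played_moves, rewards)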

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0
Post image

So in conclusion, math tells us that Reinforcement Learning is possible, even in multi-turn games where you cannot differentiate across multiple moves. But math tells us nothing about how to do it in practice. Which is why it is hard.

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0

What "rewards"? Well the "good" ones, that encourage the "correct" moves!
This is pretty much like a delicious recipe saying you should mix "great" ingredients in the "correct" proportions 😭. NOT HELPFUL AT ALL 🤬 !!!

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0

Now the bad news: what this equation really means is that the gradient we are looking for is a sum of the gradients of our policy network over many individual games and moves, weighted by an unspecified set of “rewards”.

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0

We wanted to maximize the expected reward and managed to approximate the gradients from successive runs of the game, as played by our policy network. We can run backprop after all ... at least in theory 🙁.

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0

... string of zeros followed by the final reward - but we may have finer-grained rewarding strategies. The 1/N constant was folded into the rewards.

The good news: yay, this is computable 🎉🥳🎊! The expression only involves our policy network and our rewards.

11.03.2025 01:40 — 👍 0    🔁 0    💬 1    📌 0
Post image

For a more practical application, let's unroll the log-probabilities into individual moves using eq. (1) and rearrange a little. We use the fact that the log of a product is a sum of logs. I have also split the game reward into separate per-step rewards rᵢₖ - worst case a ...
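
Presumably the unrolled estimator looks something like this (reconstructed in standard notation, not copied from the image):

    \nabla_\theta E[R] \;\approx\; \frac{1}{N}\sum_{i=1}^{N}\sum_{k} R(g_i)\, \nabla_\theta \log \pi(x_{ik})
    \;\approx\; \sum_{i}\sum_{k} r_{ik}\, \nabla_\theta \log \pi(x_{ik})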

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

But look, this is a sum of a probability × some value. That's an expectation! Which means that instead of computing it directly, we can approximate it from multiple games gᵢ:
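
In formula form (likely close to the attached image, reconstructed here rather than copied):

    \nabla_\theta E[R] \;\approx\; \frac{1}{N}\sum_{i=1}^{N} R(g_i)\, \nabla_\theta \log\!\Big(\prod_{k} \pi(x_{ik})\Big)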

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

And now we can start approximating like crazy - and abandon any pretense of doing exact math 😅.
First, we use our policy network 𝛑 to approximate the move probabilities.
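
In other words (my reading of this step, in standard notation): the true move probabilities are replaced by the network's predictions, so the probability of a whole game becomes the product of the network's per-move probabilities:

    p(g) \approx \prod_{k} \pi(x_k)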

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

Combining the last two equations, we get:
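
Presumably something like (reconstructed, not copied from the image):

    \nabla_\theta E[R] = \sum_{g} R(g)\, p(g)\, \nabla_\theta \log p(g)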

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

We now use a cheap mathematical trick based on the fact that the derivative of log(x) is 1/x. With gradients, this cheap trick reads:
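
That is (my reconstruction of the attached formula):

    \nabla_\theta \log p(g) = \frac{\nabla_\theta\, p(g)}{p(g)}
    \qquad\Longleftrightarrow\qquad
    \nabla_\theta\, p(g) = p(g)\, \nabla_\theta \log p(g)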

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

To maximize the expectation (3) we compute its gradient. The notation ∇ is the "gradient", i.e. the list of partial derivatives with respect to the parameters θ. Differentiation is a linear operation so we can move it inside the sum Σ. Also, the rewards do not depend on θ, so:
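
Presumably along these lines (reconstructed in standard notation, not copied from the image):

    \nabla_\theta E[R] = \nabla_\theta \sum_{g} R(g)\, p(g) = \sum_{g} R(g)\, \nabla_\theta\, p(g)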

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0

For example, for a single die, the possible values are 1, 2, 3, 4, 5, 6, the probability of each outcome is p=⅙, which gives us an expectation of 3.5. And you get roughly the same number by rolling the die many times and averaging the results. It's the "law of large numbers".
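
As a quick worked check:

    E[X] = \tfrac{1}{6}(1+2+3+4+5+6) = \tfrac{21}{6} = 3.5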

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

In probabilities, the “expectation” of a random variable X is the weighted sum of all possible outcomes xₖ, weighted by their probabilities p(xₖ). The really nice thing about expectations is that you can approximate them: just repeat the experiment and average the outcomes:
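
In symbols (very likely what the attached image shows, rewritten here):

    E[X] = \sum_{k} x_k\, p(x_k) \;\approx\; \frac{1}{N} \sum_{i=1}^{N} x_i
    \quad \text{(average over } N \text{ repetitions of the experiment)}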

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0
Post image

Introducing the Reinforcement Learning ninja 🥷 hack: we can play many games and try and maximize the "expected reward". Yes, bear with me, this will end up being computable!
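
A plausible form of the objective in the attached image (standard notation, my reconstruction): the expected reward sums over all possible games g, and the probability of a game is the product of the probabilities of its successive moves:

    E[R] = \sum_{g} R(g)\, p(g), \qquad p(g) = \prod_{k} p(x_k)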

11.03.2025 01:39 — 👍 0    🔁 0    💬 1    📌 0

We have to sample and play multiple moves before a reward is known. With the reward, we would like to backprop through all the moves and adjust the parameters θ of the policy network but we cannot. This process is not differentiable end-to-end 😫.

11.03.2025 01:39 — 👍 1    🔁 0    💬 1    📌 0

To play, we sample a move from the predicted probabilities (i.e. roll the dice to pick a move, but with skewed probabilities as predicted by the network). And this is a problem because sampling is not a differentiable operation.

11.03.2025 01:39 — 👍 2    🔁 0    💬 1    📌 0

In the "policy gradients" approach to RL, a neural network called "policy network" is used to predict the next move. The network sees the game state and returns next move probabilities. We call 𝛑(xₖ) the probability it predicts for move xₖ. θ is the set of weights of the net.
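
In standard notation (my sketch of the setup, not necessarily the author's exact formulation), the policy network maps a game state s to one probability per possible move, typically through a softmax over its raw outputs f_θ(s):

    \pi_\theta(x_k \mid s) = \mathrm{softmax}\big(f_\theta(s)\big)_k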

11.03.2025 01:39 — 👍 1    🔁 0    💬 1    📌 0
