
S. Ota

@ota.bsky.social

Interests: Reinforcement Learning, Natural Language Processing and Artificial General Intelligence. arXiv papers bot: @paper.bsky.social

454 Followers  |  196 Following  |  434 Posts  |  Joined: 03.03.2023

Latest posts by ota.bsky.social on Bluesky

Preview
gpt-oss: How to Run & Fine-tune | Unsloth Documentation Run & fine-tune OpenAI's new open-source models!

"Unsloth gpt-oss fine-tuning is 1.5x faster, uses 70% less VRAM, and supports 10x longer context lengths. gpt-oss-20b LoRA training fits on a 14GB VRAM, and gpt-oss-120b works on 65GB VRAM."

docs.unsloth.ai/basics/gpt-o...
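For orientation, here is a minimal LoRA fine-tuning sketch with Unsloth and TRL. It assumes an "unsloth/gpt-oss-20b" checkpoint id and current library APIs; hyperparameters and the toy dataset are illustrative, not the Unsloth recipe.

# Minimal LoRA sketch (assumptions: "unsloth/gpt-oss-20b" checkpoint id, current Unsloth + TRL APIs)
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Load the 20B model in 4-bit; this is what keeps it near the quoted 14GB of VRAM
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed model id
    max_seq_length=4096,
    load_in_4bit=True,
)
# Attach LoRA adapters to the attention/MLP projections (illustrative hyperparameters)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
# Toy one-example dataset; swap in real instruction data
dataset = Dataset.from_dict({"text": ["### Question: What is 1 + 1?\n### Answer: 2"]})
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,  # newer TRL versions take processing_class= instead
    train_dataset=dataset,
    args=SFTConfig(per_device_train_batch_size=1, gradient_accumulation_steps=4,
                   max_steps=10, dataset_text_field="text", output_dir="outputs"),
)
trainer.train()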

10.08.2025 04:50 — 👍 0    🔁 0    💬 0    📌 0
Preview
GitHub - huggingface/gpt-oss-recipes: Collection of scripts and notebooks for OpenAI's latest GPT OSS models

"Collection of scripts demonstrating different optimization and fine-tuning techniques for OpenAI's GPT-OSS models (20B and 120B parameters).

...

For full-parameter training on one node of 8 GPUs, ..."

github.com/huggingface/...

10.08.2025 04:46 — 👍 1    🔁 0    💬 0    📌 0
Preview
Agar Mini The Agar Mini represents the latest evolution in the esteemed Agar series, distilling its signature design language into a new compact form factor. This model preserves the elegant, curved a...

"The Agar Mini is available in two distinct versions ...

Wired Edition: Powered by QMK firmware, ... is fully compatible with VIA, VIAL, ...

Dual-Mode Wireless Edition: Built on ZMK firmware, ... is fully customizable via the zmk.studio editor."

kbdfans.com/products/aga...

25.07.2025 14:08 — 👍 1    🔁 0    💬 0    📌 0
Preview
Gemini CLI First there was Claude Code in February, then OpenAI Codex (CLI) in April, and now Gemini CLI in June. All three of the largest AI labs now have their own …

My notes on Gemini CLI, including poking around in their system prompt which I've extracted into a more readable rendered Gist simonwillison.net/2025/Jun/25/...

25.06.2025 17:55 — 👍 82    🔁 10    💬 5    📌 1
Preview
Gemini CLI: your open-source AI agent Free and open source, Gemini CLI brings Gemini directly into developers’ terminals — with unmatched access for individuals.

The "secret project" I've been working on at my job has gone public (and open source) today! Check it out!

25.06.2025 15:36 — 👍 4    🔁 1    💬 0    📌 0

Since I had already been using the Gemini API, I had to clear GEMINI_API_KEY in order to authenticate with my Google account.

GEMINI_API_KEY="" gemini

25.06.2025 15:15 — 👍 0    🔁 0    💬 0    📌 0
Skyfeed settings for the Claude Code | Gemini CLI feed.

I expanded the Claude Code feed to include the Gemini CLI.

bsky.app/profile/did:...

25.06.2025 14:25 — 👍 3    🔁 0    💬 0    📌 0

bsky.app/profile/did:...

This feed will be moved to

bsky.app/profile/did:...

23.06.2025 15:16 — 👍 0    🔁 0    💬 0    📌 0
RegEx patterns for Custom Keyboard feed.

23.06.2025 10:51 — 👍 0    🔁 0    💬 0    📌 0

Posts related to custom keyboard, DIY keyboard, mechanical keyboard, key switch, keycap, etc., including the Japanese terms 自作キーボード (DIY keyboard), キースイッチ (key switch), キーキャップ (keycap), and so on.

bsky.app/profile/did:...
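The post does not show the actual Skyfeed RegEx, so the following is only a guess at what the keyword matching could look like, written as a small Python re sketch assembled from the keywords listed above.

# Hypothetical pattern built from the keywords in the post; not the real Skyfeed settings
import re

KEYBOARD_PATTERN = re.compile(
    r"custom keyboard|DIY keyboard|mechanical keyboard|key ?switch|keycap"
    r"|自作キーボード|キースイッチ|キーキャップ",
    re.IGNORECASE,
)

print(bool(KEYBOARD_PATTERN.search("Finally finished soldering my DIY keyboard!")))  # True
print(bool(KEYBOARD_PATTERN.search("New GPU arrived today")))                        # False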

23.06.2025 10:49 — 👍 0    🔁 0    💬 1    📌 0

Posts related to `Claude Code`.

bsky.app/profile/did:...

23.06.2025 10:46 — 👍 0    🔁 0    💬 0    📌 0
Preview
GitHub - jupyterlab/jupyter-ai: A generative AI extension for JupyterLab

Jupyter Notebook plus generative AI is a great combination. You get a chat side panel, and you can query a generative model from inside a notebook with %%ai. Besides the major hosted providers, it also supports Ollama and the like, so local LLMs can be used as well.
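For reference, a rough sketch of how the %%ai magic is used in a notebook, assuming the jupyter_ai_magics extension is installed; provider and model ids depend on your environment, and %ai list shows what is actually available.

# In a notebook cell: load the magics shipped with jupyter-ai
%load_ext jupyter_ai_magics

# List the providers/models available in this environment
%ai list

# Ask a model from inside the notebook; "ollama:llama3" is only an example id for a local model
%%ai ollama:llama3
Summarize the difference between LoRA and full fine-tuning in two sentences.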

01.06.2025 06:59 — 👍 2    🔁 1    💬 0    📌 0

Thanks! I fixed the feeds.

08.05.2025 18:11 — 👍 0    🔁 0    💬 0    📌 0

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
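To make the propose/solve loop in the abstract concrete, here is a deliberately simplified Python sketch. propose_task, solve, learnability_reward, and update are hypothetical stand-ins for AZR's actual prompting and RL machinery, not the authors' code.

# Simplified sketch of the self-play loop described in the abstract.
# model.propose_task / model.solve / model.learnability_reward / model.update are
# hypothetical placeholders, NOT the AZR implementation.
import subprocess

def run_code(snippet: str, timeout: float = 5.0) -> str:
    """Code executor: run a Python snippet and return its stdout as the ground-truth answer."""
    out = subprocess.run(["python", "-c", snippet], capture_output=True, text=True, timeout=timeout)
    return out.stdout.strip()

def self_play_step(model):
    # 1. PROPOSE: the model emits a code-reasoning task it expects to learn from.
    snippet = model.propose_task()
    reference = run_code(snippet)            # the executor validates the task and fixes the answer
    # 2. SOLVE: the same model answers the task it just proposed.
    prediction = model.solve(snippet)
    # 3. REWARD: correctness for the solver role, learnability for the proposer role.
    solver_reward = 1.0 if prediction.strip() == reference else 0.0
    proposer_reward = model.learnability_reward(snippet)
    model.update(solver_reward, proposer_reward)  # one RL update, e.g. a policy-gradient step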

[5/30] 396 Likes, 96 Comments, 4 Posts
2505.03335, cs.LG | cs.AI | cs.CL, 06 May 2025

🆕Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Andrew Zhao, Yiran Wu, Yang Yue, Tong Wu, Quentin Xu, Yang Yue, Matthieu Lin, Shenzhi Wang, Qingyun Wu, Zilong Zheng, Gao Huang

08.05.2025 00:07 — 👍 1    🔁 1    💬 1    📌 0
Preview
GitHub - evan-liu/karabiner.ts: Write Karabiner-Elements configuration in TypeScript

"Write Karabiner-Elements configuration in TypeScript. ... Easier-to-understand TypeScript/JavaScript syntax, Strong-typed abstractions and key aliases with IDE support, Structured config files instead of one big file"

github.com/evan-liu/kar...

12.04.2025 14:52 — 👍 2    🔁 0    💬 0    📌 0
Screenshot of layers and keymap.

I built a multi-layer keymap with Karabiner-Elements. Using the karabiner.ts library made the layers easy to implement!

import { layer, map, writeToProfile } from 'karabiner.ts'
// Hold 英数 (japanese_eisuu) to turn ijkl into arrow keys; 'Default' is an example profile name
writeToProfile('Default', [
  layer('japanese_eisuu', '英数 + ijkl').manipulators([
    map('i').to('↑'),
    map('j').to('←'),
    map('k').to('↓'),
    map('l').to('→'),
  ]),
])

Also, I used Deno again for the first time in a while; for a simple program like this it is quite painless.

github.com/susumuota/ka...

12.04.2025 14:36 — 👍 3    🔁 0    💬 2    📌 0

"By serving as an intermediary in user interactions, it can autonomously generate context-aware responses, prefill required information, and facilitate seamless communication with external systems, significantly reducing cognitive load and interaction friction."

11.04.2025 02:57 — 👍 0    🔁 0    💬 0    📌 0
Results of SWE-bench Verified at 2025-04-07.

A SWE-bench Verified score of 56.0% is comparable to models from around December 2024.

www.swebench.com#verified

07.04.2025 08:55 — 👍 0    🔁 0    💬 0    📌 0
Preview
Vibe coding with GitHub Copilot: agent mode and MCP support now rolling out to VS Code users. We are rolling out agent mode with MCP support to all VS Code users. We are also announcing the new GitHub Copilot Pro+ plan (with premium requests), general availability of models from Anthropic, Google, and OpenAI, Next Edit code completion suggestions, and the GitHub Copilot code review agent.

"Visual Studio Codeのエージェントモードを全ユーザーに提供します。このモードはMCPをサポートしており、必要なあらゆるコンテキストや機能へのアクセスを可能にします。... エージェントモードのモデルは、Claude 3.5と3.7 Sonnet、Google Gemini 2.0 Flash、OpenAI GPT-4oから選択できます。現在、エージェントモードはClaude 3.7 Sonnetを使用した場合、SWE-bench Verifiedで56.0%の合格率を達成しています。"

github.blog/jp/2025-04-0...

07.04.2025 08:51 — 👍 2    🔁 0    💬 1    📌 0

"... the word “writing” no longer refers to this physical act but the higher abstraction of arranging ideas into a readable format. Similarly, once the physical act of coding can be automated, the meaning of “programming” will change to refer to the act of arranging ideas into executable programs."

07.04.2025 08:06 — 👍 0    🔁 0    💬 0    📌 0
Preview
The End of Programming as We Know It

"... if a task can only be done by a handful of those most educated, that task is considered intellectual. One example is writing, the physical act of copying words onto paper. In the past, when only a small portion of the population was literate, writing was considered intellectual."

07.04.2025 08:05 — 👍 1    🔁 0    💬 1    📌 0
Post image

Well, the latest DeepSeek is very satisfying from a humanities perspective. The trick to generalize RL is replacing scalar grades with… source criticism (qualitative principles and critiques). arxiv.org/pdf/2504.02495

04.04.2025 08:01 — 👍 46    🔁 5    💬 1    📌 2

"A key challenge of RL is to obtain accurate reward signals for LLMs in various domains beyond verifiable questions or artificial rules. ... we investigate how to improve reward modeling (RM) with more inference compute for general queries, i.e. the inference-time scalability of generalist RM"

05.04.2025 04:05 — 👍 0    🔁 0    💬 0    📌 0
image alt text can be 2000 characters long.

I just realised that image alt text can be 2000 characters long.

I have fixed @paper.bsky.social to allow 2000 chars.
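As a rough illustration of what such a bot fix might involve, a hedged Python sketch using the atproto SDK follows; this assumes the bot uses that SDK, which the post does not say, and the handle, password, and file names are placeholders.

# Hedged sketch: attaching long alt text with the atproto Python SDK (tooling is an assumption)
from atproto import Client

client = Client()
client.login("bot-handle.bsky.social", "app-password")  # placeholder credentials
with open("abstract.png", "rb") as f:
    client.send_image(
        text="New paper",
        image=f.read(),
        image_alt="Full paper abstract goes here; alt text can be up to 2000 characters.",
    )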

05.04.2025 03:58 — 👍 0    🔁 0    💬 0    📌 0

"This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired architecture that integrates principles from cognitive science, neuroscience, and computational research."

05.04.2025 03:23 — 👍 1    🔁 0    💬 0    📌 0
# build llama.cpp from source
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release -j 4

# download the QAT Q4_0 GGUF of Gemma 3 4B (4 connections, authenticated with your Hugging Face token)
aria2c --header="Authorization: Bearer $(cat ~/.cache/huggingface/token)" -x 4 https://huggingface.co/google/gemma-3-4b-it-qat-q4_0-gguf/resolve/main/gemma-3-4b-it-q4_0.gguf -o gemma-3-4b-it-q4_0.gguf -d models

# serve the model with an OpenAI-compatible API on port 8000 (CPU only: -ngl 0, 4 threads, 8192-token context)
./build/bin/llama-server -t 4 -ngl 0 -c 8192 -n -1 --host 0.0.0.0 --port 8000 --no-webui -m models/gemma-3-4b-it-q4_0.gguf

# on another terminal
uv tool install -p 3.11 open-webui@latest
open-webui serve --port 8080

# open http://localhost:8080/ with a web browser
# add "http://localhost:8000/v1" as an OpenAI API connection

gemma-3-4b-it-qat-q4_0-gguf runs on a Raspberry Pi 5 (8GB) at 2.8 tokens/sec.
huggingface.co/google/gemma...

04.04.2025 14:57 — 👍 5    🔁 1    💬 0    📌 0

Large Language Models (LLMs) have achieved remarkable success in natural language processing. Recent advances have led to the development of a new class of reasoning LLMs; for example, open-source DeepSeek-R1 has achieved state-of-the-art performance by integrating deep thinking and complex reaso...

[27/30] 107 Likes, 2 Comments, 1 Posts
2503.18878, cs.CL, 24 Mar 2025

🆕I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Andrey Galichin, Alexey Dontsov, Polina Druzhinina, Anton Razzhigaev, Oleg Y. Rogov, Elena Tutubalina, Iva...

29.03.2025 00:05 — 👍 3    🔁 1    💬 1    📌 0
Preview
Synthesizing an arithmetic/math instruction dataset with few hallucinations. Summary: arithmetic/math instructions are generated with Magpie; the answers the LLM gives directly are compared with the answers the LLM gets by solving in Python, and only data where the two agree is kept. This method produced about 100k arithmetic/math calculation problems. https://huggingface.co/datasets/Kendamarron/magpie-easy-math-instruction-88k-qwen2.

Today's AI-related article:

Synthesizing an arithmetic/math instruction dataset with few hallucinations | Zenn "LLM" feed
Arithmetic/math instruction data is generated with Magpie, and the LLM's direct answers are compared against the results of solving with Python (a minimal sketch of this filter follows below).
Only the data where the two agree is kept, yielding a dataset with few hallucinations.
The aim is to address the shortage of calculation-problem data and provide data that is easy to work with.
A dataset of roughly 100k examples has been released.
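Here is a deliberately simplified Python sketch of that agreement filter; llm_answer and llm_answer_via_python are hypothetical helpers standing in for the actual Magpie/LLM pipeline, not the dataset author's code.

# Keep only problems where the LLM's direct answer matches the answer obtained by writing
# and executing Python. llm_answer() and llm_answer_via_python() are hypothetical callables.
def normalize(answer: str) -> str:
    return answer.strip().rstrip(".").lower()

def agreement_filter(problems, llm_answer, llm_answer_via_python):
    kept = []
    for problem in problems:
        direct = llm_answer(problem)              # the model answers in plain text
        checked = llm_answer_via_python(problem)  # the model writes Python; its output is the answer
        if normalize(direct) == normalize(checked):
            kept.append({"instruction": problem, "output": direct})
    return kept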

21.03.2025 07:15 — 👍 3    🔁 1    💬 0    📌 0

They are based on the selenized color palette.

github.com/jan-warchol/...

20.03.2025 11:26 — 👍 0    🔁 0    💬 0    📌 0
