Marco

@mcognetta.bsky.social

Language and keyboard stuff at Google + PhD student at Tokyo Institute of Technology. I like computers and Korean and computers-and-Korean and high school CS education. Georgia Tech → 연세대학교 → 東京工業大学. https://theoreticallygoodwithcomputers.com/

2,897 Followers  |  1,619 Following  |  884 Posts  |  Joined: 01.03.2023

Latest posts by mcognetta.bsky.social on Bluesky

Can you explain what you mean by "you need to account for all the token boundaries"? Are you labeling spans at a granularity finer than words (at least for English)?

14.02.2026 00:02 — 👍 0    🔁 0    💬 1    📌 0

Thanks! Lichess isn't strictly necessary, but it makes the pipeline slightly easier. I'll DM you soon!

13.02.2026 22:25 — 👍 0    🔁 0    💬 0    📌 0

Thanks! I will dm you soon, I'm still deploying some stuff!

13.02.2026 22:25 — 👍 1    🔁 0    💬 0    📌 0
A Korean passage that begins "아편 전쟁阿片戰爭은...", meaning "the opium wars...".

Unusual hanja formatting in this passage I read.

I usually see hanja in brackets/parentheses or as sub/superscript or at least bolded, so seeing it just sort of nakedly attached to the hangul is a bit jarring. Especially because there is a space between 아편 and 전쟁 but not in 阿片戰爭.

#한국어 #한자

13.02.2026 22:12 — 👍 1    🔁 0    💬 0    📌 1

I downloaded something like 300GB of open models and wrote a bunch of map-reduce style processing scripts to make this graph.

It's plotting the distribution of weight values across a variety of popular open models, to show that models are almost entirely made up of small floats.

13.02.2026 11:56 — 👍 52    🔁 3    💬 10    📌 0
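
A rough sketch of what a map-reduce style pass over model weights could look like, not the actual scripts behind the graph. It assumes each shard is simply an iterable of floats; reading real checkpoint files is left out, and the names `map_shard`, `reduce_counts`, and `fraction_below_one` are hypothetical. The map step bins one shard by decimal order of magnitude, and the reduce step merges the per-shard counts.

```python
import math
from collections import Counter
from functools import reduce


def map_shard(weights):
    """Map step: histogram one shard by decimal order of magnitude.

    Exact zeros get their own bin because log10(0) is undefined.
    """
    counts = Counter()
    for w in weights:
        if w == 0.0:
            counts["zero"] += 1
        else:
            counts[math.floor(math.log10(abs(w)))] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two per-shard histograms."""
    return a + b


def fraction_below_one(counts):
    """Share of weights with absolute value below 1."""
    total = sum(counts.values())
    small = sum(c for k, c in counts.items() if k == "zero" or k < 0)
    return small / total


# Toy shards standing in for real model files (hypothetical values).
shards = [[0.02, -0.03, 0.5], [0.0, 1.5, -0.003]]
merged = reduce(reduce_counts, map(map_shard, shards), Counter())
print(sorted(merged.items(), key=str))
print(fraction_below_one(merged))
```

Because each shard is summarized independently before merging, the map step can run in parallel across files and only the small histograms need to be kept in memory.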
peon-ping: Stop babysitting your terminal. Warcraft III Peon voice lines as Claude Code notifications. Never miss when Claude needs you.

I need this but for SCVs.

12.02.2026 07:26 — 👍 1    🔁 0    💬 0    📌 0

I'm looking for 5-10 #chess players to test out a tool I'm building. Preferably who play on @lichess.org and are 1200+ in rapid or blitz.

And if you coach chess at all, I'd be extra grateful to have you test it!

NOTE: it is _NOT_ an "LLM chess coach" tool, I promise!

🙏

11.02.2026 22:37 — 👍 6    🔁 5    💬 2    📌 1

For reference, iirc he was bronze league with more than 1000 games. In other words, not very good.

12.02.2026 02:32 — 👍 2    🔁 0    💬 1    📌 0

What is the timeline of the departures?

11.02.2026 03:42 — 👍 3    🔁 0    💬 1    📌 0

If you think labeling text spans with LLMs is easy, you probably have not tried it yourself (we have! 🙃).

Any method you can think of – be it tagging, matching, or indexing – has flaws.

In our new preprint, we tested them all 💪 We also proposed how to improve one of them.

arxiv.org/abs/2601.16946

29.01.2026 14:20 — 👍 39    🔁 6    💬 2    📌 3

Oh hey you know what? I should probably have just read the paper.

10.02.2026 23:58 — 👍 0    🔁 0    💬 0    📌 0

The model is then forced to output exactly the same surface form text as the input.

With multiple tagged spans/types, this gets harder, but with mild constraints it becomes regular pretty quickly, and then it's quite easy to encode for a constrained generation tool.

10.02.2026 23:56 — 👍 0    🔁 0    💬 1    📌 0

Could the copying issue in tagging be addressed with constrained generation? Like, in the simple case of tagging a single span with open/close tags, convert the input into a regular language where we allow one open tag followed by any amount of text and one closed tag.

10.02.2026 23:54 — 👍 0    🔁 0    💬 2    📌 0
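
The single-span idea in this thread can be sketched as a small automaton. This is a minimal illustration under assumptions, not the method from the preprint or any particular constrained generation tool: the tag strings `<span>` and `</span>` and the function names are hypothetical, and decoding is modeled at the character level rather than over model tokens. The automaton state is how many input characters have been copied plus a phase (before the open tag, inside the span, after the close tag), so the output can only be the input text with one open and one close tag inserted.

```python
OPEN, CLOSE = "<span>", "</span>"  # hypothetical tag names


def allowed_next(text, pos, phase):
    """Symbols a decoder may emit from automaton state (pos, phase).

    pos   -- number of input characters copied so far
    phase -- 0: before the open tag, 1: inside the span, 2: after the close tag
    """
    options = []
    if pos < len(text):
        options.append(text[pos])  # copy the next input character verbatim
    if phase == 0:
        options.append(OPEN)
    elif phase == 1:
        options.append(CLOSE)
    return options


def accepts(text, output):
    """Check whether `output` is `text` with exactly one tagged span."""
    frontier = {(0, 0, 0)}  # (output chars consumed, pos, phase)
    seen = set()
    while frontier:
        state = frontier.pop()
        if state in seen:
            continue
        seen.add(state)
        i, pos, phase = state
        if i == len(output):
            if pos == len(text) and phase == 2:
                return True
            continue
        for symbol in allowed_next(text, pos, phase):
            if not output.startswith(symbol, i):
                continue
            if symbol == OPEN:
                frontier.add((i + len(OPEN), pos, 1))
            elif symbol == CLOSE:
                frontier.add((i + len(CLOSE), pos, 2))
            else:
                frontier.add((i + 1, pos + 1, phase))
    return False


text = "the opium wars began"
print(accepts(text, "the <span>opium wars</span> began"))
print(accepts(text, "the <span>opium war</span> began"))
```

In an actual constrained decoder, `allowed_next` would be used to mask the model's next-token choices at each step; mapping these symbols onto a real tokenizer's vocabulary is the part this sketch leaves out.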

GIF (alt text): cars are parked on the side of the road in front of houses

10.02.2026 23:26 — 👍 0    🔁 0    💬 0    📌 0

Don't give Elon Musk any ideas

10.02.2026 22:14 — 👍 3    🔁 0    💬 0    📌 0

I propose modern frontier models should be classified as "Really Quite Big Language Models" (RQBLM).

Let's just follow radio astronomy in their naming schemes.

10.02.2026 22:09 — 👍 19    🔁 1    💬 3    📌 0

This week on Overcommitted, we got to sit down with Bluesky's favorite tech blogger @samwho.dev and it did not disappoint!

Sam makes some of the coolest tech content on the internet, and if you haven't heard from him yet, you should! Full episode out now: overcommitted.dev/interactive-...

10.02.2026 17:54 — 👍 20    🔁 5    💬 0    📌 1

For bsky's burgeoning AI scene.

10.02.2026 09:35 — 👍 62    🔁 6    💬 0    📌 0

Oh there are some Korean songs mixed in. It just gets better and better.

09.02.2026 21:17 — 👍 0    🔁 0    💬 0    📌 0
YouTube video by Humano Studios: Thai Psych, Molam (หมอลำ), Luk Thung & Soul [Vinyl Studio Session] with Diana Ratsamee

My work rotation today. Unbelievably good.

09.02.2026 21:14 — 👍 0    🔁 0    💬 1    📌 0

Up the chain all the way to the president of NLP

09.02.2026 20:06 — 👍 0    🔁 0    💬 0    📌 0

deep olympics lore

08.02.2026 22:22 — 👍 3    🔁 0    💬 0    📌 0

@aclmeeting.bsky.social @aclrollingreview.bsky.social what is the right way to send a complaint about another review from a paper I am reviewing? One in my batch is absolutely terrible, and I am not confident in leaving it to the AC/etc to catch and properly handle it.

08.02.2026 20:13 — 👍 3    🔁 0    💬 1    📌 0

This is exactly what I want my desk space to look like.

08.02.2026 05:47 — 👍 9    🔁 0    💬 0    📌 0
07.02.2026 06:51 — 👍 5    🔁 0    💬 0    📌 0

TBH I am almost equally impressed that 1) we have the computing power and technical ability to compute this table and 2) 63 TiB fit on just 3 hard drives and was cheap enough for someone to just purchase.

07.02.2026 06:14 — 👍 2    🔁 0    💬 1    📌 0
Index of /tables/

It's available for download in case you are interested lol

op1.lichess.ovh/tables/

07.02.2026 06:13 — 👍 1    🔁 0    💬 1    📌 0

The subset consists of positions where some file has both a black and a white pawn that haven't passed each other (so they are blocking each other from promotion).

07.02.2026 06:12 — 👍 1    🔁 0    💬 0    📌 0
Op1 - Partial 8-piece tablebase available: 63 TiB of chess knowledge sent across the Atlantic and now available on the Lichess analysis board

@lichess.org announced a partial 8-piece tablebase that takes up 63 TiB on disk. They couldn't get a network to help them transfer it, so they just shipped it via plane on hard drives.

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." -Andy Tanenbaum

07.02.2026 06:12 — 👍 12    🔁 1    💬 3    📌 0
