
Joe Barrow

@jbarrow.bsky.social

NLP @ Pattern Data. Prev: Adobe Research; PhD, UMD

101 Followers  |  190 Following  |  22 Posts  |  Joined: 08.11.2024

Latest posts by jbarrow.bsky.social on Bluesky

Now, some acknowledgments: this work was made possible thanks to a generous compute grant from Lambda!

And I've got a hosted version of the model that I'll be sharing in a couple of days on @modal-labs.bsky.social, which makes it basically free for me to host and scale.
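For the curious, a deployment of that shape is only a few lines on Modal. A minimal sketch, assuming a hypothetical app name, image, and GPU choice; not the actual CommonForms endpoint:

```python
import modal

# Sketch of a scale-to-zero inference endpoint on Modal. The app name,
# image contents, and GPU are illustrative guesses.
app = modal.App("ffdnet-demo")
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image, gpu="T4")  # containers spin down when idle,
@modal.web_endpoint(method="POST")    # so an unused endpoint costs ~nothing
def detect_fields(item: dict):
    # A real version would run the detector on the posted page image;
    # this stub just echoes to show the endpoint shape.
    return {"fields": [], "page": item.get("page", 0)}
```

Deployed with `modal deploy`, the endpoint bills only for active containers, which is what makes low-volume hosting "basically free."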

24.09.2025 17:51 — 👍 0    🔁 0    💬 0    📌 0
GitHub - jbarrow/commonforms: CommonForms dataset and models

As part of the paper, I'm working on releasing the dataset and FFDNet models on HuggingFace.

Those will be out in the coming days; you can follow along here: github.com/jbarrow/comm...

🤗 Paper: huggingface.co/papers/2509....
arXiv: arxiv.org/abs/2509.16506

24.09.2025 17:51 — 👍 1    🔁 0    💬 1    📌 0

Now, just because we filtered for the cleanest forms doesn't mean we got _perfect_ forms. There are still a lot of inconsistencies in how people prepare forms! In future work I'll be looking at mitigating data quality issues like these.

24.09.2025 17:51 — 👍 0    🔁 0    💬 1    📌 0

(Note: this doesn't _just_ apply to Acrobat, it's also better than Apple Preview. Neither Acrobat nor Preview even makes an attempt at checkboxes, and both are often fooled by any straight, horizontal line. Left: Acrobat; right: FFDNet.)

24.09.2025 17:51 — 👍 0    🔁 0    💬 1    📌 0

If we train object detectors to find the form fields on these pages, we get a much cleaner set of forms than if we used Acrobat to automatically prepare the form. (Left: Acrobat; right: FFDNet.)
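Training such a detector is a fairly commodity workflow these days. An illustrative fine-tune with ultralytics YOLO; the dataset config name is hypothetical, and the paper's actual FFDNet recipe may differ:

```python
# Illustrative only: fine-tune an off-the-shelf detector on form-field
# boxes (text fields, checkboxes, ...). Not the paper's exact recipe.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")    # pretrained COCO checkpoint
model.train(
    data="commonforms.yaml",  # hypothetical config: page images + field-class labels
    epochs=100,
    imgsz=1024,               # dense form pages benefit from higher resolution
)
metrics = model.val()         # detection AP on held-out pages
```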

24.09.2025 17:51 — 👍 0    🔁 0    💬 1    📌 0

Step 1 is to filter for the cleanest forms possible. We start with 8MM PDFs from Common Crawl and work our way down to ~60k of the cleanest forms we can find. The result is a ~500k-page dataset, called CommonForms.
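One plausible filtering signal, sketched here with pypdf as my illustration rather than the paper's exact pipeline: keep only PDFs that already ship with interactive form fields, since those widget annotations can double as training labels:

```python
# Keep only PDFs that already contain interactive form fields
# (an AcroForm dictionary). Illustration only; the real pipeline
# presumably applies many more quality filters than this.
from pypdf import PdfReader

def has_form_fields(path: str) -> bool:
    reader = PdfReader(path)
    fields = reader.get_fields()  # None when the PDF has no AcroForm
    return bool(fields)
```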

24.09.2025 17:51 — 👍 0    🔁 0    💬 1    📌 0

Paper thread of some work I'm *incredibly* proud of: my first single-author paper!

Converting a PDF to a fillable form is a hard problem, and a lot of solutions don't work very well! In CommonForms, I show that you can train models that outperform Adobe Acrobat for <$500! 🧵

24.09.2025 17:51 — 👍 1    🔁 0    💬 1    📌 0

Yeah, I wonder if that statistic is flipped between the cities (though operated by the same provider — Lyft — I assume?)

No way that 99 out of every 100 riders in Boston have visited more than 27 stations?

31.05.2025 05:07 — 👍 3    🔁 0    💬 0    📌 0

Pretty sure you want that number to be lower. :p (my stats for DC ridership)

31.05.2025 05:04 — 👍 2    🔁 0    💬 1    📌 0
Horseshoes (and Hand Grenades) - LLM Localization is not Close, but not Close Enough - Joe Barrow. TL;DR: Large Multimodal Models (LMMs) can now output bounding boxes when given images as inputs. The results are impressive, but for documents they aren't good enough for real-world use, yet.

In which I argue that LLM-generated bounding boxes are impressive, but not that useful (yet): notes.penpusher.app/Misc/Horsesh...

23.04.2025 17:33 — 👍 1    🔁 0    💬 0    📌 0

Would absolutely love that!

11.03.2025 12:28 — 👍 2    🔁 0    💬 0    📌 0

"AI TOPS our stock price"
- Nvidia, today

07.01.2025 21:35 — 👍 1    🔁 0    💬 0    📌 0

Ah, yes, that ol' familiar unit of measure "AI TOPS"

07.01.2025 20:13 — 👍 4    🔁 0    💬 1    📌 0
Palma as a FreeWrite

Agree; personally, my ideal would be to type into an old, cheap, refurbished Kindle.

Here's a video of a person typing into the Palma: www.reddit.com/r/Onyx_Boox/...

My experience (tablet) is that it's maybe 10s from pickup to writing — wake up (3s), navigate to apps (2s), open app (3-5s)?

22.12.2024 08:56 — 👍 0    🔁 0    💬 1    📌 0

I've got one of the older, larger eInk tablets and use it for reading books/papers and taking notes. After several years, the battery lasts about a week with average use, longer if I keep WiFi off.

21.12.2024 21:29 — 👍 1    🔁 0    💬 1    📌 0

Not necessarily hitting the price point, but there are eInk mini tablets (e.g. the Boox Palma at around $200) that have Android, no SIM (so no phone distractions), and long battery life (thanks to the eInk screen and being generally underpowered). They accept keyboards, too.

21.12.2024 21:27 — 👍 2    🔁 0    💬 1    📌 0
Google Gemini 101 - Object Detection with Vision and Structured Outputs - Joe Barrow - Obsidian Publish. This is a missing manual for how to get a simple working prototype up and running with Gemini's vision mode and structured outputs. I'm confident that manual exists elsewhere, but I haven't been able…

I put together a little guide on getting started with Google Gemini: how to make multimodal calls, get structured outputs, and extract image bounding boxes to build an object detector.

notes.penpusher.app/Misc/Google+...
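The core call looks roughly like this sketch; the model name, prompt wording, and key handling are stand-ins, and the guide has the real walkthrough. Gemini reports boxes as [ymin, xmin, ymax, xmax] normalized to 0-1000:

```python
# Sketch of a Gemini object-detection call with JSON output.
# Model name and prompt are stand-ins for what the guide uses.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

response = model.generate_content(
    [
        Image.open("teapot.jpg"),
        "Detect the teapot and the teacup. Return a JSON list of objects "
        "with 'label' and 'box_2d' as [ymin, xmin, ymax, xmax] in 0-1000.",
    ],
    generation_config={"response_mime_type": "application/json"},
)
print(response.text)  # e.g. [{"label": "Teapot", "box_2d": [...]}, ...]
```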

20.12.2024 09:51 — 👍 3    🔁 0    💬 2    📌 0
Drake meme template.
No to: clear, concise prose
Yes to: negative vspace


18.12.2024 10:37 — 👍 3    🔁 1    💬 0    📌 0

Holy moly that created an extra half page of space!

18.12.2024 10:09 — 👍 2    🔁 0    💬 1    📌 0
A picture of a teapot to the right of a teacup, both in a flat-bottomed basket. The teacup has tea and flowers in it. There are 2 blue bounding boxes on the image, one labeled "Teapot" and one labeled "Teacup" that are over the teapot and teacup respectively.


Gemini 2.0 Flash is pretty good at localization in images, for an LMM (much better than GPT-4o in my experiments).

18.12.2024 09:26 — 👍 1    🔁 0    💬 0    📌 0

ML history question: is there an earlier reference to pixel-only in-context (i.e. no fine-tuning) DocVQA performance than the GPT-4 announcement from OpenAI?

09.12.2024 09:45 — 👍 0    🔁 0    💬 0    📌 0

Aged white tea, the kind that comes in a compressed disk or ball. My favorite kind of tea; imo they taste naturally quite sweet. Yunnan Sourcing has a bunch!

25.11.2024 05:58 — 👍 1    🔁 0    💬 1    📌 0
