There is no best VLM OCR model - rankings can flip completely by document type.
I built ocr-bench: run open OCR models on YOUR documents, get a per-collection leaderboard.
VLM-as-judge with Bradley-Terry Elo ratings, all running on @hf.co. No local GPU needed.
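For anyone curious how pairwise judge verdicts can turn into a leaderboard, here is a minimal sketch of a Bradley-Terry fit mapped onto an Elo-style scale. It is not ocr-bench's actual code: the function names, the model names, the 1500/400 scale constants, and the tiny floor for winless models are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def bradley_terry(judgments, iters=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs using MM updates."""
    models = sorted({m for pair in judgments for m in pair})
    wins = defaultdict(int)   # total wins per model
    games = defaultdict(int)  # comparisons per unordered pair of models
    for winner, loser in judgments:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in models
                if j != i
            )
            # Assumption: a tiny floor keeps winless models at a positive strength.
            updated[i] = max(wins[i], 1e-6) / denom if denom > 0 else strength[i]
        # Rescale so the geometric mean strength is 1 (strengths are only relative).
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        updated = {m: s / math.exp(log_mean) for m, s in updated.items()}
        change = max(abs(updated[m] - strength[m]) for m in models)
        strength = updated
        if change < tol:
            break
    return strength


def to_elo(strength, base=1500.0, scale=400.0):
    """Map strengths to an Elo-like scale (base and scale are conventional choices)."""
    return {m: base + scale * math.log10(s) for m, s in strength.items()}


# Hypothetical judge verdicts: each tuple is (preferred model, other model).
judgments = (
    [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
    + [("model_b", "model_c")] * 3 + [("model_c", "model_b")]
    + [("model_a", "model_c")] * 3 + [("model_c", "model_a")]
)
ratings = to_elo(bradley_terry(judgments))
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Running this separately on each document collection would give one leaderboard per collection, which is where rankings can diverge.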
i'm trying out the novel writing project with Claude in Claude Code, using Pangram to break it out of writing in a clearly identifiable AI-writing style. it's going... interesting so far. i despaired at the beginning but am now cautiously optimistic. not so much at the structural level though.
this is cool
tbh all i want is an LLM that sits atop my Zotero library and lets me talk to it tho
Final CFA for the 8th Scientific Understanding and Representation (SURe) annual workshop, which will take place May 27-29, 2026, at the IFIS PAN in Warsaw.
Submission deadline: 20 January 2026.
More info: shorturl.at/AUoye
@philsci.bsky.social @eenphilsci.bsky.social @epsaphilsci.bsky.social
Analytic philosophy can be distinguished from literary criticism with 90-95% accuracy via syntax alone. Moreover, a classifier trained to separate them in early C20 does better predicting future separations than a C21 one predicts past ones, suggesting philosophy syntax narrows/specializes in ~C21.
OpenAlex integrated into the Web of Science, or the capture of the work of the “commoners” | carnetist.hypotheses.org/2572
Three different ways to represent colo(u)r. Work in progress, inspired by an old post by Kat Zhang / The Poet Engineer.
"there is a part of human intelligence which operates in a continuous generalization of the space of words, and other parts entirely which do things which are less well understood" is a perfectly reasonable position which apparently has no adherents
Excited to share my latest publication, "Generative Aesthetics: On formal stuckness in AI verse." It's published in a special issue in the Journal of Cultural Analytics, expertly edited by Tess McNulty and Laura Chapot, on "Computation and Form, Reconsidered."
culturalanalytics.org/article/1448...
Tomorrow we will have a keynote from Charles Pence (UC Louvain).
Thanks to the Dutch Philosophy Research School (OZSW) for supporting this event, and @mnoichl.bsky.social for organizing this with me!
Gregor Betz (KIT) kicking off our "Data Driven Philosophy" Hackathon in Utrecht with his talk: "Doing Philosophy with and for LLMs". Besides input about the state of research and new directions, we're spending three days kicking off new projects.
i am going to try to give a framework of my own understanding which laypeople can understand.
Updated & turned my Big LLM Architecture Comparison article into a video lecture.
The 11 LLM archs covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5/4.6
www.youtube.com/watch?v=rNlU...
For the first episode of Ping Pong Philosophy I had the absolute pleasure of speaking with Greg Restall, one of the most renowned philosophical logicians and an absolutely great guy to have a chat with. Thank you for your time, Greg, I had a blast.
We are also on Spotify!
Christopher Colón Lugo uses a 3D U-Net to capture patterns in the Game of Life
#DistributedCiphers
#ALIFE2025
#Postdoc at Technische Universität Berlin in digital humanities & history/philosophy/sociology of science #philsci #STS. ERC project investigates digital communication within the ATLAS collaboration at CERN
Deadline: October 13, 2025
www.jobs.tu-berlin.de/en/job-posti...
#PhilJobs
Upshot:
NNES report needing twice as long to read English-language papers and to prepare English presentations. Even among highly proficient NNES (C1–C2 level), ~60% report having avoided asking questions at events due to concerns about their English (compared to 16% of NES). #philsky
How do literary communities actually form?
@maria-lev.bsky.social analyzes the networks of collaboration and aesthetic affinity that are documented through cultural events — e.g. readings, book launches, festivals. These real-world networks often remain invisible in text-based literary history.
In new work with Joseph Rich and Conrad Oakes, we tackle the problem of how best to organize alluvial plots. We formalize two optimization problems and develop a solution for them based on the NeighborNet algorithm, implemented in the program wompwomp: github.com/pachterlab/w...
Had a great time last week at #epsa2025! I've put the poster up here, if anyone wants to take a closer look: maxnoichl.eu/blog/2025/ep...
I’m especially proud of this article I wrote about Gaussian Processes for the Recast blog! 🥳
GPs are super interesting, but it’s not easy to wrap your head around them at first 🤔
This is a medium-level (more intuition than math) introduction to GPs for time series.
getrecast.com/gaussian-pro...
Last year I met a bunch of great researchers who work with high-dimensional data at a Dagstuhl seminar. This week we put out a preprint about the history and philosophy of low-dimensional embedding methods, their applications, their challenges, and their possible future: arxiv.org/abs/2508.15929
Updated edition (August 2025) of the coverage table of the major bibliometric databases (millions of records).
GS reindexing period
"Personally, I found this hyperstimulating," he said exultingly.
@mnoichl.bsky.social and I are organizing two workshops where you can learn about and try out digital methods for philosophy:
12th-13th September in Düsseldorf, Keynotes @cherfeld.bsky.social & Adrian Wüthrich
16th-18th October in Utrecht, Keynotes Gregor Betz & Charles Pence. Register by 31 August.
What are your favorite recent papers on using LMs for annotation (especially in a loop with human annotators), synthetic data for task-specific prediction, active learning, and similar?
Looking for practical methods for settings where human annotations are costly.
A few examples in thread ↴
New preprint! Have you ever tried to cluster text embeddings from different sources, but the clusters just reproduce the sources? Or attempted to retrieve similar documents across multiple languages, and even multilingual embeddings return items in the same language?
Turns out there's an easy fix🧵
I have a certain interest in seeing this position filled well. Apply!
https://stellen.uni-muenster.de/jobposting/aa2e6b033a1691c1c9bccfd7af876d06a24ff1690