I love this!
08.08.2025 07:43 - 👍 3 🔁 1 💬 0 📌 0
@matteodic.bsky.social
Researcher in Corpus Linguistics and Digital Humanities @ UniMoRe. Corpus and Cognitive Linguist, Python & R user. Overall nerd (posts not representative of employers). Website: https://infogrep.it Online materials: https://catlism.github.io
I love this!
08.08.2025 07:43 - 👍 3 🔁 1 💬 0 📌 0
Screenshot of the app showing a page from a book + different views of existing and new OCR.
Many VLM-based OCR models have been released recently. Are they useful for libraries and archives?
I made a quick Space to compare VLM OCR with "traditional" OCR using 11k Scottish exam papers from @natlibscot.bsky.social
huggingface.co/spaces/davanstrien/ocr-time-capsule
rvya redeemed, thanks
26.07.2025 17:10 - 👍 0 🔁 0 💬 0 📌 0
b83m redeemed, thanks a lot
26.07.2025 17:08 - 👍 1 🔁 0 💬 0 📌 0
fsdc redeemed, thanks!
26.07.2025 17:07 - 👍 1 🔁 0 💬 0 📌 0
4-panel comic. (1) [Person 1 with ponytail flanked by person with short hair and another person speaking into microphone at podium] PERSON 1: In the early 2010s, researchers found that many major scientific results couldn't be reproduced. (2) PERSON 1: Over a decade into the replication crisis, we wanted to see if today's studies have become more robust. (3) PERSON 1: Unfortunately, our replication analysis has found exactly the same problems that those 2010s researchers did. (4) [newspaper with image of speakers from previous panels] Headline: Replication Crisis Solved
Replication Crisis
xkcd.com/3117/
I believe it is worth interrogating the fundamental forces re-shaping our information spheres away from liberal democracy towards myth, manipulation and magical thinking empowering autocracy and nihilism.
Here's how it all falls apart: a 🧵 in 6 figures ⬇️
www.protagonist-science.com/p/how-social...
Stuffing ai into everything “isn’t just a forecast, it’s a libidinal fantasy – a capitalist dream of replacing relationships with code and scalable software, while public institutions are gutted in the name of ‘innovation.’”
06.07.2025 14:30 - 👍 180 🔁 51 💬 4 📌 6
"The problem with AI isn't that it can do your job. It can't. The problem with AI is that your MBA-brained boss's boss doesn't know how your job works and thinks AI can do your job at fractions of a penny on the dollar, and hears the siren song of 'maximize shareholder value'."
MBA-brain is real.
Is 😵‍💫 one token or two?
To a human, it's one. To a corpus tool, it's often split (😵 + 💫).
And 𝐨𝐧𝐥𝐢𝐧𝐞 ≠ online.
This preprint shows how emojis & homoglyphs challenge tokenisation and distort linguistic evidence.
arxiv.org/abs/2507.01764
#Emoji #Homoglyphs #CorpusLinguistics #AcademicSky #LangSky
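A minimal Python sketch of the two effects the post describes, using only the standard library. It is not the preprint's method: the helper names are hypothetical, the "split at the joiner" rule only imitates what a naive tokeniser might do, and the Mathematical Bold spelling of "online" is an assumed example of a homoglyph string.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues emoji into one visible glyph


def code_points(text):
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def split_on_zwj(text):
    """Imitate a naive tool that breaks an emoji sequence at the joiner."""
    return [part for part in text.split(ZWJ) if part]


def fold_compatibility(text):
    """Fold compatibility variants (e.g. math-styled letters) with NFKC."""
    return unicodedata.normalize("NFKC", text)


# One visible glyph made of three code points: U+1F635, U+200D, U+1F4AB.
face = "\U0001F635" + ZWJ + "\U0001F4AB"
print(code_points(face))
print(split_on_zwj(face))

# Assumed homoglyph example: "online" spelled with Mathematical Bold letters
# (MATHEMATICAL BOLD SMALL A is U+1D41A).
styled = "".join(chr(0x1D41A + ord(ch) - ord("a")) for ch in "online")
print(styled == "online")                      # a plain string match fails
print(fold_compatibility(styled) == "online")  # matches after NFKC folding
```

NFKC only folds characters that carry a Unicode compatibility mapping. Cross-script look-alikes, such as a Cyrillic "о" inside a Latin word, are left unchanged and would need a confusables table instead.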
wow, many thanks!
02.07.2025 14:11 - 👍 1 🔁 0 💬 0 📌 0
Fellow academics, can anyone help with obtaining an #endorsement on arXiv?
I have a preprint I'd like to upload to Computer Science > Computation and Language (cs.CL), but need someone to endorse my account.
Here's the endorsement link: arxiv.org/auth/endorse...
#corpuslinguistics #linguistics
3jkl redeemed, thanks
24.06.2025 07:08 - 👍 1 🔁 0 💬 1 📌 0
y4ha claimed, thanks!
24.06.2025 07:06 - 👍 1 🔁 0 💬 1 📌 0
Memes can serve as strong indicators of coming mass violence
15.06.2025 18:22 - 👍 2 🔁 1 💬 0 📌 1
Finally, a Replacement for BERT (Blog about ModernBert)
huggingface.co/blog/modernb...
The scientific program is out!
Click on the link below to have a look at the speakers and the workshops of our Summer School!
⬇️⬇️⬇️
www.summerschooldigitalhumanities.unimore.it/2025-edition...
Our Summer School is beginning now with the institutional greetings.
03.06.2025 07:32 - 👍 0 🔁 1 💬 0 📌 0
Happy Graduation to all students and staff!
29.05.2025 10:16 - 👍 2 🔁 0 💬 1 📌 0
Postdoc position open in Zurich -- Prof. Martin Tomasik and I have a joint SNF project on interpretable neural network approaches for large scale, complex item / temporal structure, online learning / cognitive development data.
Please retweet.
tinyurl.com/PostdocGNNSNF
xsds redeemed.
now I just need to finish packing and I'm ready for the woods!
fy8n redeemed, perfect scent!
16.05.2025 16:26 - 👍 0 🔁 0 💬 0 📌 0
3eyg redeemed.
the healing begins
⏹️ Ending the experiment too early
🎯 Running experiments until you get a hit
🍒 Cherry-picking your results
🔧 Tweaking your data
❌ Not adjusting for multiple comparisons
www.nature.com/articles/d41...
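To make the last item in the list above concrete, here is a rough Python sketch of two standard corrections for multiple comparisons, Bonferroni and Holm. The p-values are made up for illustration and the function names are hypothetical. Nothing here is taken from the linked article.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: test the sorted p-values against alpha / (m - rank)
    and stop at the first one that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Hypothetical p-values from five tests run on the same data set.
p_values = [0.001, 0.012, 0.03, 0.04, 0.20]

uncorrected = [p <= 0.05 for p in p_values]
print(sum(uncorrected), "of 5 look significant without correction")
print(sum(bonferroni(p_values)), "survive Bonferroni")
print(sum(holm(p_values)), "survive Holm")
```

Both procedures control the family-wise error rate. Holm never rejects fewer hypotheses than Bonferroni, which is why it is often the preferred default of the two.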
Academics and Universities have got to formulate a coherent approach to AI and guide their students. Cries of despair and digitally illiterate pronouncements will not reverse the effects of technical innovations.
09.05.2025 06:22 - 👍 8 🔁 2 💬 0 📌 0
Designing a Good Research Practice Roadmap
- a presentation by Fiona Ramage @fionar.bsky.social
youtu.be/dnzRQPOxz1o?...
The event was organised by Edinburgh ReproducibiliTea
With Canada's election just days away, the continued appearance of deepfake ads reveals a serious flaw in Meta's ad-review system. If fraudsters can repeatedly bypass detection, it suggests the platform's current safeguards are not equipped to catch even basic forms of manipulation...
26.04.2025 12:27 - 👍 20 🔁 13 💬 2 📌 0
"Twitter became 4chan, then the 4chanified Twitter became the United States government. Its usefulness as an ammo dump in the culture war was diminished when they were saying things you would now hear every day on Twitter," @bencollins.bsky.social told WIRED.
22.04.2025 15:30 - 👍 464 🔁 114 💬 5 📌 4
Delighted to share my newly published review of "Data Analytics for Discourse Analysis with Python" (Tay 2024). The work addresses urgent, discipline-wide concerns in #linguistics, and writing about it was both a privilege and a joy.
authors.elsevier.com/a/1kyHu1L-nh...