Kohei Watanabe

@koheiw.bsky.social

I analyze textual data for a living and for fun using R packages that I develop. Visit https://blog.koheiw.net for more details.

61 Followers  |  16 Following  |  4 Posts  |  Joined: 12.11.2024

Latest posts by koheiw.bsky.social on Bluesky

Align word vectors of multiple Word2vec models
I have been developing a new R package called wordvector since last year. I started it as a fork of the Word2vec package but made several important changes to make it fully compatible with quanteda…

I released the wordvector package v0.5.0. It is rapidly improving and diverging from the original Word2vec package. Please read "Align word vectors of multiple Word2vec models" to learn about the new function: blog.koheiw.net?p=2299 #rstats #quanteda

24.05.2025 02:50 — 👍 2    🔁 0    💬 0    📌 0

Nice paper showing just *how* irreproducible research with proprietary generative LLMs is. Luckily, there are open-source alternatives (and they are very easy to use too!)

19.12.2024 07:23 — 👍 22    🔁 9    💬 1    📌 0
A new topic model for analyzing imbalanced corpora
I have been developing and testing a new topic model called Distributed Asymmetric Allocation (DAA) because latent Dirichlet allocation (LDA) takes a long time to fit to a large corpus, but does no…

You tried seededLDA already, but its recent version can capture less frequent topics better with adjusted Dirichlet priors. I am curious how it works. Please read blog.koheiw.net?p=2233

19.12.2024 00:30 — 👍 1    🔁 0    💬 1    📌 0

A few days ago, I received an email from a researcher asking if text analysis is becoming irrelevant because of AI... blog.koheiw.net?p=2254 #text-as-data #quanteda

13.12.2024 00:16 — 👍 1    🔁 0    💬 0    📌 0
A new topic model for analyzing imbalanced corpora
I have been developing and testing a new topic model called Distributed Asymmetric Allocation (DAA) because latent Dirichlet allocation (LDA) takes a long time to fit to a large corpus but do…

If you think the number of topics, k, is the only important parameter for topic models, you need to read this post and the research paper: blog.koheiw.net?p=2233 I created a new model that optimizes the Dirichlet priors to analyze imbalanced corpora more accurately. #rstats #quanteda

23.11.2024 10:02 — 👍 12    🔁 2    💬 0    📌 0
