Aaron Tay

@aarontay.bsky.social

I'm a librarian + blogger from Singapore Management University. Social media, bibliometrics, analytics, academic discovery tech.

3,183 Followers  |  327 Following  |  1,972 Posts  |  Joined: 05.07.2023  |  1.8846

Latest posts by aarontay.bsky.social on Bluesky

Implications of AI-powered academic search - Recorded Talk Quick catch-up to what I have been writing and thinking about

[Blogged] Implications of AI powered academic search open.substack.com/pub/aarontay...

15.10.2025 17:00 - 👍 1    🔁 0    💬 0    📌 0

Hmm

15.10.2025 12:49 - 👍 2    🔁 0    💬 0    📌 0

We get copilot and gemini 2.5 pro. Libraries in university

15.10.2025 12:47 - 👍 0    🔁 0    💬 0    📌 0

Yes.

14.10.2025 14:44 - 👍 0    🔁 0    💬 0    📌 0
Wiley Launches Interoperable Platform to Power Scientific Discovery in World's Leading AI Technologies Wiley AI Gateway brings trusted scholarly content directly into Anthropic's Claude, AWS Marketplace, Mistral AI's Le Chat, and Perplexity Wiley (NYSE: WLY), a global leader in authoritative content an...

"Unlike closed ecosystems that require researchers to adopt proprietary tools, Wiley AI Gateway prioritizes intentional interoperability, seamlessly integrating scholarly content and data subscriptions with today's leading AI platforms." newsroom.wiley.com/press-releas...

14.10.2025 13:34 - 👍 0    🔁 1    💬 0    📌 1

Because of this, you can be assured the views in my blog post are truly mine. That said, I generally try not to be too negative, so if a product doesn't meet my satisfaction, I generally don't mention it.

14.10.2025 13:33 - 👍 1    🔁 0    💬 0    📌 0

Just to clarify (since it comes up now and then): I don't accept compensation or sponsorships for product mentions on my blog. If I write about something, it's because I genuinely find it interesting or valuable for my readers. I value my independence too much to do otherwise.

14.10.2025 13:27 - 👍 5    🔁 0    💬 1    📌 0

Wild guess: besides query expansion with Boolean, it also does NER, so if you type in natural language "find me BOOK with TITLE XYZ" it will automatically turn on title search and filter to book?

14.10.2025 12:11 - 👍 1    🔁 0    💬 0    📌 0
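
The guess above can be made concrete with a toy sketch. This is not how any real discovery product works: the regex rule, the `RESOURCE_TYPES` set and the `parse_query` function are all hypothetical, and a hand-written pattern stands in for a trained NER model. It only shows the idea of turning a natural language request into a search field, a resource type filter and the remaining terms.

```python
import re

# Hypothetical list of resource types the toy parser recognises.
RESOURCE_TYPES = {"book", "article", "thesis"}

def parse_query(text):
    """Return (search field, resource type filter, remaining terms)."""
    field, rtype = "any", None
    # Treat everything after "title" (optionally "with title") as the title.
    m = re.search(r"\b(?:with\s+)?title\s+(.+)$", text, flags=re.IGNORECASE)
    if m:
        field, terms = "title", m.group(1).strip()
        head = text[:m.start()]
    else:
        terms, head = text.strip(), text
    # Look for a resource type word in the part before the title.
    for word in re.findall(r"[a-z]+", head.lower()):
        if word in RESOURCE_TYPES:
            rtype = word
            break
    return field, rtype, terms

known_item = parse_query("find me BOOK with TITLE Thinking Fast and Slow")
subject = parse_query("open access citation advantage")
```

Here `known_item` switches to a title search filtered to books, while `subject` stays an ordinary keyword search. A rule this naive would misfire on a subject query that happens to contain the word "title", which is one reason a real system would need something more careful.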

Found this on mailing list. Still hard to visualize what's going on

14.10.2025 11:29 - 👍 5    🔁 2    💬 1    📌 0

To be more exact, everyone will ingest the free sets like OpenAlex and Semantic Scholar, which are broad but lack depth, as they lack full text and increasingly abstracts. Competition will be around the rest...

14.10.2025 11:26 - 👍 1    🔁 0    💬 0    📌 0

Oh. Yeah then might not be comparable. But even then how did you get Google Scholar figure?

14.10.2025 04:14 - 👍 1    🔁 0    💬 1    📌 0

I suppose you could do complicated models to predict % chances peer reviewer might accept... then model probably ends up recommending junior peeps?

14.10.2025 04:12 - 👍 0    🔁 0    💬 1    📌 0

I am just wildly speculating. I don't understand MCP that well, and even less the dynamics between discovery service providers and content providers. I assume most of the free open stuff, e.g. Crossref/OpenAlex, will be locally indexed and won't be used via MCP in most academic discovery tools.

13.10.2025 15:58 - 👍 0    🔁 0    💬 0    📌 0

For others who don't have such big indexes, I think chances are slim that content providers will allow them to ingest content into their systems, as this risks losing control (a much greater threat in the 2020s vs the 2010s). MCP might be a way around it (3)

13.10.2025 15:48 - 👍 3    🔁 0    💬 2    📌 1

As of 2025, academic "AI discovery" is essentially RAG over localised central indexes. The question is whether the existing central indexes, e.g. Ex Libris CDI, EDS, will secure agreement from content providers to allow RAG over their content. Already Elsevier + others have opted out of Summon/Primo RA (2)

13.10.2025 15:46 - 👍 4    🔁 2    💬 1    📌 0

Looking at MCP (Model Context Protocol) again. It occurs to me this is kind of a repeat of the 2010s academic central index vs federated real-time search debate, with the former camp led by Summon winning out. This time though I think we are unlikely to see a repeat (1)

13.10.2025 15:39 - 👍 3    🔁 0    💬 1    📌 0
How Prophy Matches Manuscripts to Expert Reviewers: The Core Recommendation Engine Discover how Prophy's core recommendation engine rapidly matches manuscripts to expert reviewers using advanced semantic analysis and a database of 87M researchers.

blog.prophy.ai/how-prophy-m... interesting

13.10.2025 15:02 - 👍 1    🔁 1    💬 1    📌 0
Get it in, track it down, follow up Three essential moves to using AI for verification and contextualization, with a bit of LLM-specific guidance

People always ask me "What does SIFT for AI look like?" Meaning, what is the minimal set of habits you need to teach students to use it effectively for exploration of claims online? It's taken a couple years to get here but this is a start at a response mikecaulfield.substack.com/p/get-it-in-...

13.10.2025 02:27 - 👍 23    🔁 6    💬 0    📌 1

Any more details? Eg. Methods of estimation etc

13.10.2025 12:02 - 👍 1    🔁 0    💬 1    📌 0

Our periodic review of the coverage of the major bibliographic databases (October 2025)
GS no longer the largest due to the huge increase of OpenAlex. New data for Xueshu

13.10.2025 07:33 - 👍 20    🔁 15    💬 2    📌 2

Note that this analysis applies to the data before the latest OpenAlex "Walden" rebuild update, which is still in beta. A quick check shows quite different results.

11.10.2025 10:19 - 👍 5    🔁 3    💬 1    📌 0

As I learn more about the nuts and bolts of IR, e.g. HNSW, IVF/PQ, it's interesting, but for most end users it isn't useful, except maybe it helps you understand why it's somewhat tricky to implement prefilter + dense embeddings, particularly if the system isn't set up for it initially. (5)

11.10.2025 09:37 - 👍 0    🔁 0    💬 0    📌 0
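
A minimal sketch of the prefilter point, under loud assumptions: the corpus, the `docs` list and both search functions are made up, and exact brute-force cosine ranking stands in for an ANN index such as HNSW. It contrasts filtering before ranking with filtering after taking the top k.

```python
import math

# Hypothetical toy corpus: (doc id, document type, embedding).
docs = [
    ("d1", "article", [0.9, 0.1]),
    ("d2", "article", [0.8, 0.2]),
    ("d3", "book",    [0.7, 0.3]),
    ("d4", "article", [0.6, 0.4]),
    ("d5", "book",    [0.1, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, candidates, k):
    """Exact nearest neighbours; a stand-in for an ANN index lookup."""
    ranked = sorted(candidates, key=lambda d: cosine(query, d[2]), reverse=True)
    return ranked[:k]

def prefilter_search(query, doc_type, k):
    # Restrict to matching documents first, then rank only those.
    allowed = [d for d in docs if d[1] == doc_type]
    return [d[0] for d in top_k(query, allowed, k)]

def postfilter_search(query, doc_type, k):
    # Rank everything, keep the top k, then drop non-matching documents.
    return [d[0] for d in top_k(query, docs, k) if d[1] == doc_type]

query = [1.0, 0.0]
pre = prefilter_search(query, "book", k=2)
post = postfilter_search(query, "book", k=2)
```

With this data the two nearest documents overall are both articles, so `post` comes back empty while `pre` still returns the two books. Brute force makes prefiltering trivial; the tricky part alluded to above is that a prebuilt graph or cluster index is organised around the whole collection, not around an arbitrary filtered subset.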

It also makes a subtle distinction between a sparse vector and a sparse "representation". A sparse vector is as you'd expect: most values are zero and it is usually high dimensional. The sparse representation, according to the book, refers to the way you store the vector, e.g. inverted index/COO/CSR formats. (4)

11.10.2025 09:34 - 👍 1    🔁 0    💬 1    📌 0

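
To make the vector vs representation distinction concrete, here is a small pure-Python sketch. The toy `dense_rows` data and the three helper functions are illustrative assumptions, not taken from the book: the same sparse vectors are stored three ways, as COO triples, as CSR arrays and as an inverted index.

```python
from collections import defaultdict

# Three toy document vectors over a 6-term vocabulary. Most entries are
# zero, which is what makes them sparse *vectors*.
dense_rows = [
    [0.0, 2.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 4.0, 0.0],
]

def to_coo(rows):
    """COO: one (row, column, value) triple per non-zero entry."""
    return [(r, c, v)
            for r, row in enumerate(rows)
            for c, v in enumerate(row) if v != 0.0]

def to_csr(rows):
    """CSR: non-zero values, their column indices, and row pointers."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for c, v in enumerate(row):
            if v != 0.0:
                data.append(v)
                indices.append(c)
        indptr.append(len(data))  # where the next row starts in data
    return data, indices, indptr

def to_inverted_index(rows):
    """Inverted index: term id -> list of (doc id, weight) postings."""
    index = defaultdict(list)
    for doc_id, row in enumerate(rows):
        for term_id, weight in enumerate(row):
            if weight != 0.0:
                index[term_id].append((doc_id, weight))
    return dict(index)

coo = to_coo(dense_rows)
csr = to_csr(dense_rows)
inverted = to_inverted_index(dense_rows)
```

All three hold the same five non-zero values and skip the thirteen zeros. COO and CSR are organised by row (document), while the inverted index is organised by column (term), which is the layout that suits lexical lookup by query term.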
Also a very nice way of decomposing user intent such that system needs (a) content understanding (b) domain understanding and (c) user understanding (3)

11.10.2025 09:26 - 👍 1    🔁 1    💬 1    📌 0

For example I was always somewhat confused when it comes to search vs recommendations but the book frames it as a spectrum which is very nice way to look at it (2)

11.10.2025 09:22 - 👍 1    🔁 1    💬 1    📌 0

Finished the first 3 chapters on lexical search and the last 3 on LLM embeddings + RAG. Mostly covering things I knew, but I like some of the overall conceptual framework (1)

11.10.2025 09:20 - 👍 8    🔁 0    💬 1    📌 0

Next piece. Things I still don't quite fully grasp about the topic.

11.10.2025 09:16 - 👍 1    🔁 0    💬 0    📌 0

Really curious about the new natural language search in Primo NDE (not Primo Research Assistant). Hopefully they account for the fact that a large proportion of queries in Primo are known-item searches, not subject searches

10.10.2025 15:18 - 👍 5    🔁 0    💬 1    📌 1

It's ironic to see 2025 publications talking about academic ai search engines saying things like Elicit uses GPT3 and Undermind.ai uses arxiv. (Might want to check if there are more updated sources).

08.10.2025 14:49 - 👍 2    🔁 0    💬 0    📌 0

Sorry. All virtual seats for Mike's session are now over. But we still have seats for other events in this series. eventregistration.smu.edu.sg/event/TTT202...

07.10.2025 09:13 - 👍 1    🔁 1    💬 0    📌 0
