I must also add that I'm assuming there's no breakthrough architecture/pre-training/post-training method that pushes us to start everything from scratch. I'm simply asking about the decision factors in greenlighting such a full restart under the current status quo.
07.01.2025 02:47
Are there any good pointers on when/why one would decide to run pre-training from scratch (and follow it with post-training ofc) to create a fresh LLM? Is it simply about shifting the knowledge cutoff or more than that? Do we know how/if that happens nowadays? What are the deciding factors?
07.01.2025 02:40
i was annoyed that many of my chrome tabs with PDF papers had uninformative titles, so i created a small chrome extension to fix it.
i've been using it for a while now, works well.
today i put it on github. enjoy.
github.com/yoavg/pdf-ta...
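(Not the actual implementation; the repo is the source of truth. But here is a minimal sketch of the core idea, assuming the pdfjs-dist package: read the PDF's Title metadata, fall back to the first text on page 1, and let a content script assign the result to document.title. The helper name guessTitle and the fallback heuristic are illustrative.)

```typescript
// Sketch only: the core "find a good title for a PDF" logic such an
// extension needs. Assumes the pdfjs-dist package; guessTitle and the
// fallback heuristic are illustrative, not the linked repo's actual code.
import { getDocument } from "pdfjs-dist";

async function guessTitle(data: Uint8Array): Promise<string | null> {
  const doc = await getDocument({ data }).promise;

  // Best case: the PDF's document-info dictionary carries a Title field.
  const { info } = await doc.getMetadata();
  const metaTitle = (info as { Title?: string }).Title?.trim();
  if (metaTitle) return metaTitle;

  // Crude fallback: the first chunk of text on page 1 is often the paper
  // title, since many papers never fill in proper metadata.
  const page = await doc.getPage(1);
  const content = await page.getTextContent();
  const firstLine = content.items
    .slice(0, 10)
    .map((item) => ("str" in item ? item.str : ""))
    .join(" ")
    .trim();
  return firstLine || null;
}

// In the extension, a content script running on PDF URLs would then do
// something like: document.title = (await guessTitle(bytes)) ?? document.title;
```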
05.01.2025 22:22
Given how bad I am at it, it's out of my league too; still fun though :)
06.12.2024 05:54
Were you doing the NYT's crossword? That's how it happened for me. Also, if you want a bonus one, "doe" :)
05.12.2024 01:33
f′ as in fine-tuned from f, not the derivative of f :)
03.12.2024 04:53
I got confused there too. Maybe something like "further condition the model's output" (instead of "update the model")?
So if the model is f(x), before the dashed line it's f′(x), and after that it's f(x|prompt/context).
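(In other words, and this is my formalization of the distinction rather than anything from the original figure: fine-tuning produces new parameters and hence a new function f′, while prompting keeps the parameters fixed and only conditions the output. Sketch below, assuming gradient-based fine-tuning with loss L and learning rate η.)

```latex
% A sketch of the distinction, assuming gradient-based fine-tuning with
% loss L and learning rate eta; the notation is mine, not the figure's.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Fine-tuning updates the weights, yielding a genuinely new function:
\[
  f'(x) = f_{\theta'}(x), \qquad
  \theta' = \theta - \eta \, \nabla_\theta \mathcal{L}(\theta) .
\]
Prompting leaves $\theta$ untouched and only conditions the output:
\[
  f(x \mid \text{prompt}) = f_\theta\bigl([\text{prompt};\, x]\bigr) .
\]
\end{document}
```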
03.12.2024 04:49
USC NLP folks are on Bluesky!
Follow my amazing colleagues here
go.bsky.app/KUwSZ6W
12.11.2024 17:44