How do I get Bluesky to show me less politics and more AI/ML things? I've mostly followed people who work in AI/ML.
09.03.2025 11:12

@scottcondron.bsky.social
Working at wandb on Weave, helping teams ship AI applications
Maybe they could tell you what they’ve learned, like “It seems you’re interested in staying up to date with recommender systems. Want to add that to your feed?”
09.03.2025 10:31

Thanks Scott! Very exciting
09.03.2025 09:34

Prompts within a complex system are brittle
I've seen some teams succeed by replacing prompts with smaller, more deterministic components and improving reliability with fine-tuning. Has anyone else had success with this approach?
Seems to help a lot with agents
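One way to picture "smaller, more deterministic components": handle the cases you can with plain code and only fall back to a model for the long tail. This is a minimal sketch assuming a support-ticket router; `call_llm`, `RULES`, `route` and the labels are all hypothetical, and the LLM is stubbed out so the example runs offline.

```python
import re
from typing import Callable


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; always answers "other" here."""
    return "other"


# Deterministic rules are tried first: cheap, reproducible and unit-testable.
RULES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(refund|money back)\b", re.IGNORECASE), "billing"),
    (re.compile(r"\b(password|log ?in|2fa)\b", re.IGNORECASE), "account"),
    (re.compile(r"\b(crash|error|bug)\b", re.IGNORECASE), "technical"),
]


def route(message: str, llm: Callable[[str], str] = call_llm) -> str:
    """Route a message with regex rules, falling back to the LLM only if no rule matches."""
    for pattern, label in RULES:
        if pattern.search(message):
            return label
    prompt = f"Classify this support message as billing, account, technical or other:\n{message}"
    return llm(prompt)


print(route("I want my money back"))
print(route("The app shows an error"))
print(route("What are your office hours?"))  # no rule matches, so the stub LLM answers
```

Each rule can be tested on its own, which is a large part of the reliability gain; the prompt only ever sees the inputs the rules could not handle.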
I collected some folk knowledge for RL and stuck it in my lecture slides a couple of weeks back: web.mit.edu/6.7920/www/l... See Appendix B. Sorry, I know, the appendix of a lecture slide deck isn't the best place for discovery. Suggestions very welcome.
27.11.2024 13:36

If you’re taking time to enjoy your family and not building with LLMs, you’re ngmi.
America is cooked
LLM app dev broke our comparison tools because tiny diffs can cause large changes in behaviour.
At wandb, we've spent years thinking about experiment comparison. We've added new tools for comparing everything that matters in LLM app dev: code, prompts, models, configs, outputs, eval metrics, eval predictions, eval scores…
wandb.me/weave
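A minimal sketch of the kind of side-by-side comparison the post describes: a text diff of two prompt versions next to the change in their eval metrics. It uses Python's standard `difflib`; the prompts, scores and helper names are made-up placeholders, not W&B data, and this is not how Weave implements comparison.

```python
import difflib

# Two hypothetical prompt versions that differ by a single word.
prompt_v1 = "You are a helpful assistant. Answer briefly.\nCite your sources."
prompt_v2 = "You are a helpful assistant. Answer thoroughly.\nCite your sources."

# Made-up eval results for each version, for illustration only.
scores_v1 = {"accuracy": 0.82, "avg_tokens": 45.0}
scores_v2 = {"accuracy": 0.79, "avg_tokens": 210.0}


def prompt_diff(old: str, new: str) -> list[str]:
    """Return unified-diff lines between two prompt versions."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), fromfile="v1", tofile="v2", lineterm=""
    ))


def metric_deltas(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Return new minus old for every metric present in both runs, in sorted key order."""
    return {k: round(new[k] - old[k], 4) for k in sorted(old.keys() & new.keys())}


for line in prompt_diff(prompt_v1, prompt_v2):
    print(line)
print(metric_deltas(scores_v1, scores_v2))
```

The point of pairing the two views is that a one-word prompt change is easy to miss in a diff but can move every metric at once.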
The art of referring to model behaviour with tasteful non-person metaphors. Say “stochastic” and you’re in one camp; say “emergent” and you’re in another.
It’s a minefield out there, people
People ask for an iOS app, but maybe we shouldn’t build one, as it would cause more misery on the go
23.11.2024 10:03

Being logged into wandb on your phone is a recipe for misery
20.11.2024 04:09

Would be happy to schedule a chat to hear more about your experience with W&B
23.11.2024 09:48

Hey, sorry to hear about your issues with wandb. Have you seen the long response in that issue laying out the options? Tables is built on Parquet, so this is difficult from an architectural perspective. With the recent release of Weave, there may be a path forward using the Weave backend instead of Parquet…
23.11.2024 09:48

Agreed, fellow competitor.
It’s the biggest hurdle I see for teams trying to build GenAI features
We need tools that lower the barrier to entry: LLM judges, existing benchmarks, manual annotation for collecting evals, synthetic data… anything else?
I think these small models aren’t for day-to-day use; instead, they’re for B2C applications of LLMs, where anything larger is cost- or latency-prohibitive
22.11.2024 08:57

- it really works to teach an LLM about your tool, thank you long context!
Link for the curious:
github.com/wandb/weave/...
- it's much better for scraping if the links included are .md files
- you need to be clear which files to include and which are optional because context blows up quickly
- automating the creation of your docs' llms.txt is pretty easy
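Automating the llms.txt can be as small as the sketch below. It assumes the layout described at llmstxt.org: an H1 with the project name, a blockquote summary, H2 sections of `- [title](url): notes` links, and an `## Optional` section for pages an LLM can skip. `DocPage`, `build_llms_txt` and the example.com URLs are hypothetical; in practice you would build the page list by walking your docs folder.

```python
from dataclasses import dataclass


@dataclass
class DocPage:
    """One docs page to list in llms.txt (hypothetical structure)."""
    title: str
    url: str                # ideally a link to the raw .md version of the page
    summary: str = ""
    optional: bool = False  # secondary pages are grouped under "## Optional"


def _link(page: DocPage) -> str:
    note = f": {page.summary}" if page.summary else ""
    return f"- [{page.title}]({page.url}){note}"


def build_llms_txt(project: str, summary: str, sections: dict[str, list[DocPage]]) -> str:
    """Render an llms.txt: H1 title, blockquote summary, H2 link sections, then Optional."""
    lines = [f"# {project}", "", f"> {summary}", ""]
    optional_pages: list[DocPage] = []
    for section, pages in sections.items():
        required = [p for p in pages if not p.optional]
        optional_pages += [p for p in pages if p.optional]
        if required:
            lines += [f"## {section}", ""] + [_link(p) for p in required] + [""]
    if optional_pages:
        lines += ["## Optional", ""] + [_link(p) for p in optional_pages] + [""]
    return "\n".join(lines)


llms_txt = build_llms_txt(
    "MyTool",
    "A hypothetical tool used to illustrate the format.",
    {
        "Docs": [
            DocPage("Quickstart", "https://example.com/docs/quickstart.md", "Install and first run"),
            DocPage("Changelog", "https://example.com/changelog.md", optional=True),
        ]
    },
)
print(llms_txt)
```

Marking pages as optional up front is what lets a downstream tool decide how much context to spend.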
Lessons from creating an llms.txt file
An llms.txt file is a way to tell an LLM about your website. In the .txt file, you include links to other files where it can learn more.
- the llms.txt file isn't the file you send to an LLM; you use it to generate the .md file that you do send
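Going the other way, the sketch below turns an llms.txt into the single .md context file you would actually hand to an LLM: it follows each link, inlines the page under its title, and skips `## Optional` unless asked, since context blows up quickly. `expand_llms_txt` is a hypothetical helper, the pages and URLs are placeholders, and `fetch` is injected so the example runs without network access.

```python
import re
from typing import Callable

# Matches list items of the form "- [title](url)" with optional ": notes" after.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def expand_llms_txt(llms_txt: str, fetch: Callable[[str], str],
                    include_optional: bool = False) -> str:
    """Inline every linked file from an llms.txt into one markdown context document."""
    header: list[str] = []
    docs: list[str] = []
    section = None
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        if section is None:          # title and summary before the first H2
            header.append(line)
            continue
        if section == "Optional" and not include_optional:
            continue
        match = LINK.match(line)
        if match:
            body = fetch(match["url"]).strip()
            docs.append(f"## {match['title']}\n\n{body}")
    return "\n\n".join(["\n".join(header).strip(), *docs]) + "\n"


# Placeholder pages standing in for the real .md files behind each link.
pages = {
    "https://example.com/docs/quickstart.md": "Install with pip, then run the demo.",
    "https://example.com/changelog.md": "v1.0: first release.",
}
llms_txt = """# MyTool

> A tool.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): Install and first run

## Optional

- [Changelog](https://example.com/changelog.md)
"""
context = expand_llms_txt(llms_txt, fetch=pages.__getitem__)
print(context)
```

For real docs, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode()`. Linking to .md files keeps the inlined text clean without any HTML scraping.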
from @hamel.bsky.social’s hamel.dev/blog/posts/llm…
We're building LLM / Human "scorers" in @weightsbiases.bsky.social to have the same data model for this reason
Your human and LLM judges should follow the same criteria.
Then you can move from manual to automated evaluation once the LLM and human judges reach inter-annotator agreement. You can now iterate faster, and the annotator can focus on finding edge cases!
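Inter-annotator agreement is often measured with Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each judge's label frequencies. This is a minimal sketch assuming the human and the LLM judge label the same items with the same pass/fail rubric; the labels are invented, and the 0.8 hand-over threshold is an arbitrary illustrative choice, not a standard.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two judges labelling the same items with the same rubric."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label for every item
    return (observed - expected) / (1 - expected)


# Invented labels from a human annotator and an LLM judge on the same 8 outputs.
human = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
llm = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(human, llm)
AGREEMENT_THRESHOLD = 0.8  # illustrative assumption, not a standard cut-off
ready_to_automate = kappa >= AGREEMENT_THRESHOLD
print(f"kappa={kappa:.3f}, ready_to_automate={ready_to_automate}")
```

Here the judges agree on 6 of 8 items (p_o = 0.75), but chance agreement is already 0.53, so κ ≈ 0.47: raw agreement alone would overstate how interchangeable the two judges are.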
Put glue on pizza
20.11.2024 08:53

The most bizarre AI interview I've ever done was at wandb, when, as usual, I asked a candidate to build an AI classifier in any language/framework of their choice...
And they nonchalantly said "I'll write it in Redstone", to which I almost let loose a chuckle until...
Claude defaults to concise responses when there's high demand: a clever way to smooth peaks
19.11.2024 20:21

We've been working on just that at @weightsbiases.bsky.social with Weave!
Weave is a lightweight LLM tracing and evaluation toolkit that focuses on letting you iterate fast and making sure your production LLM-based application doesn't degrade when you change prompts or models!