
@ysawej.bsky.social

Deep Learning, NLP, multimodal models and all things neural network. Also, math guy and computer scientist.

122 Followers  |  1,243 Following  |  10 Posts  |  Joined: 30.12.2023

Latest posts by ysawej.bsky.social on Bluesky

Post image

A city of drivers expands, relentlessly and inevitably, into sprawl.

A city of pedestrians, cyclists, and transit-users settles, gradually and elegantly, into itself.

06.01.2025 20:57 — 👍 170    🔁 29    💬 3    📌 2
Post image

Still the best two sentences on climate action.

29.12.2024 17:42 — 👍 7591    🔁 1565    💬 50    📌 62
Post image

An illustrated guide to never learning anything

25.12.2024 00:26 — 👍 146    🔁 20    💬 6    📌 3

“The best part of waking up, is your house being quiet and your kids not tearing shit up”

29.11.2024 23:22 — 👍 7    🔁 1    💬 1    📌 0

The next coffee drinker who awakens understands the joy of a hot pot ready and waiting.

29.11.2024 11:34 — 👍 11    🔁 1    💬 0    📌 0

There is no greater joy than everyone in the house being asleep as you drink your morning coffee.

29.11.2024 11:00 — 👍 7305    🔁 442    💬 304    📌 85
Code written with box characters, as used in old software to make fake UIs

You’re still arguing about tabs vs. spaces? May I present…

25.12.2024 18:37 — 👍 5327    🔁 1293    💬 157    📌 149

I didn't embark on machine learning thinking of it as an ideological project to disenfranchise human beings.

But we need to face reality: machine learning can easily become the driver of exactly such change, shifting power structures.

We, the actors of tech, can modulate this effect.
1/3

09.12.2024 23:55 — 👍 32    🔁 7    💬 1    📌 0

📌

21.12.2024 01:42 — 👍 0    🔁 0    💬 0    📌 0
Cartoon by Singer sarcastically showing that people wrongly perceive car-dependent infrastructure as “public investment” (it isn’t) and investment in public transit as a “wasteful subsidy” (it isn’t; it has an excellent return on investment and actually saves public money).

"Public investment" vs "Wasteful Subsidy." The only problem with this clever Singer cartoon is that some people might actually not get that itโ€™s sarcastically illustrating the perception problem, NOT telling the truth. Just in case it actually needs to be said, the truth is the opposite.

18.12.2024 08:21 — 👍 7019    🔁 923    💬 127    📌 33
Preview
Twirling Towards Freedom: Avoiding dynamic programming with dynamic programming

Dynamic programming alternatives to dynamic programming for optimal control. Replace your Bellman equation with backpropagation. www.argmin.net/p/twirling-t...
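
A minimal sketch of that idea (my own illustration, not code from the post; the toy 1-D dynamics, quadratic costs, and step size are assumptions): skip the Bellman recursion and run gradient descent directly on the action sequence, with the gradient computed by a hand-written backward (adjoint) pass, i.e. backpropagation through the unrolled dynamics.

```python
import numpy as np

# Toy optimal control: dynamics x_{t+1} = x_t + u_t,
# cost J = sum_t (x_t^2 + u_t^2) + x_T^2 (all chosen for the demo).
T, x0 = 10, 5.0
u = np.zeros(T)  # open-loop action sequence to optimize

def rollout(u):
    x, cost, xs = x0, 0.0, []
    for t in range(T):
        xs.append(x)
        cost += x**2 + u[t]**2
        x += u[t]                  # step the dynamics forward
    return cost + x**2, xs, x

def grad(u):
    _, xs, xT = rollout(u)
    g = np.zeros(T)
    lam = 2.0 * xT                 # adjoint of the terminal cost
    for t in reversed(range(T)):
        g[t] = 2.0 * u[t] + lam    # dJ/du_t via the chain rule
        lam += 2.0 * xs[t]         # propagate the adjoint one step back
    return g

for _ in range(2000):              # plain gradient descent, no Bellman equation
    u -= 0.01 * grad(u)

print("optimized cost:", rollout(u)[0])
```

The backward loop is exactly what an autodiff framework would generate; for a real problem you would let JAX or PyTorch do it.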

21.11.2024 15:36 — 👍 41    🔁 3    💬 2    📌 0

Bookmark

16.12.2024 17:08 — 👍 1    🔁 0    💬 0    📌 0

Bookmark

16.12.2024 06:33 — 👍 1    🔁 0    💬 0    📌 0

Bookmark

16.12.2024 06:31 — 👍 1    🔁 0    💬 0    📌 0
the gordian knot of finance | special issue: This special issue is hosted by the "Finance and Fiction" dossier of the b2o Review, which is edited by Arne De Boever and Mikkel Krause Frantzen. Volume 6, Issue 1 (December 2024) Special Issue: Th...

boundary2 just published a forum on the gordian knot of finance, where @stefeich.bsky.social, @aminsamman.bsky.social, @thisblue.bsky.social, Janet Roitman, Dick Bryan, and I reflect on the infuriating hold of finance on economic policy (and how to break it)

www.boundary2.org/the-gordian-...

12.12.2024 22:49 — 👍 80    🔁 39    💬 4    📌 6

The tech bro fascination with eugenics is so on brand. The idea that you could make accurate, actionable predictions from individual genotypes (with all their complex, non-linear interactions) using averaged, linearly modeled population-level genomic statistics is just another big data fantasy

12.12.2024 16:13 — 👍 3070    🔁 483    💬 130    📌 101
jacob sansbury
@jsnnsa
having kids in the next 5 years might be a tragic mistake

every smart bio founder/scientist i’ve talked to seems to think embryo editing for things like short sleeper, reduced cancer risk, etc is possible on a near term horizon

imagine having two kids a couple years apart, one is a super human and the other isn’t

“sorry jim, little timmy won’t get cancer, only needs 4 hours of sleep and you’re normal you were just born in the wrong order 🤷🏻”


Tech bros know what every parent wants: children who sleep less

11.12.2024 14:39 — 👍 8650    🔁 1211    💬 515    📌 720

noice!

09.12.2024 10:49 — 👍 14    🔁 5    💬 0    📌 0
Post image

Insane lack of transparency from OpenAI: saying the models people use are different from the "final" evals they released. Do better. Enterprises especially will move elsewhere because of this stuff.

06.12.2024 22:00 — 👍 56    🔁 5    💬 5    📌 0

Zuck is developing a 2GW+ data center.

"Last big AI update of the year:
โ€ขโ  โ Meta AI now has nearly 600M monthly actives
โ€ขโ  โ Releasing Llama 3.3 70B text model that performs similarly to our 405B
โ€ขโ  โ Building 2GW+ data center to train future Llama models
Next stop: Llama 4. Let's go!"

06.12.2024 18:31 — 👍 11    🔁 2    💬 2    📌 0
Old quant types (some base model types require these):
- Q4_0: small, very high quality loss - legacy, prefer using Q3_K_M
- Q4_1: small, substantial quality loss - legacy, prefer using Q3_K_L
- Q5_0: medium, balanced quality - legacy, prefer using Q4_K_M
- Q5_1: medium, low quality loss - legacy, prefer using Q5_K_M

New quant types (recommended):
- Q2_K: smallest, extreme quality loss - not recommended
- Q3_K: alias for Q3_K_M
- Q3_K_S: very small, very high quality loss
- Q3_K_M: very small, very high quality loss
- Q3_K_L: small, substantial quality loss
- Q4_K: alias for Q4_K_M
- Q4_K_S: small, significant quality loss
- Q4_K_M: medium, balanced quality - recommended
- Q5_K: alias for Q5_K_M
- Q5_K_S: large, low quality loss - recommended
- Q5_K_M: large, very low quality loss - recommended
- Q6_K: very large, extremely low quality loss
- Q8_0: very large, extremely low quality loss - not recommended
- F16: extremely large, virtually no quality loss - not recommended
- F32: absolutely huge, lossless - not recommended


Learning about quantization suffixes while the `ollama pull llama3.3` download completes (FYI, the quantization for the default 70B is Q4_K_M)

• make-ggml.py: github.com/ggerganov/ll...
• pull request: github.com/ggerganov/ll...
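
A rough way to read those suffixes (my own back-of-the-envelope arithmetic, not from the linked scripts): the leading digit approximates bits per weight, K marks the newer k-quants, and S/M/L are size variants, so an estimated file size is parameters × bits ÷ 8. The bits-per-weight figures below are assumptions for illustration, since k-quants mix precisions across tensors.

```python
# Estimate GGUF file sizes for a 70B model under different quant types.
# The bits-per-weight (bpw) values are approximations assumed for this
# sketch; actual k-quants assign different precisions per tensor.
PARAMS = 70.6e9  # approximate Llama 3.3 70B parameter count

approx_bpw = {
    "Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8,
    "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0,
}

for quant, bpw in approx_bpw.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{quant:7s} ~{bpw:4.1f} bpw -> ~{gb:5.1f} GB")
```

The Q4_K_M row lands around 42 GB, roughly the size of the default llama3.3 pull, which is why it sits in the "medium, balanced quality - recommended" niche.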

07.12.2024 01:09 — 👍 24    🔁 5    💬 3    📌 0

Bookmark

06.12.2024 02:49 — 👍 0    🔁 0    💬 0    📌 0
Preview
Chain of Thoughtlessness? An Analysis of CoT in Planning: Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution. Previous work has claimed that this can be mitigated with chain of thought...

Will be at #NeurIPS2024 Dec 10-13. Looking forward to running into everyone in #AI all at once 🤞 The 2019 Vancouver one was the largest conf I ever attended--not sure how they plan to cram in even more this time.. 😱

[Will be at our poster on 12/11 morning openreview.net/forum?id=kPB... ]

30.11.2024 14:35 — 👍 7    🔁 2    💬 0    📌 0

📣 I'm hiring PhD interns for combined theory+empirical projects in: exploration in post-training, multi-task learning in autoregressive models, distillation, reasoning beyond CoT.

Apply at the link below. If you're at #NeurIPS2024, message me to chat.

jobs.careers.microsoft.com/global/en/jo...

05.12.2024 15:42 — 👍 29    🔁 8    💬 1    📌 0
DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions. Shu-Tong Niu, Jun Du, Ruo-Yu Wang, Gao-Bin Yang, Tian Gao, Jia Pan, Yu Hu

The paper proposes a single-channel Deep Cascade Fusion of Diarization and Separation (DCF-DS) framework for back-end speech recognition, achieving first place in the realistic single-channel track of the CHiME-8 NOTSOFAR-1 challenge.

12.11.2024 10:03 — 👍 2    🔁 1    💬 0    📌 0

๐Ÿ“ Summary:

The project is about Speaker Diarization using OpenAI Whisper, combining ASR capabilities with VAD and Speaker Embedding to identify speakers in transcriptions. The pipeline corrects timestamps, extracts vocals, and aligns timestamps for accurate speaker identification. (1/3)
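
A minimal sketch of the speaker-assignment step in such a pipeline (my own illustration, not the project's code; the model names, file name, and overlap heuristic are assumptions, and pyannote's gated models need a Hugging Face token): transcribe with Whisper, diarize with pyannote.audio, then label each transcript segment with the speaker whose turns overlap it most.

```python
import whisper
from pyannote.audio import Pipeline

AUDIO = "meeting.wav"  # hypothetical input file

# 1) ASR: Whisper returns segments with start/end timestamps.
asr = whisper.load_model("base").transcribe(AUDIO)

# 2) Diarization: pyannote returns speaker turns over time.
diar = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1")(AUDIO)

def overlap(a0, a1, b0, b1):
    return max(0.0, min(a1, b1) - max(a0, b0))

# 3) Give each ASR segment the speaker with the largest time overlap.
for seg in asr["segments"]:
    scores = {}
    for turn, _, spk in diar.itertracks(yield_label=True):
        scores[spk] = scores.get(spk, 0.0) + overlap(
            seg["start"], seg["end"], turn.start, turn.end)
    speaker = max(scores, key=scores.get) if scores else "UNK"
    print(f"[{seg['start']:7.2f} {seg['end']:7.2f}] {speaker}: {seg['text'].strip()}")
```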

10.08.2024 14:00 — 👍 1    🔁 1    💬 1    📌 0

I'm excited to share that we've released v1.0 of our podcast corpus, SPoRC, led by my PhD student Ben Litterer! This first dataset is a slice of time, comprising over one million episodes from May and June 2020, including transcripts, diarization, and extracted audio features.

15.11.2024 15:03 — 👍 52    🔁 16    💬 1    📌 4
Post image

Thrilled to share our NeurIPS spotlight on uncertainty disentanglement! ✨ We study how well existing methods disentangle different sources of uncertainty, like epistemic and aleatoric. While all tested methods fail at this task, there are promising avenues ahead. 🧵 👇 1/7

📖: arxiv.org/abs/2402.19460
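
For context, the standard entropy-based decomposition this line of work examines (a generic sketch, not code from the paper): across an ensemble, the entropy of the mean prediction splits into the average member entropy (aleatoric) plus the mutual information between prediction and model (epistemic).

```python
import numpy as np

def entropy(p, axis=-1):
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

# probs: (n_members, n_classes) ensemble predictions for one input;
# members that disagree should yield high epistemic uncertainty.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])

total     = entropy(probs.mean(axis=0))      # H of the mean prediction
aleatoric = entropy(probs, axis=-1).mean()   # mean of the member entropies
epistemic = total - aleatoric                # mutual information

print(f"total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```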

03.12.2024 13:38 — 👍 57    🔁 7    💬 4    📌 1
Preview
A Decomposable Attention Model for Natural Language Inference: We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially pa...

It wasn't even the transformer paper that was first to show attention was all you need. Everyone forgets how aggressively folks were working on faster alternatives to RNNs ~2016, and another paper from Google did a pure attention model first: arxiv.org/abs/1606.01933
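
The core of that model's "attend" step is just a few matrix operations (a simplified sketch of the paper's equations; the identity map in place of the learned feed-forward F and the toy shapes are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))   # premise token embeddings (length 5)
b = rng.normal(size=(7, 8))   # hypothesis token embeddings (length 7)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Unnormalized alignment scores e_ij = F(a_i) . F(b_j); F = identity here
# (the paper uses small feed-forward nets instead).
e = a @ b.T                       # (5, 7)

beta  = softmax(e, axis=1) @ b    # subphrase of b softly aligned to each a_i
alpha = softmax(e, axis=0).T @ a  # subphrase of a softly aligned to each b_j

# Each (a_i, beta_i) and (b_j, alpha_j) pair is then compared and the
# results aggregated -- attention only, no RNN anywhere.
print(beta.shape, alpha.shape)    # (5, 8), (7, 8)
```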

02.12.2024 03:48 — 👍 23    🔁 3    💬 3    📌 1
