RJ Skerry-Ryan

@rjsr.bsky.social

🌮🤖 Speech and language modeling researcher. Principal SWE @ Google Deepmind. ♊🌊 Gemini Audio and Astra core team. http://rjryan.me/ https://google.github.io/tacotron

172 Followers  |  429 Following  |  16 Posts  |  Joined: 09.09.2023  |  1.9173

Latest posts by rjsr.bsky.social on Bluesky

Relational Cognition Lab

I have recently launched the relational cognition lab at UC Irvine: relcoglab.org!
We study learning and memory in mind, brains and machines. I am open to collaborations and hiring a lab technician (lab manager/junior specialist). Job ad & application here: recruit.ap.uci.edu/JPF09400.

12.01.2025 20:46 - 👍 97    🔁 27    💬 6    📌 1

Huh, interesting

01.01.2025 05:57 - 👍 0    🔁 0    💬 0    📌 0

(small explosives)

01.01.2025 05:44 - 👍 0    🔁 0    💬 0    📌 0
Explosives Delivered by Drone – DRONE DELIVERY OF CBNRECy – DEW WEAPONS Emerging Threats of Mini-Weapons of Mass Destruction and Disruption (WMDD)

I'm just wondering how you weaponize them (I assume) how small a payload they can carry, and mounting a gun on them would probably never work.

Sounds like dropping explosives is the easiest way to weaponize: kstatelibraries.pressbooks.pub/drone-delive...

01.01.2025 05:43 - 👍 0    🔁 0    💬 2    📌 0

Earnest Q: What's the connection between this (very impressive) blinkenlight drone swarm (srsly, I am dying with envy of whoever got to build this) and military applications of drones? The drones the US has been bombing people with for decades now have nothing to do with this type of drone, no?

01.01.2025 05:36 - 👍 0    🔁 0    💬 1    📌 0
A die photo of the Pentium chip. An arrow points to a location on the die, with the text "FDIV bug". The chip itself has a complex pattern of circuitry with brownish rectangles and lines of various sizes.

In 1994, a math professor discovered that Intel's Pentium chip sometimes gave the wrong answer when dividing. Fixing this "FDIV" bug cost Intel $475 million. I analyzed the Pentium chip and found the bug. 1/N

28.12.2024 18:57 - 👍 366    🔁 85    💬 7    📌 5

Se Jin Park, Julian Salazar, Aren Jansen, Keisuke Kinoshita, Yong Man Ro, RJ Skerry-Ryan
Long-Form Speech Generation with Spoken Language Models
https://arxiv.org/abs/2412.18603

25.12.2024 05:15 - 👍 8    🔁 4    💬 0    📌 0
State-of-the-art video and image generation with Veo 2 and Imagen 3 We're rolling out a new, state-of-the-art video model, Veo 2, and updates to Imagen 3. Plus, check out our new experiment, Whisk.

Here's Veo 2, the latest version of our video generation model, as well as a substantial upgrade for Imagen 3 🧑‍🍳🚢

(Did I mention we are hiring on the Generative Media team, btw 👀)

blog.google/technology/g...

16.12.2024 17:35 - 👍 60    🔁 17    💬 0    📌 1
DeepMind

🚨🚨My team @GoogleDeepMind in Tokyo is looking for a talented research scientist to work on audio generative models! 🔊
Please consider applying if you have expertise in the domain or related areas such as multimodal models, video generation 📹, etc.
boards.greenhouse.io/deepmind/job...

06.12.2024 07:09 - 👍 5    🔁 4    💬 0    📌 0

Nice! I'm surprised because the training window size is only 2.5 seconds, and the left context of the transformer is much longer than that, right?

03.12.2024 06:07 - 👍 0    🔁 0    💬 1    📌 0
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the a...

Super cool! We did something similar for speech synthesis: arxiv.org/abs/2011.03568

03.12.2024 05:53 - 👍 1    🔁 0    💬 1    📌 0

Very nice! Does it generalize to arbitrary length inputs?

03.12.2024 05:40 - 👍 0    🔁 0    💬 1    📌 0

Don't forget text-to-speech!

28.11.2024 21:49 - 👍 4    🔁 1    💬 0    📌 0
noteuclaise/bluesky_1M_metaposts · Datasets at Hugging Face We're on a journey to advance and democratize artificial intelligence through open source and open science.

Hm I think I was confused and thought the author of this dataset was banned:

huggingface.co/datasets/not...

That dataset is the instance of "You object to the use of your posts as data? I will make a dataset specifically of you."

28.11.2024 20:42 - 👍 3    🔁 0    💬 0    📌 0

Maybe a bad analogy, IIUC a bunch of people said "I don't want you to do X.", some subset of those people were uncivil about it, and the response was to do X.

28.11.2024 19:53 - 👍 0    🔁 0    💬 1    📌 0

Totally true for domains where you are a good enough verifier (and you don't get lulled into a false sense of security with it), but a problem I've seen is where you end up trusting it in domains you're not a verifier because it tends to be correct in domains you are a verifier in.

28.11.2024 17:48 - 👍 4    🔁 0    💬 0    📌 0
A picture of Alexander Fleming.

Thanksgiving shout out to this legend, as I wait in line at the pharmacy to pick up antibiotics for the second time in 2 weeks.

28.11.2024 17:39 - 👍 0    🔁 0    💬 0    📌 0

Seems like bullying tbh.

I have to work hard to teach my kids that just because someone hits you doesn't mean you get to hit them back.

28.11.2024 17:35 - 👍 1    🔁 0    💬 2    📌 0
f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defi...

f-GAN is an absolute banger: arxiv.org/abs/1606.00709

The theory that developed around GANs is rich and (for me) was transformative.

28.11.2024 05:37 - 👍 1    🔁 0    💬 0    📌 0

Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, Soroosh Mariooryad, Matt Shannon, Julian Salazar, David Kao
Very Attentive Tacotron: Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech
https://arxiv.org/abs/2410.22179

30.10.2024 09:30 - 👍 0    🔁 1    💬 0    📌 0
Arxiv sharing reminder

pdf ❌
abs ✅

26.11.2024 08:42 - 👍 250    🔁 41    💬 9    📌 2

Related: bsky.app/profile/rjsr...

26.11.2024 19:32 - 👍 3    🔁 1    💬 0    📌 0
Post image

24.11.2024 01:23 - 👍 4    🔁 0    💬 0    📌 1

@rjsr is following 20 prominent accounts