Rachel Wicks

@rewicks.bsky.social

PhD student @jhuclsp. I work on multilingual data for training and evaluation. rewicks.github.io

662 Followers  |  521 Following  |  6 Posts  |  Joined: 16.11.2024

Latest posts by rewicks.bsky.social on Bluesky

For anybody in the Mid-Atlantic region: the annual MASC conference is looking for a host for next year. It's a great chance for your university to meet other researchers (and potential collaborators) in our region!

16.12.2024 21:53  |  👍 0    🔁 0    💬 0    📌 0

Could you give an example of the input/output you're looking for, and on which function call (encode, tokenize, etc.)? And maybe which tokenizer it's inheriting from 😅 (looks like maybe the OPT models inherit from a GPT2Tokenizer?)

26.11.2024 19:32  |  👍 1    🔁 0    💬 1    📌 0
A compilation of adorable dog photos referencing a Simpsons meme ("Do it for her")

Happy to talk about any of these topics and more!

I will also likely end up talking a lot about my pride and joy (my dog).

20.11.2024 00:03  |  👍 1    🔁 0    💬 0    📌 0
GitHub - rewicks/ctxpro: Data and annotation toolkit for finding translation ambiguities in bitext

And if you think sentence-level machine translation is good enough, I encourage you to run your systems on our evaluation data (ctxpro, an extension to ContraPro and other similar evaluation datasets).

github.com/rewicks/ctxpro

20.11.2024 00:03  |  👍 1    🔁 0    💬 1    📌 0
jhu-clsp/paradocs · Datasets at Hugging Face. We're on a journey to advance and democratize artificial intelligence through open source and open science.

Most recently, I've released the ParaDocs dataset, which reconstructs document annotations on large, parallel machine translation datasets. Contextual information is integral to machine translation, but often overlooked!

Data: huggingface.co/datasets/jhu...

20.11.2024 00:03  |  👍 0    🔁 0    💬 1    📌 0

Since we're all new here, an introduction:

I'm a final-year PhD student at Johns Hopkins University (in @jhuclsp.bsky.social), working with Philipp Koehn and Matt Post.

I'm largely interested in the creation and processing of high-quality, multilingual datasets for both training and evaluation.

20.11.2024 00:03  |  👍 19    🔁 2    💬 2    📌 0
CLSP Join the conversation

Putting together a JHU Center for Language and Speech Processing starter pack!

Please reply or DM me if you're doing research at CLSP and would like to be added - I'm still trying to find out which of us are on here so far.

go.bsky.app/JtWKca2

19.11.2024 15:37  |  👍 22    🔁 9    💬 1    📌 1

Cool work by @jhuclsp colleagues Rafael Rivera Soto and Nick Andrews on how AI-generated text carries unique stylistic fingerprints, enabling the detection and identification of specific language models.

Based on ICLR paper: arxiv.org/pdf/2401.06712
hub.jhu.edu/2024/11/18/a...

19.11.2024 18:17  |  👍 15    🔁 4    💬 0    📌 0

@rewicks is following 20 prominent accounts