It could also have been short for 蒙古篆
09.11.2025 12:33
@iseeaswell.bsky.social
TUSL discord link: https://discord.gg/z3ya9EUS2U
In 1443, King Sejong's announcement said that Hangul imitates the "古篆" script; however, there is no reference to that script anywhere else, so no one knows what it is. But… doesn't it look suspiciously like "蒙古" (Mongol), giving more evidence that he was referring to the 'Phags-pa script?
08.11.2025 12:54
updated link: discord.gg/kDNWDhHv
06.08.2025 08:53
All are welcome. Please make this space your own, and add channels at will.
17.06.2025 17:46
Our first task is to massively expand SMOL through community contribution. Anyone who contributes significant volunteer translations or post-edits will get on the arXiv paper in the next refresh!
17.06.2025 17:46
This is a space for grassroots collaboration. It doubles as a directory of speakers of such languages, so you can directly talk with and collaborate with community members.
17.06.2025 17:46
Working on Low Resource Languages? Want to help with SMOL? Join our new Discord! discord.gg/YFTv7tkh
17.06.2025 17:46
@colinacherry.bsky.social
19.02.2025 17:36
By the way, GATITOS has now officially moved to the SMOL Huggingface repo
19.02.2025 17:36
Finally, if you are a speaker of any SMOL languages, please take a look at the data and tell me what you think. Despite the quality checks, I am sure that some of the deliveries have quality issues, and I would love to understand and/or fix any affected sources. We are in this together!
19.02.2025 17:36
I would also like to thank FAIR for being an academic leader in open-sourcing work with low-resource languages, including NLLB and Flores. Thank you for helping make the academic community feel collaborative!
19.02.2025 17:36
I would like to thank our native-language consultants and translators -- too numerous to name -- for their invaluable help along the way. Several entire languages in SMOL only exist because of volunteer contributions!
19.02.2025 17:36
SMOL also provides factuality ratings for 671 documents, with well-researched justifications.
19.02.2025 17:36
SMOL has two sub-sources: SMOL-Doc, a document-level set, and SMOL-Sent, a sentence-level source. They join the token-level GATITOS to cover three levels of granularity!
19.02.2025 17:36
And that's just OOTB finetuning -- we know that the community can think of more clever ways to train on SMOL. Multiway parallel data is tricky to deal with without overfitting.
19.02.2025 17:36
Finetuning of Gemini 2.0 Flash on SMOL yields average improvements of about +4.0 ChrF, with some languages -- including Ewe, Kokborok, Manipuri, Ga, and Dombe -- seeing gains of over +20 ChrF.
19.02.2025 17:36
SMOL comprises sentences and documents carefully selected for the biggest "Bang for Buck" ratio. It includes 6.1M translated tokens -- and if you've been in this field a while you know that's a lot!
19.02.2025 17:36
🐼 SMOL DATA ALERT! 🐼 Announcing SMOL, a professionally-translated dataset for 115 very low-resource languages! Paper: arxiv.org/pdf/2502.12301
Huggingface: huggingface.co/datasets/goo...
Is Dravidian negation illa (இல்லை etc.) cognate to Semitic *lā (لَا etc.)? There was a lot of trade in that region, so it seems likely to me.
04.02.2025 21:58
I want a "Duolingo for linguists" that doesn't attempt to teach you useful everyday language but just speed-runs you through the grammar and so on
24.01.2025 08:40
Google Translate gives "在这里，说得很多，但听到的却很少", which has the suspicious property that the passive is marked with 得 in the first clause and 的 in the second. More credence to the theory that they are cognate with all Irish, specifically the Cork dialect, which is the oldest and purest form
27.12.2024 18:14
Interesting, my brain didn't consider that because that construction feels like an adjective rather than a verb, but it does seem to have more or less the same meaning!
27.12.2024 18:11 โ ๐ 2 ๐ 0 ๐ฌ 0 ๐ 0I often find myself wanting to use the Irish impersonal aspect in Chinese, a.k.a. ่ฏดtar/ๅฌtear for "it is spoken"/"it is heard", so โๅจ่ฟ้่ฏดtarๅพๅค๏ผ่ๅฌtearๅพๅฐโ โhere, much is said, but little is heard". (While we're at it we could give Irish a nice Rechtschreibreform imported from Cyrl: "ๅฌtear" --> "ๅฌtัar")
27.12.2024 06:54