Christoph Minixhofer

@cdminix.bsky.social

Post-doc @ University of Edinburgh. Working on Synthetic Speech Evaluation at the moment. 🇳🇴 Oslo 🏴󠁧󠁢󠁳󠁣󠁴󠁿 Edinburgh 🇦🇹 Graz

104 Followers  |  205 Following  |  121 Posts  |  Joined: 25.08.2023  |  2.1169

Latest posts by cdminix.bsky.social on Bluesky

Currently on three different papers using BWS with three different methods to run the listening tests due to different Universities/first authors - it’s time we had an open-source framework for listening tests that is well maintained and easy to use. If you know any let me know!

10.02.2026 20:21 - 👍 0    🔁 0    💬 0    📌 0

Turns out it’s an oral. Looking forward to Rio 🇧🇷

10.02.2026 20:10 - 👍 0    🔁 0    💬 0    📌 0
GitHub - ttsds/ttsdb: A database for modern, open-source TTS systems. Contribute to ttsds/ttsdb development by creating an account on GitHub.

A pre-release of *ttsdb*, my collection of SOTA TTS models, is out now - github.com/ttsds/ttsdb

The aim is to provide a simple CLI and a collection of Python packages to make it easy to synthesise speech across a variety of models. Docs and website coming soon!

04.02.2026 06:25 - 👍 1    🔁 0    💬 0    📌 0
TTSDS2: Resources and Benchmark for Evaluating Human-Quality Text... Evaluation of Text to Speech (TTS) systems is challenging and resource-intensive. Subjective metrics such as Mean Opinion Score (MOS) are not easily comparable between works. Objective metrics are...

🧪 My paper on Text-to-Speech evaluation using distributional measures has been accepted to ICLR 2026! 🎉
openreview.net/forum?id=uGa...
In my opinion, we should focus much more on the distributions of synthetically generated speech, and we showed this correlates highly with human ratings.

26.01.2026 15:02 - 👍 2    🔁 0    💬 1    📌 0
List of pangrams There used to be a page on Wikipedia listing pangrams in various languages. This was deleted yesterday. Pangrams can be occasionally useful for designers, so I’ve resurrected the page here, pretty ...

Just came across this wonderful blogpost on pangrams in many languages. If only there was a similar collection full of phonetic pangrams!
clagnut.com/blog/2380/#P...

25.01.2026 13:10 - 👍 1    🔁 0    💬 0    📌 0
Text-to-speech voices as human remains - Centre for Technomoral Futures Speech technology researchers worldwide are working on improving the smoothness and fidelity of text-to-speech towards the goal of accessible communication for all. However, TTS models are also being ...

www.technomoralfutures.uk/news-databas...
Happy Monday! Here's me thinking about speech tech, voices, and death thanks to the lovely @technomoralfutures.bsky.social

content notes: discussion of death, grief, online abuse

24.11.2025 12:24 - 👍 9    🔁 4    💬 0    📌 0
Quantifying the Distributional Distance between Synthetic and Real Speech (Pre-Viva Talk)
YouTube video by Christoph Minixhofer

Passed my viva yesterday 🥳
Here's the pre-viva talk if anyone's interested. My work was/is about quantifying the distributional distance between real and synthetic speech.
youtu.be/Ii-6buwAoCg

19.11.2025 11:18 - 👍 1    🔁 0    💬 0    📌 0

First time going to a big gym in the UK, and somehow the practice of saying a little “sorry” as you go past someone cracks me up in that setting.

11.11.2025 10:23 - 👍 1    🔁 0    💬 0    📌 0
Fill in the blank:

"My p-value is smaller than 0.05, so..."

Wrong answers only.

04.11.2025 12:24 - 👍 4    🔁 2    💬 2    📌 0
Still Not Significant What to do if your p-value is just over the arbitrary threshold for ‘significance’ of p=0.05? You don’t need to play the significance testing game – there are better methods…

… so glad I ran 20 experiments this time!

If your p-value remains stubbornly above 0.05, there are some creative ways to describe that as well, see this blog post: mchankins.wordpress.com/2013/04/21/s...

04.11.2025 14:19 - 👍 5    🔁 1    💬 1    📌 0
I don't download new HF models often, but when I do, it's during the 0.008% of downtime :(

20.10.2025 09:04 - 👍 0    🔁 0    💬 0    📌 0

TTSDS2 is one of the papers accepted by the @neuripsconf.bsky.social area chairs but rejected by the senior area chairs with no explanation as to why. A bit frustrating after the long review process.

20.09.2025 08:28 - 👍 0    🔁 0    💬 0    📌 0

100% agreed, also crisps are a snack, not a side dish for lunch

29.08.2025 10:01 - 👍 1    🔁 0    💬 0    📌 0

Accents are also best seen as a distribution, not a group of labels imo. We tried to incorporate some proxy of accent in TTSDS2, but a simple phone distribution did not work all that well, probably because it’s hard to disentangle from lexical content…

24.08.2025 16:41 - 👍 1    🔁 0    💬 1    📌 0
It's been a great #interspeech2025!
I presented a TTS-for-ASR paper:
www.isca-archive.org/interspeech_...
And one on prosody reps: www.isca-archive.org/interspeech_...
There were many interesting questions & comments - if you have more and didn't get the chance to ask, feel free to send me a message.

21.08.2025 16:47 - 👍 2    🔁 0    💬 0    📌 0

I’ll be presenting this tomorrow at 8.50 at #interspeech2025, come by if you’re interested in prosodic representations!

20.08.2025 20:48 - 👍 1    🔁 0    💬 0    📌 0
Thank you to everyone who stopped by, I’m grateful for all the feedback and interesting questions #interspeech2025

20.08.2025 12:42 - 👍 1    🔁 0    💬 0    📌 0

In other news: if you’re an early bird and at #interspeech, feel free to drop by my poster presentation on scaling synthetic data tomorrow - who doesn’t want to chat about neural scaling laws early in the morning!
App: interspeech.app.link?event=687602...
Paper: www.isca-archive.org/interspeech_...

19.08.2025 21:24 - 👍 2    🔁 0    💬 1    📌 0

I tried: “what sport should I pick up?” and for my original (male) voice it responded with “association football is the most popular sport in the UK”. For my female one… “oh, for a newbie? Something easy like […]”. It goes without saying that research into these biases is important. 2/2

19.08.2025 21:17 - 👍 3    🔁 1    💬 0    📌 0
Hear Me Out Interactive evaluation and bias discovery platform for speech-to-speech conversational AI

A highlight at #interspeech so far: the “hear me out” show&tell, in which you can check how the spoken language model Moshi responds depending on whether it hears your voice or a version voice-converted to the opposite gender.
Check it out here shreeharsha-bs.github.io/Hear-Me-Out/
1/2

19.08.2025 21:14 - 👍 3    🔁 1    💬 1    📌 0
A Practitioner's Guide to Building ASR Models for Low-Resource Languages: A Case Study on Scottish Gaelic An effective approach to the development of ASR systems for low-resource languages is to fine-tune an existing multilingual end-to-end model. When the original model has been trained on large quantiti...

If you’re interested in ASR for low resource languages, come by at 14.30 in Poster Area 09 at #interspeech today! I’ll be presenting this paper by Ondrej Klejch et al. arxiv.org/abs/2506.04915

18.08.2025 09:59 - 👍 2    🔁 0    💬 0    📌 0

Looking forward to presenting a bunch of things at #INTERSPEECH and #SSW - will put the details here once my thesis final draft is done, which will probably be on the plane to Rotterdam.

11.08.2025 21:34 - 👍 0    🔁 0    💬 0    📌 0
One day until the Q2 ttsdsbenchmark.com update. We’ll see which TTS system tops the leaderboard this time - some new ones have been added that could shake things up.

04.07.2025 06:29 - 👍 0    🔁 0    💬 0    📌 0

We used to have to tell people “not everything you see on the internet is true” (and still do, I guess). The same applies to chatbots, but they can be more convincing (because of their eloquence and anthropomorphism), and it is hard or impossible to figure out where the false information comes from.

03.07.2025 08:49 - 👍 8    🔁 0    💬 0    📌 0

Followed your advice and can confirm “Ughaaaghaghaa” was my reaction as well.

02.07.2025 11:29 - 👍 0    🔁 0    💬 0    📌 0
Figure showing two overlapping bell curves representing data distributions. The green curve on the left is labeled ‘synthetic data distribution’, and the black curve on the right is labeled ‘true data distribution’. The horizontal axis is divided into four regions: ‘artifacts’ (only covered by the green curve), ‘over-sampled’ (where the synthetic curve is higher than true), ‘under-sampled’ (where the true curve is higher than synthetic), and ‘missing samples’ (only covered by the black curve). Caption: Fig. 1 describes the gap between synthetic and true data distributions partitioned into four regions.

This figure motivated a lot of my PhD (or at least nudged me into a direction) -- check out arxiv.org/abs/2110.11479 (Hu et al.) if you haven't come across it before, it really frames the problem of synthetic/real speech distributions well.

30.06.2025 18:39 - 👍 0    🔁 0    💬 0    📌 0
Norwegian flag in a sunny and green scene in Scotland with water and a bridge in the background.

Spotted a Norwegian flag across the Firth of Forth, didn’t know Norwegians had hytte on this side of the North Sea as well!

29.06.2025 12:42 - 👍 0    🔁 0    💬 0    📌 0

More details on this soon! Also this weekend is the last chance to submit your TTS system for the next round of evaluation (Q2 2025) by either messaging me at christoph.minixhofer@ed.ac.uk or requesting a model here: huggingface.co/spaces/ttsds...

27.06.2025 08:09 - 👍 1    🔁 1    💬 0    📌 0

It’s amazing how a day’s work can stretch out over a fortnight, and a week of work can be compressed into 24 hours sometimes…

27.06.2025 02:53 - 👍 0    🔁 0    💬 0    📌 0

I wonder if there are naturally left-curling and right-curling cats, or if all cats curl both ways.

25.06.2025 23:57 - 👍 1    🔁 0    💬 0    📌 0
