@jason.rapidata.ai
Founder of Rapidata
Give it a ♥️ at: huggingface.co/datasets/Rap...
08.05.2025 13:59 — 👍 1 🔁 0 💬 0 📌 0
These are scales that are just not feasible with traditional methods, let alone as a startup.
But for us it was a relatively easy task.
It's already trending in 4th place on Hugging Face, give it some love so that we can get to first place!
We beat one of Google's most famous datasets!
We just released a new dataset with over 32k images annotated with over 3 million (!) human responses.
So this is apparently a thing in Palo Alto now. (I'm guessing it was hacked)
12.04.2025 16:42 — 👍 2 🔁 0 💬 0 📌 0
We uploaded the results to @hf.co, check them out:
huggingface.co/datasets/Rap...
(4/4)
We used our own annotation service, Rapidata, to gather over 51k votes from native speakers and realized that the translation quality from DeepL was much better.
Translation quality is super important for the perceived quality of your product, so we are sticking to the premium option.
(3/4)
It shows that their models have a better understanding of the structures of language and the relationship of words.
We use DeepL to translate tasks for our annotators. It's a costly service, and we could use our cloud credits to use LLMs like DeepSeek-R1, Llama or Mistral for free.
(2/4)
Europe can do more than just unremovable bottlecaps!
Our AI industry is often belittled, but one of our earliest players, DeepL from Germany, is sticking it to the bajillion $ tech giants.
Their models outperform the top LLMs at translating text, one of the most fundamental tasks for any AI. (1/4)
He had wasted a bunch of our time, so we thought it would be funny to send him an invoice for USD 9.99 for "Emotional Damages".
AND HE FUCKING PAID IT
This is how you generate revenue as a startup.
The payment receipt now hangs on our door.
(6/6)
This went back and forth for a few hours until we finally managed to cut him off. Trolling at its finest, but then again, we learnt a lot for next time.
Anyhow, we noticed that he had started a payment process at some point with us, so he was registered in our Stripe account.
(5/?)
Then he started to ask very inappropriate things like: "Do you think you are safe?" or "are you aware that you are not safe right now?"
(These were all caught by our automated audit, so no harm done)
So we blocked his IP, but he turned on a VPN.
(4/?)
Tons of nonsensical orders started flooding in all at once.
He tried to advertise his Discord to our annotators through his orders.
We blocked his account, but he just kept making new ones.
(3/?)
As long as your early customers' data is safe (this is a must, obviously), it's OK if your system is a bit wobbly.
Until a few days ago we never had anyone really try to fuck with our software.
But then some dude in Czechia randomly decided to test us.
(2/?)
A paid Stripe invoice for 9.99 USD for "Emotional Damages"
9.99 USD for Emotional Damages
As a startup, you need to be fast, that's really your only advantage over the big boys. (They got more $ than you)
When you iterate and prototype and build, you don't want to waste time building all of the safety systems first if no one is using the product yet. (1/?)
We also wrote a blog article about how we collected and analyzed this dataset on Hugging Face for those who are interested in the details: huggingface.co/blog/Rapidat...
10.01.2025 19:10 — 👍 4 🔁 1 💬 0 📌 0
Screenshot of the dataset on the Hugging Face Hub
🔍 Massive human feedback dataset for text-to-image models from Rapidata
- 1.5M human responses from 152K participants
- Evaluates image coherence, style & prompt alignment
- Includes detailed error heatmaps
- Covers DALL-E, Midjourney, Imagen outputs
Available on @hf.co
What a view...
(Trending page of Huggingface Image Datasets)
For the UI demo (app.rapidata.ai):
We just threw out the login altogether.
Of course there are risks, but until those risks actually cause you harm (and the level of harm here is that it will cost us a hundred dollars) you can just ignore them.
... So, we set up an authentication flow that automatically pops up a browser window to log you in using your existing Google account.
We ask no questions and there is no setup process. The code will instantly run after you hit the button.
.....
For the Python API (docs.rapidata.ai):
For basically every API out there, you need to set an API key to use it (e.g. OpenAI's API). That's a lot of work, it will take you at least 15-20 seconds. NOT ACCEPTABLE!
A customer should literally just copy some code into VS Code and run it...
You have to reduce all friction to getting started, so people can experience that fucking awesomeness that is your product while jumping through as few hoops as possible.
At Rapidata we took it to an extreme, pushing the industry standard for brain rot compatibility.....
8.25 seconds
That's the average adult attention span these days. Our brains have been cooked by TikTok, Instagram Reels and YouTube Shorts.
As a startup founder, this has to be acknowledged when building a product now.... 🧵
I feel like Community Notes on Twitter is actually a really good feature. Obviously a post should be taken down in a bad case of misinformation, but not everything is straight-up misinformation, and a lot of posts could really benefit from some added context from reliable sources.
06.12.2024 10:14 — 👍 1 🔁 0 💬 0 📌 0
With the US doing US things, Europe should make sure we give startups all the tools necessary to be able to compete with their US counterparts. For large companies the EU already acts as a single market, but for startups it feels like 27 different markets. www.eu-inc.org/petition
15.11.2024 09:15 — 👍 2 🔁 0 💬 0 📌 0
My startup asked 2M real humans which text-2-image model is better. Check out our findings:
arxiv.org/abs/2409.11904
How do we move the whole tech community from Twitter to here 🤔
13.11.2024 10:39 — 👍 4 🔁 0 💬 0 📌 0