.txt

@dottxtai.bsky.social

We make AI speak the language of every application

609 Followers  |  20 Following  |  163 Posts  |  Joined: 19.11.2024

Latest posts by dottxtai.bsky.social on Bluesky

Post image

We changed the way we handle multimodal inputs in Outlines.

Most libraries give the prompt a special role, separating it from the other inputs. We don't anymore.

13.07.2025 17:11 - 👍 5    🔁 1    💬 0    📌 0
Preview
GitHub - aastroza/tvtxt: [WIP] AI that "reads" live TV and writes it as a movie script in real-time.

Check out this cool project by @aastroza.bsky.social that automatically converts any TV stream into a real-time movie script! It uses Outlines to enforce movie script formatting, and it's an impressive project all around.

Here's the entire JSON schema he uses.

github.com/aastroza/tvtxt

03.06.2025 19:40 - 👍 2    🔁 1    💬 0    📌 0
Post image

Another example from our internal context-free grammar hackathon. You can force a model to generate an acrostic, i.e. make each line start with successive letters of a word -- HELLO in this case.

Grammars are cool.

30.05.2025 18:01 - 👍 5    🔁 2    💬 1    📌 0
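For a sense of what such a constraint looks like: an acrostic requirement can even be written as a plain regular expression, which regex-constrained decoding (which Outlines also supports) could enforce. A minimal sketch, with a hand-written poem standing in for model output:

```python
import re

# Build a regex requiring five lines that start with H, E, L, L, O in order.
# A constrained decoder would only ever emit text matching this pattern.
word = "HELLO"
acrostic_pattern = r"\n".join(letter + r"[^\n]*" for letter in word)

poem = "\n".join([
    "How softly the morning light",
    "Echoes over sleeping hills",
    "Lifting mist from quiet fields",
    "Long shadows shrink away",
    "Opening the day like a door",
])
assert re.fullmatch(acrostic_pattern, poem)
```

A full context-free grammar can express richer structure than this (nesting, recursion), but for a fixed acrostic a regex is already enough.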

We did make the machines in our own image

29.05.2025 20:57 - 👍 1    🔁 0    💬 0    📌 0
Post image

During our internal hackathon, one of our teams (named "Too Many Cooks") wrote a grammar-powered recipe generator. The language model can ONLY generate text consistent with this recipe format.

Here's a recipe for the universe.

29.05.2025 16:25 - 👍 4    🔁 2    💬 0    📌 2
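The team's grammar isn't shown in text form here, but a grammar-powered generator of this kind would typically feed a context-free grammar to the constrained decoder; Outlines accepts grammars in Lark's EBNF-like format. The sketch below is hypothetical (every rule name is invented for illustration), not the hackathon team's actual grammar:

```lark
start: title ingredients steps
title: "Recipe: " LINE
ingredients: "Ingredients:" NEWLINE ("- " LINE)+
steps: "Steps:" NEWLINE (INT ". " LINE)+
LINE: /[^\n]+/ NEWLINE
INT: /[0-9]+/
NEWLINE: /\n/
```

Under constrained decoding, every token the model emits must keep the partial output derivable from `start`, so only text in this recipe shape can ever be produced.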

🎵 Cameron with your eyes so bright
🎵 won't you join our Discord tonight

14.05.2025 19:54 - 👍 4    🔁 1    💬 1    📌 0
Join the .txt Discord Server! Check out the .txt community on Discord - hang out with 1565 other members and enjoy free voice and text chat.

We've got a Discord channel if you want to talk about AI engineering, compound AI systems, low-overhead agents (though who knows what that means), constrained decoding, or whatever you're building.

Come on by!

discord.gg/ErZ8XnCmkQ

14.05.2025 19:53 - 👍 2    🔁 0    💬 0    📌 1

Loved this paper. Smart idea.

Basically they "guess and check" whether a token would appease the compiler when an LLM is doing code generation.

13.05.2025 16:26 - 👍 10    🔁 2    💬 1    📌 0
Preview
Type-Constrained Code Generation with Language Models Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model…

Very cool paper. arxiv.org/abs/2504.09246

13.05.2025 16:06 - 👍 3    🔁 0    💬 0    📌 0
Post image

Here's a good example of a case where type-constrained decoding ensures that the program is semantically valid at generation time. Left is the unconstrained model, right is the type-constrained approach. Missing arguments, missing return values, missing type annotations. They fixed it for you.

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0
Post image

The authors here use rejection methods (guess and check), rather than the constrained-decoding approach used by Outlines, which modifies token probabilities by masking invalid tokens. Type checking is relatively cheap, and the model's first guess is correct 99.4% of the time.

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0
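The guess-and-check loop is tiny in sketch form. Below is an illustrative Python mock, not the paper's implementation: `propose_tokens` stands in for the model's candidate next tokens in probability order, and `type_checks` for the incremental type checker (replaced here by a toy balanced-parentheses check so the sketch actually runs):

```python
def generate_with_rejection(prefix, propose_tokens, type_checks, max_steps=100):
    """Guess-and-check decoding: commit the model's highest-ranked token
    that passes the (cheap) check; stop at <eos> or when no candidate passes."""
    out = prefix
    for _ in range(max_steps):
        for token in propose_tokens(out):  # candidates in model-probability order
            if type_checks(out + token):
                out += token
                break
        else:
            return out  # no candidate passes: give up
        if token == "<eos>":
            return out
    return out

# Toy stand-ins: the "type system" is just balanced parentheses, and the
# "model" always guesses ')' first, then '<eos>', then '('.
def propose(_):
    return [")", "<eos>", "("]

def checks(s):
    body = s.replace("<eos>", "")
    depth = 0
    for ch in body:
        depth += {"(": 1, ")": -1}.get(ch, 0)
        if depth < 0:
            return False  # unmatched ')': reject immediately
    return depth == 0 if s.endswith("<eos>") else True

result = generate_with_rejection("((", propose, checks)
# ')' is accepted while parens remain open, and '<eos>' is only accepted
# once everything is balanced: result == "(())<eos>"
```

The real system replaces the parentheses check with a prefix type checker, but the control flow is the same: accept the first guess almost always, reject and re-rank only when it would break the types.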
Post image

The authors propose a novel solution: reject tokens that would invalidate a type check. In the following example, `num` is known to be a number. Only one of these completions is valid, namely converting the number to a string (as required by parseInt).

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0
Post image

As an example, a context-free grammar cannot express the following type checking constraint, as there is no way of knowing ahead of time what the type of `x` is. Constrained decoding using a CFG cannot enforce the validity of "x+10".

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0

Context-free grammars are interesting in their own right. We'll do a thread about them at some point. For now, you should think of them as a "programming language for language" -- they describe the "shape" of text, such as code or natural language.

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0

Addressing this is an infamously difficult problem. Most programming languages are not "context-free": their syntax can be described by a context-free grammar, but constraints like typing cannot, so a language model's output can't be constrained to be semantically valid in advance using what is called a "context-free grammar".

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0

Most LLM-based failures in code generation are not syntactic errors (incorrect formatting). They are typically semantic failures (wrong types, missing arguments, etc). The paper notes that 94% of their compilation errors are due to incorrect types, with the remaining 6% due to syntax errors.

13.05.2025 16:06 - 👍 1    🔁 0    💬 1    📌 0

"Type-Constrained Code Generation with Language Models" is a relatively new paper that addresses a common challenge with LLM-generated code. The researchers are from ETH Zurich and UC Berkeley.

13.05.2025 16:06 - 👍 14    🔁 3    💬 1    📌 1
Alonso Silva - Building Knowledge Graph-Based Agents with Structured Text Generation (YouTube video by PyData)

My talk "Building Knowledge Graph-Based Agents with Structured Text Generation" at PyData Global 2024 is now available on YouTube:
www.youtube.com/watch?v=94yu...

#PyData #PyDataGlobal @dottxtai.bsky.social @pydata.bsky.social

23.04.2025 08:49 - 👍 4    🔁 1    💬 0    📌 1

Do people like this type of thread? They're fun to write.

28.04.2025 20:40 - 👍 12    🔁 1    💬 2    📌 0

Grammar prompting is similar to chain-of-thought prompting, but with a formal grammar as the intermediate reasoning step rather than natural language.

It's a cool paper, go take a look.

28.04.2025 19:00 - 👍 3    🔁 0    💬 0    📌 0

At inference time, the model generates a specialized grammar for the input, then generates output constrained by those grammar rules - creating a two-stage process of grammar selection + code generation.

Rather than input -> output, the model does input -> grammar -> output.

28.04.2025 19:00 - 👍 3    🔁 0    💬 1    📌 0
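The two-stage flow can be sketched in a few lines. Here `llm` is a stand-in for any text-completion call, and the toy model is hard-wired so the example runs end to end; none of this is the paper's code, and the prompts are invented for illustration:

```python
def grammar_prompted_generate(llm, task_input):
    # Stage 1: predict a minimal, input-specific BNF grammar.
    grammar = llm("Write a minimal BNF grammar for answering:\n" + task_input)
    # Stage 2: generate the answer conditioned on (and ideally
    # constrained by) that grammar.
    answer = llm("Grammar:\n" + grammar +
                 "\nUsing only this grammar, answer:\n" + task_input)
    return grammar, answer

# Hard-wired toy "model" so the sketch is runnable without an API.
def toy_llm(prompt):
    if prompt.startswith("Write a minimal BNF"):
        return 'query ::= "SELECT " field " FROM " table'
    return "SELECT name FROM users"

grammar, answer = grammar_prompted_generate(toy_llm, "List all user names")
```

In the paper, stage 2 can additionally use constrained decoding so the output is guaranteed to conform to the predicted grammar rather than merely being prompted with it.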

For each input/output example, they provide a context-free grammar that's minimally sufficient for generating that specific output, teaching the LLM to understand grammar as a reasoning tool.

28.04.2025 19:00 - 👍 5    🔁 0    💬 1    📌 0

The gist: LLMs struggle with DSLs because these formal languages have specialized syntax rarely seen in training data. Grammar prompting addresses this by teaching LLMs to work with Backus-Naur Form (BNF) grammars as an intermediate reasoning step.

28.04.2025 19:00 - 👍 3    🔁 0    💬 1    📌 0
Preview
Grammar Prompting for Domain-Specific Language Generation with Large Language Models Large language models (LLMs) can learn to perform a wide range of natural language tasks from just a handful of in-context examples. However, for generating strings from highly structured languages…

"Grammar Prompting for Domain-Specific Language Generation with Large Language Models", by researchers from MIT and Google. arxiv.org/abs/2305.19234

28.04.2025 19:00 - 👍 3    🔁 0    💬 1    📌 0

We recently came across an interesting paper that helps LLMs handle domain-specific languages, like database queries or probabilistic programming languages, using an approach called "grammar prompting".

Link + brief thread below.

28.04.2025 19:00 - 👍 14    🔁 2    💬 1    📌 1
Preview
GitHub - MantisAI/sieves: NLP tasks with zero- and few-shot models.

Check out the sieves library from Mantis NLP!

sieves is a library for structured document processing. It works with Outlines!

Go check out the library, it's pretty cool.

Repo: github.com/mantisai/sie...

14.04.2025 20:00 - 👍 2    🔁 0    💬 0    📌 0

llms.txt is an increasingly popular standard for making websites easily accessible to machine reading. We would provide an llms.txt file at the root of the docs page containing a condensed version of the documentation or website in Markdown format.

11.04.2025 19:01 - 👍 0    🔁 0    💬 0    📌 0
Provide llms.txt file Β· Issue #1531 Β· dottxt-ai/outlines Related to the v1 documentation refactoring #1528. llms.txt is an increasingly popular standard for making websites easily accessible to machine reading. Website owners provide a llms.txt file at t...

We're considering adding an llms.txt file to the upcoming Outlines v1.0 documentation to make it more machine readable! Opinions welcome.

github.com/dottxt-ai/ou...

11.04.2025 19:01 - 👍 0    🔁 0    💬 1    📌 0
Post image

We're delighted to sponsor the AI User Conference!

It's next week, April 15th-17th. Find us online or at San Francisco's delightful Fort Mason.

Tickets: www.aiuserconference.com

@cameron_pfiffer will be presenting a design for Minerva, a frontier space colony resource management system.

10.04.2025 19:00 - 👍 1    🔁 1    💬 0    📌 0

Check out our advocacy team @willkurt.bsky.social and @cameron.pfiffer.org on the Weaviate Podcast!

09.04.2025 16:36 - 👍 3    🔁 2    💬 0    📌 0
