Morris Alper

@malper.bsky.social

PhD student researching multimodal learning (language, vision, ...). Also a linguistics enthusiast. morrisalp.github.io

1,020 Followers  |  646 Following  |  46 Posts  |  Joined: 17.11.2024

Latest posts by malper.bsky.social on Bluesky

Now accepted to #NeurIPS2025!

18.09.2025 15:54 — 👍 4    🔁 0    💬 0    📌 0
Preview
WildCAT3D: Appearance-Aware Multi-View Diffusion in the Wild
We present a framework for generating novel views of scenes learned from diverse 2D scene image data captured in the wild.

Check out our project page and paper for more info:
Project page: wildcat3d.github.io
Paper: arxiv.org/abs/2506.13030
(5/5)

17.06.2025 16:16 — 👍 2    🔁 0    💬 0    📌 0
Post image

At inference time, we inject the appearance of the observed view to get consistent novel views. This also enables cool applications like appearance-conditioned NVS! (4/5)

17.06.2025 16:16 — 👍 0    🔁 0    💬 1    📌 0
Post image

To learn from this data, we use a novel multi-view diffusion architecture adapted from CAT3D, modeling appearance variations with a bottleneck encoder applied to VAE latents and disambiguating scene scale via warping. (3/5)

17.06.2025 16:16 — 👍 0    🔁 0    💬 1    📌 0
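
A minimal sketch of the bottleneck idea from the post above (shapes, names, and sizes are illustrative assumptions, not the WildCAT3D code):

import torch
import torch.nn as nn

# Hypothetical appearance bottleneck over per-view VAE latents: compressing
# each view to a tiny code forces geometry into the shared diffusion backbone.
class AppearanceBottleneck(nn.Module):
    def __init__(self, latent_channels=4, appearance_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(latent_channels, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, appearance_dim),  # the low-dimensional bottleneck
        )

    def forward(self, vae_latent):
        return self.encoder(vae_latent)  # (B, appearance_dim)

enc = AppearanceBottleneck()
observed_latent = torch.randn(1, 4, 32, 32)  # stand-in VAE latent
code = enc(observed_latent)
# Broadcasting the observed view's code to all target views is one way to
# realize the inference-time appearance injection described in this thread.
codes_for_targets = code.expand(6, -1)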
Post image

Photos like the ones below differ in global appearance (day vs. night, lighting), aspect ratio, and even weather. But they give clues to how scenes are built in 3D. (2/5)

17.06.2025 16:16 — 👍 0    🔁 0    💬 1    📌 0
Post image Post image

💥 New preprint! WildCAT3D uses tourist photos in-the-wild as supervision to learn to generate novel, consistent views of scenes like the one shown below. h/t Tom Monnier and all collaborators (1/5)

17.06.2025 16:16 — 👍 5    🔁 0    💬 1    📌 1

Disappointing that arXiv doesn't allow XeLaTeX/LuaLaTeX submissions, which have the least broken multilingual support among LaTeX compilers. The web shouldn't be limited to English in 2025!

13.06.2025 23:53 — 👍 0    🔁 0    💬 0    📌 0
Preview
AI models make precise copies of cuneiform characters | Cornell Chronicle
Researchers from Cornell and Tel Aviv University have developed an approach to use artificial intelligence for reading the ancient tablets.

More coverage of our work on AI for ancient cuneiform! news.cornell.edu/stories/2025...

31.03.2025 15:31 — 👍 4    🔁 0    💬 0    📌 0
Preview
ProtoSnap: Prototype Alignment for Cuneiform Signs
The cuneiform writing system served as the medium for transmitting knowledge in the ancient Near East for a period of over three thousand years. Cuneiform signs have a complex internal structure which...

See our paper, project page, and GitHub for more details and a full implementation!
arXiv: arxiv.org/abs/2502.00129
Project page: tau-vailab.github.io/ProtoSnap/
GitHub: github.com/TAU-VAILab/P...

04.02.2025 18:24 — 👍 0    🔁 0    💬 0    📌 0
Post image

Finally, we show that ProtoSnap-aligned skeletons can be used as conditions for a ControlNet model to generate synthetic OCR training data. By controlling the shapes of signs in training, we can achieve SOTA on cuneiform sign recognition. (Bottom: synthetically generated sign images)

04.02.2025 18:24 — 👍 0    🔁 0    💬 1    📌 0
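
For flavor, a hedged sketch of skeleton-conditioned generation with the diffusers library; the paper trains its own ControlNet on wedge skeletons, so the off-the-shelf scribble ControlNet, file names, and prompt below are stand-ins:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Stand-in checkpoint; ProtoSnap's actual ControlNet is trained on
# ProtoSnap-aligned cuneiform skeletons, not scribbles.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

skeleton = load_image("sign_skeleton.png")  # hypothetical rendered skeleton
out = pipe("a cuneiform sign impressed in clay, close-up tablet scan",
           image=skeleton, num_inference_steps=30).images[0]
out.save("synthetic_sign.png")  # one synthetic OCR training image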
Post image

Our results show that ProtoSnap effectively aligns wedge-based skeletons to scans of real cuneiform signs, with global and local refinement steps. We provide a new expert-annotated test set to quantify these results.

04.02.2025 18:24 — 👍 0    🔁 0    💬 1    📌 0
Post image

ProtoSnap uses features from a fine-tuned diffusion model to optimize for the correct alignment between a skeleton matched with a prototype font image and a scanned sign. It's perhaps surprising that image generation models can be applied to this sort of discriminative task!

04.02.2025 18:24 — 👍 0    🔁 0    💬 1    📌 0
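
A toy version of this feature-based alignment, with a frozen random conv standing in for the fine-tuned diffusion features (a sketch of the idea, not the ProtoSnap implementation):

import torch
import torch.nn.functional as F

feat = torch.nn.Conv2d(1, 16, 5, padding=2).requires_grad_(False)  # stand-in feature extractor

prototype = torch.rand(1, 1, 64, 64)  # rendered font prototype (placeholder)
scan = torch.rand(1, 1, 64, 64)       # scanned sign crop (placeholder)

# Optimize an affine warp so the warped prototype's features match the scan's.
theta = torch.tensor([[1., 0., 0.], [0., 1., 0.]], requires_grad=True)
opt = torch.optim.Adam([theta], lr=1e-2)
for _ in range(200):
    grid = F.affine_grid(theta.unsqueeze(0), prototype.shape, align_corners=False)
    warped = F.grid_sample(prototype, grid, align_corners=False)
    loss = F.mse_loss(feat(warped), feat(scan))
    opt.zero_grad(); loss.backward(); opt.step()
# theta now holds a global alignment; ProtoSnap further refines locally per wedge.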
Post image

We tackle this by directly measuring the internal configuration of characters. Our approach ProtoSnap "snaps" a prototype (font)-based skeleton onto a scanned cuneiform sign using a multi-stage pipeline with SOTA methods from computer vision and generative AI.

04.02.2025 18:24 — 👍 1    🔁 0    💬 1    📌 0
Post image

Some prior work has tried to classify scans of signs categorically, but signs' shapes differ drastically across time periods and regions, making this less effective. E.g. both signs below are AN, from different eras. (Top: font prototype; bottom: scan of a sign on a real tablet)

04.02.2025 18:24 — 👍 0    🔁 0    💬 1    📌 0

Cuneiform is arguably the most ancient writing system in the world (in use since ~3300 BCE). Inscriptions in ancient languages (e.g. Sumerian, Akkadian) are numerous but hard to read due to the complex writing system, wide variation in sign shapes, and their physical nature as imprints in clay.

04.02.2025 18:24 — 👍 0    🔁 0    💬 1    📌 0
Post image

Cuneiform at #ICLR2025! ProtoSnap finds the configuration of wedges in scanned cuneiform signs for downstream applications like OCR. A new tool for understanding the ancient world!
tau-vailab.github.io/ProtoSnap/
h/t Rachel Mikulinsky @ShGordin @ElorHadar and all collaborators.
🧵👇

04.02.2025 18:24 — 👍 6    🔁 1    💬 1    📌 0
Post image

Thrilled to announce our new work TestGenEval, a benchmark that measures unit test generation and test completion capabilities. This work was done in collaboration with the FAIR CodeGen team.

Preprint: arxiv.org/abs/2410.00752
Leaderboard: testgeneval.github.io/leaderboard....

19.12.2024 20:59 — 👍 17    🔁 8    💬 1    📌 1

Great news! BERT-like models are extremely useful and IMO unfairly overlooked in the recent GenAI hype cycle. Looking forward to playing with this.

20.12.2024 07:47 — 👍 0    🔁 0    💬 0    📌 0
Preview
WAFFLE: Multimodal Floorplan Understanding in the Wild

We look forward to progress on architectural tasks being benchmarked and accelerated by WAFFLE!
See our project page for more details and links to our paper, code, and data: tau-vailab.github.io/WAFFLE/

10.12.2024 16:19 — 👍 0    🔁 0    💬 0    📌 0
Post image

We show that our dataset serves as a new, challenging benchmark for common floorplan understanding tasks such as semantic segmentation. We also show it can be used to enable new tasks such as floorplan generation conditioned on building type and boundary.

10.12.2024 16:19 — 👍 0    🔁 0    💬 1    📌 0
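
For concreteness, a generic mean-IoU evaluation loop of the kind used for such segmentation benchmarks (the class count and arrays below are placeholders, not WAFFLE's official protocol):

import numpy as np

def mean_iou(pred, gt, num_classes):
    # Average per-class intersection-over-union, skipping absent classes.
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 5, (256, 256))  # stand-in predicted label map
gt = np.random.randint(0, 5, (256, 256))    # stand-in ground-truth label map
print(mean_iou(pred, gt, num_classes=5))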
Post image

We use modern foundation models (LLMs, vision-language models) to filter and structure raw, noisy open data to identify floorplan images and extract structured metadata, including global properties (e.g. floorplan type) and grounded architectural features within images.

10.12.2024 16:19 — 👍 0    🔁 0    💬 1    📌 0
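
As a rough illustration of this filtering/structuring step (the model choice, prompt, and schema here are assumptions, not the paper's recipe):

import json
from openai import OpenAI

client = OpenAI()  # any instruction-following LLM could play this role

def extract_metadata(caption: str) -> dict:
    # Ask the model to both filter (is it a floorplan?) and structure metadata.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content":
                   "Does this image caption describe a building floorplan? "
                   "Answer as JSON with keys is_floorplan (bool) and "
                   f"building_type (string or null). Caption: {caption}"}],
    )
    return json.loads(resp.choices[0].message.content)

print(extract_metadata("Ground floor plan of Amiens Cathedral, 1898"))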
Post image

WAFFLE (WikipediA-Fueled FLoorplan Ensemble) is a multimodal dataset of ~20K diverse floorplans spanning many building types (e.g. homes, churches, hospitals, schools, ...), regions, eras, and data formats, along with structured metadata.

10.12.2024 16:19 — 👍 0    🔁 0    💬 1    📌 0
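
One plausible shape for a record in such a dataset (field names here are illustrative guesses; see the project page for the real schema):

from dataclasses import dataclass, field

@dataclass
class FloorplanRecord:
    image_path: str                      # the floorplan image
    building_type: str                   # e.g. "church", "hospital"
    caption: str                         # source Wikipedia caption
    grounded_features: list = field(default_factory=list)  # labeled regions

record = FloorplanRecord(
    "plans/0001.png", "church", "Floor plan of Amiens Cathedral",
    [{"label": "nave", "bbox": [10, 20, 200, 300]}])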
Preview
WAFFLE: Multimodal Floorplan Understanding in the Wild
Buildings are a central feature of human culture and are increasingly being analyzed with computational methods. However, recent works on computational building understanding have largely focused on n...

Project page: tau-vailab.github.io/WAFFLE/
Paper: arxiv.org/abs/2412.00955

Architecture is complicated, and automated methods could help design and maintain buildings. But current datasets are very limited (e.g. apartments from one country). That's where WAFFLE comes in!

10.12.2024 16:19 — 👍 0    🔁 0    💬 1    📌 0
Post image

Bite into WAFFLE 🧇, our new multimodal floorplan dataset and paper - now accepted to #WACV2025!
Work with Keren Ganon, Rachel Mikulinsky, Hadar Elor.
More info below 👇

10.12.2024 16:19 — 👍 1    🔁 0    💬 1    📌 0
