𝚐𝔪𝟾𝚑𝚑𝟾

@gm8xx8.bsky.social

☺︎

595 Followers  |  226 Following  |  108 Posts  |  Joined: 02.03.2023  |  2.1497

Latest posts by gm8xx8.bsky.social on Bluesky

jiuhai/florence-vl-8b-sft · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion

ckpt 8B: huggingface.co/jiuhai/flore...
demo: huggingface.co/spaces/jiuha...
code: github.com/JiuhaiChen/F...
paper: arxiv.org/abs/2412.04424

06.12.2024 10:26 - 👍 11    🔁 1    💬 0    📌 1
VILA on TinyChat

demo: vila.mit.edu
paper: arxiv.org/abs/2412.04468
models 🔜: huggingface.co/collections/...

06.12.2024 06:47 - 👍 0    🔁 0    💬 0    📌 0

NVILA, a VLM, enhances VILA by scaling spatial and temporal resolutions before compressing visual tokens, enabling efficient high-resolution image & long video processing. Cuts training costs by 4.5X, improves memory & latency, and outperforms top VLMs on benchmarks. Code & models will be released 🔜

06.12.2024 06:47 - 👍 4    🔁 1    💬 1    📌 0
ClearerVoice-Studio (Speech Enhancement, Separation and Extraction) - a Hugging Face Space by alibabasglab Better AI powered platform to purify your speech signal

ClearerVoice-Studio by Tongyi Lab is a versatile voice processing framework offering noise removal, speech separation, audio-video speaker extraction, and tools for model fine-tuning and optimization.

git: github.com/modelscope/C...
demo: huggingface.co/spaces/aliba...

06.12.2024 06:32 - 👍 1    🔁 0    💬 0    📌 0

PaliGemma 2: A Family of Versatile VLMs for Transfer

paper: arxiv.org/abs/2412.03555

05.12.2024 03:24 - 👍 14    🔁 4    💬 1    📌 0
GitHub - hello-robot/stretch_ai Contribute to hello-robot/stretch_ai development by creating an account on GitHub.

Home robotics just got a boost.

Stretch AI - a new open-source suite of tools, tutorials, and reference code to explore and build AI-enabled home robot applications.

03.12.2024 19:34 - 👍 9    🔁 3    💬 0    📌 0

Liquid AI introduces Synthesis of Tailored Architectures (STAR), a new approach to automating the design of neural networks tailored to various tasks and hardware setups.

🔗: www.liquid.ai/research/aut...

02.12.2024 23:45 - 👍 2    🔁 1    💬 0    📌 0
JetFormer: An Autoregressive Generative Model of Raw Images and Text Removing modeling constraints and unifying architectures across domains has been a key driver of the recent progress in training large multimodal models. However, most of these models still rely on ma...

JetFormer: An Autoregressive Generative Model of Raw Images and Text

paper: arxiv.org/abs/2411.19722

JetFormer unifies text and image modeling with a normalizing flow, enabling strong text-to-image generation and image understanding.

02.12.2024 06:09 - 👍 7    🔁 3    💬 0    📌 0

Marconi: Prefix Caching for the Era of Hybrid LLMs

paper: arxiv.org/abs/2411.19379

Marconi improves caching for hybrid LLMs with policies optimizing reuse likelihood and compute savings, achieving 34.4× higher token hit rates and significantly reducing latency.

02.12.2024 09:35 - 👍 6    🔁 1    💬 0    📌 0

DeMo: Decoupled Momentum Optimization

code: github.com/bloc97/DeMo
paper: arxiv.org/abs/2411.19870

02.12.2024 09:28 - 👍 3    🔁 0    💬 0    📌 0
GitHub - PKU-HMI-Lab/LIFT3D Contribute to PKU-HMI-Lab/LIFT3D development by creating an account on GitHub.

Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation

code: github.com/PKU-HMI-Lab/...
paper: arxiv.org/abs/2411.18623
project: lift3d-web.github.io

02.12.2024 09:17 - 👍 1    🔁 0    💬 0    📌 0
Efficient Track Anything Segment Anything Model 2 (SAM 2) has emerged as a powerful tool for video object segmentation and tracking anything. Key components of SAM 2 that drive the impressive video object segmentation perform...

Efficient Track Anything

paper: arxiv.org/abs/2411.18933
project page: yformer.github.io/efficient-tr...

02.12.2024 08:13 - 👍 7    🔁 0    💬 0    📌 1
OMuleT: Orchestrating Multiple Tools for Practicable Conversational Recommendation In this paper, we present a systematic effort to design, evaluate, and implement a realistic conversational recommender system (CRS). The objective of our system is to allow users to input free-form t...

OMuleT: Orchestrating Multiple Tools for Practicable Conversational Recommendation

Roblox paper: arxiv.org/abs/2411.19352

A CRS enhancing LLMs with 10+ tools improves recommendations and shares insights from design, evaluation, and deployment.

02.12.2024 07:37 - 👍 1    🔁 1    💬 0    📌 0

Training Agents with Weakly Supervised Feedback from Large Language Models

paper: arxiv.org/abs/2411.19547

02.12.2024 07:35 - 👍 4    🔁 0    💬 0    📌 0

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

paper: arxiv.org/abs/2411.19943

02.12.2024 06:11 - 👍 9    🔁 2    💬 0    📌 0
Reverse Thinking Makes LLMs Stronger Reasoners Reverse thinking plays a crucial role in human reasoning. Humans can reason not only from a problem to a solution but also in reverse, i.e., start from the solution and reason towards the problem. Thi...

Reverse Thinking Makes LLMs Stronger Reasoners

paper: arxiv.org/abs/2411.19865

RevThink improves LLM reasoning by 13.53% using structured forward-backward reasoning, ensuring strong generalization and data efficiency.

02.12.2024 06:10 - 👍 20    🔁 4    💬 0    📌 1
Q-learning-based Model-free Safety Filter Ensuring safety via safety filters in real-world robotics presents significant challenges, particularly when the system dynamics is complex or unavailable. To handle this issue, learning-based safety ...

Q-learning-based Model-free Safety Filter

paper: arxiv.org/abs/2411.19809

A plug-and-play model-free safety filter uses Q-learning to ensure safe actions in robotics, integrating easily with RL algorithms. Simulations and real-world tests confirm its effectiveness.

02.12.2024 06:09 - 👍 5    🔁 1    💬 0    📌 0
KV Shifting Attention Enhances Language Modeling The current large language models are mainly based on decode-only structure transformers, which have great in-context learning (ICL) capabilities. It is generally believed that the important foundatio...

KV Shifting Attention Enhances Language Modeling

paper: arxiv.org/abs/2411.19574

KV shifting attention enhances induction heads in LLMs, improving efficiency and in-context learning and speeding up convergence, even in models with over 10 billion parameters.

02.12.2024 06:08 - 👍 49    🔁 6    💬 0    📌 0
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning Utilizing large language models (LLMs) for tool planning has emerged as a promising avenue for developing general AI systems, where LLMs automatically schedule external tools (e.g. vision models) to t...

CATP-LLM is a cost-efficient tool planning framework for LLMs, using a planning language for concurrent execution and offline reinforcement learning to balance performance and cost. It outperforms GPT-4 on OpenCATP, with up to 30.2% better performance and 45.8% lower costs.

30.11.2024 21:09 - 👍 1    🔁 0    💬 0    📌 0
Directed Token Sliding Reconfiguration problems involve determining whether two given configurations can be transformed into each other under specific rules. The Token Sliding problem asks whether, given two different set o...

Directed Token Sliding

The Token Sliding problem examines transforming one token configuration into another on a graph by sliding tokens while keeping independence. It is PSPACE-complete for various graph types but solvable in polynomial time for oriented cycles and cographs.

30.11.2024 20:59 - 👍 1    🔁 1    💬 0    📌 0

We’re here with good intentions. A lot of the researchers here are genuinely helpful; take the time to follow them and explore their work. Our aim is to contribute, grow, and make things better for everyone.

29.11.2024 06:58 - 👍 14    🔁 0    💬 1    📌 0

I feel the same way. Glad to have you back @alpindale.bsky.social.

29.11.2024 04:12 - 👍 7    🔁 0    💬 0    📌 0
ALT: a close up of a man's face with a black stripe on it

bluesky datasets:
🔗: zenodo.org/records/1108...
🔗: huggingface.co/datasets/alp...
🔗: huggingface.co/datasets/inf...
🔗: huggingface.co/datasets/not...

28.11.2024 02:38 - 👍 19    🔁 1    💬 1    📌 2
GitHub - NVIDIA/Star-Attention: Efficient LLM Inference over Long Sequences Efficient LLM Inference over Long Sequences. Contribute to NVIDIA/Star-Attention development by creating an account on GitHub.

Star Attention: Efficient LLM Inference over Long Sequences

🔗: github.com/NVIDIA/Star-...
paper: arxiv.org/abs/2411.17116

28.11.2024 00:44 - 👍 4    🔁 0    💬 0    📌 0
Skywork-o1-Open - a Skywork Collection Skywork o1 open model collections

Skywork-o1-Open

27.11.2024 23:14 - 👍 0    🔁 0    💬 0    📌 0
showlab/ShowUI-2B · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

ShowUI-2B: huggingface.co/showlab/Show...
paper: arxiv.org/abs/2411.17465
🔗: github.com/showlab/ShowUI

27.11.2024 23:13 - 👍 0    🔁 0    💬 0    📌 0
A Survey on LLM-as-a-Judge Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large Language Models ...

A Survey on LLM-as-a-Judge

paper: arxiv.org/abs/2411.15594
🔗: github.com/IDEA-FinAI/L...

26.11.2024 20:30 - 👍 31    🔁 2    💬 0    📌 0
GitHub - llm-as-a-judge/Awesome-LLM-as-a-judge Contribute to llm-as-a-judge/Awesome-LLM-as-a-judge development by creating an account on GitHub.

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge

🔗: github.com/llm-as-a-jud...
paper: arxiv.org/abs/2411.16594
project page: llm-as-a-judge.github.io

26.11.2024 04:33 - 👍 18    🔁 2    💬 0    📌 0

@gm8xx8 is following 19 prominent accounts