Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
ckpt 8B: huggingface.co/jiuhai/flore...
demo: huggingface.co/spaces/jiuha...
code: github.com/JiuhaiChen/F...
paper: arxiv.org/abs/2412.04424
@gm8xx8.bsky.social
demo: vila.mit.edu
paper: arxiv.org/abs/2412.04468
models: huggingface.co/collections/...
NVILA, a VLM, enhances VILA by scaling spatial and temporal resolutions before compressing visual tokens, enabling efficient high-resolution image & long video processing. Cuts training costs by 4.5×, improves memory & latency, and outperforms top VLMs on benchmarks. Code & models will be released.
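The scale-then-compress idea above can be sketched in a few lines. This is a hypothetical illustration, not NVILA's implementation: the token grid is first upsampled (scale), then neighboring tokens are average-pooled (compress) so the LLM sees a short sequence. Each token here is a single float; real visual tokens are vectors.

```python
def scale_then_compress(tokens, scale=2, pool=4):
    """Toy sketch of scale-then-compress (names and defaults assumed):
    raise spatial resolution first, then pool neighboring tokens so
    the language model receives far fewer visual tokens."""
    # 1) Scale: nearest-neighbor upsample the 2-D token grid.
    scaled = [[v for v in row for _ in range(scale)]
              for row in tokens for _ in range(scale)]
    # 2) Compress: average-pool non-overlapping pool x pool windows.
    H, W = len(scaled), len(scaled[0])
    return [[sum(scaled[i * pool + di][j * pool + dj]
                 for di in range(pool) for dj in range(pool)) / pool**2
             for j in range(W // pool)]
            for i in range(H // pool)]

grid = [[float(i * 16 + j) for j in range(16)] for i in range(16)]
out = scale_then_compress(grid)  # 16x16 -> 32x32 -> 8x8 tokens
```

The net effect is that detail is captured at high resolution before compression, while the final token count stays small.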
06.12.2024 06:47
ClearerVoice-Studio by Tongyi Lab is a versatile voice-processing framework offering noise removal, speech separation, audio-video speaker extraction, and tools for model fine-tuning and optimization.
git: github.com/modelscope/C...
demo: huggingface.co/spaces/aliba...
PaliGemma 2: A Family of Versatile VLMs for Transfer
paper: arxiv.org/abs/2412.03555
Home robotics just got a boost.
Stretch AI - a new open-source suite of tools, tutorials, and reference code to explore and build AI-enabled home robot applications.
Liquid AI introduces Synthesis of Tailored Architectures (STAR), a new approach to automating neural network design, tailored to various tasks and hardware setups.
link: www.liquid.ai/research/aut...
JetFormer: An Autoregressive Generative Model of Raw Images and Text
paper: arxiv.org/abs/2411.19722
JetFormer unifies text and image modeling with a normalizing flow, enabling strong text-to-image generation and image understanding.
Marconi: Prefix Caching for the Era of Hybrid LLMs
paper: arxiv.org/abs/2411.19379
Marconi improves caching for hybrid LLMs with policies optimizing reuse likelihood and compute savings, achieving 34.4× higher token hit rates and significantly reducing latency.
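The policy described above can be sketched as a toy prefix cache. This is a minimal illustration in the spirit of Marconi, not its actual system: the class name, scoring formula, and capacity parameter are all assumptions. Entries are scored by hits × prefix length (a stand-in for reuse likelihood × compute saved) and evicted lowest-score-first.

```python
class PrefixCache:
    """Toy token-prefix cache (hypothetical sketch, not Marconi's code).
    Longer prefixes save more prefill compute, so eviction prefers to
    drop short, rarely reused entries."""
    def __init__(self, capacity_tokens=16):
        self.capacity = capacity_tokens
        self.entries = {}  # prefix tuple -> hit count

    def lookup(self, tokens):
        # The longest cached prefix of the request wins.
        best = ()
        for p in self.entries:
            if len(p) > len(best) and tuple(tokens[:len(p)]) == p:
                best = p
        if best:
            self.entries[best] += 1
        return best

    def insert(self, tokens):
        self.entries.setdefault(tuple(tokens), 0)
        # Evict while over budget: score = (hits + 1) * prefix length.
        while sum(len(e) for e in self.entries) > self.capacity:
            victim = min(self.entries,
                         key=lambda e: (self.entries[e] + 1) * len(e))
            del self.entries[victim]

cache = PrefixCache()
cache.insert([1, 2, 3, 4])
hit = cache.lookup([1, 2, 3, 4, 5])  # reuses the cached 4-token prefix
```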
DeMo: Decoupled Momentum Optimization
code: github.com/bloc97/DeMo
paper: arxiv.org/abs/2411.19870
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
code: github.com/PKU-HMI-Lab/...
paper: arxiv.org/abs/2411.18623
project: lift3d-web.github.io
Efficient Track Anything
paper: arxiv.org/abs/2411.18933
project page: yformer.github.io/efficient-tr...
OMuleT: Orchestrating Multiple Tools for Practicable Conversational Recommendation
Roblox paper: arxiv.org/abs/2411.19352
A conversational recommender system (CRS) that augments LLMs with 10+ tools improves recommendations; the paper shares insights from its design, evaluation, and deployment.
Training Agents with Weakly Supervised Feedback from Large Language Models
paper: arxiv.org/abs/2411.19547
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
paper: arxiv.org/abs/2411.19943
Reverse Thinking Makes LLMs Stronger Reasoners
paper: arxiv.org/abs/2411.19865
RevThink improves LLM reasoning by 13.53% using structured forward-backward reasoning, ensuring strong generalization and data efficiency.
Q-learning-based Model-free Safety Filter
paper: arxiv.org/abs/2411.19809
A plug-and-play model-free safety filter uses Q-learning to ensure safe actions in robotics, integrating easily with RL algorithms. Simulations and real-world tests confirm its effectiveness.
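The plug-and-play idea above can be sketched as a wrapper around any RL policy. This is a hedged illustration, not the paper's implementation: the function name, `threshold`, and the interface of `q_safe(state, action)` are all assumptions. A learned safety Q-function vets the policy's proposed action and, if it scores below the safety threshold, substitutes the safest available action.

```python
def safety_filter(q_safe, state, proposed_action, actions, threshold=0.0):
    """Hypothetical model-free safety filter: pass the policy's action
    through unchanged when the learned safety value clears `threshold`,
    otherwise fall back to the highest-safety action."""
    if q_safe(state, proposed_action) >= threshold:
        return proposed_action
    return max(actions, key=lambda a: q_safe(state, a))

# Toy safety function: actions far from 0 are unsafe in every state.
q = lambda s, a: 1.0 - abs(a)
safe = safety_filter(q, None, 0.2, [-1.0, 0.0, 1.0])   # 0.2 is safe, kept
fixed = safety_filter(q, None, 3.0, [-1.0, 0.0, 1.0])  # 3.0 unsafe, overridden
```

Because the filter only needs Q-values, it composes with any action-producing policy, which matches the "plug-and-play" framing.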
KV Shifting Attention Enhances Language Modeling
paper: arxiv.org/abs/2411.19574
KV shifting attention enhances induction heads in LLMs, improving efficiency, in-context learning, and convergence speed, even in models with over 10 billion parameters.
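The core operation can be sketched independently of any attention implementation. A minimal illustration, with fixed mixing coefficients where the paper's are learned: each position's key and value are blended with the previous position's, which lets a single layer express the induction-head pattern (attend where the key matched one step earlier).

```python
def kv_shift(K, V, a1=0.5, a2=0.5, b1=0.5, b2=0.5):
    """Toy KV shifting (coefficients fixed here for illustration).
    K and V are lists of per-position vectors; each output row mixes
    the current row with the previous one (zeros before position 0)."""
    def shift_mix(X, c1, c2):
        prev = [[0.0] * len(X[0])] + X[:-1]  # previous position's row
        return [[c1 * x + c2 * p for x, p in zip(row, prow)]
                for row, prow in zip(X, prev)]
    return shift_mix(K, a1, a2), shift_mix(V, b1, b2)

K = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
V = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
K2, V2 = kv_shift(K, V)  # each key/value now carries its neighbor's content
```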
CATP-LLM is a cost-efficient tool planning framework for LLMs, using a planning language for concurrent execution and offline reinforcement learning to balance performance and cost. It outperforms GPT-4 on OpenCATP, with up to 30.2% better performance and 45.8% lower costs.
30.11.2024 21:09
Directed Token Sliding
The Token Sliding problem asks whether one token configuration on a graph can be transformed into another by sliding tokens along edges, keeping the token set independent at every step. It is PSPACE-complete for various graph classes but solvable in polynomial time for oriented cycles and cographs.
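The reconfiguration question above can be made concrete with a brute-force search. A sketch only: it enumerates configurations by BFS, so it is exponential in general (consistent with PSPACE-completeness), but it makes the move rule precise on small undirected graphs.

```python
from itertools import combinations
from collections import deque

def token_sliding_reachable(adj, start, goal):
    """Brute-force BFS for Token Sliding on a small graph given as an
    adjacency dict. A move slides one token along an edge, and every
    intermediate configuration must remain an independent set."""
    def independent(cfg):
        return all(v not in adj[u] for u, v in combinations(cfg, 2))
    start, goal = frozenset(start), frozenset(goal)
    seen, queue = {start}, deque([start])
    while queue:
        cfg = queue.popleft()
        if cfg == goal:
            return True
        for u in cfg:
            for v in adj[u]:            # try sliding the token u -> v
                if v in cfg:
                    continue
                nxt = (cfg - {u}) | {v}
                if independent(nxt) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Path graph 0-1-2-3: a lone token slides from 0 to 3 via 1 and 2.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
ok = token_sliding_reachable(path, {0}, {3})
```

Note the independence constraint is what makes the problem hard: tokens can block each other, so some configurations are mutually unreachable even on simple graphs.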
We're here with good intentions. A lot of the researchers here are genuinely helpful, so take the time to follow them and explore their work. Our aim is to contribute, grow, and make things better for everyone.
29.11.2024 06:58
I feel the same way. Glad to have you back @alpindale.bsky.social.
29.11.2024 04:12
bluesky datasets:
dataset: zenodo.org/records/1108...
dataset: huggingface.co/datasets/alp...
dataset: huggingface.co/datasets/inf...
dataset: huggingface.co/datasets/not...
Star Attention: Efficient LLM Inference over Long Sequences
code: github.com/NVIDIA/Star-...
paper: arxiv.org/abs/2411.17116
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
ShowUI-2B: huggingface.co/showlab/Show...
paper: arxiv.org/abs/2411.17465
code: github.com/showlab/ShowUI
A Survey on LLM-as-a-Judge
paper: arxiv.org/abs/2411.15594
code: github.com/IDEA-FinAI/L...
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
code: github.com/llm-as-a-jud...
paper: arxiv.org/abs/2411.16594
project page: llm-as-a-judge.github.io