Harit Vishwakarma

@harit7.bsky.social

Ph.D. Candidate at UW-Madison https://harit7.github.io/

19 Followers  |  15 Following  |  8 Posts  |  Joined: 07.12.2024  |  1.597

Latest posts by harit7.bsky.social on Bluesky

@srinathnamburi.bsky.social

11.12.2024 18:04 | 👍 1  🔁 0  💬 0  📌 0

Join us in the evening poster session (#1906) to learn more about it and chat about auto-labeling and data-centric AI.

Thanks to the amazing co-authors: Yi (Reid) Chen, Sui Jiet Tay, Srinath Namburi, @fredsala.bsky.social, Ramya Korlakai Vinayak.

11.12.2024 17:53 | 👍 2  🔁 0  💬 1  📌 0

Our method learns confidence functions tailored for efficient and reliable auto-labeling. Using these in TBAL boosts the no. of auto-labeled points by up to 60% (while making < 5% auto-labeling errors) compared to baselines like softmax and several training-time and post-hoc calibration techniques.

11.12.2024 17:53 | 👍 1  🔁 0  💬 1  📌 0

Introducing Colander, our framework for learning optimal confidence functions for TBAL! We formulate the auto-labeling objective as an optimization problem over the space of confidence functions and thresholds.
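
One plausible way to write this kind of objective, as a sketch only: the symbols below are assumptions and do not come from the post. Here h is the trained classifier, g a confidence function from a candidate family 𝒢, t a threshold, and ε the tolerated auto-labeling error. The formulation used in the actual work may differ, for example in how thresholds are defined.

```latex
\[
\max_{g \in \mathcal{G},\; t \in \mathbb{R}} \;
\Pr_{x}\big[\, g(x) \ge t \,\big]
\quad \text{subject to} \quad
\Pr_{(x,y)}\big[\, h(x) \ne y \;\big|\; g(x) \ge t \,\big] \le \epsilon
\]
```

The first term is the coverage (the fraction of points that get auto-labeled) and the constraint bounds the error among those auto-labeled points.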

11.12.2024 17:53 | 👍 0  🔁 0  💬 1  📌 0

We systematically study the limitations of popular confidence functions like softmax outputs and off-the-shelf calibration techniques. The result? Too few auto-labeled points or large auto-labeling errors.

11.12.2024 17:53 | 👍 0  🔁 0  💬 1  📌 0

The choice of confidence function is crucial in TBAL – if it's not aligned with the auto-labeling objective, it can be detrimental to performance. We show that commonly used confidence functions fall short.

11.12.2024 17:53 | 👍 0  🔁 0  💬 1  📌 0

TBAL is a promising auto-labeling technique. It iteratively acquires human labels for small data chunks, trains a model, and auto-labels points where the model's confidence is above a threshold. The goal? Maximize coverage (proportion of auto-labeled points) with bounded auto-labeling error.
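
A minimal sketch of the thresholding step in one such round, assuming the model has already been trained and a small human-labeled validation set is available. The function names (`pick_threshold`, `auto_label`) and the rule for choosing the threshold are illustrative assumptions, not the authors' implementation; acquiring human labels and training are left out.

```python
def pick_threshold(val_conf, val_correct, max_error):
    """Return the lowest threshold t such that validation points with
    confidence >= t have empirical error <= max_error, or None if no
    threshold satisfies the bound. (Hypothetical selection rule.)"""
    # Sort validation points by confidence, highest first.
    pairs = sorted(zip(val_conf, val_correct), key=lambda p: -p[0])
    best = None
    mistakes = 0
    for i, (conf, correct) in enumerate(pairs):
        mistakes += 0 if correct else 1
        # Only evaluate a candidate threshold at the end of a run of ties,
        # so every point with confidence >= conf has been counted.
        last_of_tie = i + 1 == len(pairs) or pairs[i + 1][0] < conf
        if last_of_tie and mistakes / (i + 1) <= max_error:
            best = conf
    return best


def auto_label(pool_conf, pool_pred, threshold):
    """Split the unlabeled pool into auto-labeled (index, label) pairs
    and the indices left over for the next round."""
    if threshold is None:
        return [], list(range(len(pool_conf)))
    labeled = [(i, pool_pred[i]) for i, c in enumerate(pool_conf) if c >= threshold]
    rest = [i for i, c in enumerate(pool_conf) if c < threshold]
    return labeled, rest


def coverage(num_auto_labeled, pool_size):
    """Proportion of the pool that was auto-labeled."""
    return num_auto_labeled / pool_size if pool_size else 0.0
```

In a full loop, the leftover indices would feed the next round: a new chunk is sent for human labeling, the model is retrained, and the threshold is re-estimated.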

11.12.2024 17:53 | 👍 0  🔁 0  💬 1  📌 0

Excited to present Colander at #NeurIPS2024, our new framework for optimizing confidence functions to make auto-labeling more efficient and reliable. Check out our poster #1906 at today's evening poster session.

Wed, Dec 11, 4:30–7:30 p.m., Poster #1906

Project: harit7.github.io/colander

11.12.2024 17:53 | 👍 4  🔁 2  💬 1  📌 0
