Open‑YOLO 3D replaces costly SAM/CLIP steps with 2D detection, LG label‑maps, and parallelized visibility, enabling fast and accurate 3D OV segmentation.

Drop the Heavyweights: YOLO‑Based 3D Segmentation Outpaces SAM/CLIP

2025/08/26 16:20

Abstract and 1 Introduction

  2. Related works
  3. Preliminaries
  4. Method: Open-YOLO 3D
  5. Experiments
  6. Conclusion and References

A. Appendix

3 Preliminaries

Problem formulation: 3D instance segmentation aims to segment individual objects within a 3D scene and assign a single class label to each segmented object. In the open-vocabulary (OV) setting, the class label may belong either to classes seen in the training set or to novel classes. Formally, let P denote a 3D point cloud scene reconstructed from a sequence of RGB-D images. We denote the RGB image frames as I and their corresponding depth frames as D. Similar to recent methods [35, 42, 34], we assume that the poses and camera parameters are available for the input 3D scene.
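To make the assumed inputs concrete, the sketch below shows one way a frame's known pose and camera parameters can map a point of P to pixel coordinates and a depth value. It assumes a standard pinhole camera with intrinsics (fx, fy, cx, cy) and a row-major 4×4 world-to-camera matrix; the names `Frame` and `project_point` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix4 = List[List[float]]  # row-major 4x4 world-to-camera transform


@dataclass
class Frame:
    """One RGB-D frame with its (assumed known) camera parameters."""
    world_to_cam: Matrix4  # extrinsics: world coordinates -> camera coordinates
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def project_point(frame: Frame, p: Tuple[float, float, float]) -> Optional[Tuple[int, int, float]]:
    """Project a world-space point to (u, v, depth) with a pinhole model.

    Returns None if the point is behind the camera or outside the image.
    """
    x, y, z = p
    m = frame.world_to_cam
    # Rigid transform (rotation + translation) into camera coordinates.
    xc = m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3]
    yc = m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3]
    zc = m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3]
    if zc <= 0:
        return None  # behind the camera
    # Perspective division followed by the intrinsic mapping to pixels.
    u = int(round(frame.fx * xc / zc + frame.cx))
    v = int(round(frame.fy * yc / zc + frame.cy))
    if not (0 <= u < frame.width and 0 <= v < frame.height):
        return None  # projects outside the image bounds
    return u, v, zc
```

With an identity pose, fx = fy = 100 and a principal point at (50, 50), the point (0.1, -0.2, 2.0) lands at pixel (55, 40) with depth 2.0. The depth returned here is what a later visibility test can compare against the frame's depth map D.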


3.1 Baseline Open-Vocabulary 3D Instance Segmentation

We base our approach on OpenMask3D [42], the first method to perform open-vocabulary 3D instance segmentation in a zero-shot manner. OpenMask3D has two main modules: a class-agnostic mask proposal head and a mask-feature computation module. The class-agnostic mask proposal head uses a pre-trained transformer-based 3D instance segmentation model [39] to predict a binary mask for each object in the point cloud. The mask-feature computation module first generates 2D segmentation masks by projecting the 3D masks into views in which the 3D instances are highly visible, and refines these masks with the SAM [23] model. A pre-trained CLIP vision-language model [55] then generates image embeddings for the 2D segmentation masks, and these embeddings are aggregated across all 2D frames to form a 3D mask-feature representation.
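As an illustration of the aggregation and querying steps, the sketch below averages per-view image embeddings into one 3D mask feature and scores it against text-prompt embeddings by cosine similarity. Plain averaging of L2-normalized vectors is our simplifying assumption; OpenMask3D's exact aggregation (for example, how views and crops are weighted) may differ. The embeddings are plain Python lists standing in for CLIP outputs, and the function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def aggregate_mask_feature(view_embeddings: List[Vector]) -> Vector:
    """Average the normalized per-view embeddings of one 3D mask (assumed scheme)."""
    normed = [l2_normalize(e) for e in view_embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def query(mask_feature: Vector, text_embeddings: Dict[str, Vector]) -> str:
    """Return the prompt whose embedding is most cosine-similar to the mask feature."""
    def cos(a: Vector, b: Vector) -> float:
        return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

    return max(text_embeddings, key=lambda name: cos(mask_feature, text_embeddings[name]))
```

Because every view contributes one embedding, the cost of this step scales with the number of views per mask, and in OpenMask3D each of those views first needs its own SAM refinement.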

Limitations: OpenMask3D leverages advances in 2D segmentation (SAM) and vision-language models (CLIP) to generate and aggregate 2D feature representations, enabling instances to be queried with open-vocabulary concepts. However, this approach carries a high computational burden that leads to slow inference, with a processing time of 5 to 10 minutes per scene. The burden stems mainly from two sub-tasks: 2D segmentation of the large number of objects across the various 2D views, and 3D feature aggregation based on object visibility. Next, we introduce our proposed method, which aims to reduce the computational burden and improve task accuracy.


4 Method: Open-YOLO 3D

Motivation: We present our 3D open-vocabulary instance segmentation method, Open-YOLO 3D, which aims to generate 3D instance predictions efficiently. Our method introduces efficient, improved modules at both the task level and the data level.

Task Level: Unlike OpenMask3D, which segments the projected 3D masks, we pursue a more efficient approach that relies on 2D object detection. Since the end goal is to assign labels to the 3D masks, the extra computation required by 2D segmentation is unnecessary.

Data Level: OpenMask3D computes 3D mask visibility in 2D frames by iteratively counting the visible points of each mask across all frames. This is time-consuming, so we propose an alternative that computes 3D mask visibility in all frames at once.
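For illustration, here is a minimal sketch of the iterative per-mask, per-frame counting pattern described under Data Level. It is our simplification, not OpenMask3D's code: masks are lists of point indices, and the per-point visibility test is passed in as a hypothetical callback `is_visible(point, frame)`.

```python
from typing import Callable, List, Sequence


def naive_visibility_counts(
    masks: Sequence[Sequence[int]],          # each mask is a list of point indices
    num_frames: int,
    is_visible: Callable[[int, int], bool],  # hypothetical per-point, per-frame test
) -> List[List[int]]:
    """Count visible points for every (mask, frame) pair, one pair at a time.

    The visibility test runs for every point of every mask in every frame,
    so a point that belongs to several proposals is tested once per proposal.
    """
    counts = []
    for mask in masks:
        row = []
        for f in range(num_frames):
            row.append(sum(1 for p in mask if is_visible(p, f)))
        counts.append(row)
    return counts
```

The number of visibility tests grows with masks × frames × points per mask, and each test typically involves a projection and a depth comparison, which is why this step becomes a bottleneck on large scenes.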


4.1 Overall Architecture


4.2 3D Object Proposal


4.3 Low Granularity (LG) Label-Maps


4.4 Accelerated Visibility Computation (VAcc)

To associate 2D label maps with 3D proposals, we compute the visibility of each 3D mask. To this end, we propose a fast approach that computes 3D mask visibility within frames using highly parallelizable tensor operations.
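One way to express this with tensor operations, sketched under our own assumptions rather than taken from the paper: first compute a binary point-visibility matrix V (points × frames) by checking, for every frame, whether each projected point is in bounds and agrees with that frame's depth map; then, given a binary mask matrix M (masks × points), the visible-point counts for all masks in all frames are the matrix product M·V, divided by each mask's size to obtain visibility ratios. The pure-Python loops below spell out the arithmetic; with a tensor library both steps become batched elementwise comparisons and a single matrix multiply. The depth tolerance `tol` and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Proj = Optional[Tuple[int, int, float]]  # (u, v, depth) or None if off-screen


def point_visibility(
    projections: Sequence[Sequence[Proj]],            # [frame][point]
    depth_maps: Sequence[Sequence[Sequence[float]]],  # [frame][v][u]
    tol: float = 0.05,                                # hypothetical depth tolerance
) -> List[List[int]]:
    """Binary matrix V[point][frame]: 1 if the point projects inside the image
    and its depth matches the sensor depth (i.e., it is not occluded).

    Each point is tested once per frame, independent of how many masks exist.
    """
    num_frames = len(projections)
    num_points = len(projections[0])
    V = [[0] * num_frames for _ in range(num_points)]
    for f in range(num_frames):
        for p, proj in enumerate(projections[f]):
            if proj is None:
                continue
            u, v, z = proj
            if abs(depth_maps[f][v][u] - z) <= tol:
                V[p][f] = 1
    return V


def mask_visibility(M: Sequence[Sequence[int]], V: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of each mask's points visible in each frame: (M @ V) / |mask|.

    M is a binary masks x points matrix; the counts are the matrix product M @ V,
    written out here as explicit sums.
    """
    num_frames = len(V[0])
    out = []
    for row in M:
        size = sum(row)
        counts = [sum(m * V[p][f] for p, m in enumerate(row)) for f in range(num_frames)]
        out.append([c / size if size else 0.0 for c in counts])
    return out
```

The design choice that matters is the separation: the expensive per-point test is shared by all proposals, and combining it with the masks reduces to one dense product that parallelizes well on a GPU.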

Figure 3: Multi-View Prompt Distribution (MVPDist). After creating the LG label maps for all frames, we select the top-k label maps based on the 2D projection of the 3D proposal. Using the (x, y) coordinates of the 2D projection, we select labels from the LG label maps to build the MVPDist. The predicted label is the ID of the text prompt with the highest probability in this distribution.
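The caption describes a voting procedure; the sketch below is one minimal interpretation of it. Our assumptions, not stated in the caption: the top-k frames are those in which the proposal has the most projected pixels, each LG label map stores a text-prompt ID per pixel with -1 for background, and the distribution is a normalized histogram of the IDs sampled at the projected (x, y) coordinates. `mvp_dist` and `predict_prompt` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def mvp_dist(
    label_maps: Sequence[Sequence[Sequence[int]]],    # [frame][y][x] -> prompt ID, -1 = none
    projections: Sequence[Sequence[Tuple[int, int]]],  # [frame] -> (x, y) pixels of the proposal
    k: int,
    background: int = -1,
) -> Dict[int, float]:
    """Histogram of prompt IDs gathered from the k frames where the proposal
    projects to the most pixels, normalized into a probability distribution."""
    # Rank frames by how many of the proposal's points project into them.
    ranked = sorted(range(len(projections)), key=lambda f: len(projections[f]), reverse=True)
    votes: Counter = Counter()
    for f in ranked[:k]:
        for x, y in projections[f]:
            label = label_maps[f][y][x]
            if label != background:
                votes[label] += 1
    total = sum(votes.values())
    return {label: n / total for label, n in votes.items()} if total else {}


def predict_prompt(dist: Dict[int, float]) -> int:
    """ID of the text prompt with the highest probability (-1 if there are no votes)."""
    return max(dist, key=dist.get) if dist else -1
```

Pooling votes over several views makes the prediction robust to a single frame in which the detector missed or mislabeled the object.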


4.5 Multi-View Prompt Distribution (MVPDist)

Table 1: State-of-the-art comparison on the ScanNet200 validation set. We use Mask3D trained on the ScanNet200 training set to generate class-agnostic mask proposals. Our method outperforms methods that generate 3D proposals by fusing 2D masks with proposals from a 3D network (highlighted in gray in the table). When all methods use only proposals from a 3D network, it outperforms the state of the art by a wide margin.


4.6 Instance Prediction Confidence Score


:::info Authors:

(1) Mohamed El Amine Boudjoghra, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (mohamed.boudjoghra@mbzuai.ac.ae);

(2) Angela Dai, Technical University of Munich (TUM) (angela.dai@tum.de);

(3) Jean Lahoud, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (jean.lahoud@mbzuai.ac.ae);

(4) Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (hisham.cholakkal@mbzuai.ac.ae);

(5) Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Aalto University (rao.anwer@mbzuai.ac.ae);

(6) Salman Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University (salman.khan@mbzuai.ac.ae);

(7) Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University (fahad.khan@mbzuai.ac.ae).

:::


:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 Deed (Attribution-NonCommercial-ShareAlike 4.0 International) license.

:::
