Aljosa Osep, Ph.D.


Hi, I'm Aljosa! I come from the Alpine side of Slovenia. I am a Senior Research Scientist at NVIDIA and an alum of the University of Bonn, RWTH Aachen University, TU Munich, and the Robotics Institute at Carnegie Mellon University.

I started this journey during my Ph.D. at RWTH Aachen University by developing methods for joint stereo-based 3D geometry estimation, ego-pose estimation, and object tracking. I began with canonical objects and pushed the frontier towards tracking and reconstruction of any object, demonstrating that these pipelines can power data auto-labeling. I am continuing this journey at NVIDIA, turning years of foundational academic work into real-world systems at scale.

Research

My research focuses on enabling AI systems to robustly understand the dynamic 3D world from raw sensor streams, such as video and LiDAR. Key areas include learning directly from raw data, tracking and segmenting objects, understanding complex spatiotemporal scenes, and predicting future events in open-world environments.

Selected Papers

S. Elflein, R. Li, S. Agostinho, Ž. Gojčič, L. Leal-Taixé, A. Ošep: VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale, CVPR, 2026. paper page


A. Ošep, T. Meinhardt, F. Ferroni, N. Peri, D. Ramanan, L. Leal-Taixé: Better Call SAL: Towards Learning to Segment Anything in Lidar, ECCV, 2024. paper video page


A. Takmaz, C. Saltori, N. Peri, T. Meinhardt, R. de Lutio, L. Leal-Taixé, A. Ošep: Towards Learning to Complete Anything in Lidar, ICML, 2025. paper page


A. Ošep, P. Voigtlaender, M. Weber, J. Luiten, B. Leibe: 4D Generic Video Object Proposals, ICRA, 2020. paper video code teaser


Academic Engagements

Invited Talks

Service

Awards & Recognition

All Publications

S. Elflein, R. Li, S. Agostinho, Ž. Gojčič, L. Leal-Taixé, A. Ošep: VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale, CVPR, 2026. paper page


G. Brasó, A. Ošep, L. Leal-Taixé: Native Segmentation Vision Transformers, NeurIPS, 2025. paper page


A. Takmaz, C. Saltori, N. Peri, T. Meinhardt, R. de Lutio, L. Leal-Taixé, A. Ošep: Towards Learning to Complete Anything in Lidar, ICML, 2025. paper page


Y. Zhang, A. Ošep, L. Leal-Taixé, T. Meinhardt: Zero-Shot 4D Lidar Panoptic Segmentation, CVPR, 2025. paper page


A. Ošep, T. Meinhardt, F. Ferroni, N. Peri, D. Ramanan, L. Leal-Taixé: Better Call SAL: Towards Learning to Segment Anything in Lidar, ECCV, 2024. paper video page


A. Chakravarthy, M. Ganesina, P. Hu, L. Leal-Taixé, S. Kong, D. Ramanan, A. Ošep: Lidar Panoptic Segmentation in an Open World, IJCV, 2024. paper code


J. Seidenschwarz, A. Ošep, F. Ferroni, S. Lucey, L. Leal-Taixé: What Moves Together Belongs Together, CVPR, 2024. paper page code


C. Saltori, A. Ošep, E. Ricci, L. Leal-Taixé: Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation, ICCV, 2023. paper video code


A. Agarwalla, X. Huang, J. Ziglar, F. Ferroni, L. Leal-Taixé, J. Hays, A. Ošep, D. Ramanan: Lidar Panoptic Segmentation and Tracking without Bells and Whistles, IROS, 2023. paper page


X. Wu, K. Lau, F. Ferroni, A. Ošep, D. Ramanan: Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images, CVPR, 2023. paper video poster page


V. Fomenko, I. Elezi, D. Ramanan, L. Leal-Taixé, A. Ošep: Learning to Discover and Detect Objects, NeurIPS, 2022. paper video poster page code


P. Dendorfer, V. Yugay, A. Ošep, L. Leal-Taixé: Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?, NeurIPS, 2022. paper video code


A. Kim, G. Brasó, A. Ošep, L. Leal-Taixé: PolarMOT: How far can geometric relations take us in 3D multi-object tracking?, ECCV, 2022. paper video poster code


Q. Zhou, S. Agostinho, A. Ošep, L. Leal-Taixé: Is Geometry Enough for Matching in Visual Localization?, ECCV, 2022. paper code


L. Nunes, X. Chen, R. Marcuzzi, A. Ošep, L. Leal-Taixé, C. Stachniss, J. Behley: Unsupervised Class-Agnostic Instance Segmentation of 3D LiDAR Data for Autonomous Vehicles, RA-L, 2022. paper code


M. Gladkova, N. Korobov, N. Demmel, A. Ošep, L. Leal-Taixé, D. Cremers: DirectTracker: 3D Multi-Object Tracking Using Direct Image Alignment and Photometric Bundle Adjustment, IROS, 2022. paper video page


N. Peri, J. Luiten, M. Li, A. Ošep, L. Leal-Taixé, D. Ramanan: Forecasting from LiDAR via Future Object Detection, CVPR, 2022. paper code


M. Kolmet, Q. Zhou, A. Ošep, L. Leal-Taixé: Text2Pos: Text-to-point-cloud cross-modal localization, CVPR, 2022. paper code


Y. Liu, I. Zulfikar, J. Luiten, A. Dave, D. Ramanan, B. Leibe, A. Ošep, L. Leal-Taixé: Opening up Open-World Tracking, CVPR (oral), 2022. paper code


S. Agostinho, A. Ošep, A. Del Bue, L. Leal-Taixé: (Just) A Spoonful of Refinements Helps the Registration Error Go Down, ICCV (oral), 2021. paper code


M. Fabbri, G. Brasó, G. Maugeri, A. Ošep, R. Gasparini, O. Cetintas, S. Calderara, L. Leal-Taixé, R. Cucchiara: MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?, ICCV, 2021. paper video


M. Aygün, A. Ošep, M. Weber, M. Maximov, C. Stachniss, J. Behley, L. Leal-Taixé: 4D Panoptic LiDAR Segmentation, CVPR, 2021. paper video poster code


A. Kim, A. Ošep, L. Leal-Taixé: EagerMOT: 3D Multi-Object Tracking via Sensor Fusion, ICRA, 2021. paper video code


M. Weber, J. Xie, M. Collins, Y. Zhu, P. Voigtlaender, H. Adam, B. Green, A. Geiger, B. Leibe, D. Cremers, A. Ošep, L. Leal-Taixé, L. Chen: STEP: Segmenting and Tracking Every Pixel, NeurIPS Benchmarks and Datasets, 2022. paper code


P. Dendorfer, A. Ošep, L. Leal-Taixé: Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation, ACCV, 2020. paper video page code


P. Dendorfer, A. Ošep, A. Milan, K. Schindler, D. Cremers, I. Reid, S. Roth, L. Leal-Taixé: MOTChallenge: A Benchmark for Single-camera Multiple Target Tracking, IJCV, 2020. paper


J. Luiten, A. Ošep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking, IJCV, 2020. paper code blog


S. Mahadevan*, A. Athar*, A. Ošep, S. Hennen, L. Leal-Taixé, B. Leibe: Making a Case for 3D Convolutions for Object Segmentation in Videos, BMVC, 2020. paper video code


A. Athar*, S. Mahadevan*, A. Ošep, L. Leal-Taixé, B. Leibe: STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos, ECCV, 2020. paper video code


Y. Xu, A. Ošep, Y. Ban, R. Horaud, L. Leal-Taixé, X. Alameda-Pineda: How To Train Your Deep Multi-Object Tracker, CVPR, 2020. paper video code


J. Gross, A. Ošep, B. Leibe: AlignNet-3D for Fast Point Cloud Registration of Partially Observed Objects, International Conference on 3D Vision (3DV), 2019. paper video poster code


P. Voigtlaender, M. Krause, A. Ošep, J. Luiten, B. Sekar, A. Geiger, B. Leibe: MOTS: Multi-Object Tracking and Segmentation, CVPR, 2019. paper video code


A. Ošep, P. Voigtlaender, M. Weber, J. Luiten, B. Leibe: 4D Generic Video Object Proposals, ICRA, 2020. paper video code teaser


A. Ošep, P. Voigtlaender, J. Luiten, S. Breuers, B. Leibe: Large-Scale Object Mining for Object Discovery from Unlabeled Video, ICRA, 2019. paper video


A. Ošep, W. Mehner, P. Voigtlaender, B. Leibe: Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking, ICRA, 2018. paper video code


A. Ošep, P. Voigtlaender, J. Luiten, S. Breuers, B. Leibe: Towards Large-Scale Video Object Mining, ECCV 2018 Workshop on Interactive and Adaptive Learning in an Open World, 2018. paper


A. Ošep, W. Mehner, M. Mathias, B. Leibe: Combined Image- and World-Space Tracking in Traffic Scenes, ICRA, 2017. paper video code teaser


D. Klostermann, A. Ošep, J. Stueckler, B. Leibe: Unsupervised Learning of Shape-Motion Patterns for Objects in Urban Street Scenes, BMVC, 2016. paper video


D. Kochanov, A. Ošep, J. Stueckler, B. Leibe: Scene Flow Propagation for Semantic Mapping and Object Discovery in Dynamic Street Scenes, IROS, 2016. paper video


A. Ošep, A. Hermans, F. Engelmann, D. Klostermann, M. Mathias, B. Leibe: Multi-Scale Object Candidates for Generic Object Tracking in Street Scenes, ICRA, 2016. paper


D. Mitzel, J. Diesel, A. Ošep, U. Rafi, B. Leibe: A Fixed-Dimensional 3D Shape Representation for Matching Partially Observed Objects in Street Scenes, ICRA, 2015. paper


M. Weinmann, A. Ošep, R. Ruiters, R. Klein: Multi-View Normal Field Integration for 3D Reconstruction of Mirroring Objects, ICCV, 2013. paper


M. Weinmann, R. Ruiters, A. Ošep, C. Schwartz, R. Klein: Fusing Structured Light Consistency and Helmholtz Normals for 3D Reconstruction, BMVC, 2012. paper