Hi, I'm Aljosa! I come from the Alpine side of Slovenia. I am a Senior Research Scientist at NVIDIA and an alum of the University of Bonn, RWTH Aachen University, TU Munich, and the Robotics Institute at Carnegie Mellon University.
Looking ahead, I believe the next frontier is memory: future agents will need to operate not for seconds, but over a lifetime. I lay out this vision in my research statement (2023), establishing (implicit) visual tracking as the key mechanism for building structured, queryable memory at test time. Our recent work on scalable feed-forward 3D reconstruction is a step in this direction.
Research
My research focuses on enabling AI systems to robustly understand the dynamic, 3D world from raw sensor streams, such as video and LiDAR. Key areas include learning directly from raw data, tracking and segmenting objects, understanding complex spatiotemporal scenes, and predicting future events in open-world environments.
S. Elflein, R. Li, S. Agostinho, Ž. Gojčič, L. Leal-Taixé, A. Ošep: VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale, CVPR, 2026. paperpage
A. Ošep, T. Meinhardt, F. Ferroni, N. Peri, D. Ramanan, L. Leal-Taixé: Better Call SAL: Towards Learning to Segment Anything in Lidar, ECCV, 2024. papervideopage
A. Takmaz, C. Saltori, N. Peri, T. Meinhardt, R. de Lutio, L. Leal-Taixé, A. Ošep: Towards Learning to Complete Anything in Lidar, ICML, 2025. paperpage
A. Ošep, P. Voigtlaender, M. Weber, J. Luiten, B. Leibe: 4D Generic Video Object Proposals, ICRA, 2020. papervideocodeteaser
Academic Engagements
Invited Talks
October 2025: GRASP Seminar, University of Pennsylvania, invited talk: Segmenting What We Cannot (Directly) See, link
June 2024: CVPR 2024 Area Chair Panel, invited talk: Learning To Understand The World From Video
June 2023: CVPR 2023, Visual Perception via Learning in an Open World, invited talk: Learning To Understand The World From Video, Slides
June 2023: University of Ljubljana, invited talk: Learning To Understand The World From Video, Slides
October 2022: ECCV’22 Workshop on 3D Perception in Autonomous Driving, Details
October 2022: ECCV’22 Workshop on Cross-Modal Human-Robot Interaction, Details
April 2022: UT Austin AI colloquium, Unifying Segmentation, Tracking, and Forecasting, Slides
September 2021: ICCV’21 Workshop on 3D Object Detection from Images, 4D Panoptic LiDAR Segmentation, Slides
July 2021: RSS 2021 Workshop on Behavioral Inference of Remotely Sensed Multi-agent Systems, invited talk, Tracking Every Object and Pixel, Slides
July 2021: RSS 2021 Workshop on Perception and Control for Autonomous Navigation in Crowded, Dynamic Environments, invited talk, Tracking Every Object and Pixel, Slides, Talk
August 2021: Listed as one of three outstanding reviewers for all top-tier CV conferences in 2020/21. See the informal analysis by Simon Niklaus.
August 2021: Awarded Borchers Plaquette at RWTH Aachen University for outstanding doctoral dissertation.
June 2019: Defended my Ph.D. thesis "with highest honors" (summa cum laude) at RWTH Aachen University, Germany's top-ranked engineering school.
All Publications
S. Elflein, R. Li, S. Agostinho, Ž. Gojčič, L. Leal-Taixé, A. Ošep: VGG-T3: Offline Feed-Forward 3D Reconstruction at Scale, CVPR, 2026. paperpage
G. Brasó, A. Ošep, L. Leal-Taixé: Native Segmentation Vision Transformers, NeurIPS, 2025. paperpage
A. Takmaz, C. Saltori, N. Peri, T. Meinhardt, R. de Lutio, L. Leal-Taixé, A. Ošep: Towards Learning to Complete Anything in Lidar, ICML, 2025. paperpage
Y. Zhang, A. Ošep, L. Leal-Taixé, T. Meinhardt: Zero-Shot 4D Lidar Panoptic Segmentation, CVPR, 2025. paperpage
A. Ošep, T. Meinhardt, F. Ferroni, N. Peri, D. Ramanan, L. Leal-Taixé: Better Call SAL: Towards Learning to Segment Anything in Lidar, ECCV, 2024. papervideopage
A. Chakravarthy, M. Ganesina, P. Hu, L. Leal-Taixé, S. Kong, D. Ramanan, A. Ošep: Lidar Panoptic Segmentation in an Open World, IJCV, 2024. papercode
J. Seidenschwarz, A. Ošep, F. Ferroni, S. Lucey, L. Leal-Taixé: What Moves Together Belongs Together, CVPR, 2024. paperpagecode
C. Saltori, A. Ošep, E. Ricci, L. Leal-Taixé: Walking Your LiDOG: A Journey Through Multiple Domains for LiDAR Semantic Segmentation, ICCV, 2023. papervideocode
A. Agarwalla, X. Huang, J. Ziglar, F. Ferroni, L. Leal-Taixé, J. Hays, A. Ošep, D. Ramanan: Lidar Panoptic Segmentation and Tracking without Bells and Whistles, IROS, 2023. paperpage
X. Wu, K. Lau, F. Ferroni, A. Ošep, D. Ramanan: Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images, CVPR, 2023. papervideoposterpage
V. Fomenko, I. Elezi, D. Ramanan, L. Leal-Taixé, A. Ošep: Learning to Discover and Detect Objects, NeurIPS, 2022. papervideoposterpagecode
P. Dendorfer, V. Yugay, A. Ošep, L. Leal-Taixé: Quo Vadis: Is Trajectory Forecasting the Key Towards Long-Term Multi-Object Tracking?, NeurIPS, 2022. papervideocode
A. Kim, G. Brasó, A. Ošep, L. Leal-Taixé: PolarMOT: How far can geometric relations take us in 3D multi-object tracking?, ECCV, 2022. papervideopostercode
Q. Zhou, S. Agostinho, A. Ošep, L. Leal-Taixé: Is Geometry Enough for Matching in Visual Localization?, ECCV, 2022. papercode
L. Nunes, X. Chen, R. Marcuzzi, A. Ošep, L. Leal-Taixé, C. Stachniss, J. Behley: Unsupervised Class-Agnostic Instance Segmentation of 3D LiDAR Data for Autonomous Vehicles, RA-L, 2022. papercode
M. Gladkova, N. Korobov, N. Demmel, A. Ošep, L. Leal-Taixé, D. Cremers: DirectTracker: 3D Multi-Object Tracking Using Direct Image Alignment and Photometric Bundle Adjustment, IROS, 2022. papervideopage
N. Peri, J. Luiten, M. Li, A. Ošep, L. Leal-Taixé, D. Ramanan: Forecasting from LiDAR via Future Object Detection, CVPR, 2022. papercode
M. Kolmet, Q. Zhou, A. Ošep, L. Leal-Taixé: Text2Pos: Text-to-point-cloud cross-modal localization, CVPR, 2022. papercode
Y. Liu, I. Zulfikar, J. Luiten, A. Dave, D. Ramanan, B. Leibe, A. Ošep, L. Leal-Taixé: Opening up Open-World Tracking, CVPR (oral), 2022. papercode
S. Agostinho, A. Ošep, A. Del Bue, L. Leal-Taixé: (Just) A Spoonful of Refinements Helps the Registration Error Go Down, ICCV (oral), 2021. papercode
M. Fabbri, G. Brasó, G. Maugeri, A. Ošep, R. Gasparini, O. Cetintas, S. Calderara, L. Leal-Taixé, R. Cucchiara: MOTSynth: How Can Synthetic Data Help Pedestrian Detection and Tracking?, ICCV, 2021. papervideo
M. Aygün, A. Ošep, M. Weber, M. Maximov, C. Stachniss, J. Behley, L. Leal-Taixé: 4D Panoptic LiDAR Segmentation, CVPR, 2021. papervideopostercode
A. Kim, A. Ošep, L. Leal-Taixé: EagerMOT: 3D Multi-Object Tracking via Sensor Fusion, ICRA, 2021. papervideocode
M. Weber, J. Xie, M. Collins, Y. Zhu, P. Voigtlaender, H. Adam, B. Green, A. Geiger, B. Leibe, D. Cremers, A. Ošep, L. Leal-Taixé, L. Chen: STEP: Segmenting and Tracking Every Pixel, NeurIPS Benchmarks and Datasets, 2021. papercode
P. Dendorfer, A. Ošep, L. Leal-Taixé: Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation, ACCV, 2020. papervideopagecode
P. Dendorfer, A. Ošep, A. Milan, K. Schindler, D. Cremers, I. Reid, S. Roth, L. Leal-Taixé: MOTChallenge: A Benchmark for Single-camera Multiple Target Tracking, IJCV, 2020. paper
J. Luiten, A. Ošep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking, IJCV, 2020. papercodeblog
S. Mahadevan*, A. Athar*, A. Ošep, S. Hennen, L. Leal-Taixé, B. Leibe: Making a Case for 3D Convolutions for Object Segmentation in Videos, BMVC, 2020. papervideocode
A. Athar*, S. Mahadevan*, A. Ošep, L. Leal-Taixé, B. Leibe: STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos, ECCV, 2020. papervideocode
Y. Xu, A. Ošep, Y. Ban, R. Horaud, L. Leal-Taixé, X. Alameda-Pineda: How To Train Your Deep Multi-Object Tracker, CVPR, 2020. papervideocode
J. Gross, A. Ošep, B. Leibe: AlignNet-3D for Fast Point Cloud Registration of Partially Observed Objects, International Conference on 3D Vision (3DV), 2019. papervideopostercode
P. Voigtlaender, M. Krause, A. Ošep, J. Luiten, B. Sekar, A. Geiger, B. Leibe: MOTS: Multi-Object Tracking and Segmentation, CVPR, 2019. papervideocode
A. Ošep, P. Voigtlaender, M. Weber, J. Luiten, B. Leibe: 4D Generic Video Object Proposals, ICRA, 2020. papervideocodeteaser
A. Ošep, P. Voigtlaender, J. Luiten, S. Breuers, B. Leibe: Large-Scale Object Mining for Object Discovery from Unlabeled Video, ICRA, 2019. papervideo
A. Ošep, W. Mehner, P. Voigtlaender, B. Leibe: Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking, ICRA, 2018. papervideocode
A. Ošep, P. Voigtlaender, J. Luiten, S. Breuers, B. Leibe: Towards Large-Scale Video Object Mining, ECCV 2018 Workshop on Interactive and Adaptive Learning in an Open World, 2018. paper
A. Ošep, W. Mehner, M. Mathias, B. Leibe: Combined Image- and World-Space Tracking in Traffic Scenes, ICRA, 2017. papervideocodeteaser
D. Klostermann, A. Ošep, J. Stueckler, B. Leibe: Unsupervised Learning of Shape-Motion Patterns for Objects in Urban Street Scenes, BMVC, 2016. papervideo
D. Kochanov, A. Ošep, J. Stueckler, B. Leibe: Scene Flow Propagation for Semantic Mapping and Object Discovery in Dynamic Street Scenes, IROS, 2016. papervideo
A. Ošep, A. Hermans, F. Engelmann, D. Klostermann, M. Mathias, B. Leibe: Multi-Scale Object Candidates for Generic Object Tracking in Street Scenes, ICRA, 2016. paper
D. Mitzel, J. Diesel, A. Ošep, U. Rafi, B. Leibe: A Fixed-Dimensional 3D Shape Representation for Matching Partially Observed Objects in Street Scenes, ICRA, 2015. paper
M. Weinmann, A. Ošep, R. Ruiters, R. Klein: Multi-View Normal Field Integration for 3D Reconstruction of Mirroring Objects, ICCV, 2013. paper
M. Weinmann, R. Ruiters, A. Ošep, C. Schwartz, R. Klein: Fusing Structured Light Consistency and Helmholtz Normals for 3D Reconstruction, BMVC, 2012. paper