Alex Bewley

Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection

A simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection. Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly. To extract relationship information, we introduce an attention mechanism that selects object pairs likely to form a relationship.

@inproceedings{salzmann2024scene,
  title={Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection},
  author={Salzmann, Tim and Ryll, Markus and Bewley, Alex and Minderer, Matthias},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}

Learning to Learn Faster from Human Feedback with Language Model Predictive Control

When human language inputs are observations and robot code outputs are actions, training a large language model (LLM) to complete previous interactions can be viewed as training a transition dynamics model that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments, improving non-expert teaching success rates on unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning of new tasks on unseen robot embodiments and APIs by 31.5%.

@article{liang2024learning,
      title={Learning to Learn Faster from Human Feedback with Language Model Predictive Control}, 
      author={Jacky Liang and Fei Xia and Wenhao Yu and Andy Zeng and Montserrat Gonzalez Arenas and Maria Attarian and Maria Bauza and Matthew Bennice and Alex Bewley and Adil Dostmohamed and Chuyuan Kelly Fu and Nimrod Gileadi and Marissa Giustina and Keerthana Gopalakrishnan and Leonard Hasenclever and Jan Humplik and Jasmine Hsu and Nikhil Joshi and Ben Jyenis and Chase Kew and Sean Kirmani and Tsang-Wei Edward Lee and Kuang-Huei Lee and Assaf Hurwitz Michaely and Joss Moore and Ken Oslund and Dushyant Rao and Allen Ren and Baruch Tabanpour and Quan Vuong and Ayzaan Wahid and Ted Xiao and Ying Xu and Vincent Zhuang and Peng Xu and Erik Frey and Ken Caluwaerts and Tingnan Zhang and Brian Ichter and Jonathan Tompson and Leila Takayama and Vincent Vanhoucke and Izhak Shafran and Maja Mataric and Dorsa Sadigh and Nicolas Heess and Kanishka Rao and Nik Stewart and Jie Tan and Carolina Parada},
      year={2024},
      eprint={2402.11450},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Winner of best conference paper award at ICRA 2024 (DLR photo)!

We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms.

@misc{open_x_embodiment_rt_x_2023,
  title={Open {X-E}mbodiment: Robotic Learning Datasets and {RT-X} Models},
  author = {Open X-Embodiment Collaboration and Abby O'Neill and Abdul Rehman and Abhiram Maddukuri and Abhishek Gupta and Abhishek Padalkar and Abraham Lee and Acorn Pooley and Agrim Gupta and Ajay Mandlekar and Ajinkya Jain and Albert Tung and 
  Alex Bewley and Alex Herzog and Alex Irpan and Alexander Khazatsky and Anant Rai and Anchit Gupta and Andrew Wang and Andrey Kolobov and Anikait Singh and Animesh Garg and Aniruddha Kembhavi and Annie Xie and Anthony Brohan and Antonin Raffin and Archit Sharma and Arefeh Yavary and Arhan Jain and Ashwin Balakrishna and Ayzaan Wahid and Ben Burgess-Limerick and Beomjoon Kim and Bernhard Sch{\"o}lkopf and Blake Wulfe and 
  Brian Ichter and Cewu Lu and Charles Xu and Charlotte Le and Chelsea Finn and Chen Wang and Chenfeng Xu and Cheng Chi and Chenguang Huang and Christine Chan and Christopher Agia and Chuer Pan and Chuyuan Fu and Coline Devin and Danfei Xu and Daniel Morton and Danny Driess and Daphne Chen and Deepak Pathak and Dhruv Shah and Dieter B{\"u}chler and Dinesh Jayaraman and Dmitry Kalashnikov and Dorsa Sadigh and Edward Johns and Ethan Foster and Fangchen Liu and Federico Ceola and Fei Xia and Feiyu Zhao and Felipe Vieira Frujeri and Freek Stulp and Gaoyue Zhou and Gaurav S. Sukhatme and Gautam Salhotra and Ge Yan and Gilbert Feng and Giulio Schiavi and Glen Berseth and Gregory Kahn and Guangwen Yang and Guanzhi Wang and Hao Su and Hao-Shu Fang and Haochen Shi and Henghui Bao and Heni Ben Amor and Henrik I Christensen and Hiroki Furuta and Homer Walke and Hongjie Fang and Huy Ha and Igor Mordatch and Ilija Radosavovic and Isabel Leal and Jacky Liang and Jad Abou-Chakra and Jaehyung Kim and Jaimyn Drake and Jan Peters and Jan Schneider and Jasmine Hsu and Jeannette Bohg and Jeffrey Bingham and Jeffrey Wu and Jensen Gao and Jiaheng Hu and Jiajun Wu and Jialin Wu and Jiankai Sun and Jianlan Luo and Jiayuan Gu and Jie Tan and Jihoon Oh and Jimmy Wu and Jingpei Lu and Jingyun Yang and 
  Jitendra Malik and Jo{\~a}o Silv{\'e}rio and Joey Hejna and Jonathan Booher and Jonathan Tompson and Jonathan Yang and Jordi Salvador and Joseph J. Lim and Junhyek Han and Kaiyuan Wang and Kanishka Rao and Karl Pertsch and Karol Hausman and Keegan Go and Keerthana Gopalakrishnan and Ken Goldberg and Kendra Byrne and Kenneth Oslund and Kento Kawaharazuka and Kevin Black and Kevin Lin and Kevin Zhang and Kiana Ehsani and Kiran Lekkala and Kirsty Ellis and Krishan Rana and Krishnan Srinivasan and Kuan Fang and Kunal Pratap Singh and Kuo-Hao Zeng and Kyle Hatch and Kyle Hsu and Laurent Itti and Lawrence Yunliang Chen and Lerrel Pinto and Li Fei-Fei and Liam Tan and Linxi "Jim" Fan and 
  Lionel Ott and Lisa Lee and Luca Weihs and Magnum Chen and Marion Lepert and Marius Memmel and Masayoshi Tomizuka and Masha Itkina and Mateo Guaman Castro and Max Spero and Maximilian Du and Michael Ahn and Michael C. Yip and Mingtong Zhang and Mingyu Ding and Minho Heo and Mohan Kumar Srirama and Mohit Sharma and Moo Jin Kim and Naoaki Kanazawa and Nicklas Hansen and Nicolas Heess and Nikhil J Joshi and Niko Suenderhauf and Ning Liu and Norman Di Palo and Nur Muhammad Mahi Shafiullah and Oier Mees and Oliver Kroemer and Osbert Bastani and Pannag R Sanketi and Patrick "Tree" Miller and Patrick Yin and Paul Wohlhart and Peng Xu and Peter David Fagan and Peter Mitrano and Pierre Sermanet and 
  Pieter Abbeel and Priya Sundaresan and Qiuyu Chen and Quan Vuong and Rafael Rafailov and Ran Tian and Ria Doshi and Roberto Mart{\'i}n-Mart{\'i}n and Rohan Baijal and Rosario Scalise and Rose Hendrix and Roy Lin and Runjia Qian and Ruohan Zhang and Russell Mendonca and Rutav Shah and Ryan Hoque and Ryan Julian and Samuel Bustamante and Sean Kirmani and 
  Sergey Levine and Shan Lin and Sherry Moore and Shikhar Bahl and Shivin Dass and Shubham Sonawani and Shuran Song and Sichun Xu and Siddhant Haldar and Siddharth Karamcheti and Simeon Adebola and Simon Guist and Soroush Nasiriany and Stefan Schaal and Stefan Welker and Stephen Tian and Subramanian Ramamoorthy and Sudeep Dasari and Suneel Belkhale and Sungjae Park and Suraj Nair and Suvir Mirchandani and Takayuki Osa and Tanmay Gupta and Tatsuya Harada and Tatsuya Matsushima and Ted Xiao and Thomas Kollar and Tianhe Yu and Tianli Ding and Todor Davchev and Tony Z. Zhao and Travis Armstrong and Trevor Darrell and Trinity Chung and Vidhi Jain and 
  Vincent Vanhoucke and Wei Zhan and Wenxuan Zhou and Wolfram Burgard and Xi Chen and Xiangyu Chen and Xiaolong Wang and Xinghao Zhu and Xinyang Geng and Xiyuan Liu and Xu Liangwei and Xuanlin Li and Yansong Pang and Yao Lu and Yecheng Jason Ma and Yejin Kim and Yevgen Chebotar and Yifan Zhou and Yifeng Zhu and Yilin Wu and Ying Xu and Yixuan Wang and Yonatan Bisk and Yongqiang Dou and Yoonyoung Cho and Youngwoon Lee and Yuchen Cui and Yue Cao and Yueh-Hua Wu and Yujin Tang and Yuke Zhu and Yunchu Zhang and Yunfan Jiang and Yunshuang Li and Yunzhu Li and Yusuke Iwasawa and Yutaka Matsuo and Zehan Ma and Zhuo Xu and Zichen Jeff Cui and Zichen Zhang and Zipeng Fu and Zipeng Lin},
  howpublished  = {\url{https://arxiv.org/abs/2310.08864}},
  year = {2023},
}

Robots That Can See: Leveraging Human Pose for Trajectory Prediction

The proposed Human Scene Transformer observes past human positions, head orientations, and 3D skeletal keypoints using onboard robot sensors to predict future human trajectories. The model captures the inherent uncertainty in these predictions and attains state-of-the-art performance on widely recognized prediction benchmarks. It also won the ICCV 2023 challenge for end-to-end Human Trajectory Forecasting, underscoring its effectiveness in handling complex scenarios.

@article{salzmann2023robots,
  title={Robots That Can See: Leveraging Human Pose for Trajectory Prediction},
  author={Salzmann, Tim and Chiang, Hao-Tien Lewis and Ryll, Markus
    and Sadigh, Dorsa and Parada, Carolina and Bewley, Alex},
  journal={IEEE Robotics and Automation Letters},
  year={2023},
  publisher={IEEE}
}

Agile Catching with Whole-Body MPC and Blackbox Policy Learning

This work studies the challenging task of robot catching by presenting the relative merits of two fundamentally different solution strategies: (i) Model Predictive Control using accelerated constrained trajectory optimization, and (ii) Reinforcement Learning using zeroth-order optimization.

@inproceedings{abeyruwan2023agile,
  title={Agile Catching with Whole-Body MPC and Blackbox Policy Learning},
  author={Abeyruwan, Saminda and Bewley, Alex and Boffi, Nicholas Matthew and Choromanski, Krzysztof Marcin and D'Ambrosio, David B and Jain, Deepali and Sanketi, Pannag R and Shankar, Anish and Sindhwani, Vikas and Singh, Sumeet and others},
  booktitle={Learning for Dynamics and Control Conference},
  pages={851--863},
  year={2023},
  organization={PMLR}
}

Robotic Table Tennis: A Case Study into a High Speed Learning System

This work details the design of a robotic research platform composed of a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots.

@inproceedings{dambrosio2023robotic,
      title={Robotic Table Tennis: A Case Study into a High Speed Learning System}, 
      author={David B. D'Ambrosio and Jonathan Abelian and Saminda Abeyruwan and Michael Ahn and Alex Bewley and Justin Boyd and Krzysztof Choromanski and Omar Cortes and Erwin Coumans and Tianli Ding and Wenbo Gao and Laura Graesser and Atil Iscen and Navdeep Jaitly and Deepali Jain and Juhana Kangaspunta and Satoshi Kataoka and Gus Kouretas and Yuheng Kuang and Nevena Lazic and Corey Lynch and Reza Mahjourian and Sherry Q. Moore and Thinh Nguyen and Ken Oslund and Barney J Reed and Krista Reymann and Pannag R. Sanketi and Anish Shankar and Pierre Sermanet and Vikas Sindhwani and Avi Singh and Vincent Vanhoucke and Grace Vesom and Peng Xu},
      year={2023},
      booktitle={Robotics: Science and Systems},
}

Video OWL-ViT: Temporally-Consistent Open-World Localization in Video

We show successful transfer of open-world models by building on the OWL-ViT open-vocabulary detection model and adapting it to video by adding a transformer decoder. The decoder propagates object representations recurrently through time by using the output tokens for one frame as the object queries for the next.
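
The recurrence described above can be sketched in a few lines. This is a minimal illustration rather than the Video OWL-ViT implementation: `decode_frame` is a hypothetical toy stand-in for the transformer decoder (one cross-attention step with a residual connection), and `track_video` and all variable names are assumptions made for the example.

```python
import math


def decode_frame(queries, frame_tokens):
    """Toy stand-in for a transformer decoder (assumption, not the real model):
    each query attends over the frame's tokens and adds the result to itself."""
    outputs = []
    for q in queries:
        # Dot-product attention scores between this query and every frame token.
        scores = [sum(a * b for a, b in zip(q, t)) for t in frame_tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        attended = [
            sum(w * t[i] for w, t in zip(weights, frame_tokens)) / total
            for i in range(len(q))
        ]
        outputs.append([a + b for a, b in zip(q, attended)])
    return outputs


def track_video(frames, initial_queries):
    """Run the decoder frame by frame, feeding each frame's output tokens
    back in as the object queries for the next frame."""
    queries = initial_queries
    per_frame_outputs = []
    for frame_tokens in frames:
        outputs = decode_frame(queries, frame_tokens)
        per_frame_outputs.append(outputs)
        queries = outputs  # the recurrence: outputs become the next queries
    return per_frame_outputs
```

Because slot `i` of the output in one frame becomes query `i` in the next, each slot can keep referring to the same object over time, which is what makes the propagation temporally consistent in this sketch.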

@inproceedings{heigold2023video,
  title={Video OWL-ViT: Temporally-consistent open-world localization in video},
  author={Heigold, Georg and Minderer, Matthias and Gritsenko, Alexey and Bewley, Alex
    and Keysers, Daniel and Lu{\v{c}}i{\'c}, Mario and Yu, Fisher and Kipf, Thomas},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={13802--13811},
  year={2023}
}

i-sim2real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables safe exploration and large-scale data collection quickly at low cost. i-S2R bootstraps from a simple model of human behaviour and alternates between training in simulation and deploying in the real world. In each iteration, both the human behaviour model and the policy are refined, leading to longer rallies.

@inproceedings{abeyruwan2022sim2real,
  title={i-sim2real: Reinforcement learning of robotic policies in tight human-robot interaction loops},
  author={Abeyruwan, Saminda Wishwajith and Graesser, Laura and D'Ambrosio, David B and Singh, Avi and Shankar, Anish and
   Bewley, Alex and Jain, Deepali and Choromanski, Krzysztof Marcin and Sanketi, Pannag R},
  booktitle={Conference on Robot Learning},
  pages={212--224},
  year={2022},
  organization={PMLR}
}

Local Metrics for Multi-Object Tracking

Local metrics provide an intuitive mechanism to explicitly specify the trade-off between detection and association for evaluating object trackers.

@article{valmadre2021local,
  title={Local Metrics for Multi-Object Tracking},
  author={Valmadre, Jack and Bewley, Alex and Huang, Jonathan and 
    Sun, Chen and Sminchisescu, Cristian and Schmid, Cordelia},
  journal={arXiv preprint arXiv:2104.02631},
  year={2021}
}

RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection

Range Sparse Net (RSN) is a simple, efficient, and accurate framework for real-time 3D object detection using LiDAR with extensive range. Lightweight 2D convolutions on dense range images result in significantly fewer selected foreground points, enabling the later sparse convolutions in RSN to operate efficiently. RSN runs at more than 60 frames per second on a 150 m × 150 m detection region on the Waymo Open Dataset (WOD) while being more accurate than previously published detectors.

@InProceedings{Sun_2021_CVPR,
    author    = {Sun, Pei and Wang, Weiyue and Chai, Yuning and Elsayed, Gamaleldin and Bewley, Alex and Zhang, Xiao and Sminchisescu, Cristian and Anguelov, Dragomir},
    title     = {RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {5725-5734}
}

Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection

A novel 3D object detection framework that processes LiDAR data directly on its native range image representation. To overcome scale sensitivity in this perspective view, a range-conditioned dilation (RCD) layer is proposed to dynamically adjust a continuous dilation rate as a function of the measured range. Unparalleled performance is achieved at long-range detection when combined with a second-stage refinement.

@inproceedings{bewley2020range,
  title={Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection},
  author={Bewley, Alex and Sun, Pei and Mensink, Thomas and Anguelov, Dragomir and Sminchisescu, Cristian},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2020}
}

Large Scale Outdoor Scene Reconstruction and Correction with Vision

The BOR²G system developed at the Oxford Robotics Institute fuses data from multiple sensor modalities (cameras, lidars, or both) and regularizes the resulting 3D model. We use a compressed 3D data structure which allows us to operate over a large scale. A learned correction mechanism then takes the global context of the reconstruction and adjusts the constructed mesh, addressing pathological errors.

@article{tanner2020large,   
    title={Large-scale outdoor scene reconstruction and correction with vision},   
    author={Tanner, Michael and Pini{\'e}s, Pedro and Paz, Lina Mar{\'\i}a and
      S{\u{a}}ftescu, {\c{S}}tefan and Bewley, Alex and Jonasson, Emil and 
      Newman, Paul}, 
    journal={The International Journal of Robotics Research},   
    pages={0278364920937052},   
    year={2020},
    publisher={SAGE Publications Sage UK: London, England} 
}

Learning to Drive from Simulation without Real World Labels

A method for transferring a vision-based lane following driving policy from simulation to operation on a rural road without any real-world labels. Our approach leverages recent advances in image-to-image translation to achieve domain transfer while jointly learning a single-camera control policy from simulation control labels.

@inproceedings{bewley2018learning,
  author={Bewley, Alex and Rigley, Jessica and Liu, Yuxuan and Hawke, Jeffrey and 
        Shen, Richard and Lam, Vinh-Dieu and Kendall, Alex},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)}, 
  title={Learning to Drive from Simulation without Real World Labels}, 
  year={2019},
  pages={4818-4824},
  doi={10.1109/ICRA.2019.8793668}
}

Dropout Distillation for Efficiently Estimating Model Confidence

An efficient way to output better calibrated uncertainty scores from neural networks. The Distilled Dropout Network makes standard (non-Bayesian) neural networks more introspective by adding a new training loss.

@article{gurau2018dropout,
  title={Dropout distillation for efficiently estimating model confidence},
  author={Gurau, Corina and Bewley, Alex and Posner, Ingmar},
  journal={arXiv preprint arXiv:1809.10562},
  year={2018}
}

Learning to Drive in a Day with Deep Reinforcement Learning

This work demonstrates model-free deep reinforcement learning on an autonomous car in the real world. With a handful of exploration and optimisation steps performed on the single onboard NVIDIA DRIVE PX2, our model-free algorithm learnt to follow its lane without any prior map.

@inproceedings{kendall2018learning,
  author={A. {Kendall} and J. {Hawke} and D. {Janz} and P. {Mazur} and D. {Reda} and 
      J. {Allen} and V. {Lam} and A. {Bewley} and A. {Shah}},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)}, 
  title={Learning to Drive in a Day}, 
  year={2019},
  pages={8248-8254},
  doi={10.1109/ICRA.2019.8793742}
}

Neural Stethoscopes: Unifying Analytic, Auxiliary and Adversarial Network Probing

This work unifies auxiliary tasks, adversarial information removal and side tasks analysis with a single multi-task learning framework we call neural stethoscopes. Neural stethoscopes are then used to interrogate specific visual cues a network learns in the context of intuitive physics. Furthermore, we are able to actively de-bias network predictions as well as enhance performance via suitable auxiliary and adversarial stethoscope losses.

@article{fuchs2018neural,
    title={Neural Stethoscopes: Unifying Analytic, Auxiliary and Adversarial Network Probing},
    author={Fuchs, Fabian B and Groth, Oliver and Kosiorek, Adam R and 
      Bewley, Alex and Wulfmeier, Markus and Vedaldi, Andrea and Posner, Ingmar},
    journal={arXiv preprint arXiv:1806.05502},
    year={2018}
}

Deep Cosine Metric Learning for Person Re-Identification

This work presents a method for learning a feature embedding where the cosine similarity is effectively optimised through a simple re-parametrization of the conventional softmax classification regime. At test time, the final classification layer can be stripped off the network, facilitating nearest neighbour queries on unseen individuals using the cosine similarity metric.
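
The idea can be illustrated with a small pure-Python sketch. It is one plausible reading of the summary above, not the paper's exact formulation: features and class weight vectors are L2-normalised so that the softmax logits become scaled cosine similarities, and at query time only the embeddings are compared. The function names and the scale `kappa` (a fixed value here) are assumptions for illustration, and all vectors are assumed to be non-zero.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def cosine_softmax(feature, class_weights, kappa=10.0):
    """Class probabilities whose logits are kappa-scaled cosine similarities.
    kappa is an assumed scale parameter for this sketch."""
    logits = [kappa * cosine_similarity(feature, w) for w in class_weights]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nearest_neighbour(query, gallery):
    """Test-time use without the classifier: return the index of the gallery
    embedding with the highest cosine similarity to the query."""
    sims = [cosine_similarity(query, g) for g in gallery]
    return max(range(len(sims)), key=sims.__getitem__)
```

Since only directions matter, rescaling a feature vector leaves both the class probabilities and the nearest-neighbour result unchanged, which is why the same embedding can be reused for identities that were never seen during training.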

  @inproceedings{wojke2018deep,
      title = {Deep Cosine Metric Learning for Person Re-Identification},
      author = {Wojke, Nicolai and Bewley, Alex},
      booktitle = {IEEE Winter Conference on Applications of Computer Vision},
      year      = {2018}
  }

Incremental Adversarial Domain Adaptation for Continually Changing Environments

Continuous appearance shifts such as changes in weather and lighting conditions can impact the performance of deployed machine learning models. Unsupervised domain adaptation aims to address this challenge, though current approaches do not utilise the continuity of the occurring shifts. This work presents an adversarial approach for lifelong, incremental domain adaptation which benefits from unsupervised alignment to a series of sub-domains which successively diverge from the labelled source domain.

@inproceedings{wulfmeier2017incremental,
  title={Incremental Adversarial Domain Adaptation for Continually Changing Environments},
  author={Wulfmeier, Markus and Bewley, Alex and Posner, Ingmar},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2018}
}

Meshed Up: Learnt Error Correction in 3D Reconstructions

Dense reconstructions often contain errors that prior work has so far minimised by using high-quality sensors and regularising the output. Nevertheless, errors still persist. This paper proposes a machine learning technique to identify errors in three-dimensional (3D) meshes. Beyond simply identifying errors, our method quantifies both the magnitude and the direction of depth estimate errors when viewing the scene.

@inproceedings{tanner2018meshed,
  title={Meshed Up: Learnt Error Correction in 3D Reconstructions},
  author={Tanner, Michael and Saftescu, Stefan and Bewley, Alex and Newman, Paul},
  booktitle={International Conference on Robotics and Automation (ICRA)},
  year={2018}
}

Hierarchical Attentive Recurrent Tracking

Inspired by how the human visual cortex employs spatial attention and separate “where” and “what” processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single object tracking in videos.

@inproceedings{Kosiorek2017hierarchical,
   title = {Hierarchical Attentive Recurrent Tracking},
   author = {Kosiorek, Adam R and Bewley, Alex and Posner, Ingmar},
   booktitle = {Neural Information Processing Systems},
   url = {http://www.robots.ox.ac.uk/~mobile/Papers/2017NIPS_AdamKosiorek.pdf},
   pdf = {http://www.robots.ox.ac.uk/~mobile/Papers/2017NIPS_AdamKosiorek.pdf},
   year = {2017},
   month = {December}
}

DeepSORT: Simple Online and Realtime Tracking with a Deep Association Metric

Building on the success of the SORT tracking framework, this work extends the location based tracker with appearance based association optimised via metric learning on a deep neural network.

@inproceedings{Wojke2017simple,
  title={Simple Online and Realtime Tracking with a Deep Association Metric},
  author={Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich},
  booktitle={2017 IEEE International Conference on Image Processing (ICIP)},
  year={2017},
  pages={3645--3649}
}

Addressing Appearance Change in Outdoor Robotics with Adversarial Domain Adaptation

Appearance changes due to weather and seasonal conditions represent a strong impediment to the robust implementation of machine learning systems in outdoor robotics. This work develops a framework for applying adversarial techniques to adapt popular, state-of-the-art network architectures with the additional objective to be invariant across conditions.

@inproceedings{wulfmeier2017addressing,
  title={Addressing Appearance Change in Outdoor Robotics with Adversarial Domain Adaptation},
  author={Wulfmeier, Markus and Bewley, Alex and Posner, Ingmar},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
  year={2017}
}

What Makes a Place? Building Bespoke Place Dependent Object Detectors for Robotics

This paper is about enabling robots to improve their perceptual performance through repeated use in their operating environment, creating local expert detectors fitted to the places through which a robot moves.

@inproceedings{hawke2017makes,
  title={What Makes a Place? Building Bespoke Place Dependent Object Detectors for Robotics},
  author={Hawke, Jeffrey and Bewley, Alex and Posner, Ingmar},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems},
  year={2017}
}

Vision based Detection and Tracking in Dynamic Environments with Minimal Supervision

My PhD thesis, in thesis-by-publication format, composed mainly of papers completed between 2013 and 2016. Submitted in late 2016, accepted in 2017, and finally published publicly in 2018.

  @phdthesis{bewleyphdthesis,
      author = {Alex J. Bewley},
      title = {Vision based detection and tracking in dynamic environments with minimal supervision},
      school = {Queensland University of Technology},
      year = {2018},
      doi = {10.5204/thesis.eprints.116014},
      url = {https://eprints.qut.edu.au/116014/},
      pdf = {http://alex.bewley.ai/papers/BewleyThesisPhD.pdf}
  }

SORT: Simple Online and Realtime Tracking

This work presents a fast, yet simple, technique for updating trajectory estimates within an online multiple object tracking framework. Furthermore, the impact of detection quality on tracking is highlighted by achieving state-of-the-art performance on a recent tracking benchmark.

@inproceedings{Bewley2016_sort,
  author={Bewley, Alex and Ge, Zongyuan and Ott, Lionel and Ramos, Fabio and Upcroft, Ben},
  booktitle={2016 IEEE International Conference on Image Processing (ICIP)},
  title={Simple online and realtime tracking},
  year={2016},
  pages={3464-3468},
  doi={10.1109/ICIP.2016.7533003}
}

Background Modelling with Applications to Visual Object Detection in an Open Pit Mine

This work investigates the use of appearance based object detection in an open pit mine. Various forms of background modelling techniques are explored for adapting a pretrained detector to the novel environment.

    @article {Bewley2017JFR,
      author = {Bewley, Alex and Upcroft, Ben},
      title = {Background Appearance Modeling with Applications to Visual Object Detection in an Open-Pit Mine},
      journal = {Journal of Field Robotics},
      volume = {34},
      number = {1},
      issn = {1556-4967},
      url = {http://dx.doi.org/10.1002/rob.21667},
      doi = {10.1002/rob.21667},
      pages = {53--73},
      year = {2017},
    }

ALExTRAC: Affinity Learning by Exploring Temporal Reinforcement within Association Chains

This paper presents a self-supervised approach for learning to associate object detections in a video sequence as often required in tracking-by-detection systems.

  @inproceedings{Bewley2016_alextrac,
  author = {Bewley, Alex and Ott, Lionel and Ramos, Fabio and Upcroft, Ben},
  booktitle = {International Conference on Robotics and Automation (ICRA)},
  title = {{ALExTRAC: Affinity Learning by Exploring Temporal Reinforcement within Association Chains}},
  year = {2016}
  }

Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks

A novel deep convolutional neural network (DCNN) architecture is proposed for fine-grained image classification. This architecture, called MixDCNN, combines the output of several DCNNs within a mixture model framework and is shown to outperform other methods.

  @inproceedings{GeWACV2016,
  author    = {ZongYuan Ge and Alex Bewley and Christopher McCool and Ben Upcroft and Peter Corke and Conrad Sanderson},
  title     = {Fine-Grained Classification via Mixture of Deep Convolutional Neural Networks},
  booktitle = {Winter Conference on the Applications of Computer Vision (WACV)},
  publisher = {IEEE},
  year      = {2016}
  }

From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision

A background modeling approach to reducing the false positive rate of a pre-trained object detector for use in an open-pit mining environment.

@inproceedings{BewleyFSR2015,
  author    = {Alex Bewley and
               Ben Upcroft},
  title     = {From ImageNet to Mining: Adapting Visual Object Detection with Minimal Supervision},
  booktitle = {{F}ield and {S}ervice {R}obotics ({FSR})},
  year      = {2015}
}

Fine-Grained Bird Species Recognition via Hierarchical Subset Learning

This paper presents a novel method to improve fine-grained classification based on hierarchical subset learning. First a similarity tree is formed where classes with strong visual correlations are grouped into subsets. An expert local classifier with strong discriminative power to distinguish visually similar classes is then learnt for each subset.

@inproceedings{Ge_MSBCC_15,
  author = {Ge, ZongYuan and McCool, Christopher and Sanderson, Conrad
          and Bewley, Alex and Chen, Zetao and Corke, Peter},
  title = {{Fine-Grained} Bird Species Recognition via Hierarchical Subset Learning},
  booktitle = {{IEEE} International Conference on Image Processing},
  month = {sep},
  year = {2015}
}

Online Self-Supervised Multi-Instance Segmentation of Dynamic Objects

A training-free method for detecting and tracking moving objects is presented and evaluated with video footage from a moving camera.

@inproceedings{bewley2014online,
  title={Online self-supervised multi-instance segmentation of dynamic objects},
  author={Bewley, Alex and Guizilini, Vitor and Ramos, Fabio and Upcroft, Ben},
  booktitle={Robotics and Automation (ICRA), 2014 IEEE International Conference on},
  pages={1296--1303},
  year={2014},
  organization={IEEE}
}

Advantages of Exploiting Projection Structure for Segmenting Dense 3D Point Clouds

A simple, yet efficient method for finding nearest neighbours in projected 3D point clouds is presented with applications towards object segmentation.

@inproceedings{bewley2013advantages,
  title={Advantages of exploiting projection structure for segmenting dense 3D point clouds},
  author={Bewley, Alex and Upcroft, Ben},
  booktitle={Australian Conference on Robotics and Automation},
  year={2013}
}

Development of a Dragline In-Bucket Bulk Density Monitor

This paper details the implementation and trialling of a prototype in-bucket bulk density monitor on a production dragline.

@article{bewley2011development,
  title={Development of a dragline in-bucket bulk density monitor},
  author={Bewley, Alex and Shekhar, Rajiv and Upcroft, Ben and Lever, Paul},
  year={2011},
  publisher={CRC Mining}
}

Real-Time Volume Estimation of a Dragline Payload

This paper presents a method for measuring the in-bucket payload volume on a dragline excavator for the purpose of estimating material bulk density in real-time.

@inproceedings{bewley2011real,
  title={Real-time volume estimation of a dragline payload},
  author={Bewley, Alex and Shekhar, Rajiv and Leonard, Sam and Upcroft, Ben and Lever, Paul},
  booktitle={Robotics and Automation (ICRA), 2011 IEEE International Conference on},
  pages={1571--1576},
  year={2011},
  organization={IEEE}
}