Risk-Aware Planning and Assignment for Ground Vehicles

using Uncertain Perception from Aerial Vehicles

This webpage provides information about the dataset used in our work on Risk-Aware Planning and Assignment for Ground Vehicles using Uncertain Perception from Aerial Vehicles (IROS 2020). The dataset contains overhead images captured by a drone in the CityEnviron environment in AirSim. This webpage also serves as the datasheet for our dataset.

Download Data

Explore Data

If you use our data, please cite us as follows (the entry type and citation key are reconstructed here; the field values are as published):

@inproceedings{sharma2020riskaware,
  author    = {Vishnu Sharma and Maymoonah Toubeh and Lifeng Zhou and Pratap Tokekar},
  booktitle = {Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title     = {Risk-Aware Planning and Assignment for Ground Vehicles using Uncertain Perception from Aerial Vehicles},
  year      = {2020},
}


Motivation

Composition

Collection Process

Processing/Cleaning/Labeling

Uses

Distribution

Maintenance


Motivation

For what purpose was the dataset created?

This dataset was created to train and test a semantic segmentation model over aerial images (view perpendicular to the ground).

Who created this dataset and on behalf of which entity?

This dataset was created by Vishnu D. Sharma, who is associated with the Robotics Algorithms and Autonomous Systems (RAAS) Lab at Virginia Tech and the University of Maryland, College Park. The dataset was generated in 2020.


Composition

What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?

The dataset contains scene images and segmentation images collected at z = -200 (in AirSim's NED coordinates, i.e., 200 m above the ground) from the CityEnviron environment of AirSim.

How many instances are there in total (of each type, if appropriate)?

There are a total of 480 images: 300 in the training set, and 90 images each in the validation and test sets.

Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set? 

The dataset is complete in itself.  

What data does each instance consist of?

Each instance consists of two parts: a scene image, i.e., the RGB image captured by the downward-facing camera of the drone, and the corresponding annotation image with 12 classes.

Is there a label or target associated with each instance?

Yes. The labels (0-11) are: sky, building, column_pole, road, sidewalk, tree, sign, fence, car, pedestrian, bicyclist, and unknown/rest.

The mapping of AirSim objects to these labels is available at the following link: https://github.com/raaslab/airsim_scripts/blob/master/risk_aware_data/city_park.py
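Assuming the numeric labels follow the order listed above (an assumption; the authoritative mapping is defined in the linked script), the index-to-class mapping can be sketched as:

```python
# Hypothetical index-to-class mapping for the 12 annotation labels (0-11),
# assuming the numeric order follows the list above; the authoritative
# mapping lives in the linked city_park.py script.
LABELS = [
    "sky", "building", "column_pole", "road", "sidewalk", "tree",
    "sign", "fence", "car", "pedestrian", "bicyclist", "unknown/rest",
]

def class_name(label_id: int) -> str:
    """Return the class name for an annotation label id (0-11)."""
    if not 0 <= label_id < len(LABELS):
        raise ValueError(f"label id out of range: {label_id}")
    return LABELS[label_id]
```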

Is any information missing from individual instances?


Are relationships between individual instances made explicit (e.g., users’ movie ratings, social network links)? 


Are there recommended data splits (e.g., training, development/validation, testing)?

The images are further divided into training, validation, and test sets. There are two variations of these splits: 'shuffled', where the images are shuffled before splitting, and 'unshuffled', where the images are not shuffled, so the training, validation, and test sets have different distributions of images (the training and validation sets contain images from urban areas, while the test set contains images from suburban areas).
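The two split variants described above can be sketched as follows; the 300/90/90 counts come from this datasheet, while the file-naming scheme and the shuffle seed are illustrative assumptions:

```python
import random

def make_splits(filenames, shuffled=False, seed=0):
    """Split a list of 480 image filenames into 300/90/90
    train/val/test sets. In the 'shuffled' variant the list is
    shuffled before splitting; in the 'unshuffled' variant the
    original order is kept, so the splits cover different areas
    of the environment."""
    files = list(filenames)
    if shuffled:
        random.Random(seed).shuffle(files)  # seed is an illustrative choice
    return {
        "train": files[:300],
        "val": files[300:390],
        "test": files[390:480],
    }

# Illustrative filenames; the real dataset uses its own naming scheme.
names = [f"scene_{i:03d}.png" for i in range(480)]
splits = make_splits(names, shuffled=True)
```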

Are there any errors, sources of noise, or redundancies in the dataset?


Is the dataset self-contained, or does it link to or otherwise rely on external resources (e.g., websites, tweets, other datasets)? 

The data is self-contained.

Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor patient confidentiality, data that includes the content of individuals' non-public communications)?


Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety? 


Does the dataset relate to people? 


Collection Process

How was the data associated with each instance acquired? Was the data directly observable (e.g., raw text, movie ratings), reported by subjects (e.g., survey responses), or indirectly inferred/derived from other data (e.g., part-of-speech tags, model-based guesses for age or language)? 

The data was directly observable and was collected using AirSim’s image APIs.

What mechanisms or procedures were used to collect the data (e.g., hardware apparatus or sensor, manual human curation, software program, software API)? 

A Python program was used to collect the data autonomously. The code is available at the following link: https://github.com/raaslab/airsim_scripts/blob/master/risk_aware_data/city_park.py

If the dataset is a sample from a larger set, what was the sampling strategy (e.g., deterministic, probabilistic with specific sampling probabilities)? 


Who was involved in the data collection process (e.g., students, crowdworkers, contractors) and how were they compensated (e.g., how much were crowdworkers paid)?

The data collection process was not outsourced; it was carried out by the authors.

Over what timeframe was the data collected? 

The time frame is not relevant for this dataset, as the data was collected in simulation.

Were any ethical review processes conducted (e.g., by an institutional review board)? 


Does the dataset relate to people? 



Processing/Cleaning/Labeling

Was any preprocessing/cleaning/labeling of the data done (e.g., discretization or bucketing, tokenization, part-of-speech tagging, SIFT feature extraction, removal of instances, processing of missing values)?

No processing was required.

Was the “raw” data saved in addition to the preprocessed/cleaned/labeled data (e.g., to support unanticipated future uses)? 

Scene images serve as the raw data here.

Is the software used to preprocess/clean/label the instances available? 



Uses

Has the dataset been used for any tasks already?

Yes, the data has been used to train a Bayesian SegNet model, which was used in our work on risk-aware path planning and assignment.

Is there a repository that links to any or all papers or systems that use the dataset? 

Yes; this webpage links to the paper and the dataset at the top.

What (other) tasks could the dataset be used for?

The dataset can be used for training deep learning models for semantic segmentation of aerial images.

Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses?

The annotations (ground truth) contain only 12 labels, and this label set cannot be extended.

Are there tasks for which the dataset should not be used? 



Distribution

Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?


How will the dataset will be distributed (e.g., tarball on website, API, GitHub)? 

The dataset is publicly available as a zip file via UMIACS's OBJ utility.

When will the dataset be distributed?

The data was made public on UMIACS OBJ in January 2021.

Will the dataset be distributed under a copyright or other intellectual property (IP) license, and/or under applicable terms of use (ToU)? 

Use of the dataset requires citing our paper.

Have any third parties imposed IP-based or other restrictions on the data associated with the instances?


Do any export controls or other regulatory restrictions apply to the dataset or to individual instances? 


Any other comments?


Maintenance

Who will be supporting/hosting/maintaining the dataset?

RAAS Lab will maintain the dataset. Vishnu Dutt Sharma is the current point of contact.

How can the owner/curator/manager of the dataset be contacted (e.g., email address)?

The data manager can be reached at the following email address: vishnuds@umd.edu

Is there an erratum?


Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)? 

No plan yet.

If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? 


Will older versions of the dataset continue to be supported/hosted/maintained? 

No plan yet.

If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? 

Not yet.