Introduction to Data Annotation

nipun deelaka
6 min read · Nov 30, 2020

Even if you have experience in machine learning and deep learning, the term ‘data annotation’ may never have come up. This is because tabular datasets and the standard image sets used for deep learning come already annotated. The term shows up when you start a project on object detection and tracking, language detection and processing, and similar tasks.

So, what is data annotation? It is basically about adding labels to the data. As the name suggests, these labels cannot be generated from the metadata of the records; we have to add more information than that. The labeling method depends entirely on your project. This article is a detailed description of data annotation for a bounding-box-based object detection model.

Data annotation types

Here we are going to explore image annotation methods suitable for object detection.

  1. Bounding boxes: draw a rectangle that covers each selected object. There are two commonly used types.
    (x1, y1, x2, y2) -> In this format we define only the coordinates of two opposite corners of the box. In image coordinates, where the origin is the top-left pixel, these are usually the top-left corner (x1, y1) and the bottom-right corner (x2, y2). [KITTI format] (This is the format used by cv2 and by the Pascal VOC format.)
KITTI format

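    (x, y, w, h) -> In this format (x, y) is the center of the bounding box, and w and h are its width and height. All four values are divided by the image width and height, so they lie between 0 and 1. [normalized YOLO format] (This is the format that must be fed to the YOLO algorithm.)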

normalized YOLO format

2. Polygonal segmentation: define a polygon that closely follows the boundary of the object.

polygonal segmentation

3. Semantic segmentation: assign a label to every pixel, stating which type of object, if any, that pixel belongs to. [Used for Mask R-CNN / YOLO-COCO models]

semantic segmentation

Here, methods 2 and 3 are actually masking formats for object detection.
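The two bounding box formats described above carry the same information, so one can be converted into the other once the image size is known. The following is a minimal sketch, assuming pixel coordinates with the origin at the top-left of the image; the function names and the sample numbers are made up for illustration and do not come from any particular library.

```python
def corners_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Corner box in pixels -> normalized (x_center, y_center, w, h)."""
    # Sort so the function works whichever two opposite corners are given.
    x_min, x_max = sorted((x1, x2))
    y_min, y_max = sorted((y1, y2))
    box_w = x_max - x_min
    box_h = y_max - y_min
    return (
        (x_min + box_w / 2) / img_w,
        (y_min + box_h / 2) / img_h,
        box_w / img_w,
        box_h / img_h,
    )


def yolo_to_corners(xc, yc, w, h, img_w, img_h):
    """Normalized YOLO box -> (x_min, y_min, x_max, y_max) in pixels."""
    box_w = w * img_w
    box_h = h * img_h
    x_min = xc * img_w - box_w / 2
    y_min = yc * img_h - box_h / 2
    return (x_min, y_min, x_min + box_w, y_min + box_h)


# Hypothetical 400x500 image with one box from (100, 50) to (300, 250).
yolo_box = corners_to_yolo(100, 50, 300, 250, img_w=400, img_h=500)
print(yolo_box)  # (0.5, 0.3, 0.5, 0.4)
print(yolo_to_corners(*yolo_box, img_w=400, img_h=500))
```

Because the YOLO values are fractions of the image size, the same label stays valid if the image is resized, while the corner format has to be rescaled.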

3D bounding box: this is also a bounding box method, but it defines a 3D box with (x, y, z) coordinates. The annotation format varies a lot.

3D bounding boxing

Data annotation formats

  • COCO (Common Objects in Context): the COCO format covers 5 types of task {object detection, keypoint detection, stuff segmentation, panoptic segmentation, image captioning}.
    Here, we use a single .json file to save the annotations of the whole dataset. [You can create separate .json files for train, validation & test.]
COCO general format

Terminology description in COCO format

template for dataset info

info: This is used to distinguish separate versions of the same dataset and serves as a summary of the dataset.

template for dataset license

licenses: This lists the licenses under which the images in your newly created dataset are released [generally, Creative Commons licenses are used]. It also helps users understand the constraints on using particular data.

for more info

apply for a license

template for dataset categories

categories: This is a brief description of the classes in the dataset. We can use supercategories to work with the dataset in a more generalized manner. id must be a unique integer value.

template for dataset images

images: This holds the metadata of every image in the dataset. id must be unique across the dataset. flickr_url, coco_url & date_captured are optional.

template for annotation of objects

annotations: There must be a separate annotation for every object in an image, regardless of how many objects the same image contains.

id: unique to the object annotation.
image_id: id of the image that contains the object.
category_id: integer value related to the object class.
segmentation: useful only if you feed the data into a model with a segmentation algorithm.
There are two formats for entering an object's segmentation. A raw mask carries a value for every pixel in the original image, so it has to be stored in a compressed form.

1. RLE (Run Length Encoding): provides two main benefits,
> it stores the mask compactly & makes it easy to operate at mask level. [intersection, union, etc.]

2. polygon: for the sake of simplicity, think of it as defining the boundary of the mask as a list of coordinates. Algorithm-wise, however, it is more complex.
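To make the RLE idea concrete, here is a minimal pure-Python sketch of an uncompressed run-length encoding. It follows the two conventions COCO uses for its RLE masks: pixels are read column by column, and the first count is always a run of zeros. The function names and the tiny sample mask are assumptions made for illustration; real projects would normally rely on the mask utilities shipped with pycocotools.

```python
def rle_encode(mask):
    """Encode a binary mask (list of rows) as run lengths.

    Pixels are read column by column, and the first count is always a run
    of zeros (it is 0 if the mask starts with a 1).
    """
    height, width = len(mask), len(mask[0])
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts, current, run = [], 0, 0
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)
            current, run = value, 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle):
    """Rebuild the list-of-rows mask from its run lengths."""
    height, width = rle["size"]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between 0s and 1s
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


def rle_area(rle):
    """Mask-level operation: count foreground pixels without decoding."""
    return sum(rle["counts"][1::2])  # odd positions hold the runs of 1s


# Hypothetical 3x4 mask.
mask = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
]
rle = rle_encode(mask)
print(rle["counts"])  # [4, 1, 1, 5, 1]
print(rle_area(rle))  # 6
```

The area is computed directly from the counts, which is the kind of mask-level operation that RLE makes cheap.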

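area: number of pixels covered by the object's segmentation mask, not by its bounding box. [Tools that only produce boxes commonly store the box area here instead.]
bbox: box attributes in COCO format, [x, y, width, height], where (x, y) is the top-left corner of the box. The values are absolute pixel values, not normalized.
iscrowd: boolean value. If it is 0, the annotation covers a single object; if it is 1, the annotation covers a group of several objects, and the segmentation is stored as RLE.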
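The fields described above can be put together into a small COCO-style file. The sketch below is only an illustration under stated assumptions: the image name, class, and polygon are invented, and the area is computed geometrically with the shoelace formula, which only approximates the pixel count that mask-based tools would report.

```python
import json


def polygon_area(flat_polygon):
    """Shoelace formula on a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = flat_polygon[0::2], flat_polygon[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_bbox(flat_polygon):
    """COCO-style box [x, y, width, height] enclosing the polygon."""
    xs, ys = flat_polygon[0::2], flat_polygon[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# Hypothetical object outline: a 50x30 rectangle given as a polygon.
polygon = [10, 10, 60, 10, 60, 40, 10, 40]

dataset = {
    "info": {"description": "Toy dataset", "version": "1.0", "year": 2020},
    "licenses": [
        {
            "id": 1,
            "name": "CC BY 4.0",
            "url": "https://creativecommons.org/licenses/by/4.0/",
        }
    ],
    "categories": [{"id": 1, "name": "car", "supercategory": "vehicle"}],
    "images": [
        {"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480, "license": 1}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,      # links the annotation to images[0]
            "category_id": 1,   # links the annotation to categories[0]
            "segmentation": [polygon],
            "area": polygon_area(polygon),
            "bbox": polygon_bbox(polygon),
            "iscrowd": 0,
        }
    ],
}

print(json.dumps(dataset["annotations"][0]))
```

Writing the whole dictionary with `json.dump(dataset, f)` would give one annotation file for the dataset; a second object in the same image would simply be another entry in `annotations` with the same `image_id`.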

  • Pascal VOC [Pascal Visual Object Classes]: This format stores data specifically for object detection [rather than segmentation]. Pascal VOC saves a separate XML file for each individual image in the dataset.
The general template of a Pascal VOC file.

Terminology description in Pascal VOC format
folder: folder that contains the image. Very useful for separating the dataset into train, validation & test sets by directory.
filename: image file name; this must be a unique value.
path: path to the current image file. [It can be relative or absolute, depending on your annotation tool.]
size: image size in pixels, not size in memory. There are two common formats:
-<width> and <height> only, used for grayscale images.
-<width>, <height> and <depth> for RGB images.
segmented: the value of this property is most likely 0, because Pascal VOC focuses on bounding boxes.
object: there is a list of objects whose length equals the number of objects in the image [whether or not they belong to the same class].
name: class of the object in text format.
pose: whether the object is mainly left-oriented or right-oriented. [There may be more orientations.]
truncated: boolean value; use 1 if only a part of the object appears in the image, else 0.
difficult: boolean value; use 1 if the object is hard to identify. Objects flagged this way are treated as low-confidence labels and are usually left out of training or evaluation. [You can get more ideas from here.]
bndbox: bounding box in corner format (xmin, ymin, xmax, ymax), the same layout as the KITTI format.
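The tags listed above can be written and read with Python's standard xml.etree.ElementTree module. The sketch below is a simplified illustration: the helper names, the file name, and the box values are hypothetical, and the optional path tag is left out.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(folder, filename, width, height, depth, objects):
    """Build a Pascal VOC style annotation string for one image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    ET.SubElement(root, "segmented").text = "0"
    for obj in objects:
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["name"]
        ET.SubElement(node, "pose").text = obj.get("pose", "Unspecified")
        ET.SubElement(node, "truncated").text = str(obj.get("truncated", 0))
        ET.SubElement(node, "difficult").text = str(obj.get("difficult", 0))
        box = ET.SubElement(node, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), obj["box"]):
            ET.SubElement(box, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_voc_boxes(xml_text):
    """Return a list of (class name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for node in root.iter("object"):
        box = node.find("bndbox")
        coords = tuple(
            int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((node.find("name").text, coords))
    return boxes


# Hypothetical image with a single annotated object.
xml_text = build_voc_xml(
    "train", "0001.jpg", 640, 480, 3,
    [{"name": "car", "box": (48, 240, 195, 371)}],
)
print(read_voc_boxes(xml_text))  # [('car', (48, 240, 195, 371))]
```

Since every image has its own XML file, a dataset loader would typically call a reader like this once per file and collect the results.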

Data annotation tools

There is a variety of tools for data annotation: websites, offline applications (commercial & open source), SDKs, APIs, and even whole firms that specialize in data annotation.

commercial tools
Lionbridge AI
Amazon Mechanical Turk / MTurk
Computer Vision Annotation Tool (CVAT)
Dataturks

online tools {mostly free}
MakeSense AI
VGG image annotator
Scalabel
LabelImg

free APIs
Docker compatible — [video / image ]
Windows & Linux compatible — [ image ]
Windows, Linux & Mac compatible — [ image ]
