COCO-format (object detection)

KWCOCO looks easier to use than the COCO API.

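The format differs slightly depending on the task.
Annotations are stored as JSON.
The part of the format shared by all tasks is shown below.
To look at a real file, try downloading the 2017 val annotation file, which is the lightest one.
This explanation is easy to follow.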

{
    "info": {
        "year": int,
        "version": str,
        "description": str,
        "contributor": str,
        "url": str,
        "date_created": datetime,
    },
    "licenses": [{
        "id": int,
        "name": str,
        "url": str,
    }],
    "images": [{
        "id": int,
        "width": int,
        "height": int,
        "file_name": str,
        "license": int,
        "flickr_url": str,
        "coco_url": str,
        "date_captured": datetime,
    }],
    "annotations": [annotation],
}

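licenses
- Kept as a list so that images under different licenses can be handled
- Each license gets an id starting from 1
images
- Kept as a list so that multiple images can be handled
- How the id is chosen is unclear (probably fine as long as it is unique within the dataset?)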
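
The common structure above can be sanity-checked with a few lines of Python. This is only a minimal sketch on a tiny hand-made dataset: the file names, the license name, and the helper `check_common_format` are made up for illustration and are not part of any COCO tooling. It checks the top-level keys, checks that image ids are unique, and resolves each image's license through its id.

```python
import json

# Tiny hand-made dataset following the common layout (all values are made up).
coco = {
    "info": {"year": 2017, "version": "1.0", "description": "toy example",
             "contributor": "", "url": "", "date_created": "2017/09/01"},
    "licenses": [{"id": 1, "name": "Example License", "url": ""}],
    "images": [
        {"id": 10, "width": 640, "height": 480, "file_name": "a.jpg", "license": 1},
        {"id": 11, "width": 320, "height": 240, "file_name": "b.jpg", "license": 1},
    ],
    "annotations": [],
}


def check_common_format(data):
    """Return {image id: license name} after basic sanity checks (hypothetical helper)."""
    for key in ("info", "licenses", "images", "annotations"):
        if key not in data:
            raise KeyError(f"missing top-level key: {key}")
    image_ids = [img["id"] for img in data["images"]]
    if len(image_ids) != len(set(image_ids)):
        raise ValueError("image ids must be unique within the dataset")
    # Each image refers to its license by id, so build a lookup table first.
    license_by_id = {lic["id"]: lic["name"] for lic in data["licenses"]}
    return {img["id"]: license_by_id[img["license"]] for img in data["images"]}


# The format is plain JSON, so a round trip through the json module works as-is.
restored = json.loads(json.dumps(coco))
print(check_common_format(restored))
```

For a real file, the dict would come from `json.load` on the downloaded annotation file instead of the toy literal.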

Format for object detection

The layout is shown below; each object carries a category ID and segmentation mask information.

annotation{
  "id": int,
  "image_id": int,
  "category_id": int,
  "segmentation": RLE or [polygon],
  "area": float,
  "bbox": [x,y,width,height],
  "iscrowd": 0 or 1,
}

categories[{
  "id": int,
  "name": str,
  "supercategory": str,
}]
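
To make the fields above concrete, here is a minimal sketch that groups annotations by image, looks up category names, and converts `bbox` from `[x, y, width, height]` to corner coordinates. It assumes `x, y` is the top-left corner of the box. The records and the helper names `bbox_to_corners` and `group_by_image` are made up for illustration.

```python
from collections import defaultdict


def bbox_to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def group_by_image(annotations):
    """Map image_id -> list of the annotations that belong to that image."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann["image_id"]].append(ann)
    return dict(grouped)


# Made-up records that follow the schema above.
categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 18, "name": "dog", "supercategory": "animal"},
]
annotations = [
    {"id": 100, "image_id": 10, "category_id": 1, "iscrowd": 0, "area": 1200.0,
     "bbox": [10.0, 20.0, 30.0, 40.0],
     "segmentation": [[10, 20, 40, 20, 40, 60, 10, 60]]},
    {"id": 101, "image_id": 10, "category_id": 18, "iscrowd": 0, "area": 25.0,
     "bbox": [0.0, 0.0, 5.0, 5.0],
     "segmentation": [[0, 0, 5, 0, 5, 5, 0, 5]]},
]

# category_id is a reference into categories, so resolve names through a lookup.
name_by_id = {cat["id"]: cat["name"] for cat in categories}
for ann in group_by_image(annotations)[10]:
    print(ann["id"], name_by_id[ann["category_id"]], bbox_to_corners(ann["bbox"]))
```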

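annotations
- iscrowd
  - 0 means the annotation covers a single object
  - 1 means it covers several objects of the same category
- segmentation
  - If iscrowd=0, a list of polygon vertex coordinates enclosing the object. A single object that is split into several regions by occlusion becomes a list of such vertex lists.
  - If iscrowd=1, RLE (Run-Length Encoding) of a binary mask marking the pixels the objects occupy. The RLE comes with a size field of [height, width], so the mask appears to cover the whole image rather than just the region inside bbox. This is clearly something to decode with the COCO API rather than by hand.
- Since one image contains multiple objects (of multiple categories), each object gets its own unique id (some call this overkill, but apparently that is just how it is)
categories
- supercategory is the parent group that a category belongs to
- id starts from 1
- 2017 val has 80 categories; the ids run from 1 to 90 with gaps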
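
The two segmentation encodings can be illustrated without any library. The sketch below rests on stated assumptions: polygons are flat lists `[x1, y1, x2, y2, ...]`, and the RLE is the uncompressed form, a dict with `size` as `[height, width]` and `counts` as a list of run lengths that alternate between 0 and 1, start with a run of zeros, and walk the mask column by column. The compressed string form that pycocotools also produces is not handled here. The helper names are made up, and the shoelace area is only an approximation of the `area` field.

```python
def polygon_area(flat_xy):
    """Shoelace area of one polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return abs(twice_area) / 2.0


def decode_uncompressed_rle(rle):
    """Decode {"size": [h, w], "counts": [...]} into h rows of w values (0 or 1)."""
    h, w = rle["size"]
    flat = []
    value = 0  # runs alternate between 0 and 1, starting with zeros
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != h * w:
        raise ValueError("run lengths do not add up to height * width")
    # flat is in column-major order: index = column * h + row
    return [[flat[c * h + r] for c in range(w)] for r in range(h)]


# A 30 x 40 rectangle as a polygon, and a tiny 2 x 3 mask as RLE (made-up data).
print(polygon_area([10, 20, 40, 20, 40, 60, 10, 60]))
for row in decode_uncompressed_rle({"size": [2, 3], "counts": [1, 3, 2]}):
    print(row)
```

In practice `annToMask` from pycocotools covers both encodings, so this is mainly useful for understanding what the numbers mean.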

cocoapi

APIs are provided for Python, MATLAB, and Lua. The Lua API apparently covers only basic functionality.

pip install pycocotools
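
Once the package is installed, a first look at an annotation file could go roughly like this. It is a minimal sketch: `summarize` and `write_toy_annotations` are made-up helper names, and the toy file exists only so the example can run without downloading anything. The pycocotools calls used are `COCO(...)`, `getCatIds`, `getImgIds`, `getAnnIds`, and `loadCats`.

```python
import json
import os
import tempfile


def write_toy_annotations(path):
    """Write a one-image, one-object annotation file (made-up values)."""
    data = {
        "images": [{"id": 10, "width": 64, "height": 48, "file_name": "a.jpg"}],
        "annotations": [{
            "id": 100, "image_id": 10, "category_id": 1, "iscrowd": 0,
            "area": 1200.0, "bbox": [10.0, 5.0, 40.0, 30.0],
            "segmentation": [[10, 5, 50, 5, 50, 35, 10, 35]],
        }],
        "categories": [{"id": 1, "name": "person", "supercategory": "person"}],
    }
    with open(path, "w") as f:
        json.dump(data, f)


def summarize(annotation_path):
    """Load an annotation file with pycocotools and return basic counts."""
    # Imported here so that the rest of the example loads without pycocotools.
    from pycocotools.coco import COCO

    coco = COCO(annotation_path)        # parses the JSON and builds the indices
    cat_ids = coco.getCatIds()          # all category ids
    img_ids = coco.getImgIds()          # all image ids
    ann_ids = coco.getAnnIds(imgIds=img_ids)
    names = [cat["name"] for cat in coco.loadCats(cat_ids)]
    return {"images": len(img_ids), "annotations": len(ann_ids), "categories": names}
```

To try it, write the toy file into a temporary directory with `write_toy_annotations(path)` and call `summarize(path)`. For the real data, pass the path of the downloaded 2017 val annotation file instead.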
