The TensorFlow object_detection module

Change the current directory to the models folder (run everything from this path; otherwise you will hit a great many errors).

The version used here is the commit "Fix ML Engine Dashboard link (#1599)".

Installation

The official steps, adapted where necessary.

Add Libraries to PYTHONPATH
On Linux, add the slim folder to the PYTHONPATH environment variable:

# From tensorflow/models/
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Adding the slim folder directly to the Python path is the recommended approach; for PyCharm, see http://blog.csdn.net/wh357589873/article/details/53204024
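
Where `export` is not available (for example in a Windows console), the same search path can be set up from inside a script before importing object_detection. A minimal sketch, assuming you pass in the location of your own models checkout; the helper name `add_models_to_path` is hypothetical and not part of the repository:

```python
import os
import sys


def add_models_to_path(models_dir):
    """Put models/ and models/slim on sys.path (hypothetical helper).

    Returns the list of entries that were actually added.
    """
    models_dir = os.path.abspath(models_dir)
    added = []
    for path in (models_dir, os.path.join(models_dir, "slim")):
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

Call it once at the top of the entry script, e.g. `add_models_to_path(".")` when the script is started from the models folder. Unlike the environment variable, this only affects the current Python process.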

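Alternatively, make the following substitutions (not recommended):

In trainer.py, change from deployment import model_deploy to from slim.deployment import model_deploy (deployment is under the slim folder).
In the *_feature_extractor.py files under object_detection\models, change from nets to from slim.nets (nets is under the slim folder).
In inception_utils.py and resnet_utils.py under slim/nets, change from nets to from slim.nets (nets is under the slim folder).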

Protobuf Compilation

# From tensorflow/models/
protoc object_detection/protos/*.proto --python_out=.

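This uses protoc to generate the pb2 file for every .proto in the protos folder.

To generate only string_int_label_map_pb2.py with protoc:

protoc object_detection/protos/string_int_label_map.proto --python_out=.

Note: protoc is not included in the repository and must be downloaded separately. Use version 3.0 or later so that the generated code works with Python 3.

Download: http://repo1.maven.org/maven2/com/google/protobuf/protoc/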
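
On shells that do not expand the `*.proto` wildcard, the file list can be built in Python and handed to protoc explicitly. The sketch below assumes protoc is on the PATH and that it is run from the models folder, as in the command above; the function names are hypothetical:

```python
import glob
import os
import subprocess


def build_protoc_command(models_dir, protoc="protoc"):
    """Return an argv list equivalent to the wildcard shell command."""
    pattern = os.path.join(models_dir, "object_detection", "protos", "*.proto")
    # Paths are made relative to models_dir because protoc runs from there.
    protos = sorted(os.path.relpath(p, models_dir).replace(os.sep, "/")
                    for p in glob.glob(pattern))
    return [protoc] + protos + ["--python_out=."]


def compile_protos(models_dir, protoc="protoc"):
    """Run protoc from models_dir; raises CalledProcessError on failure."""
    subprocess.check_call(build_protoc_command(models_dir, protoc),
                          cwd=models_dir)
```

`compile_protos(".")` would then generate all pb2 files in one call. Only `build_protoc_command` is free of side effects, so it is the part worth checking first by printing its result.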

from object_detection.protos import input_reader_pb2
from object_detection.protos import model_pb2
from object_detection.protos import pipeline_pb2
from object_detection.protos import train_pb2

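Correspondingly, these modules are generated in the protos folder. Generating model_pb2 requires ssd_pb2 and faster_rcnn_pb2, and generating ssd_pb2 in turn requires the pb2 files below. (The messages below are what protoc reports when run from inside the object_detection folder; they also show which files ssd_pb2.py and faster_rcnn_pb2.py depend on.)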

object_detection/protos/anchor_generator.proto: File not found.
object_detection/protos/box_coder.proto: File not found.
object_detection/protos/box_predictor.proto: File not found.
object_detection/protos/hyperparams.proto: File not found.
object_detection/protos/image_resizer.proto: File not found.
object_detection/protos/matcher.proto: File not found.
object_detection/protos/losses.proto: File not found.
object_detection/protos/post_processing.proto: File not found.
object_detection/protos/region_similarity_calculator.proto: File not found.

Generating faster_rcnn_pb2 requires the following pb2 files:

object_detection/protos/anchor_generator.proto: File not found.
object_detection/protos/box_predictor.proto: File not found.
object_detection/protos/hyperparams.proto: File not found.
object_detection/protos/image_resizer.proto: File not found.
object_detection/protos/losses.proto: File not found.
object_detection/protos/post_processing.proto: File not found.

Generate the pb2 files above one by one.

anchor_generator_pb2.py depends on grid_anchor_generator_pb2 and ssd_anchor_generator_pb2

box_coder_pb2.py depends on faster_rcnn_box_coder_pb2, mean_stddev_box_coder_pb2 and square_box_coder_pb2

matcher_pb2.py depends on argmax_matcher_pb2 and bipartite_matcher_pb2

pipeline_pb2.py depends on eval_pb2

train_pb2.py depends on optimizer_pb2

In short, generate the pb2 file for every .proto in the protos folder.
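
To confirm that nothing was skipped, one can compare the .proto files against the generated modules. A minimal sketch, assuming protoc writes `name_pb2.py` next to `name.proto` (which is what `--python_out=.` does when run from the models folder); the function name is hypothetical:

```python
import os


def missing_pb2_modules(protos_dir):
    """List .proto files in protos_dir that have no *_pb2.py beside them."""
    names = os.listdir(protos_dir)
    present = set(names)
    missing = []
    for name in sorted(names):
        base, ext = os.path.splitext(name)
        if ext == ".proto" and base + "_pb2.py" not in present:
            missing.append(name)
    return missing
```

An empty list from `missing_pb2_modules("object_detection/protos")` means every .proto has a matching module.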

Testing the Installation

python object_detection/builders/model_builder_test.py

Running model_builder_test.py gives the following result:

----------------------------------------------------------------------
Ran 6 tests in 0.003s
OK

Configuring an object detection pipeline

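Overview

The configuration file has five parts:

model configuration: defines what type of model will be trained (e.g. meta-architecture, feature extractor).
train_config: decides which parameters are used to train the model parameters (e.g. SGD parameters, input preprocessing and feature extractor initialization values).
eval_config: determines which metrics are reported for evaluation (currently only the PASCAL VOC metrics are supported).
train_input_config: defines which dataset the model is trained on.
eval_input_config: defines which dataset the model is evaluated on. Typically this should differ from the training input dataset.

The skeleton of the configuration file is as follows: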

model {
(... Add model config here...)
}

train_config : {
(... Add train_config here...)
}

train_input_reader: {
(... Add train_input configuration here...)
}

eval_config: {
}

eval_input_reader: {
(... Add eval_input configuration here...)
}
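
A quick way to see whether a config has all five top-level blocks is to scan it for names opened at brace depth 0. This is only a rough line-based sketch, not a protobuf text-format parser (it ignores `#` inside quoted strings, for instance); the function names are hypothetical:

```python
import re

REQUIRED_SECTIONS = ("model", "train_config", "train_input_reader",
                     "eval_config", "eval_input_reader")


def top_level_sections(config_text):
    """Return the names of blocks opened at brace depth 0."""
    sections = []
    depth = 0
    for line in config_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        match = re.match(r"\s*(\w+)\s*:?\s*\{", line)
        if match and depth == 0:
            sections.append(match.group(1))
        depth += line.count("{") - line.count("}")
    return sections


def missing_sections(config_text):
    """Return the required section names that the config does not define."""
    found = set(top_level_sections(config_text))
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

Because the scan tracks brace depth, it also tends to expose unbalanced braces: a block that was never closed swallows every section after it, and those then show up as missing.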

See the config files in the samples/configs folder for reference, for example:

# Faster R-CNN with Resnet-101 (v1), configured for Pascal VOC Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  faster_rcnn {
    num_classes: 20
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet101'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0001
          schedule {
            step: 0
            learning_rate: .0001
          }
          schedule {
            step: 500000
            learning_rate: .00001
          }
          schedule {
            step: 700000
            learning_rate: .000001
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 800000
  data_augmentation_options {
    random_horizontal_flip {}
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/pascal_voc_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/pascal_voc_label_map.pbtxt"
}

eval_config: {
  num_examples: 4952
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/pascal_voc_val.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/pascal_voc_label_map.pbtxt"
}
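
The sample's header comment says to search for "PATH_TO_BE_CONFIGURED". If the checkpoint, records and label map all live in one directory, the placeholders can be filled in one pass. A minimal sketch under that single-directory assumption; `fill_placeholders` is a hypothetical helper and the path in the usage line is made up:

```python
def fill_placeholders(config_text, base_dir,
                      placeholder="PATH_TO_BE_CONFIGURED"):
    """Replace every placeholder with base_dir.

    Returns (new_text, number_of_replacements) so the caller can verify
    that the expected number of fields was touched.
    """
    count = config_text.count(placeholder)
    base_dir = base_dir.rstrip("/")
    return config_text.replace(placeholder, base_dir), count
```

For the sample above, `fill_placeholders(text, "F:/data/voc")` should report 5 replacements: one checkpoint, two record files and two label map paths. If the files are spread over several directories, edit the fields by hand instead.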

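Model parameter initialization (pre-trained models)

Although optional, it is strongly recommended to start from an existing object classification or detection checkpoint. Training an object detector from scratch can take days. To speed up training, reuse the feature extractor parameters from a pre-existing checkpoint. train_config provides two fields for specifying one: fine_tune_checkpoint and from_detection_checkpoint. fine_tune_checkpoint should provide a path to the existing checkpoint (e.g. "/usr/home/username/checkpoint/model.ckpt-#####"). from_detection_checkpoint is a boolean; if false, the checkpoint is assumed to come from an object classification checkpoint. Note that starting from a detection checkpoint usually results in a faster training job than starting from a classification checkpoint. A list of the provided checkpoints is available in the project's detection model zoo.

Input preprocessing

The data_augmentation_options in train_config can be used to specify how the training data is modified. This field is optional.

SGD parameters

The remaining parameters in train_config are hyperparameters for gradient descent. Note that the optimal learning rates provided in these configuration files may depend on the specifics of the training setup (e.g. number of iterations, GPU type).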
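
To make the manual_step_learning_rate block in the sample config concrete, the sketch below evaluates a step schedule in plain Python. It assumes the usual reading of such a schedule (the initial rate applies until the first boundary, then each entry's rate applies from its step onward); it illustrates the config values and is not the library's implementation:

```python
def manual_step_learning_rate(step, initial_learning_rate, schedule):
    """Return the learning rate at `step`.

    schedule is a list of (boundary_step, learning_rate) pairs, as in the
    `schedule { step: ... learning_rate: ... }` entries of the config.
    """
    rate = initial_learning_rate
    for boundary, boundary_rate in sorted(schedule):
        if step >= boundary:
            rate = boundary_rate
    return rate


# Values copied from the Faster R-CNN / PASCAL VOC sample config.
VOC_SCHEDULE = [(0, 1e-4), (500000, 1e-5), (700000, 1e-6)]
```

With these values the rate stays at 0.0001 for the first 500000 steps, drops to 0.00001 until step 700000, and stays at 0.000001 for the remainder of the 800000 configured steps.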

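Configuring the evaluator

Evaluation is currently fixed to the metrics defined by the PASCAL VOC challenge. The parameters in eval_config are set to reasonable defaults and typically do not need to be configured.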

Preparing Inputs

Generating PASCAL VOC TFRecord files

create_pascal_tf_record.py and label_map_util.py both contain an encoding/decoding error. The one in create_pascal_tf_record.py was fixed in commit #1614. In label_map_util.py, under the utils folder, change line 104 from
label_map_string = fid.read()
to
label_map_string = fid.read().decode('utf-8')

See the figure below.

Then run create_pascal_tf_record.py. Taking the record for PASCAL VOC 2012 training as an example, the arguments are:
--data_dir=F:/Database/VOC/VOCtrainval_11-May-2012/VOCdevkit --year=VOC2012 --set=train --output_path=pascal_train.record
Because the working directory must be the models folder, you can open the models folder in PyCharm and set the arguments under Run > Edit Configurations.

To generate the record needed for validation:
--data_dir=F:/Database/VOC/VOCtrainval_11-May-2012/VOCdevkit --year=VOC2012 --set=val --output_path=pascal_val.record
This produces two TFRecord files, pascal_train.record and pascal_val.record, in the tensorflow/models/object_detection directory.
The label map for the PASCAL VOC dataset can be found at data/pascal_label_map.pbtxt.

Train for VOC

Fixing the encoding issue

In the get_configs_from_pipeline_file() function in train.py, change text_format.Merge(f.read(), pipeline_config) to text_format.Merge(f.read().decode('utf-8'), pipeline_config).
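
The same `.decode('utf-8')` fix recurs in several files. Calling `.decode` unconditionally fails if `read()` already returns text, so a small guard keeps the change safe either way. A minimal sketch; `to_text` is a hypothetical helper, not something the repository provides:

```python
def to_text(data, encoding="utf-8"):
    """Return `data` as text, decoding only when it is a bytes object."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data
```

The patched call would then read `text_format.Merge(to_text(f.read()), pipeline_config)`, and the same wrapper can be applied to the other `read()` calls that this post patches.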

Python 2 to Python 3 changes

Apply the changes from the following pull requests:

https://github.com/tensorflow/models/pull/1610/files/94498b19b954b14e508b34d93badae027592cb83#diff-56e4e0ed49c1ab9a066e570d65d1f330

https://github.com/tensorflow/models/pull/1593/files

Training arguments

--logtostderr --pipeline_config_path=./my_model/ssd_inception_v2_head.config --train_dir=F:/models/bus

--logtostderr --pipeline_config_path=./my_model/faster_rcnn_resnet101_voc07.config --train_dir=F:/models/passenger_head/rfcn_resnet50

Note: for the first training run, from_detection_checkpoint: true in faster_rcnn_resnet101_voc07.config must be commented out or set to false, otherwise an error is raised; after that, the parameter can be used to resume training. If fine_tune_checkpoint is commented out as well, no pre-trained model is used.

eval

https://github.com/tensorflow/models/pull/1758/commits/e9606bc69ae9e8a401db1cf5920b24d8408b0c02

--logtostderr --eval_dir=F:/log --pipeline_config_path=./my_model/ssd_inception_v2_head.config --checkpoint_dir=F:\models\bus\model.ckpt-369

Test

This part follows object_detection_tutorial.ipynb (opened in PyCharm).

Use export_inference_graph.py to convert the trained model to a .pb model.

The arguments are as follows:

--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--checkpoint_path path/to/model-ckpt \
--inference_graph_path path/to/inference_graph.pb

--input_type image_tensor \
--pipeline_config_path ./my_model/faster_rcnn_resnet101_voc07.config \
--checkpoint_path F:\models\voc\model.ckpt-189629 \
--inference_graph_path ./my_model/inference_graph.pb

export_inference_graph.py also needs an encoding fix. Change line 93 from

text_format.Merge(f.read(), pipeline_config)

to

text_format.Merge(f.read().decode('utf-8'), pipeline_config)

Training on our own dataset

First generate the TFRecord file for our dataset. The code is as follows:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import hashlib
import io

import PIL.Image
import tensorflow as tf

from object_detection.utils import dataset_util


def read_label_file(label_file_path):
    """Read 'class c_x c_y w h' lines (normalized) and return corner boxes."""
    object = []
    with open(label_file_path) as label_file:
        raw_lines = [line.strip() for line in label_file.readlines()]
        for raw_line in raw_lines:
            if not raw_line:
                continue  # skip blank lines
            class_num, c_x, c_y, w, h = [float(e) for e in raw_line.split(" ")]
            # Convert center/size to corners and clamp them to [0, 1].
            x1 = max(c_x - w / 2, 0)
            y1 = max(c_y - h / 2, 0)
            x2 = min(c_x + w / 2, 1)
            y2 = min(c_y + h / 2, 1)
            class_num = int(class_num)
            object.append([class_num, x1, y1, x2, y2])
    return object


def main():
    image_list_path = r"F:\Database\data_set\train\train_bk.txt"
    # image_list_path = r"F:\Database\data_set\validate\val.txt"
    # Raw strings keep "\t" in these Windows paths from being read as a tab.
    writer = tf.python_io.TFRecordWriter(r"F:\tensorflow\tfrecord\train.record")
    # writer = tf.python_io.TFRecordWriter(r"F:\tensorflow\tfrecord\val.record")
    with open(image_list_path, "r") as file:
        image_list = [line.strip().split() for line in file.readlines()]
    for img_path in image_list:
        with tf.gfile.GFile(img_path[0], 'rb') as fid:
            encoded_jpg = fid.read()
        encoded_jpg_io = io.BytesIO(encoded_jpg)
        image = PIL.Image.open(encoded_jpg_io)
        if image.format != 'JPEG':
            raise ValueError('Image format not JPEG')
        key = hashlib.sha256(encoded_jpg).hexdigest()
        width = image.width
        height = image.height
        # The label file mirrors the image path: images -> labels, jpg -> txt.
        label_path = img_path[0].replace("images", "labels").replace("jpg", "txt")
        object = read_label_file(label_path)

        # Start with empty lists for every image; otherwise the boxes of
        # earlier images would be written into every later example.
        xmin = []
        ymin = []
        xmax = []
        ymax = []
        classes = []
        classes_text = []
        truncated = []
        poses = []
        difficult_obj = []
        for obj_num in range(0, len(object)):
            xmin.append(object[obj_num][1])
            ymin.append(object[obj_num][2])
            xmax.append(object[obj_num][3])
            ymax.append(object[obj_num][4])
            classes_text.append('head'.encode('utf8'))
            classes.append(object[obj_num][0] + 1)  # class ids start at 1
            difficult_obj.append(0)
            truncated.append(1)
            poses.append('Unspecified'.encode('utf8'))

        filename = img_path[0].strip().split('/')[-1].encode('utf8')
        example = tf.train.Example(features=tf.train.Features(feature={
            'image/height': dataset_util.int64_feature(height),
            'image/width': dataset_util.int64_feature(width),
            'image/filename': dataset_util.bytes_feature(filename),
            'image/source_id': dataset_util.bytes_feature(filename),
            'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
            'image/encoded': dataset_util.bytes_feature(encoded_jpg),
            'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),
            'image/object/bbox/xmin': dataset_util.float_list_feature(xmin),
            'image/object/bbox/xmax': dataset_util.float_list_feature(xmax),
            'image/object/bbox/ymin': dataset_util.float_list_feature(ymin),
            'image/object/bbox/ymax': dataset_util.float_list_feature(ymax),
            'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
            'image/object/class/label': dataset_util.int64_list_feature(classes),
            'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
            'image/object/truncated': dataset_util.int64_list_feature(truncated),
            'image/object/view': dataset_util.bytes_list_feature(poses),
        }))
        writer.write(example.SerializeToString())
    writer.close()


if __name__ == '__main__':
    main()

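Note: class ids must start at 1. Our annotations use class 0, hence the + 1 in classes.append(object[obj_num][0] + 1). Also, compared with the contents of the VOC TFRecord files, our dataset has none of the other attributes, so they are all set to the same fixed values (taken from the output of the first VOC tf_example).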
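
The box conversion and the + 1 shift can be checked on a single annotation line without TensorFlow. The sketch below assumes the same label format that read_label_file expects (class, center x, center y, width, height, all normalized to [0, 1]); `parse_label_line` is a hypothetical helper written for this illustration:

```python
def parse_label_line(line):
    """Convert 'class c_x c_y w h' to (label, xmin, ymin, xmax, ymax)."""
    class_num, c_x, c_y, w, h = [float(e) for e in line.split()]
    # Center/size to corners, clamped to the image bounds.
    xmin = max(c_x - w / 2, 0.0)
    ymin = max(c_y - h / 2, 0.0)
    xmax = min(c_x + w / 2, 1.0)
    ymax = min(c_y + h / 2, 1.0)
    # Annotation classes start at 0; TFRecord label ids start at 1.
    return int(class_num) + 1, xmin, ymin, xmax, ymax


print(parse_label_line("0 0.5 0.5 0.5 0.25"))
# (1, 0.25, 0.375, 0.75, 0.625)
```

A box that extends past the image edge is clamped: for "0 0.125 0.5 0.5 0.5" the left edge would be -0.125 and comes out as 0.0.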