
In lesson 12 I gave an overview of the different families of object detection models. Older architectures include R-CNN and Fast R-CNN. They are slow and cannot meet the needs of real-time object detection. More state-of-the-art networks such as SSD, YOLOv2 and YOLOv3 are fast while still maintaining accuracy, thanks to architectural changes that fold object localization and classification into a single pass and cut out unnecessary processing.

In this lesson we look at the architecture of SSD (Single Shot MultiBox Detector) and how it works, together with a practical example of building an SSD model for object detection.

Like most other object detection architectures, SSD takes as training input the bounding box coordinates of each object (also called the bounding box offsets) and the label of the object inside the box. What makes SSD fast is that it uses a single neural network. Its approach is to detect objects on feature maps (the 3D output of a deep CNN after the final fully connected layers are removed) of different resolutions. The model lays a grid of squares, called grid cells, over each feature map. Each square is a cell, and from the center of each cell the model defines a set of default boxes used to predict the frame that may enclose an object. At prediction time the network returns two values: a probability distribution over the labels of the object in the bounding box, and a set of coordinates called the bounding box offsets. Training is the process of adjusting the label probabilities and bounding boxes toward the ground truth values supplied to the model (labels and bounding box offsets).

In addition, the network combines many feature maps of different resolutions, which helps it detect objects of varied sizes and shapes. Unlike Faster R-CNN, SSD skips the region proposal network that proposes object regions. Instead, detection and classification are both carried out within the same network. The name itself, Single Shot MultiBox Detector, says that the model uses many boxes at different scales to locate and classify objects. Dropping the region proposal step speeds up processing many times over while accuracy is maintained. Below is a table comparing the running speed of object detection models.

Figure 1: Comparison of processing speed and accuracy of object detection models (source: Table 7, SSD: Single Shot MultiBox Detector). SSD512 (the SSD model with input image size 512 x 512 x 3) has the highest mAP, while its speed of 22 fps is close to real time.

Figure 2: How the feature maps are divided to detect objects of different sizes.

During training, SSD needs only an image and the ground truth boxes that give the bounding box positions of the objects. During detection, on each feature map we evaluate a small set of default boxes with different aspect ratios, on feature maps of different scales (for example the 8x8 and 4x4 maps in panels (b) and (c)). For each default box (the dashed boxes in the figure) we predict a probability distribution $\mathbf{c} = (c_1, c_2, \dots, c_n)$ over the classes $C = \{C_1, C_2, \dots, C_n\}$. At training time we first match default boxes to ground truth boxes so that the error measured by the localization loss is as small as possible (usually a Smooth L1 function, covered in section 2.2). We then minimize the error of the predicted label for each object detected in the default boxes through the confidence loss (usually a softmax function, covered in section 2.2).

The loss function of object detection therefore differs from that of image classification in that it has an extra localization loss for the position error of the predicted boxes relative to the ground truth boxes.

That is the general working principle of SSD. The specific layer architecture and loss function of SSD are covered below.

Figure 3: Architecture diagram of the SSD network.

SSD relies on the forward pass of a standard architecture (for example VGG16) to produce a 3-dimensional block of feature maps at an early stage. We call this part of the network the base network (from the input image to Conv7 in Figure 3). We then add further layers after the base network to perform detection, shown as the Extra Feature Layers in the diagram. These layers can be summarized as follows:

That concludes the layer architecture of the SSD network. The training process and the loss function of SSD, however, are still a mystery. How does the SSD loss differ from that of image classification algorithms? Which losses does the optimization need to account for? The next part answers these questions.

Default box mapping strategy: During training we need to map the default boxes, which have different aspect ratios, to the ground truth boxes. To do so we measure the IoU (Intersection over Union), also called the Jaccard overlap index, which is the ratio of the area where two regions intersect to the area of their union (the overlap counted only once). We match a default box to any ground truth box with which its IoU is greater than 0.5.
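The matching rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: boxes are given in corner format `(xmin, ymin, xmax, ymax)`, and the helper names `iou` and `match_default_boxes` are hypothetical, not part of any SSD library.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # overlap counted only once
    return inter / union if union > 0 else 0.0


def match_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """Return (default_index, gt_index) pairs whose IoU exceeds the threshold."""
    matches = []
    for i, d in enumerate(default_boxes):
        for j, g in enumerate(gt_boxes):
            if iou(d, g) > threshold:
                matches.append((i, j))
    return matches


# Hypothetical example data
default_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
gt_boxes = [(0, 0, 10, 12)]
matches = match_default_boxes(default_boxes, gt_boxes)
```

Only the first default box overlaps the ground truth box strongly enough to be matched. The other two fall below the 0.5 threshold and would be treated as negatives.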

As we know, each cell is assigned only a fixed number (4 or 6, depending on the feature map) of default bounding boxes. Are these default bounding boxes fixed in advance through aspect ratio and scale, or chosen at random? In practice most of them are fixed in advance, to limit the number of boxes per cell while still covering most objects. The set of boxes must be chosen so that any ground truth box can find a default bounding box close to it. One way to do this is to run K-means clustering on the aspect ratios of the ground truth boxes, grouping them into clusters of similar shape. The cluster centers (centroids) are then used as the representative aspect ratios for computing the default bounding boxes. I hope this is clear. It is not too complicated, is it?
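As a rough illustration of the clustering idea, the sketch below runs a one-dimensional K-means on the aspect ratios of a few made-up ground truth boxes. The function `kmeans_1d`, its initialization scheme and the sample data are all assumptions made for this example. It is not how any particular SSD implementation picks its aspect ratios.

```python
def kmeans_1d(values, k, n_iter=50):
    """Cluster scalar values into k groups and return the centroids."""
    values = sorted(values)
    # Deterministic initialization: evenly spaced picks from the sorted values
    centroids = [values[int((i + 0.5) * len(values) / k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# Hypothetical ground truth boxes as (width, height)
boxes_wh = [(10, 20), (12, 25), (30, 30), (28, 29), (40, 20), (44, 21)]
aspect_ratios = [w / h for w, h in boxes_wh]
centroids = kmeans_1d(aspect_ratios, k=3)
```

With this data the three centroids land near 0.5, 1 and 2, that is, tall, square and wide boxes.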

Training to find objects: Objects are predicted on the set of output boxes of the SSD network. Let $x_{ij}^k \in \{0, 1\}$ be the indicator for matching the $i$-th default bounding box to the $j$-th ground truth box for label $k$. During mapping, several bounding boxes may be mapped to the same ground truth box with the same predicted label, so $\sum_{i}x_{ij}^k \geq 1$. The loss function is a weighted sum of the localization loss (loc) and the confidence loss (conf):

\[L(x, c, p, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, p, g)\right) \tag{1}\]

Here $N$ is the number of default boxes matched to ground truth boxes. The SSD loss has the same form as the Faster R-CNN loss and consists of 2 components:

1. localization loss: a Smooth L1 function that measures the error between the parameters of the predicted box ($p$) and the ground truth box ($g$), as below. We regress offsets for the center $(x, y)$ of the default bounding box ($d$) and for its height $h$ and width $w$.

\[L_{loc}(x, p, g) = \sum_{i \in Pos}^{N}\sum_{m \in \{x, y, w, h\}} x^{k}_{ij} \space L_1^\text{smooth}(p_i^m - \hat{g}_j^m)\]

\[\hat{g}_x = \frac{g_x-d_x}{d_w} \triangleq t_{x}\] \[\hat{g}_y = \frac{g_y-d_y}{d_h} \triangleq t_{y}\]

\[\hat{g}_w = \log\left(\frac{g_w}{d_w}\right) \triangleq t_{w}\] \[\hat{g}_h = \log\left(\frac{g_h}{d_h}\right) \triangleq t_{h}\]

\[L_1^\text{smooth}(x) = \begin{cases} 0.5 x^2 & \text{if } \vert x \vert < 1\\ \vert x \vert - 0.5 & \text{otherwise} \end{cases}\]

When $x$ is a vector, replace $x$ on the right-hand side by its L1 norm, written $|x|$. Smooth L1 is chosen so that the gradient has constant magnitude when $|x|$ is large and varies smoothly when $x$ is small. For more on norms, see the ML appendix lesson. In the localization loss, the known constants are $\hat{g}$ and the variable to optimize is $p$. Once the optimal $p$ is found, the predicted box is obtained by transforming the corresponding default box.

2. confidence loss: a loss computed from the label prediction error. For each positive match prediction, we penalize the loss according to the confidence score of the corresponding label. For each negative match prediction, we penalize the loss according to the confidence score of label '0', the label that stands for background containing no object. The confidence loss is:

\[L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{k} \log(\hat{c}_{i}^k) - \sum_{i \in Neg}\log(\hat{c}_{i}^0)\]

For a positive match prediction, the region predicted to contain an object does contain one. Predicting its label is therefore an ordinary classification problem with a softmax loss of the form $-\sum_{i \in Pos} x_{ij}^{k}\log(\hat{c}_{i}^k)$. For a negative match prediction, the region is predicted to contain no object, so there is only one label, 0. Since we already know the bounding box contains no object, the probability of class 0 is $x_{ij}^{0} = 1$, and the softmax loss takes the form $-\sum_{i \in Neg}\log(\hat{c}_{i}^0)$.
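To make the localization terms in item 1 concrete, here is a minimal sketch of offset encoding and the Smooth L1 penalty for a single matched pair. It assumes boxes in `(cx, cy, w, h)` format and takes the width and height offsets as the log of the ground truth size over the default box size, the convention used in the SSD paper. The function names are hypothetical.

```python
import math


def encode_box(g, d):
    """Encode ground truth box g relative to default box d, both (cx, cy, w, h)."""
    gx, gy, gw, gh = g
    dx, dy, dw, dh = d
    return ((gx - dx) / dw, (gy - dy) / dh, math.log(gw / dw), math.log(gh / dh))


def decode_box(t, d):
    """Inverse of encode_box: recover a box from offsets t and default box d."""
    tx, ty, tw, th = t
    dx, dy, dw, dh = d
    return (dx + tx * dw, dy + ty * dh, dw * math.exp(tw), dh * math.exp(th))


def smooth_l1(x):
    """Quadratic near zero, linear (constant gradient) for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def localization_loss(p, g, d):
    """Smooth L1 loss between predicted offsets p and the encoded ground truth."""
    g_hat = encode_box(g, d)
    return sum(smooth_l1(pm - gm) for pm, gm in zip(p, g_hat))


# Hypothetical default box and ground truth box
d = (50.0, 50.0, 20.0, 40.0)
g = (55.0, 60.0, 40.0, 40.0)
g_hat = encode_box(g, d)
loss_perfect = localization_loss(g_hat, g, d)
```

A prediction equal to `g_hat` gives zero loss, and `decode_box(g_hat, d)` returns the ground truth box. That decoding step is the conversion from the optimal $p$ back to a predicted box.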

The final loss function is the sum of the confidence loss and the localization loss, as in (1).
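Putting the two components together, equation (1) can be sketched as follows. This is a simplified illustration, not the training code of any library. It assumes one integer label per default box with 0 meaning background, raw class scores (logits) per box, and it uses every negative box, whereas real implementations usually keep only a subset of them. All names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_l1(x):
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def ssd_loss(logits, labels, pred_offsets, gt_offsets, alpha=1.0):
    """Loss of equation (1) over a list of default boxes.

    labels[i] == 0 marks a negative (background) box, and labels[i] > 0
    marks a positive box whose encoded ground truth is gt_offsets[i].
    """
    conf_loss = 0.0
    loc_loss = 0.0
    n_pos = 0
    for scores, label, p, g_hat in zip(logits, labels, pred_offsets, gt_offsets):
        probs = softmax(scores)
        # Positives are penalized on their true class, negatives on class 0
        conf_loss += -math.log(probs[label])
        if label > 0:
            n_pos += 1
            loc_loss += sum(smooth_l1(pm - gm) for pm, gm in zip(p, g_hat))
    if n_pos == 0:
        return 0.0  # no matched boxes: the loss is set to 0
    return (conf_loss + alpha * loc_loss) / n_pos


# Two boxes, two classes (background and one object class), uniform scores
loss = ssd_loss(
    logits=[[0.0, 0.0], [0.0, 0.0]],
    labels=[1, 0],
    pred_offsets=[[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    gt_offsets=[[0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]],
)
```

Here each box contributes `log(2)` to the confidence loss, and the single positive box contributes `0.125 + 1.5` to the localization loss.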

The default boundary boxes are chosen through aspect ratios and scales. SSD assigns one scale to each feature map in the Extra Feature Layers. Starting from the left, the first feature map detects objects at the smallest scale, $s_{min} = 0.2$ (sometimes 0.1), and the scale then increases linearly so that the last layer on the right has scale $s_{max} = 0.9$, according to the formula:

\[s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]\]

where $k$ is the index of the layer and $m$ is the number of feature maps used for prediction. Combining the scale with the aspect ratio gives the width and height of the default boxes. For layers with 6 predictions, SSD creates 5 default boxes with aspect ratios 1, 2, 3, 1/2 and 1/3. The width and height of the default boxes are then computed as:

\[w = scale * \sqrt{\text{aspect ratio}}\] \[h = \frac{scale}{\sqrt{\text{aspect ratio}}}\]

When the aspect ratio is 1, we add a sixth default bounding box whose scale is the geometric mean of the current scale and the next one:

\[s'_k = \sqrt{s_k s_{k+1}}\]
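The scale and size rules can be checked with a short sketch. It assumes the linear rule $s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1)$ for $m$ feature maps and the geometric mean $\sqrt{s_k s_{k+1}}$ for the extra box. The function names and the choice of $m = 6$ are illustrative assumptions.

```python
import math


def layer_scales(m, s_min=0.2, s_max=0.9):
    """Scales s_1..s_m, increasing linearly from s_min to s_max (needs m >= 2)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_sizes(scale, next_scale,
                      aspect_ratios=(1.0, 2.0, 3.0, 0.5, 1.0 / 3.0)):
    """(width, height) pairs, relative to the image size, for one feature map."""
    sizes = []
    for ar in aspect_ratios:
        sizes.append((scale * math.sqrt(ar), scale / math.sqrt(ar)))
        if ar == 1.0:
            # Extra square box at the geometric mean of this scale and the next
            extra = math.sqrt(scale * next_scale)
            sizes.append((extra, extra))
    return sizes


scales = layer_scales(6)
sizes = default_box_sizes(scales[0], scales[1])
```

With six feature maps the scales are 0.2, 0.34, 0.48, 0.62, 0.76 and 0.9, and the first feature map gets six boxes. Every box except the extra one has area `scale ** 2`, since the aspect ratio only redistributes that area between width and height.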

SSD is a very complex algorithm with many layers and processing phases. I therefore did not write all of this code myself but adapted it from SSD keras. I have edited some parts of the code and added explanations of what each step does.

Phần tinh túy nhất của SSD có lẽ là việc xác định các layers output của anchor box [hoặc default bounding box] ở các feature map. anchor box layer sẽ nhận đầu vào ra một feature map có kích thước

'''
Tác dụng: Tạo ra một output tensor chứa tọa độ của các anchor box và các biến thể dựa trên input tensor.
Một tợp hợp các 2D anchor boxes được tạo ra dựa trên aspect ratios và scale trên mỗi một cells của grid cells. Các hộp được tham số hóa bằng các tọa độ `[xmin, xmax, ymin, ymax]`
Input shape:
    4D tensor shape `[batch, channels, height, width]` nếu `dim_ordering = 'th'`
    or `[batch, height, width, channels]` nếu `dim_ordering = 'tf'`.
Output shape:
    5D tensor of shape `[batch, height, width, n_boxes, 8]`. 
    Chiều cuối cùng gồm 4 tọa độ của anchor box và 4 giá trị biến thể ở mỗi box.
'''
def __init__[self,
             img_height,
             img_width,
             this_scale,
             next_scale,
             aspect_ratios=[0.5, 1.0, 2.0],
             two_boxes_for_ar1=True,
             this_steps=None,
             this_offsets=None,
             clip_boxes=False,
             variances=[0.1, 0.1, 0.2, 0.2],
             coords='centroids',
             normalize_coords=False,
             **kwargs]:
    '''
    Arguments:
        img_height [int]: chiều cao input images.
        img_width [int]: chiều rộng input images.
        this_scale [float]: một giá trị float thuộc [0, 1], nhân tố scaling kích thước để tạo các anchor boxes dựa trên một tỷ lệ so với cạnh ngắn hơn trong width và height.
        next_scale [float]: giá trị tiếp theo của scale. Được thiết lập khi vào chỉ khi
            `self.two_boxes_for_ar1 == True`.
        aspect_ratios [list, optional]: tợp hợp các aspect ratios của các default boxes được tạo ra từ layer này.
        two_boxes_for_ar1 [bool, optional]: Được sử dụng chỉ khi `aspect_ratios` = 1.
            Nếu `True`, hai default boxes được tạo ra khi aspect ratio = 1. default box đầu tiên sử dụng scaling factor của layer tương ứng,
            default box thứ 2 sử dụng trung bình hình học giữa scaling factor và next scaling factor.
        clip_boxes [bool, optional]: Nếu đúng `True`, giới hạn tọa độ anchor box nằm bên trong hình ảnh.
        variances [list, optional]: Tợp hợp gồm 4 giá trị floats > 0. Là các anchor box offset tương ứng với mỗi tọa độ chia cho giá trị variances tương ứng của nó.
        coords [str, optional]: Tọa độ của box được sử dụng trong model. Có thể là centroids định dạng `[cx, cy, w, h]` [tọa độ box center, width, height],
            hoặc 'corners' định dạng `[xmin, ymin, xmax,  ymax]`, hoặc 'minmax' định dạng `[xmin, xmax, ymin, ymax]`.
        normalize_coords [bool, optional]: Nếu `True` mô hình sử dụng tọa độ tương đối thay vì tuyệt đối. Chẳng hạn mô hình dự đoán tọa độ nằm trong [0, 1] thay vì tọa độ tuyệt đối.
    '''
    if K.backend[] != 'tensorflow':
        raise TypeError["This layer only supports TensorFlow at the moment, but you are using the {} backend.".format[K.backend[]]]
    if [this_scale < 0] or [next_scale < 0] or [this_scale > 1]:
        raise ValueError["`this_scale` must be in [0, 1] and `next_scale` must be >0, but `this_scale` == {}, `next_scale` == {}".format[this_scale, next_scale]]
    if len[variances] != 4:
        raise ValueError["4 variance values must be pased, but {} values were received.".format[len[variances]]]
    variances = np.array[variances]
    if np.any[variances = self.img_width] = self.img_width - 1
        x_coords[x_coords < 0] = 0
        boxes_tensor[:,:,:,[0, 2]] = x_coords
        y_coords = boxes_tensor[:,:,:,[1, 3]]
        y_coords[y_coords >= self.img_height] = self.img_height - 1
        y_coords[y_coords < 0] = 0
        boxes_tensor[:,:,:,[1, 3]] = y_coords
    # Nếu `normalize_coords` = True, chuẩn hóa các tọa độ nằm trong khoảng [0,1]
    if self.normalize_coords:
        boxes_tensor[:, :, :, [0, 2]] /= self.img_width
        boxes_tensor[:, :, :, [1, 3]] /= self.img_height
    if self.coords == 'centroids':
        # Convert `[xmin, ymin, xmax, ymax]` to `[cx, cy, w, h]`.
        boxes_tensor = convert_coordinates[boxes_tensor, start_index=0, conversion='corners2centroids', border_pixels='half']
    elif self.coords == 'minmax':
        # Convert `[xmin, ymin, xmax, ymax]` to `[xmin, xmax, ymin, ymax].
        boxes_tensor = convert_coordinates[boxes_tensor, start_index=0, conversion='corners2minmax', border_pixels='half']
    # Tạo một tensor chứa các variances và append vào `boxes_tensor`. 
    variances_tensor = np.zeros_like[boxes_tensor] # shape `[feature_map_height, feature_map_width, n_boxes, 4]`
    variances_tensor += self.variances # Mở rộng thêm variances
    # Bây h `boxes_tensor` trở thành tensor kích thước `[feature_map_height, feature_map_width, n_boxes, 8]`
    boxes_tensor = np.concatenate[[boxes_tensor, variances_tensor], axis=-1]
    # Bây h chuẩn bị trước một chiều cho `boxes_tensor` đại diện cho batch size và di chuyển copy theo chiều đó [theo kiểu lợp ngói, xem thêm np.tile]
    #  ta được một 5D tensor kích thước `[batch_size, feature_map_height, feature_map_width, n_boxes, 8]`
    boxes_tensor = np.expand_dims[boxes_tensor, axis=0]
    boxes_tensor = K.tile[K.constant[boxes_tensor, dtype='float32'], [K.shape[x][0], 1, 1, 1, 1]]
    return boxes_tensor
def compute_output_shape[self, input_shape]:
    if K.common.image_dim_ordering[] == 'tf':
        batch_size, feature_map_height, feature_map_width, feature_map_channels = input_shape
    else: 
        batch_size, feature_map_channels, feature_map_height, feature_map_width = input_shape
    return [batch_size, feature_map_height, feature_map_width, self.n_boxes, 8]
def get_config[self]:
    config = {
        'img_height': self.img_height,
        'img_width': self.img_width,
        'this_scale': self.this_scale,
        'next_scale': self.next_scale,
        'aspect_ratios': list[self.aspect_ratios],
        'two_boxes_for_ar1': self.two_boxes_for_ar1,
        'clip_boxes': self.clip_boxes,
        'variances': list[self.variances],
        'coords': self.coords,
        'normalize_coords': self.normalize_coords
    }
    base_config = super[AnchorBoxes, self].get_config[]
    return dict[list[base_config.items[]] + list[config.items[]]]

3 và các scales, aspect ratios, trả ra đầu ra là một tensor kích thước

'''
Tác dụng: Tạo ra một output tensor chứa tọa độ của các anchor box và các biến thể dựa trên input tensor.
Một tợp hợp các 2D anchor boxes được tạo ra dựa trên aspect ratios và scale trên mỗi một cells của grid cells. Các hộp được tham số hóa bằng các tọa độ `[xmin, xmax, ymin, ymax]`
Input shape:
    4D tensor shape `[batch, channels, height, width]` nếu `dim_ordering = 'th'`
    or `[batch, height, width, channels]` nếu `dim_ordering = 'tf'`.
Output shape:
    5D tensor of shape `[batch, height, width, n_boxes, 8]`. 
    Chiều cuối cùng gồm 4 tọa độ của anchor box và 4 giá trị biến thể ở mỗi box.
'''
def __init__[self,
             img_height,
             img_width,
             this_scale,
             next_scale,
             aspect_ratios=[0.5, 1.0, 2.0],
             two_boxes_for_ar1=True,
             this_steps=None,
             this_offsets=None,
             clip_boxes=False,
             variances=[0.1, 0.1, 0.2, 0.2],
             coords='centroids',
             normalize_coords=False,
             **kwargs]:
    '''
    Arguments:
        img_height [int]: chiều cao input images.
        img_width [int]: chiều rộng input images.
        this_scale [float]: một giá trị float thuộc [0, 1], nhân tố scaling kích thước để tạo các anchor boxes dựa trên một tỷ lệ so với cạnh ngắn hơn trong width và height.
        next_scale [float]: giá trị tiếp theo của scale. Được thiết lập khi vào chỉ khi
            `self.two_boxes_for_ar1 == True`.
        aspect_ratios [list, optional]: tợp hợp các aspect ratios của các default boxes được tạo ra từ layer này.
        two_boxes_for_ar1 [bool, optional]: Được sử dụng chỉ khi `aspect_ratios` = 1.
            Nếu `True`, hai default boxes được tạo ra khi aspect ratio = 1. default box đầu tiên sử dụng scaling factor của layer tương ứng,
            default box thứ 2 sử dụng trung bình hình học giữa scaling factor và next scaling factor.
        clip_boxes [bool, optional]: Nếu đúng `True`, giới hạn tọa độ anchor box nằm bên trong hình ảnh.
        variances [list, optional]: Tợp hợp gồm 4 giá trị floats > 0. Là các anchor box offset tương ứng với mỗi tọa độ chia cho giá trị variances tương ứng của nó.
        coords [str, optional]: Tọa độ của box được sử dụng trong model. Có thể là centroids định dạng `[cx, cy, w, h]` [tọa độ box center, width, height],
            hoặc 'corners' định dạng `[xmin, ymin, xmax,  ymax]`, hoặc 'minmax' định dạng `[xmin, xmax, ymin, ymax]`.
        normalize_coords [bool, optional]: Nếu `True` mô hình sử dụng tọa độ tương đối thay vì tuyệt đối. Chẳng hạn mô hình dự đoán tọa độ nằm trong [0, 1] thay vì tọa độ tuyệt đối.
    '''
    if K.backend[] != 'tensorflow':
        raise TypeError["This layer only supports TensorFlow at the moment, but you are using the {} backend.".format[K.backend[]]]
    if [this_scale < 0] or [next_scale < 0] or [this_scale > 1]:
        raise ValueError["`this_scale` must be in [0, 1] and `next_scale` must be >0, but `this_scale` == {}, `next_scale` == {}".format[this_scale, next_scale]]
    if len[variances] != 4:
        raise ValueError["4 variance values must be pased, but {} values were received.".format[len[variances]]]
    variances = np.array[variances]
    if np.any[variances = self.img_width] = self.img_width - 1
        x_coords[x_coords < 0] = 0
        boxes_tensor[:,:,:,[0, 2]] = x_coords
        y_coords = boxes_tensor[:,:,:,[1, 3]]
        y_coords[y_coords >= self.img_height] = self.img_height - 1
        y_coords[y_coords < 0] = 0
        boxes_tensor[:,:,:,[1, 3]] = y_coords
    # Nếu `normalize_coords` = True, chuẩn hóa các tọa độ nằm trong khoảng [0,1]
    if self.normalize_coords:
        boxes_tensor[:, :, :, [0, 2]] /= self.img_width
        boxes_tensor[:, :, :, [1, 3]] /= self.img_height
    if self.coords == 'centroids':
        # Convert `[xmin, ymin, xmax, ymax]` to `[cx, cy, w, h]`.
        boxes_tensor = convert_coordinates[boxes_tensor, start_index=0, conversion='corners2centroids', border_pixels='half']
    elif self.coords == 'minmax':
        # Convert `[xmin, ymin, xmax, ymax]` to `[xmin, xmax, ymin, ymax].
        boxes_tensor = convert_coordinates[boxes_tensor, start_index=0, conversion='corners2minmax', border_pixels='half']
    # Tạo một tensor chứa các variances và append vào `boxes_tensor`. 
    variances_tensor = np.zeros_like[boxes_tensor] # shape `[feature_map_height, feature_map_width, n_boxes, 4]`
    variances_tensor += self.variances # Mở rộng thêm variances
    # Bây h `boxes_tensor` trở thành tensor kích thước `[feature_map_height, feature_map_width, n_boxes, 8]`
    boxes_tensor = np.concatenate[[boxes_tensor, variances_tensor], axis=-1]
    # Bây h chuẩn bị trước một chiều cho `boxes_tensor` đại diện cho batch size và di chuyển copy theo chiều đó [theo kiểu lợp ngói, xem thêm np.tile]
    #  ta được một 5D tensor kích thước `[batch_size, feature_map_height, feature_map_width, n_boxes, 8]`
    boxes_tensor = np.expand_dims[boxes_tensor, axis=0]
    boxes_tensor = K.tile[K.constant[boxes_tensor, dtype='float32'], [K.shape[x][0], 1, 1, 1, 1]]
    return boxes_tensor
def compute_output_shape[self, input_shape]:
    if K.common.image_dim_ordering[] == 'tf':
        batch_size, feature_map_height, feature_map_width, feature_map_channels = input_shape
    else: 
        batch_size, feature_map_channels, feature_map_height, feature_map_width = input_shape
    return [batch_size, feature_map_height, feature_map_width, self.n_boxes, 8]
def get_config[self]:
    config = {
        'img_height': self.img_height,
        'img_width': self.img_width,
        'this_scale': self.this_scale,
        'next_scale': self.next_scale,
        'aspect_ratios': list[self.aspect_ratios],
        'two_boxes_for_ar1': self.two_boxes_for_ar1,
        'clip_boxes': self.clip_boxes,
        'variances': list[self.variances],
        'coords': self.coords,
        'normalize_coords': self.normalize_coords
    }
    base_config = super[AnchorBoxes, self].get_config[]
    return dict[list[base_config.items[]] + list[config.items[]]]

4, trong đó chiều cuối cùng đại diện cho 4 offsets của bounding box như mô tả trong Default box và tỷ lệ cạnh [aspect ratio] của mục 2.1.

Code biến đổi khá phức tạp. Tôi trong đó các phần biến đổi chính được thực hiện trong hàm

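The height and width of each box are computed from the scale and the aspect ratio, where size is the shorter side of the input image:

\[\begin{cases} \textit{box_h} = \textit{scale * size} / \sqrt{\textit{aspect ratio}}\\ \textit{box_w} = \textit{scale * size} * \sqrt{\textit{aspect ratio}} \end{cases}\]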
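
The box-size formula can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `anchor_box_sizes` is hypothetical, boxes are listed in the order of `aspect_ratios` (the layer itself may order them differently), and the extra box for aspect ratio 1 uses the geometric mean of `this_scale` and `next_scale`, as described in the layer's docstring.

```python
import numpy as np

def anchor_box_sizes(img_height, img_width, this_scale, next_scale,
                     aspect_ratios=(0.5, 1.0, 2.0), two_boxes_for_ar1=True):
    """Return a list of (box_w, box_h) pairs in pixels, one per anchor box."""
    # Scales are expressed relative to the shorter side of the image.
    size = min(img_height, img_width)
    sizes = []
    for ar in aspect_ratios:
        box_h = this_scale * size / np.sqrt(ar)
        box_w = this_scale * size * np.sqrt(ar)
        sizes.append((box_w, box_h))
        if ar == 1 and two_boxes_for_ar1:
            # Extra square box: geometric mean of this scale and the next one.
            side = np.sqrt(this_scale * next_scale) * size
            sizes.append((side, side))
    return sizes

sizes = anchor_box_sizes(300, 300, this_scale=0.2, next_scale=0.37)
for w, h in sizes:
    print(f"w={w:.1f}, h={h:.1f}")
```

Every box generated from `this_scale` alone has the same area, `(this_scale * size) ** 2`; only its shape changes with the aspect ratio.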

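The step between neighbouring box centers along each axis is the image size divided by the feature map size:

\[\begin{cases} \textit{step_h} = \textit{img_h} / \textit{feature_map_h}\\ \textit{step_w} = \textit{img_w} / \textit{feature_map_w} \end{cases}\]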

The center coordinates of the boxes are then spread evenly over the grid:

\[\begin{cases} c_x = \textit{np.linspace(start_w, end_w, feature_map_w)}\\ c_y = \textit{np.linspace(start_h, end_h, feature_map_h)} \end{cases}\]
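
The step and center formulas can be tried out with the small sketch below. The function name `anchor_centers` is hypothetical, and the choice of `start` and `end` is an assumption: the first center is placed `offset` steps from the image border (0.5, i.e. the middle of the first cell) and the last center `feature_map - 1` steps further on.

```python
import numpy as np

def anchor_centers(img_height, img_width, feature_map_h, feature_map_w, offset=0.5):
    """Return two (feature_map_h, feature_map_w) grids with the cx and cy of every cell."""
    step_h = img_height / feature_map_h
    step_w = img_width / feature_map_w
    # Assumed start/end points: centers sit `offset` steps inside each cell.
    start_h, end_h = offset * step_h, (offset + feature_map_h - 1) * step_h
    start_w, end_w = offset * step_w, (offset + feature_map_w - 1) * step_w
    cy = np.linspace(start_h, end_h, feature_map_h)
    cx = np.linspace(start_w, end_w, feature_map_w)
    # meshgrid expands the two 1D axes into per-cell coordinate grids.
    cx_grid, cy_grid = np.meshgrid(cx, cy)
    return cx_grid, cy_grid

cx_grid, cy_grid = anchor_centers(300, 300, 38, 38)
print(cx_grid.shape, cy_grid.shape)
print(cx_grid[0, :3])  # first three centers along the width axis
```

Combining each center with each `(box_w, box_h)` pair gives the full set of anchor boxes for one feature map.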

The result is a tensor of shape `(feature_map_height, feature_map_width, n_boxes, 8)`, which is then tiled along the batch dimension. The last dimension of 8 corresponds to the 4 offsets of the default bounding box and the 4 variances used to scale those offsets.
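
To make the role of the 4 variances concrete, here is a minimal sketch of how a ground-truth box could be encoded relative to an anchor box and decoded again. The functions `encode_box` and `decode_box` are hypothetical names, and the formulas are the usual SSD centroid encoding, assumed here rather than taken from this post: each offset is divided by its variance when encoding and multiplied back when decoding.

```python
import numpy as np

def encode_box(gt, anchor, variances=(0.1, 0.1, 0.2, 0.2)):
    """Encode a (cx, cy, w, h) ground-truth box as offsets relative to an anchor."""
    gt_cx, gt_cy, gt_w, gt_h = gt
    a_cx, a_cy, a_w, a_h = anchor
    return np.array([
        (gt_cx - a_cx) / a_w / variances[0],
        (gt_cy - a_cy) / a_h / variances[1],
        np.log(gt_w / a_w) / variances[2],
        np.log(gt_h / a_h) / variances[3],
    ])

def decode_box(encoded, anchor, variances=(0.1, 0.1, 0.2, 0.2)):
    """Invert encode_box: recover (cx, cy, w, h) from offsets and the anchor."""
    a_cx, a_cy, a_w, a_h = anchor
    return np.array([
        encoded[0] * variances[0] * a_w + a_cx,
        encoded[1] * variances[1] * a_h + a_cy,
        np.exp(encoded[2] * variances[2]) * a_w,
        np.exp(encoded[3] * variances[3]) * a_h,
    ])

anchor = (150.0, 150.0, 60.0, 60.0)
gt = (160.0, 140.0, 80.0, 50.0)
encoded = encode_box(gt, anchor)
decoded = decode_box(encoded, anchor)
print(encoded)
print(decoded)
```

Storing the variances next to each anchor means the decoding step needs nothing beyond the 8 values kept for that box.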

import numpy as np
import keras.backend as K
from keras.layers import Layer

# `convert_coordinates` is a helper defined elsewhere in the project; it is not shown in this post.

class AnchorBoxes(Layer):
    '''
    Purpose: create an output tensor containing the anchor box coordinates and variances, based on the input tensor.
    A set of 2D anchor boxes is created for every cell of the grid from the aspect ratios and the scale. The boxes are parameterized by the coordinates `(xmin, xmax, ymin, ymax)`.
    Input shape:
        4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
        or `(batch, height, width, channels)` if `dim_ordering = 'tf'`.
    Output shape:
        5D tensor of shape `(batch, height, width, n_boxes, 8)`.
        The last axis holds the 4 anchor box coordinates and the 4 variance values of each box.
    '''
    def __init__(self,
                 img_height,
                 img_width,
                 this_scale,
                 next_scale,
                 aspect_ratios=[0.5, 1.0, 2.0],
                 two_boxes_for_ar1=True,
                 this_steps=None,
                 this_offsets=None,
                 clip_boxes=False,
                 variances=[0.1, 0.1, 0.2, 0.2],
                 coords='centroids',
                 normalize_coords=False,
                 **kwargs):
        '''
        Arguments:
            img_height (int): height of the input images.
            img_width (int): width of the input images.
            this_scale (float): a float in [0, 1], the scaling factor for the size of the generated anchor boxes, as a fraction of the shorter side of the image.
            next_scale (float): the next larger scaling factor. Only relevant if
                `self.two_boxes_for_ar1 == True`.
            aspect_ratios (list, optional): the aspect ratios of the default boxes generated by this layer.
            two_boxes_for_ar1 (bool, optional): only relevant if `aspect_ratios` contains 1.
                If `True`, two default boxes are generated for aspect ratio 1. The first uses the scaling factor of this layer,
                the second uses the geometric mean of this scaling factor and the next one.
            clip_boxes (bool, optional): if `True`, clips the anchor box coordinates so that they stay inside the image.
            variances (list, optional): a list of 4 floats > 0. The anchor box offset for each coordinate is divided by its respective variance value.
            coords (str, optional): the box coordinate format used in the model. Can be 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, height),
                'corners' for the format `(xmin, ymin, xmax, ymax)`, or 'minmax' for the format `(xmin, xmax, ymin, ymax)`.
            normalize_coords (bool, optional): if `True`, the model uses relative instead of absolute coordinates, i.e. it predicts coordinates within [0, 1].
        '''
        if K.backend() != 'tensorflow':
            raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))
        if (this_scale < 0) or (next_scale < 0) or (this_scale > 1):
            raise ValueError("`this_scale` must be in [0, 1] and `next_scale` must be >0, but `this_scale` == {}, `next_scale` == {}".format(this_scale, next_scale))
        if len(variances) != 4:
            raise ValueError("4 variance values must be passed, but {} values were received.".format(len(variances)))
        variances = np.array(variances)
        # NOTE: the source text is truncated here. The rest of `__init__` (starting at a check
        # `np.any(variances ...)`) and the first part of `call`, which builds `boxes_tensor`
        # in `(xmin, ymin, xmax, ymax)` format, are missing.

    def call(self, x, mask=None):  # header reconstructed; the original line is missing
        # ... (missing: construction of `boxes_tensor`) ...
        if self.clip_boxes:  # this condition and the next line are reconstructed from the symmetric y-coordinate code
            x_coords = boxes_tensor[:,:,:,[0, 2]]
            x_coords[x_coords >= self.img_width] = self.img_width - 1
            x_coords[x_coords < 0] = 0
            boxes_tensor[:,:,:,[0, 2]] = x_coords
            y_coords = boxes_tensor[:,:,:,[1, 3]]
            y_coords[y_coords >= self.img_height] = self.img_height - 1
            y_coords[y_coords < 0] = 0
            boxes_tensor[:,:,:,[1, 3]] = y_coords
        # If `normalize_coords` is True, normalize the coordinates to lie within [0, 1].
        if self.normalize_coords:
            boxes_tensor[:, :, :, [0, 2]] /= self.img_width
            boxes_tensor[:, :, :, [1, 3]] /= self.img_height
        if self.coords == 'centroids':
            # Convert `(xmin, ymin, xmax, ymax)` to `(cx, cy, w, h)`.
            boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2centroids', border_pixels='half')
        elif self.coords == 'minmax':
            # Convert `(xmin, ymin, xmax, ymax)` to `(xmin, xmax, ymin, ymax)`.
            boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2minmax', border_pixels='half')
        # Create a tensor holding the variances and append it to `boxes_tensor`.
        variances_tensor = np.zeros_like(boxes_tensor) # shape `(feature_map_height, feature_map_width, n_boxes, 4)`
        variances_tensor += self.variances # broadcast the 4 variances over every box
        # `boxes_tensor` now becomes a tensor of shape `(feature_map_height, feature_map_width, n_boxes, 8)`.
        boxes_tensor = np.concatenate((boxes_tensor, variances_tensor), axis=-1)
        # Prepend a batch dimension to `boxes_tensor` and tile it along that dimension (see also np.tile).
        # The result is a 5D tensor of shape `(batch_size, feature_map_height, feature_map_width, n_boxes, 8)`.
        boxes_tensor = np.expand_dims(boxes_tensor, axis=0)
        boxes_tensor = K.tile(K.constant(boxes_tensor, dtype='float32'), (K.shape(x)[0], 1, 1, 1, 1))
        return boxes_tensor

    def compute_output_shape(self, input_shape):
        if K.common.image_dim_ordering() == 'tf':
            batch_size, feature_map_height, feature_map_width, feature_map_channels = input_shape
        else:
            batch_size, feature_map_channels, feature_map_height, feature_map_width = input_shape
        return (batch_size, feature_map_height, feature_map_width, self.n_boxes, 8)

    def get_config(self):
        config = {
            'img_height': self.img_height,
            'img_width': self.img_width,
            'this_scale': self.this_scale,
            'next_scale': self.next_scale,
            'aspect_ratios': list(self.aspect_ratios),
            'two_boxes_for_ar1': self.two_boxes_for_ar1,
            'clip_boxes': self.clip_boxes,
            'variances': list(self.variances),
            'coords': self.coords,
            'normalize_coords': self.normalize_coords
        }
        base_config = super(AnchorBoxes, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
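
The layer relies on a `convert_coordinates` helper that is not shown here. As a rough illustration of what the `corners2centroids` conversion does, the sketch below converts between the two formats with plain NumPy. The function names are hypothetical and this is not the helper's actual implementation. It assumes continuous coordinates, so no one-pixel border adjustment is applied.

```python
import numpy as np

def corners_to_centroids(boxes):
    """Convert (..., 4) boxes from (xmin, ymin, xmax, ymax) to (cx, cy, w, h)."""
    boxes = np.asarray(boxes, dtype=float)
    out = np.empty_like(boxes)
    out[..., 0] = (boxes[..., 0] + boxes[..., 2]) / 2.0  # cx
    out[..., 1] = (boxes[..., 1] + boxes[..., 3]) / 2.0  # cy
    out[..., 2] = boxes[..., 2] - boxes[..., 0]          # w
    out[..., 3] = boxes[..., 3] - boxes[..., 1]          # h
    return out

def centroids_to_corners(boxes):
    """Convert (..., 4) boxes from (cx, cy, w, h) back to (xmin, ymin, xmax, ymax)."""
    boxes = np.asarray(boxes, dtype=float)
    out = np.empty_like(boxes)
    out[..., 0] = boxes[..., 0] - boxes[..., 2] / 2.0  # xmin
    out[..., 1] = boxes[..., 1] - boxes[..., 3] / 2.0  # ymin
    out[..., 2] = boxes[..., 0] + boxes[..., 2] / 2.0  # xmax
    out[..., 3] = boxes[..., 1] + boxes[..., 3] / 2.0  # ymax
    return out

corners = np.array([[10.0, 20.0, 50.0, 80.0]])
centroids = corners_to_centroids(corners)
print(centroids)
print(centroids_to_corners(centroids))
```

Because both functions index only the last axis, they work unchanged on a `(feature_map_height, feature_map_width, n_boxes, 4)` tensor.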

Below we check the output of the AnchorBoxes layer by passing in a test input tensor $x$.

# Test the output of the AnchorBoxes layer
import tensorflow as tf

x = tf.random.normal(shape=(4, 38, 38, 512))

aspect_ratios_per_layer = [[1.0, 2.0, 0.5],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5],
                           [1.0, 2.0, 0.5]]
two_boxes_for_ar1 = True
steps = [8, 16, 32, 64, 100, 300]
offsets = None
clip_boxes = False
variances = [0.1, 0.1, 0.2, 0.2]
coords = 'centroids'
normalize_coords = True
subtract_mean = [123, 117, 104]
divide_by_stddev = None
swap_channels = [2, 1, 0]
confidence_thresh = 0.01
iou_threshold = 0.45
top_k = 200
nms_max_output_size = 400

# Set the parameters
img_height = 300
img_width = 300
img_channels = 3
mean_color = [123, 117, 104]
swap_channels = [2, 1, 0]
n_classes = 20
scales = [0.1, 0.2, 0.37, 0.54, 0.71, 0.88, 1.05]
aspect_ratios = [[1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                 [1.0, 2.0, 0.5],
                 [1.0, 2.0, 0.5]]
two_boxes_for_ar1 = True
steps = [8, 16, 32, 64, 100, 300]
offsets = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
clip_boxes = False
variances = [0.1, 0.1, 0.2, 0.2]
normalize_coords = True

anchors = AnchorBoxes(img_height, img_width, this_scale=scales[1], next_scale=scales[2])(x)
print('anchors shape: ', anchors.get_shape())

So the output of the anchor box layer has shape `(batch, height, width, n_boxes, 8)`, here with a batch of 4 on a 38 x 38 feature map, which is as expected.
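
The `n_boxes` entry of that shape follows from the aspect ratios. The sketch below counts the boxes per cell for each predictor layer. The helper `n_boxes_for_layer` is a hypothetical name, the counting rule (one box per aspect ratio, plus one extra when the list contains 1 and `two_boxes_for_ar1` is set) is taken from the docstring, and the feature map sizes are the usual SSD300 values, assumed here rather than stated in this section.

```python
def n_boxes_for_layer(aspect_ratios, two_boxes_for_ar1=True):
    """Number of anchor boxes generated per feature-map cell."""
    n = len(aspect_ratios)
    if two_boxes_for_ar1 and 1 in aspect_ratios:
        n += 1  # extra square box at the geometric-mean scale
    return n

aspect_ratios_per_layer = [[1.0, 2.0, 0.5],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                           [1.0, 2.0, 0.5],
                           [1.0, 2.0, 0.5]]
boxes_per_layer = [n_boxes_for_layer(ars) for ars in aspect_ratios_per_layer]
print(boxes_per_layer)  # [4, 6, 6, 6, 4, 4]

# Assumed SSD300 feature map side lengths, one per predictor layer.
feature_map_sizes = [38, 19, 10, 5, 3, 1]
total = sum(s * s * n for s, n in zip(feature_map_sizes, boxes_per_layer))
print(total)  # 8732
```

With the layer's default aspect ratios `[0.5, 1.0, 2.0]`, this gives 4 boxes per cell for the test above.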

from future import division import numpy as np from keras.models import Model from keras.layers import Input, Lambda, Activation, Conv2D, MaxPooling2D, ZeroPadding2D, Reshape, Concatenate from keras.regularizers import l2 import keras.backend as K from keras_layers.keras_layer_AnchorBoxes import AnchorBoxes from keras_layers.keras_layer_L2Normalization import L2Normalization from keras_layers.keras_layer_DecodeDetections import DecodeDetections from keras_layers.keras_layer_DecodeDetectionsFast import DecodeDetectionsFast def ssd_300[image_size,

        n_classes,
        mode='training',
        l2_regularization=0.0005,
        min_scale=None,
        max_scale=None,
        scales=None,
        aspect_ratios_global=None,
        aspect_ratios_per_layer=[[1.0, 2.0, 0.5],
                                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                 [1.0, 2.0, 0.5, 3.0, 1.0/3.0],
                                 [1.0, 2.0, 0.5],
                                 [1.0, 2.0, 0.5]],
        two_boxes_for_ar1=True,
        steps=[8, 16, 32, 64, 100, 300],
        offsets=None,
        clip_boxes=False,
        variances=[0.1, 0.1, 0.2, 0.2],
        coords='centroids',
        normalize_coords=True,
        subtract_mean=[123, 117, 104],
        divide_by_stddev=None,
        swap_channels=[2, 1, 0],
        confidence_thresh=0.01,
        iou_threshold=0.45,
        top_k=200,
        nms_max_output_size=400,
        return_predictor_sizes=False]:
'''
Xây dựng model SSD300 với keras.
Base network được sử dụng là VGG16.
Chú ý: Yêu cầu Keras>=v2.0; TensorFlow backend>=v1.0.
Arguments:
    image_size [tuple]: Kích thước image input `[height, width, channels]`.
    n_classes [int]: Số classes, chẳng hạn 20 cho Pascal VOC dataset, 80 cho MS COCO dataset.
    mode [str, optional]: Một trong những dạng 'training', 'inference' và 'inference_fast'. 
        'training' mode: Đầu ra của model là raw prediction tensor.
        'inference' và 'inference_fast' modes: raw predictions được decoded thành tọa độ đã được filtered thông qua threshold.
    l2_regularization [float, optional]: L2-regularization rate. Áp dụng cho toàn bộ các convolutional layers.
    min_scale [float, optional]: Nhân tố scaling nhỏ nhất cho các size của anchor boxes. Tỷ lệ này được tính trên so sánh với cạnh ngắn hơn
    của hình ảnh input.
    max_scale [float, optional]: Nhân tố scale lớn nhất cho các size của anchor boxes.
    scales [list, optional]: List các số floats chứa các nhân tố scaling của các convolutional predictor layer.
        List này phải lớn hơn số lượng các predictor layers là 1 để sử dụng cho trường hợp aspect ratio = 1 sẽ tính thêm next scale.
        Trong TH sử dụng scales thì interpolate theo min_scale và max_scale để tính list scales sẽ không được sử dụng.
    aspect_ratios_global [list, optional]: List của các aspect ratios mà các anchor boxes được tạo thành. List này được áp dụng chung trên toàn bộ các prediction layers.
    aspect_ratios_per_layer [list, optional]: List của các list aspect ratio cho mỗi một prediction layer.
        Nếu được truyền vào sẽ override `aspect_ratios_global`.
    two_boxes_for_ar1 [bool, optional]: Chỉ áp dụng khi aspect ratio lists chứa 1. Sẽ bị loại bỏ trong các TH khác.
        Nếu `True`, 2 anchor boxes sẽ được tạo ra ứng với aspect ratio = 1. anchor box đầu tiên tạo thành bằng cách sử scale, anchor box thứ 2 
        được tạo thành bằng trung bình hình học của scale và next scale.
    steps (list, optional): `None` or a list with as many elements as there are predictor layers.
        Each element states, for one predictor layer, how many pixels apart the anchor box centers are.
        An element may also consist of two numbers `(step_width, step_height)`.
        If no steps are given, they are computed so that the anchor box centers are evenly spaced.
    offsets (list, optional): `None` or a list of numbers stating, for each predictor layer, how far from the top and left borders of the image the first anchor box centers are placed.
    clip_boxes (bool, optional): If `True`, clips the anchor box coordinates to stay within the image boundaries.
    variances (list, optional): A list of 4 floats > 0. The anchor box offset for each coordinate is divided by its corresponding variance value.
    coords (str, optional): The box coordinate format used inside the model (i.e. this is not the input format of the ground truth labels).
        Can be 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width,
        and height), 'minmax' for the format `(xmin, xmax, ymin, ymax)`, or 'corners' for the format `(xmin, ymin, xmax, ymax)`.
    normalize_coords (bool, optional): Set to `True` if the model is supposed to use relative instead of absolute coordinates,
        i.e. if the model predicts box coordinates within [0, 1] instead of absolute coordinates.
    subtract_mean (array-like, optional): `None` or an array-like object of any shape that is broadcast-compatible with the image shape. Its values are subtracted from the image pixel intensities.
        For example, pass a list of three integers to perform per-channel mean normalization.
    divide_by_stddev (array-like, optional): `None` or an array-like object. Like `subtract_mean`, except that the image pixel intensities are divided by its values.
    swap_channels (list, optional): Either `False` or a list of integers giving the desired order in which the input image channels should be swapped.
    confidence_thresh (float, optional): A float in [0, 1], the minimum classification confidence a class must reach for a prediction to be considered.
    iou_threshold (float, optional): A float in [0, 1]. Boxes whose Jaccard similarity (IoU) with a higher-scoring box is greater than or equal to `iou_threshold`
        are treated as covering the same object and are suppressed during non-maximum suppression.
    top_k (int, optional): The number of highest-scoring predictions kept for each batch item after the non-maximum suppression (NMS) stage.
    nms_max_output_size (int, optional): The maximum number of predictions kept by the NMS stage.
    return_predictor_sizes (bool, optional): If `True`, this function returns not only the model but also
        a list containing the spatial dimensions of the predictor layers.
Returns:
    model: The Keras SSD300 model.
    predictor_sizes (optional): A NumPy array containing the `(height, width)` portion of the output tensor shape for each convolutional predictor layer.
References:
    https://arxiv.org/abs/1512.02325v5
'''
n_predictor_layers = 6 # The original SSD300 has 6 convolutional predictor layers.
n_classes += 1 # Number of classes, +1 for the background class.
l2_reg = l2_regularization # L2 regularization factor.
img_height, img_width, img_channels = image_size[0], image_size[1], image_size[2]
############################################################################
# Argument validation.
############################################################################

if aspect_ratios_global is None and aspect_ratios_per_layer is None:
    raise ValueError("`aspect_ratios_global` and `aspect_ratios_per_layer` cannot both be None. At least one needs to be specified.")
if aspect_ratios_per_layer:
    if len(aspect_ratios_per_layer) != n_predictor_layers:
        raise ValueError("It must be either aspect_ratios_per_layer is None or len(aspect_ratios_per_layer) == {}, but len(aspect_ratios_per_layer) == {}.".format(n_predictor_layers, len(aspect_ratios_per_layer)))
# Build the list of scales.
if (min_scale is None or max_scale is None) and scales is None:
    raise ValueError("Either `min_scale` and `max_scale` or `scales` need to be specified.")
if scales:
    if len(scales) != n_predictor_layers+1:
        raise ValueError("It must be either scales is None or len(scales) == {}, but len(scales) == {}.".format(n_predictor_layers+1, len(scales)))
else:
    scales = np.linspace(min_scale, max_scale, n_predictor_layers+1)
if len(variances) != 4:
    raise ValueError("4 variance values must be passed, but {} values were received.".format(len(variances)))
variances = np.array(variances)
if np.any(variances <= 0):
    raise ValueError("All variances must be >0, but the variances given are {}".format(variances))
# ... (the remainder of the function body is missing from the source)

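The scale handling in the validation code above can be illustrated with a short sketch. It assumes that each predictor layer receives one pair of consecutive values from `scales` as its `this_scale` and `next_scale`, which is why `n_predictor_layers + 1` values are needed. The helper name `layer_scale_pairs` and the values 0.2 and 0.9 are chosen for illustration only and are not taken from the code above.

```python
import numpy as np

def layer_scale_pairs(min_scale, max_scale, n_predictor_layers=6):
    """Return one (this_scale, next_scale) pair per predictor layer (hypothetical helper)."""
    # n_predictor_layers + 1 evenly spaced values, so the last layer also has a "next" scale.
    scales = np.linspace(min_scale, max_scale, n_predictor_layers + 1)
    return list(zip(scales[:-1], scales[1:]))

pairs = layer_scale_pairs(0.2, 0.9)  # illustrative values
for i, (this_scale, next_scale) in enumerate(pairs):
    print("layer {}: this_scale={:.3f}, next_scale={:.3f}".format(i, this_scale, next_scale))
```

The extra, seventh scale is only ever used as the `next_scale` of the last layer.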
'''
Purpose: creates an output tensor containing the anchor box coordinates and variances based on the input tensor.
A set of 2D anchor boxes with different aspect ratios and scales is created for every cell of the grid. The boxes are parameterized by the coordinates `(xmin, xmax, ymin, ymax)`.
Input shape:
    4D tensor of shape `(batch, channels, height, width)` if `dim_ordering = 'th'`
    or `(batch, height, width, channels)` if `dim_ordering = 'tf'`.
Output shape:
    5D tensor of shape `(batch, height, width, n_boxes, 8)`.
    The last axis holds the 4 anchor box coordinates and the 4 variance values of each box.
'''
def __init__(self,
             img_height,
             img_width,
             this_scale,
             next_scale,
             aspect_ratios=[0.5, 1.0, 2.0],
             two_boxes_for_ar1=True,
             this_steps=None,
             this_offsets=None,
             clip_boxes=False,
             variances=[0.1, 0.1, 0.2, 0.2],
             coords='centroids',
             normalize_coords=False,
             **kwargs):
    '''
    Arguments:
        img_height (int): Height of the input images.
        img_width (int): Width of the input images.
        this_scale (float): A float in [0, 1], the scaling factor for the size of the generated anchor boxes, as a fraction of the shorter of the image's width and height.
        next_scale (float): The next scaling factor. Only relevant if
            `self.two_boxes_for_ar1 == True`.
        aspect_ratios (list, optional): The list of aspect ratios of the default boxes generated by this layer.
        two_boxes_for_ar1 (bool, optional): Only relevant if `aspect_ratios` contains 1.
            If `True`, two default boxes are generated for aspect ratio 1. The first uses the scaling factor of the respective layer,
            the second uses the geometric mean of that scaling factor and the next scaling factor.
        clip_boxes (bool, optional): If `True`, clips the anchor box coordinates to stay within the image boundaries.
        variances (list, optional): A list of 4 floats > 0. The anchor box offset for each coordinate is divided by its corresponding variance value.
        coords (str, optional): The box coordinate format used inside the model. Can be 'centroids' for the format `(cx, cy, w, h)` (box center coordinates, width, height),
            'corners' for the format `(xmin, ymin, xmax, ymax)`, or 'minmax' for the format `(xmin, xmax, ymin, ymax)`.
        normalize_coords (bool, optional): Set to `True` if the model uses relative instead of absolute coordinates, i.e. if it predicts box coordinates within [0, 1] instead of absolute coordinates.
    '''
    if K.backend() != 'tensorflow':
        raise TypeError("This layer only supports TensorFlow at the moment, but you are using the {} backend.".format(K.backend()))
    if (this_scale < 0) or (next_scale < 0) or (this_scale > 1):
        raise ValueError("`this_scale` must be in [0, 1] and `next_scale` must be >0, but `this_scale` == {}, `next_scale` == {}".format(this_scale, next_scale))
    if len(variances) != 4:
        raise ValueError("4 variance values must be passed, but {} values were received.".format(len(variances)))
    variances = np.array(variances)
    if np.any(variances <= 0):
        raise ValueError("All variances must be >0, but the variances given are {}".format(variances))
    # ... (the rest of `__init__` and the first part of `call`, where `boxes_tensor` is built, are missing from the source)
    # If `clip_boxes` is True, clip the coordinates to lie within the image boundaries.
    if self.clip_boxes:
        x_coords = boxes_tensor[:,:,:,[0, 2]]
        x_coords[x_coords >= self.img_width] = self.img_width - 1
        x_coords[x_coords < 0] = 0
        boxes_tensor[:,:,:,[0, 2]] = x_coords
        y_coords = boxes_tensor[:,:,:,[1, 3]]
        y_coords[y_coords >= self.img_height] = self.img_height - 1
        y_coords[y_coords < 0] = 0
        boxes_tensor[:,:,:,[1, 3]] = y_coords
    # If `normalize_coords` is True, normalize the coordinates to lie within [0, 1].
    if self.normalize_coords:
        boxes_tensor[:, :, :, [0, 2]] /= self.img_width
        boxes_tensor[:, :, :, [1, 3]] /= self.img_height
    if self.coords == 'centroids':
        # Convert `(xmin, ymin, xmax, ymax)` to `(cx, cy, w, h)`.
        boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2centroids', border_pixels='half')
    elif self.coords == 'minmax':
        # Convert `(xmin, ymin, xmax, ymax)` to `(xmin, xmax, ymin, ymax)`.
        boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='corners2minmax', border_pixels='half')
    # Create a tensor holding the variances and append it to `boxes_tensor`.
    variances_tensor = np.zeros_like(boxes_tensor) # shape `(feature_map_height, feature_map_width, n_boxes, 4)`
    variances_tensor += self.variances # broadcast the 4 variances to every box
    # `boxes_tensor` now has shape `(feature_map_height, feature_map_width, n_boxes, 8)`.
    boxes_tensor = np.concatenate((boxes_tensor, variances_tensor), axis=-1)
    # Prepend a batch axis to `boxes_tensor` and tile the tensor along it (similar to np.tile).
    # The result is a 5D tensor of shape `(batch_size, feature_map_height, feature_map_width, n_boxes, 8)`.
    boxes_tensor = np.expand_dims(boxes_tensor, axis=0)
    boxes_tensor = K.tile(K.constant(boxes_tensor, dtype='float32'), (K.shape(x)[0], 1, 1, 1, 1))
    return boxes_tensor
def compute_output_shape(self, input_shape):
    if K.common.image_dim_ordering() == 'tf':
        batch_size, feature_map_height, feature_map_width, feature_map_channels = input_shape
    else:
        batch_size, feature_map_channels, feature_map_height, feature_map_width = input_shape
    return (batch_size, feature_map_height, feature_map_width, self.n_boxes, 8)
def get_config(self):
    config = {
        'img_height': self.img_height,
        'img_width': self.img_width,
        'this_scale': self.this_scale,
        'next_scale': self.next_scale,
        'aspect_ratios': list(self.aspect_ratios),
        'two_boxes_for_ar1': self.two_boxes_for_ar1,
        'clip_boxes': self.clip_boxes,
        'variances': list(self.variances),
        'coords': self.coords,
        'normalize_coords': self.normalize_coords
    }
    base_config = super(AnchorBoxes, self).get_config()
    return dict(list(base_config.items()) + list(config.items()))
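To make the roles of `this_scale`, `next_scale`, `aspect_ratios` and `two_boxes_for_ar1` more concrete, here is a minimal NumPy sketch of how the width and height of each default box could be derived. It follows the convention from the SSD paper (width = scale × √ratio, height = scale ÷ √ratio, relative to the shorter image side) and the docstring above. The function name `anchor_box_sizes` is hypothetical, and the sketch is not necessarily identical to the layer's own code.

```python
import numpy as np

def anchor_box_sizes(img_height, img_width, this_scale, next_scale,
                     aspect_ratios=(0.5, 1.0, 2.0), two_boxes_for_ar1=True):
    """Return an array of shape (n_boxes, 2) holding (width, height) in pixels."""
    size = min(img_height, img_width)  # scales are relative to the shorter image side
    wh_list = []
    for ar in aspect_ratios:
        if ar == 1:
            # Regular square box for aspect ratio 1.
            side = this_scale * size
            wh_list.append((side, side))
            if two_boxes_for_ar1:
                # Extra, slightly larger square box: geometric mean of both scales.
                side = np.sqrt(this_scale * next_scale) * size
                wh_list.append((side, side))
        else:
            wh_list.append((this_scale * size * np.sqrt(ar),
                            this_scale * size / np.sqrt(ar)))
    return np.array(wh_list)

wh = anchor_box_sizes(300, 300, this_scale=0.2, next_scale=0.34)
print(wh.shape)  # one row per default box
```

With three aspect ratios and `two_boxes_for_ar1=True`, every grid cell gets four default boxes, and all boxes except the extra one cover the same area.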

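The final steps of the layer (clipping, normalization, conversion to centroids and appending the variances) can be tried out in isolation on a plain NumPy array. The sketch below is an assumption-laden stand-in: `postprocess_anchors` is a hypothetical name, and the corner-to-centroid conversion is written inline instead of calling the `convert_coordinates` helper, whose implementation is not shown here.

```python
import numpy as np

def postprocess_anchors(boxes, img_height, img_width,
                        variances=(0.1, 0.1, 0.2, 0.2),
                        clip_boxes=True, normalize_coords=True):
    """boxes: array of shape (..., 4) in corner format (xmin, ymin, xmax, ymax)."""
    boxes = np.asarray(boxes, dtype=np.float64).copy()  # work on a copy
    if clip_boxes:
        boxes[..., [0, 2]] = np.clip(boxes[..., [0, 2]], 0, img_width - 1)
        boxes[..., [1, 3]] = np.clip(boxes[..., [1, 3]], 0, img_height - 1)
    if normalize_coords:
        boxes[..., [0, 2]] /= img_width
        boxes[..., [1, 3]] /= img_height
    # Corners -> centroids: (cx, cy, w, h).
    centroids = np.stack([(boxes[..., 0] + boxes[..., 2]) / 2,
                          (boxes[..., 1] + boxes[..., 3]) / 2,
                          boxes[..., 2] - boxes[..., 0],
                          boxes[..., 3] - boxes[..., 1]], axis=-1)
    # Broadcast the 4 variances to every box and append them on the last axis.
    variances_tensor = np.zeros_like(centroids) + np.asarray(variances)
    return np.concatenate((centroids, variances_tensor), axis=-1)

# A 1x1 feature map with two boxes that partly stick out of a 300x300 image.
raw = np.array([[[[-10, 20, 110, 80], [250, 250, 350, 350]]]])
out = postprocess_anchors(raw, img_height=300, img_width=300)
print(out.shape)
```

The last axis grows from 4 to 8 values because the variances are stored next to the coordinates of every box.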
The two pieces of code above are the key parts of the algorithm that we need to understand. Initializing the data generators and training the model is fairly straightforward; you can refer to the original, very detailed code in the SSD_keras git repository. Given an input image, the algorithm returns bounding boxes around the objects, each with a label and the probability of the class that the enclosed object most likely belongs to. The algorithm can detect multiple objects of different sizes.
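The parameters `confidence_thresh`, `iou_threshold` and `top_k` described earlier control how the raw predictions are reduced to the final boxes. The following is a minimal sketch of Jaccard similarity (IoU) and greedy non-maximum suppression for a single class in NumPy. The function names and default values are illustrative assumptions, and the repository's decoding code may differ in detail.

```python
import numpy as np

def iou(box, boxes):
    """Jaccard similarity between one box and an array of boxes (corner format)."""
    xmin = np.maximum(box[0], boxes[:, 0])
    ymin = np.maximum(box[1], boxes[:, 1])
    xmax = np.minimum(box[2], boxes[:, 2])
    ymax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(xmax - xmin, 0, None) * np.clip(ymax - ymin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, confidence_thresh=0.5, iou_threshold=0.45, top_k=200):
    """Return the indices of the kept boxes, highest score first."""
    # Drop low-confidence predictions, then sort the rest by descending score.
    candidates = np.where(scores >= confidence_thresh)[0]
    order = candidates[np.argsort(-scores[candidates])]
    keep = []
    while order.size > 0 and len(keep) < top_k:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Suppress every remaining box that overlaps the best box too much.
        order = rest[iou(boxes[best], boxes[rest]) < iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [0, 0, 10, 10]], dtype=float)
scores = np.array([0.9, 0.8, 0.7, 0.3])
kept = greedy_nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first one heavily and is suppressed, while the last box is dropped because its score is below the confidence threshold.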

In this post I have given an overview of the architecture and inner workings of the SSD algorithm. It is one of the algorithms that combines high accuracy with relatively fast processing speed. To summarize the main points:

Hopefully we now have a firm grasp of the algorithm and can build an SSD network of our own for object detection.

