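[Phytium Pi 4G Free Trial] Chapter 5: Deploying a tflite Model to the Phytium Pi with C++

The previous chapters covered training and testing the Peppa detection model, converting it to tflite format, and running simple tflite inference tests in both Python and C++ on the PC and on the Phytium Pi. This chapter records how I ran inference with the Peppa detection model in C++. It has two parts: first loading the tflite model from C++ on the PC and testing it there, then cross-compiling for the Phytium Pi.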

[Real-Time Pose Detection in C++ using Machine Learning with TensorFlow Lite]
[Tensorflow 1 vs Tensorflow 2 C-API]

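Workflow

The code is developed mainly on top of the minimal project. The overall flow is:

Load the model
Change the shape of the input tensor
Fill in the input data
Run inference
Extract the output data

Basic concepts

Inference: feeding new data to a model so that it produces a prediction
Tensor: a multi-dimensional data structure in the model; tflite represents it with the struct TfLiteTensor
Shape: the dimensions of a tensor, stored in the TfLiteIntArray* dims member of TfLiteTensor; a TfLiteIntArray holds the number of dimensions and the size of each one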
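
To make the idea of a shape concrete, here is a minimal sketch in plain C++ that does not use TensorFlow Lite at all. The helpers has_dynamic_dim and element_count are hypothetical names introduced only for illustration; they treat a shape as a list of integers in which -1 marks a dimension whose size is not yet known.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of TensorFlow Lite: returns true when any
// dimension is negative (e.g. -1), meaning its size is still unknown.
bool has_dynamic_dim(const std::vector<int>& dims) {
  for (int d : dims) {
    if (d < 0) return true;
  }
  return false;
}

// Hypothetical helper: number of elements described by a fully specified
// shape. Returns 0 while the shape still contains a dynamic dimension.
std::size_t element_count(const std::vector<int>& dims) {
  if (has_dynamic_dim(dims)) return 0;
  std::size_t n = 1;
  for (int d : dims) n *= static_cast<std::size_t>(d);
  return n;
}
```

For a uint8 tensor of shape {1,200,200,3}, element_count gives 120000, which is also the number of bytes the input buffer needs.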

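Key steps

In practice there were three main steps:

1. Change the dimensions of the model's input tensor:
Why change the input dimensions? Because the original tensor shape is [1,-1,-1,3], as the test code below prints. The two -1 entries mean that the height and width of the image are unknown, and 3 is the number of channels, i.e. RGB.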

auto a_input = interpreter->inputs()[0];
auto a_input_batch_size = interpreter->tensor(a_input)->dims_signature->data[0];
auto a_input_height = interpreter->tensor(a_input)->dims_signature->data[1];
auto a_input_width = interpreter->tensor(a_input)->dims_signature->data[2];
auto a_input_channels = interpreter->tensor(a_input)->dims_signature->data[3];
std::cout << "The input tensor has the following dimensions: ["
          << a_input_batch_size << ","
          << a_input_height << ","
          << a_input_width << ","
          << a_input_channels << "]" << std::endl;

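To pin down the input image size, which is 200*200 here, the following code forces the shape of the input tensor to {1,200,200,3}.

// Force the shape of the input tensor
std::vector<int> peppa_jpg = {1,200,200,3};
interpreter->ResizeInputTensor(0, peppa_jpg);

With the input dimensions fixed, filling the tensor with test data later becomes straightforward.
2. With the input settled, the other key step is extracting the output. Which output? I first inspected the model's outputs with the Python tooling. The command I ran and what it printed are: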

$ saved_model_cli show --dir exported_models/efficientdet_d0/saved_model/ --tag_set serve --signature_def serving_default
2023-12-27 11:17:23.958429: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-27 11:17:23.959999: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-27 11:17:23.990118: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-27 11:17:23.990510: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-27 11:17:24.489577: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-12-27 11:17:25.022727: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2023-12-27 11:17:25.022762: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: fedora
2023-12-27 11:17:25.022765: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: fedora
2023-12-27 11:17:25.022836: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: 535.146.2
2023-12-27 11:17:25.022845: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 535.146.2
2023-12-27 11:17:25.022847: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:309] kernel version seems to match DSO: 535.146.2
The given SavedModel SignatureDef contains the following input(s):
  inputs['input_tensor'] tensor_info:
      dtype: DT_UINT8
      shape: (1, -1, -1, 3)
      name: serving_default_input_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['detection_anchor_indices'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100)
      name: StatefulPartitionedCall:0
  outputs['detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100, 4)
      name: StatefulPartitionedCall:1
  outputs['detection_classes'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100)
      name: StatefulPartitionedCall:2
  outputs['detection_multiclass_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100, 1)
      name: StatefulPartitionedCall:3
  outputs['detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100)
      name: StatefulPartitionedCall:4
  outputs['num_detections'] tensor_info:
      dtype: DT_FLOAT
      shape: (1)
      name: StatefulPartitionedCall:5
  outputs['raw_detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 49104, 4)
      name: StatefulPartitionedCall:6
  outputs['raw_detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 49104, 1)
      name: StatefulPartitionedCall:7
Method name is: tensorflow/serving/predict

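The saved_model_cli command gives a clearer view of the model's inputs and outputs. The Peppa detection model has a single class, so what matters is position: where the target Peppa sits in the field of view. We focus on two tensors: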

outputs['detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100)
      name: StatefulPartitionedCall:4
  outputs['detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (1, 100, 4)
      name: StatefulPartitionedCall:1

In tflite, the tensor named StatefulPartitionedCall:4 carries the detection scores; its shape shows that it holds 100 results. Correspondingly, the tensor named StatefulPartitionedCall:1 carries the bounding box for each score. The output tensors we care about can also be fetched more directly by signature name, with code like this:

// Get pointers to the output tensors directly
  auto detection_scores_tensor = interpreter->output_tensor_by_signature("detection_scores", "serving_default");
  auto detection_boxes_tensor = interpreter->output_tensor_by_signature("detection_boxes", "serving_default");
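
Once the two float buffers are available, pairing each score with its box is a matter of indexing. The sketch below is plain C++ with no TensorFlow Lite dependency. The Detection struct, the filter_detections function and the idea of a minimum-score threshold are assumptions added for illustration; only the layout (one score per detection, four box values per detection in y1, x1, y2, x2 order) comes from the model output described above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for one detection.
struct Detection {
  float score;
  float y1, x1, y2, x2;  // normalized box corners, in the order the model emits them
};

// scores holds `count` values; boxes holds count * 4 values laid out as
// [y1, x1, y2, x2] per detection. Keeps detections with score >= min_score.
std::vector<Detection> filter_detections(const float* scores, const float* boxes,
                                         std::size_t count, float min_score) {
  std::vector<Detection> kept;
  for (std::size_t i = 0; i < count; ++i) {
    if (scores[i] >= min_score) {
      kept.push_back({scores[i], boxes[i * 4], boxes[i * 4 + 1],
                      boxes[i * 4 + 2], boxes[i * 4 + 3]});
    }
  }
  return kept;
}
```

With the tensors above, the call would look like filter_detections(detection_scores_tensor->data.f, detection_boxes_tensor->data.f, 100, 0.5f), where 0.5 is an arbitrary example threshold.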

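3. Extract the image data and fill the input
Since this is a test, I first extracted the image's RGB data with Python and stored it in an array. The minimal project then uses that array directly to fill the model input.
The Python script that extracts the RGB data and writes it to a file: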

#!/bin/python

import cv2 as cv
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import sys
import numpy as np

g_color_bits = 4
g_file_name = "gg"
g_file_extend = "xx"
g_file_full_name = g_file_name+'_'+g_file_extend+".cpp"
g_pic_200200_name = '.' + g_file_name+'_.'+g_file_extend

def scale_img(img_name):
	# Read the original image
	img = cv.imread(img_name)
	# Print the image dimensions
	print(img.shape)
	# Assign the image height and width to x, y
	#  x, y = img.shape[0:2]
	# Show the original image
	#  cv.imshow('OriginalPicture', img)
	img_200x200 = cv.resize(img, (200, 200))
	#  cv.imshow('img200200', img_200x200)
	print(g_pic_200200_name)
	cv.imwrite(g_pic_200200_name , img_200x200)
	
def load(img_name):
	global g_color_bits
	global g_file_name
	global g_file_extend
	global g_file_full_name
	global g_pic_200200_name
	g_file_extend = img_name.split('.')[-1]
	g_file_name = img_name.split('/')[-1]
	print(g_file_name.split('.'))
	g_file_name = g_file_name.split('.')[-2]
	g_file_full_name = g_file_name+'_'+g_file_extend+".cpp"
	g_pic_200200_name = '.' + g_file_name + "_200200_." + g_file_extend
	print(img_name + " load success, will change to " + g_file_full_name)
	print(img_name + " load success, will scale to " + g_pic_200200_name)
	scale_img(img_name)
	img = mpimg.imread(g_pic_200200_name)
	if isinstance(img[0,0,0], np.float32):
		img *= 255
	else:
		print(type(img[0,0,0]))
	#  Type conversion
	img=np.uint32(img)
	if img.shape[2] == 4:
		g_color_bits = 32
	else:
		g_color_bits = 32
	print("img shape:",  img.shape, g_color_bits);
	return img

def dump_info(img):
	print(img.shape)
	print(type(img.shape[0]))
	print(type(img.shape[1]))
	print(type(img.shape[2]))
	print(type(img[0,0,0]))
	#  print(type(img[500,500,0]))

def show_img(img):
	plt.imshow(img)
	plt.show()

def write_data2file(img):
	global g_file_name
	global g_file_extend
	global g_color_bits
	global g_file_full_name
	ans = np.zeros((img.shape[0], img.shape[1]), dtype = np.uint32)
	output_str = 'extern "C" { ' + '\n'
	output_str += "unsigned int raw_data[] = {" + '\n'
	# Pack each pixel into one 32-bit word laid out as 0x00RRGGBB
	# columns
	for i in range(img.shape[1]):
		# rows
		for j in range(img.shape[0]):
			if g_color_bits == 32:
				ans[j, i] = img[j, i, 0] << 16
				ans[j, i] |= img[j, i, 1] << 8
				ans[j, i] |= img[j, i, 2]
	#  print(type(img[500, 100, :]), img[500, 100, :])
	#  print('final value:%x' %(ans[500, 100]))
	# Emit the words in row-major order, breaking the line every 16 values
	for j in range(img.shape[0]):
		for i in range(img.shape[1]):
			output_str += hex(ans[j, i]) + ", "
			if (j * img.shape[1] + i) % 16 == 0:
				output_str = output_str[:-1]
				output_str += '\n'
	# Drop the trailing ", " and close the array, then the extern "C" block
	output_str = output_str[:-2]
	output_str += "};\n"
	output_str += "};\n"
	output_file = open(g_file_full_name, "w")
	output_file.write(output_str)
	output_file.close()

#  scale_img(sys.argv[1])
image4convert = load(sys.argv[1])
dump_info(image4convert)

write_data2file(image4convert)
#  show_img(image4convert)

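Testing with an image:

After the script runs, it produces a .cpp file containing an array with the RGB data.
For example:

Screenshot from 2023-12-27 13-24-10.png
The next step is to copy the array into the model's input tensor. The key code is: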

int insert_raw_data(uint8_t *dst, unsigned int *data)
  {
     int i, j;

     // Each word is 0x00RRGGBB; write it out as three bytes R, G, B
     for (i = 0; i < 200; i++)
       for (j = 0; j < 200; j++)
       {
          *dst++ = *data >> 16 & 0XFF;
          *dst++ = *data >> 8 & 0XFF;
          *dst++ = *data & 0XFF;
          data++;
       }
     return 0;
  }
....
uint8_t * input_tensor = interpreter->typed_input_tensor<uint8_t>(0);
insert_raw_data(input_tensor, raw_data);
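
The packing done by the Python script and the unpacking done by insert_raw_data are inverses of each other. The following is a small self-contained sketch of that round trip. The names pack_rgb and unpack_rgb are hypothetical, and unlike the original function the pixel count is passed in instead of being fixed at 200x200.

```cpp
#include <cstddef>
#include <cstdint>

// Pack one pixel as 0x00RRGGBB, the layout the Python script writes out.
std::uint32_t pack_rgb(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
  return (static_cast<std::uint32_t>(r) << 16) |
         (static_cast<std::uint32_t>(g) << 8) |
         static_cast<std::uint32_t>(b);
}

// Unpack pixel_count packed words into dst as consecutive R, G, B bytes.
// dst must have room for pixel_count * 3 bytes.
void unpack_rgb(std::uint8_t* dst, const std::uint32_t* src,
                std::size_t pixel_count) {
  for (std::size_t i = 0; i < pixel_count; ++i) {
    *dst++ = static_cast<std::uint8_t>((src[i] >> 16) & 0xFF);
    *dst++ = static_cast<std::uint8_t>((src[i] >> 8) & 0xFF);
    *dst++ = static_cast<std::uint8_t>(src[i] & 0xFF);
  }
}
```

Passing the size explicitly makes it harder to overrun the input tensor if the shape given to ResizeInputTensor ever changes.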

Code walkthrough

The complete minimal.cc file for the project at this point is:

/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <vector>
#include <sys/time.h>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/optional_debug_tools.h"

// This is an example that is minimal to read a model
// from disk and perform inference. There is no data being loaded
// that is up to you to add as a user.
//
// NOTE: Do not add any dependencies to this that cannot be built with
// the minimal makefile. This example must remain trivial to build with
// the minimal build tool.
//
// Usage: minimal <tflite model>

#define TFLITE_MINIMAL_CHECK(x)                              \
  if (!(x)) {                                                \
    fprintf(stderr, "Error at %s:%d\n", __FILE__, __LINE__); \
    exit(1);                                                 \
  }

// Unpack 200x200 words of 0x00RRGGBB into R, G, B bytes
int insert_raw_data(uint8_t *dst, unsigned int *data)
{
   int i, j;

   for (i = 0; i < 200; i++)
     for (j = 0; j < 200; j++)
     {
        *dst++ = *data >> 16 & 0XFF;
        *dst++ = *data >> 8 & 0XFF;
        *dst++ = *data & 0XFF;
        data++;
     }
   return 0;
}

int dump_tflite_tensor(TfLiteTensor *tensor)
{
  std::cout << "Name:" << tensor->name << std::endl;
  if (tensor->dims)
  {
      std::cout << "Shape: [" ;
      for (int i = 0; i < tensor->dims->size; i++)
          std::cout << tensor->dims->data[i] << ",";
      std::cout << "]" << std::endl;
  }
  std::cout << "Type:" << tensor->type << std::endl;

  return 0;
}

extern unsigned int raw_data[];
int main(int argc, char* argv[]) {
  if (argc != 2) {
    fprintf(stderr, "minimal <tflite model>\n");
    return 1;
  }
  const char* filename = argv[1];

  // Load model
  std::unique_ptr<tflite::FlatBufferModel> model =
      tflite::FlatBufferModel::BuildFromFile(filename);
  TFLITE_MINIMAL_CHECK(model != nullptr);

  // Build the interpreter with the InterpreterBuilder.
  // Note: all Interpreters should be built with the InterpreterBuilder,
  // which allocates memory for the Interpreter and does various set up
  // tasks so that the Interpreter can read the provided model.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  tflite::InterpreterBuilder builder(*model, resolver);
  // builder.SetNumThreads(12);
  // Initialize the interpreter
  std::unique_ptr<tflite::Interpreter> interpreter;
  builder(&interpreter);
  TFLITE_MINIMAL_CHECK(interpreter != nullptr);

  auto a_input = interpreter->inputs()[0];
  auto a_input_batch_size = interpreter->tensor(a_input)->dims_signature->data[0];
  auto a_input_height = interpreter->tensor(a_input)->dims_signature->data[1];
  auto a_input_width = interpreter->tensor(a_input)->dims_signature->data[2];
  auto a_input_channels = interpreter->tensor(a_input)->dims_signature->data[3];

  std::cout << "The input tensor has the following dimensions: ["
            << a_input_batch_size << ","
            << a_input_height << ","
            << a_input_width << ","
            << a_input_channels << "]" << std::endl;

  // Force the shape of the input tensor
  std::vector<int> peppa_jpg = {1,200,200,3};
  interpreter->ResizeInputTensor(0, peppa_jpg);
  // Allocate tensor buffers.
  // (the memory needed for inference)
  TFLITE_MINIMAL_CHECK(interpreter->AllocateTensors() == kTfLiteOk);
  printf("=== Pre-invoke Interpreter State ===\n");
  // Print the interpreter state
  // tflite::PrintInterpreterState(interpreter.get());

  // auto keys = interpreter->signature_keys();
  // for (auto k: keys)
  // {
    // std::cout << *k << std::endl;
  // }
  // std::cout << "---------------------------" << std::endl;

  // Get pointers to the output tensors directly
  auto detection_scores_tensor = interpreter->output_tensor_by_signature("detection_scores", "serving_default");
  auto detection_boxes_tensor = interpreter->output_tensor_by_signature("detection_boxes", "serving_default");

  // auto abc = interpreter->signature_outputs("serving_default");
  // std::cout << abc.size() << std::endl;
  // for (auto a:abc)
      // std::cout << a.first << "and" << a.second << std::endl;

  // Fill input buffers
  // TODO(user): Insert code to fill input tensors.
  // Note: The buffer of the input tensor with index `i` of type T can
  // be accessed with `T* input = interpreter->typed_input_tensor<T>(i);`
  uint8_t * input_tensor = interpreter->typed_input_tensor<uint8_t>(0);
  insert_raw_data(input_tensor, raw_data);
  // Run inference
  // Print a microsecond timestamp before and after Invoke()
  struct timeval tv;
  if (0 == gettimeofday(&tv, NULL))
  {
    std::cout << tv.tv_sec * 1000000 + tv.tv_usec << std::endl;
  }
  TFLITE_MINIMAL_CHECK(interpreter->Invoke() == kTfLiteOk);
  if (0 == gettimeofday(&tv, NULL))
  {
    std::cout << tv.tv_sec * 1000000 + tv.tv_usec << std::endl;
  }
  printf("\n=== Post-invoke Interpreter State ===\n");
  // tflite::PrintInterpreterState(interpreter.get());

  // Print the first two detections: score[y1,x1,y2,x2]
  int i  = 0;
  for ( ; i < 2; i++)
  {
      std::cout << detection_scores_tensor->data.f[i] << '[';
      std::cout << detection_boxes_tensor->data.f[i*4] << ',';
      std::cout << detection_boxes_tensor->data.f[i*4 + 1] << ',';
      std::cout << detection_boxes_tensor->data.f[i*4 + 2] << ',';
      std::cout << detection_boxes_tensor->data.f[i*4 + 3] << ']' << std::endl;
  }
  // Read output buffers
  // TODO(user): Insert getting data out code.
  // Note: The buffer of the output tensor with index `i` of type T can
  // be accessed with `T* output = interpreter->typed_output_tensor<T>(i);`
  // T* output = interpreter->typed_output_tensor<T>(i);

  return 0;
}

The test result:

$ ./minimal model.tflite
2023-12-27 14:19:30.468885: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
INFO: Created TensorFlow Lite delegate for select TF ops.
2023-12-27 14:19:30.491445: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO: TfLiteFlexDelegate delegate: 4 nodes delegated out of 21284 nodes with 2 partitions.

The input tensor has the following dimensions: [1,-1,-1,3]
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
WARNING: Attempting to use a delegate that only supports static-sized tensors with a graph that has dynamic-sized tensors (tensor#394 is a dynamic-sized tensor).
=== Pre-invoke Interpreter State ===
1703657970521273
1703657971811800

=== Post-invoke Interpreter State ===
0.831981[0.23847,0.269423,0.909584,0.87969]
0.679475[0.114574,0.145309,0.785652,0.755186]

As the code shows, I printed a timestamp immediately before and after TFLITE_MINIMAL_CHECK(interpreter->Invoke() == kTfLiteOk); by adding the following:

if (0 == gettimeofday(&tv, NULL))
    {
      std::cout << tv.tv_sec * 1000000 + tv.tv_usec << std::endl;
    }

The default inference time was:

Unit: us
1703657970521273
1703657971811800
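
The elapsed time is simply the difference between the two timestamps: 1703657971811800 - 1703657970521273 = 1290527 us, or about 1.29 s. The sketch below shows that subtraction, plus an alternative way to time a call with std::chrono::steady_clock, which is monotonic and so is not affected by wall-clock adjustments. Both elapsed_us and time_us are hypothetical helper names and are not part of the original program, which uses gettimeofday.

```cpp
#include <chrono>
#include <cstdint>

// Difference between two microsecond timestamps such as the pair printed above.
std::int64_t elapsed_us(std::int64_t start_us, std::int64_t end_us) {
  return end_us - start_us;
}

// Runs any callable once and returns how long it took in microseconds,
// measured with a monotonic clock.
template <typename Fn>
std::int64_t time_us(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

In minimal.cc this could wrap the inference call, for example time_us([&] { interpreter->Invoke(); }), with the status check moved inside the lambda.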

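The first inference on the PC takes about 1.3 s (I don't yet know why; if anyone does, please let me know). Subsequent inferences take around 400 ms. I plotted the inference time over nearly 50 runs; the result is shown in the figure below:

Compared with roughly 2 s in Python, this is close to a 5x speedup.
The C++ code only prints the two highest-scoring detections, so here are the first two detections from the Python side for comparison: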

'detection_scores': <tf.Tensor: shape=(1, 100), dtype=float32, numpy=
array([[0.8284813 , 0.67629, ....
{'detection_boxes': <tf.Tensor: shape=(1, 100, 4), dtype=float32, numpy=
array([[[0.23848376, 0.26942557, 0.9095545 , 0.8796709 ],
[0.1146237 , 0.14536926, 0.7857162 , 0.7552357 ],
...

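Here is the image as annotated with Python:

The data show that the results agree closely (scores within about 0.004, box coordinates within about 0.0001), which completes the C++ call into TensorFlow Lite on the PC.

So far this completes the C++ inference test on the PC. My project is target tracking: the core is recognizing the captured images and, based on how the recognized target's position changes, driving a turntable in the opposite direction to keep the target locked at the center of the field of view. In this trial I concentrated on target recognition, detection and action prediction. I chose Peppa as the target and drew four pictures, with Peppa at the top, bottom, left and right. I put them together in one image.

Next came the logic changes in minimal.cc. I stored the RGB data of these Peppa pictures in 4 arrays and defined a map (a lookup table) to index them.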

static jpg_info_t gs_test_peppa_maps[4] = {
    {up_raw_data, "up"},
    {down_raw_data, "down"},
    {left_raw_data, "left"},
    {right_raw_data, "right"},
  };

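Inside the while(1) loop in main, a random number picks one of the pictures to simulate the target moving. The model then runs inference, the offset of the target's center relative to the previous position is computed, and the matching control action is printed to correct the offset in the opposite direction, which keeps the target locked at the center of the field of view. The control logic is:

int do_with_move_action(int &last_x, int &last_y, int x, int y)
  {
    if (x > last_x)
      std::cout << "move right ";
    else if (x < last_x)
      std::cout << "move left ";

    if (y > last_y)
      std::cout << "move down";
    else if (y < last_y)
      std::cout << "move up";

    std::cout << std::endl;

    last_x = x;
    last_y = y;

    return 0;
  }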
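
The step from a bounding box to a move decision can be isolated and tested without the model. The sketch below is a hypothetical restatement, not the author's code: Point, box_center and move_action are names introduced here. box_center converts a normalized [y1, x1, y2, x2] box into a pixel center for an image of a given size, and move_action applies the same comparison rule as do_with_move_action but returns the text instead of printing it.

```cpp
#include <string>

struct Point {
  int x;
  int y;
};

// Center of a normalized [y1, x1, y2, x2] box, in pixels, for an image that
// is `width` pixels wide and `height` pixels high.
Point box_center(const float box[4], int width, int height) {
  Point p;
  p.y = static_cast<int>(height * (box[0] + box[2]) / 2.0f);
  p.x = static_cast<int>(width * (box[1] + box[3]) / 2.0f);
  return p;
}

// Same decision rule as do_with_move_action, returned as a string.
std::string move_action(Point last, Point now) {
  std::string out;
  if (now.x > last.x) out += "move right ";
  else if (now.x < last.x) out += "move left ";
  if (now.y > last.y) out += "move down";
  else if (now.y < last.y) out += "move up";
  return out;
}
```

For a 200x200 image, height * (y1 + y2) / 2 reduces to 100 * (y1 + y2), which is the form used in the full listing.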

The whole minimal.cc file, after these changes, is:

/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <vector>
#include <sys/time.h>
#include <unistd.h>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/optional_debug_tools.h"

// This is an example that is minimal to read a model
// from disk and perform inference. There is no data being loaded
// that is up to you to add as a user.
//
// NOTE: Do not add any dependencies to this that cannot be built with
// the minimal makefile. This example must remain trivial to build with
// the minimal build tool.
//
// Usage: minimal <tflite model>

#define TFLITE_MINIMAL_CHECK(x)                              \
  if (!(x)) {                                                \
    fprintf(stderr, "Error at %s:%d\n", __FILE__, __LINE__); \
    exit(1);                                                 \
  }

// Unpack 200x200 words of 0x00RRGGBB into R, G, B bytes
int insert_raw_data(uint8_t *dst, unsigned int *data)
{
   int i, j;

   for (i = 0; i < 200; i++)
     for (j = 0; j < 200; j++)
     {
        *dst++ = *data >> 16 & 0XFF;
        *dst++ = *data >> 8 & 0XFF;
        *dst++ = *data & 0XFF;
        data++;
     }
   return 0;
}

int dump_tflite_tensor(TfLiteTensor *tensor)
{
  std::cout << "Name:" << tensor->name << std::endl;
  if (tensor->dims)
  {
      std::cout << "Shape: [" ;
      for (int i = 0; i < tensor->dims->size; i++)
          std::cout << tensor->dims->data[i] << ",";
      std::cout << "]" << std::endl;
  }
  std::cout << "Type:" << tensor->type << std::endl;

  return 0;
}

extern unsigned int raw_data[];
extern unsigned int up_raw_data[];
extern unsigned int down_raw_data[];
extern unsigned int left_raw_data[];
extern unsigned int right_raw_data[];

typedef struct jpg_info {
  unsigned int *data;
  char name[8];
} jpg_info_t;

static jpg_info_t gs_test_peppa_maps[4] = {
  {up_raw_data, "up"},
  {down_raw_data, "down"},
  {left_raw_data, "left"},
  {right_raw_data, "right"},
};

int do_with_move_action(int &last_x, int &last_y, int x, int y)
{
  if (x > last_x)
    std::cout << "move right ";
  else if (x < last_x)
    std::cout << "move left ";

  if (y > last_y)
    std::cout << "move down";
  else if (y < last_y)
    std::cout << "move up";

  std::cout << std::endl;

  last_x = x;
  last_y = y;

  return 0;
}


int main(int argc, char* argv[]) {
  if (argc != 2) {
    fprintf(stderr, "minimal <tflite model>\n");
    return 1;
  }
  const char* filename = argv[1];

  // Load model
  std::unique_ptr<tflite::FlatBufferModel> model =
      tflite::FlatBufferModel::BuildFromFile(filename);
  TFLITE_MINIMAL_CHECK(model != nullptr);

  // Build the interpreter with the InterpreterBuilder.
  // Note: all Interpreters should be built with the InterpreterBuilder,
  // which allocates memory for the Interpreter and does various set up
  // tasks so that the Interpreter can read the provided model.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  tflite::InterpreterBuilder builder(*model, resolver);
  builder.SetNumThreads(4);
  // Initialize the interpreter
  std::unique_ptr<tflite::Interpreter> interpreter;
  builder(&interpreter);
  TFLITE_MINIMAL_CHECK(interpreter != nullptr);

  auto a_input = interpreter->inputs()[0];
  auto a_input_batch_size = interpreter->tensor(a_input)->dims_signature->data[0];
  auto a_input_height = interpreter->tensor(a_input)->dims_signature->data[1];
  auto a_input_width = interpreter->tensor(a_input)->dims_signature->data[2];
  auto a_input_channels = interpreter->tensor(a_input)->dims_signature->data[3];

  std::cout << "The input tensor has the following dimensions: ["
            << a_input_batch_size << ","
            << a_input_height << ","
            << a_input_width << ","
            << a_input_channels << "]" << std::endl;

  // Force the shape of the input tensor
  std::vector<int> peppa_jpg = {1,200,200,3};
  interpreter->ResizeInputTensor(0, peppa_jpg);

  // Fill input buffers
  // TODO(user): Insert code to fill input tensors.
  // Note: The buffer of the input tensor with index `i` of type T can
  // be accessed with `T* input = interpreter->typed_input_tensor<T>(i);`
  uint8_t * input_tensor;
  int map_index;
  int pos_x, pos_y;
  int last_pos_x = 100, last_pos_y = 100;

  while(1)
  {
      // Allocate tensor buffers (the memory needed for inference)
      TFLITE_MINIMAL_CHECK(interpreter->AllocateTensors() == kTfLiteOk);
      // printf("=== Pre-invoke Interpreter State ===\n");
      input_tensor = interpreter->typed_input_tensor<uint8_t>(0);

      // Get pointers to the output tensors directly
      auto detection_scores_tensor = interpreter->output_tensor_by_signature("detection_scores", "serving_default");
      auto detection_boxes_tensor = interpreter->output_tensor_by_signature("detection_boxes", "serving_default");
      // Pick one of the four pictures at random to simulate target movement
      map_index = random() % 4;

      std::cout << "This raw " << gs_test_peppa_maps[map_index].name << '@' << map_index << std::endl;
      insert_raw_data(input_tensor, gs_test_peppa_maps[map_index].data);
      // Run inference
      // Print a microsecond timestamp before and after Invoke()
      struct timeval tv;
      if (0 == gettimeofday(&tv, NULL))
      {
        std::cout << tv.tv_sec * 1000000 + tv.tv_usec << '~';
      }
      TFLITE_MINIMAL_CHECK(interpreter->Invoke() == kTfLiteOk);
      if (0 == gettimeofday(&tv, NULL))
      {
        std::cout << tv.tv_sec * 1000000 + tv.tv_usec << std::endl;
      }
      // printf("\n=== Post-invoke Interpreter State ===\n");
      // tflite::PrintInterpreterState(interpreter.get());
      std::cout << detection_boxes_tensor->data.f[0] << detection_boxes_tensor->data.f[1] <<
          detection_boxes_tensor->data.f[2] << detection_boxes_tensor->data.f[3] << std::endl;

      // Note: the box format of the inference result is (y1, x1) and (y2, x2).
      // For a 200-pixel side, the center is 200 * (a + b) / 2 = 100 * (a + b).
      pos_y = 100 * (detection_boxes_tensor->data.f[0] + detection_boxes_tensor->data.f[2]);
      pos_x = 100 * (detection_boxes_tensor->data.f[1] + detection_boxes_tensor->data.f[3]);
      std::cout << detection_scores_tensor->data.f[0] << '[';
      std::cout << pos_x << ',';
      std::cout << pos_y << ']' << std::endl;

      do_with_move_action(last_pos_x, last_pos_y, pos_x, pos_y);
      usleep(1000);
  }
  // Read output buffers
  // TODO(user): Insert getting data out code.
  // Note: The buffer of the output tensor with index `i` of type T can
  // be accessed with `T* output = interpreter->typed_output_tensor<T>(i);`
  // T* output = interpreter->typed_output_tensor<T>(i);

  return 0;
}

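A capture of part of the test output:

Screenshot from 2023-12-27 16-59-05.png

The next step is to cross-compile the minimal project again and test it on the Phytium Pi. The process differs little from the PC side: first send the binary to the Phytium Pi with scp, then check its dependencies: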

red@phytiumpi:/tmp$ ldd minimal
        linux-vdso.so.1 (0x0000ffff9cb15000)
        libtensorflowlite_flex.so => /lib/libtensorflowlite_flex.so (0x0000ffff805e0000)
        librt.so.1 => /lib/aarch64-linux-gnu/librt.so.1 (0x0000ffff805c0000)
        libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffff805a0000)
        libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff80580000)
        libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff804e0000)
        libstdc++.so.6 => /lib/aarch64-linux-gnu/libstdc++.so.6 (0x0000ffff802b0000)
        libgcc_s.so.1 => /lib/aarch64-linux-gnu/libgcc_s.so.1 (0x0000ffff80280000)
        libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff800d0000)
        /lib/ld-linux-aarch64.so.1 (0x0000ffff9cadc000)

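Then run minimal to perform inference. Here I only show the test results and the inference time:

Screenshot from 2023-12-27 20-12-42.png
The figure shows that the inference results match those from the PC.

Next, I plotted the inference time on the Phytium Pi from the printed output; the result is shown in the figure below:

On the Phytium Pi the first inference takes nearly 8 minutes (I don't know why). After that it stabilizes at about 1.2 s per inference, which is not dramatically different from the 400 ms on the PC. I checked with btop, and the program runs 22 threads on the Phytium Pi:
Screenshot from 2023-12-27 20-25-54.png

Likewise on the PC, checked with btop:

Screenshot from 2023-12-27 20-27-08.png

It runs 30 threads. At this point, C++ CPU inference on the Phytium Pi appears to run at roughly 1/3 of the PC's speed.

With this article I am wrapping up my trial of the Phytium Pi for now. Across this series of posts I have recorded, step by step, how I deployed an object detection algorithm on the Phytium Pi and used the model output to drive control actions.
The articles from this past month:

[Phytium Pi 4G Free Trial] Chapter 1: Building and installing jammy on the Phytium Pi from Armbian
[Phytium Pi 4G Free Trial] Chapter 2: Training an object detection model with TensorFlow2 on the PC
[Phytium Pi 4G Free Trial] Chapter 3: Capturing images, labeling them by hand, and training and testing a custom object detection model
[Phytium Pi 4G Free Trial] Chapter 4: A first attempt at deploying the model to the Phytium Pi

As the trial draws to a close, I would again like to thank ElecFans (電子發燒友) and Phytium Information Technology Co., Ltd. (飛騰信息技術有限公司) for giving me this opportunity.

I hope this series helps anyone who wants to try machine learning on embedded devices with TensorFlow Lite.

Reviewed and edited by Huang Yu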

Author column: Red Linux
