Function Approximation Using Lookup Table and Interpolation

Introduction

To compute the output of the function given an input, the computer will usually have to perform all the mathematical or logical operations in the function. Depending on the complexity of the function, it can be very slow. However, if the number of different inputs for the function is small, we could actually pre-compute the output for each input and save it as a lookup table on the memory. During the runtime, when the function is given an input, instead of computing the result, we could just look up the output in the lookup table. In some scenarios, even if the number of different inputs is huge, we can still use lookup table and do interpolation to approximate the function output for acceleration.

In this blog post, I would like to quickly discuss the function approximation using lookup table and interpolation.

Square Root Approximation Using Lookup Tables and Interpolation

This is a C++ implementation for the example of computing the UINT16 value square root using one or two lookup tables presented in the “Motorola CPU32 Bulletin”.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iostream>
#include <vector>

// Input range: [min_input, uint32_t max_input)
// T1 and T2 can only be unsigned integral types.
template <typename T1, typename T2>
std::vector<T1> generate_sqrt_uint_lookup_table(uint32_t table_size,
T2 min_input, T2 max_input)
{
assert(max_input > min_input);
std::vector<T1> lookup_table(table_size, static_cast<T1>(0));
uint32_t const num_bins{table_size - 1U};
uint32_t const num_values_per_bin{
(static_cast<uint32_t>(max_input - min_input) + num_bins) / num_bins};
for (uint32_t i{0}; i < table_size; ++i)
{
double const input_value{static_cast<double>(
static_cast<uint32_t>(min_input) + i * num_values_per_bin)};
double const output_value{std::nearbyint(std::sqrt(input_value))};
lookup_table.at(i) = static_cast<T1>(output_value);
}
return lookup_table;
}

template <typename T1, typename T2>
T1 sqrt_uint_lookup(T2 input, std::vector<T1> const& lookup_table, T2 min_input,
T2 max_input)
{
assert(max_input > min_input);
if (input >= max_input)
{
return lookup_table.at(lookup_table.size() - 1U);
}
if (input < min_input)
{
return lookup_table.at(0);
}
uint32_t const num_bins{static_cast<uint32_t>(lookup_table.size()) - 1U};
uint32_t const num_values_per_bin{
(static_cast<uint32_t>(max_input - min_input) + num_bins) / num_bins};
uint32_t const bin_index{input / num_values_per_bin};
T1 const bin_value{lookup_table.at(bin_index)};
T1 const bin_value_next{lookup_table.at(bin_index + 1U)};
T2 const bin_min_input_value{static_cast<T2>(
static_cast<uint32_t>(min_input) + bin_index * num_values_per_bin)};
T2 const bin_fraction{static_cast<T2>(input - bin_min_input_value)};
// Linear interpolation.
T1 const output_value{static_cast<T1>(
bin_value + static_cast<T1>(std::nearbyint(
static_cast<double>(bin_fraction) /
static_cast<double>(num_values_per_bin) *
static_cast<double>(bin_value_next - bin_value))))};
return output_value;
}

template <typename T1, typename T2>
T1 sqrt_uint_lookup(T2 input, std::vector<T1> const& lookup_table_1,
std::vector<T1> const& lookup_table_2, T2 table_1_min_input,
T2 table_1_max_input, T2 table_2_min_input,
T2 table_2_max_input)
{
assert(table_1_max_input > table_1_min_input);
assert(table_2_max_input > table_2_min_input);
assert(table_1_max_input - table_1_min_input >
table_2_max_input - table_2_min_input);
if ((input >= table_2_min_input) && (input < table_2_max_input))
{
return sqrt_uint_lookup(input, lookup_table_2, table_2_min_input,
table_2_max_input);
}
else
{
return sqrt_uint_lookup(input, lookup_table_1, table_1_min_input,
table_1_max_input);
}
}

int main()
{
constexpr uint32_t table_1_size{257U};
constexpr uint16_t table_1_min_input{0U};
constexpr uint16_t table_1_max_input{65535U};

constexpr uint32_t table_2_size{17U};
constexpr uint16_t table_2_min_input{0U};
constexpr uint16_t table_2_max_input{256U};

std::vector<uint8_t> const sqrt_uint8_lookup_table_1{
generate_sqrt_uint_lookup_table<uint8_t, uint16_t>(
table_1_size, table_1_min_input, table_1_max_input)};

std::vector<uint8_t> const sqrt_uint8_lookup_table_2{
generate_sqrt_uint_lookup_table<uint8_t, uint16_t>(
table_2_size, table_2_min_input, table_2_max_input)};

uint16_t input;
std::cout
<< "Enter a UINT16 number [0, 65536) to calculate its square root: ";
std::cin >> input;

uint8_t const result_one_lookup_table{
sqrt_uint_lookup(input, sqrt_uint8_lookup_table_1, table_1_min_input,
table_1_max_input)};
uint8_t const result_two_lookup_tables{sqrt_uint_lookup(
input, sqrt_uint8_lookup_table_1, sqrt_uint8_lookup_table_2,
table_1_min_input, table_1_max_input, table_2_min_input,
table_2_max_input)};
std::cout << "Square root using one lookup table: "
<< static_cast<uint32_t>(result_one_lookup_table) << std::endl;
std::cout << "Square root using two lookup tables: "
<< static_cast<uint32_t>(result_two_lookup_tables) << std::endl;
std::cout << "Actual square root: "
<< static_cast<uint32_t>(std::sqrt(static_cast<double>(input)))
<< std::endl;

return 0;
}

We could see that computing the square root of UINT16 values in the range of [0, 256) using two lookup tables is more accurate than just using one lookup table.

1
2
3
4
5
6
7
8
9
10
11
$ g++ sqrt_lut.cpp -o sqrt_lut -std=c++14
$ ./sqrt_lut
Enter a UINT16 number [0, 65536) to calculate its square root: 361
Square root using one lookup table: 19
Square root using two lookup tables: 19
Actual square root: 19
$ ./sqrt_lut
Enter a UINT16 number [0, 65536) to calculate its square root: 64
Square root using one lookup table: 4
Square root using two lookup tables: 8
Actual square root: 8

Note that the actual implementation presented in the bulletin uses Motorola CPU32 assembly instructions and is much more efficient than the C++ implementation. For example, the table bin index and the interpolation fraction does not have to be computed in practice. They just constitute the hexadecimal value of the input value. For instance, the input value 361 decimal is 0169 hexadecimal. 01 is the table bin index hexadecimal (1 decimal), and 69 is the interpolation fraction hexadecimal (361 - 256 = 105 decimal).

Conclusions

For simple functions that can be computed very fast, using lookup table and specialized hardware might not be beneficial. However, if the function is computational heavy, lookup table and specialized hardware for approximation can be employed for acceleration.

References

Function Approximation Using Lookup Table and Interpolation

https://leimao.github.io/blog/Function-Approximation-Lookup-Table-Interpolation/

Author

Lei Mao

Posted on

09-22-2023

Updated on

09-22-2023

Licensed under


Comments