Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize pooling accumulator inference #947

Closed
wants to merge 7 commits into from
Closed

Conversation

jmduarte
Copy link
Member

@jmduarte jmduarte commented Dec 21, 2023

Currently, pooling uses a wider data type for accumulation inferred based on the input type and the number of elements. However, we only implemented this for ap_fixed<W,I> and ap_int<W> and the template matching fails for ap_uint<W> and ap_ufixed<W,I,Q,O> and ap_fixed<W,I,Q,O> (if Q and O are not the default values). This PR (hopefully) makes it so it matches all those additional types.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change which adds functionality)

Tests

Example before this fix:
profiling_global_average_pooling1d (1)

Example after this fix:
profiling_global_average_pooling1d (2)

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@Brainz22 @ddiaz006 @AnthonyAportela @danprim

@jmduarte
Copy link
Member Author

Do we need a corresponding fix for Quartus?

// Returns the mean value of an array of size N
// Overload of the above function; using a wider accumulator than the input to avoid overflow
template <int W, int N> ac_int<W, true> avg(ac_int<W, true> (&x)[N]) {
hls_register ac_int<W + ceillog2(N), true> tmp = 0;
// Due to loop dependencies, pipelining & unrolling is not possible
// Explictily disabling pipeline significantly reduces resource usage
#pragma disable_loop_pipelining
for (int i = 0; i < N; i++) {
tmp += x[i];
}
tmp /= N;
// Cast back to original type
ac_int<W, true> y = static_cast<ac_int<W, true>>(tmp);
return tmp;
}
// Returns the mean value of an array of size N
// Overload of the above function; using a wider accumulator than the input to avoid overflow
template <int W, int I, int N> ac_fixed<W, I, true> avg(ac_fixed<W, I, true> (&x)[N]) {
hls_register ac_fixed<W + ceillog2(N), I + ceillog2(N), true> tmp = 0;
// Due to loop dependencies, pipelining & unrolling is not possible
// Explictily disabling pipeline significantly reduces resource usage
#pragma disable_loop_pipelining
for (int i = 0; i < N; i++) {
tmp += x[i];
}
tmp /= N;
// Cast back to original type
ac_fixed<W, I, true> y = tmp;
return y;
}

@jmduarte
Copy link
Member Author

pre-commit.ci autofix

@jmduarte jmduarte requested a review from vloncar December 29, 2023 01:34
@calad0i
Copy link
Contributor

calad0i commented Jan 3, 2024

Currently, there is an accum_t set for pooling layers which is only effective for io_stream implementation: vivado_backend.py#L376-L380, nnet_pooling_stream.h#L386. (Which could also be problematic at the moment, see #917.)

Maybe we should converge on whether to set pooling accumulator size with accum_t or cpp template matching? @vloncar

@jmitrevs
Copy link
Contributor

Do we need a corresponding fix for Quartus?

// Returns the mean value of an array of size N
// Overload of the above function; using a wider accumulator than the input to avoid overflow
template <int W, int N> ac_int<W, true> avg(ac_int<W, true> (&x)[N]) {
hls_register ac_int<W + ceillog2(N), true> tmp = 0;
// Due to loop dependencies, pipelining & unrolling is not possible
// Explictily disabling pipeline significantly reduces resource usage
#pragma disable_loop_pipelining
for (int i = 0; i < N; i++) {
tmp += x[i];
}
tmp /= N;
// Cast back to original type
ac_int<W, true> y = static_cast<ac_int<W, true>>(tmp);
return tmp;
}
// Returns the mean value of an array of size N
// Overload of the above function; using a wider accumulator than the input to avoid overflow
template <int W, int I, int N> ac_fixed<W, I, true> avg(ac_fixed<W, I, true> (&x)[N]) {
hls_register ac_fixed<W + ceillog2(N), I + ceillog2(N), true> tmp = 0;
// Due to loop dependencies, pipelining & unrolling is not possible
// Explictily disabling pipeline significantly reduces resource usage
#pragma disable_loop_pipelining
for (int i = 0; i < N; i++) {
tmp += x[i];
}
tmp /= N;
// Cast back to original type
ac_fixed<W, I, true> y = tmp;
return y;
}

Actually I think it also needs to be done for vitis (which has its own implementation) in addition to quartus.

@calad0i calad0i mentioned this pull request Feb 21, 2024
7 tasks
@jmitrevs
Copy link
Contributor

What did we decide to do about accum_t vs template matching?

@jmitrevs
Copy link
Contributor

My general feeling is to prefer the #973 style solution, though implementation details may still need to be addressed.

@jmitrevs
Copy link
Contributor

We decided to go with #973 so I will close this PR.

@jmitrevs jmitrevs closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants