Added simple tile size calculation #2
base: impala
It is based on img.width and available cache sizes
This seems to block only for L1 cache: I guess you have to pass For debugging, you can define a Boolean |
@with the code in 9720ea8 I get 478 ms median (27) and with my version in c097a4f I get 493 ms. We are talking about gaussian with You mention blocking for multiple levels: with the given image size of 4096x4096, the L1 block_size needs to be <= 1638 elements; for all other caches the layer conditions are fulfilled with the original size (L2 would need blocking with <= 13107 and L3 with <= 1M elements). Did I misunderstand what you mean by blocking for other levels? |
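The bounds quoted above can be reproduced with a simple layer-condition calculation. The following is a minimal sketch, not the code from the pull request: it assumes the condition has the form `rows * block_size * elem_bytes <= cache_bytes`, with 5 stencil rows of 4-byte floats and cache sizes of 32 KiB (L1), 256 KiB (L2) and 20 MiB (L3). These parameters are inferred from the numbers in the comment, and the function names `max_block_elems` and `tile_width` are hypothetical.

```cpp
#include <cstddef>

// Largest row length (in elements) for which `rows` rows of the image
// still fit into a cache of `cache_bytes` bytes.
// Assumption: layer condition rows * block * elem_bytes <= cache_bytes.
constexpr std::size_t max_block_elems(std::size_t cache_bytes,
                                      std::size_t rows,
                                      std::size_t elem_bytes) {
    return cache_bytes / (rows * elem_bytes);
}

// Blocking is only needed when a full image row exceeds that bound;
// otherwise the original width already satisfies the layer condition.
constexpr std::size_t tile_width(std::size_t img_width,
                                 std::size_t cache_bytes,
                                 std::size_t rows,
                                 std::size_t elem_bytes) {
    return img_width < max_block_elems(cache_bytes, rows, elem_bytes)
               ? img_width
               : max_block_elems(cache_bytes, rows, elem_bytes);
}
```

Under these assumptions, `tile_width(4096, 32 * 1024, 5, 4)` gives 1638, while the L2 and L3 bounds (13107 and 1048576 elements) are both larger than 4096, so the full row width is kept for those levels.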
mapping_cpu.impala
Outdated
//print_string(", ");
//print_int(mask.size_y);
//print_string(")\n");
if debug_tiling @{
No @ is required here: the frontend will already discard the conditional code if debug_tiling is false.
Your current logic starts |
To make debugging programs using PE easier, we will add an instruction to show the "status" during PE, something like |
Regarding speed: your version takes 286 ms and the version on master takes only 76 ms on my laptop (gaussian & iteration_advanced). |
Ok, we've added the Adding
let (xtile_dim, ytile_dim) = @get_tile_dims(mask, img);
pe_info("xtile_dim", xtile_dim);
pe_info("ytile_dim", ytile_dim);
Shows that
That is, some condition in There are two solutions:
|
To be clear: Examples:
That being said, note that other optimizations in thorin will usually fold those loads & stores afterwards. So folding these loads & stores is only critical if control flow depends on such variables. I think we should fold loads & stores during partial evaluation. This is, however, more complicated than it sounds. So, don't expect it to be fixed next week :( |
That was a measurement mistake (last deleted comment). Is there a way to parse command line arguments? I would like to switch between the different versions dynamically so as to avoid such mistakes. |
You can:
fn impala_main(do_thing: i32) -> () {
if do_thing == 42 {
// ...
}
}
extern "C" void impala_main(int);
int main(int argc, char** argv) {
int do_thing = parse_args(argc, argv);
impala_main(do_thing);
}
extern "C" {
fn strcmp(&[u8], &[u8]) -> i32;
}
fn main(argc: i32, argv: &[&[u8]]) -> i32 {
for i in range(1, argc) {
if !strcmp(argv(i), "-h") {
usage();
return(0)
} else if !strcmp(argv(i), ...) {
// ... and so on
}
}
0 // fall-through result when no option triggered an early return
} |
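The C++ wrapper above calls `parse_args` without defining it. A minimal sketch of such a helper could look as follows. The flag name `--do-thing` and the default of 0 are assumptions made for illustration only; nothing in the thread specifies them.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: returns the integer that follows "--do-thing",
// or 0 when the flag is missing or has no value after it.
int parse_args(int argc, char** argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--do-thing") == 0) {
            // atoi yields 0 for non-numeric input, matching the default
            return std::atoi(argv[i + 1]);
        }
    }
    return 0;
}
```

With this helper, running the program as `./prog --do-thing 42` would pass 42 to `impala_main`, so the variant to benchmark can be chosen at run time instead of by recompiling.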
Some of our tests in Impala use argc and argv, for example nbody. I'm going to have a look at your changes tomorrow. |
I had a look at the code, first a minor remark:
Looking at execution times:
So it seems something is going wrong here ...
shows the problem:
I can push these changes to your fork if you give me the permissions. |
I've pushed my changes. Maybe also good to know: |
Enabling fast-math gives:
|
It is currently tailored to 2D boxed stencils, but can be generalized