
Flush debug print to avoid truncated output #3715

Merged
merged 1 commit into main from debug_dump_flush
Jan 16, 2025

Conversation

naoyam
Collaborator

@naoyam naoyam commented Jan 16, 2025

Quite commonly, a debug dump like NVFUSER_DUMP=fusion_ir_math gets truncated when a device-side error happens. This change should at least avoid that for some of the fusion dumps.
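
For illustration only (not nvFuser code), a standalone sketch of the failure mode: output still sitting in a stream buffer is lost when the process dies before the buffer is flushed, which is typical when the dump is redirected to a file and the stream is fully buffered.

#include <cstdlib>
#include <iostream>

int main() {
  std::cout << "flushed: survives the crash\n" << std::flush;
  std::cout << "buffered: may never appear";  // lost if nothing flushes it
  std::abort();  // stand-in for a device-side error killing the process
}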

@naoyam naoyam requested a review from wujingyue January 16, 2025 01:31
@naoyam
Collaborator Author

naoyam commented Jan 16, 2025

!build


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 1 🔵⚪⚪⚪⚪
🧪 No relevant tests
⚡ Recommended focus areas for review

Function Signature

Verify that adding std::flush to the debug() output streams does not introduce unintended side effects or performance concerns, since flushing on every dump adds I/O overhead to debug logging.

os << std::flush;

Consistency Check

Ensure that the newly added std::flush statements are consistently applied to all relevant debug output streams throughout the codebase, maintaining uniform behavior for debug logging.

  debug() << std::flush;
}

std::unordered_map<
    TensorView*,
    std::pair<std::vector<int64_t>, std::vector<int64_t>>>
Fusion::bankConflictInfo(const CompileParams& compile_params) {
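  // Collect all shared-memory TensorViews used by the fusion's math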
  std::vector<TensorView*> smem_tvs;
  for (auto v : usedMathVals()) {
    auto tv = dynamic_cast<TensorView*>(v);
    if (tv == nullptr) {
      continue;
    }
    if (tv->getMemoryType() == MemoryType::Shared) {
      smem_tvs.push_back(tv);
    }
  }
  if (smem_tvs.empty()) {
    return {};
  }
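  // Stash the list as managed data; lowering clones it into the kernel so
  // the kernel-side TVs can be mapped back to these fusion-side TVs below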
  manage("smem_tvs", smem_tvs);

  GpuLower lower(this, compile_params);
  lower.run();
  auto kernel = lower.kernel();
  auto info = getBankConflictInfo(kernel);

  // Convert TVs in kernel to TVs in fusion
  auto smem_tvs_in_kernel =
      kernel->getManaged<std::vector<TensorView*>>("smem_tvs");
  NVF_ERROR(smem_tvs_in_kernel.size() == smem_tvs.size());
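  // Map a kernel-side value (a TensorIndex into a cloned TV) back to the
  // corresponding fusion-side TensorView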
  auto getSmemTvInFusion = [&](Val* v) -> TensorView* {
    auto ti = dynamic_cast<kir::TensorIndex*>(v);
    if (ti == nullptr) {
      return nullptr;
    }
    auto tv = ti->view();
    auto it =
        std::find(smem_tvs_in_kernel.begin(), smem_tvs_in_kernel.end(), tv);
    if (it == smem_tvs_in_kernel.end()) {
      return nullptr;
    }
    auto index = std::distance(smem_tvs_in_kernel.begin(), it);
    return smem_tvs.at(index);
  };

  std::unordered_map<
      TensorView*,
      std::pair<std::vector<int64_t>, std::vector<int64_t>>>
      result;
  result.reserve(info.size());
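  // i.second holds the bank-conflict way counts for this expression's
  // input (first) and output (second) accesses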
  for (auto i : info) {
    auto expr = i.first;

    // Currently, only set and load/store ops are supported
    NVF_ERROR(expr->inputs().size() == 1);
    NVF_ERROR(expr->outputs().size() == 1);

    auto input = getSmemTvInFusion(expr->input(0));
    auto output = getSmemTvInFusion(expr->output(0));
    if (input == nullptr) {
      NVF_ERROR(i.second.first == 0);
    } else {
      NVF_ERROR(i.second.first != 0);
      result[input].first.push_back(i.second.first);
    }
    if (output == nullptr) {
      NVF_ERROR(i.second.second == 0);
    } else {
      NVF_ERROR(i.second.second != 0);
      result[output].second.push_back(i.second.second);
    }
  }
  return result;
}

void Fusion::printMath(bool from_outputs_only) {
  FUSER_PERF_SCOPE("Fusion::printMath");

  FusionGuard fg(this);
  auto exprs_for_print = exprs();
  debug() << "Inputs:" << std::endl;
  for (auto inp : inputs()) {
    debug() << "  " << inp << std::endl;
  }

  debug() << "Outputs:" << std::endl;
  for (auto out : outputs()) {
    debug() << "  " << out << std::endl;
  }

  // If we want everything in the fusion, grab all values without uses to
  // traverse from.
  if (!from_outputs_only) {
    std::vector<Val*> leaf_vals;
    for (auto val : deterministic_vals()) {
      if (val->uses().empty()) {
        leaf_vals.push_back(val);
      }
    }
    exprs_for_print = StmtSort::getExprsTo(leaf_vals);
  }

  debug() << "\n%kernel_math {\n";
  for (auto expr : exprs_for_print) {
    debug() << expr;
  }
  debug() << "} // %kernel_math \n\n";

  debug() << std::flush;
}

Collaborator

@wujingyue wujingyue left a comment


I can imagine having to add more flushes in the future. You may want to have debug() return a subclass of std::ostream and overload operator<< so we can flush all messages to debug() conditioned on a flag. For example, in glog, LOG(ERROR) always flushes and LOG(INFO) doesn't.
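
A minimal sketch of that idea (DebugStream, Severity, and debugStream are hypothetical names, not nvFuser's actual API): buffer each message and let the wrapper's destructor flush the underlying stream only when the severity calls for it. Assumes C++17 for the by-value return.

#include <iostream>
#include <sstream>

// Message-scoped stream wrapper: the whole message is buffered and
// written out in the destructor, which flushes only when requested.
class DebugStream {
 public:
  DebugStream(std::ostream& os, bool always_flush)
      : os_(os), always_flush_(always_flush) {}

  template <typename T>
  DebugStream& operator<<(const T& val) {
    buffer_ << val;
    return *this;
  }

  ~DebugStream() {
    os_ << buffer_.str();
    if (always_flush_) {
      os_ << std::flush;  // one flush per message, not per call site
    }
  }

 private:
  std::ostream& os_;
  bool always_flush_;
  std::ostringstream buffer_;
};

// glog-like policy: ERROR always flushes, INFO may stay buffered.
enum class Severity { INFO, ERROR };

DebugStream debugStream(Severity s) {
  // std::cout is used here because std::cerr is unit-buffered anyway.
  return DebugStream(std::cout, /*always_flush=*/s == Severity::ERROR);
}

int main() {
  debugStream(Severity::ERROR) << "flushed when this statement ends\n";
  debugStream(Severity::INFO) << "may remain buffered\n";
}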

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 16, 2025

I can imagine having to add more flushes in the future. You may want to have debug() return a subclass of std::ostream and overload operator<< so we can flush all messages to debug() conditioned on a flag. For example, in glog, LOG(ERROR) always flushes and LOG(INFO) doesn't.

Agreed

@naoyam naoyam merged commit 0d0402f into main Jan 16, 2025
18 checks passed
@naoyam naoyam deleted the debug_dump_flush branch January 16, 2025 07:11