EmbeddingFwdOp node with same functionality as F.embedding #3649

Open · Priya2698 wants to merge 13 commits into main

Conversation

@Priya2698 (Collaborator) commented on Dec 26, 2024

This PR adds an EmbeddingFwdOp with the same functionality as F.embedding.

  1. I am not using take_along_axis. F.embedding allows optional parameters such as max_norm and padding_idx, which would require further processing if implemented with take_along_axis, so I defaulted to creating a new node to guarantee performance parity.
  2. Thunder uses prims.EMBEDDING when the optional parameters padding_idx/max_norm are specified, and prims.TAKE otherwise, which prevents nvFuser from consuming the embedding operator in those other cases. Hence, in Thunder, nvFuser will directly execute ltorch.embedding. This requires a separate backward API to consume ltorch.embedding_backward and cannot reuse the grad rules for prims.EMBEDDING; hence the name EmbeddingFwdOp instead of EmbeddingOp.
  3. I plan to first plumb the forward-only embedding support through Thunder while I draft the backward node, which should be very similar. Thunder reviews may suggest another way of implementing this support.

@Priya2698 Priya2698 changed the title EmbeddingOp node with same functionality as F.embedding EmbeddingFwdOp node with same functionality as F.embedding Jan 16, 2025

github-actions bot commented Jan 16, 2025

PR Reviewer Guide 🔍

(Review updated until commit f50fe0c)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
⚡ Recommended focus areas for review

Potential Logic Change

The EmbeddingFwdOp constructor has been modified to accept additional parameters. Review the logic to ensure it aligns with the intended functionality.

EmbeddingFwdOp::EmbeddingFwdOp(
    IrBuilderPasskey passkey,
    TensorView* output,
    TensorView* input,
    TensorView* weight,
    Val* padding_idx,
    Val* max_norm,
    Val* norm_type,
    Val* scale_grad_by_freq,
    Val* sparse)
    : Expr(passkey) {
  addOutput(output);

  addInput(input);
  addInput(weight);
  addInput(norm_type);
  addInput(scale_grad_by_freq);
  addInput(sparse);
  if (padding_idx != nullptr) {
    addInput(padding_idx);
    addDataAttribute(true);
  } else {
    addDataAttribute(false);
  }
  if (max_norm != nullptr) {
    addInput(max_norm);
    addDataAttribute(true);
  } else {
    addDataAttribute(false);
  }
}
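
For reviewers tracing the optional-argument handling: because padding_idx and max_norm are appended after the fixed inputs, the op always carries a five-entry input prefix followed by the optionals, with a boolean data attribute recording each optional's presence. The layout below is a reading aid inferred from the constructor above, not code quoted from the PR:

// Input layout implied by the constructor (illustrative, not PR code):
//   input(0): input               token indices
//   input(1): weight              embedding table
//   input(2): norm_type
//   input(3): scale_grad_by_freq
//   input(4): sparse
//   input(5): padding_idx         present only if provided (data attribute 0 == true)
//   input(5/6): max_norm          present only if provided (data attribute 1 == true);
//                                 index 6 when padding_idx is also an input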
Function Signature Change

The embedding_fwd function signature has been updated to include new parameters. Verify that the changes are consistent with the function's purpose.

TensorView* embedding_fwd(
    TensorView* input,
    TensorView* weight,
    Val* padding_idx,
    Val* max_norm,
    Val* norm_type,
    Val* scale_grad_by_freq,
    Val* sparse) {
  auto input_domain = TensorDomain::noReductions(input->getLogicalDomain());
  auto weight_domain = TensorDomain::noReductions(weight->getLogicalDomain());
  NVF_CHECK(
      !input_domain.empty(),
      "Expected input to be atleast 1D, got: ",
      input_domain.size());
  NVF_CHECK(
      weight_domain.size() == 2,
      "Expected weight to be 2D, got: ",
      weight_domain.size());

  NVF_CHECK(
      !padding_idx || padding_idx->isScalar(),
      "Expected padding_idx to be a scalar int.");
  NVF_CHECK(
      !max_norm || max_norm->isScalar(),
      "Expected max_norm to be a scalar double.");
  NVF_CHECK(
      !norm_type || norm_type->isScalar(),
      "Expected norm_type to be a scalar double.");
  NVF_CHECK(
      !scale_grad_by_freq || scale_grad_by_freq->isScalar(),
      "Expected scale_grad_by_freq to be a scalar bool.");
  NVF_CHECK(
      !sparse || sparse->isScalar(), "Expected sparse to be a scalar bool.");

  auto ndims_out = input_domain.size() + 1;
  std::vector<IterDomain*> out_domain(ndims_out, nullptr);

  for (auto idx : c10::irange(ndims_out - 1)) {
    out_domain[idx] = ops::newOutputIterDomain({input_domain[idx]});
  }
  out_domain[ndims_out - 1] = ops::newOutputIterDomain({weight_domain.back()});
  TensorDomain* out_td = IrBuilder::create<TensorDomain>(
      out_domain, TensorDomain::getContiguityFilledWith(out_domain, true));
  TensorView* output = IrBuilder::create<TensorView>(out_td, weight->dtype());

  if (norm_type == nullptr) {
    norm_type = IrBuilder::create<Val>(2.0, DataType::Double);
  }

  if (scale_grad_by_freq == nullptr) {
    scale_grad_by_freq = input->fusion()->falseVal();
  }
  if (sparse == nullptr) {
    sparse = input->fusion()->falseVal();
  }
  IrBuilder::create<EmbeddingFwdOp>(
      output,
      input,
      weight,
      padding_idx,
      max_norm,
      norm_type,
      scale_grad_by_freq,
      sparse);

  return output;
}
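
For context, here is a minimal sketch of how the new API could be exercised from C++, loosely modeled on the EmbeddingFwdNode test in this PR; Fusion, FusionGuard, and makeSymbolicTensor are the usual nvFuser test utilities, and the exact calls are illustrative rather than quoted from the PR:

Fusion fusion;
FusionGuard fg(&fusion);

// Token indices: 1D integer tensor; embedding table: 2D float tensor.
TensorView* tv_input = makeSymbolicTensor(1, DataType::Int);
TensorView* tv_weight = makeSymbolicTensor(2, DataType::Float);
fusion.addInput(tv_input);
fusion.addInput(tv_weight);

// Leave every optional argument unset; embedding_fwd then fills in the
// defaults shown above (norm_type = 2.0, scale_grad_by_freq = false,
// sparse = false).
TensorView* tv_out = embedding_fwd(
    tv_input,
    tv_weight,
    /*padding_idx=*/nullptr,
    /*max_norm=*/nullptr,
    /*norm_type=*/nullptr,
    /*scale_grad_by_freq=*/nullptr,
    /*sparse=*/nullptr);
fusion.addOutput(tv_out);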
Binding Update

The embedding_fwd binding has been updated to reflect the changes in the C++ function. Ensure that the binding is correct and functional.

nvf_ops.def(
    "embedding_fwd",
    [](FusionDefinition::Operators& self,
       Tensor input,
       Tensor weight,
       std::optional<Scalar> padding_idx,
       std::optional<Scalar> max_norm,
       std::optional<Scalar> norm_type,
       std::optional<Scalar> scale_grad_by_freq,
       std::optional<Scalar> sparse) -> decltype(auto) {
      FUSER_PERF_SCOPE("Operators.embedding_fwd");
      NVF_CHECK(
          self.validUse(), "Attempting to add to a completed definition!");
      FusionDefinition* fd = self.fusion_definition;
      size_t ndims = input.dims + 1;
      Tensor output = fd->defineTensor(/*dims=*/ndims);

      auto padding_idx_state = padding_idx.has_value()
          ? fd->recordingState(padding_idx.value()())
          : State(/*_index=*/0, /*_stype=*/serde::StateType::None);
      auto max_norm_state = max_norm.has_value()
          ? fd->recordingState(max_norm.value()())
          : State(/*_index=*/0, /*_stype=*/serde::StateType::None);
      auto norm_type_state = norm_type.has_value()
          ? fd->recordingState(norm_type.value()())
          : State(/*_index=*/0, /*_stype=*/serde::StateType::None);
      auto scale_grad_by_freq_state = scale_grad_by_freq.has_value()
          ? fd->recordingState(scale_grad_by_freq.value()())
          : State(/*_index=*/0, /*_stype=*/serde::StateType::None);
      auto sparse_state = sparse.has_value()
          ? fd->recordingState(sparse.value()())
          : State(/*_index=*/0, /*_stype=*/serde::StateType::None);

      fd->defineRecord(new EmbeddingFwdOpRecord(
          {fd->recordingState(input()),
           fd->recordingState(weight()),
           padding_idx_state,
           max_norm_state,
           norm_type_state,
           scale_grad_by_freq_state,
           sparse_state},
          {fd->recordingState(output())}));
      return output;
    },
    py::arg("input"),
    py::arg("weight"),
    py::arg("padding_idx").none(true) = py::none(),
    py::arg("max_norm").none(true) = py::none(),
    py::arg("norm_type").none(true) = py::none(),
    py::arg("scale_grad_by_freq").none(true) = py::none(),
    py::arg("sparse").none(true) = py::none(),
    py::return_value_policy::reference);

@Priya2698 (Collaborator, Author) commented:

!test


constexpr int64_t n = 5, s = 2;

TEST_F(EmbeddingTest, EmbeddingFwdNode) {
@protonu (Collaborator) commented on Jan 17, 2025


I wonder if it's possible to add a check to verify the output of the toString method as well.

@Priya2698 (Collaborator, Author) replied:

I don't find verifying against a handwritten string to be very robust, since that representation can change with individual toString methods, so I did not add it.

}
std::optional<double> max_norm = std::nullopt;
if (has_max_norm()) {
auto idx = 5 + has_padding_idx();
A collaborator commented:

nit: having this bare 5 bothers me a little bit, but I'm not sure what would be better.

@Priya2698 (Collaborator, Author) replied:

It may not be ideal; however, we fetch the preceding variables by fixed indices as well: the first five inputs (input, weight, norm_type, scale_grad_by_freq, sparse) are always present, so the optional scalars start at index 5. The positions of the variables are constant, so it should be safe.

@Priya2698 Priya2698 requested a review from protonu January 22, 2025 04:17
@Priya2698 (Collaborator, Author) commented:

!test

@Priya2698 Priya2698 requested a review from jjsjann123 January 22, 2025 06:36