Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Invalid column names in get_historical_features when there are field mappings on join keys #4886

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

aloysius-lim
Copy link
Contributor

@aloysius-lim aloysius-lim commented Jan 3, 2025

What this PR does / why we need it:

When there are field mappings on an entity's join keys, get_historical_features() fails because it does not use the original column name in the data source.

This is needed when the source data cannot be modified, and a field mapping is used to map the source data's column to the join key.

Example:

from feast import Entity, Field, FeatureStore, FeatureView
from feast.types import Float32, String
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource

# Initialize Feature Store.
store = FeatureStore(...)

# "driver_id" is used in the Feature Store.
driver = Entity(name="driver", join_keys=["driver_id"])

# Using SparkSource as an example, but this applies to other sources.
# Source data contains a primary key called "id". This is mapped to the join key "driver_id".
driver_stats_src = SparkSource(
    name="driver_stats",
    field_mapping={"id": "driver_id"},
    path=...,
    file_format=...,
)
driver_stats_fv = FeatureView(
    name="driver_stats",
    source= driver_stats_src,
    entities=[driver],
    schema=[
        # join key must be specified in the schema, else it is not included in driver_stats_fv.entity_columns
        Field(name="driver_id", dtype=String),
        Field(name="stat1", dtype=Float32),
        Field(name="stat2", dtype=Float32),
    ]
)

# Get historical features
store.get_historical_features(
    entity_df=...,
    features=[
        "driver_stats:stat1",
        "driver_stats:stat2",
    ]

Without this fix (Spark example):

pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `driver_id` cannot be resolved. Did you mean one of the following? [`id`, `stat1`, `stat2`]

Underlying Spark query:

driver_stats__subquery AS (
    SELECT
        event_timestamp as event_timestamp,
        created as created_timestamp,

        -- Here is the problem.
        driver_id AS driver_id,

        stat1 as stat1, stat2 as stat2
    FROM `feast_entity_df_677a1a6fd13443c6b0e8ccc059b25f01`
    WHERE event_timestamp <= '2025-01-05T14:00:00'
)

With this fix, the field mapping is respected and the query becomes:

driver_stats__subquery AS (
    SELECT
        event_timestamp as event_timestamp,
        created as created_timestamp,

        id AS driver_id,
        
        stat1 as stat1, stat2 as stat2
    FROM `feast_entity_df_677a1a6fd13443c6b0e8ccc059b25f01`
    WHERE event_timestamp <= '2025-01-05T14:00:00'
)

Which issue(s) this PR fixes:

#4889

Misc

This only works if the entity join key is specified in the schema of the FeatureView (see above).

@aloysius-lim aloysius-lim requested review from a team as code owners January 3, 2025 07:24
@aloysius-lim aloysius-lim force-pushed the fix-get-historical-features-join-key-field-mapping branch from 9c6ca6e to b0a70ac Compare January 3, 2025 07:26
@dmartinol
Copy link
Contributor

Thanks for your contribution!
I’m wondering why we don’t have an issue to track this error: if you can’t find an existing issue, could you please open one before?

@aloysius-lim
Copy link
Contributor Author

Thanks for your contribution! I’m wondering why we don’t have an issue to track this error: if you can’t find an existing issue, could you please open one before?

I added issue #4889 for this.

@franciscojavierarceo
Copy link
Member

Do you mind rebasing? I fixed an issue so after you rebase it should go away.

@aloysius-lim aloysius-lim force-pushed the fix-get-historical-features-join-key-field-mapping branch from b0a70ac to 84c355f Compare January 3, 2025 23:16
@aloysius-lim
Copy link
Contributor Author

I've rebased, but the tests are still failing due to other errors with Keycloak.

@franciscojavierarceo
Copy link
Member

Can you rebase again? Sorry I reverted my most recent change. 😅

Signed-off-by: Aloysius Lim <[email protected]>
Signed-off-by: Aloysius Lim <[email protected]>
@aloysius-lim aloysius-lim force-pushed the fix-get-historical-features-join-key-field-mapping branch from 84c355f to 9b60d43 Compare January 10, 2025 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants