Dynamic data models with JavaScript #9007

Open
tiger-feng opened this issue Nov 29, 2024 · 9 comments
Labels: docs, question


@tiger-feng

Describe the bug
Each time the data model is compiled, asyncModule is called three times

Expected behavior
Each time the data model is compiled, asyncModule should be called only once, not three times. asyncModule uses an asynchronous API to obtain the schema model, which seriously affects the API performance.

Minimally reproducible Cube Schema
In the following example, a logger prints information to the console. The string 'asyncModule getCubeSchema' is printed three times, which means that the asynchronous interface is requested three times.

// getCubeSchema and transformField are project helpers that are not shown in this snippet.
const { logger } = require("../src/schemaUtils")
asyncModule(async () => {
  const { securityContext: { systemCode } } = COMPILE_CONTEXT
  const startTime = Date.now()
  const cubeList = await getCubeSchema({ systemCode })
  logger('asyncModule getCubeSchema:', Date.now() - startTime)
  for(let i = 0; i < cubeList.length; i++ ){
    const { cubeName, ...item } = cubeList[i]
    const dimensions = transformField(item.dimensions)
    const measures = transformField(item.measures)
    cube(cubeName, {
      ...item,
      sql: item.sql,
      dimensions,
      measures
    })
  }
})

Version:
[e.g. 1.1.7]

@tiger-feng
Author

Another question: in asyncModule, I get the requestId of each request through COMPILE_CONTEXT.requestId. I found that, within the same tenant, the requestId stays the same from one request to the next. I think it should be different every time.

@igorlukanin igorlukanin self-assigned this Dec 4, 2024
@igorlukanin igorlukanin added question The issue is a question. Please use Stack Overflow for questions. docs Issues that require a documentation improvement labels Dec 4, 2024
@igorlukanin
Member

igorlukanin commented Dec 4, 2024

> which means that the asynchronous interface is requested three times.

Indeed, this is how it is implemented and this is by design. No bug here. The reason why it's happening is that JavaScript-based data models are evaluated in multiple (three) passes so that references to other cubes are resolved correctly.

> which seriously affects the API performance.

I suggest caching the result of your API call, e.g., by storing it in a variable. That should help, since your API will then be called only once.
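
A minimal sketch of this kind of caching, under a few assumptions: `fetchSchema` is a hypothetical stand-in for the real asynchronous API call (`getCubeSchema` in the example above), `getCubeSchemaCached` and `schemaCache` are illustrative names, and the cache is never invalidated here. It stores the promise rather than the resolved value, so that calls made while the first request is still in flight share it.

```javascript
// Hypothetical stand-in for the real asynchronous schema API.
let apiCalls = 0;
async function fetchSchema(systemCode) {
  apiCalls += 1;
  return [{ cubeName: `${systemCode}_orders`, sql: 'SELECT * FROM orders' }];
}

// Module-level cache: one in-flight or resolved promise per systemCode.
const schemaCache = new Map();

function getCubeSchemaCached(systemCode) {
  if (!schemaCache.has(systemCode)) {
    // Cache the promise itself so concurrent callers share one request.
    const pending = fetchSchema(systemCode).catch((error) => {
      // Do not keep a failed request in the cache.
      schemaCache.delete(systemCode);
      throw error;
    });
    schemaCache.set(systemCode, pending);
  }
  return schemaCache.get(systemCode);
}

// Simulate three evaluation passes asking for the same schema.
async function demo() {
  for (let pass = 0; pass < 3; pass++) {
    await getCubeSchemaCached('tenantA');
  }
  return apiCalls;
}
```

It is probably safer to keep such a cache in a separate module loaded with `require` (for example `schemaUtils.js`) than in the data model file itself, on the assumption that the data model file may be re-evaluated on each pass while a required module is loaded once.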

> I think it should be different every time.

It's hard to say exactly what is happening from that description. Ideally, it would be great to confirm that this affects the results you're getting and to have clear instructions, with code, on how to reproduce it.

UPDATED: I see that this is related to #9005. Let's continue the conversation in that issue.

@tiger-feng
Author

tiger-feng commented Dec 5, 2024

The following is the complete code, based on asyncModule:

const {logger} = require("../src/schemaUtils")
asyncModule(async () => {
  const { securityContext: { systemCode }, requestId } = COMPILE_CONTEXT
  const startTime = Date.now()
  const cubeList = await getCubeSchema({ systemCode, requestId })
  logger('asyncModule getCubeSchema:', Date.now() - startTime)
  for(let i = 0; i < cubeList.length; i++ ){
    const { cubeName, ...item } = cubeList[i]
    const dimensions = transformField(item.dimensions)
    const measures = transformField(item.measures)
    cube(cubeName, {
      ...item,
      sql: item.sql,
      dimensions,
      measures
    })
  }
})
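// schemaUtils.js
class cubeSchema {
  constructor() {
    this.cubeSchema = {};   // cached schema, keyed by systemCode
    this.requestId = null;  // requestId of the most recent fetch
  }
  getCubeSchema = async ({ requestId, systemCode }) => {
    // Same requestId as the last fetch: serve the cached schema.
    if (this.requestId === requestId) {
      return this.cubeSchema[systemCode] || []
    }
    try {
      const data = await queryCubeSchema({ systemCode })
      this.cubeSchema[systemCode] = data || [];
      this.requestId = requestId;
      // Return the freshly fetched schema to the caller.
      return this.cubeSchema[systemCode]
    } catch (error) {
      this.requestId = requestId;
      return []
    }
  }
}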

As you said, I cached the results of the API call in the code, but because a load request will be executed three times, I need to determine whether it is the same request based on requestId in asyncModule. If it is, I return the cached data. However, from what I observe, within the same tenant the requestId that asyncModule reads from COMPILE_CONTEXT is the same for every load request, so when the schemaVersion changes I still get the cached data.

By looking at the Cube source code, I found the problem. In the cubejs-schema-core/src/core/server.ts file, all requests from the same tenant share one CompilerApi instance, which is created when the first API request for that tenant is handled. The COMPILE_CONTEXT in asyncModule is actually the compileContext of that CompilerApi. When the schemaVersion changes, the data model is recompiled, but the CompilerApi instance is not recreated. Its compileContext, including requestId, is therefore still the one from the first compilation, so the requestId obtained in asyncModule never changes. The source code is shown below.

[Three screenshots of the source code were attached here.]
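
The behavior described above can be illustrated with a toy model. This is not Cube's source code: `CompilerApiModel`, `getCompilerApi`, and `instancesByTenant` are hypothetical names that mimic only the description, that is, one instance per tenant that captures its context when it is first created.

```javascript
// Toy model only: NOT Cube's source code.
class CompilerApiModel {
  constructor(context) {
    // The context is captured once, when the instance is created.
    this.compileContext = context;
  }

  compile() {
    // Every (re)compilation sees the context captured at construction.
    return this.compileContext.requestId;
  }
}

// One instance per tenant, created on the first request only.
const instancesByTenant = new Map();

function getCompilerApi(tenantId, context) {
  if (!instancesByTenant.has(tenantId)) {
    instancesByTenant.set(tenantId, new CompilerApiModel(context));
  }
  return instancesByTenant.get(tenantId);
}

const first = getCompilerApi('tenantA', { requestId: 'req-1' }).compile();
const second = getCompilerApi('tenantA', { requestId: 'req-2' }).compile();
console.log(first, second); // prints "req-1 req-1"
```

In this model the second request passes a new requestId, but the instance created for the first request is reused, so the stale value is what the compilation sees.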

I don’t know if I have described the problem clearly. I am very much looking forward to your answer.

@tiger-feng
Author

tiger-feng commented Dec 5, 2024

UPDATED: This problem is different from #9005. #9005 is a requestId retrieval error that causes the requestId to be undefined later in the program.

@igorlukanin
Member

> As you said, I cached the results of the API call in the code, but because a load request will be executed three times, I need to determine whether it is the same request based on requestId in asyncModule.

I don't think this is the best way to approach this. How is your schema_version configuration option implemented? I guess you should invalidate the cache there as soon as the version changes instead of trying to track the requestId.
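
One way this suggestion might look, as a sketch under stated assumptions: `SchemaCache`, `onVersion`, and `getSchema` are hypothetical names, `loadSchema` stands in for the real schema API, and `onVersion` is assumed to be called from wherever the schema version is computed. The cache is dropped only when the observed version differs from the last one seen, so requestId plays no part.

```javascript
// Hypothetical cache that is invalidated when the schema version changes.
class SchemaCache {
  constructor(loadSchema) {
    this.loadSchema = loadSchema; // async (systemCode) => cube definitions
    this.versions = new Map();    // systemCode -> last seen version
    this.schemas = new Map();     // systemCode -> promise of cube definitions
  }

  // Record the version; drop the cached schema if the version changed.
  onVersion(systemCode, version) {
    if (this.versions.get(systemCode) !== version) {
      this.versions.set(systemCode, version);
      this.schemas.delete(systemCode);
    }
    return version;
  }

  // Load the schema at most once per version.
  // Note: a rejected promise stays cached until the next version change.
  getSchema(systemCode) {
    if (!this.schemas.has(systemCode)) {
      this.schemas.set(systemCode, this.loadSchema(systemCode));
    }
    return this.schemas.get(systemCode);
  }
}

// Stand-in loader that counts how often the "API" is called.
let loads = 0;
const cache = new SchemaCache(async (systemCode) => {
  loads += 1;
  return [{ cubeName: `${systemCode}_orders` }];
});

async function demo() {
  cache.onVersion('tenantA', 'v1');
  for (let pass = 0; pass < 3; pass++) await cache.getSchema('tenantA');
  const afterV1 = loads;
  cache.onVersion('tenantA', 'v2');
  for (let pass = 0; pass < 3; pass++) await cache.getSchema('tenantA');
  return { afterV1, afterV2: loads };
}
```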

@tiger-feng
Author

tiger-feng commented Dec 16, 2024

> I don't think this is the best way to approach this. How is your schema_version configuration option implemented? I guess you should invalidate the cache there as soon as the version changes instead of trying to track the requestId.

@igorlukanin
In schema_version, I retrieve the version ID through an asynchronous API and return it. If, as you said, I clear the cache when there is a version change, I need to cache the current return value of schema_version using a variable. However, according to my observation, the schema_version function is executed four times in a single load request, which may also be a problem.

@igorlukanin
Member

> I clear the cache when there is a version change, I need to cache the current return value of schema_version using a variable

When you observe the version change, can you instantly retrieve the updated version of the data model and cache it?
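
A rough sketch of that idea, assuming the version check and the schema fetch can live in the same helper. `fetchVersion`, `fetchSchema`, `resolveSchemaVersion`, and `getCachedSchema` are hypothetical names, not Cube APIs. When the observed version differs from the cached one, the new data model is fetched right away, so later reads are served from memory.

```javascript
// Hypothetical stand-ins for the real version and schema APIs.
let currentVersion = 'v1';
let schemaFetches = 0;
const fetchVersion = async (systemCode) => currentVersion;
const fetchSchema = async (systemCode) => {
  schemaFetches += 1;
  return [{ cubeName: `${systemCode}_orders_${currentVersion}` }];
};

const cached = new Map(); // systemCode -> { version, schema }

// Meant to be called from wherever the schema version is computed.
// Not safe against overlapping calls: two concurrent calls that both see
// a new version would each fetch the schema.
async function resolveSchemaVersion(systemCode) {
  const version = await fetchVersion(systemCode);
  const entry = cached.get(systemCode);
  if (!entry || entry.version !== version) {
    // Version changed (or first call): fetch the new model immediately.
    const schema = await fetchSchema(systemCode);
    cached.set(systemCode, { version, schema });
  }
  return version;
}

// Synchronous read for use while the data model is being built.
function getCachedSchema(systemCode) {
  const entry = cached.get(systemCode);
  return entry ? entry.schema : [];
}

async function demo() {
  // Four version checks in a row trigger only one schema fetch.
  for (let i = 0; i < 4; i++) await resolveSchemaVersion('tenantA');
  const first = getCachedSchema('tenantA')[0].cubeName;
  currentVersion = 'v2';
  await resolveSchemaVersion('tenantA');
  const second = getCachedSchema('tenantA')[0].cubeName;
  return { first, second, schemaFetches };
}
```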

@tiger-feng
Author

@igorlukanin
Do you mean retrieving the data model in schema_version? That sounds like a good idea. I will give it a try.

But I hope you can take a look at the problem that schema_version is executed four times. Is this part of Cube's normal design, or a bug?

@igorlukanin
Member

igorlukanin commented Jan 8, 2025

I'll let @paveltiunov take a look. My understanding is that it's not a bug.
