Reporting the Advisory locks, XML Functions and System columns on the DDLs in the unsupported features in assess/analyze #2061

Merged
merged 13 commits into from
Dec 13, 2024

@priyanshi-yb priyanshi-yb commented Dec 10, 2024

Describe the changes in this pull request

  • Added three more features to Unsupported Features in the reports: advisory locks, XML functions, and system columns in DDLs.
  • Moved the MVIEW/VIEW issues from Unsupported PL/pgSQL objects to Unsupported Features.
  • Minor format change in how constraints are reported for the "PK/Unique on complex datatypes" issue.
  • Added unit tests for GetDDLIssues() in parser_issue_detector_test.go.

Fixes #2025

Sample Assessment report -
Screenshot 2024-12-11 at 12 21 39 PM

Sample Analyze report -
Screenshot 2024-12-11 at 12 23 24 PM

Screenshot 2024-12-11 at 12 23 49 PM

Describe if there are any user-facing changes

Added new features section under Unsupported Features

How was this pull request tested?

Existing tests were enough.
Added unit tests

Does your PR have changes that can cause upgrade issues?

| Component | Breaking changes? |
| --- | --- |
| MetaDB | No |
| Name registry json | No |
| Data File Descriptor Json | No |
| Export Snapshot Status Json | No |
| Import Data State | No |
| Export Status Json | No |
| Data .sql files of tables | No |
| Export and import data queue | No |
| Schema Dump | No |
| AssessmentDB | No |
| Sizing DB | No |
| Migration Assessment Report Json | No |
| Callhome Json | No |
| YugabyteD Tables | No |
| TargetDB Metadata Tables | No |

@priyanshi-yb priyanshi-yb marked this pull request as ready for review December 11, 2024 10:54
@priyanshi-yb priyanshi-yb force-pushed the priyanshi/report-xml-advisory-issues-on-ddl branch from 87dbbac to 43f1853 Compare December 11, 2024 12:19
@@ -50,7 +50,7 @@ func (d *TableIssueDetector) DetectIssues(obj queryparser.DDLObject) ([]QueryIss
 // Check for generated columns
 if len(table.GeneratedColumns) > 0 {
 	issues = append(issues, NewGeneratedColumnsIssue(
-		TABLE_OBJECT_TYPE,
+		obj.GetObjectType(),
Collaborator

A TableIssueDetector will always return issues for Table object types, right?
Is this just a change to not use a constant and use the function, or is there a certain bug/case because of which you had to do this?

Contributor Author

> Is this just a change to not use a constant and use the function,

Yes right, that way I was able to directly use ddlObj.GetObjectType() to get the respective type.

return issues, nil
}

genericIssues, err := p.genericIssues(query)
Collaborator

Add a comment here explaining why we're making this call for DDLs. Give an example.

@@ -289,6 +269,22 @@ func (p *ParserIssueDetector) getDDLIssues(query string) ([]QueryIssue, error) {
issues[i].SqlStatement = query
}
}

if _, ok := ddlObj.(*queryparser.Object); ok { // In case the DDL doesn't have any processor skip checking generic issues
Collaborator

A DDLObject is an interface that already provides GetObjectName and GetSchemaName, so why do we additionally need to check whether it can be type-asserted to a queryparser.Object? As long as ddlObj satisfies the DDLObject interface, you should be good, no?

Contributor Author

@priyanshi-yb priyanshi-yb Dec 13, 2024

There were two reasons to add this check here:

  1. To handle the case where the statement reaching getDDLIssues is a DML statement. This can happen for PLPGQueries, where we call the internal getAllIssues(); the generic function then gets called twice for the SELECT statement and produces duplicate issues. This could also be solved by deduplicating the issues on some key.
  2. To handle the case where the object type is not known in this code path (DDLObject is not yet implemented for some DDL types), so it may return the object type OBJECT (NoOpProcessor). In that case, the analyze code that converts the issue (convertIssue toAnalyzeIssue()), where we set the invalidCount of the object name based on the ObjectType, fails with a nil pointer because it doesn't know about that type. This could also be solved by handling the case properly in that function.
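
The deduplication alternative mentioned in point 1 could look roughly like the sketch below. This is not what the PR does: the `QueryIssue` struct is a simplified stand-in, the issue type strings are made up for illustration, and the choice of key (type, object type, object name, statement) is an assumption.

```go
package main

import "fmt"

// QueryIssue is a simplified, hypothetical stand-in for the real struct.
type QueryIssue struct {
	Type         string
	ObjectType   string
	ObjectName   string
	SqlStatement string
}

// issueKey is a comparable struct, so it can be used directly as a map key.
// Which fields identify a duplicate is an assumption made for this sketch.
type issueKey struct {
	Type, ObjectType, ObjectName, SqlStatement string
}

// dedupeIssues keeps the first occurrence of each issue and preserves order.
func dedupeIssues(issues []QueryIssue) []QueryIssue {
	seen := make(map[issueKey]struct{}, len(issues))
	out := make([]QueryIssue, 0, len(issues))
	for _, issue := range issues {
		k := issueKey{issue.Type, issue.ObjectType, issue.ObjectName, issue.SqlStatement}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, issue)
	}
	return out
}

func main() {
	issues := []QueryIssue{
		{"ADVISORY_LOCKS", "FUNCTION", "f1", "SELECT pg_advisory_lock(1);"},
		{"ADVISORY_LOCKS", "FUNCTION", "f1", "SELECT pg_advisory_lock(1);"},
		{"XML_FUNCTIONS", "FUNCTION", "f1", "SELECT xmlparse(document '<a/>');"},
	}
	for _, issue := range dedupeIssues(issues) {
		fmt.Println(issue.Type, issue.SqlStatement)
	}
}
```

Deduplicating after the fact hides the double call rather than removing it, which may be why the thread settled on a guard at the start of getDDLIssues() instead.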

Contributor Author

@priyanshi-yb priyanshi-yb Dec 13, 2024

As discussed, replaced this condition with an isDDL check at the start of getDDLIssues().

}

// TODO: in future when we will DDL issues detection here we need `GetDDLIssues`
func (p *ParserIssueDetector) getDMLIssues(query string) ([]QueryIssue, error) {
Collaborator

I would recommend still keeping both getDMLIssues and GetDMLIssues. The idea was to not touch GetDMLIssues often and to keep the getIssuesNotFixedInTargetDbVersion call in it so that it does not get missed.

If we have just one function, there is a high possibility that we accidentally add a return in the middle of that function, which would lead to issues not being filtered out.
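
A minimal sketch of the two-function pattern described above, with simplified types. The `detector` type, the `FixedInTarget` flag, and `filterNotFixed` are hypothetical stand-ins (the last one for getIssuesNotFixedInTargetDbVersion, whose real signature is not shown in this thread).

```go
package main

import "fmt"

// QueryIssue is a simplified, hypothetical stand-in for the real struct.
type QueryIssue struct {
	Type          string
	FixedInTarget bool // hypothetical flag: true if the target DB version already fixes this issue
}

// detector is a hypothetical stand-in for ParserIssueDetector.
type detector struct{}

// getDMLIssues holds the detection logic. It can gain early returns freely,
// because it is never responsible for version filtering.
func (d *detector) getDMLIssues(query string) ([]QueryIssue, error) {
	if query == "" {
		return nil, nil // early return: the wrapper still applies the filter
	}
	return []QueryIssue{
		{Type: "ALREADY_FIXED", FixedInTarget: true},
		{Type: "STILL_OPEN", FixedInTarget: false},
	}, nil
}

// filterNotFixed stands in for getIssuesNotFixedInTargetDbVersion.
func (d *detector) filterNotFixed(issues []QueryIssue) []QueryIssue {
	var out []QueryIssue
	for _, issue := range issues {
		if !issue.FixedInTarget {
			out = append(out, issue)
		}
	}
	return out
}

// GetDMLIssues is the thin exported wrapper: one call, one filter, one return.
func (d *detector) GetDMLIssues(query string) ([]QueryIssue, error) {
	issues, err := d.getDMLIssues(query)
	if err != nil {
		return nil, err
	}
	return d.filterNotFixed(issues), nil
}

func main() {
	d := &detector{}
	issues, err := d.GetDMLIssues("SELECT pg_advisory_lock(1);")
	fmt.Println(issues, err)
}
```

Because the wrapper has a single success path, every result of the inner function goes through the filter regardless of how many return statements the inner function accumulates.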


assert.Equal(t, len(expectedIssues), len(issues), "Mismatch in issue count for statement: %s", stmt)
for _, expectedIssue := range expectedIssues {
found := slices.ContainsFunc(issues, func(QueryIssue QueryIssue) bool {
Collaborator

As discussed, see if we can use cmp.Equal for this.

@@ -58,22 +39,6 @@ func IsMviewObject(parseTree *pg_query.ParseResult) bool {
return isCreateAsStmt && createAsNode.CreateTableAsStmt.Objtype == pg_query.ObjectType_OBJECT_MATVIEW
}

func GetSelectStmtQueryFromViewOrMView(parseTree *pg_query.ParseResult) (string, error) {
Collaborator

Ah, so we don't really need these anymore because the parser can directly parse MVIEW/VIEW, correct?

Contributor Author

yeah

@makalaaneesh
Collaborator

@priyanshi-yb It would be good to also include details for those issues in the report.

  • Advisory locks: which functions were used?
  • XML functions: which functions?
  • System columns: which system columns?

I believe these details are not available right now, am I right?

@priyanshi-yb
Contributor Author

priyanshi-yb commented Dec 13, 2024

> I believe these details are not available right now, am I right?

Yes, these details are not available currently. We might have to enhance the traversal logic to return all the unsupported functions/columns found in the statement.
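
One possible shape for that enhancement is sketched below, under heavy assumptions: `Node` is a toy tree invented for illustration (the real traversal walks the pg_query parse tree), and the function set lists only a few of PostgreSQL's advisory lock functions. The point is only that the walk collects the distinct names it matches instead of returning a boolean.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a toy parse-tree node, hypothetical and much simpler than the
// real pg_query parse tree.
type Node struct {
	FuncName string // non-empty when the node is a function call
	Children []*Node
}

// advisoryLockFuncs is a partial, illustrative set of PostgreSQL advisory lock functions.
var advisoryLockFuncs = map[string]bool{
	"pg_advisory_lock":      true,
	"pg_advisory_unlock":    true,
	"pg_advisory_xact_lock": true,
	"pg_try_advisory_lock":  true,
}

// collectAdvisoryLockFuncs walks the tree and returns the distinct advisory
// lock function names it finds, sorted for stable report output.
func collectAdvisoryLockFuncs(root *Node) []string {
	found := map[string]bool{}
	var walk func(n *Node)
	walk = func(n *Node) {
		if n == nil {
			return
		}
		if advisoryLockFuncs[n.FuncName] {
			found[n.FuncName] = true
		}
		for _, child := range n.Children {
			walk(child)
		}
	}
	walk(root)

	names := make([]string, 0, len(found))
	for name := range found {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	tree := &Node{Children: []*Node{
		{FuncName: "pg_advisory_lock"},
		{FuncName: "lower", Children: []*Node{{FuncName: "pg_advisory_unlock"}}},
		{FuncName: "pg_advisory_lock"},
	}}
	fmt.Println(collectAdvisoryLockFuncs(tree))
}
```

The same collect-and-return approach would apply to XML functions and system columns, with a different lookup set per issue type.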

@@ -210,6 +219,11 @@ func TestAllIssues(t *testing.T) {
},
}

//Should modify it in similar way we do it actual code as the particular DDL issue in plpgsql can have different Details map on the basis of objectType
Collaborator

I'm confused. What was wrong with the previous approach where you define issues directly with "FUNCTION", "list_high_earners" ?

Contributor Author

@priyanshi-yb priyanshi-yb Dec 13, 2024

For example, take the CLUSTER ON issue:

func NewClusterONIssue(objectType string, objectName string, sqlStatement string) QueryIssue {
	details := map[string]interface{}{}
	//for ALTER AND INDEX both  same struct now how to differentiate which one to not
	if objectType == "TABLE" {
		details["INCREASE_INVALID_COUNT"] = false
	}
	return newQueryIssue(clusterOnIssue, objectType, objectName, sqlStatement, details)
}

Earlier I was using the New function to build the expected QueryIssue:

NewClusterONIssue("FUNCTION", "list_high_earners", "ALTER TABLE employees CLUSTER ON idx;")

which gives this QueryIssue, with an empty details map:

{{CLUSTER_ON ALTER TABLE CLUSTER not supported yet.  Remove it from the exported schema. https://github.com/YugaByte/yugabyte-db/issues/1124 https://docs.yugabyte.com/preview/yugabyte-voyager/known-issues/postgresql/#unsupported-alter-table-ddl-variants-in-source-schema map[]} FUNCTION list_high_earners ALTER TABLE employees CLUSTER ON idx; map[]}

But the actual query issue looks like this, with details populated based on the TABLE object type:

{{CLUSTER_ON ALTER TABLE CLUSTER not supported yet.  Remove it from the exported schema. https://github.com/YugaByte/yugabyte-db/issues/1124 https://docs.yugabyte.com/preview/yugabyte-voyager/known-issues/postgresql/#unsupported-alter-table-ddl-variants-in-source-schema map[]} FUNCTION list_high_earners ALTER TABLE employees CLUSTER ON idx; map[INCREASE_INVALID_COUNT:false]}

In the code that detects these issues in PL/pgSQL, we modify the issues later (in getAllPLPGSQLIssues()) to change the objectType and objectName to the actual name and type of the PL/pgSQL object.
Now that cmp.Equal also compares the details map, I had to change the way I generate the expected issues so that the comparison works properly.
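
The mismatch can be reproduced in a few lines. In this sketch, `reflect.DeepEqual` is swapped in for cmp.Equal to keep it dependency-free (both compare the map contents), the struct is a simplified stand-in for the real QueryIssue, and `relabel` is a hypothetical helper mimicking what the thread says getAllPLPGSQLIssues() does.

```go
package main

import (
	"fmt"
	"reflect"
)

// QueryIssue is a simplified, hypothetical stand-in for the real struct.
type QueryIssue struct {
	Type         string
	ObjectType   string
	ObjectName   string
	SqlStatement string
	Details      map[string]interface{}
}

// newClusterOnIssue mirrors the constructor quoted above: Details depends on objectType.
func newClusterOnIssue(objectType, objectName, sqlStatement string) QueryIssue {
	details := map[string]interface{}{}
	if objectType == "TABLE" {
		details["INCREASE_INVALID_COUNT"] = false
	}
	return QueryIssue{"CLUSTER_ON", objectType, objectName, sqlStatement, details}
}

// relabel is a hypothetical helper: it overwrites the object type and name
// with those of the enclosing PL/pgSQL object, leaving Details untouched.
func relabel(issue QueryIssue, objectType, objectName string) QueryIssue {
	issue.ObjectType, issue.ObjectName = objectType, objectName
	return issue
}

// issuesEqual compares every field, including the Details map.
func issuesEqual(a, b QueryIssue) bool {
	return reflect.DeepEqual(a, b)
}

func main() {
	stmt := "ALTER TABLE employees CLUSTER ON idx;"

	// What detection yields: built as a TABLE issue, then relabeled.
	actual := relabel(newClusterOnIssue("TABLE", "employees", stmt), "FUNCTION", "list_high_earners")

	// Expected issue built directly with the final labels: Details is empty.
	naive := newClusterOnIssue("FUNCTION", "list_high_earners", stmt)
	fmt.Println("naive matches:", issuesEqual(naive, actual))

	// Expected issue built the same way as the actual one.
	expected := relabel(newClusterOnIssue("TABLE", "employees", stmt), "FUNCTION", "list_high_earners")
	fmt.Println("rebuilt matches:", issuesEqual(expected, actual))
}
```

The first comparison fails only because of the Details map, which a field-by-field check on type, name, and statement would never notice.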

Collaborator

Orthogonal point:
INCREASE_INVALID_COUNT should not be in queryissue layer.
It is an analyze-schema detail, we should have some logic of figuring it out at that layer.

Contributor Author

@priyanshi-yb priyanshi-yb Dec 13, 2024

Yeah, I agree. This was just a hack while refactoring. I have already removed it in PR #2073, which handles all cases for the invalid count.

Collaborator

oh nice, will check!

Collaborator

@makalaaneesh makalaaneesh left a comment

LGTM

@priyanshi-yb priyanshi-yb merged commit 13a678b into main Dec 13, 2024
42 checks passed
@priyanshi-yb priyanshi-yb deleted the priyanshi/report-xml-advisory-issues-on-ddl branch December 13, 2024 16:17
Development

Successfully merging this pull request may close these issues.

Report report advisory locks, xml functions and system column issues in DDL
2 participants