-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc-gen: migrate scalar functions (encoding & regex) documentation #13919
base: main
Are you sure you want to change the base?
doc-gen: migrate scalar functions (encoding & regex) documentation #13919
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Chen-Yuan-Lai 🎉
I noticed the CI is failing on this PR: https://github.com/apache/datafusion/actions/runs/12517861740/job/34921789845?pr=13919
It looks like some of the content is getting lost - when I ran ./dev/update_function_docs.sh
locally, the results seem to show the regexp functions having lost their documentation 🤔
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git diff
diff --git a/docs/source/user-guide/sql/scalar_functions.md b/docs/source/user-guide/sql/scalar_functions.md
index be4f5e56b..f9585ce12 100644
--- a/docs/source/user-guide/sql/scalar_functions.md
+++ b/docs/source/user-guide/sql/scalar_functions.md
@@ -1729,12 +1729,12 @@ uuid()
Decode binary data from textual representation in string.
-decode(expression, format)
+decode(e xpression, format)
#### Arguments
-- **expression**: Expression containing encoded string data
+- **expression**: Expression containing string or binary data
- **format**: Same arguments as [encode](#encode)
**Related functions**:
@@ -1758,167 +1758,6 @@ encode(expression, format)
- [decode](#decode)
-## Regular Expression Functions
-
-Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
-regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
-(minus support for several features including look-around and backreferences).
-The following regular expression functions are supported:
-
-- [regexp_count](#regexp_count)
-- [regexp_like](#regexp_like)
-- [regexp_match](#regexp_match)
-- [regexp_replace](#regexp_replace)
....
#[user_doc( | ||
doc_section(label = "Binary String Functions"), | ||
description = "Decode binary data from textual representation in string.", | ||
syntax_example = "decode(e xpression, format)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
syntax_example = "decode(e xpression, format)", | |
syntax_example = "decode(expression, format)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb thanks for the correction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alamb for the correction, I have fixed the typo
I think the PR is just a part of #13671. Modifying the |
Sure! I'll modfy all the other PRs. Thanks @goldmedal |
- [regexp_like](#regexp_like) | ||
- [regexp_match](#regexp_match) | ||
- [regexp_replace](#regexp_replace) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be the case the entire doc for 4 functions is vanished, checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have no idea why the documentation of regular expression functions was lost. Thanks @comphead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason being the document printer checks first the doc section
if its doesn't match with predefined enum the printer skips such function. The reason why it skips we need add a description to doc_section
to be as
doc_section(
label = "Regular Expression Functions",
description = r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
),
The description doesn't match as in attribute the description
is not set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll file a ticket to make user_doc
to work with predefined consts.
Currently the doc_section
attribute must match fully the predefined DocSection consts, for example
pub const DOC_SECTION_REGEX: DocSection = DocSection {
include: true,
label: "Regular Expression Functions",
description: Some(
r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
),
};
In case of doc_section
attribute contains any mismatches such function will be silently ignored. I believe we can make doc macros more smart like:
- the doc_section will be just a string
- using the string find correspondent const from
scalar_doc_sections.doc_sections()
- in
datafusion/macros/src/user_doc.rs
when constructing the builder use the const instead of buildingDocSection
manually
Is this PR ready to go? Or are we waiting for something else to finisih it up? |
Its not ready yet, it can be fixed by #14001 (preferrable) or alternatively I can do manual correction. I'm sending this to draft for now |
Which issue does this PR close?
Part of #13671 .
Rationale for this change
What changes are included in this PR?
As discussed in #13671, this PR will migrate the builtin binary string and regular expression functions documentation that currently support migration.
Are these changes tested?
Are there any user-facing changes?