Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc-gen: migrate scalar functions (encoding & regex) documentation #13919

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Chen-Yuan-Lai
Copy link
Contributor

@Chen-Yuan-Lai Chen-Yuan-Lai commented Dec 27, 2024

Which issue does this PR close?

Part of #13671 .

Rationale for this change

What changes are included in this PR?

As discussed in #13671, this PR will migrate the builtin binary string and regular expression functions documentation that currently support migration.

Are these changes tested?

Are there any user-facing changes?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Chen-Yuan-Lai 🎉

I noticed the CI is failing on this PR: https://github.com/apache/datafusion/actions/runs/12517861740/job/34921789845?pr=13919

It looks like some of the content is getting lost - when I ran ./dev/update_function_docs.sh locally, the results seem to show the regexp functions having lost their documentation 🤔

(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion2$ git diff
diff --git a/docs/source/user-guide/sql/scalar_functions.md b/docs/source/user-guide/sql/scalar_functions.md
index be4f5e56b..f9585ce12 100644
--- a/docs/source/user-guide/sql/scalar_functions.md
+++ b/docs/source/user-guide/sql/scalar_functions.md
@@ -1729,12 +1729,12 @@ uuid()
 Decode binary data from textual representation in string.

-decode(expression, format)
+decode(e xpression, format)


#### Arguments

-- **expression**: Expression containing encoded string data
+- **expression**: Expression containing string or binary data
- **format**: Same arguments as [encode](#encode)

**Related functions**:
@@ -1758,167 +1758,6 @@ encode(expression, format)

- [decode](#decode)

-## Regular Expression Functions
-
-Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
-regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
-(minus support for several features including look-around and backreferences).
-The following regular expression functions are supported:
-
-- [regexp_count](#regexp_count)
-- [regexp_like](#regexp_like)
-- [regexp_match](#regexp_match)
-- [regexp_replace](#regexp_replace)
....

#[user_doc(
doc_section(label = "Binary String Functions"),
description = "Decode binary data from textual representation in string.",
syntax_example = "decode(e xpression, format)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
syntax_example = "decode(e xpression, format)",
syntax_example = "decode(expression, format)",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb thanks for the correction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb for the correction, I have fixed the typo

@goldmedal
Copy link
Contributor

I think the PR is just a part of #13671. Modifying the close keyword to part of in the PR description is better.

@Chen-Yuan-Lai
Copy link
Contributor Author

Sure! I'll modfy all the other PRs. Thanks @goldmedal

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 3, 2025
- [regexp_like](#regexp_like)
- [regexp_match](#regexp_match)
- [regexp_replace](#regexp_replace)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be the case the entire doc for 4 functions is vanished, checking

Copy link
Contributor Author

@Chen-Yuan-Lai Chen-Yuan-Lai Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have no idea why the documentation of regular expression functions was lost. Thanks @comphead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason being the document printer checks first the doc section if its doesn't match with predefined enum the printer skips such function. The reason why it skips we need add a description to doc_section to be as

    doc_section(
        label = "Regular Expression Functions",
        description = r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
    ),

The description doesn't match as in attribute the description is not set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file a ticket to make user_doc to work with predefined consts.

Currently the doc_section attribute must match fully the predefined DocSection consts, for example

    pub const DOC_SECTION_REGEX: DocSection = DocSection {
        include: true,
        label: "Regular Expression Functions",
        description: Some(
            r#"Apache DataFusion uses a [PCRE-like](https://en.wikibooks.org/wiki/Regular_Expressions/Perl-Compatible_Regular_Expressions)
regular expression [syntax](https://docs.rs/regex/latest/regex/#syntax)
(minus support for several features including look-around and backreferences).
The following regular expression functions are supported:"#,
        ),
    };

In case of doc_section attribute contains any mismatches such function will be silently ignored. I believe we can make doc macros more smart like:

  • the doc_section will be just a string
  • using the string find correspondent const from scalar_doc_sections.doc_sections()
  • in datafusion/macros/src/user_doc.rs when constructing the builder use the const instead of building DocSection manually

@alamb
Copy link
Contributor

alamb commented Jan 8, 2025

Is this PR ready to go? Or are we waiting for something else to finisih it up?

@comphead
Copy link
Contributor

comphead commented Jan 8, 2025

Its not ready yet, it can be fixed by #14001 (preferrable) or alternatively I can do manual correction. I'm sending this to draft for now

@comphead comphead marked this pull request as draft January 8, 2025 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants