Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Enable CSP with nonce for Helix 5 #773

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

andreituicu
Copy link
Collaborator

@andreituicu andreituicu commented Dec 16, 2024

1. Description

  1. Allow customers to define a nonce based CSP using either a meta tag or header, with cached nonce, with the keyword 'nonce-aem'.

Meta:

    <meta http-equiv="content-security-policy" content="script-src 'nonce-aem' 'strict-dynamic';  base-uri 'self'; objectsrc 'none';">

Header

content-security-policy: script-src 'nonce-aem' 'strict-dynamic'; base-uri 'self'; object-src 'none';
  1. Developers need to also add nonce="aem" on:
  • each script from head.html
  • each script from static html files in github

Example:

<script nonce="aem" src="/scripts/aem.js" type="module"></script>
<script nonce="aem" src="/scripts/scripts.js" type="module"></script>
  1. Add support for a move-as-header attribute for the meta based CSP, where the Helix Pipeline will that tag as a header

Note: if this feature is premature, I can remove it and file it as a separate PR

Example:

<meta http-equiv="content-security-policy" content="script-src 'nonce-aem' 'strict-dynamic'; base-uri 'self'; object-src 'none';" move-as-header="true">
  1. The pipeline will generate a 16byte base64 encoded nonce, place it the aem part of the nonce-aem CSP and on each trusted script where it finds nonce="aem"

2. Implementation choice

After some back and forth between:

  • having 'nonce' vs 'nonce-aem' as a keyword in the CSP
  • requiring developers to add 'nonce="aem" or not to the scripts, to make it as friction-less and transparent as possible for devs.

I chose for the moment 'nonce-aem' and needing to add 'nonce="aem" to the scripts, because in this implementation if there's any reason to deactivate/remove or if the hlx pipeline fails to render the nonce, customers that have this enabled will still have a valid page that is rendered by the browser.
In the other variants, if for any reason 'nonce' is returned the browser will show just a white page.

3. Expected level of protection with a nonce cached on the CDN

1. Protected even if the nonce is cached

.innerHTML for XSS
.outerHTML for XSS
.insertAdjecentHTML for XSS
.setHTMLUnsafe for XSS
eval / setTimeout / setInterval / Function for XSS
javascript: protocol in the href/src attributes
location.href / location.assign() / location.replace for XSS
✅ attribute event handlers (onclick, onblur, onload, onerror etc.) for XSS
✅ stored/reflected XSS

  • as long as there is no server side manipulation of the HTML in the customer's CDN where user input is involved
  • as long as the nonce is regenerated for requests that hit the origin
  • for meta tag: as long as the XSS doesn't happen before the meta tag

2. not protected, because the nonce is cached

document.write / document.writeln -> because you could get the nonce and use it in the exploit

3.not protected for other reasons

document.createRange().createContextualFragment -> whether the nonce is cached/known is irrelevant, scripts are always executed with strict-dynamic. (needs ticket in https://github.com/w3c/webappsec-csp - awaiting OSS approval)
src attribute of a <script> tag created by document.createElement('script') / text content of a <script> tag created by document.createElement('script') / import -> permitted by strict-dynamic

4. not protected by CSPs in general
Script-less attacks
❌ HTML injection
❌ DOM clobbering

5. unknown status
element.style or cssText -> still researching, can’t find a working exploit even without CSP

4. Still TODO

  • Backport to helix 4 in the 5.x branch.
  • Validate that pages render correctly E2E in the helix 4 branch (helix5 doesn't support - hlx-pipeline-version)
  • Add support in the aem cli for nonce on the live-reload script

Copy link

codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (d239b58) to head (fdb2aeb).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #773    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           45        46     +1     
  Lines         3651      3874   +223     
==========================================
+ Hits          3651      3874   +223     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andreituicu
Copy link
Collaborator Author

andreituicu commented Dec 16, 2024

Opening this PR in a WIP state, while I still work through the remaining TODOs to collect feedback and iterate in case there are changes needed.
Please let me know your thoughts! 🙂

Copy link
Contributor

@tripodsan tripodsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general.

  • not sure about the extra parsing for all static html.
  • remove the padding from the base64 nonce

checkResponseBodyForAEMNonce(res) && checkResponseBodyForMetaBasedCSP(res))
)
) {
res.document = await unified()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if we really should rewrite the provided HTML. and if so, with unified which might alter the html. maybe using a very simple text based parser that only understands minimal HTML and repaces the nonce tokens would less intrusive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used unified because that what was used in processing the head.html, which allowed me to keep the same codebase for processing both the document based and the static html and I like how it works with selectors.

Indeed, it does alter the resulting HTML, it makes it canonical (e.g. from the point of view of spaces, indentation, <SCRIPT> -> <script>, etc.).

very simple text based parser that only understands minimal HTML and repaces the nonce tokens would less intrusive.

Would you have any in mind that I could try out? Intuitively, I would expect that any parser when serialising back to canonicalise, since otherwise it would be very hard to store all the nuances of the original HTML.

If we want to keep the customer's HTML untouched except for the nonce generation, we could try a regex approach.

I usually avoid regexes for this kind of processing, because I'm not good at them, they become hard to maintain troubleshoot and I've seen Kodiak complain about ReDos, which means they could be slow for certain files.

Did a quick try with

res.body.replace(/(<script\b[^>]*\bnonce=")aem(")/ig, `$1${nonce}$2`)
      .replace(/(<style\b[^>]*\bnonce=")aem(")/ig, `$1${nonce}$2`)
      .replace(/(<link\b[^>]*\bnonce=")aem(")/ig, `$1${nonce}$2`)
      .replace(/(<meta\b[^>]*'nonce-)aem(')/ig, `$1${nonce}$2`)
      .replace(/(<meta\b[^>]*'nonce-)aem(')/ig, `$1${nonce}$2`) //can appear twice

I can't speak for all customers, but personally, as a developer, I think I would like that if I drop some messy HTML in github that I copied from somewhere I get a clean one when looking through the delivery service.

Copy link
Collaborator Author

@andreituicu andreituicu Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tripodsan WDYT of the latest approach from fdb2aeb?

I found a nice parser from parse5 which allows working with structured information for taking decisions, while still making it possible to make the final transformations directly on the raw html string, so we preserve (almost?) everything that the customer has (indentation, spacing, casing, etc.)

Additionally, if I understand correctly, the parse5 ecosystem sits at the lower level of rehype (not sure if this specific class that I used), which hopefully means that there should be no compatibility issues when running this JS in cloudflare.

Here is an example where it preserves everything and doesn't add anything extra: https://github.com/adobe/helix-html-pipeline/pull/773/files#diff-aced0b80c69cd5d57663548aa6ff82a7ab4f3ce37078a65eecc97ecc0c2cf4ba

Can now be tried with the following hlx-pipeline-version in helix 4.

  1. HTML from document: https://main--anvil--adobe-rnd.hlx.page/?hlx-pipeline-version=9.ci12807394100
  2. Static HTML: https://main--anvil--adobe-rnd.hlx.page/statichtml.html?hlx-pipeline-version=9.ci12807394100
  3. 404 Handler: https://main--anvil--adobe-rnd.hlx.page/notfound?hlx-pipeline-version=9.ci12807394100

test/rendering.test.js Outdated Show resolved Hide resolved
test/rendering.test.js Show resolved Hide resolved
src/steps/csp.js Outdated Show resolved Hide resolved
src/steps/csp.js Show resolved Hide resolved
src/steps/csp.js Outdated Show resolved Hide resolved
Copy link

This PR will trigger a minor release when merged.

Copy link
Contributor

@tripodsan tripodsan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use parse5

package.json Show resolved Hide resolved
@tripodsan tripodsan self-requested a review January 20, 2025 11:08
@andreituicu andreituicu changed the title WIP: feat: Enable CSP with nonce for Helix 5 feat: Enable CSP with nonce for Helix 5 Jan 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants