-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enable CSP with nonce for Helix 5 #773
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #773 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 45 46 +1
Lines 3651 3874 +223
==========================================
+ Hits 3651 3874 +223 ☔ View full report in Codecov by Sentry. |
Opening this PR in a WIP state, while I still work through the remaining TODOs to collect feedback and iterate in case there are changes needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general.
- not sure about the extra parsing for all static html.
- remove the padding from the base64 nonce
src/steps/render-code.js
Outdated
checkResponseBodyForAEMNonce(res) && checkResponseBodyForMetaBasedCSP(res)) | ||
) | ||
) { | ||
res.document = await unified() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if we really should rewrite the provided HTML. and if so, with unified which might alter the html. maybe using a very simple text based parser that only understands minimal HTML and repaces the nonce
tokens would less intrusive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used unified because that what was used in processing the head.html
, which allowed me to keep the same codebase for processing both the document based and the static html and I like how it works with selectors.
Indeed, it does alter the resulting HTML, it makes it canonical (e.g. from the point of view of spaces, indentation, <SCRIPT>
-> <script>
, etc.).
very simple text based parser that only understands minimal HTML and repaces the nonce tokens would less intrusive.
Would you have any in mind that I could try out? Intuitively, I would expect that any parser when serialising back to canonicalise, since otherwise it would be very hard to store all the nuances of the original HTML.
If we want to keep the customer's HTML untouched except for the nonce generation, we could try a regex approach.
I usually avoid regexes for this kind of processing, because I'm not good at them, they become hard to maintain troubleshoot and I've seen Kodiak complain about ReDos, which means they could be slow for certain files.
Did a quick try with
res.body.replace(/(<script\b[^>]*\bnonce=")aem(")/ig, `$1${nonce}$2`)
.replace(/(<style\b[^>]*\bnonce=")aem(")/ig, `$1${nonce}$2`)
.replace(/(<link\b[^>]*\bnonce=")aem(")/ig, `$1${nonce}$2`)
.replace(/(<meta\b[^>]*'nonce-)aem(')/ig, `$1${nonce}$2`)
.replace(/(<meta\b[^>]*'nonce-)aem(')/ig, `$1${nonce}$2`) //can appear twice
I can't speak for all customers, but personally, as a developer, I think I would like that if I drop some messy HTML in github that I copied from somewhere I get a clean one when looking through the delivery service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tripodsan WDYT of the latest approach from fdb2aeb?
I found a nice parser from parse5
which allows working with structured information for taking decisions, while still making it possible to make the final transformations directly on the raw html string, so we preserve (almost?) everything that the customer has (indentation, spacing, casing, etc.)
Additionally, if I understand correctly, the parse5
ecosystem sits at the lower level of rehype
(not sure if this specific class that I used), which hopefully means that there should be no compatibility issues when running this JS in cloudflare.
Here is an example where it preserves everything and doesn't add anything extra: https://github.com/adobe/helix-html-pipeline/pull/773/files#diff-aced0b80c69cd5d57663548aa6ff82a7ab4f3ce37078a65eecc97ecc0c2cf4ba
Can now be tried with the following hlx-pipeline-version
in helix 4.
- HTML from document: https://main--anvil--adobe-rnd.hlx.page/?hlx-pipeline-version=9.ci12807394100
- Static HTML: https://main--anvil--adobe-rnd.hlx.page/statichtml.html?hlx-pipeline-version=9.ci12807394100
- 404 Handler: https://main--anvil--adobe-rnd.hlx.page/notfound?hlx-pipeline-version=9.ci12807394100
This PR will trigger a minor release when merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use parse5
1. Description
'nonce-aem'
.Meta:
Header
nonce="aem"
on:head.html
Example:
move-as-header
attribute for the meta based CSP, where the Helix Pipeline will that tag as a headerNote: if this feature is premature, I can remove it and file it as a separate PR
Example:
aem
part of thenonce-aem
CSP and on each trusted script where it findsnonce="aem"
2. Implementation choice
After some back and forth between:
'nonce'
vs'nonce-aem'
as a keyword in the CSP'nonce="aem"
or not to the scripts, to make it as friction-less and transparent as possible for devs.I chose for the moment
'nonce-aem'
and needing to add'nonce="aem"
to the scripts, because in this implementation if there's any reason to deactivate/remove or if the hlx pipeline fails to render the nonce, customers that have this enabled will still have a valid page that is rendered by the browser.In the other variants, if for any reason
'nonce'
is returned the browser will show just a white page.3. Expected level of protection with a nonce cached on the CDN
1. Protected even if the nonce is cached
✅
.innerHTML
for XSS✅
.outerHTML
for XSS✅
.insertAdjecentHTML
for XSS✅
.setHTMLUnsafe
for XSS✅
eval
/setTimeout
/setInterval
/ Function for XSS✅
javascript:
protocol in the href/src attributes✅
location.href
/location.assign()
/location.replace
for XSS✅ attribute event handlers (
onclick
,onblur
,onload
,onerror
etc.) for XSS✅ stored/reflected XSS
2. not protected, because the nonce is cached
❌
document.write
/document.writeln
-> because you could get the nonce and use it in the exploit3.not protected for other reasons
❌
document.createRange().createContextualFragment
-> whether the nonce is cached/known is irrelevant, scripts are always executed withstrict-dynamic
. (needs ticket in https://github.com/w3c/webappsec-csp - awaiting OSS approval)❌
src
attribute of a<script>
tag created bydocument.createElement('script')
/ text content of a<script>
tag created bydocument.createElement('script')
/import
-> permitted bystrict-dynamic
4. not protected by CSPs in general
Script-less attacks
❌ HTML injection
❌ DOM clobbering
5. unknown status
❓
element.style
orcssText
-> still researching, can’t find a working exploit even without CSP4. Still TODO
5.x
branch.hlx-pipeline-version
)