====== HTML Encoding Differentials ======
Lessons learned from **SEKAI CTF 2024**: [WEB] htmlsandbox
The goal is a parsing differential: the same HTML should parse differently when it is loaded incrementally (streamed) than when it is loaded inline (all at once).
The CSP tag must be present in the parsed HTML when the document is loaded inline, but absent when the same document is loaded incrementally.
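This is possible because the encoding sniffer does not change the encoding of an HTML chunk that has already been parsed.
ISO-2022-JP is a suitable charset, because its escape sequences switch the decoder between character sets in the middle of a byte stream.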
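As a rough illustration of why ISO-2022-JP is useful here, the sketch below decodes the same bytes twice, once after the escape sequence that selects ASCII and once after the one that selects JIS X 0208. It uses Python's ''iso2022_jp'' codec as a stand-in for a browser decoder, which may differ in detail. The ''TAG'' value and the ''decode'' helper are hypothetical names chosen for this example and are not taken from the challenge.

```python
# Hypothetical 18-byte tag used only for this demonstration.
TAG = b'<meta name="demo">'

ESC_ASCII = b"\x1b(B"     # ESC ( B : switch to ASCII
ESC_JISX0208 = b"\x1b$B"  # ESC $ B : switch to JIS X 0208 (two bytes per character)


def decode(data: bytes) -> str:
    # errors="replace" turns byte pairs with no JIS X 0208 mapping into U+FFFD
    # instead of raising UnicodeDecodeError.
    return data.decode("iso2022_jp", errors="replace")


as_ascii = decode(ESC_ASCII + TAG)
as_jis = decode(ESC_JISX0208 + TAG)

print(repr(as_ascii))  # the tag survives as markup
print(repr(as_jis))    # the same bytes are consumed in pairs, so no '<' remains
```

After the JIS X 0208 escape, the bytes of the tag are read two at a time as double-byte characters, so a parser working on the decoded text never sees a tag there.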
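import random
import string

# Characters to draw from: digits plus upper- and lowercase ASCII letters.
chars = string.digits + string.ascii_uppercase + string.ascii_lowercase

def next():
    # Return 40000 random alphanumeric characters as bytes.
    return ''.join([random.choice(chars) for _ in range(40000)]).encode()

# Assemble the payload from byte-string fragments.
html = b''
html += b''
html += b' '
html += b''
html += b''
html += b''

# Write in binary mode so the bytes are stored exactly as built.
with open('payload.html', 'wb') as f:
    f.write(html)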
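When inspecting a generated payload, it can help to list where the charset switches occur. The helper below is a minimal sketch that scans a byte string for the four ISO-2022-JP escape sequences and reports their offsets. The names ''ESCAPES'' and ''find_escapes'' are hypothetical and are not part of the original solution.

```python
import re

# The four escape sequences used by ISO-2022-JP and the character set each selects.
ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

_PATTERN = re.compile(b"|".join(re.escape(seq) for seq in ESCAPES))


def find_escapes(data: bytes) -> list[tuple[int, str]]:
    # Return (byte offset, selected character set) for every escape sequence found.
    return [(m.start(), ESCAPES[m.group()]) for m in _PATTERN.finditer(data)]


sample = b"ab\x1b$Bcd\x1b(Bef"
for offset, charset in find_escapes(sample):
    print(offset, charset)
```

The same function can be applied to the contents of a file read in binary mode, for example the bytes of ''payload.html''.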
https://github.com/project-sekai-ctf/sekaictf-2024/blob/main/web/htmlsandbox/solution/gen.py
https://www.sonarsource.com/blog/encoding-differentials-why-charset-matters/