simple impl to try to use https://github.com/HtmlUnit/htmlunit-neko SAX parser (see #282) #333

Open
wants to merge 1 commit into
base: main

@rbri rbri commented Jun 2, 2024

There seems to be no dynamic way to add another try, so I did this simple hack.

I hope someone with more knowledge of this library comes up with a better idea.

@pbrant
Member

pbrant commented Jun 10, 2024

I'm afraid it's pretty much a non-starter. There is a very high probability that it would break a large number of users without warning.

An HTML parser parsing an XML document won't always create the same DOM as an XML parser parsing an XML document.

Making it easy to swap out the XMLReader used globally sounds like a good idea though.
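
One way a globally swappable XMLReader could look is sketched below. This is only an illustration built on the JDK's JAXP classes: `XmlReaderRegistry`, `setFactory`, `jdkReader`, and `parse` are hypothetical names invented for this example and are not part of Flying Saucer's API.

```java
import java.io.StringReader;
import java.util.Objects;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical holder for a global XMLReader factory (assumption, not library code). */
final class XmlReaderRegistry {
    // Default: the JDK's namespace-aware SAX parser, so existing behavior stays XML.
    private static volatile Supplier<XMLReader> factory = XmlReaderRegistry::jdkReader;

    private XmlReaderRegistry() {}

    /** Lets an application opt in to a different SAX-capable parser. */
    static void setFactory(Supplier<XMLReader> custom) {
        factory = Objects.requireNonNull(custom);
    }

    static XMLReader jdkReader() {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            return spf.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create the default XMLReader", e);
        }
    }

    /** Builds a W3C DOM using whichever XMLReader is currently registered. */
    static Document parse(String markup) throws Exception {
        SAXSource source = new SAXSource(factory.get(), new InputSource(new StringReader(markup)));
        DOMResult result = new DOMResult();
        // An identity transform copies the SAX events into a new DOM document.
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<a><b/></a>").getDocumentElement().getNodeName());
    }
}
```

With a hook like this, the default stays an XML parser, and only callers who explicitly register another reader would see a different DOM.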

@rbri
Author

rbri commented Jun 10, 2024

@pbrant My guess is that Flying Saucer is about parsing XHTML and not arbitrary XML. Maybe you can provide some samples that help me understand your point.

@rbri rbri closed this Jun 10, 2024
@rbri rbri reopened this Jun 10, 2024
@pbrant
Member

pbrant commented Jun 10, 2024

@rbri That's not quite accurate. I'd describe Flying Saucer as a W3C DOM renderer that, by default, parses input as XML (not XHTML).

For an example of how the parsing rules differ, consider this HTML5/XHTML document, which is also valid XML:

<html>
  <body>
    <p>
      one
      <div>two</div>
      three
    </p>
  </body>
</html>

An HTML5/XHTML parser will produce the DOM equivalent of the following (taken from DevTools):

<html><head></head><body>
    <p>
      one
      </p><div>two</div>
      three
    <p></p>
  

</body></html>

These two DOMs won't render the same in Flying Saucer, even with the default stylesheet, and since their internal structure differs, user stylesheets might also match differently.
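
The XML side of this comparison can be checked with the JDK's own parser. The sketch below is a minimal illustration, not Flying Saucer code; the class name `XmlNestingDemo` and the helper `parentOfDiv` are made up for this example. It parses the first document above with whitespace removed and reports the parent of the `div`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XmlNestingDemo {
    // Same structure as the sample document, with whitespace stripped.
    static final String DOC = "<html><body><p>one<div>two</div>three</p></body></html>";

    /** Parses the markup as XML and returns the tag name of the first div's parent. */
    static String parentOfDiv(String markup) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(markup)));
        Element div = (Element) dom.getElementsByTagName("div").item(0);
        return div.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        // An XML parser keeps the div nested inside the paragraph.
        System.out.println(parentOfDiv(DOC)); // prints "p"
    }
}
```

In the DevTools output above, by contrast, the `div` is a direct child of `body`, so a selector such as `p > div` would match in one DOM and not in the other.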

@pbrant
Member

pbrant commented Jun 10, 2024

Note these two forks, which have taken steps toward supporting HTML by default. I think FS should move in the same direction.

A fork starting with zero users has a lot more flexibility than a project with hundreds of thousands of downloads a month.
