JSoup HTML parser in separate module #391
… from maven central.
…for development of new features and support of modern standards.
Hey Andreas, thanks for the PR. I appreciate the effort that went into it. I'm afraid it's kind of an example of "hunting mice with an elephant gun", though. It would be less invasive to add a service interface that lets a user swap out the DOM parser implementation used by the Swing-based mini-browser (either auto-configured by the presence of the module or explicitly swapped out through configuration). I think I may have suggested this before.

Supporting additional CSS properties is an almost entirely orthogonal problem to the DOM parser in use. This could be done while using an XML, XHTML, or HTML5 parser to create the DOM.

We do have some experience with copy-and-pasted modules. The old flying-saucer-pdf-itext5 module was effectively a clone of flying-saucer-pdf with package changes and minor API updates. To put it bluntly, it was a disaster. It had already bitrotted rather badly by the time it was deleted, as most contributed fixes only touched flying-saucer-pdf. I'm quite happy that Andrei had the courage to delete it.

It's awesome that you'd like to start experimenting with supporting more CSS properties and adding JavaScript. It is a hugely ambitious task. I'd suggest starting that effort in a separate fork to see how it goes.
@andreasrosdal @pbrant In fact, we already have an example showing how to use JSoup to parse HTML. But yes, we could improve it even more with a service loader mechanism...
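
For illustration, the service-interface idea discussed above could look roughly like the sketch below. None of these names come from Flying Saucer: `DomParser`, `XmlDomParser`, and `DomParsers.resolve` are hypothetical, and the types are package-private only to keep the sketch in one file (a real SPI would be public). It uses only the JDK's `java.util.ServiceLoader` and the built-in XML parser as the fallback.

```java
import java.io.StringReader;
import java.util.ServiceLoader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical SPI: turns markup into a W3C DOM for the renderer. */
interface DomParser {
    Document parse(String markup);
}

/** Fallback implementation backed by the JDK's strict XML parser. */
class XmlDomParser implements DomParser {
    @Override
    public Document parse(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            // Strict XML parsing rejects malformed input; surface it unchecked.
            throw new IllegalArgumentException("Markup is not well-formed XML", e);
        }
    }
}

/** Hypothetical lookup helper covering both options from the discussion. */
final class DomParsers {
    private DomParsers() {}

    static DomParser resolve(DomParser explicit) {
        // 1. Explicit configuration wins.
        if (explicit != null) {
            return explicit;
        }
        // 2. Otherwise use the first provider found on the classpath, if any.
        for (DomParser provider : ServiceLoader.load(DomParser.class)) {
            return provider;
        }
        // 3. No provider module present: fall back to the XML parser.
        return new XmlDomParser();
    }
}
```

Under this assumption, an optional JSoup-backed module would ship its own `DomParser` implementation and list it in a `META-INF/services/` file named after the interface, so that adding the module to the classpath is enough to switch parsers without touching the core.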