XHTML/LaTeXML/FLOMDoc #9

Open
Jazzpirate opened this issue Feb 27, 2021 · 12 comments

@Jazzpirate
Collaborator

Jazzpirate commented Feb 27, 2021

XHTML/LaTeXML

  • For declarations we can generate <script type="application/(xml | json)"> nodes (for xml: Requires prefixing all inner nodes with xhtml:, but they're not checked against the rnc and stripped away in post processing anyway).
  • For annotations, we probably need at most an annotation type (OMS, OMA etc.) and a URI/ID/name. For OMS, OMV etc. a URI will do. The same holds for OMAs: applications of complex terms need an apply-operator anyway, so the head of an OMA is always a symbol (?). For more complex attributes, we can add a <script> as the first child that holds complex terms etc.
  • In text mode, we can use <div> or <span> with attributes resource (URI) and property (annotation type).
  • In math mode, the only valid attribute that survives post processing in LaTeXML is class. That might even be "semantically valid", in that both an annotation type and a URI could meaningfully be used as CSS classes. Then we can add property and resource or id as (space-separated, as usual) class values. Complex terms (if necessary) can be moved to a <script> after the <math> node and linked to via an id in the classes (inelegant, but functional, I guess).
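
The math-mode idea above (annotation type and URI carried as space-separated class values) could be sketched roughly as follows. This is only an illustration in Python, not part of sTeX or LaTeXML: the helper names `pack_classes` and `unpack_classes`, the convention that a resource directly follows its `stex:` property token, the example URI, and the pre-existing class `ltx_Math` are all assumptions.

```python
def pack_classes(prop, resource, existing=""):
    """Append an annotation type and a URI/name to a space-separated class string."""
    for token in (prop, resource):
        # Class values are whitespace-separated, so a token must not contain any.
        if any(ch.isspace() for ch in token):
            raise ValueError(f"class token must not contain whitespace: {token!r}")
    return " ".join(existing.split() + [prop, resource])


def unpack_classes(class_attr, prefix="stex:"):
    """Recover (property, resource) pairs from a class string.

    Assumed convention: a token starting with `prefix` is a property,
    and the token right after it is its resource.
    """
    tokens = class_attr.split()
    pairs = []
    i = 0
    while i < len(tokens):
        if tokens[i].startswith(prefix) and i + 1 < len(tokens):
            pairs.append((tokens[i], tokens[i + 1]))
            i += 2
        else:
            i += 1
    return pairs


# Hypothetical example: an OMA whose head symbol lives at a made-up URI.
cls = pack_classes("stex:oma", "http://example.org/arith?plus", existing="ltx_Math")
print(cls)
print(unpack_classes(cls))
```

A consumer reading the class attribute back needs some such ordering convention, since class values carry no key/value structure of their own.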

Examples

Concept | property    | resource
Theory  | stex:theory | URI
OMA     | stex:oma    | URI of head symbol notation
OMV     | stex:omv    | URI of variable declaration if declared, name otherwise

@Jazzpirate
Collaborator Author

Jazzpirate commented Mar 18, 2021

Turns out: LaTeXML is too restrictive when it comes to <script> nodes and their content. That is because all elements inside the node need to have the namespace xhtml, which (for example) MathML content doesn't. So we can't use <script> in LaTeXML if it contains content that is e.g. generated (by LaTeXML) from math-mode TeX (e.g. types, notations, ...).
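
To make the restriction concrete, here is a small Python sketch that lists the descendants of a <script> node that are not in the XHTML namespace. It only mirrors the constraint as described in this comment; it is not LaTeXML's actual check, and the helper name `foreign_descendants` and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def foreign_descendants(element):
    """Return the tags of all descendants that are not in the XHTML namespace."""
    return [
        child.tag
        for child in element.iter()
        if child is not element and not child.tag.startswith("{" + XHTML_NS + "}")
    ]


# A <script> holding MathML, as would result from math-mode TeX (sample markup).
doc = ET.fromstring(
    '<script xmlns="http://www.w3.org/1999/xhtml" type="application/xml">'
    "<span>type of a symbol:</span>"
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    "</script>"
)
offenders = foreign_descendants(doc)
print(offenders)  # the MathML <math> and <mi> elements show up here
```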

But I realized that might not be a problem: If we use MMT to convert tex to xhtml, then MMT reads the LaTeXML-generated xhtml in anyway (e.g. to introduce more CSS, javascript and remove the LaTeXML-footer) before writing it out again. The final MMT build target would additionally generate relational and content-OMDoc, so we want/need that anyway. And of course MMT can modify the XHTML further.
What this means is: if we don't insist that the XHTML directly generated by LaTeXML needs to be "nice", then we can simply put all content that should ultimately be invisible (types for symdecl, includes, notations etc.) in plain <span> tags with an appropriate property, and have MMT replace them with e.g. <script> nodes during building.
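
The replacement step could look roughly like the following. MMT is not written in Python, so this is only a sketch of the idea: the property values in `INVISIBLE`, the function name `hide_invisible`, and the sample page are assumptions, while type="application/xml" follows the suggestion from the opening comment.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML without a prefix

# Hypothetical property values marking content that should not be displayed.
INVISIBLE = {"stex:symdecl", "stex:notation"}


def hide_invisible(root):
    """Turn every <span> whose property is in INVISIBLE into a <script> node.

    Returns the number of converted nodes.
    """
    count = 0
    # Materialize the iterator first, since tags are modified while looping.
    for el in list(root.iter("{%s}span" % XHTML_NS)):
        if el.get("property") in INVISIBLE:
            el.tag = "{%s}script" % XHTML_NS
            el.set("type", "application/xml")
            count += 1
    return count


page = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    '<span property="stex:symdecl" resource="http://example.org/arith?plus">nat</span>'
    '<span property="stex:oma" resource="http://example.org/arith?plus">a + b</span>'
    "</div>"
)
converted = hide_invisible(page)
print(converted)
print(ET.tostring(page, encoding="unicode"))
```

Only the declaration span is rewritten; the visible OMA annotation keeps its <span> with property and resource intact.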

One nice advantage (which I tested) is that now all LaTeXML annotations are extremely generic: They're always (arbitrarily nested) <span>s with (at most) property and resource (in text mode, for math mode see above). That means we can reduce the bindings to a single macro \latexml@annotate#1#2#3 that generates <span property="#1" resource="#2">#3</span> (technically three macros: one for text mode, one for math mode and one for environments, but they're just a convenience to have tex check for math mode rather than latexml) and have tex do everything else.
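
As a rough Python stand-in for what the single annotation macro produces, the sketch below wraps already-serialized content in a <span> with at most property and resource, and nests two annotations. The function name `annotate`, the example URI, and the spelling `stex:omv` are assumptions; the real binding is the TeX macro, not this function.

```python
from xml.sax.saxutils import escape, quoteattr


def annotate(prop, resource, content):
    """Wrap `content` (already serialized markup) in an annotated <span>.

    Mirrors <span property="#1" resource="#2">#3</span>; the resource
    attribute is omitted when it is empty.
    """
    attrs = f"property={quoteattr(prop)}"
    if resource:
        attrs += f" resource={quoteattr(resource)}"
    return f"<span {attrs}>{content}</span>"


# An undeclared variable is identified by its name rather than a URI.
inner = annotate("stex:omv", "x", escape("x"))
# Annotations nest arbitrarily: here the variable sits inside a theory.
outer = annotate("stex:theory", "http://example.org/arith", inner)
print(outer)
```

Because every annotation has the same shape, a new annotation type only needs a new property value, not a new binding.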

...what this means is that anyone can freely extend sTeX with new annotations and e.g. corresponding structural features in MMT (I've already started implementing the MMT-XHTML-Import in an extensible way, so support for new annotations can be provided via an MMT extension); which means we're truly extensible without needing to provide new latexml bindings either.

@kohlhase
Member

if we don't insist that the XHTML directly generated by LaTeXML needs to be "nice",

I would still maintain, that the XHTML should look nice in the browser (or did you mean the XHTML source?). But that is easy to achieve: if you want something to be invisible in XHTML, then you can just put a style="display:none" to it.

@Jazzpirate
Collaborator Author

I would still maintain, that the XHTML should look nice in the browser (or did you mean the XHTML source?).

I mean look nice in the browser, and by "not look nice", I mean things like "declarations and associated terms are visible, in contrast to the pdf".

you can just put a style="display:none" to it.

I can't - nothing but property and resource survive postprocessing.

@kohlhase
Member

kohlhase commented Mar 19, 2021

Not even class=stex-invisible? There should be a way of adding that; maybe you need to ask Deyan about both of these. They seem reasonable to add to LaTeXML functionality.

@kohlhase
Member

I mean look nice in the browser, and by "not look nice", I mean things like "declarations and associated terms are visible, in contrast to the pdf".

Yes, that is exactly what I would like to maintain.

@Jazzpirate
Collaborator Author

Not even class=stex-invisible?

That could be possible, but it would necessitate adding CSS (requiring a command line parameter), and I'm not sure why it matters... Do we expect anyone to ever manually call latexml on an stex file? In that case they would still have to know which command line parameters they need to pass...

@kohlhase
Member

Do we expect anyone to ever manually call latexml on an stex file?

actually yes, that is exactly what I would expect!

@Jazzpirate
Collaborator Author

Also: note that "display:none" also requires width/height-fiddling, otherwise there's a blank space the size of the content.

actually yes, that is exactly what I would expect!

Do you also expect them to use whatever command line parameters we come up with? :D

@kohlhase
Member

note that "display:none" also requires width/height-fiddling, otherwise there's a blank space the size of the content.

I think this is incorrect.

Do you also expect them to use whatever command line parameters we come up with? :D

no, I think they are going to dream up something interesting, including, but not limited to what we do. :-)

@Jazzpirate
Collaborator Author

no, I think they are going to dream up something interesting, including, but not limited to what we do. :-)

I have no idea what you're hinting at, but in that case, adding custom CSS classes doesn't help ;)

I found a CSS class "ltx_rdf" in the LATEXML.css though, which seems to have "display:none" as its only property, and using it seems to work ;)

Then we can reduce bindings to basically four macros: \latexml@annotate@{text,math,invisible,environment} :)

@Jazzpirate
Collaborator Author

I think this is incorrect.

You were right by the way, I confused that with "opacity", which I used for hoverboxes. Next step: Implement nested modules and structural features on the LaTeX-side, which is mainly a matter of namespace/module-name handling.

@kohlhase
Member

excellent.
