Parsing of HTML in markdown possible? #7

Open
jgrodziski opened this issue Sep 28, 2022 · 4 comments

@jgrodziski

Hi,

I would like to be able to inline HTML in Markdown. I guess you disabled this for security reasons, but could it be made configurable somehow?
I changed the code in nextjournal/markdown/parser.clj at line 415 to:

(defmethod apply-token "html_block" [doc {inlined-html :content}] (push-node doc {:type :text :text inlined-html}))

and it works great.
What do you think?
Thanks for that well-designed and very useful lib.
Jérémie.

@zampino
Collaborator

zampino commented Jan 2, 2023

Hi @jgrodziski,

glad you find it useful :-).

We haven't needed inline HTML so far. If it is needed for rendering purposes, I guess a user could add apply-token method implementations directly in their project (as you do above) and add renderer functions to the hiccup conversion context under the appropriate types, to be used as in:

(nextjournal.markdown.transform/->hiccup 
 (assoc nextjournal.markdown.transform/default-hiccup-renderers :html your-fn)
 markdown-data)
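
A minimal sketch of what such a renderer function could look like. It assumes the parser has been extended to emit nodes of type `:html` carrying the markup under `:text` (the snippet earlier in the thread emits `:text` nodes instead), and that the resulting hiccup is rendered by a React-style renderer such as Reagent, where `:dangerouslySetInnerHTML` injects markup verbatim. The name `render-raw-html` is made up for illustration and stands in for `your-fn`; nothing here sanitizes the HTML.

```clojure
;; Hypothetical renderer for nodes shaped like {:type :html :text "<img ...>"}.
;; Renderers are called with the rendering context and the node. This one
;; wraps the raw markup in a div (React-style hiccup) without sanitizing it,
;; so it is only suitable for trusted, block-level HTML.
(defn render-raw-html [_ctx {:keys [text]}]
  [:div {:dangerouslySetInnerHTML {:__html text}}])

(comment
  ;; Wiring it in; requires nextjournal.markdown on the classpath.
  ;; `markdown-data` is the parsed document, as in the snippet above.
  (require '[nextjournal.markdown.transform :as md.transform])
  (md.transform/->hiccup
   (assoc md.transform/default-hiccup-renderers :html render-raw-html)
   markdown-data))
```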

@nathell

nathell commented Jul 17, 2023

My use case is a static site generator. I've recently changed my blogs to use nextjournal.markdown instead of markdown-clj (rationale in this Mastodon thread) and it's working great... except it broke most images on my blog, because they appeared as <img> tags (rather than the native Markdown syntax for images) in the Markdown sources of the existing posts.

I'd say the current behaviour makes nextjournal.markdown violate the CommonMark spec. If security is the reason, I'd still make the parser emit HTML nodes by default, but have them ignored in the ast->hiccup transformer.

BTW, thank you for the fantastic library! :)

nathell added a commit to nathell/nhp that referenced this issue Jul 17, 2023
Turns out nextjournal.markdown ignores inline HTML by default (see
nextjournal/markdown#7). Fortunately, it's
reasonably easy to get it rendered.
@zampino
Collaborator

zampino commented Jul 17, 2023

Hi @nathell

> I'd still make the parser emit HTML nodes

Right, that shouldn't do any harm.

@zampino
Collaborator

zampino commented Aug 15, 2024

Since it's been asked again, here's a temporary solution until we handle HTML internally.

(ns scratch.markdown-html
  (:require [nextjournal.markdown :as md]
            [nextjournal.markdown.parser :as md.parser]))

(defmethod md.parser/apply-token "html_inline" [doc {html-content :content}]
  (md.parser/push-node doc {:type :html-inline :text html-content}))

(defmethod md.parser/apply-token "html_block" [doc {html-content :content}]
  (md.parser/push-node doc {:type :html-block :text html-content}))


(md/parse "# HTML Handling

<img src=\"https://www.example.com/image1.jpg\" alt=\"High-Efficiency Antenna\">

some <span class='gorgeous'>text</span> inlined

<aside>this is valid commonmark</aside>
")

;; =>

{:toc {:type :toc,
       :children [{:type :toc,
                   :content [{:type :text, :text "HTML Handling"}],
                   :heading-level 1,
                   :attrs {:id "html-handling"},
                   :path [:content 0]}]},
 :footnotes [],
 :content [{:type :heading,
            :content [{:type :text, :text "HTML Handling"}],
            :heading-level 1,
            :attrs {:id "html-handling"}}
           {:type :html-block,
            :text "<img src=\"https://www.example.com/image1.jpg\" alt=\"High-Efficiency Antenna\">\n"}
           {:type :paragraph,
            :content [{:type :text, :text "some "}
                      {:type :html-inline, :text "<span class='gorgeous'>"}
                      {:type :text, :text "text"}
                      {:type :html-inline, :text "</span>"}
                      {:type :text, :text " inlined"}]}
           {:type :html-block, :text "<aside>this is valid commonmark</aside>\n"}],
 :type :doc,
 :title "HTML Handling"}
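
If raw HTML should stay out of the rendered output, for example for the security reasons mentioned at the top of the thread, one option is to drop these nodes from the parsed document before handing it to the transformer. The following is a sketch in plain Clojure, assuming the `:html-block` and `:html-inline` node types defined above; `strip-html-nodes` is a hypothetical helper, not part of nextjournal.markdown.

```clojure
(require '[clojure.walk :as walk])

;; Node types introduced by the apply-token methods above.
(def html-node-types #{:html-block :html-inline})

(defn strip-html-nodes
  "Returns doc with every :html-block and :html-inline node removed from
  any :content vector. Entries in :toc (including :path) are left as-is."
  [doc]
  (walk/postwalk
   (fn [x]
     (if (and (map? x) (vector? (:content x)))
       ;; Keep only the children whose :type is not an HTML node type.
       (update x :content
               (fn [children]
                 (into [] (remove #(html-node-types (:type %))) children)))
       x))
   doc))

(comment
  ;; Usage with the parser from the snippet above:
  (strip-html-nodes (md/parse "some <span>text</span> inlined")))
```

Because the inline opening and closing tags arrive as separate nodes, removing them keeps the text between them, so the paragraph above would still read "some text inlined".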

@zampino changed the title from "Inline HTML possible?" to "Parsing of HTML in markdown possible?" on Aug 15, 2024