Word wrap for markup languages is currently handled at the lexer level. In the Sparser / PrettyDiff implementation, wrapping was applied in both the lexing and beautification cycles, which requires the beautifier to augment content that the lexer has already wrapped. That design made sense when Sparser was used in isolation, but in Æsthetic the Sparser algorithm is tightly coupled with the beautification process.
Generally speaking, the current approach works, BUT it does not produce correct word wrap on the first run. It takes 2 beautification runs to reach the desired output, and it requires additional handling that could be skipped entirely if wrap logic were processed in the formatting (beautification) cycle instead. This overhaul involves major refactoring at the core and regressions are likely, however it is a matter of necessity at this point.
Example
The main problem with the current tactic is that leading indentation levels are not taken into consideration when wrapping in the parse (lexing) cycle. When we enter the formatting cycle, we then need to augment tokens in the data structure which have already undergone augmentation.
Current Lexing Cycle
Take the following code snippet and assume a wrap limit of 50:
<p>
Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.
Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!
</p>
During the lexing cycle, the above will be transformed to the following, assuming a wrap limit of 50:
<p>
Lorem ipsum, dolor sit amet consectetur
adipisicing elit. Facilis quasi corrupti ipsam
impedit nostrum odio.
Nulla accusantium repellat officiis voluptate
similique aut sint reiciendis totam, aliquid,
voluptatum qui consequuntur placeat!
</p>
The resulting uniform data structure will look something like this (omitting additional references for the sake of example):
{
  token: [
    '<p>',
    'Lorem ipsum, dolor sit amet consectetur\n adipisicing elit. Facilis quasi corrupti ipsam\n impedit nostrum odio.\n\n Nulla accusantium repellat officiis voluptate\n similique aut sint reiciendis totam, aliquid,\n voluptatum qui consequuntur placeat!',
    '</p>'
  ],
  types: ['start', 'content', 'end']
}
The current approach inserts \n characters into the text content wherever wrap applies, without taking into consideration the indentation level that will later be imposed. Because the text content is contained within a <p> element and will be indented during formatting, the wrap limit will not be correctly applied.
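To make the failure mode concrete, here is a minimal sketch rather than the actual Æsthetic or Sparser code. The `wrapWords` helper, the 2-space indentation and the nesting depth of 3 are all assumptions made for illustration. It wraps at the limit during "lexing", applies indentation afterwards, and collects the lines that end up longer than the limit.

```typescript
// Hypothetical helper (not part of Æsthetic): greedy word wrap that
// knows nothing about indentation, like wrapping in the lexing cycle.
function wrapWords(text: string, limit: number): string[] {
  const lines: string[] = [];
  let line = '';
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (line.length > 0 && line.length + 1 + word.length > limit) {
      lines.push(line);
      line = word;
    } else {
      line = line.length > 0 ? `${line} ${word}` : word;
    }
  }
  if (line.length > 0) lines.push(line);
  return lines;
}

const limit = 50;
// Assumption: the text sits 3 levels deep and each level is 2 spaces.
const indent = ' '.repeat(3 * 2);

// "Lexing cycle": wrap at the limit with no knowledge of nesting depth.
const lexed = wrapWords(
  'Lorem ipsum, dolor sit amet consectetur adipisicing elit. ' +
    'Facilis quasi corrupti ipsam impedit nostrum odio.',
  limit
);

// "Formatting cycle": indentation is prepended to already wrapped lines.
const formatted = lexed.map((line) => indent + line);

// Lines that exceed the limit once indentation has been applied.
const overflow = formatted.filter((line) => line.length > limit);
console.log(overflow);
```

Every line respects the limit when it leaves the lexer, but a line can exceed it once indentation is prepended, which is why a second run (or a patch-up pass in the beautifier) is needed today.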
As mentioned, the logic in PrettyDiff was to patch this handling during the beautification cycle. The current logic in Æsthetic has skipped that additional process altogether, or only applies it at certain points or on certain tokens (i.e., attributes). The new tactic will completely eliminate the wrap processing done in the lexing cycle. Instead, the wrapping operations will be done during formatting.
New Lexing Cycle
The new approach significantly reduces the operations happening during the lexing cycle, specifically the wrapping. Instead of capturing the entire text region as a single token and suffixing \n where wrap applies, new content records will be inserted into the data structure. Each newline occurrence will signal a new record insertion, unless the markup stripTextWrapLines rule is set to true, in which case newlines will be stripped.
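A rough sketch of that record insertion might look like the following. The `TextTable` shape and the `pushTextRecords` function are hypothetical names for illustration, and joining the lines with a single space when `stripTextWrapLines` is enabled is my reading of "newlines will be stripped", not confirmed behaviour.

```typescript
// Simplified stand-in for the parse table (real records carry more fields).
interface TextTable {
  token: string[];
  types: string[];
}

// Hypothetical lexer step: one 'content' record per line of text,
// with no wrapping applied at this stage.
function pushTextRecords(
  data: TextTable,
  text: string,
  stripTextWrapLines: boolean
): void {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  if (lines.length === 0) return;

  if (stripTextWrapLines) {
    // Assumption: stripping newlines collapses the text into one record.
    data.token.push(lines.join(' '));
    data.types.push('content');
    return;
  }

  for (const line of lines) {
    data.token.push(line);
    data.types.push('content');
  }
}

const table: TextTable = { token: ['<p>'], types: ['start'] };
pushTextRecords(table, 'first line of text\nsecond line of text', false);
table.token.push('</p>');
table.types.push('end');
console.log(table.types);
```

The point of the design is that the lexer only records what the input contained and leaves every wrap decision to the formatting cycle.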
Based on the above, the new data structure will look like this:
{
  token: [
    '<p>',
    'Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.',
    'Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!',
    '</p>'
  ],
  types: ['start', 'content', 'content', 'end']
}
Notice how the text content token entries above accurately represent the provided input, as opposed to being augmented to adhere to wrap. Each newline-separated run of text content inserts a new record. Based on this structure, we can now handle wrap during the beautification cycle in a single operation and, most importantly, we can ensure that indentation is taken into account whilst applying wrap. The new output will instead be reflected as:
<p>
Lorem ipsum, dolor sit amet consectetur
adipisicing elit. Facilis quasi corrupti ipsam
impedit nostrum odio. Nulla accusantium repellat
officiis voluptate similique aut sint reiciendis
totam, aliquid, voluptatum qui consequuntur
placeat!
</p>
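For illustration, a formatting pass over that structure could be sketched as below. This is not the planned implementation: the `ParseTable` shape, the `format` function and the 2-space indentation are assumptions, and reflowing consecutive content records together is inferred from the output above.

```typescript
// Simplified stand-in for the parse table (real records carry more fields).
interface ParseTable {
  token: string[];
  types: string[];
}

// Hypothetical formatter: wraps content while accounting for indentation.
function format(data: ParseTable, limit: number, indentSize = 2): string {
  const out: string[] = [];
  let level = 0;
  let words: string[] = [];

  // Emit the words collected so far, wrapped at the current indentation.
  const flush = (): void => {
    const pad = ' '.repeat(level * indentSize);
    let line = '';
    for (const word of words) {
      if (line.length > 0 && pad.length + line.length + 1 + word.length > limit) {
        out.push(pad + line);
        line = word;
      } else {
        line = line.length > 0 ? `${line} ${word}` : word;
      }
    }
    if (line.length > 0) out.push(pad + line);
    words = [];
  };

  data.token.forEach((token, i) => {
    const type = data.types[i];
    if (type === 'content') {
      // Consecutive content records are collected and reflowed together.
      words.push(...token.split(/\s+/).filter(Boolean));
      return;
    }
    flush();
    if (type === 'end') level = Math.max(0, level - 1);
    out.push(' '.repeat(level * indentSize) + token);
    if (type === 'start') level += 1;
  });

  flush();
  return out.join('\n');
}

const parsed: ParseTable = {
  token: [
    '<p>',
    'Lorem ipsum, dolor sit amet consectetur adipisicing elit. Facilis quasi corrupti ipsam impedit nostrum odio.',
    'Nulla accusantium repellat officiis voluptate similique aut sint reiciendis totam, aliquid, voluptatum qui consequuntur placeat!',
    '</p>'
  ],
  types: ['start', 'content', 'content', 'end']
};

const output = format(parsed, 50);
console.log(output);
```

Because the indentation width is included in the length check, a single pass is enough to keep every emitted line within the limit (as long as no single word is longer than the limit itself).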
Possible Regression
The overhaul will open up some headaches on the Liquid handling front, specifically for the following rules:
forceFilter
forceArgument
These 2 rules are currently processed at the lexing level, so consideration needs to be had on that front. Other related rules will also be found in the attributes handling operations, but I do believe this is already handled in the current implementation.