Proposal for Generic Robust Substitution Mechanism #1202

Open
gmischler opened this issue Jun 14, 2024 · 1 comment
gmischler commented Jun 14, 2024

Background
The processing model of fpdf2, which is pretty much "write any user input to the output stream immediately", makes it difficult and often nearly impossible to adapt dynamically to the characteristics (e.g. size) of the input data in many situations.

We currently have one formal substitution mechanism, which allows using "{nb}" to insert the total number of pages before it is known. This approach is inherently problematic: first, because it may conflict with a possible intention to render the same character sequence on the page, and second, because it conflicts with text shaping.

There's another possible use case for late value substitution: For #678 and #1154, a solution might be to wrap the page content in a transformation (move or rotation), where the actual parameters of that transformation only become known once the page is complete.

Other use cases for similar substitutions may come up with time.

Solution
The robustness could most easily be improved by replacing the explicit string "{nb}" in the output stream with a sequence containing noncharacters. These are special Unicode code points for private and strictly internal use, which means they should never be shared or transferred between different software packages. This makes them safe for use as conflict-free substitution markers.
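To make the term concrete: Unicode defines 66 noncharacters, namely U+FDD0 through U+FDEF plus the last two code points of each of the 17 planes. A minimal sketch of how input could be checked for them follows; the function names are hypothetical and not part of fpdf2.

```python
def is_noncharacter(code_point: int) -> bool:
    """Return True if code_point is one of the 66 Unicode noncharacters."""
    # Contiguous block of 32 noncharacters in the Basic Multilingual Plane.
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes (U+xxFFFE, U+xxFFFF).
    return 0 <= code_point <= 0x10FFFF and (code_point & 0xFFFE) == 0xFFFE


def contains_noncharacter(text: str) -> bool:
    """Check text for code points that are reserved for internal markers."""
    return any(is_noncharacter(ord(ch)) for ch in text)
```

A check like `contains_noncharacter()` could be applied to user-supplied text, so that only the library itself ever places such markers into the stream.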

Note that, per the Unicode standard, we should not accept such markers from client software, as the noncharacters are strictly for internal use. So for more generic user interaction, we need to define a hierarchy of token classes that allows specifying the type and size of the values to substitute.

    SubstitutionToken
        IntSubToken
            TotalPagesToken (predefined singleton)
        FloatSubToken
        TextSubToken
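As a rough illustration only, the hierarchy above might be sketched in Python as follows. Everything except the class names is an assumption: the `key`, `width`, `set_value()` and `value` members, the `_coerce()` hook, and the `TOTAL_PAGES` instance name are hypothetical, and this is not existing fpdf2 code.

```python
import itertools


class SubstitutionToken:
    """Base class: a placeholder whose value is supplied later."""

    _keys = itertools.count(1)  # source of automatically generated unique keys

    def __init__(self, width=None):
        self.key = next(SubstitutionToken._keys)
        self.width = width  # space to reserve when rendered as text, e.g. "3em"
        self._value = None
        self._is_set = False

    def _coerce(self, value):
        raise NotImplementedError("use one of the typed subclasses")

    def set_value(self, value):
        self._value = self._coerce(value)
        self._is_set = True

    @property
    def value(self):
        # Using a token without ever setting its value is treated as an error.
        if not self._is_set:
            raise ValueError(f"token {self.key} was used but never given a value")
        return self._value


class IntSubToken(SubstitutionToken):
    def _coerce(self, value):
        return int(value)


class FloatSubToken(SubstitutionToken):
    def _coerce(self, value):
        return float(value)


class TextSubToken(SubstitutionToken):
    def _coerce(self, value):
        return str(value)


class TotalPagesToken(IntSubToken):
    """Intended to be filled in by the library itself at output time."""


# Hypothetical name for the predefined singleton instance.
TOTAL_PAGES = TotalPagesToken(width="3em")
```

Reading `value` before `set_value()` raises, which is one possible way to enforce that a used token must be given a value before the file is written.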

When used for rendering as text, those tokens get converted into a special type of Fragment() (they could even be derived from Fragment() themselves). Their important properties are a unique key (automatically generated) to distinguish between intended substitution targets, and a width (likely in pica), which will be used by _render_styled_text_line() to determine where to continue with the following text.
Those substitution Fragment()s will be ignored by text shaping. This will make some substitution results slightly less pretty, and tokens can't be substituted by text using a complex script, but that should be an acceptable limitation.
Most likely the substitution Fragment()s should also be considered "unbreakable". Since we don't know their actual content yet when doing the line wrapping, there's really no other way to handle them at that point.

They get written into the output stream in a form like this, for example:

    marker_pattern = f"\uFDD0{key:d}\uFDD1"
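A minimal sketch of how such markers could be created and later replaced, assuming the stream is available as a Python string at that point. The helper names are hypothetical, and how the markers would survive font encoding in the real output stream is not addressed here.

```python
import re

MARKER_START = "\uFDD0"
MARKER_END = "\uFDD1"
# A marker is the decimal key enclosed by the two noncharacters.
MARKER_RE = re.compile(f"{MARKER_START}(\\d+){MARKER_END}")


def make_marker(key: int) -> str:
    """Build the marker string for a given substitution key."""
    return f"{MARKER_START}{key:d}{MARKER_END}"


def substitute_markers(stream: str, values: dict) -> str:
    """Replace every marker in stream with the value stored under its key."""

    def replace(match):
        key = int(match.group(1))
        if key not in values:
            raise ValueError(f"no value was set for substitution key {key}")
        return str(values[key])

    return MARKER_RE.sub(replace, stream)
```

For example, `substitute_markers(f"page {make_marker(1)} of {make_marker(2)}", {1: 3, 2: 12})` yields `"page 3 of 12"`, while a literal `"{nb}"` in the stream is left untouched.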

FPDF.write() and the text regions could be combined with special methods like .insert_total_pages(width="3em").

For backwards compatibility with the (then deprecated) use of "{nb}", the text parsing routines could replace that string with an appropriate subtype of Fragment(), to be written as a marker as shown.

In all text input methods, maybe we can allow the user to use their own "{format}" keys in the text, with our methods accepting a dict of key/token pairs, which they will then convert internally into the appropriate marker strings.

    my_text = "page {current_pageno} of {my_total_pages}"
    pageno_token = IntSubToken(width="3em")
    pdf.cell(text=my_text, substitute=dict(current_pageno=pageno_token, my_total_pages=TotalPagesToken))
    # add some other stuff
    pageno_token.set_value(42)
    # TotalPagesToken may get automatically updated
    pdf.output("substitution_demo.pdf")
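Internally, the conversion from "{format}" keys to marker strings might look roughly like the sketch below. It is an assumption-laden illustration: `expand_format_keys()` is a hypothetical helper, and plain integer keys stand in for the token objects that the `substitute` dict would hold in the proposal.

```python
import re

# Matches simple "{name}" fields; format specs and conversions are not handled.
FIELD_RE = re.compile(r"\{(\w+)\}")


def expand_format_keys(text: str, substitute: dict) -> str:
    """Replace known "{name}" fields in text with internal marker strings."""

    def replace(match):
        name = match.group(1)
        if name not in substitute:
            return match.group(0)  # not a known key: keep the text unchanged
        return f"\uFDD0{substitute[name]:d}\uFDD1"

    return FIELD_RE.sub(replace, text)
```

Leaving unknown fields untouched is a design choice made for this sketch: it keeps braces in ordinary text harmless and lets a legacy "{nb}" pass through to whatever handles it separately.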

Many other FPDF methods may accept substitution tokens in place of explicit values. E.g. a transformation may accept an instance of a float-type token in place of an actual float. Before writing the file, the user must then update their copy of the token with the correct value, which will cause its marker to be replaced in the output. Forgetting to set the value of a used token is an error.

    y_move_token = FloatSubToken()
    with pdf.move(x=0, y=y_move_token):  # analogous to skew(), rotation(), etc.
        ...  # create content
    remaining_y = pdf.eph - pdf.y
    y_move_token.set_value(remaining_y)
    pdf.pages[pdf.page].set_dimensions(pdf.w_pt, (pdf.y + pdf.t_margin)*pdf.k)
    pdf.output("substitution_demo.pdf")

Sorry I couldn't come up with a simpler solution, but there are many different constraints in the different phases of processing, all of which need to be addressed.

Any better ideas? 💡
Any takers?

@andersonhc

I had already started working on refactoring the alias code to fix #1090. I just submitted the PR. I believe it is one step closer to your vision for the substitution mechanism.
