Button elements are ignored #25

Open
tom-macneil opened this issue Jan 15, 2020 · 0 comments

While trying autologin against some of the sites in the training data, I found that several have changed since the data was collected and no longer work.
Formasaurus ignores 'button' elements. On these sites the form is submitted with a button rather than an input element, so the button is required to log in.

Examples:

The main problem seems to be that lxml parses buttons as plain Elements or HtmlElements. Unlike InputElements, these have no `.name` or `.type` attributes, so they are filtered out by `if getattr(f, 'name', None)`. If I modify the code so that this check no longer filters them, it blows up later on, where it assumes every field has `.name` and `.type` attributes.
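To make the difference concrete, here is a minimal sketch using lxml alone, without Formasaurus. The form markup and the field names (`user`, `pass`, `go`) are made up for illustration and are not taken from any of the affected sites.

```python
import lxml.html

# Hypothetical login form, not taken from any of the affected sites.
FORM_HTML = """
<form action="/login" method="post">
  <input type="text" name="user">
  <input type="password" name="pass">
  <button type="submit" name="go">Log in</button>
</form>
"""

form = lxml.html.fromstring(FORM_HTML)
text_input = form.xpath('//input')[0]
button = form.xpath('//button')[0]

# <input> is parsed as InputElement, which exposes .name and .type.
print(type(text_input).__name__, text_input.name, text_input.type)

# <button> is a plain HtmlElement with no .name property, so a
# getattr(f, 'name', None) check evaluates to None and the field is dropped.
print(type(button).__name__, getattr(button, 'name', None))

# The HTML attributes are still present, just not as Python properties.
print(button.get('name'), button.get('type'))
```

So a fix inside Formasaurus would probably have to read these values with `.get('name')` and `.get('type')` rather than through the properties, though I haven't tried that.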

As a hacky workaround/proof I modified html.load_html to convert all button elements to input elements:

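    # Needs `from lxml import etree` at the top of the module.
    parsed = lxml.html.fromstring(html, base_url=base_url, parser=parser)
    # Swap every <button> for an <input> carrying the same attributes.
    for node in parsed.xpath('//button'):
        new_node = etree.Element("input")
        for key, value in node.items():
            new_node.set(key, value)
        new_node.tail = node.tail  # replace() would otherwise drop the trailing text
        node.getparent().replace(node, new_node)
    return parsed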

After which autologin worked on the above sites.
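For anyone who wants to try the same idea outside of `html.load_html`, here is a standalone sketch. The function name `buttons_to_inputs` is hypothetical, and two details go beyond what I actually tested with autologin: it fills in `type="submit"` when the button has no type (that is the HTML default for buttons, whereas an input with no type is a text field), and it copies the button label into `value`. Treat both as assumptions.

```python
import lxml.html


def buttons_to_inputs(tree):
    """Replace each <button> in `tree` with an equivalent <input>, in place."""
    for button in tree.xpath('//button'):
        parent = button.getparent()
        if parent is None:  # a bare <button> root cannot be swapped out
            continue
        new_input = lxml.html.Element('input')
        for key, value in button.items():
            new_input.set(key, value)
        # Assumption: mirror the HTML default, where a typeless <button> submits.
        if new_input.get('type') is None:
            new_input.set('type', 'submit')
        # Assumption: keep the visible label, which is otherwise lost with the children.
        if new_input.get('value') is None:
            new_input.set('value', button.text_content().strip())
        new_input.tail = button.tail  # keep any text that followed the button
        parent.replace(button, new_input)
    return tree


# Hypothetical form used only to exercise the function.
form = lxml.html.fromstring(
    '<form action="/login"><input name="user"><button name="go">Log in</button></form>'
)
buttons_to_inputs(form)
submit = form.xpath('//input[@name="go"]')[0]
print(type(submit).__name__, submit.name, submit.type, submit.value)
```

This is still a workaround rather than a fix: the tree no longer matches the original page, so anything that relies on the button markup itself will see an input instead.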

Thanks,

Tom
