JSON LD parsing fails when multiple strings given to FileSet.includes #698

Open
brendon-boldt opened this issue Jun 12, 2024 · 1 comment · May be fixed by #708

Comments

@brendon-boldt

FileSet(..., includes=["xyz"]) works, but FileSet(..., includes=["xyz", "wxy"]) does not: loading the resulting JSON-LD fails with:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "REDACTED/util/croissant.py", line 154, in <module>
    test()
  File "REDACTED/util/croissant.py", line 141, in test
    dataset = mlc.Dataset(jsonld="croissant.json")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 6, in __init__
  File "REDACTED/mlcroissant/_src/datasets.py", line 70, in __post_init__
    self.metadata = Metadata.from_file(ctx=ctx, file=self.jsonld)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACTED/mlcroissant/_src/structure_graph/nodes/metadata.py", line 429, in from_file
    return cls.from_json(ctx=ctx, json_=json_)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACTED/mlcroissant/_src/structure_graph/nodes/metadata.py", line 439, in from_json
    jsonld = expand_jsonld(json_, ctx=ctx)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACTED/mlcroissant/_src/core/json_ld.py", line 231, in expand_jsonld
    recursively_populate_jsonld(entry_node, id_to_node)
  File "REDACTED/mlcroissant/_src/core/json_ld.py", line 171, in recursively_populate_jsonld
    value = [recursively_populate_jsonld(child, id_to_node) for child in value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACTED/mlcroissant/_src/core/json_ld.py", line 161, in recursively_populate_jsonld
    return recursively_populate_jsonld(entry_node, id_to_node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACTED/mlcroissant/_src/core/json_ld.py", line 171, in recursively_populate_jsonld
    value = [recursively_populate_jsonld(child, id_to_node) for child in value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "REDACTED/mlcroissant/_src/core/json_ld.py", line 164, in recursively_populate_jsonld
    for key, value in entry_node.copy().items():
                      ^^^^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'copy'

The cause is presumably this line in json_ld.py:

value = [recursively_populate_jsonld(child, id_to_node) for child in value]

This line recurses into the array of strings and calls recursively_populate_jsonld on str values, even though the function appears to be intended for dicts only. One simple fix would be to return early from recursively_populate_jsonld if entry_node is not a dict, but that may just mask a deeper problem.
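
To make the early-exit idea concrete, here is a rough sketch. It is not the mlcroissant implementation: `populate` is a hypothetical, simplified stand-in for recursively_populate_jsonld, and the reference-resolution logic and the sample node shapes are assumptions made only for illustration. The only point is the isinstance guard at the top.

```python
from typing import Any


def populate(node: Any, id_to_node: dict[str, dict]) -> Any:
    """Hypothetical stand-in for recursively_populate_jsonld (not the library code)."""
    # Early exit: non-dict values, such as the strings in `includes`,
    # are returned unchanged instead of being treated as nodes.
    if not isinstance(node, dict):
        return node
    # Assumption: a bare {"@id": ...} reference is replaced by the full node.
    # This sketch assumes the references are acyclic.
    if set(node) == {"@id"} and node["@id"] in id_to_node:
        node = id_to_node[node["@id"]]
    result = {}
    for key, value in node.items():
        if isinstance(value, list):
            # Each child may be a dict or a plain string; the guard handles both.
            result[key] = [populate(child, id_to_node) for child in value]
        elif isinstance(value, dict):
            result[key] = populate(value, id_to_node)
        else:
            result[key] = value
    return result


# Made-up nodes shaped loosely like a FileSet with two include patterns.
file_set = {"@id": "fs", "includes": ["xyz", "wxy"], "containedIn": {"@id": "repo"}}
id_to_node = {"repo": {"@id": "repo", "name": "repo"}}
populated = populate(file_set, id_to_node)
```

With the guard in place the list of strings passes through untouched; without it, the first string would hit the same `'str' object has no attribute 'copy'` style of failure as in the traceback.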

If you want me to open a PR or something, let me know. Thanks.

@marcenacp
Contributor

Hi @brendon-boldt, nice catch! Can you please open a PR? That'd be super useful.
