Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add 'notes' to Request #97

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

add 'notes' to Request #97

wants to merge 2 commits into from

Conversation

BurnzZ
Copy link
Member

@BurnzZ BurnzZ commented Jun 20, 2024

A proposal to add some way of putting some kind of information to the Request.

One use case is adding some identifiers to a given request. For example, in https://github.com/zytedata/zyte-spider-templates/blob/main/zyte_spider_templates/pages/product_navigation_heuristics.py#L72, a current workaround would be to add some kind of url prefix to request.name.

It's not pretty but it does the job. However, it'd be hard to properly identify some information here.

It'd be nice if we have some dedicated field to put such arbitrary values. This would be quite useful when developers create page objects with some added notes to the Requests.

@BurnzZ BurnzZ requested review from kmike, wRAR and Gallaecio June 20, 2024 12:11
@Gallaecio
Copy link
Contributor

Gallaecio commented Jun 20, 2024

Something like Request.meta: Dict[str, Any]? 🙂

I wonder if, to make it be in line with the rest of the schema, it should be called metadata or additionalMetadata, and be in the same format as additionalAttributes.

@BurnzZ
Copy link
Member Author

BurnzZ commented Jun 21, 2024

Something like Request.meta: Dict[str, Any]? 🙂

I initially implemented it using a dict but quickly realized I don't have something to put in as values. 😅 However, I do see the point in having dict here since:

  • Faster to find something because of the keys.
  • More convenient to structure the data. For example, {"id": 312, "type": "heuristic", "type": "productNavigation"}

Updated this in a92ca86.

I wonder if, to make it be in line with the rest of the schema, it should be called metadata or additionalMetadata, and be in the same format as additionalAttributes.

Not quite sure yet with this one. I think I'm slightly leaning towards it not having to conform with the rest of the schema just to give the idea that values from this field doesn't come from Zyte API Extraction. I was thinking of calling it user_notes but that's too long.

@wRAR
Copy link
Member

wRAR commented Jun 21, 2024

I like the dict type better, I'm not sure if I like the "notes" name but I don't think "*metadata" is better.

@kmike
Copy link
Contributor

kmike commented Jun 21, 2024

+1 to using dicts.

I think it looks more like metadata, not "notes", so +1 to something along these lines.

ProbabilityRequest already have metadata attribute though. Having both "meta" and "metadata" sounds weird. For me it feels more natural to put metadata into the existing metadata attribute.

But how to do it? I think we can consider extending BaseMetadata to include this dict; it'd be something like request.metadata.non_standard or request.metadata.custom. This is a big change, because this "non_standard" metadata is going to be available for all other item types as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants