Deprecate schematics (#399)
* Deprecate schematics

* format

---------

Co-authored-by: mauricio.barg <>
mrwbarg committed Jun 29, 2023
1 parent 44d5316 commit 2c487b7
Showing 20 changed files with 72 additions and 1,252 deletions.
59 changes: 36 additions & 23 deletions docs/source/getting-started.rst
@@ -315,12 +315,8 @@ Item validation
---------------

Item validators allows you to match your returned items with predetermined structure
ensuring that all fields contains data in the expected format. Spidermon allows
you to choose between schematics_ or `JSON Schema`_ to define the structure
of your item.

In this tutorial, we will use a schematics_ model to make sure that all required
fields are populated and they are all of the correct format.
ensuring that all fields contain data in the expected format. Spidermon supports `JSON Schema`_
to define the structure of your item.

First step is to change our actual spider code to use `Scrapy items`_. Create a
new file called `items.py`:
@@ -367,25 +363,43 @@ And then modify the spider code to use the newly defined item:
)
)
Now we need to create our schematics model in `validators.py` file that will contain
Now we need to create our jsonschema model in the `schemas/quote_item.json` file that will contain
all the validation rules:

.. _quote-item-validation-schema:

.. code-block:: python

    # tutorial/validators.py
    from schematics.models import Model
    from schematics.types import URLType, StringType, ListType


    class QuoteItem(Model):
        quote = StringType(required=True)
        author = StringType(required=True)
        author_url = URLType(required=True)
        tags = ListType(StringType)
.. code-block:: json

    {
        "$schema": "http://json-schema.org/draft-07/schema",
        "type": "object",
        "properties": {
            "quote": {
                "type": "string"
            },
            "author": {
                "type": "string"
            },
            "author_url": {
                "type": "string",
                "pattern": ""
            },
            "tags": {
                "type": "array",
                "items": {
                    "type": "string"
                }
            }
        },
        "required": [
            "quote",
            "author",
            "author_url"
        ]
    }
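
To make the rules in this schema concrete, here is a stdlib-only sketch of what the `required` and `type` keywords check. It is an illustration, not how Spidermon validates items (the pipeline in this commit delegates to `JSONSchemaValidator`). The helper name `check_item`, the `PY_TYPES` mapping, and the trimmed `SCHEMA` dict are assumptions made for the example, and keywords such as `pattern` are deliberately ignored.

```python
# Hand-rolled illustration of two JSON Schema keywords: "required" and "type".
# Not Spidermon's implementation; `check_item` is a hypothetical helper.

SCHEMA = {
    "type": "object",
    "properties": {
        "quote": {"type": "string"},
        "author": {"type": "string"},
        "author_url": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["quote", "author", "author_url"],
}

# Map the JSON Schema type names used above to Python types.
PY_TYPES = {"string": str, "array": list, "object": dict}


def check_item(item, schema):
    """Return a list of human-readable problems found in `item`."""
    problems = []
    # "required": every listed field must be present.
    for name in schema.get("required", []):
        if name not in item:
            problems.append(f"{name}: missing required field")
    # "type": fields that are present must have the declared type.
    for name, rules in schema.get("properties", {}).items():
        if name not in item:
            continue
        value = item[name]
        if not isinstance(value, PY_TYPES[rules["type"]]):
            problems.append(f"{name}: expected {rules['type']}")
        elif rules["type"] == "array" and "items" in rules:
            inner_name = rules["items"]["type"]
            if not all(isinstance(v, PY_TYPES[inner_name]) for v in value):
                problems.append(f"{name}: expected items of type {inner_name}")
    return problems
```

An item with only `quote` and a mixed-type `tags` list would yield three problems under this sketch: two missing required fields and one bad list element. Note that `tags` is optional because it is absent from `required`, which mirrors the old `ListType(StringType)` field that had no `required=True`.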
To allow Spidermon to validate your items, you need to include an item pipeline and
inform the name of the model class used for validation:
specify the path to the JSON Schema file used for validation:

.. code-block:: python
@@ -394,8 +408,8 @@ inform the name of the model class used for validation:
'spidermon.contrib.scrapy.pipelines.ItemValidationPipeline': 800,
}
SPIDERMON_VALIDATION_MODELS = (
'tutorial.validators.QuoteItem',
SPIDERMON_VALIDATION_SCHEMAS = (
'./schemas/quote_item.json',
)
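
Because the setting now holds a file path rather than a dotted Python path, a typo is no longer caught by an import error. The sketch below is one way to sanity-check a schema file before a crawl, using only the standard library. `load_schema` is a hypothetical helper, not part of Spidermon, and how Spidermon itself resolves relative paths is not shown in this commit.

```python
import json
from pathlib import Path


def load_schema(path):
    """Read a JSON Schema file and return it as a dict.

    Raises FileNotFoundError if the path is wrong and
    json.JSONDecodeError (a ValueError) if the file is not valid JSON.
    """
    schema = json.loads(Path(path).read_text(encoding="utf-8"))
    # This sketch only accepts object schemas, which is what an item schema is.
    if not isinstance(schema, dict):
        raise ValueError(f"{path}: top-level JSON value must be an object")
    return schema
```

Calling `load_schema("./schemas/quote_item.json")` from the directory where the spider is started would surface a wrong path or malformed JSON immediately.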
After that, every time you run your spider you will have a new set of stats in
@@ -408,7 +422,7 @@ your spider log providing information about the results of the validations:
'spidermon/validation/fields': 400,
'spidermon/validation/items': 100,
'spidermon/validation/validators': 1,
'spidermon/validation/validators/item/schematics': True,
'spidermon/validation/validators/item/jsonschema': True,
[scrapy.core.engine] INFO: Spider closed (finished)
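
As a rough illustration of how these stats might be consumed, the sketch below summarizes a stats dict like the one in the log. `summarize_validation` is a hypothetical helper working on a plain dict, not a Spidermon monitor. The key `spidermon/validation/items/errors` does not appear in the log above and is an assumption; it is treated as zero when absent.

```python
def summarize_validation(stats):
    """Summarize validation stats taken from a finished crawl."""
    items = stats.get("spidermon/validation/items", 0)
    # Assumed key, not shown in the log above; defaults to 0 when missing.
    errors = stats.get("spidermon/validation/items/errors", 0)
    enabled = bool(
        stats.get("spidermon/validation/validators/item/jsonschema", False)
    )
    # Avoid dividing by zero when no items were validated.
    error_rate = errors / items if items else 0.0
    return {
        "jsonschema_enabled": enabled,
        "items": items,
        "error_rate": error_rate,
    }
```

After this commit the validator flag to look for is `.../item/jsonschema`; a check that still reads `.../item/schematics` would always see it as disabled.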
You can then create a new monitor that will check these new statistics and raise
@@ -473,7 +487,6 @@ The resulted item will look like this:
}
.. _`JSON Schema`: https://json-schema.org/
.. _`schematics`: https://schematics.readthedocs.io/en/latest/
.. _`Scrapy`: https://scrapy.org/
.. _`Scrapy items`: https://docs.scrapy.org/en/latest/topics/items.html
.. _`Scrapy Tutorial`: https://doc.scrapy.org/en/latest/intro/tutorial.html
7 changes: 2 additions & 5 deletions docs/source/index.rst
@@ -11,11 +11,8 @@ following features:

* It can check the output data produced by Scrapy (or other sources) and
verify it against a schema or model that defines the expected structure,
data types and value restrictions. It supports data validation based on two
external libraries:

* jsonschema: `<https://github.com/Julian/jsonschema>`_
* Schematics: `<https://github.com/schematics/schematics>`_
data types and value restrictions. It supports data validation based on
the jsonschema library (`<https://github.com/Julian/jsonschema>`_).
* It allows you to define conditions that should trigger an alert based on
Scrapy stats.
* It supports notifications via email, Slack, Telegram and Discord.
5 changes: 1 addition & 4 deletions docs/source/installation.rst
@@ -9,15 +9,12 @@ build your monitors on top of it. The library depends on jsonschema_ and

If you want to set up any notifications, additional `monitoring` dependencies will help with that.

If you want to use schematics_ validation, you probably want `validation`.

So the recommended way to install the library is by adding both:

.. code-block:: bash
pip install "spidermon[monitoring,validation]"
pip install "spidermon[monitoring]"
.. _`jsonschema`: https://pypi.org/project/jsonschema/
.. _`python-slugify`: https://pypi.org/project/python-slugify/
.. _`schematics`: https://pypi.org/project/schematics/
66 changes: 2 additions & 64 deletions docs/source/item-validation.rst
@@ -21,37 +21,8 @@ the first step is to enable the built-in item pipeline in your project settings:
subsequent pipeline changes the content of the item, ignoring the
validation already performed.

After that, you need to choose which validation library will be used. Spidermon
accepts schemas defined using schematics_ or `JSON Schema`_.

With schematics
---------------

Schematics_ is a validation library based on ORM-like models. These models include
some common data types and validators, but they can also be extended to define
custom validation rules.

.. warning::

You need to install `schematics`_ to use this feature.

.. code-block:: python

    # Usually placed in validators.py file
    from schematics.models import Model
    from schematics.types import URLType, StringType, ListType


    class QuoteItem(Model):
        quote = StringType(required=True)
        author = StringType(required=True)
        author_url = URLType(required=True)
        tags = ListType(StringType)

Check `schematics documentation`_ to learn how to define a model and how to extend the
built-in data types.

With JSON Schema
----------------
Using JSON Schema
-----------------

`JSON Schema`_ is a powerful tool for validating the structure of JSON data. You can
define which fields are required, the type assigned to each field, a regular expression
@@ -133,36 +104,6 @@ Default: ``_validation``
The name of the field added to the item when a validation error happens and
`SPIDERMON_VALIDATION_ADD_ERRORS_TO_ITEMS`_ is enabled.

SPIDERMON_VALIDATION_MODELS
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Default: ``None``

A `list` containing the `schematics models`_ that contain the definition of the items
that need to be validated.

.. code-block:: python

    # settings.py
    SPIDERMON_VALIDATION_MODELS = [
        'tutorial.validators.DummyItemModel'
    ]

If you are working on a spider that produces multiple items types, you can define it
as a `dict`:

.. code-block:: python

    # settings.py
    from tutorial.items import DummyItem, OtherItem

    SPIDERMON_VALIDATION_MODELS = {
        DummyItem: 'tutorial.validators.DummyItemModel',
        OtherItem: 'tutorial.validators.OtherItemModel',
    }

SPIDERMON_VALIDATION_SCHEMAS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -235,9 +176,6 @@ Some examples:
# checks that no errors is present in any fields
self.check_field_errors_percent()
.. _`schematics`: https://schematics.readthedocs.io/en/latest/
.. _`schematics documentation`: https://schematics.readthedocs.io/en/latest/
.. _`JSON Schema`: https://json-schema.org/
.. _`guide`: http://json-schema.org/learn/getting-started-step-by-step.html
.. _`schematics models`: https://schematics.readthedocs.io/en/latest/usage/models.html
.. _`jsonschema`: https://pypi.org/project/jsonschema/
Binary file not shown.
27 changes: 27 additions & 0 deletions examples/tutorial/tutorial/schemas/quote_item.json
@@ -0,0 +1,27 @@
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "properties": {
        "quote": {
            "type": "string"
        },
        "author": {
            "type": "string"
        },
        "author_url": {
            "type": "string",
            "pattern": ""
        },
        "tags": {
            "type": "array",
            "items": {
                "type": "string"
            }
        }
    },
    "required": [
        "quote",
        "author",
        "author_url"
    ]
}
2 changes: 1 addition & 1 deletion examples/tutorial/tutorial/settings.py
@@ -15,7 +15,7 @@
SPIDERMON_SLACK_RECIPIENTS = ["@yourself", "#yourprojectchannel"]

ITEM_PIPELINES = {"spidermon.contrib.scrapy.pipelines.ItemValidationPipeline": 800}
SPIDERMON_VALIDATION_MODELS = ("tutorial.validators.QuoteItem",)
SPIDERMON_VALIDATION_SCHEMAS = ("../schemas/quote_item.json",)

SPIDERMON_VALIDATION_ADD_ERRORS_TO_ITEMS = True

9 changes: 0 additions & 9 deletions examples/tutorial/tutorial/validators.py

This file was deleted.

1 change: 0 additions & 1 deletion requirements.txt
@@ -3,7 +3,6 @@ slack-sdk
boto
premailer
jsonschema[format]
schematics==2.1.0
python-slugify
scrapy
pytest
2 changes: 0 additions & 2 deletions setup.py
@@ -43,8 +43,6 @@
"premailer",
"sentry-sdk",
],
# Data validation
"validation": ["schematics"],
# Tools to run the tests
"tests": test_requirements,
# Tools to build and publish the documentation
16 changes: 2 additions & 14 deletions spidermon/contrib/scrapy/pipelines.py
@@ -2,12 +2,10 @@
from itemadapter import ItemAdapter

from scrapy.exceptions import DropItem, NotConfigured
from scrapy.utils.misc import load_object
from scrapy import Field, Item
from scrapy import Item

from spidermon.contrib.validation import SchematicsValidator, JSONSchemaValidator
from spidermon.contrib.validation import JSONSchemaValidator
from spidermon.contrib.validation.jsonschema.tools import get_schema_from
from schematics.models import Model

from .stats import ValidationStatsManager

@@ -59,7 +57,6 @@ def set_validators(loader, schema):

for loader, name in [
(cls._load_jsonschema_validator, "SPIDERMON_VALIDATION_SCHEMAS"),
(cls._load_schematics_validator, "SPIDERMON_VALIDATION_MODELS"),
]:
res = crawler.settings.get(name)
if not res:
@@ -100,15 +97,6 @@ def _load_jsonschema_validator(cls, schema):
)
return JSONSchemaValidator(schema)

@classmethod
def _load_schematics_validator(cls, model_path):
model_class = load_object(model_path)
if not issubclass(model_class, Model):
raise NotConfigured(
"Invalid model, models must subclass schematics.models.Model"
)
return SchematicsValidator(model_class)

def process_item(self, item, _):
validators = self.find_validators(item)
if not validators:
1 change: 0 additions & 1 deletion spidermon/contrib/validation/__init__.py
@@ -1,2 +1 @@
from .schematics.validator import SchematicsValidator
from .jsonschema.validator import JSONSchemaValidator
Empty file.
39 changes: 0 additions & 39 deletions spidermon/contrib/validation/schematics/monkeypatches.py

This file was deleted.
