Search-a-licious

πŸŠπŸ”Ž A pluggable search service for large collections of objects (like Open Food Facts)

NOTE: This is a prototype that is evolving quickly to become more generic, more robust, and much richer in functionality.

This API is currently in development. Read the Search-a-licious roadmap architecture notes to understand where we are headed.

Organization

This repository contains a Lit/JS frontend and a Python (FastAPI) backend, which this README covers.

Backend

The main file is api.py, and the schema is in models/product.py.

A CLI is available to perform common tasks.

Running the project on your machine

Note: the Makefile will align the container user id with your own uid for a smooth editing experience.

Before running the services, you need to make sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:

sudo sysctl -w vm.max_map_count=262144
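
You can check the current value with:

sysctl vm.max_map_count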

Then build the services with:

make build

Start the containers:

docker compose up -d
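
To check that the containers came up correctly, you can list them and tail the search service logs:

docker compose ps
docker compose logs -f api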

Note

You may encounter a permission error if your user is not part of the docker group, in which case you should either add it or modify the Makefile to prefix all docker and docker compose commands with sudo. Also note that the update container may crash because it is not connected to any Redis; this is expected in local development (see the --skip-updates option below).

Docker spins up:

  • Two Elasticsearch nodes
  • Elasticvue
  • The search service on port 8000
  • Redis on port 6379
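
Once the stack is up, you can smoke-test it (assuming the default ports above, plus Elasticsearch on 9200 as used later in this README):

# Elasticsearch cluster health
curl http://127.0.0.1:9200/_cluster/health
# search service home page
curl -I http://localhost:8000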

You will then need to import from a JSONL dump (see instructions below).

Development

Pre-requisites

Installing Docker
Installing Direnv

For Linux and macOS users, you can follow our tutorial to install direnv.1

Get your user id and group id by running id -u and id -g in your terminal. Add a .envrc file at the root of the project with the following content:

export USER_GID=<your_user_gid>
export USER_UID=<your_user_uid>

export CONFIG_PATH=data/config/openfoodfacts.yml
export OFF_API_URL=https://world.openfoodfacts.org
export ALLOWED_ORIGINS='http://localhost,http://127.0.0.1,https://*.openfoodfacts.org,https://*.openfoodfacts.net' 
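
If you use direnv, allow it to load the new file:

direnv allow
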
Installing Pre-commit

You can follow this tutorial to install pre-commit on your machine.

Configuring mmap

Be sure that your system mmap count is high enough for Elasticsearch to run. You can do this by running:

sudo sysctl -w vm.max_map_count=262144

To make the change permanent, add the line vm.max_map_count=262144 to the /etc/sysctl.conf file and run sudo sysctl -p to apply it. This ensures the modified value of vm.max_map_count is retained even after a system reboot; without it, the value resets to its default when the system restarts.
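
For example, on most Linux systems:

echo 'vm.max_map_count=262144' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p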

Running your local instance using Docker

Now you can run the project with docker compose up. Then, in another shell, run make tsc_watch to compile the frontend continuously. Keep both running for the next installation steps and while developing.
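
That is, in two shells:

# shell 1: run the whole stack
docker compose up

# shell 2: compile the frontend in watch mode
make tsc_watch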

Exploring Elasticsearch data

Importing data into your development environment

  • Import Taxonomies: make import-taxonomies
  • Import products (see the verification snippet after this list):
    # get some sample data
    curl https://world.openfoodfacts.org/data/exports/products.random-modulo-10000.jsonl.gz --output data/products.random-modulo-10000.jsonl.gz
    gzip -d data/products.random-modulo-10000.jsonl.gz
    # we skip updates because we are not connected to any redis
    make import-dataset filepath='products.random-modulo-10000.jsonl' args='--skip-updates'
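
After the import finishes, you can check that documents actually landed in Elasticsearch using the standard cat API (the exact index name depends on your configuration):

curl 'http://127.0.0.1:9200/_cat/indices?v'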

Pages

Now you can go to:

  • http://localhost:8000 for a simple search page that does not use Lit components, or
  • http://localhost:8000/static/off.html for the Lit components search page

To look into the data, you may use Elasticvue: go to http://127.0.0.1:8080/ and connect to the cluster at http://127.0.0.1:9200, named docker-cluster (unless you changed the environment variables).

Pre-Commit

This repo uses pre-commit (https://pre-commit.com/) to enforce code styling, etc. To use it:

pre-commit install

To run the checks without committing:

pre-commit run
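
By default, pre-commit run only checks files that are staged; to run every hook against the whole repository:

pre-commit run --all-files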

Debugging the backend app

To debug the backend app:

  • stop the API instance: docker compose stop api
  • add a pdb.set_trace() call at the point where you want to break,
  • then launch docker compose run --rm --use-aliases api uvicorn app.api:app --proxy-headers --host 0.0.0.0 --port 8000 --reload (the --use-aliases flag keeps the container reachable under the service's usual network aliases)

Running the full import (45-60 min)

To import data from the JSONL export, download the dataset in the data folder, then run:

make import-dataset filepath='products.jsonl.gz'

If you get errors, try adding more RAM (12 GB works well if you have that to spare), or slow down the indexing process by setting num_processes to 1 in the command above.
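
For instance, assuming the importer exposes this as a --num-processes argument (check the CLI help for the exact flag name):

make import-dataset filepath='products.jsonl.gz' args='--num-processes 1'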

Typical import time is 45-60 minutes.

If you want to skip updates (e.g. because you don't have Redis installed), use:

make import-dataset filepath='products.jsonl.gz' args="--skip-updates"

You should also import taxonomies:

make import-taxonomies

Using the sort script

See How to use scripts

Thank you to our sponsors!

This project has received financial support from the NGI Search (Next Generation Internet) program, funded by the πŸ‡ͺπŸ‡Ί European Commission. Thank you for supporting Open Source, Open Data, and the Commons.

NGI Search logo · European flag

Footnotes

  1. For Windows users, the .envrc is only taken into account by the make commands.