Release 0.6.0 post fixes (#290)

* some fixes on ml
* modify dockerfile
* adding Procfile
* Fix heroku deployment
* fixes
* fix docker compose
* change default port/host
* rollback readme
* add web page classification example
* add bin bash to fix gcloud deploy
* add cite
* remove task button
* disable serving local files by default
* add prediction score sampling
* some docs refinements
* fix anaconda compat & add docs
* fixed conll export
* audio overlay fix
* fix ui task deletion
* update js/css scripts
* small readme fixes
* rc0
* update conda installation readme
* change setup

Co-authored-by: nik <[email protected]>
HumanSignal · May 18, 2020 · 64b6cc8
1 parent 9661db8
commit 64b6cc8

Showing 32 changed files with 346 additions and 125 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -8,10 +8,12 @@ COPY requirements.txt /label-studio
 RUN pip install -r requirements.txt
 
 ENV PORT="8080"
-ENV collect_analytics=0
+ENV PROJECT_NAME=my_project
+
 EXPOSE ${PORT}
 
 COPY . /label-studio
 
-RUN pip install -e .
-CMD ["label-studio", "start", "my_project", "--init", "--no-browser", "--port", "8080"]
+RUN python setup.py develop
+
+CMD ["./tools/run.sh"]
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -3,4 +3,5 @@ recursive-include label_studio/static *
 include label_studio/templates/*.html
 include label_studio/utils/schema/*.json
 include label_studio/logger.json
-include label_studio/config.json
+include label_studio/config.json
+include label_studio/ml/default_configs/*
diff --git a/README.md b/README.md
@@ -61,6 +61,20 @@ pip install lxml‑4.5.0‑cp38‑cp38‑win_amd64.whl
 pip install label-studio
 ```
 
+#### Install from Anaconda
+
+```bash
+conda create --name label-studio python=3.8
+conda activate label-studio
+pip install label-studio
+```
+
+If you see any errors during installation, try rerunning it:
+
+```bash
+pip install --ignore-installed label-studio
+```
+
 #### Local development
 To run the latest Label Studio version locally without installing the package from pip:
 ```bash
@@ -75,7 +89,7 @@ python label-studio/server.py start labeling_project --init
 ## Run docker
 You can also start serving at `http://localhost:8080` by using docker:
 ```bash
-docker run --rm -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --host 0.0.0.0
+docker run --rm -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init
 ```
 
 By default, it starts a blank project in the `./my_project` directory.
@@ -85,7 +99,7 @@ By default, it starts blank project in `./my_project` directory.
 You can override the default startup command by appending:
 
 ```bash
-docker run -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --force --template image_mixedlabel --host 0.0.0.0
+docker run -p 8080:8080 -v `pwd`/my_project:/label-studio/my_project --name label-studio heartexlabs/label-studio:latest label-studio start my_project --init --force --template text_classification
 ```
 
 If you want to build a local image, run:
@@ -161,37 +175,17 @@ The list of supported use cases for data annotation. Please contribute your own
 
 ## Machine Learning Integration
 
-You can easily connect your favorite machine learning framework with Label Studio by using [Heartex SDK](https://github.com/heartexlabs/pyheartex). 
+You can easily connect your favorite machine learning framework to Label Studio with the Label Studio Machine Learning SDK. It takes two simple steps:
+1. Start your own ML backend server ([check here for detailed instructions](label_studio/ml/README.md)),
+2. Connect Label Studio to the running ML backend on the [/model](http://localhost:8080/model.html) page
 
 That gives you the opportunities to use:
-- **Pre-labeling**: Use model predictions for pre-labeling
+- **Pre-labeling**: Use model predictions for pre-labeling (e.g. use on-the-fly model predictions to create rough image segmentations for further manual refinement)
+- **Autolabeling**: Create automatic annotations
 - **Online Learning**: Simultaneously update (retrain) your model while new annotations are coming
-- **Active Learning**: Perform labeling in active learning mode
+- **Active Learning**: Perform labeling in active learning mode - select only the most complex examples
 - **Prediction Service**: Instantly create running production-ready prediction service
 
-There is a quick example tutorial on how to do that with simple image classification:
-
-1. Clone pyheartex, and start serving example image classifier ML backend at `http://localhost:9090`
-    ```bash
-    git clone https://github.com/heartexlabs/pyheartex.git
-    cd pyheartex/examples/docker
-    docker-compose up -d
-    ```
-
-2. Run Label Studio project specifying ML backend URLs:
-
-    ```bash
-    label-studio start imgcls --init --template image_classification \
-    --ml-backend-url http://localhost:9090 --ml-backend-name my_model
-    ```
-
-Once you're satisfied with pre-labeling results, you can immediately send prediction requests via REST API:
-```bash
-curl -X POST -H 'Content-Type: application/json' -d '{"image_url": "https://go.heartex.net/static/samples/sample.jpg"}' http://localhost:8080/predict
-```
-
-Feel free to play around any other models & frameworks apart from image classifiers! (see instructions [here](https://github.com/heartexlabs/pyheartex#advanced-usage))
-
 ## Label Studio for Teams, Startups, and Enterprises :office:
 
 Label Studio for Teams is our enterprise edition (cloud & on-prem), that includes a data manager, high-quality baseline models, active learning, collaborators support, and more. Please visit the [website](https://www.heartex.ai/) to learn more.
@@ -205,6 +199,22 @@ Label Studio for Teams is our enterprise edition (cloud & on-prem), that include
 | [label-studio-converter](https://github.com/heartexlabs/label-studio-converter) | Encode labels into the format of your favorite machine learning library | 
 | [label-studio-transformers](https://github.com/heartexlabs/label-studio-transformers) | Transformers library connected and configured for use with label studio | 
 
+## Citation
+
+```tex
+@misc{Label Studio,
+  title={{Label Studio}: A Swiss Army Knife of Data Labeling and Annotation Tools},
+  url={https://github.com/heartexlabs/label-studio},
+  note={Open source software available from https://github.com/heartexlabs/label-studio},
+  author={
+    Maxim Tkachenko and
+    Mikhail Malyuk and
+    Nikita Shevchenko and
+    Nikolai Liubimov},
+  year={2020},
+}
+```
+
 ## License
 
 This software is licensed under the [Apache 2.0 LICENSE](/LICENSE) © [Heartex](https://www.heartex.ai/). 2020

diff --git a/app.json b/app.json
@@ -1,7 +1,9 @@
 {
+  "name": "Label Studio",
   "description": "Multi-type data labeling, annotation and exploration tool",
   "keywords": ["data annotation", "data labeling"],
   "website": "https://labelstud.io",
   "repository": "https://github.com/heartexlabs/label-studio",
-  "logo": "https://labelstud.io/images/opossum/heartex_icon_opossum_green.svg"
+  "logo": "https://labelstud.io/images/opossum/heartex_icon_opossum_green.svg",
+  "stack": "container"
 }
diff --git a/docker-compose.yml b/docker-compose.yml
@@ -7,7 +7,7 @@ services:
     working_dir: /label-studio
     volumes:
       - ./my_project:/label-studio/my_project
-    command: "label-studio start my_project ${INIT_COMMAND} "
+    command: "label-studio start my_project ${INIT_COMMAND} --host 0.0.0.0"
     ports: 
       - "8080:8080"
     restart: always
diff --git a/docs/source/guide/ml.md b/docs/source/guide/ml.md
@@ -4,12 +4,13 @@ type: guide
 order: 906
 ---
 
-You can easily connect your favorite machine learning framework with Label Studio by using [Heartex SDK](https://github.com/heartexlabs/pyheartex). 
+You can easily connect your favorite machine learning framework to Label Studio with the Label Studio Machine Learning SDK. 
 
 That gives you the opportunities to use:
-- **Pre-labeling**: Use model predictions for pre-labeling
+- **Pre-labeling**: Use model predictions for pre-labeling (e.g. use on-the-fly model predictions to create rough image segmentations for further manual refinement)
+- **Autolabeling**: Create automatic annotations
 - **Online Learning**: Simultaneously update (retrain) your model while new annotations are coming
-- **Active Learning**: Perform labeling in active learning mode
+- **Active Learning**: Perform labeling in active learning mode - select only the most complex examples
 - **Prediction Service**: Instantly create running production-ready prediction service
 
 
@@ -21,28 +22,37 @@ That gives you the opportunities to use:
 
 ## Quickstart
 
-Here is a quick example tutorial on how to do that with simple text classification:
+Here is a quick example tutorial on how to run the ML backend with a simple text classifier:
 
 0. Clone repo
    ```bash
    git clone https://github.com/heartexlabs/label-studio  
    ```
 
-1. Create new ML backend
+1. Set up the environment
+   ```bash
+   cd label-studio
+   pip install -e .
+   cd label_studio/ml/examples
+   pip install -r requirements.txt
+   ```
+
+2. Create new ML backend
    ```bash
    label-studio-ml init my_ml_backend --script label-studio/ml/examples/simple_text_classifier.py
    ```
 
-2. Start ML backend server
+3. Start ML backend server
    ```bash
    label-studio-ml start my_ml_backend
    ```
 
-3. Run Label Studio connecting it to the running ML backend:
+4. Run Label Studio connecting it to the running ML backend:
     ```bash
     label-studio start text_classification_project --init --template text_sentiment --ml-backend-url http://localhost:9090
     ```
 
+
 ## Create your own ML backend
 
 Check examples in `label-studio/ml/examples` directory.
diff --git a/docs/source/guide/tasks.md b/docs/source/guide/tasks.md
@@ -78,6 +78,7 @@ Here is an example of a config and tasks list composed of one element, for text
         "choices": ["Neutral"]
       }
     }],
+  # score is used for active learning sampling mode
     "score": 0.95
   }]
 }]
@@ -146,27 +147,31 @@ You can split your input data into several plain text files, and specify the dir
 ### Directory with image files
 
 ```bash
-label-studio init --input-path=dir/with/images --input-format=image-dir --label-config=config.xml
+label-studio init --input-path=dir/with/images --input-format=image-dir --label-config=config.xml --allow-serving-local-files
 ```
 
+> WARNING: "--allow-serving-local-files" is intended only for locally running instances; avoid using it on remote servers unless you are sure of what you're doing.
+
 You can point to a local directory, which is scanned recursively for image files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:
 
 ```
-http://<host:port>/static/filename?d=<path/to/the/local/directory>
+http://<host:port>/data/filename?d=<path/to/the/local/directory>
 ```
 
 Supported formats are: `.png` `.jpg` `.jpeg` `.tiff` `.bmp` `.gif`
 
 ### Directory with audio files
 
 ```bash
-label-studio init --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml
+label-studio init --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files
 ```
 
+> WARNING: "--allow-serving-local-files" is intended only for locally running instances; avoid using it on remote servers unless you are sure of what you're doing.
+
 You can point to a local directory, which is scanned recursively for audio files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:
 
 ```
-http://<host:port>/static/filename?d=<path/to/the/local/directory>
+http://<host:port>/data/filename?d=<path/to/the/local/directory>
 ```
 
 Supported formats are: `.wav` `.aiff` `.mp3` `.au` `.flac`
@@ -180,3 +185,23 @@ Use API to import tasks in [Label Studio basic format](tasks.html#Basic-format)
 curl -X POST -H Content-Type:application/json http://localhost:8080/api/import \
 --data "[{\"my_key\": \"my_value_1\"}, {\"my_key\": \"my_value_2\"}]"
 ```
+
+## Sampling
+
+You can control the order in which imported tasks are presented to annotators. Several options are available. To enable one of them, pass `--sampling=<option>` as a command-line option.
+
+#### sequential
+
+Tasks are ordered by their `"id"` fields in ascending order. This is the default mode.
+
+#### uniform
+
+Tasks are sampled with equal probabilities.
+
+#### prediction-score-min
+
+The task with the minimum average prediction score is taken. When this option is set, each task should include a `task["predictions"]` list, and each prediction in it should have a `"score"` field.
+
+#### prediction-score-max
+
+The task with the maximum average prediction score is taken. When this option is set, each task should include a `task["predictions"]` list, and each prediction in it should have a `"score"` field.
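
Reviewer note: the four sampling modes documented above can be illustrated with a minimal Python sketch. This is not Label Studio's actual implementation; the function names `average_score` and `pick_next_task` and the in-memory task list are assumptions made for illustration only. The sketch also assumes every task carries at least one prediction when a score-based mode is used.

```python
import random
from statistics import mean


def average_score(task):
    """Average of the "score" fields in task["predictions"] (assumed non-empty)."""
    return mean(p["score"] for p in task["predictions"])


def pick_next_task(tasks, sampling="sequential", rng=random):
    """Hypothetical helper: choose the next task according to the sampling mode."""
    if not tasks:
        return None
    if sampling == "sequential":
        # Lowest "id" first
        return min(tasks, key=lambda t: t["id"])
    if sampling == "uniform":
        # Every task has the same probability of being chosen
        return rng.choice(tasks)
    if sampling == "prediction-score-min":
        return min(tasks, key=average_score)
    if sampling == "prediction-score-max":
        return max(tasks, key=average_score)
    raise ValueError(f"Unknown sampling mode: {sampling}")


tasks = [
    {"id": 3, "predictions": [{"score": 0.95}]},
    {"id": 1, "predictions": [{"score": 0.40}, {"score": 0.60}]},
    {"id": 2, "predictions": [{"score": 0.10}]},
]

print(pick_next_task(tasks, "sequential")["id"])
print(pick_next_task(tasks, "prediction-score-min")["id"])
print(pick_next_task(tasks, "prediction-score-max")["id"])
```

With `prediction-score-min`, the task the model is least confident about comes first, which is the usual choice for active learning.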
diff --git a/heroku.yml b/heroku.yml
@@ -1,6 +1,5 @@
 build:
   docker:
     web: Dockerfile
-
 run:
-  web: "/app/scripts/run-demo.sh image_bbox"
+  web: ./tools/run.sh
diff --git a/label_studio/examples/html_classification/config.xml b/label_studio/examples/html_classification/config.xml
@@ -0,0 +1,17 @@
+<!-- {"title": "Web page classification", "category": "html", "complexity": "basic", "order": "!"} -->
+<View>
+  <Choices name="toxicity" toName="web_page" choice="multiple" showInline="true">
+    <Choice value="Toxic" background="red"/>
+    <Choice value="Severe Toxic" background="brown"/>
+    <Choice value="Obscene" background="green"/>
+    <Choice value="Threat" background="blue"/>
+    <Choice value="Insult" background="orange"/>
+    <Choice value="Identity Hate" background="grey"/>
+  </Choices>
+
+  <View style="border: 1px solid #CCC;
+               border-radius: 10px;
+               padding: 5px">
+    <HyperText name="web_page" value="$text"/>
+  </View>
+</View>
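
Reviewer note: the config above reads its HTML from the `$text` variable, so each task is expected to provide a `text` key. The following Python sketch, which uses only the standard library, shows how the choice labels and the required data key could be read from such a config. The shortened config string and the sample task body are assumptions for illustration, and this is not how Label Studio itself parses configs.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the labeling config, with only two of the choices kept
CONFIG = """<View>
  <Choices name="toxicity" toName="web_page" choice="multiple" showInline="true">
    <Choice value="Toxic" background="red"/>
    <Choice value="Threat" background="blue"/>
  </Choices>
  <View>
    <HyperText name="web_page" value="$text"/>
  </View>
</View>"""

root = ET.fromstring(CONFIG)

# Label values offered to the annotator
labels = [choice.get("value") for choice in root.iter("Choice")]

# "$text" means the task must carry a "text" key; strip the leading "$"
data_keys = [tag.get("value").lstrip("$") for tag in root.iter("HyperText")]

# A hypothetical task matching this config
task = {key: "<p>Example page body</p>" for key in data_keys}

print(labels)
print(task)
```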
diff --git a/label_studio/ml/README.md b/label_studio/ml/README.md
@@ -1,23 +1,31 @@
 ## Quickstart
 
-Here is a quick example tutorial on how to do that with simple text classification:
+Here is a quick example tutorial on how to run the ML backend with a simple text classifier:
 
 0. Clone repo
    ```bash
    git clone https://github.com/heartexlabs/label-studio  
    ```
 
-1. Create new ML backend
+1. Set up the environment
+   ```bash
+   cd label-studio
+   pip install -e .
+   cd label_studio/ml/examples
+   pip install -r requirements.txt
+   ```
+
+2. Create new ML backend
    ```bash
    label-studio-ml init my_ml_backend --script label-studio/ml/examples/simple_text_classifier.py
    ```
 
-2. Start ML backend server
+3. Start ML backend server
    ```bash
    label-studio-ml start my_ml_backend
    ```
 
-3. Run Label Studio connecting it to the running ML backend:
+4. Run Label Studio connecting it to the running ML backend:
     ```bash
     label-studio start text_classification_project --init --template text_sentiment --ml-backend-url http://localhost:9090
     ```

diff --git a/label_studio/ml/default_configs/_wsgi.py.tmpl b/label_studio/ml/default_configs/_wsgi.py.tmpl
@@ -9,8 +9,11 @@ from {script} import {model_class}
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(description='Label studio')
     parser.add_argument(
-        '--port', dest='port', type=int, default=9090,
+        '-p', '--port', dest='port', type=int, default=9090,
         help='Server port')
+    parser.add_argument(
+        '--host', dest='host', type=str, default='0.0.0.0',
+        help='Server host')
     parser.add_argument(
         '--kwargs', dest='kwargs', metavar='KEY=VAL', nargs='+', type=lambda kv: kv.split('='),
         help='Additional LabelStudioMLBase model initialization kwargs')
@@ -70,4 +73,4 @@ if __name__ == "__main__":
         **kwargs
     )
 
-    app.run(host='localhost', port=args.port, debug=args.debug)
+    app.run(host=args.host, port=args.port, debug=args.debug)
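
Reviewer note: the effect of the new `--host` option and the `-p` short flag can be checked in isolation with the sketch below. It reproduces only the two arguments from the template; the `build_parser` helper is a name introduced here for illustration and does not exist in the template.

```python
import argparse


def build_parser():
    """Hypothetical helper holding the two options touched by this change."""
    parser = argparse.ArgumentParser(description="Label studio")
    parser.add_argument(
        "-p", "--port", dest="port", type=int, default=9090,
        help="Server port")
    parser.add_argument(
        "--host", dest="host", type=str, default="0.0.0.0",
        help="Server host")
    return parser


# Defaults: listen on all interfaces, port 9090
defaults = build_parser().parse_args([])
print(defaults.host, defaults.port)

# Restrict to the loopback interface and use the short port flag
custom = build_parser().parse_args(["--host", "127.0.0.1", "-p", "9091"])
print(custom.host, custom.port)
```

Because the default host changes from `localhost` to `0.0.0.0`, the server becomes reachable from other machines unless `--host` is set to a loopback address.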