
Interpretable Evaluation for NER

This is the implementation of the interpretable evaluation for NER described in our paper:

"Interpretable Multi-dataset Evaluation for Named Entity Recognition"

Advantages of this code

  • Our code can automatically generate figures (with the corresponding LaTeX code) and web pages.
  • Attributes are easy to add or delete by simply modifying conf.ner-attributes.
  • The bucketing strategy for a specific attribute is easy to change by modifying its definition in conf.ner-attributes (see the conceptual sketch below).
  • The code is easy to extend to other sequence labeling tasks: only a few parameters in run_task_ner.sh need to be modified, such as task_type and path_attribute_conf (adding or deleting attributes may also be necessary).
  • It can help us quickly analyze and diagnose the strengths and weaknesses of a model.
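
To make the idea of attribute bucketing concrete, here is a minimal, repository-independent Python sketch (not the repository's own code; the attribute, the interval boundaries, and the entity data are purely illustrative). It groups test entities into buckets by entity length, which is the kind of grouping a bucketing strategy in conf.ner-attributes defines:

    from collections import defaultdict

    # Illustrative interval boundaries for an "entity length" attribute;
    # the real boundaries are defined in conf.ner-attributes.
    BUCKETS = [(1, 1), (2, 2), (3, 4), (5, float("inf"))]

    def bucket_of(length):
        """Return the index of the interval containing `length`."""
        for i, (lo, hi) in enumerate(BUCKETS):
            if lo <= length <= hi:
                return i
        raise ValueError(f"no bucket for length {length}")

    def bucketize(entities):
        """Group (words, label) pairs by the length bucket of their span."""
        groups = defaultdict(list)
        for words, label in entities:
            groups[bucket_of(len(words))].append((words, label))
        return groups

    # Hypothetical entities for demonstration.
    entities = [(["EU"], "ORG"), (["New", "York"], "LOC"),
                (["War", "and", "Peace"], "MISC")]
    for i, group in sorted(bucketize(entities).items()):
        print(f"bucket {BUCKETS[i]}: {len(group)} entities")

Per-bucket metrics (e.g., F1) are then computed within each group, which is what the break-down analysis reports.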

Requirements

  • python3
  • texlive
  • pip3 install -r requirements.txt

Run

./run_task_ner.sh

The shell script runs the following three steps:

  • tensorEvaluation-ner.py -> computes the results that the fine-grained analysis depends on.

  • genFig.py -> draws figures that visualize the results of the fine-grained analysis.

  • genHtml.py -> embeds the figures drawn in the previous step into a web page.

After running ./run_task_ner.sh, a web page named tEval-ner.html is generated to display the figures of the fine-grained analysis.

Datasets

The datasets used in our paper include:

  • CoNLL-2003 (included in this repository)
  • WNUT-2016 (included in this repository)
  • OntoNotes 5.0 (available from the LDC)

Demo

Results

We provide analysis and diagnosis of model architectures and pre-trained knowledge on six datasets. The fine-grained analysis covers five aspects:

  • Holistic Results;
  • Break-down Performance;
  • Self-diagnosis;
  • Aided-diagnosis;
  • Heatmap.

Below, we give an example of the analysis and diagnosis of a BERT- and ELMo-based system pair on six datasets.

  1. Holistic Results

  2. Break-down Performance

Flair:

ELMo:

  3. Self-diagnosis

  4. Aided-diagnosis

  5. Heatmap

Analyzing and diagnosing your own model

  1. Put the result-files of your models in preComputed/ner/result/. At least two result-files are required, because the comparative diagnosis is based on comparing two models. If you have only one result-file, you can pair it with one of the result-files we provide (in preComputed/ner/metric/result/).

  2. Put the training set (the one your result-files were trained on) in ./data/.

  3. Modify the parameters in run_task_ner.sh to match your data, for example path_data (path of the training set), datasets[-] (dataset name), model1 (the first model's name), model2 (the second model's name), and resfiles[-] (the paths of the result-files); see the sketch below.
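
For illustration, the parameter block in run_task_ner.sh might be edited along these lines (the dataset name and result-file names below are hypothetical placeholders, not files shipped with this repository):

    path_data="./data/"                                        # path of the training set
    datasets[0]="conll03"                                      # dataset name (hypothetical)
    model1="BERT"                                              # first model's name
    model2="ELMo"                                              # second model's name
    resfiles[0]="preComputed/ner/result/ner_conll03_BERT.txt"  # hypothetical result-file
    resfiles[1]="preComputed/ner/result/ner_conll03_ELMo.txt"  # hypothetical result-file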

Note:

  • At least two result-files are required. The comparative diagnosis compares the strengths and weaknesses of two models, so at least two model results must be provided.

  • The result-file must contain three columns separated by spaces; from left to right, the columns are words, true tags, and predicted tags. If your result-file does not meet this requirement, you can set the column delimiter of your result-file (or training-set file) in tensorEvaluation-ner.py; a standalone parsing sketch follows below.
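
For reference, here is a standalone Python sketch (not part of this repository) of parsing such a result-file with a configurable delimiter; the function name and the blank-line sentence-boundary convention are assumptions:

    def read_result_file(path, delimiter=" "):
        """Yield (word, true_tag, pred_tag) triples from a three-column file.

        Blank lines, conventionally used as sentence boundaries, are skipped.
        """
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue  # sentence boundary
                word, true_tag, pred_tag = line.split(delimiter)
                yield word, true_tag, pred_tag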

Here is an example of the result-file format:

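The excerpt below is a hypothetical illustration (columns: word, true tag, predicted tag, with CoNLL-style BIO labels and one illustrative prediction error):

    EU B-ORG B-ORG
    rejects O O
    German B-MISC B-ORG
    call O O
    to O O
    boycott O O
    British B-MISC B-MISC
    lamb O O
    . O O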
