The implementation of interpretable evaluation for NER in our paper:
"Interpretable Multi-dataset Evaluation for Named Entity Recognition"
- Our code can automatically generate figures (with LaTeX code) and web pages.
- It is easy to delete or add attributes by simply modifying `conf.ner-attributes`.
- It is easy to change the bucketing strategy for a specific attribute by modifying the strategy defined in `conf.ner-attributes` (a minimal sketch of the bucketing idea follows this list).
- It is easy to extend this code to other sequence labeling tasks: only a few parameters in `run_task_ner.sh` need to be modified, such as `task_type` and `path_attribute_conf`. (Adding or deleting attributes may also be necessary.)
- It can help us quickly analyze and diagnose the strengths and weaknesses of a model.
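To make the attribute/bucketing idea concrete, here is a minimal, hypothetical Python sketch (this is not the actual `conf.ner-attributes` format): entities are grouped into buckets by an attribute value, and performance is then reported per bucket.

```python
# Hypothetical illustration of attribute-based bucketing (NOT the
# repository's conf.ner-attributes format): entities are grouped into
# buckets by an attribute value, and performance is reported per bucket.
from collections import defaultdict

# Each entry: (attribute value, was the entity predicted correctly?).
# The attribute here is entity length in tokens -- one of many possibilities.
entities = [(1, True), (1, False), (2, True), (3, True), (5, False)]

# A bucketing strategy expressed as half-open intervals [lo, hi).
buckets = [(1, 2), (2, 4), (4, 100)]

grouped = defaultdict(list)
for value, correct in entities:
    for lo, hi in buckets:
        if lo <= value < hi:
            grouped[(lo, hi)].append(correct)
            break

for (lo, hi), results in sorted(grouped.items()):
    accuracy = sum(results) / len(results)
    print(f"entity length in [{lo}, {hi}): accuracy = {accuracy:.2f}")
```

Changing the bucketing strategy then amounts to changing the interval boundaries (or replacing intervals with another grouping rule) for that attribute.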
Requirements:
- `python3`
- `texlive`

Install the dependencies with `pip3 install -r requirements.txt`, then run `./run_task_ner.sh`.
The shell script runs the following three steps (sketched below):
- `tensorEvaluation-ner.py` -> calculates the intermediate results for the fine-grained analysis.
- `genFig.py` -> draws figures that show the results of the fine-grained analysis.
- `genHtml.py` -> puts the figures drawn in the previous step into the web page.
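For orientation, the order of these steps can be pictured as the following minimal Python driver. This is a sketch only; the real orchestration, including all arguments, lives in `run_task_ner.sh`.

```python
# A sketch of the pipeline order only; the real script passes task-specific
# arguments (e.g., task_type, path_attribute_conf) that are omitted here.
import subprocess

for script in ("tensorEvaluation-ner.py",  # compute fine-grained analysis results
               "genFig.py",                # draw figures (with LaTeX code)
               "genHtml.py"):              # assemble the figures into the web page
    subprocess.run(["python3", script], check=True)
```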
After running `./run_task_ner.sh`, a web page named `tEval-ner.html` will be generated, displaying the figures for the fine-grained analysis.
The datasets utilized in our paper include:
- CoNLL-2003 (in this repository)
- WNUT-2016 (in this repository)
- OntoNotes 5.0 (available from the LDC)
We provide analysis and diagnosis of model architectures and pre-trained knowledge on six datasets; the fine-grained analysis covers five aspects:
- Holistic Results;
- Break-down Performance;
- Self-diagnosis;
- Aided-diagnosis;
- Heatmap.
Below, we give an example of pairwise analysis and diagnosis of BERT- and ELMo-based systems on the six datasets.
(Figures: Holistic Results; Break-down Performance for Flair and ELMo; Self-diagnosis; Aided-diagnosis; Heatmap.)
To run the analysis on your own models:

- Put the result files of your models in `preComputed/ner/result/`. At least two result files are required, because the comparative diagnosis is based on comparing two models. If you have only one result file, you can choose one of the result files provided by us (in `preComputed/ner/metric/result/`).
- Put the train set (the one your result file was trained on) in `./data/`.
- Modify the parameters in `run_task_ner.sh` to fit your data, such as `path_data` (path of the training set), `datasets[-]` (dataset name), `model1` (the first model's name), `model2` (the second model's name), and `resfiles[-]` (the paths of the result files).
Notes:

- At least two result files are required: the comparative diagnosis compares the strengths and weaknesses of two models, so at least two models' results must be supplied.
- The result file must contain three columns separated by spaces; from left to right, the columns are words, true tags, and predicted tags. If your result file does not meet this requirement, you can set the column delimiter of your result file (or train-set file) in `tensorEvaluation-ner.py` (a reader for this format is sketched below).
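The following is a minimal, hypothetical reader for that three-column format, assuming blank lines separate sentences as in CoNLL-style files; the repository's actual parsing (and the delimiter setting) lives in `tensorEvaluation-ner.py`.

```python
# Hypothetical reader for the three-column result format described above;
# this is an illustration, not the repository's actual parsing code.
def read_result_file(path, delimiter=" "):
    """Yield sentences as lists of (word, true_tag, pred_tag) triples."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # a blank line marks a sentence boundary
                if sentence:
                    yield sentence
                    sentence = []
                continue
            word, true_tag, pred_tag = line.split(delimiter)
            sentence.append((word, true_tag, pred_tag))
    if sentence:  # flush the last sentence if no trailing blank line
        yield sentence
```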
Here, we give an example of the result-file format:
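The tokens and tags below are illustrative; the third column is the model's prediction and may disagree with the gold tag, as on `JAPAN` here:

```
SOCCER O O
- O O
JAPAN B-LOC B-ORG
GET O O
LUCKY O O
WIN O O
```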