Aib improve accessibility #137

Open · wants to merge 15 commits into base: main
14 changes: 4 additions & 10 deletions docs/about.md
@@ -1,9 +1,7 @@

# Data Science in NHS England
# About the Data Science Team in NHS England

<div markdown>

![Image title](images/DS_team_photo_smaller.jpeg){ width="450" alt-tex="Picture of the Data Science team stood on some steps in London." align=right }
![Data science team photo, all stood on some stairs outdoors.](images/DS_team_photo_smaller.jpeg){ width="450" align=right}

We are the [NHS England](https://www.england.nhs.uk/) Data Science Team.

@@ -18,8 +16,6 @@ We are passionate about getting the most value out of the data collected by NHS

[Contact Us ([email protected])](mailto:[email protected]){ .md-button .md-button--primary }

</div>

## Teams

In NHSE, data scientists are concentrated in the central team but are also embedded across a number of other areas.
@@ -60,8 +56,6 @@ In NHSE data scientists are concentrated in the central team but also embedded a

</div>

<br/>

## Learn about Data Science in Healthcare

To support knowledge share of data science in healthcare we've put together a **monthly newsletter** with valuable **insights**, **training opportunities** and **events**.
@@ -83,7 +77,7 @@ We also support the [NHS Data Science Community](https://data-science-community.

## Our Members
??? "Our Members"
<table id="myTable" style="width:100%;">
<table id="myTable" >
<div class="flex flex-basis">
<select id="columnToSearch">
<option value="Name">Name</option>
@@ -93,7 +87,7 @@ We also support the [NHS Data Science Community](https://data-science-community.
</select>
<input type="text" id="myInput" onkeyup="tableFilter('myTable','myInput')" placeholder="Search...">
</div>
<tr><th style="width: 30%;">Name</th><th>Role</th><th>Team</th><th>Github</th></tr>
<tr><th>Name</th><th>Role</th><th>Team</th><th>Github</th></tr>
<tr><td>Sarah Culkin</td><td>Deputy Director</td><td>Central Data Science Team</td><td><a href="https://github.com/SCulkin-code">SCulkin-code</a></td> </tr>
<tr><td>Rupert Chaplin</td><td>Assistant Director</td><td>Central Data Science Team</td><td><a href="https://github.com/rupchap">rupchap</a></td> </tr>
<tr><td>Jonathan Hope</td><td>Data Science Lead</td><td>Central Data Science Team</td><td><a href="https://github.com/JonathanHope42">JonathanHope42</a></td> </tr>
4 changes: 2 additions & 2 deletions docs/articles/posts/20230105_rap.md
@@ -19,11 +19,11 @@ Over the past year, we’ve been going through a change programme and adopting R

<!-- more -->

![The authors Alistair Bullward and Sam Hollings.](https://digital.nhs.uk/binaries/content/gallery/website/data-points-blog/rap-blog-lead-image.jpg/rap-blog-lead-image.jpg/website%3AnewsPostImageLarge2x)
![Picture of the authors Alistair Bullward (left) and Sam Hollings(right).](https://digital.nhs.uk/binaries/content/gallery/website/data-points-blog/rap-blog-lead-image.jpg/rap-blog-lead-image.jpg/website%3AnewsPostImageLarge2x)

This is about analytics and data, but knowledge of RAP isn’t just for those cutting code day-to-day. It’s crucial that senior colleagues understand the levels and benefits of RAP and get involved in promoting this new way of working and planning how we implement it.

This improves the lives of our data analysts and the quality of our work.

[Read the whole article **HERE**](https://digital.nhs.uk/blog/data-points-blog/2023/why-were-getting-our-data-teams-to-rap){ .md-button .md-button--primary }
[Read the whole article **HERE** (opens in new tab)](https://digital.nhs.uk/blog/data-points-blog/2023/why-were-getting-our-data-teams-to-rap){ .md-button .md-button--primary }

4 changes: 2 additions & 2 deletions docs/articles/posts/20240411_privlm.md
@@ -22,8 +22,8 @@ description: >
### LMs can memorize their Training Data

<figure class="inline end" markdown>
![xkcd - Predictive Models](https://imgs.xkcd.com/comics/predictive_models.png)
<figcaption>Figure 1: <a href="https://xkcd.com/2169/">xkcd 2169 - Predictive Models</a></figcaption>
![Cartoon of a stick figure sat at a desk. The caption says "When you train predictive models on input from your users, it can leak information in unexpected ways". On the computer screen it says "Long live the revolution. Our next meeting will be at" with an autofill greyed out of "the docks at midnight on June 28". The stick figure is saying "Aha, found them!"](https://imgs.xkcd.com/comics/predictive_models.png)
<figcaption>Figure 1: <a href="https://xkcd.com/2169/">xkcd 2169 - Predictive Models (opens in new tab)</a></figcaption>
</figure>

Studies have shown that LMs can inadvertently memorise and disclose information verbatim from their training data when prompted in certain ways, a phenomenon referred to as training data leakage. This leakage can violate the privacy assumptions under which datasets were collected and can make diverse information more easily searchable.
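This memorisation effect can be illustrated with a deliberately tiny sketch (nothing like a real LM, and not part of any pipeline discussed here): a greedy trigram "autocomplete" trained on text containing one sensitive message will regurgitate it verbatim when given a common prefix, just as in the xkcd strip above.

```python
from collections import defaultdict

def train_trigram_model(corpus):
    """Count (word, word) -> next-word transitions in a toy corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for a, b, c in zip(words, words[1:], words[2:]):
        counts[(a, b)][c] += 1
    return counts

def autocomplete(model, prompt, max_words=7):
    """Greedily extend the prompt with the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = model.get((words[-2], words[-1]))
        if not candidates:
            break
        words.append(max(candidates, key=candidates.get))
    return " ".join(words)

# Synthetic "training data" containing one sensitive sentence.
corpus = (
    "long live the revolution "
    "our next meeting will be at the docks at midnight on june 28"
)
model = train_trigram_model(corpus)

# A common prefix is enough to leak the memorised detail verbatim.
print(autocomplete(model, "our next meeting will be at"))
# → our next meeting will be at the docks at midnight on june 28
```

Real leakage from billion-parameter models is subtler, but the failure mode is the same: training text resurfacing verbatim in response to an innocuous prompt.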
16 changes: 8 additions & 8 deletions docs/articles/posts/20240807_annotation_tools.md
@@ -22,19 +22,19 @@ description: >

## Introduction

We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data called [Privacy Fingerprint](https://nhsengland.github.io/datascience/our_work/ds255_privacyfp/).
We have been building a proof-of-concept tool that scores the privacy risk of free text healthcare data called [Privacy Fingerprint (opens in new tab)](https://nhsengland.github.io/datascience/our_work/ds255_privacyfp/).

Named Entity Recognition (NER) is a particularly important part of our pipeline. It is the task of identifying, categorising and labelling specific pieces of information, known as entities, within a given piece of text. These entities can include the names of people, dates of birth, or even unique identifiers like NHS Numbers.
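As a rough illustration of the input/output shape of this task (the real pipeline uses the statistical models described below, not regular expressions), here is a self-contained sketch tagging two pattern-like entity types; the NHS Number and date are synthetic:

```python
import re

# Illustrative patterns only -- real NER models are statistical, but
# the output format is similar: (matched text, label, start, end).
PATTERNS = {
    "NHS_NUMBER": r"\b\d{3}[ -]?\d{3}[ -]?\d{4}\b",
    "DATE": r"\b\d{2}/\d{2}/\d{4}\b",
}

def toy_ner(text):
    """Return entities as (matched text, label, start, end) tuples."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in re.finditer(pattern, text):
            entities.append((match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda ent: ent[2])

note = "Patient (NHS Number 943 476 5919), born 01/02/1973, was seen today."
for ent in toy_ner(note):
    print(ent)
```

The character spans matter: they are what lets downstream tooling highlight, redact, or score each entity in place.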

As of the time of writing, there are two NER models fully integrated within the Privacy Fingerprint pipeline used to identify entities which may contribute towards a privacy risk. These are:

- [UniversalNER](https://universal-ner.github.io/): A prompted-based NER Model, where a language model has been finetuned with a conversation-style prompt to output a list containing all entities in the text corresponding to an input entity type.
- [GLiNER](https://github.com/urchade/GLiNER): A BERT-like bidirectional transformer encoder with a key benefit over UniversalNER in that it is a smaller model in terms of memory size.
- [UniversalNER (opens in new tab)](https://universal-ner.github.io/): A prompt-based NER model, where a language model has been fine-tuned with a conversation-style prompt to output a list containing all entities in the text corresponding to an input entity type.
- [GLiNER (opens in new tab)](https://github.com/urchade/GLiNER): A BERT-like bidirectional transformer encoder with a key benefit over UniversalNER in that it is a smaller model in terms of memory size.

Both NER models in our pipeline need to be fed a list of entities to extract. This is true for many NER models, although some like [Stanza](https://stanfordnlp.github.io/stanza/) from [Stanford NLP Group](https://stanfordnlp.github.io/) and [BERT](https://huggingface.co/docs/transformers/tasks/token_classification) token classifiers do not need an initial entity list for extraction. For our privacy tool to be effective, we want our list of entities to be representative of the real entities in the data, and not miss any important information.
Both NER models in our pipeline need to be fed a list of entities to extract. This is true for many NER models, although some like [Stanza (opens in new tab)](https://stanfordnlp.github.io/stanza/) from [Stanford NLP Group (opens in new tab)](https://stanfordnlp.github.io/) and [BERT (opens in new tab)](https://huggingface.co/docs/transformers/tasks/token_classification) token classifiers do not need an initial entity list for extraction. For our privacy tool to be effective, we want our list of entities to be representative of the real entities in the data, and not miss any important information.

<figure class="inline end" markdown>
![Cartoon of man trying to extract entities. He looks confused and frustrated](../../images/annotation_tools_blog/entity_extraction_cartoon.jpg)
![Cartoon of man trying to extract entities. He looks confused and frustrated. He has a speech bubble saying "Extract an entity? What does that mean?"](../../images/annotation_tools_blog/entity_extraction_cartoon.jpg)
<figcaption>Figure 1: A frustrated user trying to extract entities! </figcaption>
</figure>

@@ -73,7 +73,7 @@ There were two approaches we took to develop an annotation tool.
<figcaption>Figure 2: An example of the ipyWidgets and DisplaCy labelling application. All clinical notes are synthetic. </figcaption>
</figure>

First, we used [DisplaCy](https://spacy.io/usage/visualizers/), [ipyWidgets](https://github.com/jupyter-widgets/ipywidgets/blob/main/docs/source/examples/Index.ipynb), and a NER model of choice to generate an interactive tool that works inside Jupyter notebooks. DisplaCy is a visualiser integrated into the SpaCy library which allows you to easily visualise labels. Alongside ipyWidgets, a tool that allows you to create interactive widgets such as buttons, we created an interface which allowed a user to go through reviews and add new entities.
First, we used [DisplaCy (opens in new tab)](https://spacy.io/usage/visualizers/), [ipyWidgets (opens in new tab)](https://github.com/jupyter-widgets/ipywidgets/blob/main/docs/source/examples/Index.ipynb), and a NER model of choice to generate an interactive tool that works inside Jupyter notebooks. DisplaCy is a visualiser integrated into the SpaCy library which allows you to easily visualise labels. Alongside ipyWidgets, a tool that allows you to create interactive widgets such as buttons, we created an interface which allowed a user to go through reviews and add new entities.

One of the main advantages of this method is that everything is inside a Jupyter notebook. The entity names you want to extract come straight from the experiment parameters, so if you used this in the same notebook as the rest of your pipeline, the entity names could be updated automatically from the labelling tool. This would allow easy integration into a user workflow.

@@ -88,7 +88,7 @@ This approach was simple and resulted in a fully working example. However, highl
<figcaption>Figure 3: An example of the Streamlit labelling application. All clinical notes are synthetic. </figcaption>
</figure>

We explored a second option using [Streamlit](https://streamlit.io/). Streamlit is a python framework that allows you to build simple web apps. We can use it alongside a package called [Streamlit Annotation Tools](https://github.com/rmarquet21/streamlit-annotation-tools) to generate a more interactive user interface. As an example, a user can now use their cursor to highlight particular words and assign them an entity type which is more hands-on and engaging. Unlike our ipyWidgets example, users can select different labels to be displayed which makes the tool less cluttered, and you can easily navigate using a slider to separate reviews. Like the previous widget, there is a button which uses a NER model to label the text and give live feedback. Including this, the tool is more synergistic, easier to use and more immersive than the ipyWidgets alternative.
We explored a second option using [Streamlit (opens in new tab)](https://streamlit.io/). Streamlit is a Python framework that allows you to build simple web apps. We can use it alongside a package called [Streamlit Annotation Tools (opens in new tab)](https://github.com/rmarquet21/streamlit-annotation-tools) to generate a more interactive user interface. As an example, a user can now use their cursor to highlight particular words and assign them an entity type, which is more hands-on and engaging. Unlike our ipyWidgets example, users can select different labels to be displayed, which makes the tool less cluttered, and you can easily navigate using a slider to separate reviews. Like the previous widget, there is a button which uses a NER model to label the text and give live feedback. Taken together, these features make the tool more cohesive, easier to use and more immersive than the ipyWidgets alternative.

However, there were still a few teething issues when developing the Streamlit app. Firstly, Streamlit Annotation Tools is unable to display `\n` as a new line, and instead prints it literally, so the structure of the text is lost. This is a Streamlit issue and we haven’t yet found a way to keep the structure of the text intact. There was an easy fix in which each `\n` was replaced with two spaces (this means the start and end character count for each labelled entity remains consistent with the original structured text), but the structure of the text is still lost, which may cause issues for some future users.
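A minimal sketch of that fix, assuming the annotation view shows the raw two-character escape `\n` rather than a real line break (our reading of the behaviour above); because the replacement has the same length, every entity's start and end offsets survive:

```python
def flatten_newlines(text):
    """Replace each literal two-character '\\n' escape with two spaces.
    The lengths match, so entity start/end offsets computed against the
    original text still point at the same characters afterwards."""
    return text.replace("\\n", "  ")

# Synthetic note whose newlines appear as literal '\n' escapes.
note = "Name: John Smith\\nDOB: 01/02/1973"
flat = flatten_newlines(note)

print(len(flat) == len(note))                  # offsets are preserved
print(flat.index("DOB") == note.index("DOB"))  # same position either way
```

The trade-off described above still applies: the visual line structure is gone, even though the character arithmetic works out.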

@@ -100,4 +100,4 @@ Both labelling tools we have identified have key advantages. DisplaCy and ipyWid

Following the research and development of these two tools, we believe the ability to interactively annotate, explore and extract entities from your data greatly improves the user experience when using our privacy risk scorer pipeline.

We will publish working examples of annotation using both ipyWidgets and Streamlit, such that a future user can build on them or use them to improve their workflow. The code is available on our [github](https://github.com/nhsengland/privfp-experiments).
We will publish working examples of annotation using both ipyWidgets and Streamlit, such that a future user can build on them or use them to improve their workflow. The code is available on our [GitHub (opens in new tab)](https://github.com/nhsengland/privfp-experiments).
Binary file added docs/images/LIME-workflow.png
Binary file added docs/images/ai-deep-dive.jpg
Binary file added docs/images/ai-skunkworks.png
Binary file added docs/images/dag_job_opportunity.png
Binary file added docs/images/example_report_output.png
Binary file added docs/images/nhs-resolution.jpg
Binary file added docs/images/sas.png
Binary file added docs/images/stminsights_lowquality.png
Binary file added docs/images/vae.png
6 changes: 3 additions & 3 deletions docs/our_work/adrenal-lesions.md
@@ -7,7 +7,7 @@ tags: ['CLASSIFICATION','LESION DETECTION','COMPUTER VISION','AI']
---

<figure markdown >
![Adrenal flow of transfer](../images/Flow_of_transfer.width-800.png) </a>
![Flow of work for the adrenal lesion project. Starts with 2.5D images on the left, with an arrow to "Pre-trained deep learning model", then to "model training". This then flows into "Model for this usecase", which has two arrows to "Normal" and "Abnormal". The core of the diagram is also labelled as "2D neural network".](../images/Flow_of_transfer.width-800.png)
</figure>

Many cases of adrenal lesions, known as adrenal incidentalomas, are discovered incidentally on CT scans performed for other medical conditions. These lesions can be malignant, and so early detection is crucial for patients to receive the correct treatment and allow the public health system to target resources efficiently. Traditionally, the detection of adrenal lesions on CT scans relies on manual analysis by radiologists, which can be time-consuming and unsystematic.
@@ -47,7 +47,7 @@ Due to the intrinsic nature of CT scans (e.g., a high operating cost, limited nu

To overcome some of the disadvantages of training a 3D deep learning model, we took a 2.5D deep learning model approach in this case study. Training the model using 2.5D images enables our deep learning model to still learn from the 3D features of the CT scans, while increasing the number of training and testing data points in this study. Moreover, we can apply 2D deep learning models to the set of 2.5D images, which allows us to apply transfer learning to train our own model further based on the knowledge learned by other deep learning applications (e.g., ImageNet, and the NHS AI Lab’s National COVID-19 Chest Imaging Database).

![Adrenal flow of transfer](../images/Flow_of_transfer.width-800.png)
![Same image as at the top of the page: Flow of work for the adrenal lesion project. Starts with 2.5D images on the left, with an arrow to "Pre-trained deep learning model", then to "model training". This then flows into "Model for this usecase", which has two arrows to "Normal" and "Abnormal". The core of the diagram is also labelled as "2D neural network".](../images/Flow_of_transfer.width-800.png)
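The write-up doesn't spell out how the 2.5D images were constructed; one common construction (our assumption, not necessarily the one used in this project) stacks each axial slice with its neighbours as channels, so an ordinary 2D network, and hence ImageNet-style transfer learning, can be applied while keeping some through-plane context:

```python
import numpy as np

def make_2_5d_image(volume, index):
    """Stack an axial slice with its two neighbours as channels,
    clamping at the ends of the volume, to mimic a 3-channel image."""
    below = volume[max(index - 1, 0)]
    above = volume[min(index + 1, volume.shape[0] - 1)]
    return np.stack([below, volume[index], above], axis=-1)

# Hypothetical toy volume: 40 axial slices of 64 x 64 voxels.
ct_volume = np.random.rand(40, 64, 64)
image = make_2_5d_image(ct_volume, index=20)
print(image.shape)  # (64, 64, 3) -- same layout as an RGB photo
```

Each slice then yields one training example shaped like a colour photograph, which is exactly what pretrained 2D backbones expect.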

#### Classification of 3D CT scans

@@ -59,7 +59,7 @@ To connect the classification prediction results from the 2.5D images to the CT

To prepare the CT scans for this case study (with the region of interest focused on the adrenal glands), we also developed a manual 3D cropping tool for CT scans. This cropping was applied in all three dimensions, including a 1D cropping to select the appropriate axial slices and a 2D cropping on each axial slice. The final cropped 3D image covered the whole adrenal gland on both sides with some extra margin on each side.

![Adrenal cropping](../images/Cropping_process.width-800.png)
![Diagram of how the image cropping to focus on the adrenal glands occurs.](../images/Cropping_process.width-800.png)
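As a sketch of what such a crop amounts to in code (illustrative only; the actual tool is interactive and its interface isn't shown here), selecting an axial slice range plus a 2D window within each slice is a single 3D slicing operation:

```python
import numpy as np

def crop_3d(volume, z_range, y_range, x_range):
    """Crop a CT volume in all three dimensions: a 1D selection of
    axial slices plus a 2D window applied to every selected slice."""
    (z0, z1), (y0, y1), (x0, x1) = z_range, y_range, x_range
    return volume[z0:z1, y0:y1, x0:x1]

scan = np.zeros((100, 512, 512))  # toy volume, not real patient data
roi = crop_3d(scan, z_range=(40, 60), y_range=(200, 328), x_range=(180, 308))
print(roi.shape)  # (20, 128, 128)
```

The resulting region of interest is what the 2.5D pipeline above would then consume.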

### Outcomes and lessons learned

6 changes: 4 additions & 2 deletions docs/our_work/ai-deep-dive.md
@@ -1,12 +1,14 @@
---
title: 'AI Deep Dive'
title: 'AI Deep Dive Workshops'
summary: 'The NHS AI Lab Skunkworks team have developed and delivered a series of workshops to improve confidence working with AI.'
category: 'Playbooks'
origin: 'Skunkworks'
tags : ['AI', 'GUIDANCE', 'BEST PRACTICE']
---

# Case Study
<figure markdown >
![](../images/ai-deep-dive.jpg)
</figure>

## Info

2 changes: 1 addition & 1 deletion docs/our_work/ai-dictionary.md
@@ -6,7 +6,7 @@ origin: 'Skunkworks'
tags : ['AI', 'DICTIONARY', 'JAVASCRIPT', 'REACT']
---

[![AI Dictionary](../images/ai-dictionary.png)](https://nhsx.github.io/ai-dictionary)
[![Image of a browser showing the AI dictionary.](../images/ai-dictionary.png)](https://nhsx.github.io/ai-dictionary)

AI is full of acronyms and a common understanding of technical terms is often lacking.

2 changes: 2 additions & 0 deletions docs/our_work/ai-skunkworks.md
@@ -6,6 +6,8 @@ origin: 'Skunkworks'
tags: ['CLASSIFICATION','LESION DETECTION','AI', 'PYTHON']
---

![AI Skunkworks website homepage](../images/ai-skunkworks.png)

!!! info
Welcome to the technical website of the NHS AI Lab Skunkworks team. For our general public-facing website, please visit the [AI Skunkworks programme](https://www.nhsx.nhs.uk/ai-lab/ai-lab-programmes/skunkworks/)

2 changes: 1 addition & 1 deletion docs/our_work/ambulance-delay-predictor.md
@@ -6,7 +6,7 @@ origin: 'Skunkworks'
tags: ['AMBULANCE','PREDICTION','RANDOM FOREST', 'CLASSIFICATION', 'TIME SERIES', 'PYTHON']
---

![Ambulance Handover Delay Predictor screenshot](../images/ambulance-delay-predictor.png)
![Ambulance Handover Delay Predictor screenshot showing the handover times expected for different hospitals, with the high times highlighted in orange.](../images/ambulance-delay-predictor.png)

Ambulance Handover Delay Predictor was selected as a project in Q2 2022 following a successful pitch to the AI Skunkworks problem-sourcing programme.

2 changes: 1 addition & 1 deletion docs/our_work/bed-allocation.md
@@ -6,7 +6,7 @@ origin: 'Skunkworks'
tags: ['HOSPITAL','BAYESIAN FORECASTING','MONTE CARLO','GREEDY ALLOCATION', 'PYTHON']
---

![Bed allocation screenshot](../images/bed-allocation.png)
![Browser showing the dashboard for Kettering General Hospital, which forecasts their bed occupancy.](../images/bed-allocation.png)

Bed allocation was identified as a suitable opportunity for the AI Skunkworks programme in May 2021.

4 changes: 2 additions & 2 deletions docs/our_work/c245_synpath.md
@@ -1,12 +1,12 @@
---
title: Building the Foundations for a Generic Patient Simulator
title: Building the Foundations for a Generic Patient Simulator (SynPath)
summary: Developing an agent-based simulation for generating synthetic patient pathways and scenario modelling for healthcare specific implementations.
category: Projects
permalink: c245_synpath.html
tags: ['SYNTHETIC DATA', 'PATHWAYS','SIMULATION']
---

![Overview of data model](../images/c245fig1.png)
![](../images/c245fig1.png)
<figcaption>Figure 1: Overview of the Synpath data model</figcaption>

A data model (“Patient Agent”) was developed for fake patients to be defined in the simulation. The patient is then assigned a health record (conditions, medications, ..) with optional additional attributes.
2 changes: 1 addition & 1 deletion docs/our_work/c250_nhscorpus.md
@@ -5,7 +5,7 @@ permalink: c250_nhscorpus.html
tags: ['NLP']
---

![Ingest, Enrich, Share](../images/c250fig1.png)
![Ingest box containing the logo for scrapy and a screenshot of the NHS.uk website, Enrich box including logos for Helin, brat, and doccan, Share box including huggingface, database. Under the boxes there are the docker, SQLPad, elasticsearch and caddy logos.](../images/c250fig1.png)
<figcaption>Figure 1: Open source tools used in each functional setting</figcaption>

We aimed to explore how to build an Open, Representative, Extensible and Useful set of tools to curate, enrich and share sources of healthcare text data in an appropriate manner.