Harnessing the Potential of Pretrained Language Models and Active Learning for Tweets Sentiment Analysis

Authors:

Marc-Antoine ALLARD

Antoine MAGRON

Paul TEILETCHE

Repository of APMA-AI team's report for CS-433 project 2.

Main Results

Using our codebase

Available Pre-Trained Models

This codebase is built to be compatible with any model listed on the Hugging Face Hub. You can browse the available models on the Hub's models page.
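As an illustration of what "compatible with any listed model" can mean in practice, here is a minimal sketch that loads a tokenizer and a sequence-classification head by model name using the `transformers` Auto classes. The helper name `load_classifier` and the default of two labels are assumptions made for this example; how this codebase loads `BASE_MODEL` internally is not shown here.

```python
def load_classifier(model_name: str, num_labels: int = 2):
    """Load a tokenizer and classification model by Hub name.

    Hypothetical helper for illustration only; it is not part of this repository.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `load_classifier("distilbert-base-uncased")` downloads the weights on first use, so it needs network access and the `transformers` package installed.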

Experiments

Requirements: Install the dependencies with:
```
pip install -r requirements.txt
```
Experiment Arguments: You are free to set any of these arguments for your experiment:
1. Model & Data Arguments
- BASE_MODEL: Base model used for training.
- N: Number of instances in the dataset.
- test_ratio: Ratio of the dataset used for testing.
1. Training Arguments
- epochs: Number of training epochs.
- optimizer: The training optimizer.
- bs: Batch size used during training.
- lr: Learning rate for the training process.
- wd: Weight decay parameter.
- warm_pct: Fraction of training steps used for optimizer warm-up.
1. Active Learning Arguments
- active_learning: Boolean indicating whether active learning is enabled.
- T: A parameter related to active learning.
- aware_sampling: Boolean indicating whether aware sampling is enabled.
- aware_sampling_type: Type of aware sampling.
1. Global Arguments
- SAVE_DIR: Directory for saving the model and related files.
- DATA_PATH: Path to the dataset.
- seed: Random seed for reproducibility.
- device: Device used for training (e.g., "cuda:0" for GPU).
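
To make the relationship between some of these arguments concrete, here is a minimal sketch that groups a subset of them in a dataclass and derives the split sizes and warm-up step count. The class `ExperimentConfig`, its methods, and every default value are hypothetical placeholders, not the project's actual defaults. It also assumes that `test_ratio` is applied to `N` and that `warm_pct` is a fraction of the total number of optimizer steps.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ExperimentConfig:
    """Hypothetical container mirroring the documented argument names."""

    # Model & data arguments (placeholder values, not the project's defaults)
    BASE_MODEL: str = "distilbert-base-uncased"
    N: int = 10_000
    test_ratio: float = 0.1
    # Training arguments (placeholder values)
    epochs: int = 3
    bs: int = 32
    lr: float = 2e-5
    wd: float = 0.01
    warm_pct: float = 0.1
    # Global arguments (placeholder values)
    seed: int = 42
    device: str = "cuda:0"

    def split_sizes(self) -> Tuple[int, int]:
        """Return (train, test) sizes, assuming test_ratio applies to N."""
        n_test = int(round(self.N * self.test_ratio))
        return self.N - n_test, n_test

    def total_steps(self) -> int:
        """Optimizer steps over all epochs, one step per batch."""
        n_train, _ = self.split_sizes()
        return math.ceil(n_train / self.bs) * self.epochs

    def warmup_steps(self) -> int:
        """Steps spent warming up, assuming warm_pct is a fraction of all steps."""
        return int(self.total_steps() * self.warm_pct)


cfg = ExperimentConfig()
print(cfg.split_sizes(), cfg.total_steps(), cfg.warmup_steps())
```

With these placeholder values, 9,000 training samples at batch size 32 give 282 steps per epoch, so 846 steps in total, of which 84 are warm-up steps.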

Launch An Experiment: You can produce a sample AIcrowd test submission using our experiment.ipynb notebook.

Specify your arguments in the Parameters section. Here is a usage example:

```
# Set up an experiment with the DistilBERT model, 10,000 samples, and 3 epochs.
exp = Experiment(
    N=10_000,
    epochs=3,
    BASE_MODEL='distilbert-base-uncased'
)
# Launch the training procedure.
model = exp.finetune()

# Predict with the fine-tuned model and store the result as a CSV file,
# ready to be submitted on a platform such as AIcrowd.
predictions = exp.predict(save=True)
```

Run the Training section to fine-tune your model.
Run the Predict section to predict your test data.
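
The layout of the CSV file written during prediction is not documented here. As a rough sketch of what producing a submission file can look like, the example below assumes a header of `Id,Prediction`, ids starting at 1, and labels of `1` and `-1`. The function names `format_submission` and `save_submission` are hypothetical and are not part of this codebase.

```python
import csv
import io
from typing import Sequence


def format_submission(predictions: Sequence[int]) -> str:
    """Render predictions as CSV text with an assumed Id,Prediction header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Id", "Prediction"])
    # Ids are assumed to start at 1 and follow the order of the test data.
    for row_id, label in enumerate(predictions, start=1):
        writer.writerow([row_id, label])
    return buffer.getvalue()


def save_submission(path: str, predictions: Sequence[int]) -> None:
    """Write the formatted submission to disk."""
    with open(path, "w", newline="") as handle:
        handle.write(format_submission(predictions))
```

For example, `save_submission("submission.csv", [1, -1, 1])` writes a header row followed by three rows, one per prediction.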
