its-me-anvesh-var/NLP-text-corpora-build

NLP-text-corpora-build

This project uses two corpora:

  1. Part of the COCA corpus (Corpus of Contemporary American English), an English-language corpus

  2. The Coronavirus Corpus

The corpus contains 11,946,296 words.

I performed the following analyses:

  1. Word frequency analysis
  2. Part-of-speech tagging
  3. Chunking and chinking
  4. Word feature extraction
  5. N-grams
  6. Named Entity Recognition
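
As a rough illustration of two of the analyses listed above (word frequency and n-grams), here is a minimal sketch that uses only the Python standard library. It is not the code from this repository; the function names `tokenize`, `word_frequencies`, and `ngrams`, the tokenization rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (hypothetical rule:
    runs of letters, optionally with one internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    # Illustrative sentence, not taken from either corpus.
    sample = "The virus spread quickly and the virus mutated."
    tokens = tokenize(sample)
    print(word_frequencies(tokens).most_common(2))
    print(ngrams(tokens, 2)[:3])
```

On a real corpus the same functions could be applied file by file, with the resulting `Counter` objects added together.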

The repository is organized into the following subfolders:

*new corpus -     contains all the .txt files of the corpus
*code       -     contains all the .py files
*outputs    -     contains the outputs of the .py files
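
For anyone who wants a word count for the .txt files in the corpus folder, the following is a minimal sketch. It assumes a "word" is any whitespace-separated token and that the files are readable as UTF-8; the word count stated in this README may have been computed with a different rule. The function name `count_words` is hypothetical.

```python
from pathlib import Path


def count_words(corpus_dir):
    """Sum whitespace-separated tokens over every .txt file in corpus_dir."""
    total = 0
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # errors="ignore" skips undecodable bytes (an assumption about the files).
        with path.open(encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                total += len(line.split())
    return total


if __name__ == "__main__":
    # Folder name taken from the directory listing above.
    print(count_words("new corpus"))
```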
