Commit 65ea210 (1 parent: 984cb73): 1 changed file with 4 additions and 1 deletion.
This is probably the most complex folder in the repository, so I will try to be as detailed as possible.
This folder is organized as follows:
- If you are looking for how we extracted documentation data from GitHub, look at the `scraper` folder. The `api_scraper.py` file is the core of this folder: it contains the code that requests custom URLs from the GitHub API. The file `main.py` presents the whole process of extracting a documentation file, `scrapy.py` shows how to issue the URL requests through the `api_scraper.py` module, and `validate.py` shows how we checked whether a documentation file was valid for qualitative analysis. If you want to know how we converted the Markdown files to spreadsheets, take a look at `export.py` (note that we use cmark-gfm to convert the Markdown content to plain text; if you want to run it, you will need to build cmark-gfm on your computer). More information about all these files is given in docstrings.
- Inside the `classifier` folder you will find how we performed all the classification steps leading to the final model. The subfolders are meant to be as intuitive as possible: the `data_preparation` folder contains the code for preparing the data for classification, the `model_selection` folder covers how we selected the best estimator for our problem, the `results_report` folder contains the scripts used to report our final model, and the `classification` folder contains the code used to perform the classification. If you want to understand the whole process, I recommend starting with the `main.py` file, where I tried to split the stages of this process into clearly named methods.
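As a rough illustration of the request flow that `api_scraper.py` handles, a minimal sketch of fetching one file through the GitHub contents API could look like the following. The function names and structure here are hypothetical, not the repository's actual code:

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_contents_url(owner, repo, path):
    """Build the GitHub contents-API URL for a file in a repository.

    Hypothetical helper; the real api_scraper.py may build URLs differently.
    """
    return f"{API_ROOT}/repos/{owner}/{repo}/contents/{path}"


def fetch_file(owner, repo, path, token=None):
    """Request a documentation file's metadata and content from the GitHub API."""
    req = urllib.request.Request(build_contents_url(owner, repo, path))
    req.add_header("Accept", "application/vnd.github.v3+json")
    if token:
        # Authenticated requests get a much higher rate limit.
        req.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The contents endpoint returns the file body base64-encoded, so a real scraper would decode it before validation.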
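For readers unfamiliar with the classification stage, here is a minimal sketch of the kind of text-classification pipeline such a setup typically involves. The toy data, labels, and choice of estimator are illustrative assumptions, not the repository's actual configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus standing in for documentation paragraphs (hypothetical labels).
texts = [
    "run pip install to set up the package",
    "installation requires python 3.8 or newer",
    "download and install all the dependencies",
    "open an issue before sending a patch",
    "pull requests are welcome from everyone",
    "read the contributing guidelines first",
]
labels = ["install", "install", "install",
          "contribute", "contribute", "contribute"]

# Vectorize the text with TF-IDF, then fit a simple linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
pipeline.fit(texts, labels)

prediction = pipeline.predict(["pip install the requirements"])
print(prediction[0])
```

In practice the model-selection step would compare several estimators with cross-validation rather than fixing one up front, which is what the `model_selection` folder is for.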
Don't hesitate to contact me at [email protected] if anything is confusing: this was a one-developer job, and I know that some parts might be unclear. I did my best.