Project Description

In your project, you will develop a forecast or a pattern recognition solution based on the machine learning techniques covered in our lecture. Your first task is to define a research question, which then determines your project work. To do this, proceed as follows:

Think about what you would like to know (your own curiosity is a good advisor).
1. Which subject area and which relationships have you always wanted to analyze?
2. Which question (which includes a forecast or pattern detection) do you find particularly interesting?
3. Try to move from a general to a specific question.
4. Is the question a classification or a regression task?
Check what data you need to perform your analysis.
1. Which data sets are available for this analysis?
2. What is the quality of the data sets you can find?
3. Do you need to merge data? (data fusion)
Submit a proposal to your lecturer. In your proposal you must characterize:
1. What is your research question?
2. What data do you need to work on the research question?
3. What problems do you expect during processing?
4. What is your schedule for the project?
Work on your project tasks.
1. Consider possible processing variants, and document how you evaluated them.
2. When presenting your results, justify why you decided for or against a particular method or algorithm.
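
The data-related steps above (checking data quality and merging data sets) can be sketched as follows. This is only a minimal illustration using the Python standard library, not a prescribed workflow: the two small tables, their column names, and the helper functions `read_rows`, `missing_counts`, and `inner_join` are hypothetical and invented for this example. In a real project you would probably load your own files and might prefer a data analysis library.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
ORDERS_CSV = """customer_id,items,total
1,2,180
2,,95
3,1,
4,3,240
"""

CUSTOMERS_CSV = """customer_id,city
1,Pforzheim
2,Karlsruhe
4,Stuttgart
"""


def read_rows(text):
    """Parse CSV text into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_counts(rows):
    """Count empty cells per column as a simple data quality indicator."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts.setdefault(column, 0)
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


def inner_join(left, right, key):
    """Merge two tables on a shared key, keeping only rows found in both."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


orders = read_rows(ORDERS_CSV)
customers = read_rows(CUSTOMERS_CSV)

# Data quality: how many values are missing in each column?
print(missing_counts(orders))

# Data fusion: how many rows are lost when the tables are merged?
merged = inner_join(orders, customers, "customer_id")
print(len(merged), "of", len(orders), "rows survived the join")
```

Counting the rows before and after the merge is a cheap way to notice keys that do not match, which is one of the problems you may want to mention in your proposal.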

When defining the task, you may also find inspiration in the list of dataset repositories and datasets provided below. If you have any questions about defining your topic, please contact us via Teams. If you do not want to or cannot define a topic yourself, we also offer topics that you can choose from.

Happy exploration in finding and working on your project! 🤩

📖 Guidelines

Your project work is organized via GitHub Classroom, which means that you submit your results via GitHub. Make sure that results are not delivered by only one team member, because participation in the project is also evaluated. We can see in the commit history what was delivered by whom and in which time periods.

Your proposal is submitted via Moodle (file upload task). You are also welcome to place your proposal in GitHub. In that case, simply upload a text file with a link to your GitHub submission (commit hash) to indicate that your proposal resides in GitHub.

We recommend developing your proposal in discussion with your lecturers. A good process is to think about research questions that are interesting to your team, explore them (see the steps above), and then ask for feedback. Of course, you may also ask for feedback on several alternatives.

Information: This explanation is also put into your classroom assignment repository. You may delete or modify it there. Additionally, you may create structures (folders and files) on your own within your repository. All your code and findings must be placed inside this repository.

⏳ Deadlines

Proposal (due): Wednesday, 17.05.2023
Proposal (approval): Wednesday, 24.05.2023
Project (final submission deadline): Friday, 30.06.2023

🗄️ Dataset Repositories

Dataset Repository	Description	Additional Sources
Database of the German Federal Statistical Office	GENESIS-Online is a database containing deeply structured results of official statistics from the areas of society and environment, economy, work, and government.	More Information
Dataset Search	Search engine for datasets. It allows users to find datasets hosted in thousands of repositories across the web using a simple search term.	Help
FiveThirtyEight Database	Data behind the articles and graphics of the FiveThirtyEight website from the areas of politics, sports, science & health, economics, and culture.	---
Kaggle	Kaggle offers datasets on various topics (based on community contributions) as well as challenges. Additionally, Kaggle includes a customizable Jupyter Notebooks environment.	---
Microsoft Research Open Data	A collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain specific sciences.	About
New York Open Data	Detailed data on businesses, government, education, environment and health of New York City.	Overview
TidyTuesday	A weekly data project aimed at the R ecosystem. The intent of Tidy Tuesday is to provide a safe and supportive forum for individuals to practice their wrangling and data visualization skills independent of drawing conclusions. While we understand that the two are related, the focus of this practice is purely on building skills with real-world data.	---

🔢 Interesting Datasets 🔣

Dataset	Description	Additional Sources
Airline On-Time Performance Data	This database contains scheduled and actual departure and arrival times reported by certified U.S. air carriers that account for at least one percent of domestic scheduled passenger revenues. The data is collected by the Office of Airline Information, Bureau of Transportation Statistics (BTS).	Download here
Aircraft Wildlife Strikes	The dataset contains a record of each reported wildlife strike of a military, commercial, or civil aircraft between 1990 and 2015. Each row contains the incident date, aircraft operator, aircraft make and model, engine make and model, airport name and location, species name and quantity, and aircraft damage.	---
DIVI Intensivregister	Since April 2020, the DIVI-Intensivregister has recorded, on a daily basis, the free and occupied treatment capacities in intensive care at about 1,300 acute care hospitals in Germany. During the SARS-CoV-2 pandemic, current case numbers of COVID-19 patients treated in intensive care are also recorded. The register makes it possible, during the pandemic and beyond, to identify bottlenecks in intensive care by comparing regions and time periods. The DIVI-Intensivregister thus provides a valuable basis for response and for data-driven decision-making in real time.	Download
MaskedFace-Net	MaskedFace-Net is a dataset of human faces with a correctly or incorrectly worn mask (133,783 images).	Download Part I (19GB) Download Part 2 (19GB)
MusicNet	MusicNet is a collection of 330 freely-licensed classical music recordings, together with over 1 million annotated labels indicating the precise time of each note in every recording, the instrument that plays each note, and the note's position in the metrical structure of the composition. The labels are acquired from musical scores aligned to recordings by dynamic time warping. The labels are verified by trained musicians; a labeling error rate of 4% has been estimated. The MusicNet labels are offered to the machine learning and music communities as a resource for training models and a common benchmark for comparing results.	Download here
RKI COVID_19 Data	CSV file with the current COVID-19 infections per day (time series). The CSV file is updated daily with the current case numbers from the Robert Koch Institute.	---
Sarcasm in News	The News Headlines dataset for Sarcasm Detection is collected from two news websites. TheOnion aims at producing sarcastic versions of current events, and we collected all the headlines from its News in Brief and News in Photos categories (which are sarcastic). We collected real (and non-sarcastic) news headlines from HuffPost.	1. Trained CNN 2. Corresponding Article
Stanford Car Dataset	The Cars dataset contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images, where each class has been divided roughly 50-50. Classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe.	Corresponding Paper
World Happiness Report	The World Happiness Report is a publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness, based on respondent ratings of their own lives, which the report also correlates with various (quality of) life factors.	---

