DA_Cryptocurrency

Exploratory Analytics on Cryptocurrencies

Our Dataset:

Top 10 cryptocurrencies

https://www.kaggle.com/sudalairajkumar/cryptocurrencypricehistory

Other Similar Datasets:

For all cryptocurrencies

https://www.kaggle.com/jessevent/all-crypto-currencies

Top 100 cryptocurrencies

https://www.kaggle.com/natehenderson/top-100-cryptocurrency-historical-data/data

Expectations from the document:

Business guidelines
Target users: Who is the target consumer of your analytics? Describe how analytics is likely to help those target users.
Business benefits: The project will include 4 analytics milestones. Describe the potential business benefits of each milestone:
- Descriptive and Exploratory Analytics
- Data mining - classification
- Data mining - clustering
- Data mining - association rules

Reading Literature:

https://dealbook.nytimes.com/2014/01/21/why-bitcoin-matters/?_php=true&_type=blogs&_r=0
http://www.dummies.com/programming/big-data/phase-1-of-the-crisp-dm-process-model-business-understanding/
https://hackernoon.com/bitcoin-ethereum-blockchain-tokens-icos-why-should-anyone-care-890b868cec06
https://medium.freecodecamp.org/blockchain-is-our-first-22nd-century-technology-d4ad45fca2ce
https://www.linkedin.com/pulse/blockchain-absolute-beginners-mohit-mamoria/
https://arxiv.org/pdf/1611.03941.pdf
https://decentralize.today/5-benefits-of-cryptocurrency-a-new-economy-for-the-future-925747434103
Automated Bitcoin Trading via Machine Learning Algorithms. Technical paper by Isaac Madan, Shaurya Saluja, and Aojia Zhao, Department of Computer Science, Stanford University, Stanford, CA 94305.
Trading Bitcoin and Online Time Series Prediction. Technical paper by Muhammad J. Amjad (Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA) and Devavrat Shah (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA).
Predicting the Price of Bitcoin Using Machine Learning. MSc Research Project report, Data Analytics, by Sean McNally (x15021581), School of Computing, National College of Ireland. Supervisor: Dr. Jason Roche.

Milestone 1 - Business Understanding

Milestone 2

Data Understanding

There are two types of datasets:

Daily trading data for each cryptocurrency. The columns are Date, Low, High, Close, Open, Volume, and MarketCap. All columns except Date are numeric and continuous.
Attributes specific to a particular cryptocurrency, e.g. bitcoin_dataset. These include hash transactions, number of transactions per block, and block size. This type of data is available only for Bitcoin and Ethereum.
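As a rough illustration of the first dataset type, the sketch below parses daily trading rows into a date plus numeric values, in Python with only the standard library. It is not the project's own code. The sample rows are made up, and the date format, the thousands separators, and the "-" placeholder for a missing cell are assumptions about how the CSV is exported. `parse_number` and `load_rows` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime

# Illustrative rows only (made-up values, not taken from the Kaggle file).
# Assumed layout: a text Date plus numeric columns; thousands separators
# and "-" as a missing-value marker are assumptions about the export.
SAMPLE_CSV = '''Date,Open,High,Low,Close,Volume,MarketCap
"Jan 03, 2017",100.0,110.0,95.0,105.0,"1,200,000","16,000,000"
"Jan 02, 2017",98.0,101.0,97.0,100.0,-,"15,500,000"
'''

NUMERIC_COLUMNS = ["Open", "High", "Low", "Close", "Volume", "MarketCap"]


def parse_number(text):
    """Return a float, or None when the cell is empty or a '-' placeholder."""
    cleaned = text.replace(",", "").strip()
    if cleaned in ("", "-"):
        return None
    return float(cleaned)


def load_rows(handle):
    """Parse each CSV row into a dict with a date and float/None values."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {"Date": datetime.strptime(raw["Date"], "%b %d, %Y").date()}
        for name in NUMERIC_COLUMNS:
            row[name] = parse_number(raw[name])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
```

With a real file, `load_rows(open("bitcoin_price.csv", newline=""))` would be the equivalent call, provided the column names and formats match the assumptions above.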

- Data quality assessment
- Missing values prediction

Imputing missing data
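A data quality check of this kind can start by counting the missing cells in each column. The sketch below is a minimal Python version under the assumption that rows have already been parsed into dicts with `None` marking a missing cell. The rows are made up and `count_missing` is a hypothetical helper name.

```python
# Hypothetical parsed rows; None marks a missing cell (values are made up).
rows = [
    {"Open": 100.0, "Close": 105.0, "Volume": 1200.0},
    {"Open": 98.0, "Close": 100.0, "Volume": None},
    {"Open": 97.0, "Close": 98.0, "Volume": None},
]


def count_missing(records):
    """Count None cells per column across a list of dict rows."""
    counts = {}
    for record in records:
        for column, value in record.items():
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


missing = count_missing(rows)
print(missing)  # {'Open': 0, 'Close': 0, 'Volume': 2}
```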

Data Preparation

- Normalisation of the Bitcoin and Ethereum data

The dataset used is bitcoin_price.csv.

Although all features (attributes) except Date are numeric, the values in Volume and Market Capitalization are too large to use directly in computation. For that reason, all columns are normalized to bring them to the same scale.

To normalize the data, the following formula is used: (value - average) / (standard deviation).

http://www.statisticshowto.com/normalized/
http://www.dataminingblog.com/standardization-vs-normalization/

Because the recent large surge in prices leaves the dataset with many outliers, the z-score method is used rather than min-max scaling, (x - xmin)/(xmax - xmin).
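The two scalings can be compared on a small example. The sketch below, in Python with only the standard library, applies both formulas to made-up prices with one surge-like value at the end. Using the sample standard deviation is an assumption, since the text does not say which variant is meant. `z_scores` and `min_max` are hypothetical helper names.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize each value: (value - average) / (standard deviation)."""
    avg = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption)
    return [(v - avg) / sd for v in values]


def min_max(values):
    """Rescale each value to [0, 1]: (x - xmin) / (xmax - xmin)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up prices with one surge-like outlier at the end.
prices = [10.0, 11.0, 12.0, 13.0, 1000.0]

z = z_scores(prices)
m = min_max(prices)
```

After z-scoring, the column has mean 0 and standard deviation 1. Min-max scaling pins the outlier at 1 and squeezes the four ordinary prices into roughly the bottom 0.3% of the range.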

Only the Volume feature has missing values. There are several ways to handle missing values, e.g.:

1. Ignore the rows with missing values
2. Fill the missing values using the mean/median
3. Use regression to predict the missing values

Option 1 would also discard the useful data in the other columns of those rows. Option 2 is unsuitable because the dataset has many outliers, so a mean or median fill could give biased values. Option 3 is therefore chosen. Linear regression suits continuous targets, and since the Volume attribute is continuous, this model is used to predict the missing values.
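Regression-based imputation can be sketched as follows, in Python with only the standard library. This is an illustration under stated assumptions, not the project's implementation. The text does not say which columns serve as predictors, so a single predictor (Close) is assumed here. The rows are made up, and `fit_line` and `impute_volume` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def impute_volume(rows, predictor="Close"):
    """Return copies of rows with missing Volume filled by the fitted line."""
    # Fit only on rows where Volume is present.
    known = [r for r in rows if r["Volume"] is not None]
    slope, intercept = fit_line(
        [r[predictor] for r in known], [r["Volume"] for r in known]
    )
    filled = []
    for r in rows:
        copy = dict(r)
        if copy["Volume"] is None:
            copy["Volume"] = slope * copy[predictor] + intercept
        filled.append(copy)
    return filled


# Made-up rows where Volume equals 10 * Close on every complete row.
rows = [
    {"Close": 1.0, "Volume": 10.0},
    {"Close": 2.0, "Volume": 20.0},
    {"Close": 3.0, "Volume": None},
    {"Close": 4.0, "Volume": 40.0},
]
filled = impute_volume(rows)
```

The line is fitted only on rows with a known Volume, and the original rows are left unchanged so the imputed values can be compared against other fill strategies.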

