Predicting the Unpredictable

MS5103: Business Analytics Project

Predicting the Unpredictable: A Correlation of Cryptocurrencies, Twitter Sentiment, and Stock Markets
Project Supervisor: Dr. Murray Scott
2017-2018

Group 10:
Jennifer Jarnstrom – 17233142
Gavin Kane – 12432952
Claire Knightly – 17231278

Executive Summary

The cryptocurrency market has grown substantially in the past few years, yet it remains highly volatile. Because of this volatility, many people want to know whether there are factors that influence the cryptocurrency market or that can predict trends in it. The current state of knowledge around cryptocurrencies does not fully take into consideration the effect of stock market fluctuations and Twitter sentiment, so the relationships between these three items are poorly understood. This study uses stock market data and Twitter sentiment data to analyse the correlations, if any, with the cryptocurrency market. The study is relevant to investors and bankers, who can benefit from the knowledge gathered in this report. Such knowledge could support a more predictable market for investment or for standardising currencies.

Contents

Table of Figures
1.0 Introduction
  1.1 Background Information
  1.2 Ideation
2.0 Project Methodologies
  2.1 Socratic Questions
  2.2 Data Analytics Lifecycle Plan
3.0 Project Data Sets
  3.1 About the Data
    3.1.1 Stock Data
    3.1.2 Cryptocurrency Data
    3.1.3 Twitter Data
  3.2 Data Selection Process
  3.3 Narrowing the Data for Analysis
4.0 Analytical Tools and Techniques
  4.1 Data Cleansing
  4.2 Prediction Model Data
  4.3 Sentiment Analysis Data
5.0 Analysis of Objectives
  5.1 Open Values Analysis
    5.1.1 Stratis Open vs. Bitcoin Open
    5.1.2 Stratis Open vs. Bitcoin Prediction
    5.1.3 Ethereum Classic Open vs. GSBD Open
    5.1.4 Dash Open vs. Stratis Open
    5.1.5 Dash Open vs. Stratis Prediction
    5.1.6 Open Stratis vs. Prediction IBM
    5.1.7 Open Analysis Summary
  5.2 Prediction Model Analysis
    5.2.1 Stratis Predicted Values vs. IBM Open Values
    5.2.2 Ethereum Classic Predicted Values vs. GSBD Open Values
    5.2.3 Dash Predicted Values vs. Stratis Open Values
    5.2.4 Prediction Analysis Summary
  5.3 Sentiment Analysis
    5.3.1 Bitcoin vs. Sentiment
    5.3.2 Ethereum vs. Sentiment
    5.3.3 Ripple vs. Sentiment
    5.3.4 Other Cryptocurrencies vs. Sentiment
    5.3.5 Sentiment Analysis Summary
  5.4 Project Analysis Summary and Future Endeavours
Appendices
  A. Skills Learned
  B. Additional Analytics
    B.1 Visualisations
    B.2 Open Bitcoin vs. Prediction IBM
    B.3 Stratis Predicted Values vs. IBM Smooth Error
    B.4 Monero Predicted Values vs. Tableau Open Values
  C. Other Sentiment Findings
  D. Code Used
    D.1 GLM and Sub-Setting Variables Code
    D.2 Sentiment Analysis Code
  E. Terminology
  F. About the Tools
Citations

Table of Figures:

Figure 1: Data Analytics Life Cycle
Figure 2: Project Data Analytics Lifecycle
Figure 3: Stock Headings
Figure 4: Cryptocurrency Headings
Figure 5: Twitter Headings
Figure 6: Prediction Errors
Figure 7: VBA Analysis Prediction
Figure 8: Missing Values
Figure 9: Forecasting Model
Figure 10: Open Variables
Figure 11: Pairing Observations
Figure 12: Open Value Correlations
Figure 13: Stratis Open vs. Bitcoin Open Scatterplot
Figure 14: Stratis Open vs. Bitcoin Open Time Series Analysis
Figure 15: Stratis Open vs. Bitcoin Prediction Scatterplot
Figure 16: Stratis Open vs. Bitcoin Prediction Time Series Analysis
Figure 17: Ethereum Classic vs. GSBD Open Scatterplot
Figure 18: Ethereum Classic Open vs. GSBD Open Line
Figure 19: Ethereum Classic Open vs. GSBD Open (Highlighted)
Figure 20: Dash Open vs. Stratis Open Scatterplot
Figure 21: Dash Open vs. Stratis Open Time Series Analysis
Figure 22: Dash Open vs. Stratis Open Time Series Analysis
Figure 23: Dash Open vs. Stratis Prediction Scatterplot
Figure 24: Dash Open vs. Stratis Prediction Time Series Analysis
Figure 25: Stratis Open vs. IBM Prediction Scatterplot
Figure 26: Stratis Open vs. IBM Prediction Time Series Analysis
Figure 27: Stratis Open vs. IBM Prediction (Highlighted)
Figure 28: Prediction Value Correlations
Figure 29: Stratis Open vs. IBM Open Scatterplot
Figure 30: Stratis Prediction vs. IBM Open Time Series Analysis
Figure 31: Stratis Prediction vs. IBM Open (Highlighted)
Figure 32: Ethereum Classic Prediction vs. GSBD Open Scatterplot
Figure 33: Ethereum Classic Prediction vs. GSBD Open Time Series Analysis
Figure 34: Dash Prediction vs. Stratis Open Scatterplot
Figure 35: Dash Prediction vs. Stratis Open (Highlighted)
Figure 36: Bitcoin vs. Sentiment
Figure 37: Ethereum vs. Sentiment
Figure 38: Ripple vs. Sentiment
Figure 39: Other Cryptocurrency Correlations
Figure 40: Other Sentiment Values
Figure 41: Correlation Rankings
Figure 42: Average Stock Price Difference
Figure 43: Cryptocurrency Price Difference
Figure 44: Bitcoin Open vs. IBM Prediction Scatterplot
Figure 45: Bitcoin Open vs. IBM Prediction Time Series Analysis
Figure 46: Stratis Prediction vs. IBM Smooth Error Scatterplot
Figure 47: Stratis Prediction vs. IBM Smooth Error (Highlighted)
Figure 48: Monero Prediction vs. Tableau Smooth Error Scatterplot
Figure 49: Monero Prediction vs. Tableau Smooth Error Time Series Analysis
Figure 50: Bitcoin Daily Change vs. Sentiment
Figure 51: Bitcoin Smooth Error vs. Sentiment

1.0 Introduction

Based on background research on the cryptocurrency market, this project tries to determine whether stocks, other cryptocurrencies or Twitter sentiment influence the cryptocurrency market. This is done by comparing cryptocurrency daily open and predicted values with stock market open and predicted values. The cryptocurrency values are also compared with two sets of Twitter sentiment analysis. Statistical analysis is used to understand correlations, and qualitative research is used to investigate causation for significant correlations. The goal of our project is to use a regulated platform, an unregulated platform and an internal model to better understand the future of cryptocurrencies. These predictions, along with sentiment analysis, will shed light on factors that affect cryptocurrencies. This knowledge, in turn, could be used to create better predictions and a better understanding of the market.

1.1 Background Information

Cryptocurrencies are digital currencies that use encryption technologies to regulate the units of currency and the transfer of funds. They have no borders, so they can be transferred very easily, and they are unregulated. The stock market, by contrast, is a collection of markets and exchanges where equities are issued and traded. Stocks are not easily transferred and sometimes involve high trading fees. The stock market and the cryptocurrency market are therefore two separate entities with very different features. According to current research, there nevertheless seems to be some relationship between the two. Christopher Harvey, head of equity strategy at Wells Fargo, believes the correlation between the stock market and the cryptocurrency market has recently become stronger. Harvey believes the Bitcoin market would feel the effects of a stock market crash because of this correlation: if there were a blow to the market, investors would begin to panic and start selling Bitcoin. This suggests that investor sentiment can carry over from the stock market to the Bitcoin market (Pollock, 2018).

1.2 Ideation

To better understand the data, we separated it into three different analysis types. First, stock market data was analysed to find correlations, if any, with cryptocurrencies. Stock market data was chosen because it comes from a regulated investment platform, whereas cryptocurrencies are not regulated. Next, a cryptocurrency forecasting model was created to predict future exchange rates based on historical data. Finally, because cryptocurrency is a peer-to-peer platform whose transactions are openly viewable, Twitter sentiment analysis was conducted. The Twitter data from this analysis was used to identify what impact another open peer-to-peer platform could have on cryptocurrency exchange rates.

2.0 Project Methodologies

To complete a rigorous analysis and enhance our critical thinking, two main methodologies were used. The Socratic questioning method helped develop the foundation for critical thinking in our analysis, and the data analytics lifecycle helped plan our project and promote success.

2.1 Socratic Questions

To develop a foundation for critical thinking, the six types of Socratic questions were used. This methodology helped develop important questions to be answered and reviewed throughout the project. The most useful results came from the three categories covering perspectives, outcomes and meta questions. The following are the questions that resulted from the Socratic questioning and their solutions within the frame of this project.

Questions about viewpoints and perspectives:
• Question: Are the dependencies reversed, where stock or Twitter is affected by cryptocurrencies?
  Solution: Complete statistical modeling in which the dependent and independent variables are swapped.
• Question: Who benefits from the results of this project?
  Solution: Bankers and investors of cryptocurrencies benefit through gained knowledge because it creates a more predictable market.

Questions that probe implications and consequences:
• Question: What if we are completely wrong in our assumption?
  Solutions:
  o Cryptocurrencies are truly standalone and volatile.
  o A stock market crash should not affect the cryptocurrencies.
  o Banks and/or governments cannot regulate or stabilise cryptocurrencies as they currently do traditional currencies, implying that they will never be government-sanctioned currencies and will not yield long-term investments.

Meta questions:
• Is one type of sentiment analysis better than another?
• What are the correlations between cryptocurrency predictions and other datasets?
• What are the correlations between cryptocurrency open values and other datasets?
• What are the correlations between sentiment analysis and other datasets?
  Solution: Results are given in the analysis section.

From this methodology, our main analysis questions evolved and our qualitative analysis results improved. The meta questions gave an easy way to break down our main question: what affects cryptocurrencies? The other questions and solutions then gave a clearer idea of causation and consequences for any correlations found.

2.2 Data Analytics Lifecycle Plan

The data analytics lifecycle provides a framework to organise project work. The lifecycle was used to guide our project's development through each of its six phases: discovery, data preparation, model planning, model building, communicating results and operationalisation.

Figure 1: Data Analytics Life Cycle

Discovery-Phase 1
• Research cryptocurrencies, stocks and sentiment analysis
• Learn from the past by researching previous studies, if any
Resources
• Stock and cryptocurrency data readily available
• Tableau, R, Excel (VBA developer), SPSS, Excel (XLMiner)
• Professors: Murray Scott, Trevor Clohessy, Michael Lang, Eoin Whelan; Other: Steve Risher
Frame the Problem
• Socratic questioning (see section 2.1)
Pain Points
• Narrow down data and options to review
• Basic familiarity with technology to use
• XLMiner: need activation key
• When to stop analysing?
Success Criteria
• Understanding if there are any correlations with/without causation
• Ability to present the findings to stakeholders in an easy-to-understand manner
• Thoroughly complete all tasks utilising a variety of resources and determine key findings to explore in future projects

Data Prep-Phase 2
Tools Used
• R
• Excel
• Access
Variable Selection
• All sentiment analysis
• All cryptocurrencies we have
• Stocks that have correlations
• GLMs to remove stocks
• Iterative testing to confirm the most significant data

Model Planning-Phase 3
• Decide techniques and methods (GLMs, correlation, research)
• Decide on a basis for a predictive model

Model Building-Phase 4
• Predictive data model (using Access)
• Analysing data (using R, Excel)

Communicate Results-Phase 5
• Decide if the project succeeded or failed at its objectives
• Document key findings from analysis

Operationalize-Phase 6
• Report
• Poster presentation
• Online presentation

Figure 2: Project Data Analytics Lifecycle

3.0 Project Data Sets

Our data sets comprise stock data, cryptocurrency data and Twitter data. The stock and cryptocurrency data sets have similar structures, whereas the Twitter data is less structured because of its nature. The selection of specific data was largely based on availability, model requirements, and the beneficiaries and influencers of the market. Important subsets of the data were then identified, selected and continually narrowed, through testing and evaluation, for analysis.

3.1 About the Data

As explained above, the stock and cryptocurrency data was structured, while the Twitter data was unstructured. The open data fields were chosen for analysis, instead of the others available, because the prediction model required data from the day on which it was run. If someone were to run it in the morning, they would only have the open prices available.

3.1.1 Stock Data

These are the fields and descriptions for the stock data that was used.

Figure 3: Stock Headings

3.1.2 Cryptocurrency Data

The cryptocurrency data follows the same general structure as the stock data used, with a few additions.

Figure 4: Cryptocurrency Headings

3.1.3 Twitter Data

This table shows the general structure of the Twitter data that was pulled using Twitonomy.

Figure 5: Twitter Headings

3.2 Data Selection Process

Both the stock and the cryptocurrency data were retrieved from the Kaggle data repository. The stock dataset was titled “Huge Stock Market Dataset” and the cryptocurrency dataset was titled “Every Cryptocurrency Daily Market Price”. Twitter data was gathered using Twitonomy, which allows Twitter data to be downloaded. Forecasting data was created in Microsoft Access from the stock and cryptocurrency datasets, as explained in the Analytical Tools and Techniques section. Cryptocurrency data was limited to what was available through free resources.

The datasets we chose include dates written in the format MM/DD/YY. To maintain consistency with the datasets, this format is used throughout the report when referring to dates. From these datasets, two years of data were selected for analysis based on prediction model requirements. Two years of predictions were then created, which set the date range for analysis: 1/1/16 to 11/7/17. This is slightly less than two years because the data was initially pulled on 11/7/17.

Different methods were used to select cryptocurrencies, stocks and Twitter feeds. The cryptocurrencies were chosen based on whether their available date range met the forecasting model requirements. The stocks and Twitter feeds were selected based on research. The companies chosen were from the banking, computing and technology sectors, because these would be the most likely to benefit from or influence the market. Similarly, the Twitter feeds chosen belong to people who founded cryptocurrencies and related technologies.

3.3 Narrowing the Data for Analysis

A large variety of data was selected for initial review and needed to be narrowed to allow a rigorous and detailed analysis. The prediction model produces an error value (see section 4.2, Prediction Model Data), which indicates how accurate the predicted value was compared to the actual open value for the prediction date. For each stock and cryptocurrency prediction set, an average smooth error was produced. The sets with the highest and lowest errors were selected for further analysis.

Figure 6: Prediction Errors

This methodology was based on the idea that the world does not fit the average and that the best solution is to model based on the extremes to create a better overall product (Rose, 2016). After the initial selection process, GLMs were run on groupings of the datasets (see section 4.0, Analytical Tools and Techniques, and Appendix D). From the GLMs, statistically significant variables were observed. The (dependent, independent) variable pairings that produced the highest count of statistically significant results were then selected for detailed analysis.

4.0 Analytical Tools and Techniques

Various tools were used to cleanse data and to create the prediction and sentiment analysis data. Excel and Access were the primary tools used to cleanse data and create prediction data. Sentiment analysis was completed using MeaningCloud and RStudio. Other tools used during the project are also outlined. (See Appendix D for the code used with these tools and Appendix F for more information on the tools.)

4.1 Data Cleansing

Some data needed to be cleansed before use. In Excel, the VBA developer tool was used to quickly fill in missing data for stocks. The cryptocurrency data has an open value for every day of the year; the stock data, however, has no values for days on which the markets are closed. Weekend and holiday values were needed for the forecasting model. A formula was created to count the number of missing days. A VBA macro then added the corresponding number of rows based on the result of that formula.

Figure 7: VBA Analysis Prediction

After the lines were added, a series of “if” statements added the missing data. The data added was the most recent open value.

Figure 8: Missing Values
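
The fill step described above was done with an Excel formula and a VBA macro. As an illustration only, the same carry-forward logic can be sketched in Python. The function name `fill_missing_days` and the sample prices are hypothetical and are not taken from the project code.

```python
from datetime import date, timedelta


def fill_missing_days(opens):
    """Return a copy of `opens` with every calendar day present.

    `opens` maps trading dates to open prices. Days with no entry
    (weekends, holidays) receive the most recent earlier open value.
    """
    if not opens:
        return {}
    days = sorted(opens)
    filled = {}
    current = days[0]
    last_open = opens[current]
    while current <= days[-1]:
        # Use the day's own open when it exists, otherwise carry forward.
        last_open = opens.get(current, last_open)
        filled[current] = last_open
        current += timedelta(days=1)
    return filled


# Hypothetical prices: a Friday and the following Monday, weekend missing.
sample = {date(2017, 11, 3): 151.0, date(2017, 11, 6): 153.5}
filled = fill_missing_days(sample)
```

After this step the stock series has one row per calendar day, which is the shape the cryptocurrency data already has.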

After this was completed, the stock and cryptocurrency data structures matched, making the data ready to be used in the forecasting model. The sentiment analysis data also needed some transformations before it was ready for analysis. There were multiple scores per day and days without scores. To be compared directly with other data, it needed to be transformed, by aggregation in Access, into a daily value for each date in the comparison set. The data was also joined to a dates table to account for missing dates. The aggregation was grouped by day and the analysis scores were calculated in the following ways:

● Summed score
● Average score
● Average of confidence
● Count of all scores
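
The aggregation was carried out in Access. The sketch below shows the same grouping in Python, assuming each tweet is reduced to a (day, score, confidence) row. The function name, the row layout and the use of `None` for days without tweets are assumptions made for illustration; the report does not state how empty days were represented.

```python
from collections import defaultdict
from datetime import date


def aggregate_daily(rows, all_dates):
    """Group (day, score, confidence) rows by day.

    Returns {day: aggregates} for every day in `all_dates`, so days
    without tweets still appear (the role of the dates table).
    """
    grouped = defaultdict(list)
    for day, score, confidence in rows:
        grouped[day].append((score, confidence))

    result = {}
    for day in all_dates:
        items = grouped.get(day, [])
        n = len(items)
        if n == 0:
            # Assumption: empty days carry a count of 0 and no scores.
            result[day] = {"sum": None, "avg": None, "avg_conf": None, "count": 0}
            continue
        total = sum(score for score, _ in items)
        result[day] = {
            "sum": total,                                    # summed score
            "avg": total / n,                                # average score
            "avg_conf": sum(conf for _, conf in items) / n,  # average of confidence
            "count": n,                                      # count of all scores
        }
    return result


# Hypothetical rows: two tweets on 11/1/17, none on 11/2/17, one on 11/3/17.
rows = [
    (date(2017, 11, 1), 2, 90),
    (date(2017, 11, 1), -1, 80),
    (date(2017, 11, 3), 1, 100),
]
all_dates = [date(2017, 11, 1), date(2017, 11, 2), date(2017, 11, 3)]
daily = aggregate_daily(rows, all_dates)
```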

4.2 Prediction Model Data

A forecasting tool was created in Access to produce predicted open prices for the stock and cryptocurrency datasets. This tool was based on the X-11 forecasting model, which utilises seasonality (Wallis, 1983). The final model was a simplified version of X-11 because of limits on technical capability, on the depth of understanding required, and on the time available to build, test and run it.

To create a predicted value, the average open value over the previous 7 days was taken for both the current year and the previous year. These two averages were used to create a prediction ratio, which represents the change from the previous year to the current year. The ratio was then multiplied by the previous year's open value for the prediction date, creating an initial prediction. The accuracy of this initial prediction was measured by taking the difference between the predicted value and the actual open value, dividing by the actual value, and multiplying by 100 to give the error percentage.

After the initial prediction model was created, errors were reviewed for accuracy. Upon review, a pattern was observed: areas with the highest errors had a greater change in ratios from day to day than low-error areas. From this, a smoothing ratio was added to the model. This ratio is the current date's ratio divided by the previous date's ratio. Because areas with low errors had similar ratios from day to day, their smoothing ratio is close to 1; the larger the day-to-day difference, the farther the smoothing ratio is from 1. A new prediction was created by multiplying the initial prediction by the smoothing ratio. Predictions in low-error areas stayed the same or changed only slightly, because the ratio is close to 1, whereas the large errors were reduced by the smoothing ratio. On average, errors were reduced by 3.1% by using the smoothing ratio to create a smoothed prediction.

Forecasting Model Mathematics:

Figure 9: Forecasting Model
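
The forecasting steps described in this section can be sketched in Python as follows. This is an illustrative reading of the prose, not the Access implementation. It assumes a gap-free list of daily opens, that "previous year" means a fixed lag of 365 days, and that the 7-day window ends the day before the prediction date. All function and parameter names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ratio(opens, i, window=7, lag=365):
    """Prediction ratio for day i: recent average over last year's average."""
    current = mean(opens[i - window:i])                # 7 days before day i
    previous = mean(opens[i - lag - window:i - lag])   # same window, one lag earlier
    return current / previous


def predict(opens, i, window=7, lag=365):
    """Return (initial prediction, smoothed prediction) for day i.

    Requires i >= lag + window + 1 so that every slice is full length.
    """
    r = ratio(opens, i, window, lag)
    initial = r * opens[i - lag]                       # ratio times last year's open
    smoothing = r / ratio(opens, i - 1, window, lag)   # today's ratio over yesterday's
    return initial, initial * smoothing


def error_pct(predicted, actual):
    """Signed error percentage relative to the actual open value."""
    return (predicted - actual) / actual * 100


# Synthetic series: opens are 100 in the first year and 200 in the second.
opens = [100.0] * 365 + [200.0] * 365
initial, smoothed = predict(opens, 400)
err = error_pct(smoothed, opens[400])
```

In this synthetic case the ratio is the same on consecutive days, so the smoothing ratio is 1 and the smoothed prediction equals the initial prediction.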

4.3 Sentiment Analysis Data

Sentiment analysis was performed on the tweets of Twitter accounts that are influential in relation to cryptocurrencies. The tweets were downloaded using Twitonomy and then analysed through the MeaningCloud Excel extension and RStudio. Sentiment analysis is the process of computationally identifying and categorising opinions expressed in a piece of text, in particular to determine whether the writer's attitude towards a particular topic or product is positive, negative or neutral.

Two distinct types of sentiment analysis were used to get a better indication of the polarity of the tweets. First, MeaningCloud was used, which also gives an indication of result confidence, irony and agreement. However, this tool only produces a score from -2 (very negative) to +2 (very positive), which is a small range. The second sentiment analysis used RStudio to get a wider range of scores. This proved difficult because of the knowledge required, and the analysis was less streamlined because it involved finding appropriate user-created dictionaries to differentiate between positive and negative words. Of the many packages available, the one used gave a range of -7 to 7, providing a finer indication of how positive or negative a tweet was. However, it did not include the extra features offered by MeaningCloud.

After the analysis from both tools was complete, the scores of all users were joined in MS Access and aggregated to give an overall score for each day. These scores were compared against the open prices of stocks and cryptocurrencies to look for significant correlations.
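
To illustrate the dictionary-based approach in general terms, the sketch below scores a tweet as the number of positive words minus the number of negative words. It is written in Python rather than R, and the word lists are a tiny hypothetical lexicon, not the dictionaries or the package used in the project.

```python
import re

# Hypothetical mini-lexicon for illustration; a real analysis would
# load a published positive/negative word list.
POSITIVE = {"good", "great", "gain", "bullish", "secure"}
NEGATIVE = {"bad", "crash", "loss", "bearish", "scam"}


def score_tweet(text):
    """Return (# positive words) - (# negative words) for one tweet."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


example_score = score_tweet("Great gain today, no crash")
```

A score of this kind has no fixed bounds; its range depends on how many lexicon words appear in a single tweet.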

5.0 Analysis of Objectives

The main topic of this project is: What key factors influence the cryptocurrency market? To review this, the topic can be broken down into three main areas of analysis:

1. What are the correlations between cryptocurrency open values and stocks or other cryptocurrency datasets?
2. What are the correlations between cryptocurrency predictions and stocks or other cryptocurrency datasets?
3. What are the correlations between cryptocurrency and sentiment of Twitter feeds?
   a. Is one sentiment analysis type better than another?

Each section will use similar analysis procedures to complete a rigorous review and analysis of the datasets. However, based on statistical testing, the pairings for review will be different per section. The sub-pairings in the Open Values and Prediction sections will come from the initial pairings shown in Figure 10. This figure was created by narrowing the GLM results to those containing the highest count of significant results. Pairings chosen in the Sentiment section will be based on the most prevalent cryptocurrency types and these variables.

Figure 10: Open Variables

For each dependent and independent variable pairing, laid out in the section, the following observations will be made:

Figure 11: Pairing Observations

In completing the analysis section, a final report and analysis summary will review overall findings and success criteria previously outlined in the report. (Due to low correlations, additional analytics can be found in Appendix B.)


5.1 Open Values Analysis

The main objective for this section is to answer the question: What are the correlations between cryptocurrency open values and stocks or other cryptocurrency datasets? This is important in detailing how cryptocurrencies can be influenced. Correlations made in this section will show which stocks could influence the cryptocurrency market. The outcomes from this section could lead to interested parties such as banks, governments or large investors influencing the market through other cryptocurrencies or stocks determined by correlation and causation. Based on the previously outlined methodologies and research, the following pairs were selected for analysis:

Figure 12: Open Value Correlations

5.1.1 Stratis Open vs. Bitcoin Open

The highest correlation is between Stratis and Bitcoin, which are both cryptocurrencies. Stratis is an altcoin, which is a spin-off of Bitcoin. Bitcoin has been around since 2009, while Stratis is a newer type of cryptocurrency. Figure 13 below shows a scatterplot of Stratis Open vs. Bitcoin Open. From the trendline on the data in Figure 13, there is a positive correlation between Stratis Open and Bitcoin Open. The correlation coefficient is 0.897. This means that the data has a strong positive correlation, and thus an increase in the Open Stratis price is accompanied by an increase in the Open Bitcoin price. R² is 0.8039, which means that the linear trendline fits the data well.
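For a straight-line trendline with a single independent variable, R² is the square of the correlation coefficient, so the R² of 0.8039 shown in Figure 13 corresponds to a coefficient of about 0.897. The sketch below shows how both values can be computed; the price lists are hypothetical and are not the project data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical open prices, for illustration only.
stratis_open = [0.010, 0.020, 0.030, 0.040, 0.050]
bitcoin_open = [540.0, 570.0, 600.0, 650.0, 690.0]

r = pearson_r(stratis_open, bitcoin_open)
r_squared = r ** 2  # share of variance explained by the linear trendline
print(round(r, 3), round(r_squared, 3))
```

Read this way, R² is the share of variance explained by the trendline rather than the share of points that lie on it.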

[Chart: Stratis Open (dep) vs. Bitcoin Open (ind), scatterplot. x-axis: Open Stratis; y-axis: Open Bitcoin. Linear trendline: y = 3673.8x + 498.51, R² = 0.8039.]

Figure 13: Stratis Open vs Bitcoin Open Scatterplot

The time series analysis in Figure 14 shows the variation in open prices of Bitcoin and Stratis between 8/20/17 and 11/5/17. This window is used instead of the full 2 years because Stratis is relatively new and has less data. The figure shows that the open values of Bitcoin and Stratis have been increasing at a similar rate. The opening price of Bitcoin has risen relatively steadily over this period. On the other hand, the open price of Stratis has fluctuated, which is most likely due to it being newer. Stratis and Bitcoin do seem to act similarly around 10/29/17 and 11/5/17.

[Chart: Stratis Open (dep) vs. Bitcoin Open (ind), time series. Axes: Bitcoin Open Value, Stratis Open Value. Series: Open Bitcoin, Open Stratis.]

Figure 14: Stratis Open vs Bitcoin Open Time Series Analysis

Overall, there is a high, positive correlation between open Stratis values and open Bitcoin values. Their prices have also increased over time. Thus, it is not surprising that the prices of Stratis and Bitcoin are correlated. Altcoins, like Stratis, tend to follow Bitcoin’s price. When Bitcoin goes up quickly, then it is likely that altcoins will be suppressed as money flows into Bitcoin. When Bitcoin goes down quickly, then it is likely that the altcoins will be depressed as money flows into fiat currency, or altcoins boom as money flows into them (CryptoCurrency Facts, 2018).


5.1.2 Stratis Open vs. Bitcoin Prediction

Due to the correlation between Stratis and Bitcoin mentioned above, it seems that there should be a relationship between Stratis Open and Bitcoin Prediction. The prediction price has been calculated from open prices, so there should be some correlation between prediction and open prices. Figure 15 shows the scatterplot of Stratis open prices with Bitcoin prediction prices. It shows that these two variables are positively correlated, with an upward-sloping trendline. However, the R² value is a little low: the trendline explains about 43% of the variance in the data. The correlation coefficient is 0.6556108, which is moderately strong.

[Chart: Stratis Open (dep) vs. Bitcoin Prediction (ind), scatterplot. x-axis: Stratis Open; y-axis: Bitcoin Prediction. Linear trendline: y = 60163x + 2715.4, R² = 0.4298.]

Figure 15: Stratis Open vs. Bitcoin Prediction Scatterplot

The time series analysis in Figure 16 below shows how the open price of Stratis and the prediction price of Bitcoin have fluctuated throughout the period. Bitcoin is steadier, while Stratis has more peaks and troughs, which is similar to the Open/Open pairing described in the previous analysis.

[Chart: Stratis Open (dep) vs. Bitcoin Prediction (ind), time series. Axes: Bitcoin Prediction Value, Stratis Open. Series: Bitcoin Prediction, Stratis Open.]

Figure 16: Stratis Open vs. Bitcoin Prediction Time Series Analysis

Overall, the two have a moderately strong, positive correlation. Over time, both prices have increased, with Stratis being more variable and Bitcoin less volatile. As mentioned above, it is not surprising that the two are correlated due to altcoins’ relationship with Bitcoin.

5.1.3 Ethereum Classic Open vs. GSBD Open

Ethereum Classic Open and GSBD Open are also highly correlated. Ethereum Classic is a decentralized cryptocurrency which split from Ethereum (Madeira, 2018). GSBD is the stock of the United States investment company Goldman Sachs BDC, Inc. GSBD focuses on investments in mid-sized companies usually located in the United States (Goldmansachsbdc.com, n.d.). The scatterplot in Figure 17 shows that the open price of Ethereum Classic and the open price of GSBD are negatively correlated. The linear regression line explains about 61% of the variance in the data. The correlation coefficient of the two is -0.7812995, which means that they are strongly correlated.

[Chart: Ethereum Classic Open (dep) vs. GSBD Open (indep), scatterplot. x-axis: Ethereum Classic; y-axis: GSBD Open. Linear trendline: y = -1.1309x + 23.026, R² = 0.6104.]

Figure 17: Ethereum Classic vs. GSBD Open Scatterplot

Figure 18 below shows the time series analysis for the open price of Ethereum Classic and the open price for GSBD. This graph shows that both the open price of Ethereum Classic and GSBD have decreased over time.

[Chart: Ethereum Classic Open (dep) vs. GSBD Open (indep), line chart. Axes: GSBD Open Value, Ethereum Classic Open Values. Series: GSBD Open, Ethereum Classic Open, with a linear trendline for each.]

Figure 18: Ethereum Classic Open vs. GSBD Open Line

Figure 19 shows that for this period, while Ethereum Classic decreases, GSBD increases.

[Chart: Ethereum Classic Open (dep) vs. GSBD Open (indep), narrowed date range. Axes: GSBD Open Value, Ethereum Classic Open Values. Series: GSBD Open, Ethereum Classic Open.]

Figure 19: Ethereum Classic Open vs. GSBD Open (Highlighted)

Overall, Ethereum Classic Open and GSBD Open have a high, negative correlation. Between 8/1/17 and 11/7/17, when Ethereum Classic Open decreases, GSBD Open increases. There seems to be no apparent reason for the correlation between Ethereum Classic Open and GSBD Open based on the research done.

5.1.4 Dash Open vs. Stratis Open

Dash Open vs. Stratis Open is the third pairing which is tied for highest p-value. Dash and Stratis are both cryptocurrencies, but they focus on different strategies. Dash was created to enhance privacy, while Stratis offers Blockchain-as-a-Service (BaaS). In Figure 20 below, the scatterplot of the two variables is shown. Here, the data is negatively correlated. R² (0.6362) is moderately high, meaning that the trendline explains over half of the variance in the data. The correlation coefficient is -0.7976457, meaning there is a strong relationship between the two variables.

[Chart: Dash Open (dep) vs. Stratis Open (ind), scatterplot. Axis labels as printed: x-axis Dash Open, y-axis Stratis Open. Linear trendline: y = -190.59x + 29.021, R² = 0.6362.]

Figure 20: Dash Open vs. Stratis Open Scatterplot

Figure 21 below shows the time series analysis of Dash Open and Stratis Open. Both are increasing over time. There are peaks and troughs, but on average, each of the prices has increased over time.

[Chart: Dash Open (dep) vs. Stratis Open (ind), time series. Axes: Dash Open Value, Stratis Open Value. Series: Dash Open, Stratis Open, with a linear trendline for each.]

Figure 21: Dash Open vs. Stratis Open Time Series Analysis

Figure 22 below shows the same graph narrowed to the dates that have Stratis open data. In this more focused graph, there doesn’t seem to be an obvious trend in either dataset. Dash seems to be more stable in its price, while Stratis tends to vary.

[Chart: Dash Open (dep) vs. Stratis Open (ind), time series narrowed to dates with Stratis data. Axes: Dash Open Value, Stratis Open Value. Series: Dash Open, Stratis Open.]

Figure 22: Dash Open vs. Stratis Open Time Series Analysis

Overall, Dash Open and Stratis Open have a strong, negative correlation. There is not much information to answer why these two correlate.

5.1.5 Dash Open vs. Stratis Prediction

Due to the pairing of Dash Open and Stratis Open, there should be a relationship between Dash Open and Stratis Prediction. Figure 23 shows the scatterplot of the opening prices of Dash with the prediction prices of Stratis. According to R² (0.1028), this data does not fit the linear trendline well. The correlation is .3201, which indicates that the data has a weak positive relationship. This implies an unlikely relationship between the variables.

[Chart: Dash Open (dep) vs. Stratis Prediction (ind), scatterplot. Axis labels as printed: x-axis Dash Open, y-axis Stratis Open. Linear trendline: y = 0.2076x + 0.1483, R² = 0.1028.]

Figure 23: Dash Open vs. Stratis Prediction Scatterplot

Figure 24 below shows the time series analysis between Dash Open and Stratis Prediction prices. This time series analysis shows that the open price of Dash has steadily increased over the roughly two-year range, while the Stratis prediction has decreased overall.

[Chart: Dash Open (dep) vs. Stratis Prediction (ind), time series. Axes: Dash Open Value, Stratis Prediction Value. Series: Dash Open, Stratis Prediction, with a linear trendline for each.]

Figure 24: Dash Open vs. Stratis Prediction Time Series Analysis

Overall, Dash Open and Stratis Prediction have a low, positive correlation. This is fascinating because of the high correlation between Dash Open and Stratis Open. There is not much information to explain this relationship. It would be interesting to investigate further.

5.1.6 Open Stratis vs. Prediction IBM

Both Stratis and IBM are known for offering BaaS, or Blockchain-as-a-Service. Due to this similarity, there should be correlations between the two. Figure 25 shows the scatterplot of the open prices for Stratis and the prediction prices for the stock IBM. There is a positive relationship between the data. Based on R², the linear trendline explains around half of the variance in the data. The correlation between the two is 0.7010822, meaning that there is a strong relationship between the two variables.

[Chart: Stratis Open (dep) vs. IBM Prediction (ind), scatterplot. x-axis: Stratis Open; y-axis: IBM Prediction. Linear trendline: y = 406.48x + 133.12, R² = 0.4915.]

Figure 25: Stratis Open vs. IBM Prediction Scatterplot

Figure 26 shows both IBM Prediction and Stratis Open have increased and IBM has increased more slowly than Stratis.

[Chart: Stratis Open (dep) vs. IBM Prediction (ind), time series. Axes: IBM Prediction Value, Stratis Open Value. Series: IBM Prediction, Stratis Open, with a linear trendline for each.]

Figure 26: Stratis Open vs. IBM Prediction Time Series Analysis

Figure 27 narrows the date range, and the circled points on the graph show the dates when Stratis increased and IBM decreased. This pattern became stronger as the year went on.

[Chart: Stratis Open (dep) vs. IBM Prediction (ind), narrowed date range. Axes: IBM Prediction Value, Stratis Open Value. Series: IBM Prediction, Stratis Open.]

Figure 27: Stratis Open vs. IBM Prediction (Highlighted)

Overall, Stratis Open and IBM Prediction have a moderately strong, positive correlation, which possibly could be explained by both Stratis and IBM offering Blockchain-as-a-Service.

5.1.7 Open Analysis Summary

For the completed analysis between open values and other datasets, correlations vary between pairings. Stratis Open and Bitcoin Open had the highest correlation at 0.897. This can be explained because altcoins are generally affected by price changes in Bitcoin. The highest correlation (-0.781) between a cryptocurrency and a stock is between Ethereum Classic Open and GSBD Open. There is not enough information, based on research, to explain why this correlation is so high. Based on the analysis of open values, there does seem to be a relationship between cryptocurrency open values and stock and other cryptocurrency datasets. However, this relationship would need to be investigated further because it is not strong enough to say stocks directly impact cryptocurrencies.

5.2 Prediction Model Analysis

The main objective for this section is to answer the question: What are the correlations between cryptocurrency predictions and stocks or other cryptocurrency datasets? This is important in detailing how cryptocurrencies can be influenced. Correlations made in this section will show what stocks or other cryptocurrencies could be influential to cryptocurrency predictions. This will be proven through quantitative and qualitative analysis. Additionally, correlations in this section could lead to creating a better understanding for future analysis. Derived from methodologies and research, the following pairs were selected for analysis:

Figure 28: Prediction Value Correlations

5.2.1 Stratis Predicted Values vs. IBM Open Values

IBM is known for their use of Software-as-a-Service (SaaS), and similarly the creators of the Stratis cryptocurrency are working towards being the household name regarding Blockchain-as-a-Service (BaaS) (Khatwani, 2018). IBM has recently entered the BaaS market, creating direct competition with Stratis to be a leader in BaaS technologies (Miller, 2017). Figure 29 shows that there is a positive correlation of 0.591 between the Stratis predicted values and the IBM open values. This is the highest correlation in the group by more than 0.2 in absolute value and is the only one above 0.5, so even though it may be considered medium-low, it is the strongest of the group. It is also the only positive correlation, indicating that the two datasets move in tandem. This is visually shown by the trend line pointing up and to the right.

[Chart: Stratis Prediction (dep) vs. IBM Open (ind), scatterplot. x-axis: Stratis Prediction Values; y-axis: IBM Open Values. Logarithmic trendline: y = 6.8365ln(x) + 145.26, R² = 0.4222.]

Figure 29: Stratis Open vs. IBM Open Scatterplot

To see what is happening in more detail, Figure 30 shows the two datasets over time. Due to the lack of data available for Stratis, the dates have been limited. It is easy to see that both sets are moving in tandem, as indicated by the correlation coefficient above. Both datasets are decreasing in value over the timespan. However, based on the trendline slopes, IBM is decreasing faster, at -0.1067 compared to -0.0459 for Stratis.

[Chart: Stratis Prediction (dep) vs. IBM Open (ind), time series. Axes: IBM Open, Stratis Prediction. Linear trendline (IBM Open): y = -0.1067x + 4746.4, R² = 0.6452. Linear trendline (Stratis Prediction): y = -0.0459x + 1977.9, R² = 0.4867.]

Figure 30: Stratis Prediction vs. IBM Open Time Series Analysis

Narrowing in with Figure 31, both datasets fit a polynomial trendline more closely than a linear one. Noting arrows 1, 2 and 3, it seems there are opposing correlations on a smaller scale. As one rises, the other falls and vice versa. This would be interesting to observe over a longer time period.


Figure 31: Stratis Prediction vs. IBM Open (Highlighted)

From this analysis there are two main observations. First, Stratis predictions and IBM open values appear to move in tandem overall: for the selected date range, they are both decreasing together. Second, a closer look shows that both decrease along a polynomial curve, with their short-term rises and falls opposing each other.

Qualitative analysis for causation revealed that Stratis was developed by a BaaS company. Therefore, it makes sense that its value would correspond with that of a similar company over extended periods. This may be due to investor faith: someone who believes that Stratis will be profitable in the BaaS market could infer that it would also be profitable in the cryptocurrency market, or more stable than other currencies. Since the two compete in a similar market, it can be assumed that in the short term when one prospers the other declines.

Overall, there is a medium-low, positive correlation between the Stratis predictions and the IBM open values. Over long periods of time they are both decreasing together, but in a detailed view they are opposing forces.

5.2.2 Ethereum Classic Predicted Values vs. GSBD Open Values

Ethereum Classic is a decentralized cryptocurrency that split from its originator Ethereum (Madeira, 2018). GSBD is the stock of the United States investment company Goldman Sachs BDC, Inc. This investment company focuses its investments on mid-sized companies, which usually reside in the United States (Goldmansachsbdc.com, n.d.). Figure 32 shows a negative correlation of -0.355, which is a low correlation between datasets. Based on the coefficient sign, there may be negligible opposing effects from one value to another.

[Chart: Ethereum Classic Prediction (dep) vs. GSBD Open (ind), scatterplot. x-axis: Ethereum Classic Prediction; y-axis: GSBD Open.]

Figure 32: Ethereum Classic Prediction vs. GSBD Open Scatterplot

Over time GSBD has been a stable stock that is slowly increasing with little variation in open values. Alternatively, the Ethereum Classic predictions are decreasing with significant variation over time as they cyclically rise and fall.

[Chart: Ethereum Classic Prediction (dep) vs. GSBD Open (ind), weekly time series from 8/1/2017 to 11/7/2017. Series: Ethereum Classic Prediction, GSBD Open, with a linear trendline for each. Trendline equations shown: y = -0.0028x + 138.54, R² = 0.0903; y = -0.0584x + 2525, R² = 0.3953.]

Figure 33: Ethereum Classic Prediction vs GSBD Open Time Series Analysis

Based on the research, there doesn’t seem to be a connecting factor between the two. Since there is not a known connection, it might explain the weak correlation.


5.2.3 Dash Predicted Values vs. Stratis Open Values

Dash and Stratis are both cryptocurrencies; however, they are competing on different strategies. Dash is similar to Bitcoin but with enhanced privacy. Stratis aims to enhance the feasibility of Blockchain-as-a-Service (BaaS). Stratis’s BaaS has underlying features like Bitcoin’s but is a more user-friendly platform (CryptosRUs, 2018). There is a low, negative correlation of -0.354 between Dash predictions and Stratis open values, shown using a trendline in Figure 34.

[Chart: Dash Prediction (dep) vs. Stratis Open (ind), scatterplot. x-axis: Stratis Open; y-axis: Dash Prediction. Linear trendline: y = -2322x + 691.98, R² = 0.1253.]

Figure 34: Dash Prediction vs Stratis Open Scatterplot

In Figure 35, both datasets are increasing over time, but Stratis increases more steadily whereas Dash seems to plateau as the year ends. The two datasets seem to correlate more closely towards the end of the year. The circled areas highlight where the two rise and fall contrary to each other. Recomputing the correlation from 9/1/17 instead of 8/20/17 nearly doubles the magnitude of the coefficient, from -0.365 to -0.676. This moves the correlation from low to medium.
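Recomputing a correlation over a narrower date window, as described above, can be sketched as follows. The rows are hypothetical values chosen only to show the effect of moving the start date; they are not the project data, and the helper names are illustrative.

```python
from datetime import date
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_from(rows, start):
    """Correlation using only rows dated on or after `start`."""
    kept = [(a, b) for day, a, b in rows if day >= start]
    xs, ys = zip(*kept)
    return pearson_r(xs, ys)


# Hypothetical (date, Dash prediction, Stratis open) rows.
rows = [
    (date(2017, 8, 20), 300.0, 0.030),
    (date(2017, 8, 27), 320.0, 0.034),
    (date(2017, 9, 3), 330.0, 0.031),
    (date(2017, 9, 10), 310.0, 0.036),
    (date(2017, 9, 17), 325.0, 0.033),
    (date(2017, 9, 24), 305.0, 0.038),
]

full = correlation_from(rows, date(2017, 8, 20))
narrowed = correlation_from(rows, date(2017, 9, 1))
print(round(full, 3), round(narrowed, 3))
```

Because a shorter window contains fewer observations, a stronger coefficient on the narrowed range is easier to obtain by chance and should be read with that in mind.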

[Chart: Dash Prediction (dep) vs. Stratis Open (ind), time series from 8/1/2017 to 11/1/2017. Axes: Dash Prediction, Stratis Open. Polynomial trendline (Dash Prediction): y = -1E-10x^5 + 3E-05x^4 - 2.692x^3 + 114896x^2 - 2E+09x + 2E+13, R² = 0.9645. Linear trendline (Stratis Open): y = 0.0004x - 15.142, R² = 0.6881.]

Figure 35: Dash Prediction vs. Stratis Open (Highlighted)

There is not much information to establish causation for this correlation, especially towards the end of the year. One possible reason for the negative, opposing correlation is the strategic difference between the two currencies. Given the unpredictability of the market, it is possible that investors are switching from one currency to the other as market trends change regarding which strategies are the most viable long term.

5.2.4 Prediction Analysis Summary

For all completed analysis between predicted values and other datasets, there are minimal correlations. The only one worth noting is between the Stratis predicted value dataset and the IBM open dataset, which reached a correlation of 0.591. This correlation stands out because it is the only one whose magnitude exceeds 0.5, the bar below which correlations were treated as low or negligible. Based on the qualitative analysis of the two entities, they share a commonality the others don’t. The cryptocurrency Stratis is also a company that is selling Blockchain-as-a-Service (BaaS). IBM is a competitor in the same market, making the two datasets more closely related than the other sets.

An honourable mention is the pairing of Dash predictions and Stratis open values. Upon initial review their correlation is low, but it nearly doubles to -0.676 when the date range is narrowed closer to the end of the year. These two move in opposite directions, possibly due to the changing market view on which cryptocurrency strategy is most viable. Overall, no correlation in this section is high enough to definitively say that stocks or other cryptocurrencies directly impact the cryptocurrency market.

5.3 Sentiment Analysis

This analysis mainly focuses on the 3 major cryptocurrencies, Bitcoin, Ethereum and Ripple, with additional results on other cryptocurrencies. Those three were chosen because they are the most well-known, widely used and best documented currencies from our narrowed dataset. This sentiment analysis involved a few different stages to get an overall view of correlations between the open price and the total sentiment score for the two sentiment analysis types. First, the open price against sentiment was graphed to visually review for correlations. The next stage involved comparing the open price to the overall sentiment score of that day from both the MeaningCloud and R sentiment analyses. Looking at this data, the correlations are weak because every cryptocurrency's coefficient is below 0.5. This data was investigated further by only looking at 2017, because 2016 prices for all cryptocurrencies were very low. More accurate correlations may be shown as the currencies become more popular. Additionally, the sentiment scores were shifted by one day, pairing each open price with the previous day’s score, to see if there were any differences in the results, as the previous day’s sentiment may affect the next day’s price.
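Pairing each open price with the previous day's sentiment score can be sketched as below. This is only an illustration of the one-day shift: the function name and values are hypothetical, and both lists are assumed to be ordered by date with no missing days.

```python
def lag_by_one_day(open_prices, sentiment_scores):
    """Pair each day's open price with the previous day's sentiment score.

    Both lists are assumed to be aligned by date. The first price has no
    previous-day score and the last score has no next-day price, so one
    observation is dropped from each end.
    """
    return list(zip(open_prices[1:], sentiment_scores[:-1]))


# Hypothetical daily values, for illustration only.
opens = [4300.0, 4350.0, 4280.0, 4400.0]
scores = [3, -2, 5, 1]

pairs = lag_by_one_day(opens, scores)
print(pairs)
```

The resulting pairs can then be passed to the same correlation formula used for the same-day comparison.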

5.3.1 Bitcoin vs. Sentiment

Bitcoin is the world’s first cryptocurrency and is now by far the biggest and most popular cryptocurrency, with 39% of the total market according to coinmarketcap.com. As the most popular cryptocurrency, it would be expected to correlate with the other smaller currencies, as they are usually purchased with Bitcoin. Likewise, when comparing Bitcoin to sentiment, a high correlation would be presumed, as Bitcoin would be highly influenced by the market. First, open price against sentiment was reviewed in a graph to look for visual correlations. As shown in Figure 36, there seems to be little to no correlation between sentiment and price.


Figure 36: Bitcoin vs. Sentiment

This was then proven true in the analysis looking for correlations between sentiment and Bitcoin through the correlation formula in Excel. This gave correlation scores of 0.347854 for the MeaningCloud sentiment and 0.173723 for the R sentiment, for both years, resulting in a low correlation. When changing the dates to only 2017, there was an unexpected result, as the correlation dropped to 0.224041 for MeaningCloud but R sentiment increased to 0.2252.

5.3.2 Ethereum vs. Sentiment

Ethereum would be considered the next biggest cryptocurrency. It was thought that it may not have the same influence as Bitcoin, but may still be correlated with sentiment. When comparing the sentiment to the open price in Figure 37, there is a similar trend where negative sentiment outweighs positive when there is already a large decline in price.

Figure 37: Ethereum vs. Sentiment


Next when looking at the Excel correlations there was little to no correlation as there is a value of 0.36513512 for the MeaningCloud sentiment and 0.055837071 for R. This was also the same result for just 2017 with scores of 0.247703643 and 0.062501101 which also resulted in little to no correlation.

5.3.3 Ripple vs. Sentiment

Ripple was the next coin analysed and this coin was slightly different. It is very popular with banks and financial institutions, in addition to having a very low price compared to other major coins. Figure 38 would indicate that there may be more correlation compared to previous graphs where there is a dominant negative sentiment during the decline of price at the end of May 2017.

Figure 38: Ripple vs. Sentiment

Looking at the Excel correlations, similar to previous analyses, there is little correlation between sentiment and open price as the values are 0.357703271 and -0.01097257. For only 2017, these results were 0.237766255 and -0.029232142. Interestingly, there is a negative correlation, but the results were insignificant.

5.3.4 Other Cryptocurrencies vs. Sentiment

Below are the correlation coefficient values, ordered by average correlation, of the other cryptocurrencies that were analysed. As Figure 39 shows, the trend of low correlations between price and sentiment continued.

Figure 39: Other Cryptocurrency Correlations


As shown in Figure 40, there is little variance in range between significant values for all cryptocurrencies studied. The R sentiment comparison led to a lower average significance than the MeaningCloud sentiment, which may be because MeaningCloud is a more robust text analysis tool.

Figure 40: Other Sentiment Values

In addition to comparing the open price to sentiment, other areas were investigated to get a better understanding of the relationship sentiment has with a cryptocurrency.

5.3.5 Sentiment Analysis Summary

From performing this analysis, there is only a small value of significance when comparing the sentiment analysis of influential Twitter entities to the open price of cryptocurrencies using mathematical correlation. There are some similarities in the graphical analysis. While this result may have been different from our original hypothesis that social media sentiment is influential to the price of a cryptocurrency, further research is required for complete validation.

5.4 Project Analysis Summary and Future Endeavours

To answer the question “What influences cryptocurrencies?”, the analysis was divided into three main areas: cryptocurrency open values, cryptocurrency predicted values and Twitter sentiment. Cryptocurrency open values against other datasets yielded the highest correlations, followed by the predicted values and finally sentiment analysis. Based on the degree of correlations, it seems that cryptocurrency open values may be influenced by stocks and other cryptocurrencies. However, based on the current analysis, predictions are not influenced by other sources and cryptocurrencies are not influenced by Twitter sentiment. Figure 41 lists the correlations from highest to lowest among all analysed datasets. It shows that the highest correlations come from open to open, then open to prediction, followed by prediction to open, with sentiment ranking lowest.


Correlation Ranking | Dep Variable and Type | Ind Variable and Type | Correlation Coefficient
1 | Stratis-Open | Bitcoin-Open | 0.897
2 | Dash-Open | Stratis-Open | -0.798
3 | Ethereum Classic-Open | GSBD-Open | -0.781
4 | Stratis-Open | IBM-Prediction | 0.701
5 | Stratis-Open | Bitcoin-Prediction | 0.656
6 | Stratis-Prediction | IBM-Open | 0.591
7 | Ethereum Classic-Prediction | GSBD-Open | -0.355
8 | Dash-Prediction | Stratis-Open | -0.354
9 | Dash-Open | Stratis-Prediction | 0.321
10 | Monero-Prediction | Tableau-Open | -0.318
11 | Stratis-Prediction | IBM-Smooth Error | -0.295
12 | Bitcoin | Sentiment | 0.261
13 | Monero | Sentiment | 0.233
14 | Dash | Sentiment | 0.231
15 | Litecoin | Sentiment | 0.226
16 | Neo | Sentiment | 0.222
17 | Ethereum | Sentiment | 0.21
18 | Nem | Sentiment | 0.191
19 | Waves | Sentiment | 0.182
20 | Ripple | Sentiment | 0.173
21 | Bitcoin-Open | IBM-Prediction | 0.13
22 | Stratis | Sentiment | 0.125
23 | Eth Classic | Sentiment | 0.117

Figure 41: Correlation Rankings

An overall observation of the analysis shows that if the dependent variable is of type open and correlated to an independent variable, the same is true for dependent variables of type prediction. This is shown by rankings 4 and 6 in Figure 41. Stratis versus IBM showed up as a medium to medium-high correlation in both analysis categories. Through qualitative analysis, causation may be due to both IBM and Stratis operating in a similar market, BaaS. Sentiment analysis, on the other hand, has a very low correlation coefficient with all the cryptocurrencies, with the highest average value being sentiment versus Bitcoin with a score of 0.261. In total, it does seem that there is a relationship between cryptocurrencies and other cryptocurrencies and stocks, but not between cryptocurrencies and Twitter sentiment. This is not a definitive answer to the initial question because more rigorous research is needed to fully understand cause and effect over time.

If this project were to be completed again, the limitations and ideas for the future derived from it should be acknowledged. The main limitations encountered were time, knowledge and money. Because such a large dataset of stocks, currencies and Twitter data was taken on from the beginning of the project, there was limited time to complete an in-depth analysis on each dataset. Instead, much of the analysis time was used to narrow down which datasets should be reviewed more rigorously. Time was also used in building domain knowledge of cryptocurrencies, stocks and statistical tools. Even after completing this research there is still much left unknown. More experts could have been consulted in these areas of research. As for money, the datasets retrieved were from free sources; however, there were other datasets that may have been better but cost money to use. Additionally, there was statistical software that would have been easier to use and understand, but purchases were needed to utilize those resources. To be exact, XLMiner’s variable selection tool would have been preferable to R’s subset regression code. Overall, the main limitation of this project was that the scope was too large. A more detailed analysis could have been completed if the project had been narrowed to one type of analysis instead of the 3 covered: open, predictions and sentiment.

Future project suggestions include alternative choices of comparison for the cryptocurrency. Areas that would have been interesting to explore, but were not due to time constraints, relate to fiat currencies, the S&P 500 and sentiment. Since cryptocurrency is a digital currency, it would have been interesting to analyse cryptocurrencies against different world currencies to see if there are similar fluctuations. Another interesting comparison, based on research, would have been to compare currencies to the S&P 500 index with regard to the daily change rate of the currency or open values. From the perspective of sentiment, a more robust R model using a larger dictionary and varying packages may produce higher correlations. An additional strategy would be to choose hashtags related to cryptocurrency instead of specific influencers for comparison. If this project is to be researched further, one area should be chosen instead of the many seen in this project.

Overall, this project was able to create an understanding of correlations through analysis. The information was then presented to stakeholders in this report.
Throughout the project a variety of processes and resources were utilized (as outlined in section 4), these were used to determine key findings that can be explored in future projects. Based on our criteria (outlined in section 2.2) this project was successful.


Appendices
The following sections supplement the report and are considered non-essential to the outcome of the project. They include suggestions for future researchers on the subject matter covered, together with limitations that were found throughout the project lifecycle. They also include analytics that were removed from the initial report because they were non-essential findings.

A. Skills Learned
The key skills learned and applied in this project were time management, narrowing large sets of data to the most relevant, and coordinating tasks based on skills. When first starting this project, the topic selected was broad enough that a large volume of data was available to retrieve. Gradually, throughout the project, group members discussed the best approach for removing data so that only the most relevant was reviewed. From this, the group learned a system of data narrowing. Tasks like these also gave insight into what each group member was most skilled at, and these skills were used to assign the best member to each task when dividing work. Each member was expected to complete these tasks in a timely manner; however, by the end of the project there was a realisation that working together in college at the same time was more productive than working individually. Also, the project timelines first set by the group proved unrealistic, as tasks often took longer than anticipated. Overall, these key skills became valuable to the completion of the project and will serve the members in future careers.

B. Additional Analytics
This section provides analyses carried out during the project that were not paramount to the final project and were therefore removed from the initial report.

B.1 Visualisations
Year-over-year growth analysis was performed on both the stock and cryptocurrency data using Tableau. In Tableau, the average of the closing prices was chosen for analysis. The data for both was filtered by average closing price and name, and the year-over-year growth was shown for each month. Data was taken from the dates 1/1/2016 to 11/7/2016. The stocks chosen were the bottom 12 and top 13 stocks according to the lowest and highest prediction errors. The cryptocurrencies chosen were 11 of the most well-known cryptocurrencies.

Year-over-year growth is calculated by subtracting the average closing price for a month in 2016 from the average closing price for the same month in 2017. That answer is then divided by the average closing price from 2016 and multiplied by 100.

Year-over-year growth analysis is useful because it facilitates the comparison of the data sets. It shows the percent change in the prices of various cryptocurrencies and of stocks on the stock market, and helps to visualise growth patterns and trends in our data between 2016 and 2017. These analyses are useful to our project because they allow us to see whether there are any trends or relationships between the different markets in this two-year period. After these two separate analyses are created, Tableau is used again to compare the stock data and the cryptocurrency data together.
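The year-over-year calculation described above was carried out in Tableau. As a minimal sketch of the same arithmetic in R, the example below uses made-up monthly averages; the data frame `avg_close`, the helper `yoy_growth` and all values are hypothetical and only illustrate the formula.

```r
# Hypothetical monthly average closing prices (illustrative values only).
avg_close <- data.frame(
  month = rep(1:3, times = 2),
  year  = rep(c(2016, 2017), each = 3),
  price = c(100, 110, 120, 150, 99, 180)
)

# Year-over-year growth in percent: (current - previous) / previous * 100.
yoy_growth <- function(price_prev, price_curr) {
  (price_curr - price_prev) / price_prev * 100
}

# Pair each month of 2016 with the same month of 2017.
wide <- merge(
  avg_close[avg_close$year == 2016, c("month", "price")],
  avg_close[avg_close$year == 2017, c("month", "price")],
  by = "month", suffixes = c("_2016", "_2017")
)
wide$yoy_pct <- yoy_growth(wide$price_2016, wide$price_2017)
print(wide)
```

With these values, month 1 grows by 50%, month 2 falls by 10% and month 3 grows by 50%. Because the result is a percentage, series with very different price levels can be compared on the same scale.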


Figure 42: Average Stock Price Difference


Figure 43: Cryptocurrency Price Difference

B.2 Open Bitcoin vs. Prediction IBM
Figure 44 below shows the scatterplot of the open price of Bitcoin against the prediction prices of IBM stock.

[Scatterplot: Bitcoin Open (dep) vs. IBM Prediction (ind). x-axis: Bitcoin Open; y-axis: IBM Prediction. Trend line: y = 0.0113x + 149.69, R² = 0.0169]

Figure 44: Bitcoin Open vs. IBM Prediction Scatterplot

The R² value is very low (0.0169), which means the linear regression line explains less than 2% of the variation in the IBM prediction values and most of the data does not fall close to the line. The correlation coefficient, 0.13014095, is similarly low (for a single predictor, R² is the square of the correlation coefficient), so it is not likely that there is a linear relationship between the two variables.
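The two statistics quoted above are linked: in a simple linear regression with one predictor, R² equals the squared Pearson correlation coefficient. The sketch below checks this on synthetic data; the vectors `x` and `y` are hypothetical stand-ins for the Bitcoin open and IBM prediction series, not the project's data.

```r
# Synthetic stand-ins (assumption: not the project's series).
set.seed(42)
x <- runif(200, 0, 900)                     # plays the role of Bitcoin open
y <- 150 + 0.01 * x + rnorm(200, sd = 20)   # plays the role of IBM prediction

fit       <- lm(y ~ x)
r         <- cor(x, y)
r_squared <- summary(fit)$r.squared

# For a single predictor, R^2 is the squared Pearson correlation.
stopifnot(isTRUE(all.equal(r^2, r_squared)))

# The report's own figures are consistent with this identity.
round(0.13014095^2, 4)   # 0.0169
```

This also explains why the weak correlation and the low R² in Figure 44 carry the same information rather than being two independent pieces of evidence.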


Figure 45 below shows the time series analysis for the open price of Bitcoin and the prediction price of IBM.

[Time series chart: Bitcoin Open (dep) vs. IBM Prediction (ind). x-axis: Date; left axis: Bitcoin Open Value; right axis: IBM Prediction Value. Series: Openbitcoin and Predictionibm, each with a linear trend line]

Figure 45: Bitcoin Open vs. IBM Prediction Time Series Analysis

This graph shows that both Bitcoin Open and IBM Prediction increased over the period, but the two variables do not appear to share any similar trends. The lack of correlation between Bitcoin open prices and IBM prediction prices is plausible: IBM offers blockchain-as-a-service (BaaS), whereas Bitcoin does not.

B.3 Stratis Predicted Values vs IBM Smooth Error
Given the previous correlation between Stratis predicted values and IBM open values, it would seem there should be other connections, such as between Stratis predictions and IBM smooth errors. The smooth error is a direct result of the IBM predicted values, which are created from the open values. Preliminary research notes that error values are higher where there is a large increase or decrease in predicted or actual open values. One weak connection found between Stratis predictions and IBM was with IBM's smooth error values. The correlation coefficient is -0.295, which implies a weak, negative correlation between the two. This can be visually observed in Figure 46: the trend line slopes downward from left to right, reflecting the negative correlation, and the loose fit of the data to the line implies a weak connection between the two datasets.


[Scatterplot: Stratis Prediction (dep) vs. IBM Smooth Error (ind). x-axis: Stratis Prediction; y-axis: Smooth Error IBM. Trend line: y = -0.3438x + 3.1692, R² = 0.0872]

Figure 46: Stratis Prediction vs. IBM Smooth Error Scatterplot

The negative correlation means that as one dataset increases the other decreases. This was originally noted in the comparison to the IBM open value dataset. The areas where the opposing forces are greatest have been circled in Figure 47 below. Visually, this opposing pattern seems stronger near the end of the year. Therefore, another correlation test was run on the data from 9/16/17 to the end of the year. The new coefficient is -0.442, which is about 1.5 times the original coefficient. This suggests a stronger correlation towards the end of the year than across the whole year.
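Re-running the correlation on a date window, as described above, can be sketched in R as follows. The data frame `df`, its column names and the helper `window_cor` are hypothetical, and the series are synthetic: they are built so that the relationship only exists late in the year, purely to illustrate the comparison.

```r
# Synthetic daily data for 2017 (assumption: not the project's series).
set.seed(7)
dates <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
n     <- length(dates)
late  <- dates >= as.Date("2017-09-16")

x <- rnorm(n)
y <- ifelse(late, -x, 0) + rnorm(n)   # negative link only in the late window
df <- data.frame(date = dates, stratis_pred = x, ibm_smooth_error = y)

# Pearson correlation restricted to rows between two dates (inclusive).
window_cor <- function(data, from, to) {
  rows <- data$date >= from & data$date <= to
  cor(data$stratis_pred[rows], data$ibm_smooth_error[rows])
}

full_year <- window_cor(df, as.Date("2017-01-01"), as.Date("2017-12-31"))
late_year <- window_cor(df, as.Date("2017-09-16"), as.Date("2017-12-31"))
c(full_year = full_year, late_year = late_year)
```

A shorter window contains fewer observations, so its coefficient is less stable; `cor.test()` reports a confidence interval that can help judge whether a difference between windows is meaningful.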

[Time series chart: Stratis Prediction (dep) vs. IBM Smooth Error (ind). Series: Stratis Prediction and Smooth Error IBM, each with a linear trend line]

Figure 47: Stratis Prediction vs. IBM Smooth Error (Highlighted)

Overall there is a weak connection between Stratis predictions and IBM smooth errors, but that correlation strengthened as the year progressed. The connection between the two is one of opposition: as one increases the other decreases. A likely cause is that oscillations in the open predictions create oscillations in the errors.


B.4 Monero Predicted Values vs Tableau Open Values
Monero's cryptocurrency foundation lies in its use of the CryptoNight algorithm, created to add various privacy features for the user. Currently this coin is quickly growing in popularity (CryptosRUs, 2018). Tableau is a popular business intelligence tool that gives users easy visualizations of their data to increase understanding. There is a very low, negative correlation between Monero predictions and Tableau open values. The correlation coefficient is -0.103, which is the lowest of all the datasets that were analysed. Figure 48 below shows the correlation and the downward trendline.

[Scatterplot: Monero Prediction (dep) vs. Tableau Smooth Error (ind). x-axis: Monero Prediction; y-axis: Smooth Error Tableau. Trend line: y = -0.004x + 1.6887, R² = 0.0107]

Figure 48: Monero Prediction vs. Tableau Smooth Error Scatterplot

One reason for the low correlation could be the high variability of the smooth error for Tableau. As shown in Figure 49, the error oscillates between about 0% and 8% over time, whereas the prediction data shows much less variation.

[Time series chart: Monero Prediction (dep) vs. Tableau Smooth Error (ind). Left axis: Monero Prediction; right axis: Smooth Error Tableau. Series: Monero Prediction with an exponential trend line and Smooth Error Tableau with a linear trend line]

Figure 49: Monero Prediction vs. Tableau Smooth Error Time Series Analysis


Based on qualitative and quantitative analysis of the two datasets, there is low correlation between Monero predictions and Tableau open values.

C. Other Sentiment Findings
C.1 Daily Change
First, we looked at how sentiment relates to the daily change in the price of Bitcoin. This gives a better indication of how the price changes each day and allows us to look for correlations with sentiment. In Figure 50 below, we can see that negative sentiment appears when prices start to drop, which would indicate that negative sentiment coincides with falling prices. This may be investors panicking about the prices dropping and wanting to sell, or there may be negative news that is affecting the price. However, this cannot be seen as largely significant, as there are still many drops with positive sentiment.
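The comparison of daily change and sentiment can be made concrete with a small cross-tabulation. The sketch below assumes that daily change means the percent difference from the previous day's price, which the report does not spell out; the vectors `prices` and `sentiment` are hypothetical values chosen for illustration only.

```r
# Hypothetical prices and daily sentiment scores (illustrative only).
prices    <- c(100, 104, 101, 101, 95, 99)
sentiment <- c(NA, 0.4, -0.2, 0.1, -0.5, 0.3)

# Daily change as percent difference from the previous day
# (an assumption; the report does not state its formula).
daily_change <- c(NA, diff(prices) / head(prices, -1) * 100)

# Drop the first day, which has no previous price to compare against.
days <- data.frame(daily_change, sentiment)[-1, ]

# Count how often price drops coincide with negative sentiment.
table(
  price     = ifelse(days$daily_change < 0, "drop", "no drop"),
  sentiment = ifelse(days$sentiment < 0, "negative", "positive")
)
```

A table like this shows both the drops that coincide with negative sentiment and the drops that occur with positive sentiment, which is the caveat raised in the text above.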

Figure 50: Bitcoin Daily Change vs. Sentiment

C.2 Smooth Error
Additionally, we compared the smooth error rate of our Bitcoin predictions with sentiment. Similar to previous analyses, Figure 51 below shows that the negative sentiment is strongest when the lines start to decline.


Figure 51: Bitcoin Smooth Error vs. Sentiment

D. Code Used
This section houses examples of the coding used throughout the report.

D.1 GLM and Sub-Setting Variables Code
This code is an example of what was used to load the data into RStudio, then complete GLMs and regression sub-setting for variables.

install.packages("leaps")
install.packages("MASS")
require(leaps)
require(MASS)
AllDataRUpdated