Getting the data and selecting a coding language
It’s not the most exciting part, but you need the data before you can analyze it :) The cheapest (and sometimes fastest) way is to get it yourself. Exchanges have APIs, and popular languages like Python often have open source packages on GitHub that let you download what you need.
Even though you can easily download some of the important code snippets, there is always a part where you need to get your hands dirty. For example, I wanted the data aggregated to seconds, while what I was downloading came at millisecond resolution. I doubt I will ever use a resolution finer than seconds, and the storage needed could exceed what is available to me.
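To make the aggregation step concrete, here is a minimal sketch of how trades with millisecond timestamps could be rolled up into one-second bars. It is not the exact code I used: the function name `aggregate_to_seconds` and the `(timestamp_ms, price, quantity)` tuple layout are assumptions for illustration, and it expects the trades to be sorted by time already.

```python
def aggregate_to_seconds(trades):
    """Roll up trades into one-second bars.

    `trades` is an iterable of (timestamp_ms, price, quantity) tuples,
    assumed to be sorted by time (hypothetical layout for this sketch).
    Returns a dict keyed by the Unix second.
    """
    bars = {}
    for ts_ms, price, qty in trades:
        second = ts_ms // 1000  # drop the millisecond part
        bar = bars.get(second)
        if bar is None:
            # First trade seen in this second opens the bar.
            bars[second] = {
                "open": price, "high": price, "low": price,
                "close": price, "volume": qty, "trades": 1,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last trade seen so far in this second
            bar["volume"] += qty
            bar["trades"] += 1
    return bars


sample = [
    (1000, 10.0, 0.5),
    (1500, 12.0, 0.25),
    (1999, 11.0, 0.25),
    (2000, 9.0, 1.0),
]
bars = aggregate_to_seconds(sample)
```

Keeping open, high, low, close, volume and trade count per second loses the individual trades but shrinks the data a lot, which is the whole point when storage is limited.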
For the initial analysis I decided to get 1 year of BTC-USDT trades from Binance. To avoid abusing Binance and getting my IP blocked, I set up some reasonable pauses in the data download. So overall it took over 24h to get a 240 MB tgz file with over 1 year of trades, 16 mln rows in total.
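The idea of pausing between requests can be sketched roughly like this. It is only an illustration under assumptions: `fetch_window` is a hypothetical placeholder for whatever function actually calls the exchange API, and the window size and pause length are made-up defaults, not values recommended by Binance.

```python
import time


def download_with_pauses(fetch_window, start_ms, end_ms,
                         window_ms=3_600_000, pause_s=1.0,
                         sleep=time.sleep):
    """Download trades window by window, sleeping between requests.

    `fetch_window(from_ms, to_ms)` is a hypothetical callable that returns
    a list of trades for the given time range. `sleep` can be swapped out
    so the function is testable without actually waiting.
    """
    trades = []
    cursor = start_ms
    while cursor < end_ms:
        window_end = min(cursor + window_ms, end_ms)
        trades.extend(fetch_window(cursor, window_end))
        cursor = window_end
        if cursor < end_ms:
            sleep(pause_s)  # be polite to the exchange between requests
    return trades


# Dry run with a fake fetcher and a fake sleep, no network involved.
calls, pauses = [], []


def fake_fetch(from_ms, to_ms):
    calls.append((from_ms, to_ms))
    return [from_ms]  # pretend there is one trade per window


result = download_with_pauses(fake_fetch, 0, 2500, window_ms=1000,
                              pause_s=0.5, sleep=pauses.append)
```

With one-hour windows and even a short pause, a year of data adds up to thousands of requests, so it is no surprise that a polite download takes many hours.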
In parallel I’ve also been collecting data from Bittrex since Nov 2017. It is more detailed and also includes order books, however it requires constantly repeating the download, and sometimes there are gaps (due to the exchange API, or me not noticing there was an error...). I may use that data later if ordinary trades turn out not to be sufficient to forecast anything.
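A simple way to spot such gaps afterwards is to look for unusually long stretches without any record. This is a minimal sketch, assuming each record carries a millisecond timestamp; the function name `find_gaps` and the one-minute threshold are my own choices for illustration.

```python
def find_gaps(timestamps_ms, max_gap_ms=60_000):
    """Return (last_seen_ms, next_seen_ms) pairs where consecutive
    records are more than `max_gap_ms` apart."""
    gaps = []
    previous = None
    for ts in sorted(timestamps_ms):
        if previous is not None and ts - previous > max_gap_ms:
            gaps.append((previous, ts))
        previous = ts
    return gaps


gaps = find_gaps([0, 10_000, 200_000, 210_000])
```

A quiet market can also produce long pauses between trades, so the threshold has to be picked per market and each reported gap still needs a second look.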
The next chapter is going to be about the initial analysis of the data and why I find it a necessary step.