The Necessity of Coding for Efficiency

During my Module 3 project, I was constantly running different machine learning algorithms, looping over parameter values to find the best ones, and fine-tuning the results. I quickly realized why it is so important to write efficient code for the sake of time. When I first started coding, this did not occur to me, since I was not running anything more complex than addition. In the beginning, how I coded was not that important. If I wrote an extravagant nested loop, it never took more than a second to run, because I was only dealing with small amounts of data and had not yet begun to do statistics with Python. Learning how to use libraries like NumPy efficiently did not make much sense then, because I could code any way I wanted and the difference in run time was milliseconds.

Fast forward to this project, and things began to change. As the math got more complex, I started to notice that cells of code would take a couple of seconds to run. They got progressively slower as I pushed them harder. When using a data set with thousands of data points, you do not want to loop through it several times. I learned this one the hard way. I wanted to run a for loop for K-Nearest Neighbors to find the best value of K. Once I wrote the code and ran it, the cell took 52 minutes just to come back and give me k=5, the default parameter. A similar thing happened with decision trees. When no parameters are specified, the tree takes 15 minutes to run and prints a picture so large you can’t even read it. Learning to write code concisely has helped me cut down on run times. For the decision tree, setting a maximum depth not only improved the model but also cut the run time from minutes to seconds. At its worst, I tried creating and running a grid search on XGBoost. I had at least 3 values set for each of 4 different parameters, meaning I was trying to run over 200 different models. Needless to say, that code ran for 6 hours and never finished. I had to pull the plug on it. It’s then that you realize that maybe you should optimize how you are doing your work.
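A quick way to avoid a surprise like the grid search above is to count how many models a grid will fit before running it. The sketch below is only an illustration: the parameter names are real XGBoost settings, but the values, the `count_fits` helper, and the number of cross-validation folds are made up for the example and are not the actual grid from the project.

```python
from itertools import product
from math import prod

# Hypothetical grid: 3 candidate values for each of 4 parameters.
# These values are assumptions for illustration only.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300, 500],
    "subsample": [0.6, 0.8, 1.0],
}


def count_fits(grid, cv_folds=1):
    """Return how many model fits an exhaustive grid search would need."""
    n_combinations = prod(len(values) for values in grid.values())
    return n_combinations * cv_folds


# Every combination the search would try, one tuple per model.
combinations = list(product(*param_grid.values()))

n_combos = count_fits(param_grid)                # combinations only
n_fits_cv3 = count_fits(param_grid, cv_folds=3)  # assuming 3-fold cross-validation

print(f"{n_combos} combinations, {n_fits_cv3} fits with 3-fold CV")
```

The count grows multiplicatively, so adding one more value to a single parameter, or one more fold, scales the whole run. Timing a single fit and multiplying by this count gives a rough estimate of whether the search will finish in minutes or hours.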

From all this, it’s easy to see that how complex math models are set up matters. It could mean saving anywhere from minutes to hours. Now, I understand this is not always the case: as statistical functions become increasingly intricate, they will always be computationally taxing, but even then there are methods to help things run more smoothly. I find that efficient code also takes less writing and is more aesthetically pleasing. Things as simple as one-line for loops and lambda functions can make code more compact and, in some cases, faster. The efficiency of an individual’s code is what I find to be a major distinguishing factor between novice and experienced coders.
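As a small illustration of the same calculation written three ways, the sketch below sums the squares of the numbers 1 to 1000 with an explicit loop, a one-line generator expression, and a vectorized NumPy call. The data and variable names are invented for the example, and it assumes NumPy is installed. All three give the same answer; how much time each one saves depends on the size of the data, so it is worth measuring rather than assuming.

```python
import numpy as np

# Made-up data for illustration: the integers 1 through 1000.
values = list(range(1, 1001))

# Explicit loop: square each value and add it to a running total.
total_loop = 0
for v in values:
    total_loop += v * v

# One-line version using a generator expression.
total_one_line = sum(v * v for v in values)

# Vectorized version: NumPy squares the whole array at once.
arr = np.array(values, dtype=np.int64)
total_numpy = int(np.sum(arr * arr))

print(total_loop, total_one_line, total_numpy)
```

To compare run times on your own data, the standard library's `timeit` module can time each version.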