Python vs R

Blake Tolman
3 min read · Apr 2, 2021

When working with data, there are several ways to handle it and perform the necessary analysis. The two most common programming languages for this are Python and R, but what is the difference? In data science, both can be used for statistical analysis and for creating data visualizations; the differences show up in specific functionality.

Python:

Python can do pretty much the same tasks as R: data wrangling, feature engineering, feature selection, web scraping, app development, and so on. Python is a tool for deploying and implementing machine learning at scale. Python code is often considered easier to maintain and more robust than R code. Years ago, Python did not have many data analysis and machine learning libraries. It has since caught up and now provides cutting-edge APIs for machine learning and artificial intelligence. Most data science work can be done with five Python libraries: NumPy, pandas, SciPy, scikit-learn, and Seaborn.

R:

Academics and statisticians have developed R over the past two decades. R now has one of the richest ecosystems for data analysis. There are more than 12,000 packages available on CRAN (the Comprehensive R Archive Network, R’s open-source package repository). You can find a library for almost any analysis you want to perform. This rich variety of libraries makes R the first choice for statistical analysis, especially for specialized analytical work.

Data Analysis vs Machine Learning:

One major difference between Python and R is that the former is far more versatile than the latter. Python is a full-fledged, general-purpose programming language, which means you can collect, store, analyze, and visualize data, and also create Machine Learning pipelines and deploy them into production or on websites, all using just Python. R, on the other hand, is built primarily for statistics and data analysis, and many users find its graphs nicer and more customizable than those in Python. R’s ggplot2 library follows the Grammar of Graphics approach to visualizing data, which provides a great deal of intuitive customizability that Python’s default plotting libraries do not offer as directly. It is perhaps a little oversimplified, but it may be fair to say that if you want to be a Data Analyst, R should be your preferred choice, while if you want to be a Data Scientist, Python is the better option. It’s the dilemma of generalization vs. specialization.
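To make the "collect, store, analyze" claim concrete, here is a minimal sketch that uses only Python's standard library. The data, the `readings` table, and the function names `collect`, `store`, and `analyze` are all made up for illustration; a real project would more likely reach for pandas and a proper database.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical raw data standing in for a downloaded CSV file.
RAW_CSV = """city,temperature_c
Oslo,4.0
Lima,19.5
Cairo,28.0
Oslo,6.0
Lima,20.5
"""


def collect(raw_text):
    """Parse CSV text into a list of (city, temperature) tuples."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(row["city"], float(row["temperature_c"])) for row in reader]


def store(rows):
    """Save the rows in an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (city TEXT, temperature_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


def analyze(conn):
    """Return the mean temperature per city, computed from the stored rows."""
    by_city = {}
    for city, temp in conn.execute("SELECT city, temperature_c FROM readings"):
        by_city.setdefault(city, []).append(temp)
    return {city: statistics.mean(temps) for city, temps in by_city.items()}


rows = collect(RAW_CSV)
conn = store(rows)
means = analyze(conn)
for city, mean_temp in sorted(means.items()):
    print(f"{city}: {mean_temp:.1f}")
conn.close()
```

The point is not the analysis itself, which is trivial, but that parsing, storage, and statistics all live in one language without any extra tooling.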

Conclusion:

Even though they seem to offer different things, both languages have advantages and disadvantages that need careful consideration.

  • If you want to get into programming in general and are looking for something that can also be used in other areas of software development, such as web development, then Python, being a general-purpose programming language, seems to be the better choice.
  • If you are familiar with other scientific programming languages like MATLAB, it might be easier for you to learn R and become productive with it. The languages have many similarities, especially in vector operations and in the general mindset of working with matrix operations rather than procedural loops.
  • If you need to do ad-hoc analyses and occasionally share them with other data scientists or technical people, it might be good to use Python along with Jupyter Notebooks. If you are looking for ways to build quick dashboards for non-technical stakeholders and internal usage, it might be a good idea to use R with the amazing Shiny library.
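The "vector operations rather than procedural methods" mindset from the MATLAB and R point above is also available in Python through NumPy. The sketch below contrasts the two styles; it assumes NumPy is installed, and the `prices` and `quantities` values are invented for illustration.

```python
import numpy as np

# Made-up example data: unit prices and quantities for three products.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([3, 1, 2])

# Procedural style: loop over the elements one pair at a time.
totals_loop = []
for price, quantity in zip(prices, quantities):
    totals_loop.append(float(price * quantity))

# Vectorized style: one expression operates on the whole arrays,
# much like element-wise arithmetic on vectors in R or MATLAB.
totals_vec = prices * quantities

# Matrix-style operation: the dot product gives the grand total directly.
grand_total = float(prices @ quantities)

print(totals_loop)
print(totals_vec.tolist())
print(grand_total)
```

Both styles produce the same per-product totals; the vectorized form is usually shorter and, on large arrays, considerably faster.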
