Python vs R vs SAS - Which is the best Tool in Big Data?
Python, SAS and R are the three main languages used in Big Data. It’s this small universe of languages which lends itself to the classic question that haunts nearly all software languages and frameworks - which is the best? The answer tends to be, well... it depends. So, not wanting to pre-empt any use cases, we thought we’d present a simple comparison between all three.
SAS was developed at North Carolina State University between 1966 and 1976 (work began over fifty years ago!) and was initially used to analyse agricultural data and improve crop yields. What might seem a weird quirk of history has meant that SAS, unlike its open source counterparts Python and R, which both first appeared in the early 1990s, is a proprietary piece of software. This has cost implications, which we’ll get on to later, but it also has significant developmental implications.
SAS has been developed with enterprise use in mind. Prior to the 90s it was one of only a handful of statistical software packages available, and as such has had major take-up in more established industries over the years. It can be used via a simple GUI and doesn’t require prior programming knowledge. This probably makes it the quickest to learn (if you can afford the licence fee!).
R, on the other hand, is probably the least intuitive of the three. Although it is a high-level language, it has a steep learning curve and requires a strong working knowledge of code, and even relatively simple statistical procedures can take more lines than you might expect. It is generally the preserve of research departments and statisticians. Python sits squarely in the middle from an ease of use point of view. It’s without a doubt the most widely used language of the three, but this is largely because of its use in software engineering and web development. As a high-level language it’s easy to pick up and use, but it falls short of the ease of use of the SAS GUI.
It’s worth mentioning here that one of the real strengths of Python and R is that they are open source languages. There are some limitations, which we’ll touch on, but the upshot of being open source - for Python in particular - is that there seems to be a package for nearly everything, and this ecosystem of packages continues to develop at an incredible rate as data science matures.
On the data visualisation front, for example, Python has matplotlib, seaborn and vispy, which are powerful tools for graphical analysis. R has ggplot2, lattice and ggvis, amongst others. We’d argue that R provides the best out-of-the-box data visualisation capability, and a combo we’ve often come across is Python used for data manipulation and R used for the data vis.
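To make that hand-off concrete, here is a minimal sketch of the Python side of the combo, assuming the two languages exchange data as a CSV file. The sample figures and the function names summarise and to_csv are our own inventions for illustration, and in practice a library such as pandas would usually do this work.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw records, invented purely for illustration.
RAW = """region,sales
North,120
South,95
North,80
South,105
East,60
"""


def summarise(raw_csv: str) -> list[dict]:
    """Return total and mean sales per region, sorted by region name."""
    by_region = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv)):
        by_region[row["region"]].append(float(row["sales"]))
    return [
        {"region": region, "total": sum(vals), "mean": sum(vals) / len(vals)}
        for region, vals in sorted(by_region.items())
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialise the summary as CSV text, ready to be written to a file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["region", "total", "mean"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


summary = summarise(RAW)
summary_csv = to_csv(summary)
print(summary_csv)
```

On the R side, the saved file could then be loaded with read.csv and handed to ggplot2 for plotting.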
Open-source ecosystems improve the breadth and capability of these languages, which often threatens to leave SAS a number of releases behind. Recently, in a bid to catch up with the AI and ML toolsets offered by Python and R, SAS pledged to invest $1 billion toward AI over the next three years. TensorFlow and Keras are now hugely popular Python libraries which provide deep learning capabilities, and the kerasR package has been released to give R users an interface to the original Python package. It really isn’t an overstatement to say that Python has become a phenomenon in the field of data analytics.
As we mentioned though, there are certain upsides to the enterprise offering of SAS. Whilst there might be developmental lags, SAS makes up for them with the all-round quality of its product. SAS practitioners have access to rigorous documentation and can count on 24/7 support for customer queries. There is also consistency across the toolset, whereas Python and R often have multiple competing packages or libraries, which can produce different statistical results from the same input, for instance because their default settings differ.
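A small illustration of how the same input can yield different answers: standard deviation can be computed by dividing by n (population) or by n - 1 (sample), and libraries disagree on which is the default. NumPy's std uses the population form unless told otherwise, while pandas' std and R's sd use the sample form. The sketch below uses only Python's standard library to show the gap, and the data values are made up for the example.

```python
import statistics

# Made-up sample data for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(data)

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(data)

print(f"population: {population_sd:.4f}")
print(f"sample:     {sample_sd:.4f}")
print(f"difference: {sample_sd - population_sd:.4f}")
```

Neither figure is wrong, but an analysis that mixes tools without checking defaults like this one can end up with numbers that don't reconcile, which is exactly the auditability concern.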
This consistency and support are incredibly important for larger corporates, in the insurance or finance industries, for example, who rely on predictable and unified analysis in order to make fully auditable decisions. Users of R and Python, on the other hand, tend to have to rely on Google or active communities to field queries!
In the UK and Europe at least, there is significant demand for data scientists who are conversant in these technologies. Data driven decision making has gained traction beyond traditional corporates and tech giants, and has now been taken up by start-ups and less traditionally data rich companies. Whilst SAS is predominant at the more established corporates, partly owing to vendor lock-in but also to the strength of its offering, leaner organisations have opted for Python - a cheaper and just as powerful alternative - which has synergies with software and web development more generally.