A Glimpse into The Daily Life of a Data Scientist

A couple of weeks ago, I had a discussion with a co-worker about a project I was involved in, and I felt that there was no clear understanding of the daily challenges data scientists face. A few days later, I was at Rstudio::Conf 2017, where I met lots of data scientists from academia and industry. Later on, I described one of the conference’s positive side effects as “group therapy”, where one could see how others face the same challenges and struggle with similar issues. [Read More]

Yet Another Post on Logistic Regression

Every day, statisticians, analysts and data enthusiasts perform data analysis for different purposes. But when it comes to presenting analyses to a wider audience, good work is not the complex kind with big words. It is the kind that highlights interesting relations, answers business questions or predicts outcomes, and explains all that in the simplest way through data visualization or simple concepts. So if one throws numbers, model coefficients and complex graphs at the audience to impress them, it might backfire if the audience is not familiar with a certain concept. [Read More]

The Power of (purrr, tidy, broom)-Exploring Climate Change Trends

A few days ago, I wanted to explore the Climate Change: Earth Surface Temperature Data dataset published on Kaggle and originally compiled by Berkeley Earth. The dataset is relatively large, as it contains entries from 1750 to 2014! This was shortly after watching Hadley Wickham’s talk about managing many models with R. So I thought about using the power of purrr, tidyr and broom to handle the climate change dataset, and I decided to focus on the change in the average temperature in the 100 pre-selected major cities in the dataset. [Read More]

Lessons Learnt About Data Viz - Why a Boxplot Is Sometimes The Worst Choice?

Data visualization is a means of visual communication that should help people understand the significance of data easily and see interesting trends, patterns, distributions, etc. If your audience fails to grasp the message that the graph was intended to convey, they are not to be blamed. You are! Or, to be precise, your choice of graphical representation is. I knew all that, and I used to spend time thinking about the best chart to convey a certain message or to highlight an interesting behavior. [Read More]

R googleVis Line Motion Charts with Modified Options

Using googleVis in R provides lots of options for creating nice Google visualizations. I was trying to create some charts while exploring the Annual Nominal Fish Catches Data on Kaggle. I wanted to create a line motion chart and exclude the default bubble chart, so I played with the options to get the desired result. The following is a quick explanation of how to do that. Fish Catches Dataset: the dataset provides the annual TLW (tonnes live weight) catches of fish and shellfish in the Northeast Atlantic region. [Read More]

Leverage and Influence in a Nutshell

In regression models, we frequently face situations where we need to look at outliers and influential observations. A common practice is to perform diagnostic checks to dig deeper and see how different points affect the fitted model or its coefficients. In this post, we will focus on two concepts (leverage and influence), but we will not dig deep into the math behind them. We will try to visualize them and grasp the intuition behind them first. [Read More]

A Shout-Out to R-bloggers

Since I started working with R, I have been a frequent visitor to the R-bloggers website, where I find a variety of helpful tips and tutorials. Now that I have started my own blog, it is time to give a shout-out to them!