A weekly social data project in R
Over the past month or so, the r4ds online learning community founded by Jesse Maegan has been developing projects intended to help connect mentors and learners. One of the first projects born out of this collaboration is #TidyTuesday, a weekly social data project focused on using tidyverse packages to clean, wrangle, tidy, and plot a new dataset every Tuesday.
If you are interested in joining the r4ds online learning community, check out Jesse Maegan’s post here!
Every Monday we will release a new dataset on our GitHub that has been tamed but does not always adhere to “tidy” data principles. This dataset will come from an article with an interesting plot. Our goal is to have you take a look at the raw data and generate either a copy of the original plot or a novel take on the data! You can use whatever techniques you feel are appropriate, but the data will be organized in a way that works well with tidyverse tools!
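To give a feel for what "tamed but not tidy" can look like, here is a minimal sketch of one possible workflow. Everything in it is made up for illustration: the `tuition_wide` data frame, its column names, and its values are assumptions, not an actual #TidyTuesday dataset. It assumes tidyr 1.0.0 or later for `pivot_longer()`; on older versions `gather()` plays a similar role.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical "tamed but not tidy" data: one row per state,
# with each year stored in its own column (values are invented).
tuition_wide <- tibble(
  state  = c("A", "B"),
  `2014` = c(100, 200),
  `2015` = c(110, 190)
)

# Tidy it: one row per state-year combination.
tuition_long <- tuition_wide %>%
  pivot_longer(cols = -state, names_to = "year", values_to = "cost") %>%
  mutate(year = as.integer(year))

# Summarise: average cost across states for each year.
yearly_mean <- tuition_long %>%
  group_by(year) %>%
  summarise(mean_cost = mean(cost))

# Build a simple line plot, one line per state.
p <- ggplot(tuition_long, aes(x = year, y = cost, colour = state)) +
  geom_line()
```

Printing `p` in an interactive session draws the plot. From there you could recreate the original article's chart or try your own take on the data.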
The tidyverse is an “opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.” The tidyverse is at the core of the R for Data Science text written by Garrett Grolemund and Hadley Wickham. This book aims to be beginner-friendly but also deep enough to empower R experts. The framework of both the book and the tidyverse package is seen above.
We focus on the tidyverse package as the r4ds online learning community was founded “with the goal of creating a supportive and responsive online space for learners and mentors to gather and work through the R for Data Science text”. Beyond that, the tidyverse is consistent, powerful, and typically more beginner friendly. It is a good framework to get started with, and is complementary to base R (or the thousands of other R packages).
To participate in TidyTuesday, you need to do a few things:
#TidyTuesday hashtag (you can also tag me @thomas_mock)
However, that might seem like a lot! So at minimum, please submit your plot with the #TidyTuesday hashtag.
All data will be posted on the data sets page on Monday. It will include the link to the original article (for context) and to the data set.
If you want to work on GitHub (a useful data science skill) feel free to post your code on GitHub! This will allow others to see and use your code, whereas an image of the code means they would have to re-type everything! Additionally, hosting on GitHub gives you a Data Science Portfolio to talk about/show in interviews, and allows you to access your code across different computers easily!
You can also upload your code into Carbon, a website that generates a high-quality image of your code.
Lastly, if you create your plot with the tidyverse you can save high quality
We welcome all newcomers, enthusiasts, and experts to participate, but be mindful of a few things:
Everyone did such a great job! I’m posting all the ones that I can find through the hashtag. You can always tag me in your post to make sure you get noticed in the future.
"If you have an apple and I have an apple and we exchange these apples then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas." - George Bernard Shaw
#TidyTuesday - spreading ideas!
Thomas Mock (@thomas_mock), April 11, 2018
I plotted the costs for the last 5 years in the data pic.twitter.com/jNQwHI1mqu
Umair Durrani (@umairdurrani87), April 2, 2018
@srini_meen) April 3, 2018
#TidyTuesday Prices always go up, but if you compare it to the annual average then interesting things happen. Something happened in Arizona, Ohio, Hawai.
code: https://t.co/xJ5kD185Os pic.twitter.com/i4BMDorq3c
Son M (@SonGeo), April 3, 2018
Just having a little bit of R fun this Tuesday. Found this #TidyTuesday and thought I could give my contribution. I gather() and summarise() all the Year variables though…makes a different result.
Thanks @thomas_mock for this good idea. #rstats pic.twitter.com/q54fI9LZRl
Brenborbs (@brenborbon), April 3, 2018