Crossing 10,000 - Tidy simulation

Tidy simulation with dice rolls

tidyverse
purrr
simulation
probability
Author

Thomas Mock

Published

April 3, 2020

10,000 - a game of chance

Been following Dave Robinson’s Tidy Simulation Series in #rstatsand decided to try and give it a go.

We play a dice game called “10,000” in my wife’s family. The family of games also includes Ten Grand, Farkle, Zilch, and others according to Wikipedia - with the common thread being that you have 5-6 dice, roll them, and add up scoring dice until you stop with a set score or “zippy”, which is a roll with zero scoring die.

Basic gist: - Roll 5 dice - If you “score” with at least one die, you can stay or keep rolling - If you roll zero scoring die, you lose your turn

A basic scoring table can be seen below.

Warning: `columns = TRUE` has been deprecated in gt 0.3.0:
* please use `columns = everything()` instead
Scoring Table for '10,000'
Roll Score

Each 1

100

Each 5

50

3 x 1

1,000

4 x 1

5,000

5 x 1

10,000

3 x n

n * 100

4 x n

n * 200

5 x n

n * 300

Short Straight (1:5, 2:6)

1000

You add all the scores up so rolling 5 die for example:
You could get 3 x 1 (1000), a 4 (0), and a 5 (50
This gets you 1050, with 1 die in hand (1x 4 doesn't score)

We played 10,000 on New Year’s Day, and we had a disagreement about if you have 2/5 scoring die whether to roll 3 or 4.

I’ve literally never done probability math EVER… but I felt that it was more effective to roll 4 than 3.

So I attempted to simulate rolls in #rstats!

Simulating 10,000 rolls

So we need to create a dataset of possible rolls, and how common these are when randomly created. We’ll set a seed so that we can reproduce the code, and for puns sake we’ll roll the dice 10,000 times to match our game’s title.

Importantly, we also will be breaking this down into 5x 10,000 samples which is: - Rolling 1 dice all the way to rolling 5 dice (max possible).

Generating this is fairly straightforward using tidyr::crossing() and purrr::map() + sample().

Crossing

tidyr::crossing() takes multiple vectors and creates a tibble of all the possible combinations. Thus we can use it to generate the number of dice in hand along with our 10,000 simulations per number of dice in hand. Our toy example below shows the expected behavior, and then we’ll assign the rolls to a stored dataframe.

# Toy example
# 1 to 5 dice in hand x 5 trials/per dice group = 25 observations
crossing(
  dice_in_hand = seq(1, 5, 1),
  trial = 1:5
)
# A tibble: 25 × 2
   dice_in_hand trial
          <dbl> <int>
 1            1     1
 2            1     2
 3            1     3
 4            1     4
 5            1     5
 6            2     1
 7            2     2
 8            2     3
 9            2     4
10            2     5
# … with 15 more rows

For, the actual dataframe we’ll do an additional step, which is creating a vector of rolls with random sampling based on how many dice are in hand. For our 1:5 dice in hand + 10,000 samples per dice in hand we get 50,000 simulations.

set.seed(37)
n_rolls <- 10000

# Generate a sample of rolls (10000 sims)
dice_rolls <- crossing(dice_in_hand = seq(1, 5, 1),
                       trial = 1:n_rolls) %>%
  mutate(roll = map(dice_in_hand, ~ sample(1:6, ., replace = TRUE)))

dice_rolls
# A tibble: 50,000 × 3
   dice_in_hand trial roll     
          <dbl> <int> <list>   
 1            1     1 <int [1]>
 2            1     2 <int [1]>
 3            1     3 <int [1]>
 4            1     4 <int [1]>
 5            1     5 <int [1]>
 6            1     6 <int [1]>
 7            1     7 <int [1]>
 8            1     8 <int [1]>
 9            1     9 <int [1]>
10            1    10 <int [1]>
# … with 49,990 more rows

Check rolls

We can take a look at a few of these simulations. In this case, I’ll grab a random trial and see what we got for each of the dice in hand (1, 2, 3, 4 or 5).

dice_rolls %>% 
  filter(trial == 3737) %>% 
  pull(roll)
[[1]]
[1] 3

[[2]]
[1] 2 1

[[3]]
[1] 3 3 3

[[4]]
[1] 6 5 2 3

[[5]]
[1] 4 1 6 5 3

Now that we have a sense of our 10,000 simulations and our possible outcomes we need to generate some logic for scoring! We could simply count observations, but that won’t tell us the whole story.

Counting & Scoring functions

I tried this several ways but ended up sticking with vectors. The goal was to have programmatic logic to convert raw dice rolls like 1, 1, 3, 4, 1 to 1,000 (3x 1 = 1,000, 3 and 4 do not score).

Step 1: Convert raw rolls to counts

We need to take the raw rolls (1, 1, 3, 4, 1) to counts of each number. I created a counting function which now tells us how many of each possible dice side (1, 2, 3, 4, 5, 6) there are.

counting_function <- function(dice_in){
  x1 <- sum(dice_in == 1)
  x2 <- sum(dice_in == 2)
  x3 <- sum(dice_in == 3)
  x4 <- sum(dice_in == 4)
  x5 <- sum(dice_in == 5)
  x6 <- sum(dice_in == 6)
  
  roll_vec <- c(x1, x2, x3, x4, x5, x6)
  roll_vec
}

A test example can be seen below. The output can be read as One 1, One 2, Three 3, Zero 4, Zero 5, and Zero 6. We can then feed this into our downstream functions to get scoring.

counting_function(c(1, 2, 3, 3, 3))
[1] 1 1 3 0 0 0

Step 2: Scoring Function

There are actually many steps to this, but the bulk of it is just logical matching potential outcomes to observations. Each type of scoring (number of ones, number of fives, short straights, triples, quadruplpes, and quintuples) has it’s own check, and then I add the scoring at the end based on the various combos/inputs. Using the same input as our previous example, we should get a total score of 400, which is One 1 at 100 points, and Three 3s at 100 * face value (100 x 3).

counting_function(c(1, 2, 3, 3, 3)) %>% 
  rigid_scoring_function()
[1] 400

Let’s do a few more checks for sanity. A short straight (1:5) should get you 1000, and then let’s try for a zero score with no scoring die.

# should be 1000
counting_function(c(5, 1, 3, 4, 2)) %>%
  rigid_scoring_function()
[1] 1000
# zero
counting_function(c(2, 2, 4, 4, 6)) %>%
  rigid_scoring_function()
[1] 0

The actual logic can be seen below.

# TOO much logic to get actual scores
rigid_scoring_function <- function(dice_in) {
  dice_side <- c(1:6)

  n_of_ones <- sum(dice_in[1])

  n_of_fives <- sum(dice_in[5])

  straight_one <- c(1, 1, 1, 1, 1, 0)
  straight_two <- c(0, 1, 1, 1, 1, 1)

  straight <- any(all(dice_in == straight_two), all(dice_in == straight_one))

  score_fives <- if_else(
    n_of_fives <= 2 & straight == FALSE,
    (n_of_fives * 50),
    0
  )

  score_ones <- case_when(
    n_of_ones <= 2 & straight == FALSE ~ (n_of_ones * 100),
    n_of_ones == 3 ~ 1000,
    n_of_ones == 4 ~ 5000,
    n_of_ones == 5 ~ 10000,
    TRUE ~ 0
  )

  three_or_more <- sum(dice_in >= 3)

  three_or_more_subset <- dice_in >= 3

  multiply_dups <- case_when(
    sum(dice_in == 3) == 1 & n_of_ones < 3 ~ 1L,
    sum(dice_in == 4) == 1 & n_of_ones < 3 ~ 2L,
    sum(dice_in == 5) == 1 & n_of_ones < 3 ~ 3L,
    TRUE ~ 0L
  )

  multi_score <- ifelse(
    three_or_more == TRUE,
    dice_side[three_or_more_subset],
    0
  ) * 100 * multiply_dups


  straight_score <- if_else(straight == TRUE, 1000, 0)

  three_or_more <- any(dice_in >= 3)

  total_score <- sum(score_ones + multi_score + straight_score + score_fives, na.rm = TRUE)
  total_score
}

Get scoring from the sample

Generating the scores is actually extremely simple now, the hard part was defining the logic above!

We use purrr::map() to generate a vector of the rolls using our counting_function(), and then get a score using purrr::map_dbl() to call our rigid_scoring_function() on each vector input of the die rolls! This will take about 15 seconds for calculating scores across the 50,000 total observations.

# Generate the actual scoring from existing sample
dice_df <- dice_rolls %>%
  mutate(
    dice_vector = map(roll, counting_function),
    rigid_scoring = map_dbl(dice_vector, rigid_scoring_function)
  )

# sanity check the scoring to get a few different scores
dice_df %>%
  select(dice_in_hand, rigid_scoring) %>%
  sample_n(10)
# A tibble: 10 × 2
   dice_in_hand rigid_scoring
          <dbl>         <dbl>
 1            2             0
 2            4            50
 3            4           200
 4            5           200
 5            5          1000
 6            5           150
 7            4             0
 8            5           250
 9            3            50
10            4           200

Plotting

Now that we have the outcome scoring, we can see how the probabilities stack up! Our thesis was that rolling 4x die was more likely to get you both a better score AND less likely to zippy out or get zero. We can see that rolling 4x die vs 3x die is about half as likely to get zero (15% vs 30%) and 2-3x more likely to get good scores (300 to 1,000).

Success! If you roll 5 die, get 2 scoring die, it makes sense to only keep 1 of those scoring dice and and roll the other 4!

Please note that I chose a specific set of scores (0, 50, 100, 150, 200, 300, 400, 500, 600, 1000) - as they are possible to get with either 3 or 4 die. There are scores in between 600 - 1000 that can only be achieved with 4 - 5 die in hand so it didn’t make sense to plot them against 3 die.

# Quick plot
prob_plot <- dice_df %>%
  filter(dice_in_hand %in% c(3, 4)) %>%
  filter(rigid_scoring %in% c(0, 50, 100, 150, 200, 300, 400, 500, 600, 1000)) %>%
  count(rigid_scoring, dice_in_hand) %>%
  mutate(
    freq = n / n_rolls,
    dice_in_hand = factor(dice_in_hand)
  ) %>%
  ggplot(aes(x = rigid_scoring, y = freq, color = dice_in_hand)) +
  geom_path(aes(group = dice_in_hand)) +
  geom_point(size = 4) +
  scale_y_log10(labels = scales::percent, breaks = c(0.005, 0.01, 0.015, 0.03, 0.05, 0.075, 0.1, 0.15, 0.3)) +
  scale_color_manual(
    name = "Dice in Hand",
    values = c("#003399", "#ff2b4f", "#fcab27")
  ) +
  scale_x_continuous(breaks = c(0, 50, 100, 150, 200, 300, 400, 500, 600, 1000)) +
  theme_light() +
  theme(
    panel.grid.minor = element_blank(),
    legend.position = c(0.8, 0.8),
    plot.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold")
  ) +
  labs(
    x = "\nScore",
    y = "Log10 Percentage Observed\n",
    title = "The dice game '10,000' consists of rolling up to 5 dice",
    subtitle = "Rolling 4 dice is 2-3x more likely to generate 'good' scores than 3 dice",
    caption = "Plot: @thomas_mock"
  )

prob_plot


We can finally also show the total observations for all possible combinations for rolling 3 or 4 die.

Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
* please use `columns = c(...)` instead

Warning: `columns = vars(...)` has been deprecated in gt 0.3.0:
* please use `columns = c(...)` instead
All possible scores and frequencies for 3-4 die in hand
Score Dice in Hand Observations Percent Observed
0 3 2709 27.09%
0 4 1588 15.88%
50 3 2169 21.69%
50 4 1824 18.24%
100 3 2868 28.68%
100 4 2559 25.59%
150 3 1107 11.07%
150 4 1514 15.14%
200 3 780 7.80%
200 4 1200 12.00%
250 3 150 1.50%
250 4 390 3.90%
300 3 34 0.34%
300 4 175 1.75%
350 4 32 0.32%
400 3 57 0.57%
400 4 129 1.29%
450 4 35 0.35%
500 3 49 0.49%
500 4 155 1.55%
600 3 45 0.45%
600 4 119 1.19%
650 4 39 0.39%
700 4 40 0.40%
800 4 10 0.10%
1000 3 32 0.32%
1000 4 147 1.47%
1050 4 26 0.26%
1200 4 8 0.08%
5000 4 10 0.10%
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.0 (2022-04-22)
 os       macOS Monterey 12.2.1
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/Chicago
 date     2022-04-28
 pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
 quarto   0.9.294 @ /usr/local/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 dplyr       * 1.0.8      2022-02-08 [1] CRAN (R 4.2.0)
 forcats     * 0.5.1      2021-01-27 [1] CRAN (R 4.2.0)
 ggplot2     * 3.3.5      2021-06-25 [1] CRAN (R 4.2.0)
 gt          * 0.5.0.9000 2022-04-27 [1] Github (rstudio/gt@0d4c83d)
 purrr       * 0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
 readr       * 2.1.2      2022-01-30 [1] CRAN (R 4.2.0)
 sessioninfo * 1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
 stringr     * 1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
 tibble      * 3.1.6      2021-11-07 [1] CRAN (R 4.2.0)
 tictoc      * 1.0.1      2021-04-19 [1] CRAN (R 4.2.0)
 tidyr       * 1.2.0      2022-02-01 [1] CRAN (R 4.2.0)
 tidyverse   * 1.3.1      2021-04-15 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────