Solution to Project Euler 14 in the R language: Which starting number, under one million, produces the longest Collatz sequence?

Project Euler 14: Longest Collatz sequence

Peter Prevos | 2 March 2017
Last Updated | 9 October 2021
832 words | 4 minutes

Share this content

Euler Problem 14 looks at the Collatz Conjecture. These playful sequences, named after German mathematician Lothar Collatz (1910–1990), cause mathematicians a lot of headaches. What is most interesting about this problem is how a simple set of rules can create intricate complexity.

Project Euler 14 Definition

The following iterative sequence is defined for the set of positive integers:

$f(n) = \begin{cases} \frac{n}{2} &\text{if } n \equiv 0 \pmod{2}\\[4px] 3n+1 & \text{if } n\equiv 1 \pmod{2} .\end{cases}$

Using the rule above, and starting with 13, we generate the following sequence:

$$13 \rightarrow 40 \rightarrow 20 \rightarrow 10 \rightarrow 5 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1 $$

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts, the terms are allowed to go above one million.

Uncrackable? The Collatz Conjecture - Numberphile.

Proposed solutions

This problem is highly computationally intensive, and it highlights R's lack of speed. Generating one million Collatz sequences and finding the longest one requires a lot more than a minute of processing time allowed for in Project Euler.

Brute Force

The brute force solution tests every answer and takes a long time to run.

  ## Project Euler 14: Longest Collatz Sequence
  ## Brute Force
  collatz.chain <- function(n) {
      chain <- vector()
      i <- 1
      while (n != 1) {
          if (n%%2 == 0)
              n <- n / 2
          else
          n <- 3 * n + 1
          chain[i] <- n
          i <- i + 1
      }
      return(chain)
  }
  answer <- 0
  collatz.max <- 0
  for (n in 1:1E6) {
      collatz.length <- length(collatz.chain(n))
      if (collatz.length > collatz.max) {
          answer <- n
          collatz.max <- collatz.length
      }
  }
  print(answer)

Optimised solution

The optimised version of the code stores the length of all sequences in an array. When the code generates a sequence and lands on a number already analysed, then it adds that previous number to the current one and moves on to the next step. This approach requires more memory but saves a lot of computation time. A minor tweak to the code optimises the rule for uneven numbers. Tripling an uneven number and adding one will always result in an even number so we can skip one step. This solution is more than twice as fast as the first version.

  ## Optimised method
  collatz.length <- vector(length = 1E6)
  collatz.length[1] <- 0
  for (n in 2:1E6) {
      x <- n
      count <- 0 
      while (x != 1 & x >= n) {
          if (x %% 2 == 0) {
              x <- x / 2
              count <- count + 1
          }
          else {
              x <- (3 * x + 1) / 2
              count <- count + 2
          }
      }
      count <- count + collatz.length[x]
      collatz.length[n] <- count
  }
  answer <- which.max(collatz.length)
  print(answer)

Visualising Collatz Sequences

The Collatz sequence is an example of a simple mathematical rule that can create an unpredictable pattern. The number of steps required to reach one is listed in sequence A006577 of the Online Encyclopedia of Integer Sequences. The image below visualises the number of steps for the first 1000 positive numbers. The scatterplot shows some unusual patterns. Does this visualisation show that the Collatz Sequence does have a pattern after all?

Number of steps to terminate a Collatz sequence.

  ## Visualise
  library(ggplot2)
  collatz <- data.frame(n = 1:10000,
                        steps = collatz.length[1:10000])
  ggplot(collatz, aes(n, steps)) +
    geom_point(size = .5, col = "darkblue") +
    theme_minimal(base_size = 10) +
    labs(title = "Collatz Sequence",
         subtitle = "Number of steps to reach 1")
  ggsave("../../../../static/images/project-euler/problem-014a.png", width = 6, height = 4)

Collatz Chains

The Collatz sequences can also be visualised using networks. Each step between two numbers is an edge, and the numbers are the vertices. For example, the network for the Collatz sequence for number 10 is 5–16, 16–8, 8–4, 4–2, 2–1. When generating subsequent sequences, the network will start to overlap, and a tree of sequences appears. The tree below combines the Collatz sequences for the numbers 2 to 26. Number 27 has a very long sequence, making the tree much harder to read. The code to create this network uses the igraph package. The edge list is a data frame where each row indicates a connected set of numbers.

Collatz network 1-26. — Collatz network $(n \leq 26)$.

## Collatz Chains
library(igraph)
edgelist <- data.frame(a = 2, b = 1)
for (n in 3:26) {
   chain <- as.character(c(n, collatz.chain(n)))
   chain <- data.frame(a = chain[-length(chain)], b = chain[-1])
   edgelist <- rbind(edgelist, chain)
}
g <- graph.edgelist(as.matrix(edgelist))
g <- simplify(g)

png("../../../../static/images/project-euler/problem-014b.png", 
    width = 1024, height = 768)
par(mar = rep(0,4))
V(g)$color <- degree(g, mode = "out") == 0
plot(g,
     layout = layout.auto,
     vertex.color = V(g)$color,
     vertex.frame.color = NA,
     frame.color = "white",
     vertex.size = 6,
     vertex.label.cex = 1.5,
     vertex.label.color = "black",
     edge.arrow.size = 0.2,
     edge.color = "grey"
     )
 dev.off()