
Project Euler 14: Longest Collatz sequence

Peter Prevos |
832 words | 4 minutes
Share this content
Euler Problem 14 looks at the Collatz Conjecture. These playful sequences, named after German mathematician Lothar Collatz (1910–1990), cause mathematicians a lot of headaches. What is most interesting about this problem is how a simple set of rules can create intricate complexity.
Project Euler 14 Definition
The following iterative sequence is defined for the set of positive integers:
$f(n) = \begin{cases} \frac{n}{2} &\text{if } n \equiv 0 \pmod{2}\\[4px] 3n+1 & \text{if } n\equiv 1 \pmod{2} .\end{cases}$
Using the rule above, and starting with 13, we generate the following sequence:
$$13 \rightarrow 40 \rightarrow 20 \rightarrow 10 \rightarrow 5 \rightarrow 16 \rightarrow 8 \rightarrow 4 \rightarrow 2 \rightarrow 1 $$
It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.
Which starting number, under one million, produces the longest chain?
NOTE: Once the chain starts, the terms are allowed to go above one million.
Proposed solutions
This problem is highly computationally intensive, and it highlights R's lack of speed. Generating one million Collatz sequences and finding the longest one requires a lot more than a minute of processing time allowed for in Project Euler.
Brute Force
The brute force solution tests every answer and takes a long time to run.
## Project Euler 14: Longest Collatz Sequence
## Brute Force
collatz.chain <- function(n) {
chain <- vector()
i <- 1
while (n != 1) {
if (n%%2 == 0)
n <- n / 2
else
n <- 3 * n + 1
chain[i] <- n
i <- i + 1
}
return(chain)
}
answer <- 0
collatz.max <- 0
for (n in 1:1E6) {
collatz.length <- length(collatz.chain(n))
if (collatz.length > collatz.max) {
answer <- n
collatz.max <- collatz.length
}
}
print(answer)
Optimised solution
The optimised version of the code stores the length of all sequences in an array. When the code generates a sequence and lands on a number already analysed, then it adds that previous number to the current one and moves on to the next step. This approach requires more memory but saves a lot of computation time. A minor tweak to the code optimises the rule for uneven numbers. Tripling an uneven number and adding one will always result in an even number so we can skip one step. This solution is more than twice as fast as the first version.
## Optimised method
collatz.length <- vector(length = 1E6)
collatz.length[1] <- 0
for (n in 2:1E6) {
x <- n
count <- 0
while (x != 1 & x >= n) {
if (x %% 2 == 0) {
x <- x / 2
count <- count + 1
}
else {
x <- (3 * x + 1) / 2
count <- count + 2
}
}
count <- count + collatz.length[x]
collatz.length[n] <- count
}
answer <- which.max(collatz.length)
print(answer)
Visualising Collatz Sequences
The Collatz sequence is an example of a simple mathematical rule that can create an unpredictable pattern. The number of steps required to reach one is listed in sequence A006577 of the Online Encyclopedia of Integer Sequences. The image below visualises the number of steps for the first 1000 positive numbers. The scatterplot shows some unusual patterns. Does this visualisation show that the Collatz Sequence does have a pattern after all?

## Visualise
library(ggplot2)
collatz <- data.frame(n = 1:10000,
steps = collatz.length[1:10000])
ggplot(collatz, aes(n, steps)) +
geom_point(size = .5, col = "darkblue") +
theme_minimal(base_size = 10) +
labs(title = "Collatz Sequence",
subtitle = "Number of steps to reach 1")
ggsave("../../../../static/images/project-euler/problem-014a.png", width = 6, height = 4)
Collatz Chains
The Collatz sequences can also be visualised using networks. Each step between two numbers is an edge, and the numbers are the vertices. For example, the network for the Collatz sequence for number 10 is 5–16, 16–8, 8–4, 4–2, 2–1. When generating subsequent sequences, the network will start to overlap, and a tree of sequences appears. The tree below combines the Collatz sequences for the numbers 2 to 26. Number 27 has a very long sequence, making the tree much harder to read. The code to create this network uses the igraph package. The edge list is a data frame where each row indicates a connected set of numbers.

## Collatz Chains
library(igraph)
edgelist <- data.frame(a = 2, b = 1)
for (n in 3:26) {
chain <- as.character(c(n, collatz.chain(n)))
chain <- data.frame(a = chain[-length(chain)], b = chain[-1])
edgelist <- rbind(edgelist, chain)
}
g <- graph.edgelist(as.matrix(edgelist))
g <- simplify(g)
png("../../../../static/images/project-euler/problem-014b.png",
width = 1024, height = 768)
par(mar = rep(0,4))
V(g)$color <- degree(g, mode = "out") == 0
plot(g,
layout = layout.auto,
vertex.color = V(g)$color,
vertex.frame.color = NA,
frame.color = "white",
vertex.size = 6,
vertex.label.cex = 1.5,
vertex.label.color = "black",
edge.arrow.size = 0.2,
edge.color = "grey"
)
dev.off()
Share this content