
Export WordPress to Hugo Markdown or Org Mode with R

Peter Prevos
I started my first website in 1996 with hand-written HTML. That became a bit of a chore, so for about fifteen years WordPress was my friend. WordPress has been great to me, but keeping up with plugin updates, security issues, slow performance and the annoying block editor is slowly becoming a pain. I am also always looking for additional activities I can do with Emacs. Hugo takes away much of the pain of managing a site, as you can focus on the content, and Emacs provides me with powerful editing functionality.
I recently returned to a static website using Hugo. This article explains how to export a WordPress blog to Hugo and customise it with R code. The only reason I used R is that it is the only programming language I know well enough.
You will also need to install the mighty Pandoc software to convert the content to Org mode and the WP All Export WordPress plugin to export your website to a CSV file.
Convert the content to Markdown or Org Mode
The first step is to export the WordPress posts database to a CSV file. Several plugins are available that help you with this task. I have used the WP All Export plugin to export the data. You need to download the ZIP file and install this plugin manually in your WordPress setup. Follow the steps in the All Export plugin and create a CSV file from your posts with at least these fields:
- Title
- Slug
- Date
- Content
- Categories
- Tags
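Before running the conversion, it can help to confirm that the export contains every field listed above. The following is a minimal sketch in base R rather than part of the original script. The helper name check_export_fields and the sample data are hypothetical, and the sketch assumes the plugin writes the column headers exactly as listed.

```r
## Hypothetical helper: report which required columns are missing from the export.
## Assumes the column headers match the field names listed above.
required_fields <- c("Title", "Slug", "Date", "Content", "Categories", "Tags")

check_export_fields <- function(posts, required = required_fields) {
  setdiff(required, names(posts))
}

## Illustrative data standing in for the exported CSV (not real posts)
sample_posts <- data.frame(
  Title = "Hello World",
  Slug = "hello-world",
  Date = "2020-01-01 10:00:00",
  Content = "<p>First post</p>",
  Categories = "R|Blogging",
  stringsAsFactors = FALSE
)

missing_fields <- check_export_fields(sample_posts)
if (length(missing_fields) > 0) {
  message("Missing fields: ", paste(missing_fields, collapse = ", "))
}
```

With a real export you would pass the data frame returned by read.csv instead of the sample data.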
Alternatively, you can link directly to the WordPress database and extract the data with the RMySQL package.
The content files for Hugo are either Markdown or Org Mode. I prefer to use Org Mode as it provides me with access to the extensive functionality that Emacs has to offer, including writing and evaluating R code.
The Content field in the exported CSV file contains the HTML code of the article. The code below reads the CSV file and saves each content field as an HTML file, using the post's slug as the filename. The mighty Pandoc software converts this file to Org mode. Any draft posts or pages in the export file will have NA as the file name and are therefore skipped.
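To make the conversion step concrete, here is a small sketch of how the Pandoc call for a single post could be assembled. The helper pandoc_args is hypothetical and not part of the script in this article. It uses system2 with explicit input and output formats, which is a swapped-in alternative to the system call in the script, and it assumes slugs contain no spaces.

```r
## Hypothetical helper: build the Pandoc arguments for one post.
## The tmp and org directory names follow the script in this article.
pandoc_args <- function(slug, html_dir = "tmp", org_dir = "org") {
  c("--from=html",
    "--to=org",
    paste0("--output=", file.path(org_dir, paste0(slug, ".org"))),
    file.path(html_dir, paste0(slug, ".html")))
}

cmd_args <- pandoc_args("hello-world")
print(cmd_args)

## Only call Pandoc when it is installed and the input file exists
if (nzchar(Sys.which("pandoc")) && file.exists(cmd_args[4])) {
  system2("pandoc", cmd_args)
}
```

Stating the formats explicitly means Pandoc does not have to guess them from the file extensions.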
Now that we have some content, we need to add the Org mode front matter so that Hugo can build a site. The last part of the code generates the front matter for each post, prepends it to the exported Org mode file and cleans some entries.
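As an illustration of what the generated front matter looks like, the sketch below builds it for one made-up post using base R only. The helpers split_terms and make_front_matter are hypothetical names. The sketch assumes, as the script does, that the export separates categories and tags with a vertical bar, and it leaves out the lastmod field to keep the example short.

```r
## Hypothetical helper: turn "Data Science|Blogging" into "Data-Science Blogging".
## Spaces inside a term become hyphens so each term stays a single entry.
split_terms <- function(x) {
  gsub("|", " ", gsub(" ", "-", x, fixed = TRUE), fixed = TRUE)
}

## Hypothetical helper: build Org mode front matter for a single post
make_front_matter <- function(title, date, categories, tags, draft = TRUE) {
  c(paste("#+title:", title),
    paste0("#+date: [", date, "]"),
    paste("#+categories[]:", split_terms(categories)),
    paste("#+tags[]:", split_terms(tags)),
    paste("#+draft:", tolower(as.character(draft))))
}

## Illustrative values, not taken from a real export
fm_example <- make_front_matter(
  title = "Hello World",
  date = "2020-01-01 10:00:00",
  categories = "Data Science|Blogging",
  tags = "R|Org Mode"
)
writeLines(fm_example)
```

Posts are marked as drafts here, as in the script, so you can review each one before publishing it.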
Copy the script at the end of this article and save it as wp2org.R. You need to replace "filename" in the read.csv line with the name of your export file. The script also creates two subdirectories to store the HTML and Org files.
You run this code with Rscript wp2org.R from the same directory where the CSV file is stored. The result will be a collection of Org mode files.
This new site will not be perfect just yet. To show the images, you need to download your wp-content folder and move its contents to the static/images folder in Hugo.
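The effect of repointing the image links can be tried on a couple of sample lines. This is a sketch in base R with a hypothetical helper repoint_images and a made-up domain. The pattern is my own assumption about what the links look like after conversion to Org mode.

```r
## Hypothetical helper: repoint WordPress media URLs to Hugo's /images path.
## The pattern stops at whitespace or a closing bracket so that it cannot
## run across two separate links on the same line.
repoint_images <- function(lines) {
  gsub("https?://[^\\]\\s]*?/wp-content", "/images", lines, perl = TRUE)
}

## Illustrative Org mode lines with made-up domains
sample_lines <- c(
  "[[https://example.com/wp-content/uploads/2020/01/plot.png]]",
  "See [[https://example.org/page][this page]] for details."
)
repointed <- repoint_images(sample_lines)
print(repointed)
```

The first link now starts with /images/uploads, which is why the uploads folder from wp-content has to sit directly inside static/images. Links that do not point to wp-content are left alone.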
The internal links in your posts will be hard-coded, which means that you need to configure Hugo so that your slugs stay the same.
There will be other bits and pieces that might not have adequately converted, so do check your pages.
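A quick scan can point you to the lines that most likely need attention. The sketch below is one possible approach in base R. The helper find_leftovers is hypothetical and the patterns are only examples of residue I would look for, so adjust them to your own site.

```r
## Hypothetical helper: list lines that still contain WordPress residue.
## The patterns are examples only; extend them for your own site.
find_leftovers <- function(lines) {
  patterns <- c(":::", "\\{\\.wp-", "\\$latex", "wp-content")
  hits <- grepl(paste(patterns, collapse = "|"), lines)
  data.frame(line = which(hits), text = lines[hits], stringsAsFactors = FALSE)
}

## Illustrative converted content (not from a real post)
sample_post <- c(
  "#+title: Hello World",
  "::: {.wp-block-image}",
  "A clean paragraph.",
  "Inline maths $latex x^2$ that was not converted."
)
leftovers <- find_leftovers(sample_post)
print(leftovers)
```

To check a real conversion, you could read each file in the org folder with readLines and pass the result to the helper.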
All you have to do now is to add a theme to your website, and your blog is fully converted. The Hugo website has a great Quick Start page that will get you going.
You can create new posts and edit your content with your favourite text editor. I use Org mode in Emacs to develop this website.
Summary
In summary, you need to take the following steps:
- Install the Pandoc software and the WP All Export WordPress plugin.
- Download your website as a CSV file with the WordPress plugin.
- Copy the R script into a file called wp2org.R and save it in the same location as the CSV file.
- Open your console and move to the folder with the script and CSV file.
- Run Rscript wp2org.R
- Review the Org mode files and clean up any issues.
Script
## Export WP to Hugo
## Read exported WP content
library(tibble)
library(readr)
library(dplyr)
library(stringr)
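## Replace "filename" with the name of the exported file
posts <- read.csv("filename", skipNul = TRUE, stringsAsFactors = FALSE)
## Skip drafts and pages, which have no slug in the export
posts <- posts[!is.na(posts$Slug) & posts$Slug != "", ]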
## Create subdirectories
if (!dir.exists("tmp")) dir.create("tmp")
if (!dir.exists("org")) dir.create("org")
## Read posts
for (i in seq_len(nrow(posts))) {
  ## Save content as temporary html file
  filename <- paste0(posts$Slug[i], ".html")
  writeLines(posts$Content[i], paste0("tmp/", filename))
  ## Convert to Org mode with Pandoc
  pandoc <- paste0("pandoc -o org/", posts$Slug[i], ".org tmp/", filename)
  system(pandoc)
}
## Create front matter for all posts
fm <- tibble(title = paste("#+title:", posts$Title),
             date = paste0("#+date: [", as.POSIXct(posts$Date, origin = "1970-01-01"), "]"),
             lastmod = paste0("#+lastmod: [", Sys.Date(), "]"),
             categories = paste("#+categories[]:", str_replace_all(posts$Categories, " ", "-")),
             tags = paste("#+tags[]:", str_replace_all(posts$Tags, " ", "-")),
             draft = "#+draft: true") %>%
  mutate(categories = str_replace_all(categories, "\\|", " "),
         tags = str_replace_all(tags, "\\|", " "))
## Load Hugo files and prepend front matter
for (f in seq_len(nrow(posts))) {
  filename <- paste0("org/", posts$Slug[f], ".org")
  post <- c(unlist(fm[f, ], use.names = FALSE), "", readLines(filename))
  ## Repoint images; the pattern stops at whitespace or a closing bracket
  ## so that it cannot run across two links on the same line
  post <- str_replace_all(post, "https?://[^\\s\\]]*?/wp-content", "/images")
  ## Cleanup LaTeX
  post <- str_replace_all(post, "\\$latex ", "$")
  ## Remove remaining WordPress artefacts
  post <- str_remove_all(post, ':::|\\{.wp.*|.*\\"\\}')
  ## Write to disk
  writeLines(post, filename)
}