Creating a mirror of the SVN (Subversion) repositories hosting the source code of the WordPress core, plugins and themes is the first step to replicating the install and update APIs used by WordPress itself.
There are several SVN repositories that need to be mirrored:
- WordPress core at core.svn.wordpress.org
- Plugins at plugins.svn.wordpress.org
- Themes at https://themes.svn.wordpress.org
After cloning the repositories, it is possible to keep them in sync with the origin using svnsync, which pulls in just the latest changes when run periodically.
Download SVN Dumps
I’ve gone through the steps outlined in this guide and created the SVN dumps of all three repositories and published them on the Internet Archive so you can get right to local import and sync:
- WordPress core (1.9GB gzip, up until revision r58509)
- Themes (198GB tar gzip)
- Plugins (558GB tar gzip)
Be sure to use a download manager that can resume interrupted downloads, such as wget:
wget --continue https://archive.org/download/wp-org-svn-themes-dump/wp-org-svn-themes-dump.gz
or curl:
curl --location --remote-name --continue-at - https://archive.org/download/wp-org-svn-themes-dump/wp-org-svn-themes-dump.gz
Extract the Dumps
All dumps are compressed with gzip and both themes and plugins are also packed as tar bundles since they contain multiple dumpstream files.
Use the following command to extract the single core dump (remove the --keep flag if you don’t want to preserve the compressed original):
gunzip --keep wp-org-core-svn-dump.gz
and the following commands for the tar.gz archives of theme and plugin dumps:
tar --extract --gunzip --file wp-org-svn-themes-dump.gz
tar --extract --gunzip --file wp-org-svn-plugins-dumps.gz
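Because these downloads are large and may have been resumed several times, it can be worth checking an archive before extracting it. The following is only a sketch: verify_archive is a hypothetical helper name, and it relies on nothing more than the integrity check built into gzip.

```shell
#!/usr/bin/env bash
# verify_archive FILE
# Hypothetical helper, not part of the original guide. It runs the gzip
# integrity check, which reads the whole file and can take a long time
# for the larger archives.
verify_archive() {
  local archive="$1"
  if gzip --test "$archive"; then
    echo "OK: $archive"
  else
    echo "CORRUPT: $archive" >&2
    return 1
  fi
}
```

For example, verify_archive wp-org-core-svn-dump.gz should print an OK line for an intact download. For the tar bundles, tar --list --gunzip --file wp-org-svn-themes-dump.gz additionally shows which dump files the archive contains.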
Below are steps for how to recreate the above dumps from the origin SVN repositories.
How about a Checkout?
The first idea might be to use svn checkout ... for each of the repositories, but that doesn’t work for the following reasons:
- Network errors and disconnects leave the local repository in a broken state and require svn cleanup before resuming the checkout, which often fails.
- The SVN checkout allows only a single process and can’t be parallelized due to locking of commits to the same files.
Alternatives to Checkout
Ideally, the process would:
- download the whole SVN revision history as a single file,
- or allow specifying a range of revisions to enable parallel downloads.
The svnadmin dump command creates a dump stream for specific ranges of revisions. Since it only works with local repositories, there is also svnrdump dump, which supports remote repositories.
So the final workflow is this:
- Create a file with ranges of revisions like XXX:YYY to download in each process.
- Use parallel to call svnrdump dump --incremental --revision {} > repo-{}.dump where {} is replaced with one of the revision ranges.
- Import individual dumps into a new local repository using svnadmin load --file repo-XXX:YYY.dump repo-directory
Run this in a screen session to ensure the processes keep running even if you log out of the computer.
Requirements
Use Homebrew to install the required tooling:
brew install subversion parallel pv
while tools like bash, grep, cut and gzip are already included in macOS by default.
Step 1: Define Revision Ranges
The examples below are for the WordPress core SVN repository. Replace https://core.svn.wordpress.org with other repository URLs as needed.
Save this bash script as rev-ranges.sh:
#!/usr/bin/env bash
if [ $# -ne 1 ]; then
  echo "Usage: $0 <repository-url>"
  exit 1
fi
# Set the number of revisions included in a single dump.
REV_STEP=10000
# Read the latest revision number from the remote repository.
LATEST_REV=$(svn info "$1" | grep "Revision:" | cut -c11-)
for (( rev_start = 0; rev_start < $LATEST_REV; rev_start += $REV_STEP )); do
  if (( $rev_start + $REV_STEP < $LATEST_REV )); then
    echo "$rev_start:$(($rev_start + $REV_STEP - 1))"
  else
    # The last range ends at the latest revision.
    echo "$rev_start:$LATEST_REV"
  fi
done
Customise REV_STEP as necessary, and mark the script as executable:
chmod +x rev-ranges.sh
Finally, run it to generate core-revs.txt with a revision range per line:
./rev-ranges.sh https://core.svn.wordpress.org > core-revs.txt
which produces the following contents:
0:9999
10000:19999
20000:29999
30000:39999
40000:49999
50000:58547
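To check the range arithmetic without contacting the server, the same loop can be wrapped in a function that takes the latest revision as an argument. This is a sketch only: rev_ranges is a hypothetical name, and the values 58547 and 10000 are taken from the example output above.

```shell
#!/usr/bin/env bash
# rev_ranges LATEST_REV REV_STEP
# Prints one START:END range per line, covering revisions 0..LATEST_REV.
# Hypothetical helper that mirrors the loop in rev-ranges.sh without
# calling svn.
rev_ranges() {
  local latest="$1" step="$2" start end
  for (( start = 0; start < latest; start += step )); do
    if (( start + step < latest )); then
      end=$(( start + step - 1 ))
    else
      # The last range ends at the latest revision.
      end=$latest
    fi
    echo "$start:$end"
  done
}

rev_ranges 58547 10000
```

Each range starts right after the previous one ends, so no revision is dumped twice and none is skipped.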
Step 2: Dump Revision Ranges
Create a new directory to store the revision dumps:
mkdir core-dumps
Then, to start parallel downloads of the revision range dumps, pass the contents of core-revs.txt to parallel using cat along with the command above:
cat core-revs.txt | parallel "svnrdump dump --revision {} --incremental https://core.svn.wordpress.org > core-dumps/core-{}.dump"
where:
- --revision {} specifies the revision range piped from the file,
- --incremental makes the dumps standalone for incremental import.
This starts one svnrdump dump process per CPU. Pass a -j NN flag to parallel to specify a custom job count.
Note that there is no terminal output while the commands are running because all of stdout is sent to the dump files. Use watch "ls -lh core-dumps" in another window to monitor the size of the individual dumps:
Every 2.0s: ls -lh core-dumps
total 734M
-rw-r--r-- 1 root root 28M Sep 28 15:24 core-0:9999.dump
-rw-r--r-- 1 root root 34M Sep 28 15:24 core-10000:19999.dump
-rw-r--r-- 1 root root 67M Sep 28 15:24 core-20000:29999.dump
-rw-r--r-- 1 root root 45M Sep 28 15:24 core-30000:39999.dump
-rw-r--r-- 1 root root 284M Sep 28 15:24 core-40000:49999.dump
-rw-r--r-- 1 root root 279M Sep 29 05:29 core-50000:58547.dump
For reference, in the svnrdump source code the dump_cmd function invokes replay_revisions, which in turn calls svn_ra_replay_range.
Each dump file should be anywhere from 20MB to 1.5GB depending on the repository. The combined size of all dumps for WP core is 730MB.
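If some of the parallel jobs are interrupted, it helps to know which ranges need to be dumped again. The sketch below lists every range whose dump file is missing or empty. The function name missing_ranges is hypothetical, and it assumes the PREFIX-START:END.dump naming used in this guide. It cannot detect a dump that is non-empty but truncated.

```shell
#!/usr/bin/env bash
# missing_ranges REVS_FILE DUMPS_DIR PREFIX
# Prints every range from REVS_FILE whose dump file is absent or empty.
# Hypothetical helper; assumes files are named PREFIX-START:END.dump.
missing_ranges() {
  local revs_file="$1" dumps_dir="$2" prefix="$3" range
  while IFS= read -r range; do
    # -s is true only for files that exist and are larger than zero bytes.
    if [ ! -s "$dumps_dir/$prefix-$range.dump" ]; then
      echo "$range"
    fi
  done < "$revs_file"
}
```

Running missing_ranges core-revs.txt core-dumps core prints the ranges to retry, and that output can be piped to the same parallel command in place of cat core-revs.txt.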
To save disk space, you can compress the dump stream with gzip before writing it to a file:
cat core-revs.txt | parallel "svnrdump dump --revision {} --incremental https://core.svn.wordpress.org | gzip > core-dumps/core-{}.dump.gz"
Remember to decompress the files when importing!
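One way to avoid a separate decompression step is to decompress on the fly while importing. A minimal sketch, assuming compressed dumps end in .gz and plain dumps do not; stream_dump is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# stream_dump FILE
# Writes the dump stream to stdout, decompressing when FILE ends in .gz.
# Hypothetical helper for importing either plain or gzipped dumps.
stream_dump() {
  case "$1" in
    *.gz) gunzip --stdout "$1" ;;
    *)    cat "$1" ;;
  esac
}
```

The output can be piped straight into svnadmin load, for example stream_dump core-dumps/core-0:9999.dump.gz | svnadmin load --quiet repo-directory. If pv is placed after the decompression step it only reports throughput, since it no longer knows the total size.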
Step 3: Import Dumps Locally
Unfortunately, the import process can’t be parallelized because each revision refers to earlier revisions, which must already exist in the SVN database before the later ones can be inserted.
Therefore, we must ensure that svnadmin load is called sequentially with dump ranges from the lowest revisions to the highest. We use sort --version-sort to list the dump file names in natural order.
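The difference from a plain sort only shows up once revision numbers have different digit counts. The sketch below uses made-up file names with a plugins- prefix purely for illustration:

```shell
#!/usr/bin/env bash
# Illustrative file names only; they are not taken from a real dump
# directory. The ranges are chosen so that the starting revisions have
# different numbers of digits.
names='plugins-100000:109999.dump
plugins-20000:29999.dump
plugins-0:9999.dump'

# Version sort compares the embedded numbers numerically, so revision 20000
# is ordered before revision 100000.
sorted=$(printf '%s\n' "$names" | sort --version-sort)
echo "$sorted"
```

With version sort the names come out in the order 0:9999, 20000:29999, 100000:109999, which is the order svnadmin load needs. A plain lexical sort would place 100000:109999 before 20000:29999.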
Save this bash script as load-dumps.sh (adjust it if using gzipped dumps):
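#!/usr/bin/env bash
if [ $# -ne 1 ]; then
  echo "Usage: $0 <dumps-source-dir>"
  exit 1
fi
DUMPS_DIR=$1
SVN_DIR="$DUMPS_DIR-svn"
# Refuse to load into an existing repository.
if [ -d "$SVN_DIR" ]; then
  echo "SVN directory $SVN_DIR already exists. Not sure how to merge dumps."
  exit 1
fi
svnadmin create "$SVN_DIR"
# Load the dumps one at a time, from the lowest revision range to the highest.
for dumpfile in $(ls "$DUMPS_DIR" | sort --version-sort); do
  echo "Loading $dumpfile:"
  # Stop at the first failed import because later ranges depend on earlier ones.
  pv "$DUMPS_DIR/$dumpfile" | svnadmin load --quiet --no-flush-to-disk --bypass-prop-validation --force-uuid --memory-cache-size 2048 "$SVN_DIR" || exit 1
done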
where you must customize the value of the --memory-cache-size argument (in MB) based on the available RAM.
Make it executable:
chmod +x load-dumps.sh
and run it by specifying the source directory containing the dump files as the first argument:
./load-dumps.sh core-dumps
which produces the following output for each file:
Loading core-0:9999.dump:
3.12MiB 0:00:05 [ 422KiB/s] [> ] 28% ETA 0:00:07
If one of the imports fails, you can attempt to manually restart the specific import and pass --revision AAA:BBB with the known remaining revision range. You might also need to run svnadmin recover repo-directory if the repo was left in a corrupt state.
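The remaining range can be worked out from the dump file name and the youngest revision already in the local repository. A sketch under those assumptions: remaining_range is a hypothetical helper, and it expects the NAME-START:END.dump naming used in this guide.

```shell
#!/usr/bin/env bash
# remaining_range DUMP_FILE YOUNGEST_REV
# Prints the START:END range still to be loaded from DUMP_FILE, or nothing
# if the repository already contains every revision in that file.
# Hypothetical helper; assumes files are named NAME-START:END.dump.
remaining_range() {
  local name range end next
  name="${1##*/}"        # strip any directory part
  range="${name##*-}"    # e.g. "40000:49999.dump"
  range="${range%%.*}"   # e.g. "40000:49999"
  end="${range#*:}"      # e.g. "49999"
  next=$(( $2 + 1 ))     # first revision not yet in the repository
  if (( next <= end )); then
    echo "$next:$end"
  fi
}
```

The youngest revision can be read with svnlook youngest core-dumps-svn. For example, remaining_range core-dumps/core-40000:49999.dump 43210 prints 43211:49999, which is the value to pass to --revision.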
Import Performance
The best load performance I’ve seen is 20MB/s on average which leads to the following import times for each repository:
Repository | Dump Size Uncompressed | Repository Size | Import Time |
---|---|---|---|
WordPress Core | 779 MB | 935 MB | 5 minutes |
WordPress Themes | 353 GB | TBD | TBD |
WordPress Plugins | 1258 GB | TBD | TBD |
Syncing with Origin
After creating the repositories locally, use svnsync to keep them in sync with the remotes. It is important to note that this tool works with repository URLs, so we must specify the local repository as a file:// URL such as file:///Users/yourname/wp-svn/core-dumps-svn. Run pwd to print the full path to the current directory and append it to file://.
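Building the URL by hand is easy to get wrong, so here is a small sketch that derives it from a directory name. repo_url is a hypothetical helper, and it assumes the path contains no characters that would need URL-encoding, such as spaces.

```shell
#!/usr/bin/env bash
# repo_url DIR
# Prints the file:// URL for a local repository directory.
# Hypothetical helper; assumes the absolute path needs no URL-encoding.
repo_url() {
  local abs
  # Resolve DIR to an absolute path; fail if it does not exist.
  abs=$(cd "$1" && pwd) || return 1
  echo "file://$abs"
}
```

Since the absolute path already starts with a slash, the result has the three slashes that file:// URLs for local paths require.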
First, set the contents of hooks/pre-revprop-change in the repository directory to do nothing:
#!/bin/sh
exit 0
and make it executable:
chmod +x core-dumps-svn/hooks/pre-revprop-change
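When mirroring all three repositories, the two steps above can be combined into one function. This is only a sketch, and install_revprop_hook is a hypothetical name:

```shell
#!/usr/bin/env bash
# install_revprop_hook REPO_DIR
# Writes a pre-revprop-change hook that accepts every revision property
# change and marks it as executable. Hypothetical helper name.
install_revprop_hook() {
  local hook="$1/hooks/pre-revprop-change"
  printf '#!/bin/sh\nexit 0\n' > "$hook" || return 1
  chmod +x "$hook"
}
```

For example, install_revprop_hook core-dumps-svn writes the same two-line hook shown above into that repository.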
Then associate the local repository with a remote origin:
svnsync init --allow-non-empty file:///Users/yourname/wp-svn/core-dumps-svn https://core.svn.wordpress.org
which returns:
Copied properties for revision 58547.
And finally, run the actual sync:
svnsync sync file:///Users/yourname/wp-svn/core-dumps-svn
You can set up a cron job to run this at regular intervals.
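If a sync can take longer than the interval between runs, a small wrapper keeps the scheduled runs from piling up. The following is a sketch rather than a recommended setup: run_locked, the lock path and the script location are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# run_locked LOCK_DIR COMMAND [ARGS...]
# Runs COMMAND only if LOCK_DIR can be created, so a slow sync is not
# started a second time by the next scheduled run. Hypothetical helper;
# mkdir is used because creating a directory is atomic.
run_locked() {
  local lock_dir="$1" status
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Another run holds $lock_dir, skipping." >&2
    return 1
  fi
  "$@"
  status=$?
  rmdir "$lock_dir"
  return "$status"
}

# Example crontab entry (hourly), assuming this file is saved as
# /Users/yourname/wp-svn/sync.sh and ends with the call below:
#   0 * * * * /Users/yourname/wp-svn/sync.sh
# run_locked /tmp/wp-core-svnsync.lock svnsync sync file:///Users/yourname/wp-svn/core-dumps-svn
```

If the script is killed mid-run, the lock directory is left behind and has to be removed by hand before the next sync can start.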
Checkout Working Copy
The repository we’ve created only contains the revision history and the associated metadata. To get the actual files or a working copy, we run:
svn checkout file:///Users/yourname/wp-svn/core-dumps-svn core-svn-checkout
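where the last argument is the directory path for the working copy. If you skip the last argument, svn checkout names the working copy after the last component of the URL (core-dumps-svn here), so running it from the parent directory dumps the working files into the SVN repository directory.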