parallel_rsync

Description

This set of scripts parallelizes the transfer of a huge directory tree while capping the number of simultaneous transfers.

Instructions

I suggest launching psync with the following line:

./psync.sh /path/to/folder

Do not add a trailing slash to the path:

  • NO: ./psync.sh /path/to/folder/
  • YES: ./psync.sh /path/to/folder
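
If you would rather not rely on remembering this, the argument can be normalized before it is used. The following is only a minimal sketch; strip_trailing_slash is a hypothetical helper and is not part of psync.sh.

```bash
# strip_trailing_slash PATH (hypothetical helper, not in the original script):
# remove every trailing slash from PATH, but leave a bare "/" untouched.
strip_trailing_slash() {
    local path=$1
    while [ "$path" != "/" ] && [ "${path%/}" != "$path" ]; do
        path=${path%/}
    done
    printf '%s\n' "$path"
}
```

Inside psync.sh it could be applied as TARGET="$(strip_trailing_slash "$1")", so that both forms of the command line behave the same.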

Pre-Reqs

  • GNU screen
  • rsync
  • ssh
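
A quick way to confirm that these tools are installed before starting a long transfer is sketched below. The helper name missing_tools is an assumption made for illustration; it only relies on the shell builtin "command -v".

```bash
# missing_tools NAME... (hypothetical helper): print each NAME that cannot
# be found on the PATH, one per line. Prints nothing if all are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible use at the top of psync.sh:
#   missing=$(missing_tools screen rsync ssh)
#   [ -z "$missing" ] || { echo "Missing tools: $missing" >&2; exit 1; }
```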

psync.sh

Description

This script will:

  • Check that the directory to transfer exists
  • Build the list of directories found at exactly depth ${MAXDEPTH}
  • Transfer the upper levels, from depth 1 to depth ${MAXDEPTH}, in parallel and non-recursively (a message is shown every 100 directories)
  • Transfer the directories at depth ${MAXDEPTH} in parallel and recursively (a message is shown for each folder)
  • Keep in mind that ${MAXPARALEL} is a soft limit: “check_max_processes()” only polls the number of running rsyncs once per second (“sleep 1”), so the real count can briefly differ from it.
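
The depth split above relies on the -mindepth and -maxdepth options of find. As a small illustration (dirs_at_depth is a hypothetical helper, not a function of psync.sh), giving both options the same value selects the directories at exactly one level:

```bash
# dirs_at_depth ROOT DEPTH (hypothetical helper): list, in sorted order,
# the directories located exactly DEPTH levels below ROOT.
# Depth 0 is ROOT itself.
dirs_at_depth() {
    find "$1" -mindepth "$2" -maxdepth "$2" -type d | sort
}

# Example: for a tree containing a/b/c, a/d and e under /data,
#   dirs_at_depth /data 1   lists /data/a and /data/e
#   dirs_at_depth /data 2   lists /data/a/b and /data/a/d
```

Each directory printed for depth ${MAXDEPTH} becomes one independent recursive rsync, which is why a larger depth usually produces more, smaller transfers.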

Code

#!/bin/bash
# psync.sh: parallelize the rsync of a directory tree.
[ -z "$1" ] && echo "Usage: $0 /path/to/run" && exit 1

TARGET="$1"

[ ! -d "${TARGET}" ] && echo -e "${TARGET}\n not a directory" && exit 1

# Logs live next to this script, in a folder named after the target.
LOGDIR="$(dirname "$0")/$(basename "${TARGET}")"
[ -d "${LOGDIR}" ] && echo "Cleanup" && rm -fr "${LOGDIR}"
mkdir -p "${LOGDIR}/transferlogs"

# Block while more than $1 rsync processes are running on this host.
check_max_processes()
{
    local MAXPARALEL=$1
    while [ "$(ps waux | grep -cE ':[0-9]{2} rsync')" -gt "${MAXPARALEL}" ] ; do
        printf "%s" .
        sleep 1
    done
}

sync_this()
{
    local MAXDEPTH=3
    local MAXPARALEL=20
    local LAUCHRSYNC
    LAUCHRSYNC="$(dirname "$0")/launch_rsync.sh"
    local DIRLIST=() FOLDER ITEM i x

    # Directories at exactly MAXDEPTH; each one gets a recursive rsync later.
    # NUL-separated output keeps names with spaces intact.
    while IFS= read -r -d '' FOLDER ; do
        DIRLIST+=("${FOLDER}")
    done < <(find "${TARGET}" -mindepth "${MAXDEPTH}" -maxdepth "${MAXDEPTH}" -type d -print0)

    echo "Copying files and directories NOT recursively"
    for ((i=0; i<MAXDEPTH; i++)) ; do
        x=0
        while IFS= read -r -d '' ITEM ; do
            check_max_processes "${MAXPARALEL}"
            # The depth is part of the log name so that logs from
            # different depths do not overwrite each other.
            screen -S "${x}" -d -m "${LAUCHRSYNC}" -nr "${ITEM}" "nr_${i}_${x}" "${LOGDIR}" </dev/null
            x=$((x + 1))
            [[ $x =~ [0-9]{1,2}00$ ]] && printf "\n%s\n" "$x directories copied non-recursively"
        done < <(find "${TARGET}" -mindepth "$i" -maxdepth "$i" -type d -print0)
        echo "Depth $i done, going deeper"
    done
    echo "Launching recursive rsyncs at depth ${MAXDEPTH}"
    for ((i=0; i<${#DIRLIST[@]}; i++)) ; do
        printf "\n%s" "Launching rsync $i of ${#DIRLIST[@]}"
        check_max_processes "${MAXPARALEL}"
        screen -S "${i}" -d -m "${LAUCHRSYNC}" -r "${DIRLIST[$i]}" "r_${i}" "${LOGDIR}"
    done
}

sync_this

Script Variables

Variable | Description
TARGET | The directory that will be transferred (first argument, "$1")
LOGDIR | The directory in which you will find the results of the syncs: $(dirname $0)/$(basename ${TARGET})
MAXDEPTH | The depth at which the script parallelizes the sync (set to 3)
MAXPARALEL | Maximum number of rsyncs launched at a time (set to 20)
LAUCHRSYNC | The rsync script itself: launch_rsync.sh, expected next to psync.sh (for example /root/autosync/launch_rsync.sh)

launch_rsync.sh

Description

This script will:

  • Launch rsync in non-recursive or recursive mode
  • Log the exit code of rsync, so you know whether everything went fine or not
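
Once a run has finished, the exit-code logs can be used to see what needs to be retried. The sketch below assumes the file naming and the "code : path" line format used by launch_rsync.sh; list_failed itself is a hypothetical helper.

```bash
# list_failed LOGDIR (hypothetical helper): print the path of every
# transfer recorded in a *_TRANSFERS.FAIL file, one per line.
# Each log line is expected to look like "23 : /path/to/dir".
list_failed() {
    local logdir=$1 f
    for f in "$logdir"/*_TRANSFERS.FAIL; do
        [ -e "$f" ] || continue      # the glob matched nothing
        sed 's/^[0-9]* : //' "$f"    # drop the leading exit code
    done
}
```

An empty output means that no FAIL file was written, which is what a clean run should produce.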

Code

#!/bin/bash
# launch_rsync.sh
# Normalize the mode flag to lowercase so -NR, -R, etc. are accepted too.
RECURSIVE=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
TARGET=$2
SCREENNAME=$3
LOGDIR=$4
DSTSERVER="1.1.1.1"
DESTINATION="${TARGET}"

# stdout is redirected to the log first and stderr is then duplicated
# onto it, so rsync errors end up in the log file as well.
if [[ "${RECURSIVE}" =~ ^\-{1,2}(nr|non-recursive)$ ]] ; then
	rsync -cdlptgoDv --partial "${TARGET}"/* "${DSTSERVER}:${DESTINATION}/" > "${LOGDIR}/transferlogs/${SCREENNAME}_NOTRECURSIVE.log" 2>&1
	RES=$?
elif [[ "${RECURSIVE}" =~ ^\-{1,2}(r|recursive)$ ]] ; then
	rsync -cazv --partial "${TARGET}"/* "${DSTSERVER}:${DESTINATION}/" > "${LOGDIR}/transferlogs/${SCREENNAME}.log" 2>&1
	RES=$?
else
	echo "$0 -nr|-r|--non-recursive|--recursive" >&2
	exit 1
fi

# Record the exit code and the path in the OK or FAIL list for this mode.
if [ "${RES}" -eq 0 ] ; then
	echo "${RES} : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.OK"
else
	echo "${RES} : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.FAIL"
fi
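
When rsync output is captured in a log file, the order of the redirections decides whether error messages reach the file. The following self-contained sketch illustrates the difference; emit, capture_both and capture_stdout_only are hypothetical names used only for this demonstration.

```bash
# emit (hypothetical): write one line to stdout and one line to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# capture_both FILE: stdout is pointed at FILE first, then stderr is
# duplicated onto it, so both lines are written to FILE.
capture_both() {
    emit > "$1" 2>&1
}

# capture_stdout_only FILE: stderr is duplicated onto the current stdout
# before stdout is redirected, so only the stdout line reaches FILE.
capture_stdout_only() {
    emit 2>&1 > "$1"
}
```

For a transfer log, the first form is normally the one you want, since rsync reports its errors on stderr.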

Variables

Variable | Description
RECURSIVE | Recursive or not (first argument, lowercased). DON'T MODIFY
TARGET | The directory that will be transferred (second argument). DON'T MODIFY
SCREENNAME | Name of the screen in which the script is running (third argument). DON'T MODIFY
LOGDIR | Where the results will be logged (fourth argument). DON'T MODIFY
DSTSERVER | Destination server (set to "1.1.1.1" in the script)
DESTINATION | Destination folder; by default it is the same as ${TARGET}, but you will probably want to change it :-)
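
One possible way to change the destination folder without editing every path is to prepend a prefix on the remote side. This is only a sketch under stated assumptions: DEST_PREFIX and destination_for are hypothetical and do not exist in launch_rsync.sh.

```bash
# destination_for TARGET (hypothetical helper): map a local path to the
# remote path. DEST_PREFIX is an assumed, optional environment variable;
# when it is unset or empty the result equals TARGET, which matches the
# default DESTINATION="${TARGET}".
destination_for() {
    printf '%s%s\n' "${DEST_PREFIX:-}" "$1"
}
```

With DEST_PREFIX=/backup, a target of /data/a would be sent to /backup/data/a; the prefix directory has to exist on the destination server.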
parallel_rsync.txt · Last modified: 2013/09/09 12:04 by dodger