linux:parallel_rsync, created 2012/12/06 17:10 by dodger; current revision 2022/02/11 11:36 (external edit)
====== [SCRIPT] psync (parallel rsync) ======

====== Description ======
This set of scripts parallelizes the transfer of a huge directory tree while capping the number of simultaneous transfers.

====== Instructions ======
I suggest launching psync with the following command:
<code bash>
./psync.sh /path/to/folder
</code>
Do not add a trailing slash to the path:
  * NO: <del>./psync.sh /path/to/folder/</del>
  * YES: ./psync.sh /path/to/folder
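If you would rather have the argument normalised than remember this rule, one option (an assumption, not something the original psync.sh does) is to strip trailing slashes with bash parameter expansion before the path is used. A minimal sketch; //strip_trailing_slash// is a hypothetical helper name:

<code bash>
#!/bin/bash
# Hypothetical helper, not part of psync.sh: remove trailing slashes so
# "/path/to/folder/" becomes "/path/to/folder". "/" itself is kept.
strip_trailing_slash() {
    local path="$1"
    while [[ "${path}" == */ && "${path}" != "/" ]]; do
        path="${path%/}"    # drop one trailing "/" per iteration
    done
    printf '%s\n' "${path}"
}

strip_trailing_slash "/path/to/folder/"    # prints /path/to/folder
strip_trailing_slash "/path/to/folder//"   # prints /path/to/folder
strip_trailing_slash "/path/to/folder"     # prints /path/to/folder
</code>

In psync.sh this would amount to something like //TARGET="$(strip_trailing_slash "$1")"//.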

===== Pre-Reqs =====
  * GNU screen
  * ssh
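Before the first run you may want to check that these tools are installed on the source host. A minimal sketch using the bash builtin //command -v//; the helper name //check_prereqs// is hypothetical, and the list checked in the example (screen, ssh and rsync) is an assumption based on what the scripts call:

<code bash>
#!/bin/bash
# Hypothetical helper: report every command that is not found in $PATH.
# Returns 0 when all commands exist, 1 otherwise.
check_prereqs() {
    local missing=0 cmd
    for cmd in "$@"; do
        if ! command -v "${cmd}" >/dev/null 2>&1; then
            echo "missing: ${cmd}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

check_prereqs screen ssh rsync && echo "all prerequisites found"
</code>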
  
===== psync.sh =====
==== Description ====

This script will:
  * Check that the directory to transfer exists.
  * List the directories located exactly at depth //${MAXDEPTH}//.
  * Transfer, in parallel and non-recursively, the directories from depth 0 (the target itself) to depth //${MAXDEPTH}// - 1, printing a progress message every 100 directories.
  * Transfer, in parallel and recursively, each directory at depth //${MAXDEPTH}//, printing a message for each one.
  * Keep in mind that //${MAXPARALEL}// is an approximate limit: //check_max_processes()// re-checks the number of running rsync processes only once per second (the "//sleep 1//"), and the screen sessions start asynchronously, so slightly more transfers than configured may run at the same time.
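To see which directories end up in which phase, the following sketch builds a small throw-away tree and runs the same //find -mindepth/-maxdepth// selections that psync.sh uses, without transferring anything. The function name //show_phases// and the sample tree are made up for illustration:

<code bash>
#!/bin/bash
# Illustration only: print the directories psync.sh would hand to the
# non-recursive phase (depths 0 .. MAXDEPTH-1) and to the recursive
# phase (exactly depth MAXDEPTH).
show_phases() {
    local target="$1" maxdepth="$2" i
    for ((i = 0; i < maxdepth; i++)); do
        find "${target}" -mindepth "${i}" -maxdepth "${i}" -type d |
            sed "s|^|non-recursive (depth ${i}): |"
    done
    find "${target}" -mindepth "${maxdepth}" -maxdepth "${maxdepth}" -type d |
        sed "s|^|recursive (depth ${maxdepth}): |"
}

# Sample tree: demo/a/b/c and demo/x/y, inspected with a depth of 2
tmp=$(mktemp -d)
mkdir -p "${tmp}/demo/a/b/c" "${tmp}/demo/x/y"
show_phases "${tmp}/demo" 2 | sed "s|${tmp}/||" | sort
rm -rf "${tmp}"
</code>

With a depth of 2, //demo/a/b/c// is not listed on its own: it is copied by the recursive rsync of //demo/a/b//.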

==== Code ====

<file bash psync.sh>
#!/bin/bash
# psync.sh: transfer a directory tree with parallel rsyncs (see launch_rsync.sh)
[ -z "$1" ] && echo "Usage: $0 /path/to/run" && exit 1

TARGET="$1"

[ ! -d "${TARGET}" ] && echo -e "${TARGET}\n not a directory" && exit 1

# Logs go to a directory named after the target, next to this script
LOGDIR="$(dirname "$0")/$(basename "${TARGET}")"
[ -d "${LOGDIR}" ] && echo "Cleanup" && rm -fr "${LOGDIR}"
mkdir -p "${LOGDIR}/transferlogs"

# Block while $1 or more rsync processes are running (polls once a second)
check_max_processes()
{
    local MAXPARALEL=$1
    while [ "$(pgrep -c -x rsync)" -ge "${MAXPARALEL}" ] ; do
        printf '.'    # one dot per second while waiting for a free slot
        sleep 1
    done
}

sync_this()
{
    local MAXDEPTH=3      # depth at which the recursive rsyncs are launched
    local MAXPARALEL=20   # maximum number of simultaneous rsyncs
    local LAUCHRSYNC
    LAUCHRSYNC="$(dirname "$0")/launch_rsync.sh"
    local -a DIRLIST
    local i x ITEM

    # Directories exactly at depth MAXDEPTH; each one gets a recursive rsync
    mapfile -t DIRLIST < <(find "${TARGET}" -mindepth "${MAXDEPTH}" -maxdepth "${MAXDEPTH}" -type d)

    echo "Copying files and directories NOT recursively"
    for ((i=0; i<MAXDEPTH; i++)); do
        x=0
        while IFS= read -r ITEM ; do
            check_max_processes "${MAXPARALEL}"
            # The depth is part of the name so logs of different depths don't collide
            screen -S "nr_${i}_${x}" -d -m "${LAUCHRSYNC}" -nr "${ITEM}" "nr_${i}_${x}" "${LOGDIR}" < /dev/null
            x=$((x + 1))
            (( x % 100 == 0 )) && printf "\n%s\n" "$x directories launched NOT recursively"
        done < <(find "${TARGET}" -mindepth "$i" -maxdepth "$i" -type d)
        echo "Depth $i: all non-recursive rsyncs launched, going deeper"
    done

    echo "Launching recursive rsyncs at depth ${MAXDEPTH}"
    for ((i=0; i<${#DIRLIST[@]}; i++)); do
        printf "\n%s" "Launching rsync $((i + 1)) of ${#DIRLIST[@]}"
        check_max_processes "${MAXPARALEL}"
        screen -S "r_${i}" -d -m "${LAUCHRSYNC}" -r "${DIRLIST[$i]}" "r_${i}" "${LOGDIR}" < /dev/null
    done
    echo
}

sync_this "${TARGET}"
</file>
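//check_max_processes()// is a polling throttle: before each launch it counts the running rsync processes and sleeps until there is room for another one. The sketch below reproduces that pattern so it can be tried without rsync or screen. As assumptions, it counts the shell's own background jobs with //jobs -rp// instead of scanning //ps//, uses //sleep// as a stand-in for a transfer, and the names //wait_for_slot// and //MAXJOBS// are hypothetical. Because these jobs are children of the current shell they are visible immediately, so this variant never exceeds the limit, unlike detached screen sessions whose rsync may not have started yet when the next check runs.

<code bash>
#!/bin/bash
# Polling throttle in the style of check_max_processes(), demonstrated
# with background "sleep" jobs instead of detached rsync screens.

# Block until fewer than $1 background jobs of this shell are running.
wait_for_slot() {
    local max="$1"
    while [ "$(jobs -rp | wc -l)" -ge "${max}" ]; do
        sleep 0.1   # poll interval; psync.sh uses "sleep 1"
    done
}

MAXJOBS=3
peak=0
for n in 1 2 3 4 5 6 7 8; do
    wait_for_slot "${MAXJOBS}"
    sleep 0.3 &                        # stand-in for one rsync transfer
    running=$(jobs -rp | wc -l)
    (( running > peak )) && peak=${running}
done
wait                                   # let the remaining jobs finish
echo "peak concurrent jobs: ${peak} (limit ${MAXJOBS})"
</code>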

==== Script Variables ====
^ Variable ^ Description ^
|<code>TARGET="$1"</code> | The directory that will be transferred |
|<code>LOGDIR="$(dirname "$0")/$(basename "${TARGET}")"</code> | The directory where you'll find the results (logs) of the syncs |
|<code>MAXDEPTH=3</code> | The depth at which the script parallelizes the recursive syncs (local to //sync_this()//) |
|<code>MAXPARALEL=20</code> | Maximum number of rsyncs running at a time (local to //sync_this()//) |
|<code>LAUCHRSYNC="$(dirname "$0")/launch_rsync.sh"</code> | Path to the rsync wrapper script, launch_rsync.sh |

===== launch_rsync.sh =====
==== Description ====
This script will:
  * Run rsync in non-recursive (-nr) or recursive (-r) mode
  * Log rsync's exit code, so you can tell which transfers succeeded and which failed
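Since every transfer appends one line to an OK or FAIL list in the log directory, a summary can be printed once all the screen sessions have finished. A minimal sketch, assuming the file names written by launch_rsync.sh (//MODE_TRANSFERS.OK// and //MODE_TRANSFERS.FAIL//, where //MODE// is the flag without dashes, e.g. //nr// or //r//); //summarize_transfers// is a hypothetical helper:

<code bash>
#!/bin/bash
# Hypothetical helper: count the entries launch_rsync.sh appended to the
# OK and FAIL lists in a psync log directory, then print the failures.
summarize_transfers() {
    local logdir="$1" ok=0 fail=0 f
    for f in "${logdir}"/*_TRANSFERS.OK; do
        [ -f "${f}" ] && ok=$((ok + $(wc -l < "${f}")))
    done
    for f in "${logdir}"/*_TRANSFERS.FAIL; do
        [ -f "${f}" ] && fail=$((fail + $(wc -l < "${f}")))
    done
    echo "OK: ${ok}  FAIL: ${fail}"
    # Each FAIL line has the form "<rsync exit code> : <directory>"
    if [ "${fail}" -gt 0 ]; then
        cat "${logdir}"/*_TRANSFERS.FAIL
    fi
}

# Example: the log directory psync.sh creates for /path/to/folder
summarize_transfers ./folder
</code>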

==== Code ====
<file bash launch_rsync.sh>
#!/bin/bash
# launch_rsync.sh: run one rsync (non-recursive or recursive) and record its exit code
RECURSIVE=$(echo "$1" | tr '[:upper:]' '[:lower:]')
TARGET="$2"
SCREENNAME="$3"
LOGDIR="$4"
DSTSERVER="1.1.1.1"
DESTINATION="${TARGET}"

# "${TARGET}/" (trailing slash) transfers the directory's contents, hidden
# files included, without expanding a huge "*" glob on the command line.
# "> file 2>&1" sends both stdout and stderr to the log file.
if [[ "${RECURSIVE}" =~ ^-{1,2}(nr|non-recursive)$ ]] ; then
    rsync -cdlptgoDv --partial "${TARGET}/" "${DSTSERVER}:${DESTINATION}/" > "${LOGDIR}/transferlogs/${SCREENNAME}_NOTRECURSIVE.log" 2>&1
    RES=$?
elif [[ "${RECURSIVE}" =~ ^-{1,2}(r|recursive)$ ]] ; then
    rsync -cazv --partial "${TARGET}/" "${DSTSERVER}:${DESTINATION}/" > "${LOGDIR}/transferlogs/${SCREENNAME}.log" 2>&1
    RES=$?
else
    echo "Usage: $0 -nr|-r|--non-recursive|--recursive TARGET SCREENNAME LOGDIR"
    exit 1
fi

# Append "<exit code> : <directory>" to the OK or FAIL list of this mode
if [ "${RES}" -eq 0 ] ; then
    echo "${RES} : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.OK"
else
    echo "${RES} : ${TARGET}" >> "${LOGDIR}/${RECURSIVE//-/}_TRANSFERS.FAIL"
fi
</file>
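launch_rsync.sh lower-cases the mode flag, matches it with bash's //=~// operator and reuses it, with all dashes removed, as the prefix of the OK/FAIL files. This sketch mirrors that logic so you can check which file a given flag writes to; //status_file_prefix// is a hypothetical name and the combined regular expression is a simplification of the two checks in the script:

<code bash>
#!/bin/bash
# Mirror launch_rsync.sh's flag handling: lower-case the flag, validate
# it and derive the prefix used for the OK/FAIL files.
status_file_prefix() {
    local mode re='^-{1,2}(nr|non-recursive|r|recursive)$'
    mode=$(echo "$1" | tr '[:upper:]' '[:lower:]')
    if [[ "${mode}" =~ ${re} ]]; then
        echo "${mode//-/}"          # remove every dash
    else
        echo "invalid flag: $1" >&2
        return 1
    fi
}

for flag in -nr --non-recursive -r --RECURSIVE; do
    echo "${flag} -> $(status_file_prefix "${flag}")_TRANSFERS.OK"
done
</code>

For the four flags in the loop this gives the prefixes //nr//, //nonrecursive//, //r// and //recursive//, so //-nr// and //--non-recursive// runs are logged to different files.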
==== Variables ====
^ Variable ^ Description ^
|<code>RECURSIVE=$(echo "$1" | tr '[:upper:]' '[:lower:]')</code> | Transfer mode, non-recursive (-nr) or recursive (-r), **DON'T MODIFY** |
|<code>TARGET="$2"</code> | The directory that will be transferred, **DON'T MODIFY** |
|<code>SCREENNAME="$3"</code> | Name of the screen session running this script, also used as the log file name, **DON'T MODIFY** |
|<code>LOGDIR="$4"</code> | Where the results will be logged, **DON'T MODIFY** |
|<code>DSTSERVER="1.1.1.1"</code> | Destination server |
|<code>DESTINATION="${TARGET}"</code> | Destination folder on the remote server; by default it is the same path as //${TARGET}//, but you may want to change it :-) |
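If you change //${DESTINATION}//, keep in mind that launch_rsync.sh runs once per directory, so the destination must still depend on //${TARGET}//: a fixed path would merge the contents of every directory into the same place. A minimal sketch that moves the tree under a different root on the destination server, using bash prefix removal; //SRCROOT//, //DSTROOT// and //map_destination// are hypothetical names with example values:

<code bash>
#!/bin/bash
# Hypothetical remapping: replace the source root with another root on
# the destination server, keeping the relative path unchanged.
SRCROOT="/data/projects"      # assumed source root (example value)
DSTROOT="/backup/projects"    # assumed destination root (example value)

map_destination() {
    local target="$1"
    # Quoting SRCROOT inside ${...#...} makes it a literal prefix, not a glob
    echo "${DSTROOT}${target#"${SRCROOT}"}"
}

TARGET="/data/projects/foo/bar"
DESTINATION="$(map_destination "${TARGET}")"
echo "${DESTINATION}"   # prints /backup/projects/foo/bar
</code>

In launch_rsync.sh this would mean replacing //DESTINATION="${TARGET}"// with //DESTINATION="$(map_destination "${TARGET}")"//, with the two roots and the function defined above it.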
  
===== Background =====
Some days ago I had to transfer a huge directory: about 50 TB, more than 1 million files, and a depth of only 6 folders below the parent one.
As I have to do this kind of transfer more than 10 times with a similar number of folders, I decided to write something that launches parallel rsyncs at a depth of my choice.

The result is this little "pure bash" script (its only extra dependency is screen). You'll notice that the main function, //sync_this()//, can run on its own in your own scripts by changing only 2 or 3 variables ;-)