Sunday, July 11, 2021

git_find_big Contrib Back

I am contributing back some modifications as my way of thanking Antony Stubbs for the git_find_big.sh script.

Following Antony Stubbs's recommendations, I have documented the changes I made and also made a few more, so this should be the latest version.

#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# Made some modifications to the script - 08-July-2021 @author Khamis Siksek
# [KS] changed the size to kilobytes
# [KS] added KB=1024 constant to be used later in size calculations
# [KS] changed " to ' where applicable
# [KS] added a check for whether the pack file exists
# [KS] made the number of returned big files a parameter that can be passed in
# [KS] used topBigFilesNo=10 as the default value if none is passed
# [KS] changed `command` to $(command) where applicable
# [KS] put the output in formattedOutput and echo that variable
# [KS] added exit 0 in case of success and exit -1 in case of an error
# [KS] packFile might hold multiple idx files, that's why I used $(echo ${packFile}) in verify-pack
# [KS] added a check on size and compressedSize, since they show wrong output if they are too small
# [KS] changed the variable "y" to "object" to make it more readable
# [KS] enclosed all variables with {} wherever applicable
# [KS] changed sort to regular sort instead of reverse and used tail instead of head
# [KS] added more types to grep -v in objects (was only chain, now it also contains commit and tree)
# [KS] added an informative message for the user that this may take a few minutes

# the number of big files to return can be passed as the first parameter (default: 10)
topBigFilesNo=${1:-10};

# check whether the pack index file(s) exist; report the error on stderr if not
packFile=$(ls -1S .git/objects/pack/pack-*.idx 2> /dev/null);
[[ $? != 0 ]] && echo "index pack file(s) in .git do not exist" >&2 && exit -1;

# informative message for the user
echo 'This may take a few seconds or minutes depending on the size of the repository, please wait ...';

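# list the pack objects, drop commit, tree, and chain lines, sort by size (column 3), keep the largest ones
objects=$(git verify-pack -v $(echo "${packFile}") | grep -v 'chain\|commit\|tree' | sort -k3n | tail -n "${topBigFilesNo}");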

# as they are big files, it's more reasonable to show the size in KiB
echo 'All sizes are in KiB. The pack column is the size of the object, compressed, inside the pack file.';

# constant
KB=1024;

# set the internal field separator to line break, to iterate easily over the verify-pack output
IFS=$'\n';

# preparing the header of the output
output='Size,Pack,SHA,Location';

# loop goes through the objects to check their sizes
for object in ${objects}
do
    # extract the size and convert it to KiB (0 if the field is empty)
    size=$(echo "${object}" | cut -f 5 -d ' ');
    [[ -n "${size}" ]] && size=$((${size}/${KB})) || size=0;

    # extract the compressed size and convert it to KiB (0 if the field is empty)
    compressedSize=$(echo "${object}" | cut -f 6 -d ' ');
    [[ -n "${compressedSize}" ]] && compressedSize=$((${compressedSize}/${KB})) || compressedSize=0;

    # extract the SHA
    sha=$(echo "${object}" | cut -f 1 -d ' ');

    # find the object's location in the repository tree
    other=$(git rev-list --all --objects | grep "${sha}");
    
    #lineBreak=$(echo -e "\n")
    output="${output}\n${size},${compressedSize},${other}";
done

formattedOutput=$(echo -e "${output}" | column -t -s ', ');
echo "${formattedOutput}";

exit 0;
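
The size extraction in the script relies on cut finding the sizes in fields 5 and 6, which depends on the exact number of spaces git prints between the type and the size columns. One possible alternative is sketched below, assuming the verify-pack -v columns are SHA, type, size, size-in-pack, and offset. The helper name to_kib_row and the sample line (its SHA and sizes) are made up for illustration and are not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): split one
# `git verify-pack -v` object line on runs of whitespace and print
# "sizeKiB,packKiB,sha". Returns 1 for lines without numeric sizes.
to_kib_row() {
    local sha type size packSize rest
    # set IFS only for this read, so it works even if the caller changed IFS
    IFS=$' \t' read -r sha type size packSize rest <<< "${1}"
    # summary lines such as "non delta: 5 objects" fail this numeric check
    [[ "${size}" =~ ^[0-9]+$ && "${packSize}" =~ ^[0-9]+$ ]] || return 1
    echo "$(( size / 1024 )),$(( packSize / 1024 )),${sha}"
}

# illustrative sample line with a made-up SHA and sizes
to_kib_row '0123456789abcdef0123456789abcdef01234567 blob   2097152 1048576 12'
```

Inside the loop, the three values could then come from a single call instead of three separate cut pipelines, and non-object lines would be skipped by the numeric check instead of showing up as zero sizes.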