Find duplicate files in a directory

in code

When I photographed heavily/professionally, I was rigorous in how I handled my imported raw files and master processed (PSD/XCF) files. I was much less rigorous in how I sorted and stored my processed JPG files, to the point that I’ve found several directories holding anywhere from hundreds to thousands of images, some or many of them straight duplicates.

For the hell of it, and also because I haven’t touched C# since early 2013, I drew up a simple console application in C# to search for duplicate files in a given directory. I made a good start on it in Bash, but…fuck. Bash is slow, and interacting with arrays in Bash leaves me wanting to murder somebody.

Order of the program:

  1. Check directory was provided. Check directory exists. Check it has more than one file.
  2. Get list of files in directory.
  3. Generate MD5 checksums for each given file.
  4. For each checksum:
    i. Check each file after this in the list to see if it has the same sum.
    ii. If a duplicate is found, check if it is on the recorded dupe list.
    iii. If it isn’t on the dupe list, add it.
  5. Run through the file list once for each dupe checksum. Print all file names with the same checksum.

I need to find more little projects like this in C#; it was fun to dust off what I knew.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Security.Cryptography;

public class findDupes {
    static void Main(string[] args) {
        CheckBeforeProceeding(args);

        string[] files = Directory.GetFiles(args[0]);
        List<string> filesums = new List<string>();

        foreach (string file in files)
            filesums.Add(GetFileSum(file));

        List<string> dupes = SearchForDupes(filesums);
        PrintDupes(filesums, dupes, files);
    }

    static void PrintDupes(List<string> sums, List<string> dupes, string[] files) {
        // Print output.
        foreach (string dupe in dupes) {
            Console.WriteLine("{0}\n----------", dupe);

            for (int i = 0; i < files.Length; i++)
                if (sums[i] == dupe)
                    Console.WriteLine(files[i]);

            Console.WriteLine();
        }
    }

    static List<string> SearchForDupes(List<string> sums) {
        // Search for duplicate files within the given list of sums.
        List<string> dupes = new List<string>();

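        // i stops one short of the end; j has to run through the last element,
        // otherwise the final file is never compared against anything.
        for (int i = 0; i < sums.Count - 1; i++)
            for (int j = i + 1; j < sums.Count; j++)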
                if (sums[i] == sums[j])
                    if (!dupes.Contains(sums[i]))
                        dupes.Add(sums[i]);

        return dupes;
    }

    static void CheckBeforeProceeding(string[] args) {
        // Check things are good with the target dir before proceeding.
        if (args.Length == 0) {
            Console.WriteLine("Error: No directory provided");
            Environment.Exit(1);
        }

        if (!Directory.Exists(args[0])) {
            Console.WriteLine("Error: '{0}' is not a valid directory", args[0]);
            Environment.Exit(2);
        } 

        if (Directory.GetFiles(args[0]).Length == 0) {
            Console.WriteLine("Error: '{0}' does not contain any files", args[0]);
            Environment.Exit(3);
        }

        if (Directory.GetFiles(args[0]).Length == 1) {
            Console.WriteLine("Error: '{0}' only contains 1 file", args[0]);
            Environment.Exit(3);
        }
    }

    static string GetFileSum(string file) {
        // Function scalped from http://stackoverflow.com/a/10520086/1433400
        using (var sum = MD5.Create())
            using (var stream = File.OpenRead(file))
                return BitConverter.ToString(sum.ComputeHash(stream)).Replace("-","").ToLower();
    }
}

Here is some example output:

[mark][new_instagram] $ ~/dupe_find.exe .
d89b812d61bb41d037b1e6704d146d11
----------
./1.jpg
./17.jpg

4c6cc2794010fe3b018038b0e1162744
----------
./132.jpg
./312.jpg

...

Neater output from the program is left as an exercise to the reader.
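
One way to start on that, sketched in JavaScript for Node.js rather than C#: instead of comparing every pair of checksums, group the file paths by checksum in a single pass and keep only the groups with more than one member. The helper names fileSum and findDupes are made up for this sketch and are not part of the program above; it also assumes the files are small enough to read into memory in one go.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: MD5 of a file's contents as lowercase hex.
// Assumes the file fits comfortably in memory.
function fileSum(file) {
    return crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex');
}

// Hypothetical helper: map each checksum to the files that share it,
// keeping only checksums that were seen more than once.
function findDupes(dir) {
    const groups = new Map();

    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
        if (!entry.isFile()) continue; // skip subdirectories and the like

        const file = path.join(dir, entry.name);
        const sum = fileSum(file);

        if (!groups.has(sum)) groups.set(sum, []);
        groups.get(sum).push(file);
    }

    return new Map([...groups].filter(([, files]) => files.length > 1));
}
```

Looping over the result of findDupes('.') gives one checksum plus its list of files per iteration, which is most of what the output formatting needs.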


Makeout Point

in the website

Makeout Point, menu closed

Thank my housemate Alanna for this theme. Alanna has wanted to blog for a while, but Alanna has also wanted to procrastinate and play video games in her free time. :) She hopes that because she asked for a unique theme for her site, and drove me to build it, she might be more likely to blog out of a sense of responsibility to the work undertaken.

And here I am two weeks later, just about finished with Makeout Point. The theme is clean, minimal, responsive, and not half-bad compared to my earlier efforts. The theme is built for Anchor, a lightweight PHP blogging system. The available functions are bare bones compared to WordPress, but the upshot is that you don’t have WordPress’ cruft. Want to loop comments? Loop comments. Want to get a count of comments? Get a count of comments. There’s a really nice straightforwardness to Anchor that I’ve enjoyed working with.

The design is clean, responsive (smartphone through to desktop), fast, has a strong focus on code snippets, and makes use of burger menus for both site navigation and article comments.

Makeout Point, menu open

Anchor isn’t perfect: you can freely mix HTML and Markdown, and this leads to occasions where previously escaped HTML and XML code snippets in <pre> tags are parsed. Whoops. It’s nice though, don’t mistake me! I’d love to work with Anchor more in future.

You can fork or pull Makeout Point from its GitHub repo.


Merde

in the website

Boilerplate can be bad, and I was an idiot for using it. I used the same @font-face boilerplate code across three sites: here, 091 Labs, and Alanna’s new Anchor site. The boilerplate is:

@font-face {
    font-family: 'Source Code Pro Regular';
    src: 
        url('fonts/source_code_pro/scp-r.eot?') format('embedded-opentype'),
        url('fonts/source_code_pro/scp-r.woff') format('woff'),
        url('fonts/source_code_pro/scp-r.otf')  format('opentype'),
        url('fonts/source_code_pro/scp-r.ttf')  format('truetype'),
        url('fonts/source_code_pro/scp-r.svg')  format('svg');
}

Here is a typical piece of the font-family selection code I recycled:

h1, h2, h3, h4, h5, h6 {
    color: #343537;
    font: bold 2.75em 'Source Code Pro', impact;
    letter-spacing: 0;
    text-align: left;
}

I declare the @font-face as 'Source Code Pro Regular', but the rules ask for 'Source Code Pro', so the declared face is never actually used. And even if it had loaded, it would have been the wrong file: the typeface file in question is actually the bold version. This was pointed out to me in a Reddit thread last night, when a user complained about ugly typefaces on a site I submitted for critique.

So, fuck. Lesson learned. The solution was to move from self-hosted @font-face declarations to Google Fonts, at least for common typefaces, and to triple-check @font-face boilerplate in future.


I feel strangely proud about my first recursive function

in code

I need to move the bottom-most divs in a given set of nested divs as part of a parallax effect, so I recurse down through their children until I hit bottom.

function left(amount, obj) {
    $(obj).children().each(function() {
        if ($(this).children().length > 0) {
            left(amount, this);
        } else {
            // Pass the radix explicitly so the 'left' value is always read as decimal.
            $(this).css('left', parseInt($(this).css('left'), 10) - amount + 'px');
            wrap(this);
        }
    });
}

function wrap(obj) {
    var x = $(obj).offset().left;
    var y = $(obj).offset().top;
    var w = $(obj).width();
    var h = $(obj).height();

    if (y + h < 0) {
        $(obj).css('top', $(window).height() + 'px');
    } else if (x + w < 0) {
        $(obj).css('left', $(window).width() + 'px'); 
    } else if (x > $(window).width()) {
        $(obj).css('left', 0 - w + 'px');
    } else if (y > $(window).height()) {
        $(obj).css('top', 0 - h + 'px');
    }
}
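
The wrap-around decision in wrap() can also be pulled out into a pure function, which makes it easy to poke at without a browser. This is only a sketch: wrapPosition and its box and viewport arguments are names invented here for illustration, not part of the code above. It mirrors the same four checks in the same order.

```javascript
// Hypothetical pure version of wrap(): given an element's box
// ({x, y, w, h}) and the viewport size ({width, height}), return the
// CSS property to reset, or null if the element is still on screen.
function wrapPosition(box, viewport) {
    if (box.y + box.h < 0) {
        // Fully above the viewport: re-enter from the bottom edge.
        return { top: viewport.height + 'px' };
    } else if (box.x + box.w < 0) {
        // Fully off the left: re-enter from the right edge.
        return { left: viewport.width + 'px' };
    } else if (box.x > viewport.width) {
        // Off the right: park just outside the left edge.
        return { left: -box.w + 'px' };
    } else if (box.y > viewport.height) {
        // Off the bottom: park just above the top edge.
        return { top: -box.h + 'px' };
    }

    return null;
}
```

In the page itself, wrap() would measure the element with offset(), width() and height() as before, call wrapPosition, and hand any non-null result to jQuery's css(), which accepts an object of properties.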

/toot