Find duplicate files in a directory

posted in code with 0 comments

When I photographed heavily/professionally, I was rigorous in how I handled my imported raw files and master processed (PSD/XCF) files. I was much less rigorous in how I sorted and stored my processed JPG files, to the point that I’ve found several directories with anywhere between hundreds and thousands of images, some or many of them straight duplicates.

For the hell of it, and also because I haven’t touched C# since early 2013, I drew up a simple console application in C# to search for duplicate files in a given directory. I made a good start on it in Bash, but…fuck. Bash is slow and interacting with arrays in Bash leaves me wanting to murder somebody.

Order of the program:

  1. Check directory was provided. Check directory exists. Check it has more than one file.
  2. Get list of files in directory.
  3. Generate MD5 checksums for each given file.
  4. For each checksum:
    i. Check each file after this in the list to see if it has the same sum.
    ii. If a duplicate is found, check if it is on the recorded dupe list.
    iii. If it isn’t on the dupe list, add it.
  5. Run through the file list once for each dupe checksum. Print all file names with the same checksum.
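As an aside, steps 4 and 5 can also be sketched as a single pass that groups file names under their checksum in a dictionary. This is a minimal sketch and not what the program below does; the class name `DupeGroups`, the method `Group`, and the sample file names and checksums are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class DupeGroups
{
    // Groups file names by checksum and keeps only checksums shared by 2+ files.
    // 'files' and 'sums' are parallel lists: sums[i] is the checksum of files[i].
    public static Dictionary<string, List<string>> Group(IList<string> files, IList<string> sums)
    {
        var bySum = new Dictionary<string, List<string>>();

        for (int i = 0; i < files.Count; i++)
        {
            List<string> names;
            if (!bySum.TryGetValue(sums[i], out names))
            {
                names = new List<string>();
                bySum[sums[i]] = names;
            }
            names.Add(files[i]);
        }

        // Drop every checksum that only one file has.
        var dupes = new Dictionary<string, List<string>>();
        foreach (var pair in bySum)
            if (pair.Value.Count > 1)
                dupes[pair.Key] = pair.Value;

        return dupes;
    }

    public static void Main()
    {
        // Hypothetical names and checksums, purely for illustration.
        var files = new[] { "a.jpg", "b.jpg", "c.jpg", "d.jpg" };
        var sums = new[] { "aaa", "bbb", "aaa", "ccc" };

        foreach (var pair in Group(files, sums))
            Console.WriteLine("{0}: {1}", pair.Key, string.Join(", ", pair.Value));
    }
}
```

Grouping this way touches each file once instead of comparing every pair, which probably only matters once a directory holds thousands of files.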

I need to find more little projects like this in C#; it was fun to dust off what I knew.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Security.Cryptography;

public class findDupes {
    static void Main(string[] args) {
        CheckBeforeProceeding(args);

        string[] files = Directory.GetFiles(args[0]);
        List<string> filesums = new List<string>();

        foreach (string file in files)
            filesums.Add(GetFileSum(file));

        List<string> dupes = SearchForDupes(filesums);
        PrintDupes(filesums, dupes, files);
    }

    static void PrintDupes(List<string> sums, List<string> dupes, string[] files) {
        // Print output.
        foreach (string dupe in dupes) {
            Console.WriteLine("{0}\n----------", dupe);

            for (int i = 0; i <= (files.Length - 1); i++)
                if (sums[i] == dupe)
                    Console.WriteLine(files[i]);

            Console.WriteLine();
        }
    }

    static List<string> SearchForDupes(List<string> sums) {
        // Search for duplicate files within the given list of sums.
        List<string> dupes = new List<string>();

        for (int i = 0; i <= (sums.Count - 2); i++)
            // j must reach the last index, otherwise the final file is never compared.
            for (int j = (i + 1); j <= (sums.Count - 1); j++)
                if (sums[i] == sums[j])
                    if (!dupes.Contains(sums[i]))
                        dupes.Add(sums[i]);

        return dupes;
    }

    static void CheckBeforeProceeding(string[] args) {
        // Check things are good with the target dir before proceeding.
        if (args.Length == 0) {
            Console.WriteLine("Error: No directory provided");
            Environment.Exit(1);
        }

        if (!Directory.Exists(args[0])) {
            Console.WriteLine("Error: '{0}' is not a valid directory", args[0]);
            Environment.Exit(2);
        } 

        if (Directory.GetFiles(args[0]).Length == 0) {
            Console.WriteLine("Error: '{0}' does not contain any files", args[0]);
            Environment.Exit(3);
        }

        if (Directory.GetFiles(args[0]).Length == 1) {
            Console.WriteLine("Error: '{0}' only contains 1 file", args[0]);
            Environment.Exit(3);
        }
    }

    static string GetFileSum(string file) {
        // Function scalped from http://stackoverflow.com/a/10520086/1433400
        using (var sum = MD5.Create())
            using (var stream = File.OpenRead(file))
                return BitConverter.ToString(sum.ComputeHash(stream)).Replace("-","").ToLower();
    }
}

Here is some example output:

[mark][new_instagram] $ ~/dupe_find.exe .
d89b812d61bb41d037b1e6704d146d11
----------
./1.jpg
./17.jpg

4c6cc2794010fe3b018038b0e1162744
----------
./132.jpg
./312.jpg

...

Neater output from the program is left as an exercise to the reader.

by Mark -

Makeout Point

posted in the website with 0 comments

Makeout Point, menu closed

Thank my housemate Alanna for this theme. Alanna has wanted to blog for a while, but Alanna has also wanted to procrastinate and play video games in her free time. :) She hopes that because she asked for a unique theme for her site, and drove me to build it, she might be more likely to blog out of a sense of responsibility to the work undertaken.

And here I am two weeks later, just about finished with Makeout Point. The theme is clean, minimal, responsive, and not half-bad compared to my earlier efforts. The theme is built for Anchor, a lightweight PHP blogging CMS. The available functions are bare bones compared to WordPress; the upshot is that you don’t have WordPress’ cruft. Want to loop comments? Loop comments. Want to get a count of comments? Get a count of comments. There’s a really nice straightforwardness to Anchor that I’ve enjoyed working with.

The design is clean, responsive (smartphone through to desktop), fast, has a strong focus on code snippets, and makes use of burger menus for both site navigation and article comments.

Makeout Point, menu open

Anchor isn’t perfect: you can freely mix HTML and Markdown, and this leads to occasions where previously-escaped HTML and XML code snippets in <pre> tags are parsed. Whoops. It’s nice though, don’t mistake me! I’d love to work with Anchor more in future.

You can fork or pull Makeout Point from its GitHub repo.

by Mark -

Merde

posted in the website with 0 comments

Boilerplate can be bad, and I was an idiot for using it. I used the same @font-face boilerplate code across three sites: Here, 091 Labs, and Alanna’s new Anchor site. The boilerplate is:

@font-face {
    font-family: 'Source Code Pro Regular';
    src: 
        url('fonts/source_code_pro/scp-r.eot?') format('embedded-opentype'),
        url('fonts/source_code_pro/scp-r.woff') format('woff'),
        url('fonts/source_code_pro/scp-r.otf')  format('opentype'),
        url('fonts/source_code_pro/scp-r.ttf')  format('truetype'),
        url('fonts/source_code_pro/scp-r.svg')  format('svg');
}

Here is a typical piece of the font-family selection code I recycled:

h1, h2, h3, h4, h5, h6 {
    color: #343537;
    font: bold 2.75em 'Source Code Pro', impact;
    letter-spacing: 0;
    text-align: left;
}

I declare the @font-face, but never actually use it: the rule registers the family as 'Source Code Pro Regular', while the heading styles ask for 'Source Code Pro'. And even if the font-face did happen to load, it would be the wrong file: the typeface file in question is actually the bold version. This was pointed out to me in a Reddit thread last night, when a user complained about ugly typefaces on a site I submitted for critique.

So, fuck. Lesson learned. The solution was to move from self-hosted @font-face rules to Google Fonts, at least for common typefaces, and to triple-check any @font-face boilerplate in future.

by Mark -

I feel strangely proud about my first recursive function

posted in code with 0 comments

I need to move the bottom-most of a given set of divs as part of a parallax effect, so I progress down through them until I hit bottom.

function left(amount, obj) {
    $(obj).children().each(function() {
        if ($(this).children().length > 0) {
            left(amount, this);
        } else {
            // Parse the current offset as base 10 before shifting it.
            $(this).css('left', parseInt($(this).css('left'), 10) - amount + 'px');
            wrap(this);
        }
    });
}

function wrap(obj) {
    var x = $(obj).offset().left;
    var y = $(obj).offset().top;
    var w = $(obj).width();
    var h = $(obj).height();

    if (y + h < 0) {
        $(obj).css('top', $(window).height() + 'px');
    } else if (x + w < 0) {
        $(obj).css('left', $(window).width() + 'px'); 
    } else if (x > $(window).width()) {
        $(obj).css('left', 0 - w + 'px');
    } else if (y > $(window).height()) {
        $(obj).css('top', 0 - h + 'px');
    }
}

/toot

by Mark -

Project Euler Problem #18

posted in code with 0 comments

I am beginning to feel as if my head is full of mush every time I have to deal with a jagged array in a loop. I stole the solution from here, but the code is entirely my own.

In short, you begin at the bottom-left corner of the jagged array. You move to the right and evaluate each pair of numbers. You add the higher of the two to the number above. Take the first three numbers:

63
04 62

Weigh 04 and 62. Add the higher of the two – 62 – to 63:

125
04 62

Repeat this cascade along each row, working from the bottom row up. Ultimately the largest total ends up in a[0][0]; output this. Problem #67 is the same problem with a larger triangle, so apart from some additional file handling the solution below can be reused.
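To see the cascade on something small enough to check by hand, here is a sketch that runs it over the four-row example triangle from the problem statement. The names `TriangleSketch` and `MaxPathSum` are my own for illustration, and the method works on a copy so the input array is left untouched.

```csharp
using System;

public static class TriangleSketch
{
    // Folds each row into the row above it, bottom to top, and returns the apex.
    public static int MaxPathSum(int[][] triangle)
    {
        // Copy the rows so the caller's data is not modified.
        int[][] a = new int[triangle.Length][];
        for (int i = 0; i < triangle.Length; i++)
            a[i] = (int[])triangle[i].Clone();

        for (int i = a.Length - 1; i > 0; i--)
            for (int j = 0; j < a[i].Length - 1; j++)
                a[i - 1][j] += Math.Max(a[i][j], a[i][j + 1]);

        return a[0][0];
    }

    public static void Main()
    {
        int[][] small =
        {
            new[] { 3 },
            new[] { 7, 4 },
            new[] { 2, 4, 6 },
            new[] { 8, 5, 9, 3 }
        };

        Console.WriteLine(MaxPathSum(small)); // 23 (3 + 7 + 4 + 9)
    }
}
```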

using System;

public class Eighteen
{
	static void Main()
	{
		int[][] a = new int[15][];

		a[0]  = new int[1]  {75};
		a[1]  = new int[2]  {95,64};
		a[2]  = new int[3]  {17,47,82};
		a[3]  = new int[4]  {18,35,87,10};
		a[4]  = new int[5]  {20,04,82,47,65};
		a[5]  = new int[6]  {19,01,23,75,03,34};
		a[6]  = new int[7]  {88,02,77,73,07,63,67};
		a[7]  = new int[8]  {99,65,04,28,06,16,70,92};
		a[8]  = new int[9]  {41,41,26,56,83,40,80,70,33};
		a[9]  = new int[10] {41,48,72,33,47,32,37,16,94,29};
		a[10] = new int[11] {53,71,44,65,25,43,91,52,97,51,14};
		a[11] = new int[12] {70,11,33,28,77,73,17,78,39,68,17,57};
		a[12] = new int[13] {91,71,52,38,17,14,91,43,58,50,27,29,48};
		a[13] = new int[14] {63,66,04,68,89,53,67,30,73,16,69,87,40,31};
		a[14] = new int[15] {04,62,98,27,23,09,70,98,73,93,38,53,60,04,23};

		// Fold each row into the one above, bottom to top.
		for (int i = 14; i > 0; i--)
			for (int j = 0; j < a[i].Length-1; j++)
				// Math.Max also covers the case where both neighbours are equal.
				a[i-1][j] += Math.Max(a[i][j], a[i][j+1]);

		Console.WriteLine("\n{0}\n", a[0][0]);
	}
}

by Mark -

Project Euler Problem #12

posted in code with 0 comments

This was a question of two parts:

  1. Calculate the next triangle number in sequence.
  2. Take said triangle number and calculate how many divisors it has. Loop until you find one with 500, and break.

The first part was simple:

Keep a running triangle number t and a counter k, both starting at 1. On every iteration of the loop, add k + 1 to t, then increment k:

t = 1, k = 1
t = t + k + 1
k = k + 1

The second part…not so simple. I’ll honestly say I am struggling with understanding the correct way to test for divisors; this works, however:

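Take a number, n. Find the square root of n. Loop over every integer i from 1 up to that square root and check whether it divides n evenly. Each i that does is paired with a second divisor, n / i, so count two for it (or only one when i * i equals n, since both halves of the pair are then the same number). Return the count.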
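Both parts can be checked against the small case given in the problem statement, where 28 is the first triangle number with more than five divisors. The sketch below is only that, a sketch: `TwelveSketch`, `CountDivisors` and `FirstTriangleOver` are names I made up, and the divisor count adds one rather than two when the candidate is an exact square root.

```csharp
using System;

public static class TwelveSketch
{
    // Counts divisors of n by testing candidates up to sqrt(n) and
    // counting each hit together with its partner n / i.
    public static int CountDivisors(long n)
    {
        int count = 0;
        for (long i = 1; i * i <= n; i++)
            if (n % i == 0)
                count += (i * i == n) ? 1 : 2; // a square root pairs with itself

        return count;
    }

    // Returns the first triangle number with more than 'limit' divisors.
    public static long FirstTriangleOver(int limit)
    {
        long t = 1; // current triangle number
        long k = 1; // its position in the sequence

        while (CountDivisors(t) <= limit)
        {
            k++;
            t += k;
        }

        return t;
    }

    public static void Main()
    {
        Console.WriteLine(CountDivisors(28));    // 6: 1, 2, 4, 7, 14, 28
        Console.WriteLine(FirstTriangleOver(5)); // 28
    }
}
```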

using System;

public class Twelve
{
	static void Main()
	{
		long a = 1;
		long b = 1;
		long c = 0;

		while (c <= 500)
		{
			c = factors(a);

			if (c > 500)
			{
				// a holds the triangle number; b is only its index in the sequence.
				Console.Write("\n{0}\n", a);
				break;
			}
			else
			{
				a += b + 1;
				b++;
			}
		}
	}

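	static long factors(long a)
	{
		// Count divisors in pairs (i, a / i), testing i only up to sqrt(a).
		long b = 0;

		for (long i = 1; i * i <= a; i++)
			if (a % i == 0)
				b += (i * i == a) ? 1 : 2; // a square root pairs with itself

		return b;
	}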
}

by Mark -

Project Euler Problem #6

posted in code with 0 comments

using System;

public class Six
{
	static void Main()
	{
		double a = 100;
		double b = 0;
		double c = 0;
		
		for (int i = 1; i <= a; i++)
		{
			b += (i * i);
			c += i;
		}

		c = c * c;

		if (b > c)
			Console.WriteLine("\n{0}\n", b - c);
		else if (c > b)
			Console.WriteLine("\n{0}\n", c - b);
	}
}

by Mark -

Project Euler Problem #5

posted in code with 0 comments

using System;

public class Five
{
	static void Main()
	{
		int a = 20;
		int b = 0;

		while (b <= 0)
		{
			for (int i = 11; i < 20; i++)
			{
				if (a % i != 0)
				{
					a += 20;
					break;
				}
				else if (i == 19)
				{
					b = a;
					break;
				}
			}
		}

		Console.WriteLine("\n{0}\n", b);
	}
}

by Mark -

Project Euler Problem #4

posted in code with 0 comments

using System;

public class Four
{
	static void Main()
	{
		int a = 0;
		int b = -1;
		int c = 999;

		for (int i = 100; i <= c; i++)
		{
			for (int j = 100; j <= c; j++)
			{
				a = i * j;

				if ((a == Reverse(a)) && (a > b))
					b = a;
			}
		}

		Console.WriteLine("\n{0}\n", b);
	}

	static int Reverse(int m)
	{
		int n = 0;

		// Peel digits off the end of m and append them to n.
		while (m != 0)
		{
			n *= 10;
			n += m % 10;
			m /= 10;
		}

		return n;
	}
}

by Mark -

Project Euler Problem #3

posted in code with 0 comments

using System;

public class Three
{
	static void Main()
	{
		long a = 600851475143;
		int  b = Convert.ToInt32(Math.Sqrt(a));
		int  c = 0;

		for (int i = b; i >= 2; i--)
		{
			if (a % i == 0)
				if (ChkPrime(i))
				{
					c = i;
					break;
				}
		}

		Console.WriteLine("\n{0}\n", c);
	}

	static bool ChkPrime(int c)
	{
		bool prime = true;

		// Use <= so that squares of primes (9, 25, ...) are not reported as prime.
		for (int i = 2; i <= Math.Sqrt(c); i++)
			if (c % i == 0)
			{
				prime = false;
				break;
			}

		return prime;
	}
}

by Mark -