Find duplicate files in a directory

posted in code

When I photographed heavily/professionally, I was rigorous in how I handled my imported raw files and my processed master (PSD/XCF) files. I was much less rigorous in how I sorted and stored my processed JPG files, to the point that I’ve found several directories holding anywhere from hundreds to thousands of images, some or many of them straight duplicates.

For the hell of it, and also because I haven’t touched C# since early 2013, I drew up a simple console application in C# to search for duplicate files in a given directory. I made a good start on it in Bash, but…fuck. Bash is slow, and interacting with arrays in Bash leaves me wanting to murder somebody.

Order of the program:

  1. Check directory was provided. Check directory exists. Check it has more than one file.
  2. Get list of files in directory.
  3. Generate MD5 checksums for each given file.
  4. For each checksum:
    i. Check each file after this in the list to see if it has the same sum.
    ii. If a duplicate is found, check if it is on the recorded dupe list.
    iii. If it isn’t on the dupe list, add it.
  5. Run through the file list once for each dupe checksum. Print all file names with the same checksum.
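Step 3 reads every byte of every file in order to hash it, which adds up across thousands of JPGs. A common refinement, which the program here does not do, is to group files by length before hashing. Files of different sizes cannot be identical, so only files that share a size with at least one other file need an MD5. This is a minimal sketch built only on the standard .NET `FileInfo` and `Dictionary` types. `SizePrefilter` and `Candidates` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical pre-filter, not part of the original program.
public static class SizePrefilter {
    // Return only the files whose byte length is shared with at least one
    // other file; only these can possibly be duplicates.
    public static List<string> Candidates(IEnumerable<string> files) {
        var bySize = new Dictionary<long, List<string>>();
        foreach (string file in files) {
            long size = new FileInfo(file).Length;
            List<string> group;
            if (!bySize.TryGetValue(size, out group)) {
                group = new List<string>();
                bySize[size] = group;
            }
            group.Add(file);
        }

        // Drop every size that occurs only once.
        var candidates = new List<string>();
        foreach (List<string> group in bySize.Values)
            if (group.Count > 1)
                candidates.AddRange(group);
        return candidates;
    }

    static void Main(string[] args) {
        string dir = args.Length > 0 ? args[0] : ".";
        string[] files = Directory.GetFiles(dir);
        List<string> candidates = Candidates(files);
        Console.WriteLine("{0} of {1} files need hashing", candidates.Count, files.Length);
        foreach (string file in candidates)
            Console.WriteLine(file);
    }
}
```

The returned list could then go to the MD5 step in place of the full `Directory.GetFiles` result. Equal sizes alone prove nothing, so the checksum comparison is still needed afterwards.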

I need to find more little projects like this in C#; it was fun to dust off what I knew.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Security.Cryptography;

public class FindDupes {
    static void Main(string[] args) {
        // Exit early if the directory is missing, invalid, or has fewer than two files.
        CheckBeforeProceeding(args);

        // files[i] and filesums[i] always refer to the same file.
        string[] files = Directory.GetFiles(args[0]);
        List<string> filesums = new List<string>();

        foreach (string file in files)
            filesums.Add(GetFileSum(file));

        List<string> dupes = SearchForDupes(filesums);
        PrintDupes(filesums, dupes, files);
    }

    static void PrintDupes(List<string> sums, List<string> dupes, string[] files) {
        // For each duplicated checksum, print it followed by every file that has it.
        foreach (string dupe in dupes) {
            Console.WriteLine("{0}\n----------", dupe);

            for (int i = 0; i < files.Length; i++)
                if (sums[i] == dupe)
                    Console.WriteLine(files[i]);

            Console.WriteLine();
        }
    }

    static List<string> SearchForDupes(List<string> sums) {
        // Return each checksum that appears more than once, listed only once.
        List<string> dupes = new List<string>();

        // Compare every pair (i, j) with i < j. The inner loop must reach the
        // last element, otherwise a duplicate of the final file is never found.
        for (int i = 0; i < sums.Count - 1; i++)
            for (int j = i + 1; j < sums.Count; j++)
                if (sums[i] == sums[j] && !dupes.Contains(sums[i]))
                    dupes.Add(sums[i]);

        return dupes;
    }

    static void CheckBeforeProceeding(string[] args) {
        // Check things are good with the target dir before proceeding.
        if (args.Length == 0) {
            Console.WriteLine("Error: No directory provided");
            Environment.Exit(1);
        }

        if (!Directory.Exists(args[0])) {
            Console.WriteLine("Error: '{0}' is not a valid directory", args[0]);
            Environment.Exit(2);
        } 

        // List the directory once and reuse the count for both checks.
        int fileCount = Directory.GetFiles(args[0]).Length;

        if (fileCount == 0) {
            Console.WriteLine("Error: '{0}' does not contain any files", args[0]);
            Environment.Exit(3);
        }

        if (fileCount == 1) {
            Console.WriteLine("Error: '{0}' only contains 1 file", args[0]);
            Environment.Exit(3);
        }
    }

    static string GetFileSum(string file) {
        // Function scalped from http://stackoverflow.com/a/10520086/1433400
        using (var sum = MD5.Create())
            using (var stream = File.OpenRead(file))
                return BitConverter.ToString(sum.ComputeHash(stream)).Replace("-","").ToLower();
    }
}

Here is some example output:

[mark][new_instagram] $ ~/dupe_find.exe .
d89b812d61bb41d037b1e6704d146d11
----------
./1.jpg
./17.jpg

4c6cc2794010fe3b018038b0e1162744
----------
./132.jpg
./312.jpg

...

Neater output from the program is left as an exercise to the reader.
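`SearchForDupes` compares every pair of checksums, and `PrintDupes` then rescans the whole list once per duplicate. The work therefore grows with the square of the file count. This alternative sketch uses only the standard .NET collections and groups file names by checksum in a `Dictionary`, in a single pass. `DupeGroups`, `HashFile` and `GroupByHash` are hypothetical names; none of them belongs to the program above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class DupeGroups {
    // Hash one file and return the digest as lowercase hex, like GetFileSum.
    public static string HashFile(string path) {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLowerInvariant();
    }

    // Map each checksum to every file that has it. Any list with two or
    // more entries is a set of duplicates.
    public static Dictionary<string, List<string>> GroupByHash(IEnumerable<string> files) {
        var groups = new Dictionary<string, List<string>>();
        foreach (string file in files) {
            string sum = HashFile(file);
            List<string> list;
            if (!groups.TryGetValue(sum, out list)) {
                list = new List<string>();
                groups[sum] = list;
            }
            list.Add(file);
        }
        return groups;
    }

    static void Main(string[] args) {
        if (args.Length == 0 || !Directory.Exists(args[0])) {
            Console.Error.WriteLine("Usage: dupe_groups <directory>");
            Environment.Exit(1);
        }

        foreach (var pair in GroupByHash(Directory.GetFiles(args[0]))) {
            if (pair.Value.Count < 2)
                continue; // unique file, nothing to report
            Console.WriteLine("{0}\n----------", pair.Key);
            foreach (string file in pair.Value)
                Console.WriteLine(file);
            Console.WriteLine();
        }
    }
}
```

Each file is hashed once and each checksum is looked up in roughly constant time, so no second scan of the list is needed. The printed format matches the example output above.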

by Mark -

Learn to program in the Linux shell from August 19!

posted in 091 labs


Linux scripting and programming courses @091labs.
Four weeks, four classes, beginning August 19, 7pm.

Subjects include:

  • The fundamentals of computer programming.
  • Installation and maintenance of a Linux system.
  • Bash shell syntax, structures, and pipelines.
  • Practical tasks for shell scripts.
  • Passing data to and from binary programs.
  • Binary shell programming with C#/Mono.

Requirements:

  • Laptop computer.
  • Installation media (USB key, CD-R or DVD-R).

Cost:

  • €40 general rate for the full workshop.
  • €20 reduced rate for hackerspace members and students.

For more information, contact or visit:

Bare blurbs aside, I plan to use Ubuntu for this, and to start off by showing people how to install Linux on their laptops. If they survive that (and the scary liability release for my assistance), I’ll delve into Boolean logic and basic structures, and hopefully devote most of my time to doing cool and practical things with the Bash shell.

I don’t intend to hand-hold through the Linux installation: if you want to program or script with Linux, I expect you to at least be comfortable enough to partition your laptop’s disk and install it. I strongly recommend that you come into this workshop with a functioning Linux, OS X, or Cygwin installation so you can begin working immediately; the example code I use will work in all of these environments (apart from the respective path differences).

The Big Topics of the shell workshop include:

  • A refresher on elementary subjects: Boolean logic, Linux and its shell.
  • Input and output (STDIN, STDOUT, STDERR), and redirection.
  • Pipelines, and using them to build workflows.
  • Parsing, searching, and appending to files.
  • Coding standards and best practices.
  • Everyday uses and examples of shell scripts.
  • Including your own binary programs (C#/Mono) in your workflow. I will cover the basics of this.
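Putting a C#/Mono program into a shell workflow comes down to reading STDIN, writing results to STDOUT, and keeping diagnostics on STDERR. Here is a minimal sketch of such a filter. It assumes Mono’s `mcs` compiler, and `UpperFilter` is a hypothetical example, not workshop material:

```csharp
using System;
using System.IO;

// Hypothetical example filter: upper-cases STDIN onto STDOUT.
public static class UpperFilter {
    // Reads lines until end of input, writes each one upper-cased to `output`,
    // and sends a one-line summary to `error` so it stays out of the data stream.
    public static int Run(TextReader input, TextWriter output, TextWriter error) {
        int count = 0;
        string line;
        // ReadLine() returns null at end of input (end of a pipe, or Ctrl-D).
        while ((line = input.ReadLine()) != null) {
            output.WriteLine(line.ToUpperInvariant());
            count++;
        }
        error.WriteLine("processed {0} lines", count);
        // Exit status 0 tells the shell the program succeeded.
        return 0;
    }

    static int Main() {
        return Run(Console.In, Console.Out, Console.Error);
    }
}
```

After `mcs upper.cs`, the program can sit in a pipeline such as `grep error server.log | mono upper.exe > loud.txt 2> summary.txt`. Here `>` captures STDOUT and `2>` captures STDERR separately. Passing the readers and writers into `Run` also lets the same logic be tested without a shell.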

Come one, come all, and geek out to your heart’s content. :)

by Mark -

Round down a decimal of arbitrary precision to n.

posted in code

For Darren. :D

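using System;
using System.Text;

public class DecimalRound {
	static void Main(string[] args) {
		// args[0] is the number of decimal places to keep (n).
		// args[1] is the decimal to round, e.g. 3.14159 or -2.5.
		if (args.Length < 2) {
			Console.WriteLine("Usage: DecimalRound <places> <decimal>");
			Environment.Exit(1);
		}

		// 1. Parse n, the number of decimal places to keep.
		int places;
		if (!int.TryParse(args[0], out places) || places < 0)
			Fail(args[0], true);

		// 2. Strip the sign, then split the number. decimalStr[0] is the
		//    leading (whole) number, decimalStr[1] the decimals.
		string number = args[1];
		bool negative = number.Length > 0 && number[0] == '-';
		if (negative)
			number = number.Substring(1);

		string[] decimalStr = number.Split('.');
		if (decimalStr.Length > 2)
			Fail(args[1], true);
		string whole = decimalStr[0].Length > 0 ? decimalStr[0] : "0";
		string decimals = decimalStr.Length == 2 ? decimalStr[1] : "";

		// 3. Convert every character to an int, failing on anything that is not 0-9.
		string allDigits = whole + decimals;
		int[] digits = new int[allDigits.Length];
		for (int i = 0; i < allDigits.Length; i++) {
			if (!ValidChar(allDigits[i]))
				Fail(args[1], true);
			digits[i] = CharToInt(allDigits[i]);
		}

		// 4. Only the first dropped digit decides the result:
		//    0-4, round down (kept digits unchanged). 5-9, round up (add 1).
		int keep = whole.Length + Math.Min(places, decimals.Length);
		bool carry = keep < digits.Length && digits[keep] >= 5;

		// Propagate the carry leftwards, e.g. 0.995 to 2 places gives 1.00.
		for (int i = keep - 1; i >= 0 && carry; i--) {
			digits[i]++;
			carry = digits[i] == 10;
			if (carry)
				digits[i] = 0;
		}

		// 5. Output. A carry out of the first digit becomes a new leading 1.
		StringBuilder result = new StringBuilder();
		if (negative)
			result.Append('-');
		if (carry)
			result.Append('1');
		for (int i = 0; i < whole.Length; i++)
			result.Append(digits[i]);
		if (places > 0) {
			result.Append('.');
			for (int i = whole.Length; i < keep; i++)
				result.Append(digits[i]);
			// Pad with zeros if the input had fewer than n decimals.
			result.Append('0', places - (keep - whole.Length));
		}
		Console.WriteLine(result);
	}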

	static bool ValidChar(char a) {
		// Validates that it is an ASCII digit 0-9, and not any other character value.
		return a >= '0' && a <= '9';
	}

	static int CharToInt(char a) {
		// Converts, if valid. 
		return Convert.ToInt32(a - '0');
	}

	static void Fail(string a, bool b) {
		// Spit out error message if you pass invalid characters.
		// b = true if this is fatal.
		Console.WriteLine("{0} is not valid.", a);

		if (b)
			Environment.Exit(1);
	}
}
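For numbers that fit in `System.Decimal` (28-29 significant digits, so not truly arbitrary precision), the framework can serve as a cross-check. `Math.Round(decimal, int, MidpointRounding.AwayFromZero)` rounds to the nearest value and sends exact ties away from zero. That matches the “0-4, round down. 5-9, round up” rule applied to the first dropped digit. Here is a minimal sketch; `RoundCheck` is a hypothetical name:

```csharp
using System;
using System.Globalization;

public static class RoundCheck {
    // Round a decimal string to `places` digits (0-28) with System.Decimal.
    // AwayFromZero: a first dropped digit of 5-9 rounds up in magnitude, 0-4 down.
    public static string Round(string value, int places) {
        decimal d = decimal.Parse(value, CultureInfo.InvariantCulture);
        decimal rounded = Math.Round(d, places, MidpointRounding.AwayFromZero);
        // "F" + places keeps trailing zeros, e.g. "1.00" rather than "1".
        return rounded.ToString("F" + places, CultureInfo.InvariantCulture);
    }

    static void Main(string[] args) {
        // args[0] = decimal places, args[1] = the number (same order as DecimalRound).
        if (args.Length < 2) {
            Console.WriteLine("Usage: RoundCheck <places> <decimal>");
            return;
        }
        Console.WriteLine(Round(args[1], int.Parse(args[0])));
    }
}
```

Inputs beyond `decimal`’s range or precision still need the digit-by-digit approach, and that limit is the reason for doing it by hand.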

by Mark -

[MUD] Muddy mudness

posted in code, me

I’ve thrown a lot of idle thoughts at the concept of the MUD project, and it keeps coming back to me that I would be literally out of my depth. I have, to date, scarcely finished one barely-working shooter, let alone delved into the intricacies of a graphical multiplayer role-playing game.

I am going to throw myself into an easier intermediate project, a Linux arena shooter game. The core principles of the arena game will carry over to the MUD:

  • Peer-to-peer and client-server multiplayer connectivity.
  • Persistent characters, and player progression.
  • AI pathing.
  • And more…

My first steps are to finish the design document and figure out the Monogame networking API. :)

by Mark -

[XNA] Flyatron 1.0!!

posted in college

You can grab the source here, and if you’re interested in trying the game, shoot me an email or leave a comment here!

Changelog:

General:
* New game icon!
* Added new font size.

Game.cs:
* Collected nukes now accumulate for later use (right-mouse click).
* Tidied UpdatePlay(); some code was shuffled off to its correct methods.
* Tidied collision detection.
* Changed the soundtrack instance to a more appropriate name.

Player.cs:
* Tidied and clarified movement code.
* Slightly increased movement speed.

Scores.cs:
* Quashed a literally game-crashing bug in the generation of a blank/dummy score list.
* Renamed Increment() to Update() to more accurately reflect the in-use convention.
* Scores still do not collate correctly. I have identified the source of the behaviour, but a correction will have to wait.

Mines.cs:
* Fixed insta-death collision bug.
* Added a new state for being shot.
* Added mine health.
* Mines are slightly knocked back when shot.
* When shot, mine health decreases inversely to velocity.
* Mine changes colour to generally indicate remaining health.
* Added debug information on the mine’s state.

Missiles.cs:
* Renamed to Bullets.cs. Changed all mentions of “missile” to “bullet”, as the two terms had been used interchangeably.

Music.cs:
* Music files are now loaded and swapped internally. Drastically reduces front-end code.

by Mark -

[XNA] Flyatron…works!

posted in code, college

Check out Flyatron on GitHub! Gameplay is barebones (you move, you shoot, you die) and buggy, but it runs. Current issues:

  1. Owing to the order in which mines despawn after detonating, bullets following the first bullet will despawn until the explosion animation concludes.
  2. Ordering, again: if the player hits a mine at their spawn point (you’ll see it in the video), they will continually hit it until they lose all of their lives.
  3. The Scores menu does not update correctly.
  4. There are no sound effects.

by Mark -