In my most recent adventures in iOS I had a string hashing problem that led me down the road of creating a machine learning algorithm in order to find the correct random hash string I was looking for. Looking back at the solution now, I was able to solve my problem successfully, but the solution I came up with to solve this problem is quite unsettling due to the variable results in time complexity derived from my solution. The ending time complexity in my solution was within in a range between 240 all the way to the upper bound of 2200 iterations. While I can certainly live with the lower bound of 240, the variable amount above 240 all the way to 2200 is quite unsettling.
The algorithm I had to solve used a specific set of letters and a specific key to generate an 8 character string outcome. In a sense, alsmost like generating a hashed string. The rules were that given these 16 characters, "acdegilmnoprstuw", and this key, "25377615533200," I needed to find an 8 character string output based upon this equation: n = (n * 37 + p). Kind of confusing at first, right? That is what I thought too as first, so to try and make this problem easier I started off by breaking this problem down into a step by step approach. The first step I took in solving this problem was to realize that I already had two very very important pieces of the puzzle, those being; the ending numerical key, "25377615533200," and the initial set of characters "acdegilmnoprstuw." These items are important because I am only dealing with 16 characters, and the key is the final result that allows me to check if I am solving the equation correctly with the right hash string. So, having these two important pieces of the puzzle allowed me to move on to the next step in my solution and that was to create what is called a brute force solution. A brute force solution is a solution that just basically gets the job done, it does not perform well under scale, and the time complexity is often the worst possible scenario possible, but what this solution does for us is give us insight into how the problem is solved and what the expected outcome is. Then once we have the brute force solution we can work on optimization techniques to make the runtime complexity as linear as possible. At this point it is often a good idea to set a benchmark goal for speed. Looking at the possibilities of the solution it looks like O(N*N) or O(256) might be initially a nice goal to shoot for.
The following code below is a brute force solution written in Swift. The entry point of the solution is the executeHashLoop() function. This function gathers a random string from the getRandomString function with a string length of 8 characters from the 16 character given list, then given that 8 character string, looks through an array called alreadyUsed to see if the string has been used as a failed attempt in a previous loop. If the string has been attempted then the while loop continues and another string is randomly generated. If the string has not been attempted it is added to the attempt array, the string is sent to the hashString function and the index integer is incremented. The index integer lets us know how many failed attempts were tried before a good match was found.
The hashString function is the function that runs the equation mentioned earlier, n = (n * 37 + p). This function takes the randomly generated string and breaks it out into a character array and then runs a for loop for the size of the character array, which is 8. Next. before we run the for loop, you will notice that the h integer was set to 7. This was also given information and h will be the value affect on each iteration when the equation is run. Here is an example of the first iteration result using the string "deilmnsw." h = (7 * 37 + 2), which in turn equals: 261. The pos variable is the index position of d in the letters array so it can be added to the final result of the multiplied h variable. After the for loop is finished running it will produce a key from the variable h and that variable will be used to determine if the random string was the correct random string or not. If it was, the found Bool value is set to true and the loop stops and the string value is output in the console. If the it was not found the brute force solution keeps running.
This solution is a very very poor way to find the final solution, but given enough time it will find the solution. I have seen it run in the simulator for up to 2 hours before. This is probably the worst case scenario for a solution. Just to put it into perspective, a real solution should take seconds to execute. However, this does what we need it to do and that is find us the correct answer.
// // BruteForce.swift // SwfitHash // // Created by Matt Eaton on 6/5/16. // Copyright © 2016 SwingShift. All rights reserved. // import Foundation class BruteForce { // Class properties // Already used array to keep record of all of our tested attempts var alreadyUsed = [String]() // Index to keep track of the total iterations var index:Int = 0 // Flag if the hash string has been found var found:Bool = false // The known good key let key: String = "25377615533200" // The letters we were given let letters:String = "acdegilmnoprstuw" /** * Utility function that multiplies 37 * calculated number plus the character index * This is in attempt to generate a number key that matches the given one (25377615533200) * If this function matches the key it will flag to stop the program and display results * * @param hash * This is the random hash that will be attempted * */ func hashString(hash:String) { // The h int to be calculated to form the potential key var h:Int = 7 // Hash array forms an array of characters based upon the input string let hashArray = [Character](hash.characters) // Loop through all characters in the hash array and perform the calculation for i in (0..<hashArray.count) { let ind = letters.characters.indexOf(hashArray[i]) let pos = letters.startIndex.distanceTo(ind!) h = (h * 37 + Int(pos)) } // Check to see if the string version of the potential key matches the given key if (key == String(h)) { // If there is a match flag to stop the program and display results print("String found on attempt \(index) using string \(hash) to match \(h)") found = true } } /** * Utility function that generates a random string based upon a specific set of characters * * @param stringLength * The length of the desired random string * * @return string * The random string generated */ func getRandomString(stringLength:Int) -> String { // The character array let characterArray = [Character](letters.characters) // Random string about to be generated var randomString: String = "" // For loop to generate a string based upon the passed in size // The string is generated at random based off the letters string which is not the characterArray for _ in (0..<stringLength) { let rand = Int(arc4random_uniform(UInt32(characterArray.count))) randomString += String(characterArray[rand]) } // Return the new string return randomString } /** * Execution function that performs five operations * 1) Gathers a random string at a dynamic length * 2) Checks to see if the random string is in the used array already * 3) Sends the the semi random string to be calculated into a numeric value in the hashString function * 4) Exits the loop before iteration 300000 */ func executeHashLoop() { // Lets the user know that we have started looking for the string print("Looking for string...") // While loop that performs the brute force functionality while (found == false && index<300000) { let attempt:String = getRandomString(8) // Make sure the attempt string has not already been tried to save processing power if !alreadyUsed.contains(attempt) { // Add to attempt array alreadyUsed.append(attempt) // Run the hash algorithm hashString(attempt) // Increment the index index+=1 } // This just lets the user know that the loop is done if for some reason the correct match is not found already if (index==299999) { print("Exiting program at index 299,999") return } // Utility check to display if the operating index is above a certain thousand if (index % 10000 == 0) { print("Currently iterating at index \(index)") } } } }
While the brute force solution worked to find the unknown string (nsanswer) it was certainly only a first step and not a final solution. The next step in the process is to try and cut down the time complexity to be as linear as possible. To do this I tried many techniques, but ultimately, when I wanted to test one of my new techniques I would have to run my semi-brute force solution to see if the latest technique had any impact on the overall time complexity of the solution. This as you can imagine started burning a lot of my time and became frustrating because the original brute force solution takes a very very long time to find the answer. So to curve this time in testing new techniques I devised an approach that would generate random strings that were 4 characters long instead of the original 8 characters and then prefix the random characters with 4 correct characters, i.e., nsan + glin. This dramatically reduced the time to find the answer and allowed me to test solutions. From here I sort of hit a wall in coming up with techniques to shave off the time complexity. My problem at this point was that I was always trying to figure out a way to reverse engineer the ending key rather than to work with the results I was generating on each iteration.
This stumped me for awhile until it dawned on me that instead of trying to come up with a better solution to generate a defined key that I could take the incorrect results that I was testing and extract little bits of possible good information from these incorrect results. This is what started me down the road of what I consider machine learning. My idea was to take the idea that I had been using with the prefixed string and try and keep generating larger and larger prefixed strings based off of keys that were generated from incorrect results. That way if I generated a key and matched for example the first 3 or 4 digits in the incorrect key against the given key then I knew I had a possible character match on my random string that I was generating. I would cut off these matched characters then and use them for a prefixed string and start generating less and less random character strings. Below is an example in Swift of the complete solution.
// // Hash.swift // SwfitHash // // Hash generation program that builds a hash string based upon failed attempts. // The failed attempts are checked for potential matches and saved as prefix characters to lessen the amount randomized strings that need to be generated downstream. // This is sort of like machine learning in a sense that everytime information is generated it is checked for potential good results and then these good results end up // allowing the program to find the unkwown string in the shortest amount of time // // On average the loop is iterated around 1000 times to find the unknown hash // I have seen results come back in on the low end of 260 iterations and on the high end of 2200 iterations // // Created by Matt Eaton on 5/1/16. // Copyright © 2016. All rights reserved. // import Foundation class Hash { // Class properties // Already used array to keep record of all of our tested attempts var alreadyUsed = [String]() // Index to keep track of the total iterations var index:Int = 0 // The match index of the potential key and the key var matchIndex:Int = 0 // Flag if the hash string has been found var found:Bool = false // The known good key let key: String = "25377615533200" // The letters we were given let letters:String = "acdegilmnoprstuw" // The blank prefix string to hold our good characters var prefixString:String = "" /** * Utility function that attempts to define a match based upon a failed attempt. * Even though the potential key was not a match there could be some good information we can extract. * This function attempts to generate prefix characters from the potential key based upon incremental string matches from the given key. * This is sort of like machine learning in a sense because we are going to take the known good information and use it to help the program out downstream so it does not have to generate a random string that is 8 characters every time. Thus cutting iterations off our total loop. * * @param hash * The generated potential key * @param randomStr * The random string generated to exrtract prefix characters from */ func attemptPrefix(hash:String, randomStr:String) { // Generate three sets of substrings to try and reduce the amount of false positives in the calculation let p_str = hash.substringFromIndex(hash.startIndex.advancedBy(matchIndex)).substringToIndex(hash.startIndex.advancedBy((matchIndex + 1))) let k_str = key.substringFromIndex(key.startIndex.advancedBy(matchIndex)).substringToIndex(key.startIndex.advancedBy((matchIndex + 1))) let p_str2 = hash.substringFromIndex(hash.startIndex.advancedBy(matchIndex)).substringToIndex(hash.startIndex.advancedBy((matchIndex + 2))) let k_str2 = key.substringFromIndex(key.startIndex.advancedBy(matchIndex)).substringToIndex(key.startIndex.advancedBy((matchIndex + 2))) let p_str3 = hash.substringFromIndex(hash.startIndex.advancedBy(matchIndex)).substringToIndex(hash.startIndex.advancedBy((matchIndex + 3))) let k_str3 = key.substringFromIndex(key.startIndex.advancedBy(matchIndex)).substringToIndex(key.startIndex.advancedBy((matchIndex + 3))) // IF the three sets of substrings match then it is safe to extract the first character at matchIndex to add to the prefixString if (p_str == k_str && p_str2 == k_str2 && p_str3 == k_str3) { prefixString += String(randomStr[randomStr.startIndex.advancedBy(matchIndex)]) matchIndex+=1 // Detect if we can run this function again try and define another prefix character if ((matchIndex * 2 + 3) < hash.characters.count) { attemptPrefix(hash, randomStr: randomStr); } } } /** * Utility function that multiplies 37 * calculated number plus the character index * This is in attempt to generate a number key that matches the given one (25377615533200) * If this function matches the key it will flag to stop the program and display results * * @param hash * This is the random hash that will be attempted * */ func hashString(hash:String) { // The h int to be calculated to form the potential key var h:Int = 7 // Hash array forms an array of characters based upon the input string let hashArray = [Character](hash.characters) // Loop through all characters in the hash array and perform the calculation for i in (0..<hashArray.count) { let ind = letters.characters.indexOf(hashArray[i]) let pos = letters.startIndex.distanceTo(ind!) h = (h * 37 + Int(pos)) } // Check to see if the string version of the potential key matches the given key if (key == String(h)) { // If there is a match flag to stop the program and display results print("String found on attempt \(index) using string \(hash) to match \(h)") found = true } else { // If there is not a match send the potential key and the passed in hash string to attempt to generate a prefix if ((matchIndex * 2 + 3) < String(h).characters.count) { attemptPrefix(String(h), randomStr:hash); } } } /** * Utility function that generates a random string based upon a specific set of characters * * @param stringLength * The length of the desired random string * * @return string * The random string generated */ func getRandomString(stringLength:Int) -> String { // The character array let characterArray = [Character](letters.characters) // Random string about to be generated var randomString: String = "" // For loop to generate a string based upon the passed in size // The string is generated at random based off the letters string which is not the characterArray for _ in (0..<stringLength) { let rand = Int(arc4random_uniform(UInt32(characterArray.count))) randomString += String(characterArray[rand]) } // Return the new string return randomString } /** * Execution function that performs five operations * 1) Gathers a random string at a dynamic length * 2) Appends that string to the prefix string to form the semi random string. * 3) Checks to see if the newly semi random string is in the used array already * 4) Sends the the semi random string to be calculated into a numeric value in the hashString function * 5) Exits the loop before iteration 5000 */ func executeHashLoop() { // Lets the user know that we have started looking for the string print("Looking for string...") // While loop that performs the brute force functionality // This loop gets a string randomly from our set of characters // After that the characters are appended to the prefix and sent // to be checked based upon the algorithm provided while (found == false && index<5000) { var randomLength:Int = 8 randomLength-=prefixString.characters.count let random:String = getRandomString(randomLength) let attempt:String = prefixString + random // Make sure the attempt string has not already been tried to save processing power if !alreadyUsed.contains(attempt) { // Add to attempt array alreadyUsed.append(attempt) // Run the hash algorithm hashString(attempt) // Increment the index index+=1 } // This just lets the user know that the loop is done if for some reason the correct match is not found already if (index==4999) { print("Exiting program at index 4999") return } // Utility check to display if the operating index is above a certain thousand if (index % 1000 == 0) { print("Currently iterating at index \(index)") } } } }
How the complete solution differs from the brute force solution is that in the hashString() function I check for the key to be correct, if it is not then I send it to the attemptPrefix() function where using an integer called matchIndex I get character string from the original key and the recently incorrect key and check them against each other to see if all three match. If they do, then I know that I can get that first character and add it to the prefix string, increment the match index, and send the key to the attemptPrefix() function again to if I can extract another good match off of the incorrect key. Once this whole process has completed running I generate a new random string based upon the new length of the prefixString now to cut the total time complexity down by a dramatic percentage with each new character.
In the end the brute force solution can take hundreds of thousands of iterations to find the correct solution while the machine learning solution I outlined above can take anywhere from 240 - 2200 iterations. While the machine learning solution is more far efficient than the brute force solution I still think that the time complexity of 2200 is a little bit high and there still could be a more elegant way to solve this problem. Some things I am wondering is if there is a way to mathematically reverse engineer the given key, or if there is a way to make my existing solution even more efficient?
Please let me know your thoughts or if you have any suggestions for this solution. I would love to hear it. Also, I have a Swift and Objective-C version of this code up on my Github repo located here, if you want to take a look.