Comparing strings multiple times

I'm generating random scripts, but I have to guarantee that each new one is unique (hasn't been repeated before). So basically each script that has already been generated gets compared against every new script.
Instead of just using normal string compare, I'm thinking there must be a way to hash each new script so that comparison will be faster.
Any ideas on how to hash strings to make multiple comparisons faster?

One way is to use a HashSet<string>. From the documentation:
The HashSet<T> class provides high-performance set operations. A set is
a collection that contains no duplicate elements, and whose elements
are in no particular order.
HashSet<string> scripts = new HashSet<string>();
string generated_script = "some_text";
if (!scripts.Contains(generated_script)) // if the HashSet<string> does not already contain your string, you can add it
{
scripts.Add(generated_script);
}
You can also check for the existence of duplicate items in an array,
but this is likely to be less efficient than a HashSet<string>:
string[] array = new[] {"demo", "demo", "demo"};
string compareWith = "demo";
int duplicates_count = array.GroupBy(x => x).Count(g => g.Count() > 1);

Use HashSet like below
string uniqueCode= "ABC";
string uniqueCode1 = "XYZ";
string uniqueCode2 = "ABC";
HashSet<string> uniqueList = new HashSet<string>();
uniqueList.Add(uniqueCode);
uniqueList.Add(uniqueCode1);
uniqueList.Add(uniqueCode2);
If you check the Count of uniqueList you will get 2, so ABC is not stored twice.

You could use a HashSet. A HashSet is guaranteed to never contain duplicates.

Store the script along with its hash:
class ScriptData
{
public ScriptData(string script)
{
this.ScriptHash=script.GetHashCode();
this.Script=script;
}
public int ScriptHash{get;private set;}
public string Script{get;private set;}
}
Then, whenever you need to check if your new random script is unique, just take the hash code of the new script and search all your ScriptData instances for any with the same hash code. If you don't find any, you know your new random script is unique. If you do find some, they may be the same, and you'll have to compare the actual text of the scripts to see if they're identical.

You can store each generated string in a HashSet.
For each new string, call the Contains method, which runs in O(1) time on average. This is an easy way to decide whether the newly generated string was generated before.
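As a minimal sketch of that check: HashSet<T>.Add returns false when the item is already present, so the lookup and the insertion can be a single call. The class and method names here (UniqueScripts, TryRegister) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class UniqueScripts
{
    // Hypothetical registry of every script generated so far.
    private static readonly HashSet<string> Seen =
        new HashSet<string>(StringComparer.Ordinal);

    // Returns true if the script was new (and is now recorded),
    // false if it had already been generated.
    public static bool TryRegister(string script)
    {
        return Seen.Add(script);
    }

    public static void Main()
    {
        Console.WriteLine(TryRegister("some_text"));  // True
        Console.WriteLine(TryRegister("other_text")); // True
        Console.WriteLine(TryRegister("some_text"));  // False
    }
}
```

A generator would call TryRegister in a loop and produce a new script whenever it returns false.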

Related

compare string from 2 different collections and extract the difference

I need to compare 2 sets of strings that share some names, and I need to extract the names they have in common. How can I do that? Both are collections; say one of them is "Sanjay, Race" and the other is "Let, Sanjay", then I need to extract Sanjay.

It depends on what data structure you have, but I suggest you work with an Array or a List if your collection is big enough to care about optimisation.
You want to go through the first of the two lists and compare each element of list1 to every element of list2. Be careful, this might take a while (if your collection is big enough).
It might look like this:
using System.Collections.Generic;
LinkedList<string> set1 = new LinkedList<string>();
LinkedList<string> set2 = new LinkedList<string>();
LinkedList<string> extracted = new LinkedList<string>();
// fill in your sets with loops if needed, see
// https://learn.microsoft.com/fr-fr/dotnet/api/system.collections.generic.linkedlist-1?view=net-7.0
foreach (string name in set1) {
    foreach (string name2 in set2) {
        if (string.Compare(name, name2) == 0) {
            // AddAfter needs an existing node, so append to the end instead
            extracted.AddLast(name);
        }
    }
}
Please, do correct me (nicely) :)
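For comparison, here is a sketch of the same extraction using LINQ's Enumerable.Intersect, which builds a set internally and so avoids the nested loop. The helper name Common is hypothetical, and the ordinal (case-sensitive) comparer is an assumption about how the names should match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CommonNames
{
    // Returns the names that appear in both collections (each name once).
    public static List<string> Common(IEnumerable<string> first,
                                      IEnumerable<string> second)
    {
        return first.Intersect(second, StringComparer.Ordinal).ToList();
    }

    public static void Main()
    {
        var set1 = new[] { "Sanjay", "Race" };
        var set2 = new[] { "Let", "Sanjay" };
        Console.WriteLine(string.Join(", ", Common(set1, set2))); // Sanjay
    }
}
```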

Random String generation - avoiding duplicates

I am using the below code to generate random keys like the following TX8L1I
public string GetNewId()
{
string result = string.Empty;
Random rnd = new Random();
short codeLength = 5;
string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
StringBuilder builder = new StringBuilder(codeLength);
for (int i = 0; i < codeLength; ++i)
builder.Append(chars[rnd.Next(chars.Length)]);
result = string.Format("{0}{1}", "T", builder.ToString());
return result;
}
Every time a key is generated, a corresponding record is created in the database using the generated result as the primary key. Of course this is not safe, since a primary key violation might occur.
What's the correct approach to this? Should I load all existing keys, check against that list whether the key already exists, and if so generate another one?
Or maybe a more efficient way would be to move this logic to the database side? But even on the database side I still need to check that the generated random key does not already exist in the current table.
Any ideas?
Thanks

Move the decision to the database. Add a primary key of type uniqueidentifier. Then it's a simple "fire and forget" algorithm: the database decides what the key is.

I think you should create the Random instance outside of your method.
A Random object created with the parameterless constructor is seeded from the system clock (on .NET Framework), so if you call your method several times in quick succession, each new instance gets the same seed and you end up with the same string.

well my decision to use the random is that I don't want the users to be able to know how the keys are generated (format), since the website is public and I don't want users to be trying to access keys that.
You've merged two problems into one that are really separate:
Identify a row in a database.
Have an identifier that is passed to and from clients, that matches this, but is not predictable.
This is a specific case of a more general set of two problems:
Identify a row in a database.
Have an identifier that is passed to and from clients, that matches this.
In this case, when we don't care about guessing, then the easiest way to deal with it is to just use the same identifier. E.g. for the database row identified by the integer 42 we use the string "42" and the mapping is a trivial int.Parse() or int.TryParse() in one direction and a trivial .ToString() or implicit .ToString() in the other.
And that's therefore the pattern we use without even thinking about it; public IDs and database keys are the same (perhaps with some conversion).
But the best solution to your specific case, where you want to prevent guessing, is not to change the key but to change the mapping between the key and the public identifier.
First, use auto-incremented integers ("IDENTITY" in SQL Server, and various similar concepts in other databases).
Then, when you are sending the key to the client (i.e. using it in a form value or appending it to a URI) then map it as so:
private const string Seed = "this is my secret seed ¾Ÿˇʣכ ↼⊜┲◗ blah balh";
private static string GetProtectedID(int id)
{
using(var sha = System.Security.Cryptography.SHA1.Create())
{
return string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(id.ToString() + Seed)).Select(b => b.ToString("X2"))) + id.ToString();
}
}
For example, with an ID of 123 this produces "989178D90470D8777F77C972AF46C4DED41EF0D9123".
Now map back to the key with:
private static bool GetIDFromProtectedID(string str, out int id)
{
int chkID;
if(str != null && str.Length > 40 && int.TryParse(str.Substring(40), out chkID)) // the length check keeps Substring from throwing on short input
{
using(var sha = System.Security.Cryptography.SHA1.Create())
{
if(string.Join("", sha.ComputeHash(Encoding.UTF8.GetBytes(chkID.ToString() + Seed)).Select(b => b.ToString("X2"))) == str.Substring(0, 40))
{
id = chkID;
return true;
}
}
}
id = 0;
return false;
}
For "989178D90470D8777F77C972AF46C4DED41EF0D9123" this returns true and sets the id argument to 123. For "989178D90470D8777F77C972AF46C4DED41EF0D9122" (because I tried to guess the ID to attack your site) it returns false and sets id to 0. (The correct ID for key 122 is "F8AD0F55CA1B9426D18F684C4857E0C4D43984BA122", which is not easy to guess from having seen that for 123).
If you want, you can remove some of the 40 characters of the output to produce smaller IDs. This makes it less secure (an attacker has fewer possible IDs to brute-force) but should still be reasonable for many uses.
Obviously, you should use a different value for Seed than the one here, so that someone reading this answer can't use it to predict your IDs; change the seed, change the ID. Seed cannot be changed once set without altering every ID in the system. Sometimes that's a good thing (if identifiers are never meant to have long-term value anyway), but normally it's bad (you've just 404'd every page on the site that used it).

I would move the logic to the database instead. Even though the database still has to check for the existence of the key, it's already within the database operation at that point, so there is no back-and-forth happening.
Also, if there's an index on this randomly generated key, a simple if exists will be quick enough to determine if the key has been used before or not.

You can do the following:
Generate your string and concatenate a timestamp onto it:
string newUniqueString = string.Format("{0}{1}", result, DateTime.Now.ToString("yyyyMMddHHmmssfffffff"));
This makes a repeat very unlikely, although two keys generated within the same clock tick could still collide.
Or use
var StringGuid = Guid.NewGuid();

You've committed a typical sin with Random: one shouldn't create a new Random instance
each time one wants to generate a random value (otherwise one gets a badly skewed distribution with many repeats). Move Random out of the function:
// Let Generator be thread safe
private static ThreadLocal<Random> s_Generator = new ThreadLocal<Random>(
() => new Random());
public static Random Generator {
get {
return s_Generator.Value;
}
}
// For the repetition test. One can remove the repetition test if the
// number of generated ids << 15000
private HashSet<String> m_UsedIds = new HashSet<String>();
public string GetNewId() {
int codeLength = 5;
string chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
while (true) {
StringBuilder builder = new StringBuilder(codeLength);
for (int i = 0; i < codeLength; ++i)
builder.Append(chars[Generator.Next(chars.Length)]); // <- same generator for all calls
string result = string.Format("{0}{1}", "T", builder.ToString());
// Test if random string in fact a repetition
if (m_UsedIds.Contains(result))
continue;
m_UsedIds.Add(result);
return result;
}
}
For codeLength = 5 there are Math.Pow(chars.Length, codeLength) possible
strings (Math.Pow(36, 5) == 60466176, about 6e7). According to the birthday paradox
you can expect the first repetition at about 2 * sqrt(possible strings) == 2 * sqrt(6e7), roughly 15000 ids. If that is OK, you can skip the repetition test; otherwise HashSet<String> could
be a solution.
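To put rough numbers on that rule of thumb, the sketch below uses the standard birthday approximation p = 1 - exp(-n(n-1) / (2N)) for the chance of at least one repeat among n uniformly random ids out of N possible ones. The class and method names are made up, and the formula is an approximation rather than an exact count.

```csharp
using System;

public static class BirthdayEstimate
{
    // Approximate probability that at least two of `draws` uniformly random
    // ids collide when there are `possible` distinct ids.
    public static double CollisionProbability(long draws, double possible)
    {
        return 1.0 - Math.Exp(-(double)draws * (draws - 1) / (2.0 * possible));
    }

    public static void Main()
    {
        double possible = Math.Pow(36, 5); // 60466176 ids of length 5
        foreach (long n in new long[] { 1000, 5000, 10000, 15000 })
        {
            Console.WriteLine("{0} ids: p(collision) ~ {1:F3}",
                n, CollisionProbability(n, possible));
        }
    }
}
```

Under this approximation a repeat is already more likely than not well before 15000 ids, so keeping the HashSet check is the safer default.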

Check if Characters in ArrayList C# exist - C# (2.0)

I was wondering if there is a way to search an ArrayList to see if an entry contains certain characters and, if so, grab the entire entry and put it into a string. For example:
list[0] = @"C:\Test3\One_Title_Here.pdf";
list[1] = @"D:\Two_Here.pdf";
list[2] = @"C:\Test\Hmmm_Joke.pdf";
list[3] = @"C:\Test2\Testing.pdf";
Looking for: "Hmmm_Joke.pdf"
Want to get: "C:\Test\Hmmm_Joke.pdf" and put it in the Remove()
protected void RemoveOther(ArrayList list, string Field)
{
string removeStr;
-- Put code in here to search for part of a string which is Field --
-- Grab that string here and put it into a new variable --
list.Contains();
list.Remove(removeStr);
}
Hope this makes sense. Thanks.

Loop through each string in the array list and if the string does not contain the search term then add it to new list, like this:
string searchString = "Hmmm_Joke.pdf";
ArrayList newList = new ArrayList();
foreach(string item in list)
{
if(!item.ToLower().Contains(searchString.ToLower()))
{
newList.Add(item);
}
}
Now you can work with the new list that has excluded any matches of the search string value.
Note: Made string be lowercase for comparison to avoid casing issues.

In order to remove a value from your ArrayList you'll need to loop through the values and check each one to see if it contains the desired value. Keep track of that index, or indexes if there are many.
Then after you have found all of the values you wish to remove, you can call ArrayList.RemoveAt to remove the values you want. If you are removing multiple values, start with the largest index and then process the smaller indexes, otherwise, the indexes will be off if you remove the smallest first.
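A minimal sketch of that approach: walking the indexes from the end means each RemoveAt only shifts items that have already been checked, so no separate index list is needed. The method name RemoveContaining is hypothetical, and the case-insensitive match is an assumption.

```csharp
using System;
using System.Collections;

public static class ArrayListRemoval
{
    // Removes every entry that contains `field` (case-insensitive)
    // and returns how many entries were removed.
    public static int RemoveContaining(ArrayList list, string field)
    {
        int removed = 0;
        // Largest index first, so a removal never shifts an unvisited item.
        for (int i = list.Count - 1; i >= 0; i--)
        {
            string item = list[i] as string;
            if (item != null &&
                item.IndexOf(field, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                list.RemoveAt(i);
                removed++;
            }
        }
        return removed;
    }

    public static void Main()
    {
        ArrayList list = new ArrayList();
        list.Add(@"C:\Test3\One_Title_Here.pdf");
        list.Add(@"D:\Two_Here.pdf");
        list.Add(@"C:\Test\Hmmm_Joke.pdf");
        list.Add(@"C:\Test2\Testing.pdf");

        Console.WriteLine(RemoveContaining(list, "Hmmm_Joke.pdf")); // 1
        Console.WriteLine(list.Count);                              // 3
    }
}
```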

This will do the job without raising an InvalidOperationException:
string searchString = "Hmmm_Joke.pdf";
foreach (string item in list.ToArray())
{
if (item.IndexOf(searchString, StringComparison.OrdinalIgnoreCase) >= 0)
{
list.Remove(item);
}
}
I also made it case insensitive.
Good luck with your task.

I would rather use LINQ to solve this. Since a collection cannot be modified while it is being enumerated, we should first collect what we want removed and then remove it.
var toDelete = Array.FindAll(list.ToArray(), s =>
s.ToString().IndexOf("Hmmm_Joke.pdf", StringComparison.OrdinalIgnoreCase) >= 0
).ToList();
toDelete.ForEach(item => list.Remove(item));
Of course, use a variable where the search string is hardcoded.
I would also recommend reading this question: Case insensitive 'Contains(string)'
It discusses the proper way to compare strings case-insensitively, since converting to upper or lower case costs a lot of performance and may result in unexpected behaviour when dealing with file names like: 文書.pdf

Naming a variable from a text file

I'm making a program in C# that uses mathematical sets of numbers. I've defined the class Conjunto (which means "set" in Spanish). Conjunto has an ArrayList that contains all the numbers of the set. It also has a string called "ID", which is pretty much what it sounds like: the name of an instance of Conjunto.
The program has methods that apply the operations of union, intersection, etc., between the sets.
Everything was fine, but now I have a text file with sentences like:
A={1,2,3}
B={2,4,5}
A intersection B
B union A
And so on. The thing is, I don't know how many sets the text file contains, and I don't know how to name the variables after those sentences. For example, name one instance of Conjunto A, and name another instance B.
Sorry for the grammar, English is not my native language.
Thanks!

It's pretty complicated to create variables dynamically, and pretty useless unless you have some already existing code that expects certain variables.
Use a Dictionary<string, Conjunto> to hold your instances of the class. That way you can access them by name.
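A sketch of what that could look like for the sample file. Here HashSet<int> stands in for the asker's Conjunto class (which is not shown), and the Define helper and its parsing rules are assumptions based only on the "A={1,2,3}" line format.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NamedSets
{
    // Parses a line such as "A={1,2,3}" and stores the set under its name.
    public static void Define(Dictionary<string, HashSet<int>> sets, string line)
    {
        string[] parts = line.Split('=');
        string name = parts[0].Trim();
        string body = parts[1].Trim().TrimStart('{').TrimEnd('}');
        var numbers = body
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => int.Parse(s.Trim()));
        sets[name] = new HashSet<int>(numbers);
    }

    public static void Main()
    {
        var sets = new Dictionary<string, HashSet<int>>();
        Define(sets, "A={1,2,3}");
        Define(sets, "B={2,4,5}");

        // "A intersection B": copy A first so the stored set is not modified.
        var result = new HashSet<int>(sets["A"]);
        result.IntersectWith(sets["B"]);
        Console.WriteLine(string.Join(",", result)); // 2
    }
}
```

The set names from the file become dictionary keys, so the number of sets does not need to be known in advance.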

First off, unless you have to target a version lower than .NET 2.0, use List<T> instead of ArrayList. If I were you I wouldn't reinvent the wheel. Use HashSet<T> or SortedSet<T> to store the numbers and then you can use their built-in union and intersection operations.
Secondly, what is your goal? Do you want to have just the output set after all operations? Do you want to read and store all actions and then process them on some event?

First of all, your program approaches the problem from the wrong side. I would advise starting a new one. One way to name "variables" dynamically is to create class objects and edit their properties.
This is what I made as a starting platform:
First of all I created a class called set
class set
{
public string ID { get; set; }
public List<int> numbers { get; set; }
}
Then I wrote the code to sort the whole text file into a list of those classes:
List<set> Sets = new List<set>();
string textfile = "your text file";
char[] spliter = new char[] { '\n' }; // assumes one set per line; switch '\n' to whatever separates your sets
List<string> files = textfile.Split(spliter).ToList<string>();
int i = 1;
foreach (string file in files)
{
    set set = new set();
    set.ID = i.ToString();
    set.numbers = new List<int>(); // must exist before Add is called, otherwise it is null
    char[] secondspliter = new char[] { ',' }; // switch ',' to whatever you want; this splits one set into individual numbers
    List<string> data = file.Split(secondspliter).ToList<string>(); // split the current fragment, not the whole file
    foreach (string number in data)
    {
        bool success = Int32.TryParse(number, out int outcome);
        if (success)
        {
            set.numbers.Add(outcome);
        }
    }
    i++;
    Sets.Add(set);
}
Hope it helps someone.

Calling a list of methods in a random sequence?

I have a list of 10 methods. Now I want to call these methods in a random sequence. The sequence should be generated at runtime. What's the best way to do this?

It is always astonishing to me the number of incorrect and inefficient answers one sees whenever anyone asks how to shuffle a list of things on StackOverflow. Here we have several examples of code which is brittle (because it assumes that key collisions are impossible when in fact they are merely rare) or slow for large lists. (In this case the problem is stated to be only ten elements, but when possible surely it is better to give a solution that scales to thousands of elements if doing so is not difficult.)
This is not a hard problem to solve correctly. The correct, fast way to do this is to create an array of actions, and then shuffle that array in-place using a Fisher-Yates Shuffle.
http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
Some things not to do:
Do not implement the Fisher-Yates shuffle incorrectly. One sees more incorrect than correct implementations of this trivial algorithm. In particular, make sure you are choosing the random number from the correct range. Choosing it from the wrong range produces a biased shuffle.
If the shuffle algorithm must actually be unpredictable then use a source of randomness other than Random, which is only pseudo-random. Remember, Random only has 2^32 possible seeds, and therefore there are fewer than that many possible shuffles.
If you are going to be producing many shuffles in a short amount of time, do not create a new instance of Random every time. Save and re-use the old one, or use a different source of randomness entirely. Random chooses its seed based on the time; many Randoms created in close succession will produce the same sequence of "random" numbers.
Do not sort on a "random" GUID as your key. GUIDs are guaranteed to be unique. They are not guaranteed to be randomly ordered. It is perfectly legal for an implementation to spit out consecutive GUIDs.
Do not use a random function as a comparator and feed that to a sorting algorithm. Sort algorithms are permitted to do anything they please if the comparator is bad, including crashing, and including producing non-random results. As Microsoft recently found out, it is extremely embarrassing to get a simple algorithm like this wrong.
Do not use the input to random as the key to a dictionary, and then sort the dictionary. There is nothing stopping the randomness source from choosing the same key twice, and therefore either crashing your application with a duplicate key exception, or silently losing one of your methods.
Do not use the algorithm "Create two lists. Add the elements to the first list. Repeatedly move a random element from the first list to the second list, removing the element from the first list". If the list is O(n) to remove an item then this is an O(n^2) algorithm.
Do not use the algorithm "Create two lists. Add the elements to the first list. Repeatedly move a random non-null element from the first list to the second list, setting the element in the first list to null." Also do not do this crazy equivalent of that algorithm. If there are lots of items in the list then this gets slower and slower as you start hitting more and more nulls.
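For reference, a minimal sketch of an in-place Fisher-Yates shuffle applied to an array of actions. The Shuffler class and the three sample actions are made up for illustration; the Random instance is passed in so that one instance can be reused across shuffles, and it should be swapped for a stronger source if unpredictability matters.

```csharp
using System;

public static class Shuffler
{
    // In-place Fisher-Yates shuffle: for each position i from the end,
    // swap it with a uniformly chosen position j in the range 0..i (inclusive).
    public static void Shuffle<T>(T[] items, Random rng)
    {
        for (int i = items.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // upper bound is exclusive, so 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    public static void Main()
    {
        var rng = new Random();
        Action[] methods =
        {
            () => Console.WriteLine("one"),
            () => Console.WriteLine("two"),
            () => Console.WriteLine("three")
        };

        Shuffle(methods, rng);
        foreach (Action method in methods)
        {
            method();
        }
    }
}
```

The range passed to rng.Next is the detail most often got wrong: choosing j from the whole array on every step, rather than from 0..i, produces a biased shuffle.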

New, short answer
Starting from where Ilya Kogan left off, totally correct after we had Eric Lippert find the bug:
var methods = new Action[10];
var rng = new Random();
var shuffled = methods.Select(m => Tuple.Create(rng.Next(), m))
.OrderBy(t => t.Item1).Select(t => t.Item2);
foreach (var action in shuffled) {
action();
}
Of course this is doing a lot behind the scenes. The method below should be much faster. But if LINQ is fast enough...
Old answer (much longer)
After stealing this code from here:
public static T[] RandomPermutation<T>(T[] array)
{
T[] retArray = new T[array.Length];
array.CopyTo(retArray, 0);
Random random = new Random();
for (int i = 0; i < array.Length; i += 1)
{
int swapIndex = random.Next(i, array.Length);
if (swapIndex != i)
{
T temp = retArray[i];
retArray[i] = retArray[swapIndex];
retArray[swapIndex] = temp;
}
}
return retArray;
}
the rest is easy:
var methods = new Action[10];
var perm = RandomPermutation(methods);
foreach (var method in perm)
{
// call the method
}

Have an array of delegates. Suppose you have this:
class YourClass {
public int YourFunction1(int x) { }
public int YourFunction2(int x) { }
public int YourFunction3(int x) { }
}
Now declare a delegate:
public delegate int MyDelegate(int x);
Now create an array of delegates:
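YourClass instance = new YourClass(); // the methods are instance methods, so an object is needed
MyDelegate[] delegates = new MyDelegate[10];
delegates[0] = new MyDelegate(instance.YourFunction1);
delegates[1] = new MyDelegate(instance.YourFunction2);
delegates[2] = new MyDelegate(instance.YourFunction3);
// ...fill the remaining slots the same way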
and now call it like this:
int result = delegates[randomIndex] (48);

You can create a shuffled collection of delegates, and then call all methods in the collection.
Here is an easy way of doing so using a dictionary. The keys of the dictionary are random numbers, and the values are delegates to your methods. When you iterate through the dictionary, it has the effect of shuffling.
var shuffledActions = actions.ToDictionary(
action => random.Next(),
action => action);
foreach (var pair in shuffledActions.OrderBy(item => item.Key))
{
pair.Value();
}
actions is an enumerable of your methods.
random is an instance of type Random.

Think of this as a list of objects from which you want to draw objects at random. You can get a random index using the Random.Next method (always pass the current List.Count as the parameter) and then remove the object from the list so it will not be drawn again.

When processing a list in a random order, the natural inclination is to shuffle a list.
Another approach is to just keep the list order, but randomly select and remove each item.
var actionList = new[]
{
new Action( () => CallMethodOne() ),
new Action( () => CallMethodTwo() ),
new Action( () => CallMethodThree() )
}.ToList();
var r = new Random();
while(actionList.Count() > 0) {
var index = r.Next(actionList.Count());
var action = actionList[index];
actionList.RemoveAt(index);
action();
}

I think:
Get the MethodInfo objects via reflection;
create an array of those MethodInfo objects;
generate a random index (within the array's range);
invoke the method at that index.
You can remove the method from the array afterwards so that each method is executed only once.
Bye
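A rough sketch of the reflection idea, assuming the methods are public, static, parameterless and return void. The Tasks class, its three methods and the ReflectionRunner helper are all hypothetical names used only for this example.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class holding the methods to be called.
public static class Tasks
{
    public static int Calls;
    public static void First()  { Calls++; Console.WriteLine("First"); }
    public static void Second() { Calls++; Console.WriteLine("Second"); }
    public static void Third()  { Calls++; Console.WriteLine("Third"); }
}

public static class ReflectionRunner
{
    // Invokes every public static void method without parameters declared
    // on `type`, each exactly once, in random order. Returns the count.
    public static int RunAllInRandomOrder(Type type, Random rng)
    {
        MethodInfo[] methods = type
            .GetMethods(BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly)
            .Where(m => m.GetParameters().Length == 0 && m.ReturnType == typeof(void))
            .ToArray();

        // Shuffle the array in place (Fisher-Yates) before invoking.
        for (int i = methods.Length - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            MethodInfo tmp = methods[i];
            methods[i] = methods[j];
            methods[j] = tmp;
        }

        foreach (MethodInfo method in methods)
        {
            method.Invoke(null, null); // static method, no arguments
        }
        return methods.Length;
    }

    public static void Main()
    {
        RunAllInRandomOrder(typeof(Tasks), new Random());
    }
}
```

Reflection is slower than calling delegates directly, so this is mainly useful when the method list is not known at compile time.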
