I'm not sure how best to phrase this. I have a text file of almost 80,000 words which I have converted across to a string array.
Basically, I want a method that takes a word and checks whether it's in the word array. To avoid searching all 80,000 entries each time, I have indexed where the words beginning with each letter start and end in a two-dimensional array. So wordIndex[0,0] = 0 is where the 'A' words start and wordIndex[1,0] = 4407 is where they end. Then wordIndex[0,1] = 4408 is where the words beginning with 'B' start, etc.
What I would like to know is how I can give this range to a method so that it searches only within it. I know I can give an index and a length, but is this the only way? Can I say "look for x within the range y to z"?
Look at a trie. It can store many words using little memory and supports quick lookups. Here is a good implementation.
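A minimal trie sketch, for illustration only: the Trie class and its Add/Contains methods are hypothetical names, and using a Dictionary per node favours simplicity over memory. A lookup walks one node per character, so its cost depends on the word's length rather than on how many words are stored.

```csharp
using System.Collections.Generic;

// Minimal trie (prefix tree) sketch: each node maps the next character to a child node.
public class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true if a stored word ends at this node
    }

    private readonly Node root = new Node();

    public void Add(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            // Create the child for this character if it doesn't exist yet.
            if (!node.Children.TryGetValue(c, out Node child))
            {
                child = new Node();
                node.Children[c] = child;
            }
            node = child;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        Node node = root;
        foreach (char c in word)
        {
            // A missing child means no stored word has this prefix.
            if (!node.Children.TryGetValue(c, out node))
                return false;
        }
        return node.IsWord; // the prefix exists; check that a word ends here
    }
}
```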
Basically you could use a for loop to search just a part of the array:
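string word = "apple";
int start = 0;    // first index to search, e.g. wordIndex[0,0]
int end = 4407;   // last index to search (inclusive), e.g. wordIndex[1,0]
bool found = false;
for (int i = start; i <= end; i++)
{
    if (arrayOfWords[i] == word)
    {
        found = true;
        break;
    }
}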
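On the "look for x within range y to z" part of the question: Array.IndexOf has an overload that takes a start index and a count, so an inclusive range y..z becomes start = y, count = z - y + 1. A sketch, assuming wordIndex holds inclusive start/end positions as described; RangeSearch and ContainsInRange are hypothetical names. This is still a linear scan of the slice, just without writing the loop yourself.

```csharp
using System;

public static class RangeSearch
{
    // Searches only arrayOfWords[start..end] (inclusive) for word.
    public static bool ContainsInRange(string[] arrayOfWords, string word, int start, int end)
    {
        int count = end - start + 1; // IndexOf wants a count, not an end index
        return Array.IndexOf(arrayOfWords, word, start, count) >= 0;
    }
}
```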
But since the description of your index implies that your array is already sorted, a better way might be to use Array.BinarySearch<T>, which has an overload that takes a start index and a length.
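A sketch of the binary-search version, restricted to one letter's slice via the Array.BinarySearch overload that takes an index, a length and a comparer. It assumes the word list is sorted ordinally (case-sensitive, character code order); if the file was sorted some other way, pass a comparer matching that order, otherwise the search can miss words that are present. SortedRangeSearch and ContainsInRange are hypothetical names.

```csharp
using System;

public static class SortedRangeSearch
{
    // Binary search restricted to sortedWords[start..end] (inclusive).
    // Assumes that slice is sorted with the same ordinal comparison used here.
    public static bool ContainsInRange(string[] sortedWords, string word, int start, int end)
    {
        int length = end - start + 1;
        int index = Array.BinarySearch(sortedWords, start, length, word, StringComparer.Ordinal);
        return index >= 0; // a negative result means "not found"
    }
}
```

For a slice of about 4,400 words, each lookup needs roughly a dozen comparisons instead of up to 4,400.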
Related
My file numbering system just rolled over 100,000, which is causing some issues. Namely, programs put #100,000 before #99,999 because they compare the names character by character and see the 1 first.
For example, another program would read the files in ascending order like this:
XXXX_100000_XXXXXX.file
XXXX_10001_XXXXXX.file
XXXX_99999_XXXXXX.file
But it should go:
XXXX_10001_XXXXXX.file
XXXX_99999_XXXXXX.file
XXXX_100000_XXXXXX.file
I have a function that reads all the files, sorts them by number, and puts them in a new array in order. Here's some pseudo code:
while(my directory has more files)
//this entire chunk assigns the number part of the filename to an int
string filename = my file
string num = filename[5] through filename[11]
//checks if the number is 5 digits, if yes, removes the underscore
if(num at position [11] == "_"){
num = num[5] through num[10]
}
int fileNum = num.toInteger
//now I have the number as an int
EDIT:
I just realized I could much more easily get the number by calling .Split on the filename and converting arr[1] to an int. I'll leave the old code for fun though.
Here's where I'm stuck. I want to feed these into a new array, sorted, or make the array sortable after everything is in there.
Do I need to create an object with the filename and number as elements, feed all the objects in, and then sort the array by number? I know that would work, but I can't help thinking there's a more efficient way of doing this.
I don't need code written for me, I just need help working out the algorithm logic, or if my way is already the best way, let me know!
If you have an unsorted array of file names e.g.
string[] fileNames = ...
and a function for extracting the number from the name e.g.
public static int GetFileNumber(string fileName)
{
    // Names look like XXXX_<number>_XXXXXX.file, so the number is the
    // second '_'-separated field (the Split idea from the question's edit).
    return int.Parse(fileName.Split('_')[1]);
}
then you can sort them using Array.Sort:
Array.Sort(fileNames, (f1, f2) => GetFileNumber(f1).CompareTo(GetFileNumber(f2)));
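One variation, assuming the XXXX_<number>_XXXXXX.file naming from the question: LINQ's OrderBy computes each file's key once before sorting, whereas the Array.Sort comparison above re-parses both names on every comparison. FileSorting and SortByNumber are hypothetical names.

```csharp
using System.Linq;

public static class FileSorting
{
    // Assumes the number is the second '_'-separated field of the name.
    public static int GetFileNumber(string fileName) => int.Parse(fileName.Split('_')[1]);

    // Returns a new array ordered by the embedded number; the input is left unchanged.
    public static string[] SortByNumber(string[] fileNames) =>
        fileNames.OrderBy(f => GetFileNumber(f)).ToArray();
}
```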
Can you try using an int array for the numbers and a string array for the filenames, then put them into a collection and sort the collection at the end?
Create the collection with
https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/collections
And then sort the collection with
https://sankarsan.wordpress.com/2011/05/07/sorting-collections-in-c/
Sorry if that's not much help.
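For the int-array-plus-string-array idea, Array.Sort has an overload that takes a keys array and an items array and reorders both according to the keys, so no separate collection is needed. A sketch, again assuming the XXXX_<number>_XXXXXX.file naming; ParallelArraySort and SortByNumber are hypothetical names.

```csharp
using System;

public static class ParallelArraySort
{
    public static string[] SortByNumber(string[] fileNames)
    {
        // Parallel arrays: numbers[i] is the number embedded in fileNames[i].
        // Assumes the number is the second '_'-separated field of the name.
        int[] numbers = new int[fileNames.Length];
        for (int i = 0; i < fileNames.Length; i++)
            numbers[i] = int.Parse(fileNames[i].Split('_')[1]);

        string[] sorted = (string[])fileNames.Clone();
        Array.Sort(numbers, sorted); // sorts numbers and moves the names along with them
        return sorted;
    }
}
```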
You can sort the string-list using an "Alphanumeric sort". This will put "100" after "9".
Example: https://www.dotnetperls.com/alphanumeric-sorting
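As an independent sketch of the same idea (not the linked article's code), an IComparer<string> that compares runs of digits by numeric value might look like this. NaturalStringComparer is a hypothetical name, and only the ASCII digits 0 to 9 are treated as numbers.

```csharp
using System.Collections.Generic;

// Runs of ASCII digits compare by numeric value; everything else compares ordinally.
public sealed class NaturalStringComparer : IComparer<string>
{
    private static bool IsDigit(char c) => c >= '0' && c <= '9';

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        int i = 0, j = 0;
        while (i < x.Length && j < y.Length)
        {
            if (IsDigit(x[i]) && IsDigit(y[j]))
            {
                // Read the whole digit run from each string.
                int startI = i, startJ = j;
                while (i < x.Length && IsDigit(x[i])) i++;
                while (j < y.Length && IsDigit(y[j])) j++;
                // Compare numerically without parsing (so no overflow): drop leading zeros,
                // then the longer run is the bigger number; equal lengths compare digit by digit.
                string a = x.Substring(startI, i - startI).TrimStart('0');
                string b = y.Substring(startJ, j - startJ).TrimStart('0');
                if (a.Length != b.Length) return a.Length.CompareTo(b.Length);
                int byDigits = string.CompareOrdinal(a, b);
                if (byDigits != 0) return byDigits;
            }
            else
            {
                int byChar = x[i].CompareTo(y[j]);
                if (byChar != 0) return byChar;
                i++;
                j++;
            }
        }
        // Everything compared so far was equal: the string with less left over sorts first.
        return (x.Length - i).CompareTo(y.Length - j);
    }
}
```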
I have a task in which I have to write a function called accum, which transforms a given string into something like this:
Accumul.Accum("abcd"); // "A-Bb-Ccc-Dddd"
Accumul.Accum("RqaEzty"); // "R-Qq-Aaa-Eeee-Zzzzz-Tttttt-Yyyyyyy"
Accumul.Accum("cwAt"); // "C-Ww-Aaa-Tttt"
So far I have only converted each letter to uppercase. Now that I am writing about it, I think it could be easier to first repeat each letter the right number of times and then add the dashes. Okay, let's say I have already repeated the letters (I will deal with that later) and now I need to add the dashes. I tried several ways to solve this, including for and foreach loops (and now that I think of it, I can't use foreach if I want to add a dash after repeating the letters) with String.Join, String.Insert, or something called StringBuilder with Append (which I don't exactly understand), and nothing changes the string.
One of those loops that I tried was:
for (int letter = 0; letter < s.Length-1; letter += 2) {
if (letter % 2 == 0) s.Replace("", "-");
}
and
for (int letter = 0; letter < s.Length; letter++) {
return String.Join(s, "-");
}
The second one gives an "unreachable code" error. What am I doing wrong, that nothing happens to the string (after the uppercase conversion)? Also, is there a method to copy each letter, so I can increase the number of them?
As you say, string.Join can be used, as long as you give it an enumerable instead of using a foreach. Since a string is itself enumerable, you can use the LINQ Select overload that includes an index:
using System.Linq;

var input = "abcd";
var res = string.Join("-", input.Select((c,i) => Char.ToUpper(c) + new string(Char.ToLower(c),i)));
(This assumes repeated characters are handled like any others, e.g. "aab" would become "A-Aa-Bbb".)
Explanation:
The Select extension method takes a lambda as its parameter, with c being a char and i its index. The lambda returns an uppercase version of the char (c) followed by a string of i lowercase copies of the char (new string(char, count)), which is an empty string for the first index. Finally, string.Join concatenates the resulting sequence with a - between each element.
Use this code.
string result = String.Empty;
for (int i = 0; i < s.Length; i++)
{
    char c = s[i];
    result += char.ToUpper(c);
    result += new String(char.ToLower(c), i);
    if (i < s.Length - 1)
    {
        result += "-";
    }
}
It would be better to use a StringBuilder instead of string concatenation, but this version may be a bit clearer.
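A sketch of the StringBuilder version of the same loop, keeping the Accumul.Accum name from the question. StringBuilder.Append(char, int) appends a character a given number of times, which also covers the "copy each letter" part of the question.

```csharp
using System.Text;

public static class Accumul
{
    public static string Accum(string s)
    {
        var result = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            if (i > 0) result.Append('-');          // dash between groups, not after the last
            result.Append(char.ToUpper(s[i]));      // first copy in upper case
            result.Append(char.ToLower(s[i]), i);   // then i lower-case copies
        }
        return result.ToString();
    }
}
```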
Strings are immutable, which means you cannot modify them once they are created. So the Replace method returns a new string that you need to capture somehow:
s = s.Replace("x", "-");
You are currently not assigning the result of the Replace method anywhere; that's why you don't see any change.
For the future, the best way to approach problems like this one is not to search for a code snippet, but to write down a step-by-step algorithm for achieving the expected result in plain English or some other pseudocode, e.g.
Given the input string 'abcd', which should turn into the output string 'A-Bb-Ccc-Dddd':
Copy the first character 'a' from the input to Buffer.
Store the (1-based) position of the character in Index.
If Buffer has only one character, make it upper case.
If Index is greater than 1, append Index-1 lower-case copies of the character to Buffer.
Append a dash '-' to Buffer, unless this is the last character.
Copy the Buffer content to Output and clear Buffer.
Copy the second character 'b' from the input to Buffer.
...
etc.
The aha moment often comes on the third iteration. Hope it helps! :)
I have an array of repeating letters:
AABCCD
and I would like to put them into pseudo-random order. Simple, right? Just use Fisher-Yates and you're done. However, there is a restriction on the output: I don't want any runs of the same letter. I want at least two other characters to appear before the same character reappears. For example:
ACCABD
is not valid because there are two Cs next to each other.
ABCACD
is also not valid because the two Cs in CAC have only one other character (A) between them; I require at least two other characters.
Every valid sequence for this simple example:
ABCADC ABCDAC ACBACD ACBADC ACBDAC ACBDCA ACDABC ACDACB ACDBAC ACDBCA
ADCABC ADCBAC BACDAC BCADCA CABCAD CABCDA CABDAC CABDCA CADBAC CADBCA
CADCAB CADCBA CBACDA CBADCA CDABCA CDACBA DACBAC DCABCA
I used a brute-force approach for this small array, but my actual problem involves arrays with hundreds of elements. I've tried Fisher-Yates with some suppression: do normal Fisher-Yates, and if you don't like the character that comes up, try X more times for a better one. It only generates valid sequences about 87% of the time and is very slow. I'm wondering if there's a better approach. Obviously this isn't possible for all arrays. An array of just "AAB" has no valid order, so for something like that I'd like to fall back to the best available order, "ABA".
Here is a modified Fisher-Yates approach. As I mentioned, it is very difficult to generate a valid sequence 100% of the time, because you have to check that you haven't trapped yourself by leaving only AAA at the end of your sequence.
It is possible to create a recursive CanBeSorted method that tells you whether a sequence can be ordered according to your rules. That would be the basis for a full solution, but the function below, which returns a boolean indicating success or failure, should be a starting point.
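// requires: using System; using System.Linq;
public static bool Shuffle(char[] array)
{
    var random = new Random();
    // Remaining count of each distinct letter. (ToDictionary on the raw array
    // would throw on duplicate keys such as the two A's in AABCCD.)
    var groups = array.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
    char last = '\0';
    char lastButOne = '\0';
    // Fill positions from the end; array[0..i-1] holds the letters not yet placed.
    for (int i = array.Length; i > 0; i--)
    {
        // Letters still available that differ from the two just placed.
        var candidates = groups.Keys.Where(c => groups[c] > 0)
            .Except(new[] { last, lastButOne }).ToList();
        if (!candidates.Any())
            return false;
        var @char = candidates[random.Next(candidates.Count)];
        var j = Array.IndexOf(array, @char, 0, i);
        // Swap.
        var tmp = array[j];
        array[j] = array[i - 1];
        array[i - 1] = tmp;
        lastButOne = last;
        last = @char;
        groups[@char] = groups[@char] - 1;
    }
    return true;
}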
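For the CanBeSorted idea mentioned above, one possible sketch is plain recursive backtracking over the remaining letter counts, remembering the last two letters placed. It is exhaustive, so it gives a definite yes or no, but its running time can grow exponentially; for hundreds of elements you would likely need memoization or a smarter candidate order. SpacingCheck and its methods are hypothetical names, and '\0' is assumed never to occur in the input, since it serves as the "nothing placed yet" marker.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SpacingCheck
{
    // Can the letters be ordered so that no letter reappears until
    // at least two other letters have been placed?
    public static bool CanBeSorted(char[] letters)
    {
        var counts = letters.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
        return CanBeSorted(counts, '\0', '\0', letters.Length);
    }

    private static bool CanBeSorted(Dictionary<char, int> counts, char last, char lastButOne, int remaining)
    {
        if (remaining == 0) return true;
        foreach (char c in counts.Keys.ToList()) // snapshot, since counts is modified below
        {
            if (counts[c] == 0 || c == last || c == lastButOne) continue;
            counts[c]--;                                   // try placing c next
            bool ok = CanBeSorted(counts, c, last, remaining - 1);
            counts[c]++;                                   // undo before trying the next letter
            if (ok) return true;
        }
        return false;
    }
}
```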
Maintain a linked list that keeps track of each letter and its position in the result.
After getting the random number, pick the corresponding character from the input (as in Fisher-Yates), but now search the list to see whether that letter has already been placed.
If not, insert the letter in the result, and also in the linked list along with its position in the result.
If yes, check its position in the result (which you stored in the linked list when you wrote that letter to the result). Compare that position with the current insert position: if |currentLocation - previousLocation| is 3 or greater, you can insert the letter; otherwise, choose a random number again.
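A sketch of this idea with one substitution: a Dictionary<char, int> holding each letter's most recent position stands in for the linked list, since only the latest occurrence matters for the distance check. Like the answer, it retries on a bad pick rather than backtracking, so it can get stuck; here it gives up after a bounded number of retries and returns null. SpacedShuffle, TryShuffle and maxAttempts are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class SpacedShuffle
{
    public static char[] TryShuffle(char[] input, Random random, int maxAttempts = 100)
    {
        var pool = new List<char>(input);                // letters not yet placed
        var result = new char[input.Length];
        var lastPosition = new Dictionary<char, int>();  // letter -> latest index in result

        for (int pos = 0; pos < result.Length; pos++)
        {
            bool placed = false;
            for (int attempt = 0; attempt < maxAttempts && !placed; attempt++)
            {
                int k = random.Next(pool.Count);         // random pick, as in Fisher-Yates
                char c = pool[k];
                // Accept c if it was never placed, or was placed at least 3 slots ago.
                if (!lastPosition.TryGetValue(c, out int prev) || pos - prev >= 3)
                {
                    result[pos] = c;
                    lastPosition[c] = pos;
                    pool.RemoveAt(k);
                    placed = true;
                }
            }
            if (!placed) return null;                    // stuck: caller can retry or fall back
        }
        return result;
    }
}
```

A caller could loop on TryShuffle until it returns a result, combined with a feasibility check (like the CanBeSorted idea in the previous answer) to detect inputs such as AAB that have no valid order at all.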
I have an array where each element is a four-character string. I need to select a random point in the string, slice stringaArray[0] and stringaArray[1] at that same point, swap their sliced parts, and add the results to splicedStringArray[0] and splicedStringArray[1].
I know how to use Split in C# and I have been experimenting with it, but it only splits the string into characters, not parts. I ask because my current approach is to create lots of variables to hold temporary strings and then add them to splicedStringArray[].
Here is my latest attempt to find the start, middle and end of a string, so that I can copy whatever I want into variables, build new strings and then store them in the second array:
string s = stringaArray[0];
char[] charArray = s.ToCharArray();
int amount = charArray.Length;
int findMiddle = amount / 2 + 1;
int midchar = findMiddle - 1;
int findLast = amount - 1;
char fchar = charArray[0];
char mchar = charArray[midchar];
char lchar = charArray[findLast];
I was also looking at the StringBuilder class in C# and wondering if there was something there I could use, but I think I will spend a lot of time on this and develop the worst solution, so any advice on how to do this would be appreciated.
To split at an exact position, use String.Substring. That way you can take the part up to a certain point and the part from that point on. The simplest solution is similar to this:
var offset = 1;
splicedStringArray[0] = stringArray[0].Substring(0, offset) + stringArray[1].Substring(offset);
splicedStringArray[1] = stringArray[1].Substring(0, offset) + stringArray[0].Substring(offset);
Disclaimer: the code is written without testing.
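The question also asks for a random cut point. A sketch that picks one with Random.Next and then swaps the tails with Substring as above, assuming both strings have the same length; the Crossover class and its method names are hypothetical.

```csharp
using System;

public static class Crossover
{
    // Pick one random cut point strictly inside the strings and swap the tails.
    public static (string, string) SpliceAtRandomPoint(string first, string second, Random random)
    {
        if (first.Length != second.Length)
            throw new ArgumentException("Strings must have the same length.");
        // Next(1, Length) returns 1..Length-1, so each result gets a piece of both inputs.
        int offset = random.Next(1, first.Length);
        return Splice(first, second, offset);
    }

    public static (string, string) Splice(string first, string second, int offset) =>
        (first.Substring(0, offset) + second.Substring(offset),
         second.Substring(0, offset) + first.Substring(offset));
}
```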
Implement an algorithm that takes two strings as input, and returns the intersection of the two, with each letter represented at most once.
Algorithm (assuming the language is C#):
Convert both strings into char arrays.
Take the smaller array and build a hash table from it, with each character as a key and 0 as the value.
Loop through the other array and increment the count in the hash table if that char is present in it.
Take out all chars from the hash table whose value is > 0.
These are the intersection values.
This is an O(n) solution, but it uses extra space: 2 char arrays and a hash table.
Can you think of a better solution than this?
How about this ...
var s1 = "aabbccccddd";
var s2 = "aabc";
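// needs using System.Linq; Intersect yields each common char once, in s1's order
var ans = new string(s1.Intersect(s2).ToArray()); // "abc"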
Haven't tested this, but here's my thought:
Sort both strings (C# strings are immutable, so in practice sort a char array copy of each), giving you two ordered sequences of characters.
Keeping an index into each, compare the "next" character from each string. If they are equal, output it (unless it matches the last character output) and advance both indexes; otherwise, advance the index pointing at the smaller character.
Continue until you reach the end of either string; nothing after that point can be in both.
Apart from the two sorted copies, this only needs two integers and an output string (or StringBuilder). As an added bonus, the output values will be sorted too!
Part 2:
This is what I'd write (sorry about the comments, new to stackoverflow):
// requires: using System; using System.Text;
private static string Intersect(string left, string right)
{
    // C# strings are immutable, so sort char-array copies instead of sorting in place.
    char[] sortedLeft = left.ToCharArray();
    char[] sortedRight = right.ToCharArray();
    Array.Sort(sortedLeft);
    Array.Sort(sortedRight);

    StringBuilder theResult = new StringBuilder();
    int leftIndex = 0;
    int rightIndex = 0;
    // Stop as soon as either string runs out: nothing after that can be in both.
    while (leftIndex < sortedLeft.Length && rightIndex < sortedRight.Length)
    {
        char leftChar = sortedLeft[leftIndex];
        char rightChar = sortedRight[rightIndex];
        if (leftChar < rightChar)
        {
            leftIndex++;   // leftChar can't appear later in sortedRight
        }
        else if (leftChar > rightChar)
        {
            rightIndex++;  // rightChar can't appear later in sortedLeft
        }
        else
        {
            // Common character: output it once, skipping repeats.
            if (theResult.Length == 0 || theResult[theResult.Length - 1] != leftChar)
                theResult.Append(leftChar);
            leftIndex++;
            rightIndex++;
        }
    }
    return theResult.ToString();
}
I hope that makes more sense.
You don't need the 2 char arrays. The System.String type has a built-in indexer that returns the char at a given position, so you could just loop from 0 to (String.Length - 1). If you care more about speed than about storage space, you could make a HashSet for one of the strings, then make a second HashSet to hold your final result. Then iterate through the other string, testing each char against the first HashSet, and if it exists, add it to the second HashSet. At the end you already have a single HashSet containing the whole intersection, and you save yourself the pass through the Hashtable looking for entries with a non-zero value.
EDIT: I wrote this before all the comments on the question about not wanting to use any built-in containers at all.
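A sketch of the two-HashSet approach described above; HashSetIntersection is a hypothetical name. HashSet.Add ignores duplicates, which gives the "at most once" property for free, but note that a HashSet<char> does not promise any particular order.

```csharp
using System.Collections.Generic;

public static class HashSetIntersection
{
    public static HashSet<char> Intersect(string first, string second)
    {
        var inFirst = new HashSet<char>(first);   // a string is an IEnumerable<char>
        var result = new HashSet<char>();
        foreach (char c in second)
        {
            if (inFirst.Contains(c))
                result.Add(c);                    // duplicates are ignored by Add
        }
        return result;
    }
}
```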
Here's how I would do this. It's still O(N), and instead of a hash table it uses one int array of length 26 (ideally).
Make an array of 26 integers, one element per letter of the alphabet, initialized to 0.
Iterate over the first string, decrementing the element for each letter encountered.
Iterate over the second string and replace the element for each letter encountered with its absolute value. (Edit: thanks to scwagner in comments.)
Return all letters whose elements hold a value greater than 0.
Still O(N), with extra space of only 26 ints.
Of course, if you're not limited to only lowercase or only uppercase characters, the array size may need to change.
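The 26-int steps above as a sketch, assuming both inputs contain only lowercase a to z (any other character would index outside the array); LetterArrayIntersection is a hypothetical name. Letters seen only in the first string stay negative, letters seen only in the second stay 0, and only letters in both end up positive.

```csharp
using System;
using System.Text;

public static class LetterArrayIntersection
{
    public static string Intersect(string first, string second)
    {
        var marks = new int[26];                        // one slot per letter, all 0
        foreach (char c in first)
            marks[c - 'a']--;                          // negative: seen in the first string
        foreach (char c in second)
            marks[c - 'a'] = Math.Abs(marks[c - 'a']); // negative becomes positive; 0 stays 0
        var result = new StringBuilder();
        for (int i = 0; i < 26; i++)
            if (marks[i] > 0) result.Append((char)('a' + i));
        return result.ToString();                      // letters come out in alphabetical order
    }
}
```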
"with each letter represented at most once"
I'm assuming this means you just need to know the intersection, not how many times each letter occurred. If so, you can trim down your algorithm by using yield. Instead of storing the count and continuing to scan the second string for additional matches, you can yield the match right away and move on to the next character of the first string.
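A sketch of the yield idea; YieldIntersection is a hypothetical name. The answer doesn't say how to keep a letter that repeats in the first string from being yielded twice, so the small HashSet of already-yielded characters here is an assumption. String.IndexOf stops at the first match, so the second string is only scanned as far as needed.

```csharp
using System.Collections.Generic;

public static class YieldIntersection
{
    public static IEnumerable<char> Intersect(string first, string second)
    {
        var emitted = new HashSet<char>();
        foreach (char c in first)
        {
            if (emitted.Contains(c)) continue;     // already yielded this letter
            if (second.IndexOf(c) >= 0)            // stops scanning at the first match
            {
                emitted.Add(c);
                yield return c;
            }
        }
    }
}
```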