Split and select specific elements - c#

I have a comma separated string which specifies the indexes. Then I have one more comma separated string which has all the values.
EX:
string strIndexes = "5,6,8,15";
string strData = "ab*bc*dd*ff*aa*ss*ee*mm*jj*ii*waa*jo*us*ue*ed*ws*ra";
Is there a way to split the string strData and select only the elements at index 5, 6, 8 and 15? Or will I have to split the string first, then loop through the array/list and build one more array/list with the values at the indexes defined by strIndexes (i.e. 5, 6, 8, 15 in this example)?
Thanks

It's reasonably simple:
var allValues = strData.Split('*');
var selected = strIndexes.Split(',')
    .Select(x => int.Parse(x))
    .Select(index => allValues[index]);
You can create a list from that (by calling selected.ToList()) or you can just iterate over it.

It depends a bit on the length of the string. If it is relatively short (and therefore any array from "Split" is small) then just use the simplest approach that works; Split on "*" and pick the elements you need. If it is significantly large, then maybe something like an iterator block to avoid having to create a large array (but then... since the string is already large maybe this isn't a huge overhead). LINQ isn't necessarily your best approach here...
string[] data = strData.Split('*');
string[] result = Array.ConvertAll(strIndexes.Split(','),
key => data[int.Parse(key)]);
which gives ["ss","ee","jj","ws"].
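If the index string can come from user input, a more defensive variant may be worth considering. The following is only a sketch: the helper name SelectByIndexes and the decision to silently skip indexes that do not parse or are out of range are assumptions, not something the question asks for.

```csharp
using System;
using System.Collections.Generic;

public static class IndexSelector
{
    // Hypothetical helper: splits data on dataSeparator and returns the
    // elements at the comma-separated positions listed in indexes.
    // Indexes that do not parse or fall outside the array are skipped.
    public static List<string> SelectByIndexes(string data, string indexes, char dataSeparator)
    {
        string[] values = data.Split(dataSeparator);
        List<string> selected = new List<string>();
        foreach (string token in indexes.Split(','))
        {
            int index;
            // int.TryParse accepts surrounding whitespace and never throws.
            if (int.TryParse(token, out index) && index >= 0 && index < values.Length)
                selected.Add(values[index]);
        }
        return selected;
    }
}
```

Calling IndexSelector.SelectByIndexes(strData, strIndexes, '*') with the strings from the question yields the same four elements as the versions above.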

Call Split(',') on the first string and you get an array of strings that you can access by index; you can do the same with the second string by splitting on '*'. There is no need to loop over array lists.

Related

C# subarray of strings with LINQ

I have an array of strings and I have to take only those entries, which
are starting with "81" or "82". I've tried it like this:
var lines = File.ReadAllLines(fileName); // This returns an array of strings
lines = lines.TakeWhile(item => item.StartsWith("81") || item.StartsWith("82")).ToArray();
but this just doesn't work. It returns an empty string array.
When I loop through lines with a for-loop and compare every time
if (!firstline.Substring(0, 2).StartsWith("81")) continue;
and then I take the required entries, it's working just fine.
Any suggestions how to get it right with LINQ?

You need to use Where():
lines = lines.Where(item => item.StartsWith("81") || item.StartsWith("82")).ToArray();
TakeWhile returns elements only until the condition first becomes false, so if the first line does not match, the result is empty. Where keeps going and returns every element that matches the condition.
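To see the difference side by side, here is a small sketch. The class and method names are made up for illustration; only TakeWhile, Where, StartsWith and ToArray are real framework calls.

```csharp
using System;
using System.Linq;

public static class PrefixFilter
{
    // Stops at the first line that fails the test.
    public static string[] WithTakeWhile(string[] lines)
    {
        return lines.TakeWhile(l => l.StartsWith("81") || l.StartsWith("82")).ToArray();
    }

    // Tests every line and keeps all of the matches.
    public static string[] WithWhere(string[] lines)
    {
        return lines.Where(l => l.StartsWith("81") || l.StartsWith("82")).ToArray();
    }
}
```

For an input such as { "80 header", "81 a", "82 b", "90 c", "81 d" }, WithTakeWhile returns an empty array because the very first line fails the test, while WithWhere returns the three lines that start with "81" or "82".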

Removing a node from a list using a for

Here is what I have done:
// this is an example of my function, the string and the remover should be variables
string delimeter = ",";
string remover="4";
string[] separator = new string[] { "," };
List<String> List = "1,2,3,4,5,6".Split(separator, StringSplitOptions.None).ToList();
for (int i = 0; i < List.Count - 1; i++)
{
if(List[i]==remover)
List.RemoveAt(i);
}
string allStrings = (List.Aggregate((i, j) => i + delimeter + j));
return allStrings;
The problem is that the returned string is the same as the original one, "1,2,3,4,5,6". The "4" is still in it.
How do I fix it?
EDIT:
The solution was that I didn't check the last node of the list in that for loop. It doesn't look that way in the example because it was an example I wrote just now.

When you remove items from a list like this you should make your for loop run in reverse, from highest index to lowest.
If you go from lowest to highest you will end up shifting items down as they are removed and this will skip items. Running it in reverse does not have this issue.
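A minimal sketch of the reverse loop, assuming the same comma-separated input as the question (the class and method names are invented for the example):

```csharp
using System;
using System.Collections.Generic;

public static class ListRemoval
{
    public static string RemoveValue(string csv, string remover)
    {
        List<string> items = new List<string>(csv.Split(','));
        // Walk from the last index down to 0, so removing an item only
        // shifts elements that have already been checked.
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (items[i] == remover)
                items.RemoveAt(i);
        }
        return string.Join(",", items);
    }
}
```

Because the loop runs down to index 0 and starts at Count - 1, it also checks the last element and handles adjacent duplicates such as "4,4".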

Your code, as it stands, produces the expected output. When run, it returns 1,2,3,5,6. If it doesn't, it's due to a bug in how you call this method.
That's not to say that you don't have problems.
When you remove an item you still increment the current index, so you skip checking the item after any item you remove.
While there are a number of solutions, the best solution here is to use the RemoveAll method of List. Not only does it ensure that all items are evaluated, but it can do so far more efficiently. Removing a single item from a list means shifting all of the items after it over by one. RemoveAll can do all of that shifting in a single pass, which is far more efficient if a lot of items are removed.
Another bug that you have is that your for loop doesn't check the last item at all, ever.
On a side note, you shouldn't use Aggregate to join a bunch of strings together given a delimiter. It's extremely inefficient as you need to copy all of the data from the first item into an intermediate string when adding the second, then both of those to a new string when adding the third, then all three of those to a new string when creating a fourth, and so on. Instead you should use string.Join(delimeter, List);, which is not only way more efficient, but is way easier to write and semantically represents exactly what you're trying to do. Win win win.
We can now re-write the method as:
string delimeter = ",";
string remover = "4";
List<String> List = "1,2,3,4,5,6"
.Split(new[] { delimeter }, StringSplitOptions.None).ToList();
List.RemoveAll(n => n == remover);
return string.Join(delimeter, List);
Another option is to avoid creating a list just to remove items from it and then aggregate the data again. We can instead take the sequence of items that we have, pull out only the items that we want to keep (rather than removing the items we don't want), and then join those. This is functionally the same, but it removes the needless effort of building up a list and removing items, and it separates the mechanism from the requirements:
string delimeter = ",";
string remover = "4";
var items = "1,2,3,4,5,6"
.Split(new[] { delimeter }, StringSplitOptions.None)
.Where(n => n != remover);
return string.Join(delimeter, items);

Use this to remove the item:
list.RemoveAll(f => f == remover);

Check if Characters in ArrayList C# exist - C# (2.0)

I was wondering if there is a way to search an ArrayList to see whether an entry contains certain characters and, if so, grab the entire entry and put it into a string. For example:
list[0] = @"C:\Test3\One_Title_Here.pdf";
list[1] = @"D:\Two_Here.pdf";
list[2] = @"C:\Test\Hmmm_Joke.pdf";
list[3] = @"C:\Test2\Testing.pdf";
Looking for: "Hmmm_Joke.pdf"
Want to get: "C:\Test\Hmmm_Joke.pdf" and pass it to Remove()
protected void RemoveOther(ArrayList list, string Field)
{
string removeStr;
-- Put code in here to search for part of a string which is Field --
-- Grab that string here and put it into a new variable --
list.Contains();
list.Remove(removeStr);
}
Hope this makes sense. Thanks.

Loop through each string in the array list and if the string does not contain the search term then add it to new list, like this:
string searchString = "Hmmm_Joke.pdf";
ArrayList newList = new ArrayList();
foreach(string item in list)
{
if(!item.ToLower().Contains(searchString.ToLower()))
{
newList.Add(item);
}
}
Now you can work with the new list that has excluded any matches of the search string value.
Note: Both strings are lowercased for the comparison to avoid casing issues.

In order to remove a value from your ArrayList you'll need to loop through the values and check each one to see if it contains the desired value. Keep track of that index, or indexes if there are many.
Then after you have found all of the values you wish to remove, you can call ArrayList.RemoveAt to remove the values you want. If you are removing multiple values, start with the largest index and then process the smaller indexes, otherwise, the indexes will be off if you remove the smallest first.
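As a rough sketch of that two-pass approach (the class name, method name and the case-insensitive comparison are assumptions made for the example; it avoids lambdas so it should also fit C# 2.0):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ArrayListCleaner
{
    // Removes every entry whose text contains field (ignoring case)
    // and returns the number of entries removed.
    public static int RemoveContaining(ArrayList list, string field)
    {
        // First pass: record the index of each match.
        List<int> matches = new List<int>();
        for (int i = 0; i < list.Count; i++)
        {
            string item = list[i] as string;
            if (item != null && item.IndexOf(field, StringComparison.OrdinalIgnoreCase) >= 0)
                matches.Add(i);
        }
        // Second pass: remove from the largest index down so that the
        // remaining recorded indexes stay valid.
        for (int j = matches.Count - 1; j >= 0; j--)
            list.RemoveAt(matches[j]);
        return matches.Count;
    }
}
```

The indexes are collected in ascending order, so walking the matches list backwards gives the largest-first removal order described above.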

This will do the job without raising an InvalidOperationException:
string searchString = "Hmmm_Joke.pdf";
foreach (string item in list.ToArray())
{
if (item.IndexOf(searchString, StringComparison.OrdinalIgnoreCase) >= 0)
{
list.Remove(item);
}
}
I also made it case insensitive.
Good luck with your task.

I would rather use LINQ to solve this. Since a collection can't be modified while it is being enumerated, we should first collect what we want removed and then remove it.
var toDelete = Array.FindAll(list.ToArray(), s =>
s.ToString().IndexOf("Hmmm_Joke.pdf", StringComparison.OrdinalIgnoreCase) >= 0
).ToList();
toDelete.ForEach(item => list.Remove(item));
Of course, use a variable where the search string is hardcoded.
I would also recommend reading this question: Case insensitive 'Contains(string)'
It discusses the proper way to compare strings, since converting to upper or lower case costs performance and may produce unexpected behaviour with file names like: 文書.pdf

How can I get the user to input more than one value at a time in C#

I am prompting the user to enter their first middle and last name in one prompt.
So if they type for example John Ronald Doe, how can I take those values from that one string and assign them to fname, mname, lname, to create a person constructor that takes these values.
Basically my question is how to take multiple values in one prompt for a user and then assign them to different variables for a constructor.
Thank you!

Just grab the line then split on white spaces. You'll have to do some input validation to make sure they actually enter fName mName lName but I'll leave that to you because what you do is dependent on how robust the application needs to be. By that I mean, what happens if the user only enters two words? Do you just assume it's first and last name and set their middle name to String.Empty ? What happens if they enter 4 or 5 words?
string[] tokens = Console.ReadLine().Split(' ');
if (tokens.Length == 3)
{
// do assignment
}
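One possible shape for that validation is sketched below. The policy is an assumption chosen for illustration (three words are first/middle/last, two words are first/last with an empty middle name, anything else is rejected), and the NameParser class and TryParseName method are hypothetical names.

```csharp
using System;

public static class NameParser
{
    // Hypothetical policy: 3 words = first/middle/last, 2 words = first/last
    // with an empty middle name, anything else is rejected.
    public static bool TryParseName(string line, out string first, out string middle, out string last)
    {
        first = middle = last = string.Empty;
        if (line == null)
            return false;
        // A null separator array makes Split break on any whitespace;
        // RemoveEmptyEntries ignores repeated spaces.
        string[] tokens = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 3)
        {
            first = tokens[0];
            middle = tokens[1];
            last = tokens[2];
            return true;
        }
        if (tokens.Length == 2)
        {
            first = tokens[0];
            last = tokens[1];
            return true;
        }
        return false;
    }
}
```

The caller can then pass the three out values to the person constructor when the method returns true, and prompt again when it returns false.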
With regard to your comment;
That is more or less correct. Really what's happening is ReadLine() is returning a string. I'm just calling Split(' ') directly on that result. You could break it into two lines if you'd like but there's no reason to. Split(char delimiter) is an instance method in the String class. It takes a char and returns an array of strings. I'm using tokens because it's kind of a common term. The line is built of three tokens: the first, middle, and last names. It's important to understand that I am not adding anything to an array; Split(' ') is returning an array. string[] tokens is just declaring a reference of type string[], so I don't have to worry about the size or anything like that.
An example to get an array of ints from the input. Note, if any of the input is not a valid integer this will throw an exception.
int[] myInts = Console.ReadLine().Split(' ').Select(x => int.Parse(x)).ToArray();
In this example I'm using LINQ on the result of the split. Split returns a string array. Select iterates over that array and applies the lambda expression I pass it to each of its values. It's best to think of x as the current value. The Select call here is roughly equivalent to;
List<int> myInts = new List<int>();
foreach (string s in tokens)
{
myInts.Add(int.Parse(s));
}
int[] myIntArray = myInts.ToArray();
If you can't trust the user input you should use TryParse. I don't know if you can use that as part of a lambda expression (it wouldn't surprise me if you could but it seems like a pain in the ass so I wouldn't bother) instead I'd do;
List<int> myInts = new List<int>();
int temp;
foreach (string s in tokens)
{
    if (int.TryParse(s, out temp))
    {
        myInts.Add(temp);
    }
    else
    {
        // the input wasn't an int
    }
}
int[] myIntArray = myInts.ToArray();
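For what it's worth, TryParse can be used inside a lambda if the lambda has a statement body. The sketch below is one way to do it, not the only one; the IntParsing class and ParseValid method are made-up names, and dropping invalid tokens silently is an assumption about what the caller wants.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntParsing
{
    // Returns the tokens that parse as integers, silently dropping the rest.
    public static int[] ParseValid(string line)
    {
        return line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(token =>
                   {
                       int value;
                       bool ok = int.TryParse(token, out value);
                       // Pair the success flag with the value so the
                       // filter below can use it.
                       return new KeyValuePair<bool, int>(ok, value);
                   })
                   .Where(pair => pair.Key)
                   .Select(pair => pair.Value)
                   .ToArray();
    }
}
```

Whether this reads better than the plain foreach loop is a matter of taste; the loop makes it easier to report which token was invalid.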

C# Efficient Substring with many inputs

Assuming I do not want to use external libraries or more than a dozen or so extra lines of code (i.e. clear code, not code golf code), can I do better than string.Contains to handle a collection of input strings and a collection of keywords to check for?
Obviously one can use objString.Contains(objString2) to do a simple substring check. However, there are many well-known algorithms which are able to do better than this under special circumstances, particularly if one is working with multiple strings. But sticking such an algorithm into my code would probably add length and complexity, so I'd rather use some sort of shortcut based on a built in function.
E.g. an input would be a collection of strings, a collection of positive keywords, and a collection of negative keywords. Output would be the subset of the first collection (the strings) whose members each contain at least 1 positive keyword but 0 negative keywords.
Oh, and please don't mention regular expressions as a suggested solution.
It may be that my requirements are mutually exclusive (not much extra code, no external libraries or regex, better than String.Contains), but I thought I'd ask.
Edit:
A lot of people are only offering silly improvements that won't beat an intelligently used call to contains by much, if anything. Some people are trying to call Contains more intelligently, which completely misses the point of my question. So here's an example of a problem to try solving. LBushkin's solution is an example of someone offering a solution that probably is asymptotically better than standard contains:
Suppose you have 10,000 positive keywords of length 5-15 characters, 0 negative keywords (this seems to confuse people), and one 1,000,000-character string. Check whether the 1,000,000-character string contains at least 1 of the positive keywords.
I suppose one solution is to create an FSA. Another is delimit on spaces and use hashes.

Your discussion of "negative and positive" keywords is somewhat confusing - and could use some clarification to get more complete answers.
As with all performance related questions - you should first write the simple version and then profile it to determine where the bottlenecks are - these can be unintuitive and hard to predict. Having said that...
One way to optimize the search (if you are always searching for "words" - and not phrases that could contain spaces) would be to build a search index from your string.
The search index could either be a sorted array (for binary search) or a dictionary. A dictionary would likely prove faster - both because dictionaries are hashmaps internally with O(1) lookup, and because a dictionary will naturally eliminate duplicate values in the search source - thereby reducing the number of comparisons you need to perform.
The general search algorithm is:
For each string you are searching against:
Take the string you are searching within and tokenize it into individual words (delimited by whitespace)
Populate the tokens into a search index (either a sorted array or dictionary)
Search the index for your "negative keywords", if one is found, skip to the next search string
Search the index for your "positive keywords"; when one is found, add it to a dictionary of found keywords (you could also track a count of how often the word appears)
Here's an example using a sorted array and binary search in C# 2.0:
NOTE: You could switch from string[] to List<string> easily enough, I leave that to you.
string[] FindKeyWordOccurence( string[] stringsToSearch,
                               string[] positiveKeywords,
                               string[] negativeKeywords )
{
    Dictionary<string,int> foundKeywords = new Dictionary<string,int>();
    foreach( string searchIn in stringsToSearch )
    {
        // tokenize and sort the input to make searches faster
        string[] tokenizedList = searchIn.Split( ' ' );
        Array.Sort( tokenizedList );
        // if any negative keywords exist, skip to the next search string...
        // (a continue inside the inner foreach would only advance the inner
        // loop, so record the hit in a flag instead)
        bool hasNegative = false;
        foreach( string negKeyword in negativeKeywords )
            if( Array.BinarySearch( tokenizedList, negKeyword ) >= 0 )
            {
                hasNegative = true;
                break;
            }
        if( hasNegative )
            continue; // skip to next search string...
        // for each positive keyword, add to dictionary to keep track of it
        // we could have also used a SortedList, but the dictionary is easier
        foreach( string posKeyword in positiveKeywords )
            if( Array.BinarySearch( tokenizedList, posKeyword ) >= 0 )
                foundKeywords[posKeyword] = 1;
    }
    // convert the Keys in the dictionary (our found keywords) to an array...
    string[] foundKeywordsArray = new string[foundKeywords.Keys.Count];
    foundKeywords.Keys.CopyTo( foundKeywordsArray, 0 );
    return foundKeywordsArray;
}
Here's a version that uses a dictionary-based index and LINQ in C# 3.0:
NOTE: This is not the most LINQ-y way to do it, I could use Union() and SelectMany() to write the entire algorithm as a single big LINQ statement - but I find this to be easier to understand.
public IEnumerable<string> FindOccurences( IEnumerable<string> searchStrings,
IEnumerable<string> positiveKeywords,
IEnumerable<string> negativeKeywords )
{
var foundKeywordsDict = new Dictionary<string, int>();
foreach( var searchIn in searchStrings )
{
// tokenize the search string (Distinct() keeps ToDictionary from throwing on repeated words)...
var tokenizedDictionary = searchIn.Split( ' ' ).Distinct().ToDictionary( x => x );
// skip if any negative keywords exist...
if( negativeKeywords.Any( tokenizedDictionary.ContainsKey ) )
continue;
// merge found positive keywords into dictionary...
// an example of where Enumerable.ForEach() would be nice...
var found = positiveKeywords.Where(tokenizedDictionary.ContainsKey);
foreach (var keyword in found)
foundKeywordsDict[keyword] = 1;
}
return foundKeywordsDict.Keys;
}

If you add this extension method:
public static bool ContainsAny(this string testString, IEnumerable<string> keywords)
{
foreach (var keyword in keywords)
{
if (testString.Contains(keyword))
return true;
}
return false;
}
Then this becomes a one line statement:
var results = testStrings.Where(t => !t.ContainsAny(badKeywordCollection)).Where(t => t.ContainsAny(goodKeywordCollection));
This isn't necessarily any faster than doing the Contains checks by hand, except that it avoids unnecessary work: ContainsAny returns on the first matching keyword, and LINQ streams the results, so strings rejected by the first Where never reach the second. Plus, the resulting code being a one-liner is nice.

If you're truly just looking for space-delimited words, this code would be a very simple implementation:
static void Main(string[] args)
{
string sIn = "This is a string that isn't nearly as long as it should be " +
"but should still serve to prove an algorithm";
string[] sFor = { "string", "as", "not" };
Console.WriteLine(string.Join(", ", FindAny(sIn, sFor)));
}
private static string[] FindAny(string searchIn, string[] searchFor)
{
HashSet<String> hsIn = new HashSet<string>(searchIn.Split());
HashSet<String> hsFor = new HashSet<string>(searchFor);
return hsIn.Intersect(hsFor).ToArray();
}
If you only wanted a yes/no answer (as I see now may have been the case) there's another method of hashset "Overlaps" that's probably better optimized for that:
private static bool FindAny(string searchIn, string[] searchFor)
{
HashSet<String> hsIn = new HashSet<string>(searchIn.Split());
HashSet<String> hsFor = new HashSet<string>(searchFor);
return hsIn.Overlaps(hsFor);
}

Well, there is the Split() method you can call on a string. You could split your input strings into arrays of words using Split() then do a one-to-one check of words with keywords. I have no idea if or under what circumstances this would be faster than using Contains(), however.

First get rid of all the strings that contain negative words. I would suggest doing this using the Contains method. I would think that Contains() is faster than splitting, sorting, and searching.

Seems to me that the best way to do this is to take your match strings (both positive and negative) and compute a hash of each. Then march through your million-character string computing n hashes at each position (in your case it's 11, one per length from 5 to 15) and match against the hashes for your match strings. If you get hash matches, then you do an actual string compare to rule out false positives. There are a number of good ways to optimize this by bucketing your match strings by length and creating hashes based on the string size for a particular bucket.
So you get something like:
IList<Bucket> buckets = BuildBuckets(matchStrings);
int shortestLength = buckets[0].Length;
// <= so that a candidate ending at the last character is still checked
for (int i = 0; i <= inputString.Length - shortestLength; i++) {
    foreach (Bucket b in buckets) {
        // skip buckets whose strings would run past the end of the input
        if (i + b.Length > inputString.Length)
            continue;
        string candidate = inputString.Substring(i, b.Length);
        int hash = ComputeHash(candidate);
        foreach (MatchString match in b.MatchStrings) {
            if (hash != match.Hash)
                continue;
            if (candidate == match.String) {
                if (match.IsPositive) {
                    // positive case
                }
                else {
                    // negative case
                }
            }
        }
    }
}

To optimize Contains(), you need a tree (or trie) structure of your positive/negative words.
That should speed up everything (O(n) vs O(nm), n=size of string, m=avg word size) and the code is relatively small & easy.
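As an illustration only, a plain trie over the keywords might look like the sketch below. The KeywordTrie class and its members are invented for the example. Note that this simple version restarts from the trie root at every position of the text, so its cost is roughly the text length times the longest keyword length, independent of how many keywords there are; a strictly linear scan would need failure links (as in Aho-Corasick), which are not shown here.

```csharp
using System;
using System.Collections.Generic;

public class KeywordTrie
{
    private class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsKeywordEnd;
    }

    private readonly Node root = new Node();

    public KeywordTrie(IEnumerable<string> keywords)
    {
        foreach (string keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword))
                continue; // an empty keyword would match everywhere
            Node node = root;
            foreach (char c in keyword)
            {
                Node next;
                if (!node.Children.TryGetValue(c, out next))
                {
                    next = new Node();
                    node.Children.Add(c, next);
                }
                node = next;
            }
            node.IsKeywordEnd = true;
        }
    }

    // True if any keyword occurs as a substring of text.
    public bool ContainsAny(string text)
    {
        for (int start = 0; start < text.Length; start++)
        {
            Node node = root;
            // Follow the trie from this start position until no child matches.
            for (int i = start; i < text.Length; i++)
            {
                if (!node.Children.TryGetValue(text[i], out node))
                    break;
                if (node.IsKeywordEnd)
                    return true;
            }
        }
        return false;
    }
}
```

For the scenario in the question you would build one trie from the positive keywords and call ContainsAny on the long string; a second trie built from the negative keywords could be used to reject strings first.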
