Get the more similar string from a list

I have a List that contains all the remote paths I need:
List<string> remotePath = MyTableWithRemotePath.Select(i => i.ID_SERVER_PATH).ToList();
I have a string, which is the path I'm looking for:
string remotePath = "Path I'm looking for";
I have to find the path in the list that best matches the one I'm looking for.
I tried this, but it doesn't work:
var matchingvalues = remotePath.FirstOrDefault(stringToCheck => stringToCheck.Contains(remotePath));
Any suggestions?
EDIT
Example:
I have to find the best match for this path: C:\\something\\location\\
This is my List:
- C:\\something\\location\\XX\\
- C:\\something\\location2\\YY\\
- C:\\something\\location3\\AA\\
- C:\\something\\location4\\CS\\
The result has to be the first element:
C:\\something\\location\\directory\\

Instead of a single string:
string dir = @"some\path\im\looking\for";
I'd break it up into an array with one element per directory:
string[] dirs = new string[] { "some", "path", "im", "looking", "for" };
Then iterate over your list, checking each item in the array against each path. Each time there's a match, add it to another collection with the key (the full path) and the value (the number of matches).
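// remotePath is the list of paths from the question.
var someStringIntDictionary = new Dictionary<string, int>();
for (int i = 0; i < remotePath.Count; i++)
{
    // Count how many of the directory names occur in this path.
    int counter = 0;
    for (int j = 0; j < dirs.Length; j++)
    {
        if (remotePath[i].Contains(dirs[j]))
            counter++;
    }
    if (counter > 0)
        someStringIntDictionary.Add(remotePath[i], counter);
}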
As for the final task of determining which is the "best match", I'm honestly not sure exactly how to do it, but searching Google for "C# find dictionary key with highest value" gave me this:
https://stackoverflow.com/a/2806074/1189566
This answer might not be the most efficient, with nested loops over multiple collections, but it should work.
I'd like to point out that this is susceptible to inaccuracies if the file name or a subdirectory shares part of its name with something in dirs. Using the first item in the array, "some", you would get a false match in the following scenario:
"C:\\something\\location\\directory\\flibflam\\file.pdf"
something would incorrectly match some, so it might not actually be a valid match. You'd probably want to check the characters adjacent to the directory name in the actual path and make sure they're \ characters.
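One way to avoid the substring problem described above is to compare whole directory segments instead of calling Contains, and then pick the path with the highest count. This is only a minimal sketch, assuming \ is the separator; CountSegmentMatches and BestMatch are hypothetical helper names, not part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts how many wanted names appear as whole segments of path.
static int CountSegmentMatches(string path, string[] dirs)
{
    var segments = new HashSet<string>(
        path.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries),
        StringComparer.OrdinalIgnoreCase);
    return dirs.Count(d => segments.Contains(d));
}

// Hypothetical helper: returns the candidate with the most matching segments,
// or null when no candidate matches at all.
static string BestMatch(IEnumerable<string> candidates, string[] dirs)
{
    return candidates
        .Select(p => new { Path = p, Score = CountSegmentMatches(p, dirs) })
        .Where(x => x.Score > 0)
        .OrderByDescending(x => x.Score)
        .Select(x => x.Path)
        .FirstOrDefault();
}

var wantedDirs = new[] { "some", "path" };
var paths = new List<string> { @"C:\something\location\", @"C:\some\path\here\" };

// "something" no longer counts as a match for "some".
Console.WriteLine(BestMatch(paths, wantedDirs));
```

Because the comparison is per segment, the first path scores 0 and only the second one is a candidate.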

var remotePaths = new List<string>
{
    @"C:\something\location\directory\",
    @"C:\something\location2\directory\",
    @"C:\something\location3\directory\",
    @"C:\something\location4\directory\"
};
var remotePath = @"C:\something\location\directory\";
// Split the target once, and guard the index so longer paths cannot run past its end.
var target = remotePath.Split('\\');
var result = remotePaths
    .Select(p => new { p, matches = p.Split('\\').TakeWhile((x, i) => i < target.Length && x == target[i]).Count() })
    .OrderByDescending(p => p.matches)
    .First().p;
Result:
C:\something\location\directory\
The code splits each path in the list into its directory segments and compares them, one by one from the start, with the segments of remotePath. In the end it takes the first path with the largest number of leading matches.

In the end I did it this way, and it works perfectly:
var bestPath = remotePaths.OrderByDescending(i => i.ID_FILE_PATH.Length)
    .ToList()
    .FirstOrDefault(i => rootPath.StartsWith(i.ID_FILE_PATH, StringComparison.InvariantCultureIgnoreCase));
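For anyone who wants to try the same longest-prefix idea without the database entities, here is a rough self-contained sketch. It assumes plain strings instead of objects with an ID_FILE_PATH property; LongestPrefix is a hypothetical name, and it uses an ordinal case-insensitive comparison rather than the invariant-culture one above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the longest candidate that target starts with,
// or null when no candidate is a prefix of target.
static string LongestPrefix(IEnumerable<string> candidates, string target)
{
    return candidates
        .OrderByDescending(c => c.Length)
        .FirstOrDefault(c => target.StartsWith(c, StringComparison.OrdinalIgnoreCase));
}

var samplePaths = new List<string>
{
    @"C:\something\",
    @"C:\something\location\",
    @"C:\something\location2\"
};

// location2 is longer but is not a prefix, so location wins.
Console.WriteLine(LongestPrefix(samplePaths, @"C:\something\location\XX\file.txt"));
```

Sorting by length first means the most specific prefix is tested before the shorter ones.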

Related

I am trying to read a CSV file in C#, splitting lines into groups depending on word repetition. I am getting an index out of range error

My error is on the very last line, which says my index is out of range, and I'm not sure what the problem is. I would like to continue using a list of lists of lists. I am trying to read a line of a CSV file and separate that line into groups whenever one of the words in that line repeats; for example:
"hey how are you hey whats up"
hey how are you would be in one group, and hey whats up would be in the other.
string[] ReadDirectory = Directory.GetFiles("C:\\Users\\-------", "*.csv");
List<List<List<string>>> myList = new List<List<List<string>>>();
List<string> CSVlist = new List<string>();
foreach (string file in ReadDirectory)
{
    using (StreamReader readFile = new StreamReader(file))
    {
        int groupIndex = 0;
        string line = readFile.ReadLine();
        string[] headers = line.Split(',');
        Array.Reverse(headers);
        CSVlist.Add(headers[headers.Length - 1]);
        myList.Add(new List<List<string>>());
        for (int i = 0; i < headers.Length; i++)
        {
            if (headers[i].Contains("repeats") && headers[i + 1].Contains("repeats"))
            {
                myList.Add(new List<List<string>>());
                groupIndex++;
            }
            myList[0][groupIndex].Add(headers[i]);
        }
    }
}

The problem occurs when i == headers.Length - 1: headers[i + 1] is then out of bounds. Try:
for (int i = 0; i < headers.Length; i++)
{
    // Only look at headers[i + 1] when it exists; the last header is still added below.
    if (i < headers.Length - 1 && headers[i].Contains("repeats") && headers[i + 1].Contains("repeats"))
    {
        myList.Add(new List<List<string>>());
        groupIndex++;
    }
    myList[0][groupIndex].Add(headers[i]);
}

Looking at the code, I'm not sure it will do what you want it to (e.g. it only reacts to headers that contain the exact word 'repeats', but this may just be example code, so I'll ignore that), so I'll focus on the error reported.
The exact error you reported is caused by this line:
myList[0][groupIndex].Add(headers[i]);
When you first add a nested list to myList, you don't add an inner list to it, so when the if statement is false, the code tries to add the header to myList[0][0], where the second index is out of range because myList[0] contains no inner list.
Changing
myList.Add(new List<List<string>>());
to something like
var innerGroupList = new List<string>();
var groupList = new List<List<string>>();
groupList.Add(innerGroupList);
myList.Add(groupList);
will resolve the issue, but you won't get your expected outcome from the example data, as the word 'repeats' is not there. You would need to do something like saving each word in a HashSet and checking each word against it. If the word already exists in the set, split into another group.
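The HashSet idea could look roughly like the sketch below. It is only an illustration under my own assumptions: it works on a flat list of words rather than the nested lists from the question, and SplitOnRepeat is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: starts a new group whenever a word was already seen
// in the current group.
static List<List<string>> SplitOnRepeat(IEnumerable<string> words)
{
    var groups = new List<List<string>> { new List<string>() };
    var seen = new HashSet<string>();
    foreach (var word in words)
    {
        // HashSet.Add returns false when the word is already in the set.
        if (!seen.Add(word))
        {
            groups.Add(new List<string>());
            seen.Clear();
            seen.Add(word);
        }
        groups[groups.Count - 1].Add(word);
    }
    return groups;
}

var result = SplitOnRepeat("hey how are you hey whats up".Split(' '));
foreach (var g in result)
    Console.WriteLine(string.Join(" ", g));
```

With the example sentence from the question this gives two groups, one starting at each "hey".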

How to access and write each word in a string array read from a file to a new file in C#?

My testerfile contains:
processes
deleting
agreed
And this is the code in C#:
PorterStemmer testing = new PorterStemmer();
string temp, stemmed;
string[] lines = System.IO.File.ReadAllLines(@"C:\Users\PJM\Documents\project\testerfile.txt");
System.Console.WriteLine("Contents of testerfile.txt = ");
for (int i = 0; i < 2; i++)
{
    temp = lines[i];
    stemmed = testing.StemWord(temp);
    System.IO.File.WriteAllText(@"C:\Users\PJM\Documents\project\testerfile3.txt", stemmed);
    Console.WriteLine("\t" + stemmed);
}
After running the code, testerfile3 contains only "agre".
So my problem is that I want each word in the string array to be processed separately, i.e. I am having trouble accessing the string array. Is there any way to access every index in the string array?

From the documentation of WriteAllText:
If the target file already exists, it is overwritten.
So each iteration of your for loop overwrites the file, and you're left with only the text from the last iteration.
You can use System.IO.File.AppendAllText instead.
Also, you can use the array's Length property to loop through all the words: for (int i = 0; i < lines.Length; i++)
Alternatively, instead of the for loop, you can use LINQ's Select to project each line to its stemmed form and use AppendAllLines to write the results:
System.IO.File.AppendAllLines(@"C:\Users\PJM\Documents\project\testerfile3.txt", lines.Select(l => testing.StemWord(l)));
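If the output file should be rebuilt from scratch on every run, a variation is to transform all the lines first and write them with a single File.WriteAllLines call. The sketch below is an assumption-laden illustration: Stem is a hypothetical stand-in for testing.StemWord (it only upper-cases the word), and it uses temporary files instead of the paths from the question.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical stand-in for testing.StemWord; it only upper-cases the word.
static string Stem(string word) => word.ToUpperInvariant();

string inputPath = Path.GetTempFileName();
string outputPath = Path.GetTempFileName();
File.WriteAllLines(inputPath, new[] { "processes", "deleting", "agreed" });

// Transform every line, then write the whole result in one call.
string[] stemmedLines = File.ReadAllLines(inputPath).Select(w => Stem(w)).ToArray();
File.WriteAllLines(outputPath, stemmedLines);

Console.WriteLine(string.Join(",", File.ReadAllLines(outputPath)));
```

Unlike AppendAllLines, WriteAllLines replaces the file contents, so running the program twice does not duplicate the output.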

How to get lines from a file between 2 dynamic locations?

As noted in a thread I asked earlier, I'm trying to parse some segments of code from a single method that is over 8K lines long. It's mostly just duplicated, hardcoded logic for a bunch of fields in a dataset.
Sample data I'm parsing would look something like this:
temp_str = ds->Fields->FieldsByName("Field1")->AsString;
if (temp_str.IsEmpty())
    //do something
else
    //do something else
temp_str = ds->Fields->FieldsByName("Field2")->AsString;
if (differentCondition)
    //do something
else
    //do some other thing
In essence, what I want to do is get all lines between each "pair" of temp_str = ... lines and then just collect each unique set of validation rules. But I'm having a little trouble locating these segments of code.
My method looks like this:
while (lines.Any(stringToCheck => stringToCheck.Contains(validationHeader)))
{
    startOfNextValidation = lines.IndexOf(lines.First(s => s.Contains(validationHeader)), lines.IndexOf(validationHeader) + 1);
    if (startOfNextValidation > lines.Count || startOfNextValidation <= 0)
        break;
    validations.Add(GetString(lines.GetRange(0, startOfNextValidation)));
    lines.RemoveRange(0, startOfNextValidation);
}
The string validationHeader variable is just temp_str = ds->Fields->FieldsByName(".
This successfully identifies my first chunk of validation, but then it doesn't find anything else, which is incorrect. There's something wrong with how I'm identifying instances of validationHeader on the first line in my while loop, but I cannot seem to discern where the logic error is.
How can I find the "pairs" of validationHeaders and then get the lines between these pairs?
I saw these SO threads, but I don't really understand how to 'translate' them for my purposes:
https://stackoverflow.com/a/20360426/1189566
https://stackoverflow.com/a/6562086/1189566

I wound up with this solution:
List<string> lines = File.ReadAllLines(file).ToList();
List<string> validations = new List<string>();
List<int> allIndices = lines.Select((s, i) => new { Str = s, Index = i })
    .Where(x => x.Str.Contains(validationHeader))
    .Select(x => x.Index).ToList();
for (int j = 0; j < allIndices.Count - 1; j++)
{
    int count = allIndices[j + 1] - allIndices[j];
    validations.Add(GetString(lines.GetRange(allIndices[j], count)));
}
- lines contains all of the code from file
- validations contains the segments of code between the validationHeader lines defined in my original question
- allIndices contains the index of each validationHeader
- GetString(List<string>) returns a single string containing all of the elements within the given range, which is then added to my validations list. I later loop over that list with foreach (var v in validations.Distinct()) and write v to a file.
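Here is a small self-contained sketch of the same index-based approach for anyone who wants to run it. Two things are my own assumptions: GetString is implemented as a plain string.Join, and this variant also keeps the chunk after the last header, which the loop above skips. SplitAtHeaders is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed implementation of GetString: joins the lines with newlines.
static string GetString(List<string> range) => string.Join("\n", range);

// Hypothetical helper: one chunk per header line, running up to (not including)
// the next header; the last chunk runs to the end of the list.
static List<string> SplitAtHeaders(List<string> lines, string header)
{
    var indices = lines
        .Select((s, i) => new { Str = s, Index = i })
        .Where(x => x.Str.Contains(header))
        .Select(x => x.Index)
        .ToList();

    var chunks = new List<string>();
    for (int j = 0; j < indices.Count; j++)
    {
        int end = j + 1 < indices.Count ? indices[j + 1] : lines.Count;
        chunks.Add(GetString(lines.GetRange(indices[j], end - indices[j])));
    }
    return chunks;
}

var sample = new List<string> { "temp_str = A", "if (x)", "temp_str = B", "else" };
foreach (var chunk in SplitAtHeaders(sample, "temp_str = "))
    Console.WriteLine(chunk + "\n--");
```

Collecting the header positions first avoids mutating the list while searching it, which is where the while loop in the question went wrong.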

Select certain part in string as variable c#

I have a string like the following:
"1 1/2 + 2 2/3"
Now I want "1 1/2" as one variable and "2 2/3" as a different variable.
How do I do this?
Thanks.

If you are always going to have a '+' in between, you could simply do:
var splitStrings = stringWithPlus.Split('+');
for (int i = 0; i < splitStrings.Length; i++) {
    splitStrings[i] = splitStrings[i].Trim();
}
Edit: If you really want to put these two parts into two separate variables, you can, although it's quite unnecessary. The type of splitStrings is string[]; to get the parts into two variables:
var splitStrings = stringWithPlus.Split('+');
for (int i = 0; i < splitStrings.Length; i++) {
    splitStrings[i] = splitStrings[i].Trim();
}
string firstHalf = splitStrings[0];
string secondHalf = splitStrings[1];
It would be better, though, to just access these strings via the array, as then you're not creating additional variables for the same data.
If you are comfortable with LINQ and want to shorten this (the example above illustrates exactly what happens), you can do the split and trim in one line:
var splitStrings = stringWithPlus.Split('+').Select(aString => aString.Trim()).ToArray();
string firstHalf = splitStrings[0];
string secondHalf = splitStrings[1];
If this syntax is confusing, you should read up on LINQ, and more specifically LINQ to Objects.

To make it shorter, I used LINQ to trim the strings and then converted the result back to an array.
string[] parts = stringWithPlus.Split('+').Select(p => p.Trim()).ToArray();
Use them as:
parts[0], parts[1]... parts[n - 1]
where n = parts.Length.

Counting occurrences of a string in an array and then removing duplicates

I am fairly new to C# programming and I am stuck on my little ASP.NET project.
My website currently examines Twitter statuses for URLs and then adds those URLs to an array, all via regular expression pattern matching. Clearly more than one person will post a specific URL, so I do not want to list duplicates, and I want to count the number of times a particular URL is mentioned in, say, 100 tweets.
Now I have a List<String> which I can sort so that all duplicate URLs are next to each other. I was under the impression that I could compare list[i] with list[i+1]: if they match, increment a counter (count++), and if they don't, add the URL and the count value to a new array, on the assumption that this is the end of the duplicates.
This would remove duplicates and give me a count of the number of occurrences of each URL. At the moment, what I have is not working, and I do not know why (as I say, I am not very experienced with it all).
With the code below, assume that a JSON feed has been searched for using a keyword into srchResponse.results. The results with URLs in them get added to sList, a string List type, which contains only the URLs, not the message as a whole.
I want to put one of each URL (no duplicates), a count integer (to string) for the number of occurrences of a URL, and the username, message, and user image URL all into my jagged array called 'urls[100][]'. I have made the array 100 rows long to make sure everything can fit but generally, this is too big. Each 'row' will have 5 elements in them.
The debugger gets stuck on the line: if (sList[i] == sList[i + 1]) which is the crux of my idea, so clearly the logic is not working. Any suggestions or anything will be seriously appreciated!
Here is sample code:
var sList = new ArrayList();
string[][] urls = new string[100][];
int ctr = 0;
int j = 1;
foreach (Result res in srchResponse.results)
{
    string content = res.text;
    string pattern = @"((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)";
    MatchCollection matches = Regex.Matches(content, pattern);
    foreach (Match match in matches)
    {
        GroupCollection groups = match.Groups;
        sList.Add(groups[0].Value.ToString());
    }
}
sList.Sort();
foreach (Result res in srchResponse.results)
{
    for (int i = 0; i < 100; i++)
    {
        if (sList[i] == sList[i + 1])
        {
            j++;
        }
        else
        {
            urls[ctr][0] = sList[i].ToString();
            urls[ctr][1] = j.ToString();
            urls[ctr][2] = res.text;
            urls[ctr][3] = res.from_user;
            urls[ctr][4] = res.profile_image_url;
            ctr++;
            j = 1;
        }
    }
}
The code then goes on to add each result to a StringBuilder along with the HTML.

The description of your algorithm seems fine. I don't know what's wrong with the implementation; I haven't read it that carefully. (The fact that you are using an ArrayList is an immediate red flag; why aren't you using a more strongly typed generic collection?)
However, I have a suggestion. This is exactly the sort of problem that LINQ was intended to solve. Instead of writing all that error-prone code yourself, just describe the transformation you're interested in, and let the compiler work it out for you.
Suppose you have a list of strings and you wish to determine the number of occurrences of each:
var notes = new[] { "Do", "Fa", "La", "So", "Mi", "Do", "Re" };
var counts = from note in notes
             group note by note into g
             select new { Note = g.Key, Count = g.Count() };
foreach (var count in counts)
    Console.WriteLine("Note {0} occurs {1} times.", count.Note, count.Count);
Which I hope you agree is much easier to read than all that array logic you wrote. And of course, now you have your sequence of unique items; you have a sequence of counts, and each count contains a unique Note.
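Applied to the URL list from the question, the same grouping might look like the sketch below. It uses method syntax instead of query syntax, and the URLs and the CountOccurrences name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: counts how often each distinct string occurs,
// most frequent first.
static List<KeyValuePair<string, int>> CountOccurrences(IEnumerable<string> items)
{
    return items
        .GroupBy(s => s)
        .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
        .OrderByDescending(kv => kv.Value)
        .ToList();
}

// Made-up sample data standing in for the URLs extracted from the tweets.
var urlList = new List<string> { "http://a.example", "http://b.example", "http://a.example" };
foreach (var entry in CountOccurrences(urlList))
    Console.WriteLine($"{entry.Key} occurs {entry.Value} time(s).");
```

No sorting of the input and no neighbour comparison is needed, so the out-of-range index problem cannot occur.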

I'd recommend using a more sophisticated data structure than an array. A set will guarantee that you have no duplicates.
The .NET collections include one: HashSet<T> in System.Collections.Generic, available since .NET Framework 3.5.

Your loop fails because when i == 99, (i + 1) == 100, which is outside the bounds of your array.
But as others have pointed out, .NET 3.5 has ways of doing what you want more elegantly.

If you don't need to know how many duplicates a specific entry has you could do the following:
LINQ Extension Methods
.Count()
.Distinct()
.Count()
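Put together, those extension methods could be used roughly as follows. This is just a sketch with made-up URLs; the variable names are my own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample data standing in for the extracted URLs.
var allUrls = new List<string> { "http://a.example", "http://b.example", "http://a.example" };

// Distinct() drops repeated entries; the difference between the two counts
// is the number of duplicate entries.
List<string> uniqueUrls = allUrls.Distinct().ToList();
int duplicateCount = allUrls.Count() - uniqueUrls.Count();

Console.WriteLine($"{uniqueUrls.Count} unique, {duplicateCount} duplicate(s)");
```

This tells you how many duplicates there are overall, but not how often each individual URL occurs.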
