How to get lines from a file between 2 dynamic locations? - c#

As noted in a thread I asked earlier, I'm trying to parse some segments of code from a single method that is over 8K lines long. It's mostly just duplicated, hardcoded logic for a bunch of fields in a dataset.
Sample data I'm parsing would look something like this;
temp_str = ds->Fields->FieldsByName("Field1")->AsString;
if (temp_str.IsEmpty())
//do something
else
//do something else
temp_str = ds->Fields->FieldsByName("Field2")->AsString;
if (differentCondition)
//do something
else
//do some other thing
In essence, what I want to do is get all lines between each "pair" of temp_str = ... lines and then just collect each unique set of validation rules. But I'm having a little trouble locating these segments of code.
My method looks like this:
while (lines.Any(stringToCheck => stringToCheck.Contains(validationHeader)))
{
startOfNextValidation = lines.IndexOf(lines.First(s => s.Contains(validationHeader)), lines.IndexOf(validationHeader) + 1);
if (startOfNextValidation > lines.Count || startOfNextValidation <= 0)
break;
validations.Add(GetString(lines.GetRange(0, startOfNextValidation)));
lines.RemoveRange(0, startOfNextValidation);
}
The string validationHeader variable is just temp_str = ds->Fields->FieldsByName(".
This successfully identifies my first chunk of validation, but then it doesn't find anything else, which is incorrect. There's something wrong with how I'm identifying instances of validationHeader on the first line in my while loop, but I cannot seem to discern where the logic error is.
How can I find the "pairs" of validationHeaders and then get the lines between these pairs?
I saw these SO threads but I don't really understand how to 'translate' it for my purposes;
https://stackoverflow.com/a/20360426/1189566
https://stackoverflow.com/a/6562086/1189566

Wound up with this solution:
List<string> lines = File.ReadAllLines(file).ToList<string>();
List<string> validations = new List<string>();
List<int> allIndices = lines.Select((s, i) => new { Str = s, Index = i })
.Where(x => x.Str.Contains(validationHeader))
.Select(x => x.Index).ToList<int>();
for (int j = 0; j < allIndices.Count() - 1; j++)
{
int count = (allIndices[j + 1] - allIndices[j]);
validations.Add(GetString(lines.GetRange(allIndices[j], count)));
}
lines contains all of the code from file
validations contains the segments of code between the validationHeader defined in my original question
allIndices just contains the index of each validationHeader
GetString(List<string>) just returns a single string containing all of the elements within the given range, which is then added to my validations list which I later loop over with foreach var v in validations.Distinct() and write v to a file.
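One thing to watch with the loop above: it stops at allIndices.Count() - 1, so the block that follows the last validationHeader is never added. Below is a minimal sketch that also keeps that final block, assuming it should run to the end of the file. SegmentSplitter and SplitByHeader are hypothetical names, and string.Join stands in for GetString, whose body isn't shown in the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentSplitter
{
    // Returns one string per segment. Each segment starts at a line containing
    // `header` and runs up to (not including) the next such line, or to the
    // end of the input for the last segment. Lines before the first header
    // are ignored.
    public static List<string> SplitByHeader(IReadOnlyList<string> lines, string header)
    {
        // Index of every line that contains the header.
        List<int> starts = lines
            .Select((text, index) => new { text, index })
            .Where(x => x.text.Contains(header))
            .Select(x => x.index)
            .ToList();

        var segments = new List<string>();
        for (int j = 0; j < starts.Count; j++)
        {
            // Assumption: the last segment ends at the end of the input.
            int end = j + 1 < starts.Count ? starts[j + 1] : lines.Count;
            segments.Add(string.Join("\n", lines.Skip(starts[j]).Take(end - starts[j])));
        }
        return segments;
    }
}
```

Calling SegmentSplitter.SplitByHeader(lines, validationHeader) would then take the place of the index loop.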

Related

c# go to specific line in text file then skip a few lines and edit

I know how to read through all lines of a file and replace a selected line when a certain sequence of characters is found. The issue that I'm having at the moment is that I'm stuck with a structure that has no unique string to search for except for the main class name. So for example I'd know that the name of the class is "List_of_boats" and the structure tells me that 11 lines underneath that line is the value "items=2;" which I need to change to a certain value, depending on the amount of items I want to insert there.
Is there a way to use the foreach function or something to do this? I have provided some code that I've already got so far but I'm kind of stuck now.
var lines = File.ReadAllLines(fileToMerge);
var linID = 0;
foreach (var line in lines) {
if (line.Contains("ace_arsenal_saved_loadouts")) {
var newlinID = linID + 11; //go from ace_arsenal_saved_loadouts to "items=x;" to change number of items.
}
linID = linID + 1;
}
File.ReadAllLines already returns a string array, so you can loop through it by index:
var lines = File.ReadAllLines(fileToMerge);
for (var linID = 0; linID < lines.Length; linID++) {
var line = lines[linID];
if (line.Contains("ace_arsenal_saved_loadouts")) {
var newlinID = linID + 11; //go from ace_arsenal_saved_loadouts to "items=x;" to change number of items.
}
}
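Neither loop above changes anything yet; they only compute the target index. Here is a rough sketch of the remaining steps, assuming the "items=" line always sits a fixed number of lines below the marker. LineEditor, SetItemCount, and the offset parameter are hypothetical names, not part of the original code.

```csharp
using System;

public static class LineEditor
{
    // Finds the first line containing `marker` and rewrites the line `offset`
    // lines below it as "items=<count>;", keeping its leading whitespace.
    // Returns false (and changes nothing) if the marker is missing, the target
    // is past the end, or the target line does not start with "items=".
    public static bool SetItemCount(string[] lines, string marker, int offset, int count)
    {
        int markerIndex = Array.FindIndex(lines, l => l.Contains(marker));
        if (markerIndex < 0) return false;

        int target = markerIndex + offset;
        if (target >= lines.Length) return false;

        string original = lines[target];
        string trimmed = original.TrimStart();
        if (!trimmed.StartsWith("items=", StringComparison.Ordinal)) return false;

        string indent = original.Substring(0, original.Length - trimmed.Length);
        lines[target] = indent + "items=" + count + ";";
        return true;
    }
}
```

With the file from the question this might be used as LineEditor.SetItemCount(lines, "ace_arsenal_saved_loadouts", 11, newCount), followed by File.WriteAllLines(fileToMerge, lines) when it returns true. The extra "items=" check is a safety net in case the structure is not as fixed as expected.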

I am trying to read a CSV file in C#, splitting lines into groups depending on word repetition. I am getting an index out of range error

My error is on the very last line, saying my index is out of range. I'm not sure what the problem is. I would like to continue using a list of lists of lists. I am trying to read a line of a CSV file and separate that line into groups if one of the words in that line repeats; for example:
"hey how are you hey whats up"
hey how are you would be in one group and then hey whats up would be in the other group.
string[] ReadDirectory = Directory.GetFiles("C:\\Users\\-------", "*.csv");
List<List<List<string>>> myList = new List<List<List<string>>>();
List<string> CSVlist = new List<string>();
foreach (string file in ReadDirectory)
{
using (StreamReader readFile = new StreamReader(file))
{
int groupIndex = 0;
string line = readFile.ReadLine();
string[] headers = line.Split(',');
Array.Reverse(headers);
CSVlist.Add(headers[headers.Length - 1]);
myList.Add(new List<List<string>>());
for (int i = 0; i < headers.Length; i++)
{
if (headers[i].Contains("repeats") && headers[i + 1].Contains("repeats"))
{
myList.Add(new List<List<string>>());
groupIndex++;
}
myList[0][groupIndex].Add(headers[i]);
}
}
}
The problem occurs when i == headers.Length - 1: headers[i + 1] is then out of bounds. Guard the look-ahead, so that the last header is still added. Try:
for (int i = 0; i < headers.Length; i++)
{
    // Only look at headers[i + 1] when it exists.
    if (i < headers.Length - 1 && headers[i].Contains("repeats") && headers[i + 1].Contains("repeats"))
    {
        myList.Add(new List<List<string>>());
        groupIndex++;
    }
    myList[0][groupIndex].Add(headers[i]);
}
Looking at the code, I'm not sure it'll do what you want it to (e.g. it only reacts to headers that contain the exact word 'repeats', but this may just be example code, so I'll ignore that), so I'll focus on the error reported.
The exact error you reported is caused by this line:
myList[0][groupIndex].Add(headers[i]);
When you first add a nested list to myList, you don't add a nested list to that first nested list - so when the if statement is false, it tries to add the header into myList[0][0] where the second index is out of range because there is no inner list at myList[0].
Changing
myList.Add(new List<List<string>>());
to something like
var innerGroupList = new List<string>();
var groupList = new List<List<string>>();
groupList.Add(innerGroupList);
myList.Add(groupList);
will resolve the issue, but you won't get your expected outcome from the example data, as the word 'repeats' is not there. You would need to do something like save each word in a HashSet and check each word against it. If the word already exists in the set, split it off into another group.
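The HashSet idea can be sketched like this, assuming a new group should start whenever a word has already appeared in the current group (one reading of the "hey how are you hey whats up" example). WordGrouper and SplitOnRepeat are made-up names for illustration.

```csharp
using System.Collections.Generic;

public static class WordGrouper
{
    // Splits a sequence of words into groups, starting a new group each time
    // a word repeats within the current group.
    public static List<List<string>> SplitOnRepeat(IEnumerable<string> words)
    {
        var groups = new List<List<string>>();
        var current = new List<string>();
        var seen = new HashSet<string>();

        foreach (string word in words)
        {
            // HashSet.Add returns false when the word is already present.
            if (!seen.Add(word))
            {
                groups.Add(current);
                current = new List<string>();
                seen.Clear();
                seen.Add(word);
            }
            current.Add(word);
        }

        if (current.Count > 0) groups.Add(current);
        return groups;
    }
}
```

For a CSV line, the input could be line.Split(','); each inner list is always created before anything is added to it, which avoids the out-of-range access described above.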

Apply last non empty string to empty string?

A file is read in. It looks for lines that have a number that begins with an S. The lines that do not have an S are maintained. The results are saved to an array. I am then populating an existing gridview with the same number of lines.
As a placeholder I have set the blank lines to ***. This is where I'm stuck: I need the empty strings to be populated with the last non-empty string.
So for example if the readout is:
1
2
3
Empty
Empty
Empty
4
Empty
6
I'd want it displayed as:
1
2
3
3
3
3
4
4
6
I can't figure out how to do that. I've been searching all day for examples, but can only find ways of grabbing either the first or the last number of my array. Here is my code.
var sLines = File.ReadAllLines(cboPartProgram.Text)
.Where(s => !s.StartsWith("'"))
.Select(s => new
{
SValue = Regex.Match(s, "(?<=S)[\\d.]*").Value,
})
.ToArray();
string LastSValue = "";
string Value = "";
for (int i = 0; i < sLines.Count(); i++)
{
if (sLines[i].SValue == "")
{
LastSValue = "***";
Value = LastSValue;
}
else
{
Value = (sLines[i].SValue);
}
}
Ok I think I got it.
for (int i = 0; i < sLines.Length; i++)
{
if (sLines[i].SValue == "" && i > 0)
{
foreach (var empt in sLines[i].SValue)
{
LastSValue = sLines[i - 1].SValue;
Value = LastSValue;
}
}
else
{
Value = (sLines[i].SValue);
}
}
On a side note, when I copy my code I use the code option above to format it, but I notice someone always has to correct my spacing. It's copied straight from the IDE, but each line always ends up with extra spaces that I guess shouldn't be there. Is there a different way I should do it?
UPDATE
If I should ask this as a new question let me know, but it's so dependent on this that I thought I should keep it here.
The code I posted above does what I needed it to. I've been trying to edit it so that if there is NO previous number (for example, if line 1 has no number but the rest do), it just applies the string "NA", and otherwise still does what the code above does for the rest of the lines.
I guess maybe the best way would be to just take the results from the above code, and if there are any empty spaces left, apply "NA" but I can't figure it out.
In your example, you just need to take the value of the row before to fill the current value. Something like the following :
for (int i = 0; i < sLines.Length; i++)
{
if (sLines[i].SValue == "" && i > 0)
{
sLines[i].SValue = sLines[i-1].SValue;
}
else
{
sLines[i].SValue = sLines[i].SValue;
}
}
Your example has one more issue but currently I'll focus only on gathering the "last non empty" string.
If you look at your example, you can spot a few things that could help you find a solution: the for loop, and the reference to the original list, which stays intact.
For my example I'll use Linq because it will be much easier.
First of all I'll copy all from before for loop ( if that makes sense :D ) :
var sLines = File.ReadAllLines(cboPartProgram.Text)
.Where(s => !s.StartsWith("'"))
.Select(s => new
{
SValue = Regex.Match(s, "(?<=S)[\\d.]*").Value,
})
.ToArray();
string LastSValue = "";
string Value = "";
Just because it's okay and will work for now.
With your for loop I'll make modifications :
for (int i = 0; i < sLines.Count(); i++)
{
// `i` is representing current "index" of processed "word"
// we can use this to find last "valid" element
// string notEmpty = sLines.Take(i).LastOrDefault(word => !string.IsNullOrEmpty(word.SValue)).SValue;
// but since you want to assign this to `Value`, and the string at index `i` may itself be non-empty,
// we can make it in one line :
Value = string.IsNullOrEmpty(sLines[i].SValue) ? sLines.Take(i).LastOrDefault(word => !string.IsNullOrEmpty(word.SValue)).SValue : sLines[i].SValue;
// instead of your previous logic :
//if (sLines[i].SValue == "")
//{
// LastSValue = "***";
// Value = LastSValue;
//}
//else
//{
// Value = (sLines[i].SValue);
//}
}
Another problem which I think you'll face is that the first value (judging by the input) can also be empty, which will throw an exception in my example. That case can't be handled by this kind of solution at all, because there is no previous value.
From what I understand, if you want to store the result in Value and do something else with it inside the loop (instead of changing it in the array), what you probably want is this:
for (int i = 0; i < sLines.Count(); i++)
{
if (sLines[i].SValue == "")
{
Value = LastSValue;
}
else
{
Value = (sLines[i].SValue);
LastSValue = Value;
}
// use Value
}
I would also suggest using sLines.Length instead of Count(), which is made for sequences where the length isn't known in advance - it's supposed to literally count the elements one by one. In this case it would probably be optimized but if you know you're dealing with an array, it's a good idea to ask for the length directly.
EDIT:
To get "NA" if there's no previous number, just initialize LastSValue to this value before the loop:
string LastSValue = "NA";
That way, if Value is empty and no LastSValue was set before, it will still be "NA".
EDIT2:
A solution similar to the one from @Cubi, to change it in place:
for (int i = 0; i < sLines.Length; i++)
{
if (sLines[i].SValue == "")
sLines[i].SValue = i > 0 ? sLines[i-1].SValue : "NA";
}
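Note that properties of a C# anonymous type are read-only, so the in-place versions above only compile if sLines holds a mutable type. Here is a sketch that sidesteps this by building a new list of filled values, assuming the extracted values are available as plain strings. Filler, ForwardFill, and the fallback parameter are illustrative names.

```csharp
using System.Collections.Generic;

public static class Filler
{
    // Replaces every null or empty entry with the most recent non-empty entry.
    // Entries that come before the first non-empty one get `fallback`.
    public static List<string> ForwardFill(IEnumerable<string> values, string fallback)
    {
        var result = new List<string>();
        string last = fallback;
        foreach (string value in values)
        {
            if (!string.IsNullOrEmpty(value)) last = value;
            result.Add(last);
        }
        return result;
    }
}
```

With the array from the question this might be called as Filler.ForwardFill(sLines.Select(x => x.SValue), "NA"), which needs using System.Linq at the call site.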

Get the more similar string from a list

I have a List that contains all the remote paths I need
List<string> remotePath = MyTableWithRemotePath.Select(i => i.ID_SERVER_PATH).ToList();
I have a string which is the path I'm looking for.
string remotePath = "Path I'm looking for";
I have to find which path in the list best matches the one I'm looking for.
I tried this, but it doesn't work:
var matchingvalues = remotePath.FirstOrDefault(stringToCheck => stringToCheck.Contains(remotePath));
Any suggestions?
EDIT
Example:
I have to find the best match for this path: C:\\something\\location\\
This is my List:
- C:\\something\\location\\XX\\
- C:\\something\\location2\\YY\\
- C:\\something\\location3\\AA\\
- C:\\something\\location4\\CS\\
The result has to be the first element:
C:\\something\\location\\directory\\
I'd say instead of:
string dir = @"some\path\im\looking\for";
Break that up into an array with one element per directory.
string[] dirs = { "some", "path", "im", "looking", "for" };
Then iterate over your list, checking each item in the array as well. Each time there's a match, add it to another collection with the key (the full path) and the value (the number of matches).
for (int i = 0; i < remotePath.Count; i++)
{
int counter = 0;
for (int j = 0; j < dirs.Length; j++)
{
if (remotePath[i].Contains(dirs[j]))
counter++;
}
if (counter > 0)
someStringIntDictionary.Add(remotePath[i], counter);
}
In regards to the final task of determining which is the "best match", I'm honestly not sure exactly how to do it, but searching Google for "C# find dictionary key with highest value" gave me this:
https://stackoverflow.com/a/2806074/1189566
This answer might not be the most efficient, with nested looping over multiple collections, but it should work.
I'd like to point out this is susceptible to inaccuracies if the filename or a subdirectory shares part of a name with something in dirs. So using the first item in the array, "some", you might run into an error with the following scenario:
"C:\\something\\location\\directory\\flibflam\\file.pdf"
something would incorrectly match to some, so it might not actually be a valid match. You'd probably want to check the adjacent character(s) to the directory in the actual path and make sure they're \ characters.
var remotePaths = new List<string>
{
    @"C:\something\location\directory\",
    @"C:\something\location2\directory\",
    @"C:\something\location3\directory\",
    @"C:\something\location4\directory\"
};
var remotePath = @"C:\something\location\directory\";
// Split the target once; the index check keeps longer candidates from reading past its end.
var targetParts = remotePath.Split('\\');
var result = remotePaths
    .Select(p => new { p, matches = p.Split('\\').TakeWhile((x, i) => i < targetParts.Length && x == targetParts[i]).Count() })
    .OrderByDescending(p => p.matches)
    .First().p;
Result:
C:\something\location\directory\
The code splits each path into its directory segments, then compares them one by one with the segments of remotePath. In the end it takes the first path that has the largest number of matching leading segments.
In the end I did it this way, and it works perfectly:
var bestPath = remotePaths.OrderByDescending(i => i.ID_FILE_PATH.Length)
.ToList()
.FirstOrDefault(i => rootPath.StartsWith(i.ID_FILE_PATH, StringComparison.InvariantCultureIgnoreCase));
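Sorting by length first is what makes this work: the first candidate that rootPath starts with is then the longest matching prefix. The same idea on plain strings, as a sketch. PathMatcher and LongestPrefix are hypothetical names, and the ordinal case-insensitive comparison is an assumption that suits Windows-style paths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathMatcher
{
    // Returns the longest candidate that `path` starts with,
    // or null if no candidate is a prefix of `path`.
    public static string LongestPrefix(IEnumerable<string> candidates, string path)
    {
        return candidates
            .OrderByDescending(c => c.Length)
            .FirstOrDefault(c => path.StartsWith(c, StringComparison.OrdinalIgnoreCase));
    }
}
```

This is a plain string prefix test, so a candidate without a trailing backslash such as C:\something\location would also match C:\something\location2\file.txt. Keeping the trailing separator on every candidate avoids that.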

C# Best way to parse flat file with dynamic number of fields per row

I have a flat file that is pipe delimited and looks something like this as example
ColA|ColB|3*|Note1|Note2|Note3|2**|A1|A2|A3|B1|B2|B3
The first two columns are set and will always be there.
* denotes a count for how many repeating fields there will be following that count so Notes 1 2 3
** denotes a count for how many times a block of fields are repeated and there are always 3 fields in a block.
This is per row, so each row may have a different number of fields.
Hope that makes sense so far.
I'm trying to find the best way to parse this file, any suggestions would be great.
The goal at the end is to map all these fields into a few different files - data transformation. I'm actually doing all this within SSIS but figured the default components won't be good enough so need to write own code.
UPDATE I'm essentially trying to read this like a source file and do some lookups and string manipulation to some of the fields in between and spit out several different files like in any normal file to file transformation SSIS package.
Using the above example, I may want to create a new file that ends up looking like this
"ColA","HardcodedString","Note1CRLFNote2CRLF","ColB"
And then another file
Row1: "ColA","A1","A2","A3"
Row2: "ColA","B1","B2","B3"
So I guess I'm after some ideas on how to parse this as well as storing the data in either Stacks or Lists or?? to play with and spit out later.
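One possibility would be to use a stack. First you split the line by the pipes. A Stack<T> built from a sequence pops the last element first, so reverse the fields before pushing them (Enumerable.Reverse needs using System.Linq).
var stack = new Stack<string>(Enumerable.Reverse(line.Split('|')));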
Then you pop the first two from the stack to get them out of the way.
stack.Pop();
stack.Pop();
Then you parse the next element: 3* . For that you pop the next 3 items on the stack. With 2** you pop the next 2 x 3 = 6 items from the stack, and so on. You can stop as soon as the stack is empty.
while (stack.Count > 0)
{
// Parse elements like 3*
}
Hope this is clear enough. I find this article very useful when it comes to String.Split().
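Here is a sketch of that loop, using a Queue<string> (my assumption, not the answer's code) so the fields come out left to right without reversing. It assumes the row layout from the question: two fixed columns, then any number of "N*" groups of N single fields and "N**" groups of N three-field blocks. RowParser and ParsedRow are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class ParsedRow
{
    public string ColA = "";
    public string ColB = "";
    public List<string> Notes = new List<string>();      // fields after an "N*" marker
    public List<string[]> Blocks = new List<string[]>(); // blocks after an "N**" marker
}

public static class RowParser
{
    private const int FieldsPerBlock = 3; // stated in the question

    // Throws (FormatException / InvalidOperationException) on malformed rows.
    public static ParsedRow Parse(string line)
    {
        var fields = new Queue<string>(line.Split('|'));
        var row = new ParsedRow { ColA = fields.Dequeue(), ColB = fields.Dequeue() };

        while (fields.Count > 0)
        {
            string marker = fields.Dequeue();           // e.g. "3*" or "2**"
            int count = int.Parse(marker.TrimEnd('*'));

            if (marker.EndsWith("**", StringComparison.Ordinal))
            {
                for (int b = 0; b < count; b++)
                {
                    var block = new string[FieldsPerBlock];
                    for (int f = 0; f < FieldsPerBlock; f++) block[f] = fields.Dequeue();
                    row.Blocks.Add(block);
                }
            }
            else
            {
                for (int n = 0; n < count; n++) row.Notes.Add(fields.Dequeue());
            }
        }
        return row;
    }
}
```

Each ParsedRow can then be mapped to the output files, for example one output row per entry in Blocks with ColA repeated in front.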
Something similar to below should work (this is untested)
// Example row: ColA|ColB|3*|Note1|Note2|Note3|2**|A1|A2|A3|B1|B2|B3
string[] columns = line.Split('|');
List<string> repeatingColumnNames = new List<string>();
List<List<string>> repeatingFieldValues = new List<List<string>>();
if (columns.Length > 2)
{
    // "3*" means 3 repeating fields follow the count.
    int repeatingFieldCount = int.Parse(columns[2].TrimEnd('*'));
    int repeatingFieldStartIndex = 3;
    for (int i = 0; i < repeatingFieldCount; i++)
    {
        repeatingColumnNames.Add(columns[repeatingFieldStartIndex + i]);
    }
    // "2**" means 2 blocks follow; there are always 3 fields in a block.
    const int fieldsPerSet = 3;
    int repeatingFieldSetCountIndex = repeatingFieldStartIndex + repeatingFieldCount;
    int repeatingFieldSetCount = int.Parse(columns[repeatingFieldSetCountIndex].TrimEnd('*'));
    int repeatingFieldSetStartIndex = repeatingFieldSetCountIndex + 1;
    for (int i = 0; i < repeatingFieldSetCount; i++)
    {
        string[] fieldSet = new string[fieldsPerSet];
        for (int j = 0; j < fieldsPerSet; j++)
        {
            fieldSet[j] = columns[repeatingFieldSetStartIndex + j + (i * fieldsPerSet)];
        }
        repeatingFieldValues.Add(new List<string>(fieldSet));
    }
}
System.IO.File.ReadAllLines("File.txt").Select(line => line.Split(new[] {'|'}))
