C# File.ReadLines doesn't work right

I have a text file, and reading it line by line is not working correctly.
These are the lines in my code.
And this is the text in my file. The lines look like this, but my code doesn't recognize them as lines:
X02233 52330 DISCHY 8 BLUZ
std STD 0 0 0 0 0 8698230653909 0.00
X02237 52337 VALONIA BLUZ STD STD 0 0 0 0 0 8698230653916 0.00
X02245 72458 HARMONY 9 BLUZ STD STD 0 0 0 0 0 8698230653923 0.00
UPDATE:
var text = File.ReadAllText(lblPath.Text);
var lines = text.Split('\n'); //Unix-based newline
var longestLine = lines.OrderByDescending(a => a.Length).First();
var shortestLine = lines.OrderBy(a => a.Length).First();
var orderByShort = lines.OrderBy(a => a.Length);
I get an out-of-memory exception in this code. The example above is only part of my file. The file is 105 MB.

You can use File.ReadAllText to read the whole file to a string and then use the Split method to split based on the end of line character your file is using:
var text = File.ReadAllText(myFilePath);
var lines = text.Split('\n'); //Unix-based newline
Note that File.ReadAllLines recognizes more than the \r\n sequence as a line break - see the documentation:
A line is defined as a sequence of characters followed by a carriage return ('\r'), a line feed ('\n'), or a carriage return immediately followed by a line feed.
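For the 105 MB file mentioned in the update, a streaming approach may avoid the out-of-memory exception. The following is a minimal sketch, assuming only the longest and shortest lines are needed; the class name LineStats and the method Measure are hypothetical. It relies on File.ReadLines, which enumerates lines lazily instead of loading the whole file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineStats
{
    // Tracks the longest and shortest line while holding only one line at a time.
    // Returns (null, null) when the sequence is empty.
    public static (string Longest, string Shortest) Measure(IEnumerable<string> lines)
    {
        string longest = null;
        string shortest = null;
        foreach (var line in lines)
        {
            if (longest == null || line.Length > longest.Length)
                longest = line;
            if (shortest == null || line.Length < shortest.Length)
                shortest = line;
        }
        return (longest, shortest);
    }
}
```

It could be called as LineStats.Measure(File.ReadLines(lblPath.Text)). Sorting all lines by length, as in orderByShort, still requires every line to be held in memory.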

Related

Remove the first line from a text file and put the rest of the words, separated by whitespace, into an array

I have text file in a following format
432
23 34 45 56 78 90
67 87 90 76 43 09
.................
I want to remove the first line and insert the rest of the words, which are separated by whitespace, into an array.
I wrote the following code to get the words by removing the whitespace:
StreamReader streamReader = new StreamReader("C:\\Users\\sample.txt"); //get the file
string stringWithMultipleSpaces = streamReader.ReadToEnd(); //load file to string
streamReader.Close();
Regex newrow = new Regex(" +"); //specify delimiter (spaces)
string[] splitwords = newrow.Split(stringWithMultipleSpaces); //(convert string to array of words)
When I put a breakpoint on the string[] splitwords line, I can see the following output.
How can I remove the first line and get the rest of the words starting from array index [0]?
You need to split with all whitespaces, not just a plain space.
Use the @"\s+" pattern to match 1+ whitespace symbol(s):
string[] splitwords = Regex.Split(stringWithMultipleSpaces, @"\s+");
Another approach is reading the file line by line, and - if there are always only numbers like those and no Unicode spaces present - use plain String.Split().
Something like
var results = new List<string>();
using (var sr = new StreamReader("C:\\Users\\sample.txt", true))
{
    var s = string.Empty;
    while ((s = sr.ReadLine()) != null)
    {
        results.AddRange(s.Split());
    }
}
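Neither snippet above drops the first line, which is what the question asks for. A possible sketch that does, assuming the file is read with File.ReadLines and that the class name WordReader and the method WordsAfterFirstLine are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WordReader
{
    // Skips the first line, then splits every remaining line on whitespace.
    // A null separator array tells Split to use whitespace characters.
    public static string[] WordsAfterFirstLine(IEnumerable<string> lines)
    {
        return lines
            .Skip(1)
            .SelectMany(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();
    }
}
```

It could be called as WordReader.WordsAfterFirstLine(File.ReadLines("C:\\Users\\sample.txt")); the first word of the second line then sits at index [0].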

How to remove datetime from a Logfile string

I have a logfile like this:
[2016 01 10 11:10:44] Operation3 \r\n
[2016 01 10 11:10:40] Operation2 \r\n
[2016 01 10 11:10:36] Operation1 \r\n
On that I perform a ReadAllLines operation, so that in a string I have:
[2016 01 10 11:10:44] Operation3 \r\n[2016 01 10 11:10:40] Operation2 \r\n[2016 01 10 11:10:36] Operation1 \r\n
Now I have to remove all those timestamps.
Being a newbie, and to be on the safe side, I'd split it, then search each item for start=indexOf("[") and indexOf("]"), then remove the substring by cutting each one, and then join them all again.
I'd like to know a smarter way to do that.
--EDIT--
OK, thanks for the downvotes; I hadn't considered everything.
Additional constraints:
I can't be sure that every line has a timestamp, so I have to check each line for a "[" at the start and a "]" in the middle.
I can't be sure of the [XXXX] length either, since I could have [2016 1 1 11:1:4] instead of [2016 01 01 11:01:04]. So it's important to check its length.
Thanks
You don't need to cut and paste the lines; you can use string.Replace.
This takes into account the length of Environment.NewLine.
while (true)
{
    int start;
    if (lines.StartsWith("[", StringComparison.Ordinal))
        start = 0;
    else
    {
        // Look for a timestamp at the start of a later line; stop when none is left.
        int lineBreak = lines.IndexOf(Environment.NewLine + "[", StringComparison.Ordinal);
        if (lineBreak == -1)
            break;
        start = lineBreak + Environment.NewLine.Length;
    }
    int end = lines.IndexOf("] ", start, StringComparison.Ordinal);
    if (end == -1)
        break;
    string subString = lines.Substring(start, end + 2 - start);
    lines = lines.Replace(subString, "");
}
ReadAllLines returns an array of lines, so you don't need to look for the start of each item. If your timestamp format will be consistent, you can just trim off the start of the string.
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
    string logContents = line.Substring("[XXXX XX XX XX:XX:XX] ".Length);
}
Or combine this with a LINQ Select to do it in one step:
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
    .Select(x => x.Substring("[XXXX XX XX XX:XX:XX] ".Length));
Without consistent format, you will need to identify what you are looking for. I would write a regular expression to remove what you are looking for, otherwise you may get caught by things you weren't expecting (for example, you mention that some lines may not have timestamps - they might have something else in square brackets instead which you don't want to remove).
Example:
Regex rxTimeStamp = new Regex(@"^\[\d{4} \d{2} \d{2} \d{1,2}:\d{1,2}:\d{1,2}\]\s*");
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
    string logContents = rxTimeStamp.Replace(line, String.Empty);
}
// or
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
    .Select(x => rxTimeStamp.Replace(x, String.Empty));
You'll need to tune the regular expression based on whether it misses anything, but that's beyond the scope of this question.
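As one example of such tuning, the question mentions timestamps written without leading zeros, such as [2016 1 1 11:1:4]. A hedged sketch that accepts one or two digits for the month and day as well; the class name TimestampStripper and the method Strip are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

static class TimestampStripper
{
    // Four-digit year, then one or two digits for every other field.
    // Anchored at the start of the line, so other bracketed text is left alone.
    static readonly Regex Stamp =
        new Regex(@"^\[\d{4} \d{1,2} \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\]\s*");

    public static string Strip(string line)
    {
        return Stamp.Replace(line, string.Empty);
    }
}
```

Lines without a leading timestamp, or with something else in square brackets, pass through unchanged.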
Since your code works and you are looking for a different way:
string result = string.Join(string.Empty, str.Skip(22));
Apply this to each item.
Explanation:
Since every timestamp is of equal length, you don't need to search for the beginning or end. Normally you would have to do length checks (empty lines etc.), but this works even for shorter strings: you just get an empty string in return if the length is < 22. It is an alternative way if every line of your file really starts with a timestamp.

Filtering on full string match but not on substrings

So I've got a long string of numbers and characters and I'd like to filter out a substring. The thing I'm struggling with is that I need a full match on a certain value (starting with S) but this may not be matched in another value.
Input:
S10 1+0000000297472+00EURS100 1+0000000297472+00EURS1023P 1+0000000816072+00EUR
The input is exactly like this.
Breakdown of input:
S10 1+0000000297472+00EUR
Every part starts with a tag S and ends with EUR
There are spaces in between because every part has a fixed length
=>
index 0 : tag 'S' with length 1
index 1 : code with length 7
index 8 : numbertype with length 1
index 9 : sign with length 1
index 10 : value with length 13
index 23 : sign with length 1
index 24 : exponent with length 2
index 26 : unit with length 3
I need to match on for example S10 and I only want this substring till EUR. I don't want it to match on S100 or S1023P or any other combination. Only on exactly S10
Output:
S10 1+0000000297472+00EUR
I'm trying to use Regex to find my match on 'S + code'. I'm doing a full match on my search query and then as soon as anything follows I don't want it anymore. But doing it like this also discards the actual match as after the S10 the value will follow which will match with [^\d|^\D])+\w
foreach (var field in fieldList)
{
    var query = "S" + field.BallanceCode;
    var index = Regex.Match(values, Regex.Escape(query) + @"([^\d|^\D])+\w").Index;
}
For example when looking for S10
needs to match:
S10 1+0000000297472+00EUR
may not match:
S10/15 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10000001+0000000546546+00EUR
Update:
Using this code
var index = Regex.Match(values, Regex.Escape(query) + @"\p{Zs}.*?EUR").Index;
will yield S10, S10/15, etc. when they are looked for. However, looking for S1000000 in the string doesn't work, because there is no whitespace between the code and 1+
S10000001+0000000546546+00EUR
For example when looking for S1000000
needs to match:
S10000001+0000000297472+00EUR
may not match:
S10 1+0000001748447+00EUR
S1023P 1+0000000816072+00EUR
S10/15 1+0000000546546+00EUR
You can use a regex that requires a space (or whitespace) to appear right after the field.BallanceCode:
var index = Regex.Match(values, Regex.Escape(query) + (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") + ".*?EUR").Index;
The regex will match the S10, then any horizontal whitespace (\p{Zs}), then any 0 or more characters other than a newline (as few as possible due to *?) up to the first EUR.
The (field.BallanceCode.Length < 7 ? @"\p{Zs}" : "") check is necessary to support a 7-digit BallanceCode. If it contains 7 digits or more, we do not check if there is a whitespace after it. If the length is less than 7, we check for a space.
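Because the question describes every part as fixed-length (1 + 7 + 1 + 1 + 13 + 1 + 2 + 3 = 29 characters), an alternative to the regex is to slice the string by position. This is only a sketch, assuming the input really is a run of 29-character records with the padding spaces preserved and no separators in between; the class name FixedWidth and the method FindRecord are hypothetical.

```csharp
using System;

static class FixedWidth
{
    const int RecordLength = 29; // tag(1) + code(7) + numbertype(1) + sign(1) + value(13) + sign(1) + exponent(2) + unit(3)
    const int CodeLength = 7;

    // Returns the first record whose space-padded code equals the given code, or null.
    public static string FindRecord(string values, string code)
    {
        for (int i = 0; i + RecordLength <= values.Length; i += RecordLength)
        {
            string record = values.Substring(i, RecordLength);
            string recordCode = record.Substring(1, CodeLength).TrimEnd();
            if (record[0] == 'S' && recordCode == code)
                return record;
        }
        return null;
    }
}
```

Comparing the whole trimmed code field means a search for 10 cannot match 100, 1023P, or 1000000, and a full 7-character code needs no special case.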
So you just want the start (S...) and end (...EUR) of each line and skip everything in between?
^([sS]\d+).*?([\d\+]+EUR)$
http://regexr.com/3c1ob

Split a string into lines?

Here is the code:
foreach (var file in d.GetFiles("*.xml"))
{
    string test = getValuesOneFile(file.ToString());
    result.Add(test);
    Console.WriteLine(test);
    Console.ReadLine();
}
File.WriteAllLines(filepath + @"\MapData.txt", result);
Here is what it looks like in the console;
[30000]
total=5
sp 0 -144 152 999999999
sp 0 -207 123 999999999
sp 0 -173 125 999999999
in00 1 -184 213 999999999
out00 2 1046 94 40000
Here is how it looks like in the text file (when written at end of loop).
[30000]total=5sp 0 -144 152 999999999sp 0 -207 123 999999999sp 0 -173 125 999999999in00 1 -184 213 999999999out00 2 1046 94 40000
I need it to write the lines in the same style as the console output.
WriteAllLines separates the values with the environment's newline string. Over the history of computing, however, several different characters have been used to represent line breaks, and you are viewing the text file in a program that expects a different separator. You have three options: view the file in a program that handles this separator (or any separator), configure your program to expect this separator, or replace WriteAllLines with a manual method of writing the strings that uses another newline separator.
Rather than WriteAllLines, you'll probably want to just write the text manually:
string textToWrite = "";
foreach (var res in result)
{
    textToWrite += res.Replace("\r", "").Replace("\n", ""); //Ensure there are no line feeds or carriage returns
    textToWrite += "\r\n"; //Add a carriage return and line feed
}
File.WriteAllText(filepath + @"\MapData.txt", textToWrite);
The problem is definitely in how you are looking for newlines in your output. Environment.NewLine will get inserted after each string written by WriteAllLines.
I would recommend opening the output file in Notepad++ and turning on View > Show Symbol > Show End of Line to see which end-of-line characters are in the file. On my machine, for instance, it is [CR][LF] (Carriage Return / Line Feed) at the end of each line, which is standard for Windows.
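If the goal is to keep the line structure shown in the console, one option is to split each value on any newline variant before writing, so that WriteAllLines emits a consistent separator. This is a sketch under the assumption that each string in result contains embedded line breaks such as a bare \n; the class name NewlineNormalizer and the method ToLines are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NewlineNormalizer
{
    // "\r\n" is listed first so a Windows line break is treated as one separator.
    static readonly string[] Breaks = { "\r\n", "\n", "\r" };

    // Splits every multi-line block into individual lines, preserving order.
    public static List<string> ToLines(IEnumerable<string> blocks)
    {
        return blocks
            .SelectMany(block => block.Split(Breaks, StringSplitOptions.None))
            .ToList();
    }
}
```

The result could then be passed to File.WriteAllLines in place of result, which writes each line followed by Environment.NewLine.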

How can I read from a given line number up to a line that starts with a given string in C#

Let's say I have text file like this
<pre>----------------
hPa m C
---------------------
1004.0 28 13.6
1000.0 62 16.2
998.0 79 17.2
992.0 131 18.0
<pre>----------------
Sometext here
1000.0 10 10.6
1000.0 10 11.2
900.0 10 12.2
900.0 100 13.0
<aaa>----------------
How can I create an array in C# that reads the text file from line number 5 (1004.0) to just before the line that starts with the string <pre>-?
I used string[] lines = System.IO.File.ReadAllLines(Filepath);
to put each line into the array.
The problem is that I want only the numbers of the first section in the array, so that I can later separate them into another 3 arrays (hPa, m, C).
Here's a possible solution. It's probably way more complicated than it should be, but that should give you an idea of possible mechanisms to further refine your data.
string[] lines = System.IO.File.ReadAllLines("test.txt");
List<double> results = new List<double>();
foreach (var line in lines.Skip(4))
{
    if (line.StartsWith("<pre>"))
        break;
    Regex numberReg = new Regex(@"\d+(\.\d){0,1}"); //will find any number ending in ".X" - it's primitive, and won't work for something like 0.01, but no such data showed up in your example
    var result = numberReg.Matches(line).Cast<Match>().FirstOrDefault(); //use only the first number from each line. You could use Cast<Match>().Skip(1).FirstOrDefault to get the second, and so on...
    if (result != null)
        results.Add(Convert.ToDouble(result.Value, System.Globalization.CultureInfo.InvariantCulture)); //Note the use of InvariantCulture, otherwise you may need to worry about , or . in your numbers
}
Do you mean this?
System.IO.StreamReader file = new System.IO.StreamReader(FILE_PATH);
int skipLines = 4; // skip the four header lines so that reading resumes at line 5
for (int i = 0; i < skipLines; i++)
{
    file.ReadLine();
}
// Do what you want here.
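Both answers can be condensed with LINQ. The following is a rough sketch, assuming the data starts on line 5, the section ends at the next line beginning with <pre>, and every data line holds whitespace-separated numbers in invariant-culture format; the class name SectionReader and the method ReadFirstSection are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

static class SectionReader
{
    // Returns one double[] per data line of the first section (hPa, m, C).
    public static List<double[]> ReadFirstSection(IEnumerable<string> lines)
    {
        return lines
            .Skip(4)                                                      // header lines 1-4
            .TakeWhile(line => !line.StartsWith("<pre>", StringComparison.Ordinal))
            .Where(line => !string.IsNullOrWhiteSpace(line))              // ignore blank lines
            .Select(line => line
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Select(token => double.Parse(token, CultureInfo.InvariantCulture))
                .ToArray())
            .ToList();
    }
}
```

Given var rows = SectionReader.ReadFirstSection(System.IO.File.ReadLines(Filepath)), the three arrays could be built with rows.Select(r => r[0]).ToArray() for hPa, and likewise with r[1] and r[2] for m and C.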
