Split a string into lines? - c#

Here is the code:
foreach (var file in d.GetFiles("*.xml"))
{
    string test = getValuesOneFile(file.ToString());
    result.Add(test);
    Console.WriteLine(test);
    Console.ReadLine();
}
File.WriteAllLines(filepath + @"\MapData.txt", result);
Here is what it looks like in the console:
[30000]
total=5
sp 0 -144 152 999999999
sp 0 -207 123 999999999
sp 0 -173 125 999999999
in00 1 -184 213 999999999
out00 2 1046 94 40000
Here is how it looks in the text file (when written at the end of the loop):
[30000]total=5sp 0 -144 152 999999999sp 0 -207 123 999999999sp 0 -173 125 999999999in00 1 -184 213 999999999out00 2 1046 94 40000
I need it to write the lines in the same style as the console output.

WriteAllLines separates the values with the environment's newline string. Historically, though, several different character sequences have been used to represent a new line, and the program you are using to view the text file expects a different one. You have three options: view the file in a program that handles this separator (or any separator), configure your current program to expect this separator, or replace WriteAllLines with code that writes the strings manually using another newline sequence.

Rather than WriteAllLines, you'll probably want to write the text manually:
string textToWrite = "";
foreach (var res in result)
{
    textToWrite += res.Replace("\r", "").Replace("\n", ""); // Ensure there are no line feeds or carriage returns
    textToWrite += "\r\n"; // Add a carriage return + line feed
}
File.WriteAllText(filepath + @"\MapData.txt", textToWrite);

The problem is definitely in how you are looking for newlines in your output. Environment.NewLine is inserted after each string written by WriteAllLines.
I would recommend opening the output file in Notepad++ and turning on View > Show Symbol > Show End of Line to see which end-of-line characters are in the file. On my machine, for instance, it is [CR][LF] (carriage return / line feed) at the end of each line, which is standard for Windows.
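If Notepad++ is not at hand, the same check can be scripted. The following is a minimal sketch in Python (chosen only because it is a quick diagnostic that runs outside the C# program); the function name count_line_endings and the sample bytes are made up for illustration and are not taken from the question's actual file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LFs, and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf  # LFs that are not part of a CRLF pair
    lone_cr = data.count(b"\r") - crlf  # CRs that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


# Hypothetical sample: a value with embedded LFs, followed by one CRLF.
sample = b"[30000]\ntotal=5\nsp 0 -144 152 999999999\r\n"
print(count_line_endings(sample))  # {'CRLF': 1, 'LF': 2, 'CR': 0}
```

To inspect a real file, read it in binary mode (for example open(path, 'rb').read()) and pass the bytes to the function. A mix of lone LF and CRLF counts would suggest that the strings handed to WriteAllLines already contained bare \n characters.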

Related

C# File.ReadLines doesn't work right

I have a txt file and readline is not working right for my file.
My lines in my code.
And this is the text in my file. The lines look like this, but my code doesn't understand lines like this.
X02233 52330 DISCHY 8 BLUZ
std STD 0 0 0 0 0 8698230653909 0.00
X02237 52337 VALONIA BLUZ STD STD 0 0 0 0 0 8698230653916 0.00
X02245 72458 HARMONY 9 BLUZ STD STD 0 0 0 0 0 8698230653923 0.00
UPDATE :
var text = File.ReadAllText(lblPath.Text);
var lines = text.Split('\n'); //Unix-based newline
var longestLine = lines.OrderByDescending(a => a.Length).First();
var shortestLine = lines.OrderBy(a => a.Length).First();
var orderByShort = lines.OrderBy(a => a.Length);
I get an out-of-memory exception with this code. The example above is only part of my file; the file is 105 MB.
You can use File.ReadAllText to read the whole file into a string and then use the Split method to split on the end-of-line character your file uses:
var text = File.ReadAllText(myFilePath);
var lines = text.Split('\n'); // Unix-style newline
Note that File.ReadAllLines already recognizes \r, \n, and \r\n as line terminators; see the documentation:
A line is defined as a sequence of characters followed by a carriage return ('\r'), a line feed ('\n'), or a carriage return immediately followed by a line feed.

How do I round trip an entitized carriage return with XDocument?

Suppose I have this XML document:
<x xml:space='preserve'>&#xd;
</x>
with this sequence of bytes as the content of the <x/>:
38 35 120 100 59 13 10
My understanding from the W3C spec is that the sequence 13 10 will be replaced before parsing. To get the sequence 13 10 to show up in my parsed tree, I have to include the character reference &#xd;, as clarified in a note in the W3C spec (I recognize these are from XML 1.1 instead of XML 1.0, but they clarify confusing things in XML 1.0 without describing a different behavior).
As explained in 2.11 End-of-Line Handling, all #xD characters literally present in an XML document are either removed or replaced by #xA characters before any other processing is done. The only way to get a #xD character to match this production is to use a character reference in an entity value literal.
With XDocument.Parse, this all seems to work correctly. The text content for the above XML is 13 10 (rather than 13 13 10), suggesting that the character entity is preserved and the literal 13 10 is replaced with 10 prior to parsing.
However, I can’t figure out how to get XDocument.ToString() to entitize newlines when serializing. I.e., I’d expect (XDocument xd) => XDocument.Parse($"{xd}") to be a lossless function. But if I pass in an XDocument instance with 13 10 as text content, that function outputs an XDocument instance with 10 as text content. See this demonstration:
var x = XDocument.Parse("<x xml:space='preserve'>&#xd;\r\n</x>");
present("content", x.Root.Value); // 13 10, expected
present("formatted", $"{x}"); // inside <x/>: 13 10, unexpected
x = XDocument.Parse($"{x}");
present("round tripped", x.Root.Value); // 10, unexpected
// Note that when formatting the version with just 10 in the value,
// we get Environment.NewLine in the formatted XML. So there is no
// way to differentiate between 10 and 13 10 with XDocument because
// it normalizes when serializing.
present("round tripped formatted", $"{x}"); // inside <x/>: 13 10, expected
void present(string label, string thing)
{
    Console.WriteLine(label);
    Console.WriteLine(thing);
    Console.WriteLine(string.Join(" ", Encoding.UTF8.GetBytes(thing)));
    Console.WriteLine();
}
You can see that when XDocument is serialized, it fails to entitize the carriage return as either &#xD; or &#13;. The result is that it loses information. How can I safely encode an XDocument so that I do not lose anything, particularly carriage returns, that were in the original document I loaded?
To round-trip XDocument, do not use the recommended/easy serialization methods such as XDocument.ToString() because this is lossy. Note also that, even if you do something like xd.ToString(SaveOptions.DisableFormatting), any carriage returns in the parsed tree will be lost.
Instead, use a properly configured XmlWriter with XDocument.WriteTo. The XmlWriter can see that the document contains literal carriage returns and encode them correctly. To instruct it to do so, set XmlWriterSettings.NewLineHandling to NewLineHandling.Entitize. You'll probably want to write an extension method to make this easier to reuse.
The demo altered to use this approach is below:
var x = XDocument.Parse("<x xml:space='preserve'>&#xd;\r\n</x>");
present("content", x.Root.Value); // 13 10, expected
present("formatted", toString(x)); // inside <x/>: 38 35 120 68 59 10 ("&#xD;\n"), acceptable
x = XDocument.Parse(toString(x));
present("round tripped", x.Root.Value); // 13 10, expected
string toString(XDocument xd)
{
    using var sw = new StringWriter();
    using (var writer = XmlWriter.Create(sw, new XmlWriterSettings
    {
        NewLineHandling = NewLineHandling.Entitize,
    }))
    {
        xd.WriteTo(writer);
    }
    return sw.ToString();
}
void present(string label, string thing)
{
    Console.WriteLine(label);
    Console.WriteLine(thing);
    Console.WriteLine(string.Join(" ", Encoding.UTF8.GetBytes(thing)));
    Console.WriteLine();
}

How to remove datetime from a Logfile string

I have a logfile like this:
[2016 01 10 11:10:44] Operation3 \r\n
[2016 01 10 11:10:40] Operation2 \r\n
[2016 01 10 11:10:36] Operation1 \r\n
On that I perform a ReadAllLines operation, so that in a string I have:
[2016 01 10 11:10:44] Operation3 \r\n[2016 01 10 11:10:40] Operation2 \r\n[2016 01 10 11:10:36] Operation1 \r\n
Now I have to remove all those timestamps.
Being a newbie, and to be on the safe side, I'd split it, then search each item for start=indexOf("[") and indexOf("]"), then remove that substring from each item and join them all again.
I'd like to know a smarter way to do that.
--EDIT--
OK, the downvotes are fair; I didn't consider everything.
Additional constraints:
I can't be sure that every line has a timestamp, so I have to check each line for a leading "[" and a "]" in the middle.
I can't even be sure of the length of the [XXXX] part, since I could have [2016 1 1 11:1:4] instead of [2016 01 01 11:01:04]. So it's important to check its length.
Thanks
You don't need to cut and paste the lines; you can use string.Replace.
This takes the length of Environment.NewLine into account.
while (true)
{
    int start;
    if (lines.StartsWith("["))
    {
        start = 0;
    }
    else
    {
        // Find the next line that begins with '['; stop if there is none.
        int idx = lines.IndexOf(Environment.NewLine + "[");
        if (idx == -1)
            break;
        start = idx + Environment.NewLine.Length;
    }
    // Look for the closing bracket only after the opening one.
    int end = lines.IndexOf("] ", start);
    if (end == -1)
        break;
    string subString = lines.Substring(start, end + 2 - start);
    lines = lines.Replace(subString, "");
}
ReadAllLines returns an array of lines, so you don't need to look for the start of each item. If your timestamp format is consistent, you can just trim off the start of the string.
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
    string logContents = line.Substring("[XXXX XX XX XX:XX:XX] ".Length);
}
Or combine this with a LINQ Select to do it in one step:
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
    .Select(x => x.Substring("[XXXX XX XX XX:XX:XX] ".Length));
Without consistent format, you will need to identify what you are looking for. I would write a regular expression to remove what you are looking for, otherwise you may get caught by things you weren't expecting (for example, you mention that some lines may not have timestamps - they might have something else in square brackets instead which you don't want to remove).
Example:
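// Verbatim string (@) so the backslashes reach the regex engine; \d{1,2} also accepts single-digit months and days.
Regex rxTimeStamp = new Regex(@"^\[\d{4} \d{1,2} \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}\]\s*");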
string[] lines = File.ReadAllLines("log.txt");
foreach (string line in lines)
{
string logContents = rxTimeStamp.Replace(line, String.Empty);
}
// or
var logContentsWithoutTimestamps = File.ReadAllLines("log.txt")
.Select(x => rxTimeStamp.Replace(x, String.Empty));
You'll need to tune the regular expression based on whether it misses anything, but that's beyond the scope of this question.
Since your code works and you are looking for a different way:
string result = string.Join(string.Empty, str.Skip(22));
for each item.
Explanation:
Since every timestamp has the same length, you don't need to search for its beginning or end. Normally you would have to do length checks (empty lines, etc.), but this works even for shorter strings: you just get an empty string back if the length is < 22. It's an alternative if your file really just contains timestamped lines.

How Can I read From Line number() to line Starts with in C#

Let's say I have text file like this
<pre>----------------
hPa m C
---------------------
1004.0 28 13.6
1000.0 62 16.2
998.0 79 17.2
992.0 131 18.0
<pre>----------------
Sometext here
1000.0 10 10.6
1000.0 10 11.2
900.0 10 12.2
900.0 100 13.0
<aaa>----------------
How can I create an array in C# that reads the text file from line number 5 (1004.0) to just before the line that starts with the string <pre>-?
I used string[] lines = System.IO.File.ReadAllLines(Filepath);
to put each line in the array.
The problem is that I want only the numbers of the first section in the array, so that I can later split them into three more arrays (hPa, m, C).
Here's a possible solution. It's probably way more complicated than it should be, but that should give you an idea of possible mechanisms to further refine your data.
string[] lines = System.IO.File.ReadAllLines("test.txt");
List<double> results = new List<double>();
foreach (var line in lines.Skip(4))
{
    if (line.StartsWith("<pre>"))
        break;
    Regex numberReg = new Regex(@"\d+(\.\d){0,1}"); // will find any number ending in ".X" - it's primitive, and won't work for something like 0.01, but no such data showed up in your example
    var result = numberReg.Matches(line).Cast<Match>().FirstOrDefault(); // use only the first number from each line. You could use Cast<Match>().Skip(1).FirstOrDefault to get the second, and so on...
    if (result != null)
        results.Add(Convert.ToDouble(result.Value, System.Globalization.CultureInfo.InvariantCulture)); // Note the use of InvariantCulture, otherwise you may need to worry about , or . in your numbers
}
Do you mean this?
System.IO.StreamReader file = new System.IO.StreamReader(FILE_PATH);
int skipLines = 4; // skip the 4 lines before line 5, the first data line
for (int i = 0; i < skipLines; i++)
{
    file.ReadLine();
}
// Do what you want here.

read a very big single line txt file and split it

I have the following problem:
I have a file which is nearly 500 MB in size. It's text, all on one line. The text is separated by a virtual line ending called ROW_DEL, which appears in the text like this:
this is a line ROW_DEL and this is a line
Now I want to split this file into its lines, so that I get a file like this:
this is a line
and this is a line
The problem is that even opening it with the Windows text editor fails because the file is too big.
Is it possible to split this file as described with C#, Java, or Python? What would be the best solution that doesn't overload my CPU?
Actually, 500 MB of text is not that big; it's just that Notepad sucks. You probably don't have sed available since you're on Windows, but at least try the naive solution in Python; I think it will work fine:
with open('infile.txt') as f_in, open('outfile.txt', 'w') as f_out:
    # In text mode, '\n' is translated to the platform's line ending on write.
    f_out.write(f_in.read().replace('ROW_DEL ', '\n'))
Read the file in chunks, for example with StreamReader.ReadBlock in C#. You can set the maximum number of characters to read there.
For each chunk you read, replace ROW_DEL with \r\n and append the result to a new file.
Just remember to advance the current index by the number of characters you just read.
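The chunked approach can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: the function name replace_separator, its parameters, and the default chunk size are hypothetical, and the separator is assumed not to contain the newline string. The one detail it adds is holding back the last len(sep) - 1 characters of each buffer, so that a ROW_DEL split across two chunks is still found.

```python
import io


def replace_separator(src, dst, sep="ROW_DEL", newline="\n", chunk_size=1 << 20):
    """Copy text from src to dst chunk by chunk, replacing sep with newline."""
    if not sep:
        raise ValueError("sep must not be empty")
    keep = len(sep) - 1  # longest partial separator that can end a buffer
    tail = ""            # characters held back from the previous round
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf = (tail + chunk).replace(sep, newline)
        # Hold back the last `keep` characters: they might be the start
        # of a separator that the next chunk completes.
        cut = max(len(buf) - keep, 0)
        dst.write(buf[:cut])
        tail = buf[cut:]
    dst.write(tail)  # whatever is left cannot be completed any more


# Tiny in-memory demo; a 5-character chunk forces ROW_DEL to straddle chunks.
demo_in = io.StringIO("this is a line ROW_DEL and this is a line")
demo_out = io.StringIO()
replace_separator(demo_in, demo_out, chunk_size=5)
print(repr(demo_out.getvalue()))  # 'this is a line \n and this is a line'
```

For real files, src and dst would be two different files opened in text mode (one for reading, one for writing), so only about one chunk is held in memory at a time.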
Here's my solution.
It is easy in principle (ŁukaszW.pl gave the idea) but not so easy to code if one wants to take care of the special cases (which ŁukaszW.pl did not).
The special cases are when the separator ROW_DEL is split across two of the chunks read (as I4V pointed out) and, more subtly, when there are two contiguous ROW_DEL of which the second is split across two chunks.
Since ROW_DEL is longer than any of the possible newlines ('\r', '\n', '\r\n'), it can be replaced in place in the file by the newline used by the OS. That's why I chose to rewrite the file onto itself.
For that I use mode 'r+', which doesn't create a new file.
It's also absolutely mandatory to use binary mode 'b'.
The principle is to read a chunk (in real life its size will be 262144, for example) plus x additional characters, where x is the length of the separator minus 1.
Then the code checks whether the separator is present in the end of the chunk plus those x characters.
Depending on whether it is present or not, the chunk is shortened or not before the ROW_DEL replacement is performed and the result is rewritten in place.
The bare code (Python 2) is:
text = ('The hospital roommate of a man infected ROW_DEL'
        'with novel coronavirus (NCoV)ROW_DEL'
        '—a SARS-related virus first identified ROW_DELROW_DEL'
        'last year and already linked to 18 deaths—ROW_DEL'
        'has contracted the illness himself, ROW_DEL'
        'intensifying concerns about the ROW_DEL'
        "virus's ability to spread ROW_DEL"
        'from person to person.')
with open('eessaa.txt','w') as f:
    f.write(text)
with open('eessaa.txt','rb') as f:
    ch = f.read()
    print ch.replace('ROW_DEL','ROW_DEL\n')
    print '\nlength of the text : %d chars\n' % len(text)
#==========================================
from os.path import getsize
from os import fsync,linesep
def rewrite(whichfile,sep,chunk_length,OSeol=linesep):
    if chunk_length<len(sep):
        print 'Length of second argument, %d , is '\
              'the minimum value for the third argument'\
              % len(sep)
        return
    x = len(sep)-1
    x2 = 2*x
    file_length = getsize(whichfile)
    with open(whichfile,'rb+') as fR,\
         open(whichfile,'rb+') as fW:
        while True:
            chunk = fR.read(chunk_length)
            pch = fR.tell()
            twelve = chunk[-x:] + fR.read(x)
            ptw = fR.tell()
            if sep in twelve:
                pt = twelve.find(sep)
                m = ("\n !! %r is "
                     "at position %d in twelve !!" % (sep,pt))
                y = chunk[0:-x+pt].replace(sep,OSeol)
            else:
                pt = x
                m = ''
                y = chunk.replace(sep,OSeol)
            pos = fW.tell()
            fW.write(y)
            fW.flush()
            fsync(fW.fileno())
            if fR.tell()<file_length:
                fR.seek(-x2+pt,1)
            else:
                fW.truncate()
                break
rewrite('eessaa.txt','ROW_DEL',14)
with open('eessaa.txt','rb') as f:
    ch = f.read()
    print '\n'.join(repr(line)[1:-1] for line in ch.splitlines(1))
    print '\nlength of the text : %d chars\n' % len(ch)
To follow the execution, here's another version of the code that prints messages along the way:
text = ('The hospital roommate of a man infected ROW_DEL'
        'with novel coronavirus (NCoV)ROW_DEL'
        '—a SARS-related virus first identified ROW_DELROW_DEL'
        'last year and already linked to 18 deaths—ROW_DEL'
        'has contracted the illness himself, ROW_DEL'
        'intensifying concerns about the ROW_DEL'
        "virus's ability to spread ROW_DEL"
        'from person to person.')
with open('eessaa.txt','w') as f:
    f.write(text)
with open('eessaa.txt','rb') as f:
    ch = f.read()
    print ch.replace('ROW_DEL','ROW_DEL\n')
    print '\nlength of the text : %d chars\n' % len(text)
#==========================================
from os.path import getsize
from os import fsync,linesep
def rewrite(whichfile,sep,chunk_length,OSeol=linesep):
    if chunk_length<len(sep):
        print 'Length of second argument, %d , is '\
              'the minimum value for the third argument'\
              % len(sep)
        return
    x = len(sep)-1
    x2 = 2*x
    file_length = getsize(whichfile)
    with open(whichfile,'rb+') as fR,\
         open(whichfile,'rb+') as fW:
        while True:
            chunk = fR.read(chunk_length)
            pch = fR.tell()
            twelve = chunk[-x:] + fR.read(x)
            ptw = fR.tell()
            if sep in twelve:
                pt = twelve.find(sep)
                m = ("\n !! %r is "
                     "at position %d in twelve !!" % (sep,pt))
                y = chunk[0:-x+pt].replace(sep,OSeol)
            else:
                pt = x
                m = ''
                y = chunk.replace(sep,OSeol)
            print ('chunk == %r %d chars\n'
                   ' -> fR now at position %d\n'
                   'twelve == %r %d chars %s\n'
                   ' -> fR now at position %d'
                   % (chunk ,len(chunk), pch,
                      twelve,len(twelve),m, ptw) )
            pos = fW.tell()
            fW.write(y)
            fW.flush()
            fsync(fW.fileno())
            print (' %r %d long\n'
                   ' has been written from position %d\n'
                   ' => fW now at position %d'
                   % (y,len(y),pos,fW.tell()))
            if fR.tell()<file_length:
                fR.seek(-x2+pt,1)
                print ' -> fR moved %d characters back to position %d'\
                      % (x2-pt,fR.tell())
            else:
                print (" => fR is at position %d == file's size\n"
                       ' File has thoroughly been read'
                       % fR.tell())
                fW.truncate()
                break
            raw_input('\npress any key to continue')
rewrite('eessaa.txt','ROW_DEL',14)
with open('eessaa.txt','rb') as f:
    ch = f.read()
    print '\n'.join(repr(line)[1:-1] for line in ch.splitlines(1))
    print '\nlength of the text : %d chars\n' % len(ch)
There's some subtlety in the treatment of the ends of the chunks, in order to detect whether ROW_DEL straddles two chunks and whether there are two contiguous ROW_DEL. That's why it took me a long time to post my solution: in the end I had to write fR.seek(-x2+pt,1), and not just fR.seek(-2*x,1) or fR.seek(-x,1) depending on whether sep is straddling or not (2*x is x2 in the code; with ROW_DEL, x and x2 are 6 and 12). Anybody interested in this point can examine it by changing the code in the two branches that depend on whether 'ROW_DEL' is in twelve or not.
