Write Tab Delimited File - c#

I'm having trouble writing a Tab-delimited File and I've checked around here and have not gotten my answers yet.
So I've got a function that returns the string with the important pieces below (delimiter used and how I build each line):
var delimiter = #"\t";
sb.Append(string.Join(delimiter, itemContent));
sb.Append(Environment.NewLine);
The string returned is like this:
H\t13\t170000000000001\t20150630
D\t1050\t10.0000\tY
D\t1050\t5.0000\tN
And then I write it to a file with this (content below is the string above):
var content = BuildFile(item);
var filePath = tempDirectory + fileName;
// Create the File
using (FileStream fs = File.Create(filePath))
{
Byte[] info = new UTF8Encoding(true).GetBytes(content);
fs.Write(info, 0, info.Length);
}
However, the file output is this with no tabs (opened in notepad++):
H\t13\t170000000000005\t20150630
D\t1050\t20.0000\tN
D\t1050\t2.5000\tY
When it should be more like this (sample file provided):
H 100115980 300010000000003 20150625
D 430181 1 N
D 342130 2 N
D 459961 1 N
Could this be caused by the encoding I used? Appreciate any input you may have, thanks!

Using var delimiter = #"\t";, the variable contains a literal \ plus t. The # syntax disables the backslash as "special". In this case you really want
var delimiter = "\t";
to have a tab character.

There is a typo in your code. The # prefix means that the following string is a literal so #"\t" is a two-character string with the characters \ and t
You should use "\t" without the prefix.
You should consider using a StreamWriter instead of constructing the entire string in memory and writing the raw bytes though. StreamWriter uses UTF-8 by default and allows you to write formatted lines just as you would with Console.WriteLine:
var delimiter ="\t";
using(var writer=new StreamWriter(filePath))
{
var line=string.Join(delimiter, itemContent);
writer.WriteLine(line);
}

Related

How can I use C# to search an XML file for specific words?

I'm very new to C# and XML files in general, but currently I have an XML file that still has some html markup in it (&amp, ;quot;, etc.) and I want to read through the XML file and remove all of those so it becomes easily readable. I can open and print the file to the console with no issue, but I'm stumped trying to search for those specific strings and remove them.
One way to do this would be to put all the words you want to remove into an array, and then use the Replace method to replace them with empty strings:
var xmlFilePath = #"c:\temp\original.xml";
var newFilePath = #"c:\temp\modified.xml";
var wordsToRemove = new[] {"&amp", ";quot;"};
// Read existing xml file
var fileContents = File.ReadAllText(xmlFilePath);
// Remove words
foreach (var word in wordsToRemove)
{
fileContents = fileContents.Replace(word, "");
}
// Create new file with words removed
File.WriteAllText(newFilePath, fileContents);
I suppose you are looking for this: https://learn.microsoft.com/en-us/dotnet/api/system.web.httputility.htmldecode?view=netcore-3.1
Converts a string that has been HTML-encoded for HTTP transmission into a decoded string.
// Encode the string.
string myEncodedString = HttpUtility.HtmlEncode(myString);
Console.WriteLine($"HTML Encoded string is: {myEncodedString}");
StringWriter myWriter = new StringWriter();
// Decode the encoded string.
HttpUtility.HtmlDecode(myEncodedString, myWriter);
string myDecodedString = myWriter.ToString();
Console.Write($"Decoded string of the above encoded string is: {myDecodedString}");
Your string is html encoded, probably for transmission over network. So there is a built in method to decode it.

Convert a String, which is already malformed

I have a class, which uses another class which reads a Textfile.
The Textfile is written in Ascii or to be clear CP1525.
Background info: The Textfile is generated in Axapta and uses the ASCIIio class which writes the text by using the writeRaw method
The class which I am using is by a collegue and he is using a C# StreamReader to read files. Normally this works okay because the files are written in UTF8, but in this particular case it isn't.
So the Streamreader reads the file as UTF8 and passes the read string to me.
I now have some letters, like for example the Lating small letter o with Diaeresis (ö) which aren't formated as I would need them to be.
A simple convert of the String doesn't help in this case and I can't figure out how I can get the right letters.
So this is basically how he reads it:
char quotationChar = '"';
String line = "";
using (StreamReader reader = new StreamReader(fileName))
{
if((line = reader.ReadLine()) != null)
{
line = line.Replace(quotationChar.ToString(), "");
}
}
return line;
What now happens is, in the Textfile I have the german word "Röhre" which, after reading it with the streamreader, transforms to R�hre (which looks stupid in a database).
I could try to convert every letter
Encoding enc = Encoding.GetEncoding(1252);
byte[] utf8_Bytes = new byte[line.Length];
for (int i = 0; i < line.Length; ++i)
{
utf8_Bytes[i] = (byte)line[i];
}
String propEncodeString = enc.GetString(utf8_Bytes, 0, utf8_Bytes.Length);
That doesn't give me the right character !
byte[] myarr = Encoding.UTF8.GetBytes(line);
String propEncodeString = enc.GetString(myarr);
That also returns the wrong character.
I am aware that I could just solve the problem by using this:
using (StreamReader reader = new StreamReader(fileName, Encoding.Default, true))
But just for fun:
How can I get the right string from an already wrongly decoded string ?
Once the UTF8 to ASCII conversion is first made, all characters that don't map to valid ASCII entries are replaced with the same bad data character which means that data is just lost and you can't simply 'convert' back to a good character downstream. See this example: https://dotnetfiddle.net/XWysml

How to strip out 0x0a special char from utf8 file using c# and keep file as utf8?

The following is a line from a UTF-8 file from which I am trying to remove the special char (0X0A), which shows up as a black diamond with a question mark below:
2464577 外國法譯評 True s6620178 Unspecified <1>�1009-672
This is generated when SSIS reads a SQL table then writes out, using a flat file mgr set to code page 65001.
When I open the file up in Notepad++, displays as 0X0A.
I'm looking for some C# code to definitely strip that char out and replace it with either nothing or a blank space.
Here's what I have tried:
string fileLocation = "c:\\MyFile.txt";
var content = string.Empty;
using (StreamReader reader = new System.IO.StreamReader(fileLocation))
{
content = reader.ReadToEnd();
reader.Close();
}
content = content.Replace('\u00A0', ' ');
//also tried: content.Replace((char)0X0A, ' ');
//also tried: content.Replace((char)0X0A, '');
//also tried: content.Replace((char)0X0A, (char)'\0');
Encoding encoding = Encoding.UTF8;
using (FileStream stream = new FileStream(fileLocation, FileMode.Create))
{
using (BinaryWriter writer = new BinaryWriter(stream, encoding))
{
writer.Write(encoding.GetPreamble()); //This is for writing the BOM
writer.Write(content);
}
}
I also tried this code to get the actual string value:
byte[] bytes = { 0x0A };
string text = Encoding.UTF8.GetString(bytes);
And it comes back as "\n". So in the code above I also tried replacing "\n" with " ", both in double quotes and single quotes, but still no change.
At this point I'm out of ideas. Anyone got any advice?
Thanks.
may wanna have a look at regex replacement, for a good example of this, take a look at the post towards the bottom of this page...
http://social.msdn.microsoft.com/Forums/en-US/1b523d24-dab6-4870-a9ca-5d313d1ee602/invalid-character-returned-from-webservice
You can convert the string to a char array and loop through the array.
Then check what char the black diamond is and just remove it.
string content = "blahblah" + (char)10 + "blahblah";
char find = (char)10;
content = content.Replace(find.ToString(), "");

StreamWriter.WriteLine() function is generating a "Â" character

I am writing some strings in an Excel file. Sometimes the call to the
StreamWriter.WriteLine()
function unexpectedly creates a "Â" character.
Any idea why?
Update
the code:
StreamWriter writer = new StreamWriter(File.Create(outFile));
string headerline = "";
foreach (DataColumn colum in reportContents.Columns)
{
headerline = headerline + '"' + row[colum].ToString() + '"' + ',';
}
writer.WriteLine(headerline);
the output:
Personal Protection |Post-Retirement Savings|Pre-Retirement Pension|Tax & Estate Planning
Expected output: Personal Protection |Post-Retirement Savings|Pre-Retirement Pension|Tax & Estate Planning
I get the solution:
just i need to specify the default the encoding in StreamWriter like as follows and it works.
StreamWriter writer = new StreamWriter(File.Create(outFile), Encoding.Default);
shuvra
It isn't actually creating an  character - it's just writing data in a different encoding. If you look at the StreamWriter constructor overloads (http://msdn.microsoft.com/en-us/library/system.io.streamwriter.aspx) you can indicate which encoding you want the StreamWriter to write it's data in.
In case you haven't dealt with encoding before: Joel wrote a good article about it at http://www.joelonsoftware.com/articles/Unicode.html
The character set of your code should be "utf-8".

How can I efficiently process a delimited text file?

I'm simply trying to execute File.ReadAllLines against a specific file and, for every line, split on |. I have to use regex on this one.
This code below doesnt work, but you'll see what i'm trying to do:
string[] contents = File.ReadAllLines(filename);
string[] splitlines = Regex.Split(contents, '|');
foreach (string split in splitlines)
{
//Regex line = content.Split('|');
//content.Split('|');
string prefix = prefix = Regex.Match(line, #"(\S+)(\d+)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
It's not entirely clear to me what you are trying to do, but there are a number of errors in your code. I have tried to guess what you are doing, but if this isn't what you want, please explain what you do want preferably with some examples:
string inputFilename = "input.txt";
string outputFilename = "output.txt";
using (StreamWriter streamWriter = File.AppendText(outputFilename))
{
using (StreamReader streamReader = File.OpenText(inputFilename))
{
while (true)
{
string line = streamReader.ReadLine();
if (line == null)
{
break;
}
string[] splitlines = line.Split('|');
foreach (string split in splitlines)
{
Match match = Regex.Match(split, #"\S+\d+");
if (match.Success)
{
string prefix = match.Groups[0].Value;
streamWriter.WriteLine(prefix);
}
else
{
// Handle match failed...
}
}
}
}
}
Key points:
You seem to want to perform an operation on each line, so you need to iterate over the lines.
Use the simple string.Split method if you want to split on a single character. Regex.Split doesn't accept a character and "|" has a special meaning in regular expressions so it wouldn't have worked anyway unless you escaped it.
You were opening and closing the output file multiple times. You should open it just once and keep it open until you have finished writing to it. The using keyword is useful here.
Use WriteLine instead of appending "\r\n".
If the input file is large, use a StreamReader instead of ReadAllLines.
If the match fails, your program will throw an exception. You probably should check match.Success before using the match and if this returns false, handle the error appropriately (skip the line, report a warning, throw an exception with an appropriate message, etc.)
You aren't actually using groups 1 and 2 in the regular expression, so you can remove the parentheses to save the regular expression engine from having to store results that you won't use anyway.
You should pass the original string to Regex.Split and not an array.
Looks like you are using line instead of split when settings the prefix. Without knowing more about your code I cant tell if it's right or not but in any case it sticks out as the error.(it shouldnt build either)
This is a really inefficient on at least two levels :)
Regex.Split takes a string, not an array of strings.
I would recommend calling Regex.Split on each item of contents individually, then looping over the results of that call. This would mean nested for loops.
string[] contents = File.ReadAllLines(filename);
foreach (string line in contents)
{
string[] splitlines = Regex.Split(line);
foreach (string splitline in splitlines)
{
string prefix = Regex.Match(splitline, #"(\S+)(\d+)").Groups[0].Value;
File.AppendAllText(workingdirform2 + "configuration.txt", prefix+"\r\n");
}
}
This, of course isn't the most efficient way to go about it.
A more efficient way might be to split on a regular expression instead. I think this works:
string splitlines = Regex.Split(File.ReadAllText(filename), "$|\\|");
I have to assume, based on the limited feedback, that this is what you're looking for:
string inputFile = filename;
string outputFile = Path.Combine( workingdirform2, "configuration.txt" );
using ( StreamReader inputFileStream = File.OpenText( inputFile ) )
{
using ( StreamWriter ouputFileStream = File.AppendText( outputFile ) )
{
// Iterate over the file contents to extract the prefix
string currentLine;
while ( ( currentLine = inputFileStream.ReadLine() ) != null )
{
// Notice the updated Regex - your's is a bit broken
string prefix = Regex.Match( currentLine, #"^(\S+?)\d+" ).Groups[1].Value;
ouputFileStream.WriteLine( prefix );
}
}
}
This would take a file full of:
Text1231|abc|abc
Text1232|abc|abc
Text1233|abc|abc
Text1234|abc|abc
and place:
Text
Text
Text
Text
into a new file.
I hope this, at least, gets you on the right path. My crystal ball is getting hazy.. haaazzzy..
Probably one of the best way to process text files in C# is to use fileHelpers. Give it a look. It allows you to strongly type your import data.

Categories

Resources