Saving/Loading Json string to/from .dat - c#

I am trying to learn how to save a json string in a .dat file, but I have trouble converting it back to the correct json string. My new string at the end starts with 2 special characters (rest of it is correct) and I am not sure why.
//Saving
string save = "a json string";
string path = @"E:\tempTest\MyTest.dat";
if (!File.Exists(path))
{
FileStream myFile = File.Create(path);
BinaryWriter binaryfile = new BinaryWriter(myFile);
binaryfile.Write(save);
binaryfile.Close();
myFile.Close();
}
//Loading
string path = @"E:\tempTest\MyTest.dat";
StreamReader objInput = new StreamReader(path, System.Text.Encoding.Default);
string contents = objInput.ReadToEnd().Trim();
string [] split = System.Text.RegularExpressions.Regex.Split(contents, "\\s+", RegexOptions.None);
StringBuilder sb = new StringBuilder();
foreach (string s in split)
{
sb.AppendLine(s);
}
string save = sb.ToString(); //string starts with 2 wrong special characters
I can obviously fix it with a simple save = save.Substring(2), but I would like to understand what the error was in my code (I guess the "\\s+" part of Regex is wrong).
Also, I am not sure whether this is still a good way of converting JSON to a data file and back. The example I followed comes from a 10-year-old post I found online.

As posted in the comments, I should have used BinaryReader to read the file. This solved the problem.
//Loading
string path = @"E:\tempTest\MyTest.dat";
string save;
// Disposing the reader also closes the underlying stream (leaveOpen is false)
using (var stream = File.Open(path, FileMode.Open))
using (var reader = new BinaryReader(stream, Encoding.UTF8, false))
{
    save = reader.ReadString();
}
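The two stray characters are most likely the length prefix: BinaryWriter.Write(string) stores the byte length of the string as a 7-bit encoded integer in front of the UTF-8 bytes, and a StreamReader decodes those prefix bytes as if they were text. If the file only needs to hold the JSON text, a plain text round trip avoids the prefix altogether. The following is a minimal sketch, not the original poster's code; the class name JsonFileDemo, the temp-file location, and the sample strings are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class JsonFileDemo
{
    // Returns the raw bytes that BinaryWriter produces for a string,
    // so the length prefix can be inspected.
    public static byte[] WriteWithBinaryWriter(string text)
    {
        using (var memory = new MemoryStream())
        {
            using (var writer = new BinaryWriter(memory, Encoding.UTF8))
            {
                writer.Write(text);
            }
            // ToArray still works after the stream has been closed.
            return memory.ToArray();
        }
    }

    // Writes the JSON as plain UTF-8 text and reads it back unchanged.
    public static string RoundTripAsText(string path, string json)
    {
        File.WriteAllText(path, json);
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        // A 200-byte string needs a two-byte prefix (lengths of 128 or more
        // no longer fit in a single 7-bit group).
        byte[] raw = WriteWithBinaryWriter(new string('x', 200));
        Console.WriteLine(raw.Length); // 202: 2 prefix bytes + 200 data bytes

        // Hypothetical temp-file location, used only for this demo.
        string path = Path.Combine(Path.GetTempPath(), "MyTest.dat");
        Console.WriteLine(RoundTripAsText(path, "{\"a\":1}")); // {"a":1}
    }
}
```

Either approach works as long as the reader matches the writer: BinaryWriter pairs with BinaryReader, and File.WriteAllText pairs with File.ReadAllText or a StreamReader.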

Related

Why does FileStream sometimes ignore invisible characters?

I have two blocks of code that I've tried using for reading data out of a file-stream in C#. My overall goal here is to try and read each line of text into a list of strings, but they are all being read into a single string (when opened with read+write access together)...
I am noticing that the first block of code correctly reads in all of my carriage returns and line-feeds, and the other ignores them. I am not sure what is really going on here. I open up the streams in two different ways, but that shouldn't really matter right? Well, in any case here is the first block of code (that correctly reads-in my white-space characters):
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;
// first, open up the file for reading
fs = File.OpenRead(path);
sr = new StreamReader(fs);
// read-in the entire file line-by-line
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
Now, here is the block of code that ignores all of the white-space characters (i.e. line-feed, carriage-return) and reads my entire file in one line.
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;
// first, open up the file for reading/writing
fs = File.Open(path, FileMode.Open);
sr = new StreamReader(fs);
// read-in the entire file line-by-line
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
Why does Open cause all data to be read as a single line, and OpenRead works correctly (reads data as multiple lines)?
UPDATE 1
I have been asked to provide the text of the file that reproduces the problem. So here it is below (make sure that CR+LF is at the end of each line!! I am not sure if that will get pasted here!)
;$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
;$$$$$$$$$ $$$$$$$
;$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
;
;
;
UPDATE 2
An exact block of code that reproduces the problem (using the text above for the file). In this case I am actually seeing the problem WITHOUT trying Open and only using OpenRead.
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;
try
{
// first, open up the file for reading/writing
fs = File.OpenRead(path);
sr = new StreamReader(fs);
// read-in the entire file line-by-line
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
// now, erase the contents of the file
File.WriteAllText(path, string.Empty);
// make sure that the contents of the file have been erased
fs = File.OpenRead(path);
sr = new StreamReader(fs);
if (!string.IsNullOrEmpty(line = sr.ReadLine()))
{
Trace.WriteLine("Failed: Could not erase the contents of the file.");
Assert.Fail();
}
else
{
Trace.WriteLine("Passed: Successfully erased the contents of the file.");
}
// now, attempt to over-write the contents of the file
fs.Close();
fs = File.OpenWrite(path);
sw = new StreamWriter(fs);
foreach(var l in content)
{
sw.Write(l);
}
// read back the over-written contents of the file
fs.Close();
fs = File.OpenRead(path);
sr = new StreamReader(fs);
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
actual.Add(line);
}
// make sure the contents of the file are correct
if(content.SequenceEqual(actual))
{
Trace.WriteLine("Passed: The contents that were over-written are correct!");
}
else
{
Trace.WriteLine("Failed: The contents that were over-written are not correct!");
}
}
finally
{
// close out all the streams
fs.Close();
// finish-up with a message
Trace.WriteLine("Finished running the overwrite-file test.");
}
Your new file generated by
foreach(var l in content)
{
sw.Write(l);
}
does not contain end-of-line characters because end-of-line characters are not included in content.
As @DaveKidder points out in this thread over here, the spec for StreamReader.ReadLine specifically says that the resulting line does not include end of line.
When you do
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
You are losing end of line characters.
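One way to keep the line structure is to write each entry back with WriteLine, which re-appends the terminator that ReadLine removed. The sketch below is an illustration under assumptions: the class name LineRoundTripDemo, the temp-file name, and the sample content are made up. It also loops until ReadLine returns null, so empty lines are not mistaken for end of file, and it disposes the StreamWriter so that buffered text is flushed before the file is reopened.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineRoundTripDemo
{
    // Reads every line; ReadLine strips the CR/LF terminators.
    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path))
        {
            string line;
            // Stop only at end of file (null), so empty lines are kept.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Writes the lines back, one terminator per entry.
    public static void WriteLines(string path, IEnumerable<string> lines)
    {
        // Disposing the writer flushes its buffer and closes the file.
        using (var writer = new StreamWriter(path, false))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
    }

    public static void Main()
    {
        // Hypothetical temp file standing in for the file from the question.
        string path = Path.Combine(Path.GetTempPath(), "overwrite-test.txt");
        File.WriteAllText(path, ";first\r\n;\r\n;third\r\n");

        List<string> content = ReadLines(path);
        WriteLines(path, content);
        List<string> actual = ReadLines(path);

        Console.WriteLine(content.SequenceEqual(actual)); // True
    }
}
```

File.ReadAllLines and File.WriteAllLines do the same job in two calls when the file fits comfortably in memory.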

Quicker way of cleaning XML files from invalid characters

I found a way to clean an XML file of invalid characters, which works fine, but it is a bit slow. The cleaning takes ~10-20s which is not appreciated by users.
It seems like a huge waste of time to use StreamReader/StreamWriter to create a clean file and then read that file with XmlReader. Is it possible to clean each line during the XML read, or at least to use a StreamReader as the input to XmlReader, so that the time spent saving the temporary file is avoided?
I'm trying to get the team who creates the databases to create clean files before uploading them, but it is a slow process...
XmlReaderSettings settings = new XmlReaderSettings { CheckCharacters = false};
cleanDatabase = createCleanSDDB(database);
using (XmlReader sddbReader = XmlReader.Create(cleanDatabase, settings))
{ /* Parse XML... */ }
private string createCleanSDDB(String sddbPath)
{
string fileName = getTmpFileName(); // get a temporary file name from the OS
string line;
string cleanLine;
using (StreamReader streamReader = new StreamReader(sddbPath, Encoding.UTF8))
using (StreamWriter streamWriter = new StreamWriter(fileName))
{
while ((line = streamReader.ReadLine()) != null)
{
cleanLine = getCleanLine(line);
streamWriter.WriteLine(cleanLine);
}
}
return fileName;
}
private string getCleanLine(string dirtyLine)
{
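// In .NET regexes \x takes exactly two hex digits, so code points above FF need \uXXXX.
// Characters above U+FFFF appear as surrogate pairs (\uD800-\uDFFF) and are kept here.
const string regexPattern = @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]";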
string cleanLine = Regex.Replace(dirtyLine, regexPattern, "");
return cleanLine;
}
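Since XmlReader.Create accepts a TextReader, one option that avoids the temporary file is to filter the characters while they are being read. The sketch below is only an illustration under assumptions: XmlSanitizingReader is a hypothetical helper, not part of .NET, and it drops raw characters that XmlConvert.IsXmlChar rejects while letting surrogates pass so that characters above U+FFFF survive. It does not touch numeric character references such as &#x1; in the text, so XmlReaderSettings.CheckCharacters may still be needed for those.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: wraps a TextReader and skips characters that are not legal in XML 1.0.
public sealed class XmlSanitizingReader : TextReader
{
    private readonly TextReader inner;

    public XmlSanitizingReader(TextReader inner)
    {
        this.inner = inner;
    }

    // Surrogate halves are passed through unchanged (an assumption of this sketch).
    private static bool IsAllowed(char c)
    {
        return XmlConvert.IsXmlChar(c) || char.IsSurrogate(c);
    }

    public override int Peek()
    {
        int next;
        while ((next = inner.Peek()) != -1)
        {
            if (IsAllowed((char)next)) return next;
            inner.Read(); // discard the invalid character
        }
        return -1;
    }

    public override int Read()
    {
        int next;
        while ((next = inner.Read()) != -1)
        {
            if (IsAllowed((char)next)) return next;
        }
        return -1;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int written = 0;
        // Returning 0 signals end of input, so keep reading until at least
        // one valid character is available or the source is exhausted.
        while (written == 0)
        {
            int read = inner.Read(buffer, index, count);
            if (read <= 0) return 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[index + i];
                if (IsAllowed(c)) buffer[index + written++] = c;
            }
        }
        return written;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class SanitizingDemo
{
    public static string ReadRootText(TextReader source)
    {
        using (XmlReader reader = XmlReader.Create(new XmlSanitizingReader(source)))
        {
            reader.MoveToContent();
            return reader.ReadElementContentAsString();
        }
    }

    public static void Main()
    {
        // U+0001 is not a legal XML character and is removed on the fly.
        Console.WriteLine(ReadRootText(new StringReader("<root>a\u0001b</root>"))); // ab
    }
}
```

For a file, the same wrapper could be placed around new StreamReader(sddbPath, Encoding.UTF8). Whether this is actually faster than the temporary-file approach would need to be measured.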

Split large XML file after string found

What I have:
A large XML file with nearly 1 million lines of content. Example of content:
<etc35yh3 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc123 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc15y etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
^ repeat that by 900k or so lines (content changing of course)
What I need:
Search the XML file for "<etc123". Once found move (write) that line along with all lines below it to a separate XML file.
Would it be advisable to use a method such as File.ReadAllLines for the search portion? What would you all recommend for the writing portion. Line by line is not an option as far as I can tell as it would take much too long.
To quite literally discard the content above your search string, I would not use File.ReadAllLines, as it would load the entire file into memory. Try File.Open and wrap it in a StreamReader. Loop on StreamReader.ReadLine until the search string is found, then start writing to a new StreamWriter, or do a byte copy on the underlying FileStream.
An example of how to do so with StreamWriter/StreamReader alone is listed below.
//load the input file
//open with read and sharing
using (FileStream fsInput = new FileStream("input.txt",
FileMode.Open, FileAccess.Read, FileShare.Read))
{
//use streamreader to search for start
var srInput = new StreamReader(fsInput);
string searchString = "two";
string cSearch = null;
bool found = false;
while ((cSearch = srInput.ReadLine()) != null)
{
if (cSearch.StartsWith(searchString, StringComparison.CurrentCultureIgnoreCase))
{
found = true;
break;
}
}
if (!found)
throw new Exception("Searched string not found.");
//we have the data, write to a new file
using (StreamWriter sw = new StreamWriter(
new FileStream("out.txt", FileMode.Create, //create or overwrite (OpenOrCreate would not truncate an existing file)
FileAccess.Write, FileShare.None))) // write only, no sharing
{
//write the line that we found in the search
sw.WriteLine(cSearch);
string cline = null;
while ((cline = srInput.ReadLine()) != null)
sw.WriteLine(cline);
}
}
//both files are closed and complete
You can copy with LINQ2XML
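XElement doc=XElement.Load("yourXML.xml");
// An XDocument can hold only one root element, so the matches are
// collected under a wrapper element ("root" is a placeholder name).
XElement newRoot=new XElement("root");
foreach(XElement elm in doc.DescendantsAndSelf("etc123"))
{
newRoot.Add(elm);
}
newRoot.Save("yourOutputXML.xml");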
You could do one line at a time... Would not use read to end if checking contents of each line.
FileInfo file = new FileInfo("MyHugeXML.xml");
FileInfo outFile = new FileInfo("ResultFile.xml");
// CreateText/OpenText return a StreamWriter/StreamReader, which provide WriteLine/ReadLine
using (StreamWriter write = outFile.CreateText())
using (StreamReader sr = file.OpenText())
{
    bool foundit = false;
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        if (!foundit && line.Contains("<etc123"))
        {
            foundit = true;
        }
        // write the matching line and every line after it
        if (foundit)
        {
            write.WriteLine(line);
        }
    }
}
Please note, this method may not produce valid XML, given your requirements.

C# UTF8 Decoding, returning bytes/numbers instead of string

I'm having an issue decoding a file using a UTF8Encoding.
I am reading text from a file which I have encoded with UTF8 (String > Byte)
See the following method.
public static void Encode(string Path)
{
string text;
Byte[] bytes;
using (StreamReader sr = new StreamReader(Path))
{
text = sr.ReadToEnd();
UTF8Encoding Encoding = new UTF8Encoding();
bytes = Encoding.GetBytes(text);
sr.Close();
}
using (StreamWriter sw = new StreamWriter(Path))
{
foreach (byte b in bytes)
sw.Write(b.ToString());
sw.Close();
}
}
I then decode it using the method
public static String Decode(string Path)
{
String text;
Byte[] bytes;
using (StreamReader sr = new StreamReader(Path))
{
text = sr.ReadToEnd();
UTF8Encoding Encoding = new UTF8Encoding();
bytes = Encoding.GetBytes(text);
text = Encoding.GetString(bytes);
return text;
}
}
But instead of decoding the byte to have it come back to text, it just returns it as a string of numbers. I can't see what I am doing wrong as I don't really have much experience with this.
EDIT: To clarify what I'm trying to achieve. I'm trying to have a text file save the text as bytes, rather than chars/numbers. This is to provide a very simple encryption for the files, so that you can't modify them unless you know what you're doing. The Decode function is then used to read the text (bytes) from the file and turn them into readable text. I hope this clarifies what I'm trying to achieve.
PS: Sry for no comments, but I think it's short enough to be understandable
What exactly are you trying to achieve? UTF-8 (like every other encoding) is a method of converting strings to byte arrays (text to raw data) and vice versa. StreamReader and StreamWriter are used to read/write strings from/to files. There is no need to re-encode anything there. Just using reader.ReadToEnd() will return the correct string.
Your piece of code seems to attempt to write a file containing a list of numbers (as a readable, textual representation) corresponding to the UTF-8 bytes of the given text. OK. Even though this is a very strange idea (I hope you are not trying to do anything like “encryption” with that), it is definitely possible, if that’s really what you want to do. But you need to separate the readable numbers somehow, e.g. by newlines, and parse them when reading them back:
public static void Encode(string path)
{
byte[] bytes;
using (var sr = new StreamReader(path))
{
var text = sr.ReadToEnd();
bytes = Encoding.UTF8.GetBytes(text);
}
using (var sw = new StreamWriter(path))
{
foreach (byte b in bytes)
{
sw.WriteLine(b);
}
}
}
public static void Decode(string path)
{
var data = new List<byte>();
using (var sr = new StreamReader(path))
{
string line;
while((line = sr.ReadLine()) != null)
data.Add(Byte.Parse(line));
}
using (var sw = new StreamWriter(path))
{
sw.Write(Encoding.UTF8.GetString(data.ToArray()));
}
}
This code will decode encrypted string to text, it worked on my side.
public static String Decode(string Path)
{
String text;
using (StreamReader sr = new StreamReader(Path))
{
text = sr.ReadToEnd();
byte[] bytes = Convert.FromBase64String(text);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
System.Text.Decoder decoder = encoder.GetDecoder();
int count = decoder.GetCharCount(bytes, 0, bytes.Length);
char[] arr = new char[count];
decoder.GetChars(bytes, 0, bytes.Length, arr, 0);
text= new string(arr);
return text;
}
}
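The Decode method above expects the file to contain Base64 text, which the Encode method from the question does not produce. A matching pair could look like the sketch below. It is an illustration only: the class name Base64FileDemo, the temp-file name, and the sample text are assumptions, and Base64 is merely an obfuscation that anyone can reverse, not encryption.

```csharp
using System;
using System.IO;
using System.Text;

public static class Base64FileDemo
{
    // Stores the UTF-8 bytes of the text as a Base64 string.
    public static void Encode(string path, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        File.WriteAllText(path, Convert.ToBase64String(bytes));
    }

    // Reverses Encode: Base64 text -> bytes -> original string.
    public static string Decode(string path)
    {
        byte[] bytes = Convert.FromBase64String(File.ReadAllText(path));
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // Hypothetical temp file used only for this demo.
        string path = Path.Combine(Path.GetTempPath(), "base64-demo.txt");
        Encode(path, "Paris 2ème");
        Console.WriteLine(File.ReadAllText(path)); // the Base64 form stored on disk
        Console.WriteLine(Decode(path));           // the original text
    }
}
```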
The StreamReader class will handle decoding for you, so your Decode() method can be as simple as this:
public static string Decode(string path)
{
// This StreamReader constructor defaults to UTF-8
using (StreamReader reader = new StreamReader(path))
return reader.ReadToEnd();
}
I'm not sure what your Encode() method is supposed to do, since the intent seems to be to read a file as UTF-8 and then write the text back to the exact same file as UTF-8. Something like this might make more sense:
public static void Encode(string path, string text)
{
// This StreamWriter constructor defaults to UTF-8
using (StreamWriter writer = new StreamWriter(path))
writer.Write(text);
}

Parsing UTF8 encoded data from a Web Service

I'm parsing the data from http://toutankharton.com/ws/localisations.php?l=75
As you can see, it's encoded (<name>Paris 2Ã¨me</name>).
My code is the following :
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
var contents = reader.ReadToEnd();
XElement cities = XElement.Parse(contents);
var t = from city in cities.Descendants("city")
select new City
{
Name = city.Element("name").Value,
Insee = city.Element("ci").Value,
Code = city.Element("code").Value,
};
}
Isn't new StreamReader(stream, Encoding.UTF8) sufficient ?
That looks like what happens if you take UTF-8 bytes and output them with an incompatible encoding like ISO-8859-1. Do you know what the real character is? Going back, using ISO-8859-1 to get a byte array and UTF-8 to read it, gives "è".
var input = "Ã¨";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(input);
var realString = Encoding.UTF8.GetString(bytes);
