Reading xml file from specific line - c#

this is my situation:
public static string ReadFromLine(int lineNumber, params string[] searchedAttribute)
{
//XmlTextReader reader = new XmlTextReader(xmlFilePath);
XmlReaderSettings settings = new XmlReaderSettings();
settings.LineNumberOffset = lineNumber;
settings.DtdProcessing = DtdProcessing.Parse;
FileStream fs = new FileStream(xmlFilePath, FileMode.OpenOrCreate, FileAccess.Read, FileShare.Read);
XmlReader reader = XmlReader.Create(fs, settings);
while (reader.Read())
{
if (reader.NodeType == XmlNodeType.Element && reader.Name == searchedAttribute[0] || reader.Name == searchedAttribute[1])
{
if(reader.Name == searchedAttribute[0])
reader.MoveToAttribute(searchedAttribute[0]);
else
reader.MoveToAttribute(searchedAttribute[1]);
string currentAttributeValue = reader.ReadElementString();
return currentAttributeValue;
}
}
return "notFound";
}
In this method I want to read an XML file starting from the line passed in the lineNumber parameter.
Unfortunately, although I configured the settings according to the documentation, my reader starts at line 1 every time.
I'd appreciate any ideas on how to solve this, or alternative solutions.

Unfortunately, despite using my settings according to documentation...
There must be some misunderstanding here. An XmlReader always starts from the current position of the passed Stream, TextReader, etc.
All LineNumberOffset does is adjust the reported line number (for example, in the message of an error raised during processing).
A possible (but not really recommended) solution is to wrap your FileStream in a StreamReader, read lineNumber lines, and then pass the StreamReader at its current position to the XmlReader.Create(TextReader, XmlReaderSettings) overload:
using var stream = File.OpenRead(xmlFilePath);
// skipping lineNumber of lines
var streamReader = new StreamReader(stream);
for (int i = 0; i < lineNumber; i++)
streamReader.ReadLine();
XmlReader xmlReader = XmlReader.Create(streamReader, settings);
But in fact a preferable (and formatting-proof) solution would be to read the whole XML into an XDocument and then navigate the content with LINQ to XML.
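As a rough sketch of the XDocument route (not the only way to do it): loading with LoadOptions.SetLineInfo lets each element report the line it started on through IXmlLineInfo, so a "start from line N" search can be expressed as a LINQ filter. The class name, the FindFromLine helper, and the sample document are made up for illustration; real code would call XDocument.Load(xmlFilePath, LoadOptions.SetLineInfo) instead of Parse.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class LineInfoSketch
{
    // Returns the value of the first element named elementName that starts
    // on or after lineNumber (1-based), or "notFound" if there is none.
    public static string FindFromLine(string xml, string elementName, int lineNumber)
    {
        // SetLineInfo makes every node remember its position in the source text.
        XDocument doc = XDocument.Parse(xml, LoadOptions.SetLineInfo);

        XElement match = doc.Descendants(elementName)
            .FirstOrDefault(e => ((IXmlLineInfo)e).HasLineInfo()
                              && ((IXmlLineInfo)e).LineNumber >= lineNumber);

        return match != null ? match.Value : "notFound";
    }

    public static void Main()
    {
        // Hypothetical sample document, one element per line.
        string xml = "<root>\n  <item>first</item>\n  <item>second</item>\n</root>";
        Console.WriteLine(FindFromLine(xml, "item", 3));
    }
}
```

Unlike skipping raw lines, this still parses the whole document, so it keeps working when the file is reformatted, at the cost of holding the document in memory.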

Related

c# how to end streamreader

I am doing a Windows Forms project for a university assignment. I want to search an existing text file for a matching first name and last name, and then write some additional information if that name exists. The code compiles without errors, but when I run it and try to add information I get an error saying, essentially, that the next step (the StreamWriter) cannot access the file because it is being used by another process. I assume that process is the StreamReader; I have tried to make it stop reading, to no avail. I am in my first three months of learning to code and would appreciate some assistance. A snippet of my code is below.
//check if there is a file with that name
if (File.Exists(sFile))
{
using (StreamReader sr = new StreamReader(sFile))
{
//while there is more data to read
while (sr.Peek() != -1)
{
//read first name and last name
sFirstName = sr.ReadLine();
sLastName = sr.ReadLine();
}
{
//does this name match?
if (sFirstName + sLastName == txtSearchName.Text)
sr.Close();
}
//Process write to file
using (StreamWriter sw = new StreamWriter(sFile, true))
{
sw.WriteLine("First Name:" + sFirstName);
sw.WriteLine("Last Name:" + sLastName);
sw.WriteLine("Gender:" + sGender);
}
You are using your writer inside the reader, using the same file.
A using statement disposes its object only after its closing curly brace, so here the reader still holds the file when the writer tries to open it:
using(StreamReader reader = new StreamReader("foo")){
//... some stuff
using(StreamWriter writer = new StreamWriter("foo")){
}
}
Do it like so:
using(StreamReader reader = new StreamReader("foo")){
//... some stuff
}
using(StreamWriter writer = new StreamWriter("foo")){
}
As per my comment regarding the using statement.
Rearrange to the below. I've tested locally and it seems to work.
using (StreamReader sr = new StreamReader(sFile))
{
//while there is more data to read
while (sr.Peek() != -1)
{
//read first name and last name
sFirstName = sr.ReadLine();
sLastName = sr.ReadLine();
//does this name match?
if (sFirstName + sLastName == txtSearchName.Text)
break;
}
}
using (StreamWriter sw = new StreamWriter(sFile, true))
{
sw.WriteLine("First Name:" + sFirstName);
sw.WriteLine("Last Name:" + sLastName);
sw.WriteLine("Gender:" + sGender);
}
I've replaced sr.Close() with a break statement to exit the loop. Closing the reader inside the loop makes the subsequent Peek call fail, because the reader is already closed.
Also, I noticed that you are not setting sGender, unless it's set elsewhere.
Hope that helps.
You can use FileStream. It gives you many options for working with a file:
// write-only access, while letting other handles read and write
var fileStream = new FileStream("FileName", FileMode.Open,
FileAccess.Write, FileShare.ReadWrite);
// alternatively, read/write access
var fileStream = new FileStream("fileName", FileMode.Open,
FileAccess.ReadWrite, FileShare.ReadWrite);
I think this is what you want/need. You can't append to a file the way you are trying to do it. Instead you'll want to read your input file, and write a temp file as you are reading through. And, whenever your line matches your requirements, then you can write the line with your modifications.
string inputFile = "C:\\temp\\StreamWriterSample.txt";
string tempFile = "C:\\temp\\StreamWriterSampleTemp.txt";
using (StreamWriter sw = new StreamWriter(tempFile))//get a writer ready
{
using (StreamReader sr = new StreamReader(inputFile))//get a reader ready
{
string currentLine = string.Empty;
while ((currentLine = sr.ReadLine()) != null)
{
if (currentLine.Contains("Clients"))
{
sw.WriteLine(currentLine + " modified");
}
else
{
sw.WriteLine(currentLine);
}
}
}
}
//now lets crush the old file with the new file
File.Copy(tempFile, inputFile, true);

Why does FileStream sometimes ignore invisible characters?

I have two blocks of code that I've tried using for reading data out of a file-stream in C#. My overall goal here is to try and read each line of text into a list of strings, but they are all being read into a single string (when opened with read+write access together)...
I am noticing that the first block of code correctly reads in all of my carriage returns and line-feeds, and the other ignores them. I am not sure what is really going on here. I open up the streams in two different ways, but that shouldn't really matter right? Well, in any case here is the first block of code (that correctly reads-in my white-space characters):
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;
// first, open up the file for reading
fs = File.OpenRead(path);
sr = new StreamReader(fs);
// read-in the entire file line-by-line
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
Now, here is the block of code that ignores all of the white-space characters (i.e. line-feed, carriage-return) and reads my entire file in one line.
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;
// first, open up the file for reading/writing
fs = File.Open(path, FileMode.Open);
sr = new StreamReader(fs);
// read-in the entire file line-by-line
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
Why does Open cause all data to be read as a single line, and OpenRead works correctly (reads data as multiple lines)?
UPDATE 1
I have been asked to provide the text of the file that reproduces the problem. So here it is below (make sure that CR+LF is at the end of each line!! I am not sure if that will get pasted here!)
;$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
;$$$$$$$$$ $$$$$$$
;$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
;
;
;
UPDATE 2
An exact block of code that reproduces the problem (using the text above for the file). In this case I am actually seeing the problem WITHOUT trying Open and only using OpenRead.
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;
try
{
// first, open up the file for reading/writing
fs = File.OpenRead(path);
sr = new StreamReader(fs);
// read-in the entire file line-by-line
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
// now, erase the contents of the file
File.WriteAllText(path, string.Empty);
// make sure that the contents of the file have been erased
fs = File.OpenRead(path);
sr = new StreamReader(fs);
if (!string.IsNullOrEmpty(line = sr.ReadLine()))
{
Trace.WriteLine("Failed: Could not erase the contents of the file.");
Assert.Fail();
}
else
{
Trace.WriteLine("Passed: Successfully erased the contents of the file.");
}
// now, attempt to over-write the contents of the file
fs.Close();
fs = File.OpenWrite(path);
sw = new StreamWriter(fs);
foreach(var l in content)
{
sw.Write(l);
}
// read back the over-written contents of the file
fs.Close();
fs = File.OpenRead(path);
sr = new StreamReader(fs);
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
actual.Add(line);
}
// make sure the contents of the file are correct
if(content.SequenceEqual(actual))
{
Trace.WriteLine("Passed: The contents that were over-written are correct!");
}
else
{
Trace.WriteLine("Failed: The contents that were over-written are not correct!");
}
}
finally
{
// close out all the streams
fs.Close();
// finish-up with a message
Trace.WriteLine("Finished running the overwrite-file test.");
}
Your new file generated by
foreach(var l in content)
{
sw.Write(l);
}
does not contain end-of-line characters because end-of-line characters are not included in content.
As @DaveKidder points out in this thread over here, the spec for StreamReader.ReadLine specifically says that the resulting line does not include the end-of-line characters.
When you do
while(!string.IsNullOrEmpty((line = sr.ReadLine())))
{
content.Add(line);
}
sr.Close();
You are losing end of line characters.
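One way to get the line breaks back, sketched under the assumption that the platform's default terminator is acceptable, is to write each entry with WriteLine instead of Write. The class and method names below are invented for the example, and the file is a temporary scratch file rather than the one from the question. The using blocks also flush and close the writer before the file is read again; a StreamWriter buffers its output, so closing only the underlying FileStream can leave data unwritten.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class RoundTripSketch
{
    // Writes the lines to path, then reads them back.
    public static List<string> WriteThenRead(string path, IEnumerable<string> content)
    {
        // FileMode.Create truncates an existing file; leaving the using block
        // flushes the writer and closes the stream.
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var sw = new StreamWriter(fs))
        {
            foreach (string l in content)
            {
                sw.WriteLine(l); // re-adds the terminator that ReadLine stripped
            }
        }

        return File.ReadAllLines(path).ToList();
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // scratch file for the demo
        try
        {
            var content = new List<string> { ";$$$$", ";", ";" };
            List<string> actual = WriteThenRead(path, content);
            Console.WriteLine(content.SequenceEqual(actual));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```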

Quicker way of cleaning XML files from invalid characters

I found a way to clean an XML file of invalid characters. It works fine, but it is a bit slow: the cleaning takes ~10-20 s, which users do not appreciate.
It seems like a huge waste of time to use a StreamReader/StreamWriter pair to create a clean file and then read that file with XmlReader. Is it possible to clean each line during the XML read, or at least to use the StreamReader as the input to XmlReader and so skip saving the file?
I'm trying to get the team that creates the databases to produce clean files before uploading them, but that is a slow process...
XmlReaderSettings settings = new XmlReaderSettings { CheckCharacters = false};
cleanDatabase = createCleanSDDB(database);
using (XmlReader sddbReader = XmlReader.Create(cleanDatabase, settings))
{
//Parse XML...
}
private string createCleanSDDB(String sddbPath)
{
string fileName = getTmpFileName(); // get a temporary file name from the OS
string line;
string cleanLine;
using (StreamReader streamReader = new StreamReader(sddbPath, Encoding.UTF8))
using (StreamWriter streamWriter = new StreamWriter(fileName))
{
while ((line = streamReader.ReadLine()) != null)
{
cleanLine = getCleanLine(line);
streamWriter.WriteLine(cleanLine);
}
}
return fileName;
}
private string getCleanLine(string dirtyLine)
{
const string regexPattern = @"[^\x09\x0A\x0D\x20-\xD7FF\xE000-\xFFFD\x10000-x10FFFF]";
string cleanLine = Regex.Replace(dirtyLine, regexPattern, "");
return cleanLine;
}

Split large XML file after string found

What I have:
A large XML file at nearly 1 million lines worth of content. Example of content:
<etc35yh3 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc123 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc15y etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
^ repeat that by 900k or so lines (content changing of course)
What I need:
Search the XML file for "<etc123". Once found move (write) that line along with all lines below it to a separate XML file.
Would it be advisable to use a method such as File.ReadAllLines for the search portion? What would you all recommend for the writing portion? Line by line is not an option as far as I can tell, as it would take much too long.
To quite literally discard the content above your search string, I would not use File.ReadAllLines, as it would load the entire file into memory. Try File.Open and wrap it in a StreamReader. Loop on StreamReader.ReadLine until you find the line, then start writing to a new StreamWriter, or do a byte copy on the underlying FileStream.
An example of how to do so with StreamWriter/StreamReader alone is listed below.
//load the input file
//open with read and sharing
using (FileStream fsInput = new FileStream("input.txt",
FileMode.Open, FileAccess.Read, FileShare.Read))
{
//use streamreader to search for start
var srInput = new StreamReader(fsInput);
string searchString = "two";
string cSearch = null;
bool found = false;
while ((cSearch = srInput.ReadLine()) != null)
{
if (cSearch.StartsWith(searchString, StringComparison.CurrentCultureIgnoreCase))
{
found = true;
break;
}
}
if (!found)
throw new Exception("Searched string not found.");
//we have the data, write to a new file
using (StreamWriter sw = new StreamWriter(
new FileStream("out.txt", FileMode.Create, //create, or truncate and overwrite if it exists
FileAccess.Write, FileShare.None))) // write only, no sharing
{
//write the line that we found in the search
sw.WriteLine(cSearch);
string cline = null;
while ((cline = srInput.ReadLine()) != null)
sw.WriteLine(cline);
}
}
//both files are closed and complete
You can copy with LINQ2XML
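XElement doc=XElement.Load("yourXML.xml");
// an XDocument can hold only one root element, so collect the matches
// under a wrapper element ("matches" is a placeholder name)
XElement newRoot=new XElement("matches");
foreach(XElement elm in doc.DescendantsAndSelf("etc123"))
{
newRoot.Add(elm);
}
new XDocument(newRoot).Save("yourOutputXML.xml");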
You could process the file one line at a time. I would not use ReadToEnd when you need to check the contents of each line.
FileInfo file = new FileInfo("MyHugeXML.xml");
FileInfo outFile = new FileInfo("ResultFile.xml");
using(StreamWriter write = outFile.CreateText())
using(StreamReader sr = file.OpenText())
{
bool foundit = false;
string line;
while((line = sr.ReadLine()) != null)
{
// start copying at the first matching line, and include that line
if(!foundit && line.Contains("<etc123"))
{
foundit = true;
}
if(foundit)
{
write.WriteLine(line);
}
}
}
Please note, this method may not produce valid XML, given your requirements.

"Root element is missing" error but I have a root element

If anyone can explain why I'm getting a "Root element is missing" error when my XML document (image attached) has a root element, they win a pony which fires lazers from its eyes.
Code:
if (ISF.FileExists("Players.xml"))
{
string xml;
using (IsolatedStorageFileStream rawStream = ISF.OpenFile("Players.xml", FileMode.Open))
{
StreamReader reader = new StreamReader(rawStream);
xml = reader.ReadToEnd();
XmlReaderSettings settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
XmlReader xmlReader = XmlReader.Create(reader, settings);
while (xmlReader.Read())
{
switch (xmlReader.NodeType)
{
case XmlNodeType.Element:
switch (xmlReader.Name)
{
case "numberOfPlayers":
string nodeValue = xmlReader.ReadContentAsString();
int NODEVALUE = int.Parse(nodeValue);
MessageBox.Show(" " + NODEVALUE);
break;
}
break;
}
break;
}
reader.Close();
}
}
Your problem is due to this line:
xml = reader.ReadToEnd();
This positions the reader stream to the end so that when XmlReader.Create is executed, there is nothing left in the stream for it to read.
If you need the xml string to be populated, then you need to close and reopen the reader prior to XmlReader.Create. Otherwise, removing or commenting this line out will solve your problem.
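If the xml string is needed anyway, a variation on the above (not what the answer literally proposes) is to hand the already-read string to XmlReader through a StringReader, so the exhausted stream is never read a second time. This is a minimal sketch: the class name, the ReadNumberOfPlayers helper, and the in-memory sample document are assumptions standing in for the isolated-storage file in the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class RereadSketch
{
    // Reads the whole stream into a string, then parses that string.
    public static int ReadNumberOfPlayers(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            string xml = reader.ReadToEnd(); // the stream is now at its end

            var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };

            // Parse the text that was already read instead of the spent stream.
            using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml), settings))
            {
                while (xmlReader.Read())
                {
                    if (xmlReader.NodeType == XmlNodeType.Element
                        && xmlReader.Name == "numberOfPlayers")
                    {
                        return xmlReader.ReadElementContentAsInt();
                    }
                }
            }
        }

        throw new InvalidDataException("numberOfPlayers element not found");
    }

    public static void Main()
    {
        // Hypothetical document standing in for Players.xml.
        byte[] bytes = Encoding.UTF8.GetBytes(
            "<players><numberOfPlayers>4</numberOfPlayers></players>");
        using (var stream = new MemoryStream(bytes))
        {
            Console.WriteLine(ReadNumberOfPlayers(stream));
        }
    }
}
```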
Reset the base stream's position each time it is read if you want to read from the beginning as stated earlier, but you don't have to re-create the stream each time.
String xmlResource = Assembly.GetExecutingAssembly().GetName().Name + ".XML.IODeleter.xsd";
configXsd = new StreamReader(Assembly.GetExecutingAssembly().GetManifestResourceStream(xmlResource));
if (configXsd != null)
{
configXsd.BaseStream.Position = 0;
File.WriteAllText(apppath + @"\" + Assembly.GetExecutingAssembly().GetName().Name + ".XML.IODeleter.xsd", configXsd.ReadToEnd());
}
I ended up creating a quick little function to reference before each new XmlReader...
private void ResetStream()
{
/*
The point of this is simply to set the position of the stream
to the beginning again. No StreamReader is needed for that;
the stream's own Position property can be set directly.
*/
m_stream.Position = 0;
}
So when I'm working in xml I call it right before I create my reader. I always have the same stream in memory and never recreate that.
ResetStream();
using (XmlReader reader = XmlReader.Create(m_stream)) { reader.Read(); }
