Split large XML file after string found - c#

What I have:
A large XML file, nearly 1 million lines worth of content. Example of content:
<etc35yh3 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc123 etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
<etc15y etc="numbers" etc234="a" etc345="date"><something><some more something></some more something></something></etc123>
^ repeat that by 900k or so lines (content changing of course)
What I need:
Search the XML file for "<etc123". Once found, move (write) that line along with all lines below it to a separate XML file.
Would it be advisable to use a method such as File.ReadAllLines for the search portion? What would you all recommend for the writing portion? Line by line is not an option as far as I can tell, as it would take much too long.

To quite literally discard the content above your search string, I would not use File.ReadAllLines, as it would load the entire file into memory. Try File.Open and wrap it in a StreamReader. Loop on StreamReader.ReadLine, then start writing to a new StreamWriter, or do a byte copy on the underlying FileStream.
An example of how to do so with StreamWriter/StreamReader alone is listed below.
//load the input file
//open with read and sharing
using (FileStream fsInput = new FileStream("input.txt",
    FileMode.Open, FileAccess.Read, FileShare.Read))
{
    //use a StreamReader to search for the start line
    var srInput = new StreamReader(fsInput);
    string searchString = "two";
    string cSearch = null;
    bool found = false;
    while ((cSearch = srInput.ReadLine()) != null)
    {
        if (cSearch.StartsWith(searchString, StringComparison.CurrentCultureIgnoreCase))
        {
            found = true;
            break;
        }
    }
    if (!found)
        throw new Exception("Searched string not found.");
    //we have the data, write to a new file
    using (StreamWriter sw = new StreamWriter(
        new FileStream("out.txt", FileMode.OpenOrCreate, //create or overwrite
            FileAccess.Write, FileShare.None))) // write only, no sharing
    {
        //write the line that we found in the search
        sw.WriteLine(cSearch);
        string cline = null;
        while ((cline = srInput.ReadLine()) != null)
            sw.WriteLine(cline);
    }
}
//both files are closed and complete
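For the "byte copy on the underlying filestream" option mentioned above, here is a rough sketch (my own illustration, not from the original answer). It assumes UTF-8/ASCII content with no byte-order mark, that "<etc123" always starts a line, and that the matching line is not the very first line of the file; it needs using System.IO and using System.Text.
// Scan the raw bytes for the newline-prefixed marker, then copy everything
// from the found line onward without decoding it.
byte[] marker = Encoding.UTF8.GetBytes("\n<etc123");
using (FileStream input = File.OpenRead("input.xml"))
using (FileStream output = File.Create("out.xml"))
{
    int matched = 0;
    long splitAt = -1;
    int b;
    while ((b = input.ReadByte()) != -1)
    {
        if (b == marker[matched])
        {
            matched++;
            if (matched == marker.Length)
            {
                // position of '<', i.e. just after the preceding newline
                splitAt = input.Position - (marker.Length - 1);
                break;
            }
        }
        else
        {
            matched = (b == marker[0]) ? 1 : 0;
        }
    }
    if (splitAt < 0)
        throw new InvalidOperationException("Searched string not found.");
    input.Position = splitAt;
    input.CopyTo(output); // raw byte copy of the found line and everything below it
}
The advantage is that everything after the split point is copied as raw bytes instead of being decoded and re-encoded line by line.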

You can copy with LINQ2XML
XElement doc = XElement.Load("yourXML.xml");
XDocument newDoc = new XDocument(new XElement("root")); // a document can only have one root element
foreach (XElement elm in doc.DescendantsAndSelf("etc123"))
{
    newDoc.Root.Add(elm);
}
newDoc.Save("yourOutputXML.xml");
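Note that XElement.Load pulls the entire document into memory, which may hurt at roughly a million lines. Below is a streaming sketch (my own illustration, assuming the input has a single root element around the repeated records) that copies only the matching elements:
using (XmlReader reader = XmlReader.Create("yourXML.xml"))
using (XmlWriter writer = XmlWriter.Create("yourOutputXML.xml"))
{
    writer.WriteStartElement("root"); // hypothetical wrapper to keep the output well-formed
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "etc123")
            writer.WriteNode(reader, true); // writes the element and advances past it
        else
            reader.Read();
    }
    writer.WriteEndElement();
}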

You could do one line at a time. I would not use ReadToEnd if you need to check the contents of each line.
FileInfo file = new FileInfo("MyHugeXML.xml");
FileInfo outFile = new FileInfo("ResultFile.xml");
using (StreamWriter write = new StreamWriter(outFile.Create()))
using (StreamReader sr = new StreamReader(file.OpenRead()))
{
    bool foundIt = false;
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        if (foundIt)
        {
            write.WriteLine(line);
        }
        else if (line.Contains("<etc123"))
        {
            foundIt = true;
            write.WriteLine(line); // include the matched line itself, per the question
        }
    }
}
Please note, this method may not produce valid XML, given your requirements.

Related

StreamWriter only writes one line

I am trying to write from a .csv file to a new file.
Every time StreamWriter writes, it writes to the first line of the new file. It then overwrites that line with the next string, and continues to do so until StreamReader reaches EndOfStream.
Has anybody ever experienced this? How did you overcome it?
This is my first solution outside of those required by my school work. There is an unknown number of rows in the original file. Each row of the .csv file has only 17 columns. I need to write only three of them, in the order found in the code snippet below.
Before coding the StreamWriter I used Console.WriteLine() to make sure that each line was in the correct order.
Here is the code snippet:
{
    string path = @"c:\directory\file.csv";
    string newPath = @"c:\directory\newFile.csv";
    using (FileStream fs = new FileStream(path, FileMode.Open))
    {
        using (StreamReader sr = new StreamReader(fs))
        {
            string line;
            string[] columns;
            while ((line = sr.ReadLine()) != null)
            {
                columns = line.Split(',');
                using (FileStream aFStream = new FileStream(
                    newPath,
                    FileMode.OpenOrCreate,
                    FileAccess.ReadWrite))
                using (StreamWriter sw = new StreamWriter(aFStream))
                {
                    sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
                    sw.Flush();
                    sw.WriteLine(sw.NewLine);
                }
            }
        }
    }
}
You should open the target file in the same scope as the source instead of inside the loop; with FileMode.OpenOrCreate the file is reopened and overwritten from the start on every iteration.
var path = @"c:\directory\file.csv";
var newPath = @"c:\directory\newFile.csv";
using (var sr = new StreamReader(new FileStream(path, FileMode.Open)))
using (var sw = new StreamWriter(new FileStream(newPath, FileMode.OpenOrCreate, FileAccess.ReadWrite)))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        var columns = line.Split(',');
        sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
        sw.WriteLine(sw.NewLine);
    }
    sw.Flush();
}
I also hope you are sure about your CSV layout, since you are hard-coding the column positions.
To fix your code properly, you'll want to add a bit more structure:
public void CopyFileContentToLog()
{
    var document = ReadByLine();
    WriteToFile(document);
}

public IEnumerable<string> ReadByLine()
{
    string line;
    using (StreamReader reader = File.OpenText(...))
        while ((line = reader.ReadLine()) != null)
            yield return line;
}

public void WriteToFile(IEnumerable<string> contents)
{
    using (StreamWriter writer = new StreamWriter(...))
    {
        foreach (var line in contents)
            writer.WriteLine(line);
        writer.Flush();
    }
}
You could obviously tailor it and make it a bit more flexible, but this should demonstrate and resolve some of the issues you have with your loop and streams.
First off, you are creating and closing a write stream to the same file for every single line, which means the file gets overwritten on every line. You want to take your using block outside of the while loop; if you insist on opening and closing the write stream for every single line, then you need to use FileMode.Append instead.
{
    string path = @"c:\directory\file.csv";
    string newPath = @"c:\directory\newFile.csv";
    using (StreamReader sr = new StreamReader(new FileStream(path, FileMode.Open))) // no need for 2 usings
    using (FileStream aFStream = new FileStream(newPath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
    using (StreamWriter sw = new StreamWriter(aFStream)) // keep one writer open for the whole loop
    {
        string line;
        string[] columns;
        while ((line = sr.ReadLine()) != null)
        {
            columns = line.Split(',');
            sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
            sw.Flush();
            sw.WriteLine(sw.NewLine);
        }
    }
}
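And, for completeness, here is a sketch of the FileMode.Append variant mentioned above (my own illustration, using the same hard-coded column indices); reopening the writer on every line works, it is just slower:
string path = @"c:\directory\file.csv";
string newPath = @"c:\directory\newFile.csv";
using (StreamReader sr = new StreamReader(new FileStream(path, FileMode.Open)))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        var columns = line.Split(',');
        // append mode preserves what was written on earlier iterations
        using (StreamWriter sw = new StreamWriter(newPath, true)) // true = append
        {
            sw.WriteLine(columns[13] + ',' + columns[10] + ',' + columns[16]);
        }
    }
}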

c# how to end streamreader

I am doing a project (a Windows Forms assignment at uni). I want to search an already created text file to match a first name and last name, then write some additional information if the name exists. The code builds with no errors, but when I run it and attempt to add information I get an error that essentially says the StreamWriter cannot access the file because it is already in use by another process. I assume this process is the StreamReader; I have tried to make it stop reading, to no avail. I am in my first 3 months of learning to code and would appreciate some assistance if possible. I have put a snippet of my code below.
//check if there is a file with that name
if (File.Exists(sFile))
{
    using (StreamReader sr = new StreamReader(sFile))
    {
        //while there is more data to read
        while (sr.Peek() != -1)
        {
            //read first name and last name
            sFirstName = sr.ReadLine();
            sLastName = sr.ReadLine();
        }
        {
            //does this name match?
            if (sFirstName + sLastName == txtSearchName.Text)
                sr.Close();
        }
        //Process write to file
        using (StreamWriter sw = new StreamWriter(sFile, true))
        {
            sw.WriteLine("First Name:" + sFirstName);
            sw.WriteLine("Last Name:" + sLastName);
            sw.WriteLine("Gender:" + sGender);
        }
You are using your writer inside the reader's using block, on the same file.
A using statement only disposes the object it creates when its closing curly brace is reached, so the reader still has the file open while the writer tries to open it.
using (StreamReader reader = new StreamReader("foo"))
{
    //... some stuff
    using (StreamWriter writer = new StreamWriter("foo"))
    {
    }
}
Do it like so:
using (StreamReader reader = new StreamReader("foo"))
{
    //... some stuff
}
using (StreamWriter writer = new StreamWriter("foo"))
{
}
As per my comment regarding the using statement.
Rearrange to the below. I've tested locally and it seems to work.
using (StreamReader sr = new StreamReader(sFile))
{
    //while there is more data to read
    while (sr.Peek() != -1)
    {
        //read first name and last name
        sFirstName = sr.ReadLine();
        sLastName = sr.ReadLine();
        //does this name match?
        if (sFirstName + sLastName == txtSearchName.Text)
            break;
    }
}
using (StreamWriter sw = new StreamWriter(sFile, true))
{
    sw.WriteLine("First Name:" + sFirstName);
    sw.WriteLine("Last Name:" + sLastName);
    sw.WriteLine("Gender:" + sGender);
}
I've replaced the sr.Close() with a break statement to exit the loop; closing the reader causes the subsequent Peek to throw because the reader is already closed.
Also, I've noticed that you are not setting the gender, unless it is set elsewhere.
Hope that helps.
You can use FileStream. It gives you many options for working with the file:
var fileStream = new FileStream("FileName", FileMode.Open,
FileAccess.Write, FileShare.ReadWrite);
var fileStream = new FileStream("fileName", FileMode.Open,
FileAccess.ReadWrite, FileShare.ReadWrite);
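A minimal sketch of how that could look for this question (my own illustration, reusing sFile and the name/gender variables from the snippet above): open the reader with sharing that allows a second, writable handle on the same file.
// reader opened with FileShare.ReadWrite, so the append handle below can be
// opened on the same file at the same time
using (var readStream = new FileStream(sFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (var sr = new StreamReader(readStream))
using (var writeStream = new FileStream(sFile, FileMode.Append, FileAccess.Write, FileShare.ReadWrite))
using (var sw = new StreamWriter(writeStream))
{
    // ...search with sr, then append the details with sw...
    sw.WriteLine("Gender:" + sGender);
}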
I think this is what you want/need. You can't append to a file the way you are trying to do it. Instead, read your input file and write a temp file as you go; whenever a line matches your requirements, write it out with your modifications.
string inputFile = "C:\\temp\\StreamWriterSample.txt";
string tempFile = "C:\\temp\\StreamWriterSampleTemp.txt";
using (StreamWriter sw = new StreamWriter(tempFile))//get a writer ready
{
using (StreamReader sr = new StreamReader(inputFile))//get a reader ready
{
string currentLine = string.Empty;
while ((currentLine = sr.ReadLine()) != null)
{
if (currentLine.Contains("Clients"))
{
sw.WriteLine(currentLine + " modified");
}
else
{
sw.WriteLine(currentLine);
}
}
}
}
//now lets crush the old file with the new file
File.Copy(tempFile, inputFile, true);
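If you would rather not leave the temp file behind, File.Replace (standard System.IO, not part of the original answer) moves the temp file over the original in one call:
File.Replace(tempFile, inputFile, null); // null = don't keep a backup copy of the original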

Why does FileStream sometimes ignore invisible characters?

I have two blocks of code that I've tried using for reading data out of a file stream in C#. My overall goal is to read each line of text into a list of strings, but they are all being read into a single string (when the file is opened with read+write access).
I have noticed that the first block of code correctly reads in all of my carriage returns and line feeds, and the other ignores them. I am not sure what is really going on here. I open the streams in two different ways, but that shouldn't really matter, right? In any case, here is the first block of code (which correctly reads in my whitespace characters):
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;

// first, open up the file for reading
fs = File.OpenRead(path);
sr = new StreamReader(fs);

// read-in the entire file line-by-line
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
    content.Add(line);
}
sr.Close();
Now, here is the block of code that ignores the whitespace characters (i.e. line feed, carriage return) and reads my entire file as one line.
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;

// first, open up the file for reading/writing
fs = File.Open(path, FileMode.Open);
sr = new StreamReader(fs);

// read-in the entire file line-by-line
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
    content.Add(line);
}
sr.Close();
Why does Open cause all data to be read as a single line, and OpenRead works correctly (reads data as multiple lines)?
UPDATE 1
I have been asked to provide the text of the file that reproduces the problem. So here it is below (make sure that CR+LF is at the end of each line!! I am not sure if that will get pasted here!)
;$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
;$$$$$$$$$ $$$$$$$
;$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
;
;
;
UPDATE 2
Here is an exact block of code that reproduces the problem (using the text above as the file). In this case I am actually seeing the problem WITHOUT even trying Open, using only OpenRead.
StreamReader sr = null;
StreamWriter sw = null;
FileStream fs = null;
List<string> content = new List<string>();
List<string> actual = new List<string>();
string line = string.Empty;

try
{
    // first, open up the file for reading/writing
    fs = File.OpenRead(path);
    sr = new StreamReader(fs);

    // read-in the entire file line-by-line
    while (!string.IsNullOrEmpty((line = sr.ReadLine())))
    {
        content.Add(line);
    }
    sr.Close();

    // now, erase the contents of the file
    File.WriteAllText(path, string.Empty);

    // make sure that the contents of the file have been erased
    fs = File.OpenRead(path);
    sr = new StreamReader(fs);
    if (!string.IsNullOrEmpty(line = sr.ReadLine()))
    {
        Trace.WriteLine("Failed: Could not erase the contents of the file.");
        Assert.Fail();
    }
    else
    {
        Trace.WriteLine("Passed: Successfully erased the contents of the file.");
    }

    // now, attempt to over-write the contents of the file
    fs.Close();
    fs = File.OpenWrite(path);
    sw = new StreamWriter(fs);
    foreach (var l in content)
    {
        sw.Write(l);
    }

    // read back the over-written contents of the file
    fs.Close();
    fs = File.OpenRead(path);
    sr = new StreamReader(fs);
    while (!string.IsNullOrEmpty((line = sr.ReadLine())))
    {
        actual.Add(line);
    }

    // make sure the contents of the file are correct
    if (content.SequenceEqual(actual))
    {
        Trace.WriteLine("Passed: The contents that were over-written are correct!");
    }
    else
    {
        Trace.WriteLine("Failed: The contents that were over-written are not correct!");
    }
}
finally
{
    // close out all the streams
    fs.Close();

    // finish-up with a message
    Trace.WriteLine("Finished running the overwrite-file test.");
}
Your new file, generated by
foreach (var l in content)
{
    sw.Write(l);
}
does not contain end-of-line characters, because end-of-line characters are not included in content.
As @DaveKidder points out in this thread over here, the spec for StreamReader.ReadLine specifically says that the resulting line does not include the end-of-line characters.
When you do
while (!string.IsNullOrEmpty((line = sr.ReadLine())))
{
    content.Add(line);
}
sr.Close();
you are losing the end-of-line characters.
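One way to fix it (a sketch, not from the original thread) is to write the lines back with WriteLine so the terminators are re-added; note this uses Environment.NewLine, which may differ from the file's original line endings:
foreach (var l in content)
{
    sw.WriteLine(l); // re-adds the terminator that ReadLine stripped
}
sw.Flush();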

How to access a text file in c# that is being used by another process

I have a text file which is being used by ModScan to write data into. At a particular time I have to read the data and save it in a database. In offline mode, i.e. without ModScan using it, I can read the data and save it in the database just fine; however, when ModScan has it open I get an exception:
Cannot access the file because it is being used by another process.
My code:
using System.IO;

string path = dt.Rows[i][11].ToString();
string[] lines = System.IO.File.ReadAllLines(path);
path holds "E:\Metertxt\02.txt".
So what changes do I need to make in order to read it without interfering with ModScan?
I googled and found this, which might work; however, I am not sure how to use it:
FileShare.ReadWrite
You can use a FileStream to open a file that is already open in another application. Then you'll need a StreamReader if you want to read it line by line. This works, assuming a file encoding of UTF8:
using (var stream = new FileStream(@"c:\tmp\locked.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Do something with line, e.g. add to a list or whatever.
            Console.WriteLine(line);
        }
    }
}
Alternative in case you really need a string[]:
var lines = new List<string>();
using (var stream = new FileStream(@"c:\tmp\locked.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
    using (var reader = new StreamReader(stream, Encoding.UTF8))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
    }
}
// Now you have a List<string>, which can be converted to a string[] if you really need one.
var stringArray = lines.ToArray();
using (FileStream fstream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
using (StreamReader sreader = new StreamReader(fstream))
{
    List<string> lines = new List<string>();
    string line;
    while ((line = sreader.ReadLine()) != null)
        lines.Add(line);
    //do something with the lines

    //if you need all lines at once, use this instead of the loop above:
    //string allLines = sreader.ReadToEnd();
}

Quicker way of cleaning XML files from invalid characters

I found a way to clean an XML file of invalid characters, which works fine, but it is a bit slow. The cleaning takes ~10-20 s, which is not appreciated by users.
It seems like a huge waste of time to use a StreamReader/StreamWriter to create a clean file and then parse it with an XmlReader. Is it possible to clean each line during the XML read, or at least use a StreamReader as the input to the XmlReader, to save the time spent writing the intermediate file?
I'm trying to get the team who creates the databases to produce clean files before uploading them, but that is a slow process...
XmlReaderSettings settings = new XmlReaderSettings { CheckCharacters = false };
cleanDatabase = createCleanSDDB(database);
using (XmlReader sddbReader = XmlReader.Create(cleanDatabase, settings))
{
    //Parse XML...
}

private string createCleanSDDB(String sddbPath)
{
    string fileName = getTmpFileName(); // get a temporary file name from the OS
    string line;
    string cleanLine;
    using (StreamReader streamReader = new StreamReader(sddbPath, Encoding.UTF8))
    using (StreamWriter streamWriter = new StreamWriter(fileName))
    {
        while ((line = streamReader.ReadLine()) != null)
        {
            cleanLine = getCleanLine(line);
            streamWriter.WriteLine(cleanLine);
        }
    }
    return fileName;
}

private string getCleanLine(string dirtyLine)
{
    // strip everything outside the character ranges XML 1.0 allows
    const string regexPattern = @"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD]";
    string cleanLine = Regex.Replace(dirtyLine, regexPattern, "");
    return cleanLine;
}
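To answer the actual question (can the cleaning happen during the XML read?), one possible approach (a sketch of my own, not from the original thread) is to wrap the StreamReader in a TextReader that filters invalid characters on the fly and hand that to XmlReader.Create, so no temporary file is written. Like the regex above, this drops lone surrogates, so supplementary-plane characters are not preserved.
class CleaningTextReader : TextReader
{
    private readonly TextReader _inner;
    public CleaningTextReader(TextReader inner) { _inner = inner; }

    private static bool IsValidXmlChar(char c)
    {
        return c == 0x09 || c == 0x0A || c == 0x0D ||
               (c >= 0x20 && c <= 0xD7FF) || (c >= 0xE000 && c <= 0xFFFD);
    }

    public override int Read()
    {
        int c;
        // skip invalid characters instead of returning them
        while ((c = _inner.Read()) != -1 && !IsValidXmlChar((char)c)) { }
        return c;
    }

    public override int Read(char[] buffer, int index, int count)
    {
        int written = 0;
        while (written < count)
        {
            int c = Read();
            if (c == -1) break;
            buffer[index + written++] = (char)c;
        }
        return written;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

// usage:
// using (var reader = new CleaningTextReader(new StreamReader(sddbPath, Encoding.UTF8)))
// using (var sddbReader = XmlReader.Create(reader, settings))
// { /* Parse XML... */ }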
