Possible Duplicate:
How to efficiently write to file from SQL datareader in c#?
I am currently trying to create a web application that uses read-only access to allow users to download large files from our database. The table in question has 400,000 records in it and generates a 50 MB .csv file when exported.
It takes about 7s to run the statement "SELECT * FROM [table]" on SQL server, and about 33s to do so from my web application (hosted on a different server). This is reading all the data into a System.Data.SqlClient.SqlDataReader object.
My problem is that I am at a loss for converting my SqlDataReader to a .csv file. Converting each row of the SqlDataReader to a string and outputting that string to a file line by line takes almost 2 hours, which is unacceptable. Below is the code I'm using to create a file on the web application's server:
while (rdr.Read())
{
    string lineout = "";
    for (int index = 0; index < rdr.FieldCount; index++)
        lineout += rdr[index].ToString().Replace(',', ' ') + ',';
    write(lineout, filename); //uses StreamWriter.WriteLine()
}
There has to be a better way. I've looked around and saw a lot of suggestions that essentially recommend doing the above to create a file. This works great with smaller tables, but not the two really large ones we use every day. Can anyone give me a push in the right direction?
You could try building your lineout with a StringBuilder rather than manually concatenating strings:
//you can test whether it makes any difference in performance declaring a single
//StringBuilder and clearing, or creating a new one per loop
var sb = new StringBuilder();
while (rdr.Read())
{
    for (int index = 0; index < rdr.FieldCount; index++)
        sb.Append(rdr[index].ToString().Replace(',', ' ')).Append(',');
    write(sb.ToString(), filename); //uses StreamWriter.WriteLine()
    sb.Clear();
}
Alternatively try to just write to the file directly and avoid generating each line in memory first:
//assume a StreamWriter instance has been created called sw...
while (rdr.Read())
{
    for (int index = 0; index < rdr.FieldCount; index++)
    {
        sw.Write(rdr[index].ToString().Replace(',', ' '));
        sw.Write(',');
    }
    sw.WriteLine();
}
//flush and close stream
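For what it's worth, here is a minimal end-to-end sketch of that second approach, assuming rdr is the open SqlDataReader and filename is the output path (it keeps the question's comma-replacement rather than doing proper CSV quoting, and needs System.IO and System.Text):
// Sketch only: stream each row straight to a buffered writer instead of building strings in memory.
using (var sw = new StreamWriter(filename, false, Encoding.UTF8, 1 << 20)) // ~1 MB buffer
{
    while (rdr.Read())
    {
        for (int index = 0; index < rdr.FieldCount; index++)
        {
            if (index > 0) sw.Write(',');
            sw.Write(rdr[index].ToString().Replace(',', ' '));
        }
        sw.WriteLine();
    }
}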
I would like to consecutively read from a text file that is generated by my program. The problem is that after parsing the file for the first time, my program reads the last line of the file before it can begin re-parsing, which causes it to accumulate unwanted data.
(Three screenshots were attached: the first shows creating the tournament and the points, the second shows the text file, and the third shows that TeamA got 3 more points.)
StreamReader rd = new StreamReader("Torneios.txt");
torneios = 0;
while (!rd.EndOfStream)
{
    string line = rd.ReadLine();
    if (line == "Tournament")
    {
        torneios++;
    }
    else
    {
        string[] arr = line.Split('-');
        equipaAA = arr[0];
        equipaBB = arr[1];
        res = Convert.ToChar(arr[2]);
    }
}
rd.Close();
That is what I'm using at the moment.
To avoid mistakes like these, I highly recommend using File.ReadAllText or File.ReadAllLines unless you are working with large files (in which case they are not good choices). Here is an example:
string result = File.ReadAllText("textfilename.txt");
Regarding your particular code, an example using File.ReadAllLines which achieves this is:
string[] lines = File.ReadAllLines("textfilename.txt");
for (int i = 0; i < lines.Length; i++)
{
    string line = lines[i];
    //Do whatever you want here
}
Just to make it clear, this is not a good idea if the files you intend to read from are large (for example, big binary or log files), since both methods load the entire file into memory.
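Applied to the tournament loop from the question, a sketch using File.ReadAllLines might look like this (it assumes the same torneios, equipaAA, equipaBB and res variables as the original code):
string[] lines = File.ReadAllLines("Torneios.txt");
torneios = 0;
foreach (string line in lines)
{
    if (line == "Tournament")
    {
        torneios++;
    }
    else
    {
        string[] arr = line.Split('-');
        equipaAA = arr[0];
        equipaBB = arr[1];
        res = Convert.ToChar(arr[2]);
    }
}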
I created a loop that takes user names from text boxes. I would like to put the names into a text file, adding a new line each time. It's not working: it keeps overwriting the previous name. I know how to add a new line to a text file, but inside the loop it does not work.
Here's my code:
for (int i = 0; i < txt_user.Length; i++)
{
    File.WriteAllText(@"C:\mail\users.txt", txt_user[i].Text + Environment.NewLine);
}
Here is sample code outside the loop that writes a new line - and it works:
File.WriteAllText(@"C:\mail\users.txt", txt_user1.Text + Environment.NewLine + "abc");
You're close: there is File.AppendAllText() or File.AppendText(). You could also collect all lines in memory first and use File.AppendAllLines() (if you have enough RAM to store all lines).
WriteAllText() will write a new file or overwrite an existing one.
This will work well enough for smaller files, since the OS may apply some caching strategies. However, doing that in a loop for very large files may not be efficient. You should have a look at FileStreams then.
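For the simple case above, a minimal sketch of the append-based approach might be (assuming txt_user is an array of TextBox controls and System.IO/System.Linq are in scope):
// Opens the file once and appends each name on its own line.
File.AppendAllLines(@"C:\mail\users.txt", txt_user.Select(t => t.Text));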
If you have many entries, it's better to use this code:
using (StreamWriter sw = File.CreateText(@"C:\mail\users.txt"))
{
    for (int i = 0; i < txt_user.Length; i++)
    {
        sw.WriteLine(txt_user[i].Text);
    }
}
This will open the file once, and write lines to the text file as it enumerates them, and then close the file. It doesn't do multiple opens, nor does it try and build up a large string in the process, which likely makes it the most I/O and memory efficient of the bunch of answers given so far.
File.WriteAllLines(@"C:\mail\users.txt",
    Enumerable
        .Range(0, txt_user.Length)
        .Select(i => txt_user[i].Text));
Use File.AppendText if you want to add to an existing file.
I recommend using a StringBuilder:
var builder = new StringBuilder();
for (int i = 0; i < txt_user.Length; i++)
    builder.AppendLine(txt_user[i].Text);
File.WriteAllText(@"C:\mail\users.txt", builder.ToString());
Instead of opening and closing the file all the time (because that is what File.WriteAllText does on every call), prepare the text with a StringBuilder and write it to the file in one go. Since the text comes from text boxes, I assume the resulting string will not be very long anyway.
var sb = new StringBuilder();
for (int i = 0; i < txt_user.Length; i++)
{
    sb.AppendLine(txt_user[i].Text);
}
File.WriteAllText(@"C:\mail\users.txt", sb.ToString());
I'm trying to transpose a large data file that may have many rows and columns, for subsequent analysis in Excel. Currently rows might contain either 2 or 125,000 points, but I'm trying to be generic. (I need to transpose because Excel can't handle that many columns, but is fine if the large sets span many rows.)
Initially, I implemented this in Python, using the built-in zip function. I process the source file to separate long rows from short, then transpose the long rows with zip:
tempdata = zip(*csv.reader(open(tempdatafile,'r')))
csv.writer(open(outfile, 'a', newline='')).writerows(tempdata)
os.remove(tempdatafile)
This works great and takes a few seconds for a 15MB csv file, but since the program that generated the data in the first place is in C#, I thought it would be best to do it all in one program.
My initial approach in C# is a little different, since from what I've read, the zip function might not work quite the same. Here's my approach:
public partial class Form1 : Form
{
    StreamReader source;
    int Rows = 0;
    int Columns = 0;
    string filePath = "input.csv";
    string outpath = "output.csv";
    List<string[]> test_csv = new List<string[]>();

    public Form1()
    {
        InitializeComponent();
    }

    private void button_Load_Click(object sender, EventArgs e)
    {
        source = new StreamReader(filePath);
        while (!source.EndOfStream)
        {
            string[] Line = source.ReadLine().Split(',');
            test_csv.Add(Line);
            if (test_csv[Rows].Length > Columns) Columns = test_csv[Rows].Length;
            Rows++;
        }
    }

    private void button_Write_Click(object sender, EventArgs e)
    {
        StreamWriter outfile = new StreamWriter(outpath);
        for (int i = 0; i < Columns; i++)
        {
            string line = "";
            for (int j = 0; j < Rows; j++)
            {
                try
                {
                    if (j != 0) line += ",";
                    line += test_csv[j][i];
                }
                catch { }
            }
            outfile.WriteLine(line);
        }
        outfile.Close();
        MessageBox.Show("Outfile written");
    }
}
I used the List because the rows might be of variable length, and I have the load function set to give me total number of columns and rows so I can know how big the outfile has to be.
I used a try/catch when writing to deal with variable length rows. If the indices are out of range for the row, this catches the exception and just skips it (the next loop writes a comma before an exception occurs).
Loading takes very little time, but actually saving the outfile is an insanely long process. After 2 hours, I was only 1/3 of the way through the file. When I stopped the program and looked at the outfile, everything is done correctly, though.
What might be causing this program to take so long? Is it all the exception handling? I could implement a second List that stores row lengths for each row so I can avoid exceptions. Would that fix this issue?
Try using StringBuilder. Concatenation (+) of long strings is very inefficient.
Create a List<string> of lines and then make a single call System.IO.File.WriteAllLines(filename, lines). This will reduce disk IO.
If you don't care about the order of the points try changing your outside for loop to System.Threading.Tasks.Parallel.For. This will run multiple threads. Since these run parallel it won't preserve the order when writing it out.
In regards to your exception handling: Since this is an error that you can determine ahead of time, you should not use a try/catch to take care of it. Change it to this:
if (j < test_csv.Count && i < test_csv[j].Length)
{
    line += test_csv[j][i];
}
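Putting the suggestions together, a sketch of the rewritten write handler might look like this (hypothetical code reusing the same test_csv, Rows, Columns and outpath fields from the question, with System.IO and System.Text in scope):
private void button_Write_Click(object sender, EventArgs e)
{
    var lines = new List<string>(Columns);
    var sb = new StringBuilder();

    for (int i = 0; i < Columns; i++)
    {
        sb.Clear();
        for (int j = 0; j < Rows; j++)
        {
            if (j != 0) sb.Append(',');
            // Bounds check instead of try/catch for short rows
            if (i < test_csv[j].Length) sb.Append(test_csv[j][i]);
        }
        lines.Add(sb.ToString());
    }

    // A single call writes all lines and closes the file
    File.WriteAllLines(outpath, lines);
    MessageBox.Show("Outfile written");
}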
I have a flat file that is pipe delimited and looks something like this as an example:
ColA|ColB|3*|Note1|Note2|Note3|2**|A1|A2|A3|B1|B2|B3
The first two columns are set and will always be there.
* denotes a count of how many repeating fields follow that count, so Note1, Note2, Note3 here.
** denotes a count of how many times a block of fields is repeated, and there are always 3 fields in a block.
This is per row, so each row may have a different number of fields.
Hope that makes sense so far.
I'm trying to find the best way to parse this file, any suggestions would be great.
The goal at the end is to map all these fields into a few different files - data transformation. I'm actually doing all this within SSIS, but figured the default components won't be good enough, so I need to write my own code.
UPDATE: I'm essentially trying to read this like a source file, do some lookups and string manipulation on some of the fields in between, and spit out several different files, like in any normal file-to-file transformation SSIS package.
Using the above example, I may want to create a new file that ends up looking like this
"ColA","HardcodedString","Note1CRLFNote2CRLF","ColB"
And then another file
Row1: "ColA","A1","A2","A3"
Row2: "ColA","B1","B2","B3"
So I guess I'm after some ideas on how to parse this, as well as how to store the data (Stacks? Lists? something else?) to work with and spit out later.
One possibility would be to use a stack. First you split the line by the pipes.
var stack = new Stack<string>(line.Split('|').Reverse()); // Reverse() (System.Linq) so the first field ends up on top of the stack
Then you pop the first two from the stack to get them out of the way.
stack.Pop();
stack.Pop();
Then you parse the next element: 3* . For that you pop the next 3 items on the stack. With 2** you pop the next 2 x 3 = 6 items from the stack, and so on. You can stop as soon as the stack is empty.
while (stack.Count > 0)
{
// Parse elements like 3*
}
Hope this is clear enough. I find this article very useful when it comes to String.Split().
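Filled in, the parsing loop might look something like this - a sketch only, assuming a count like 3* means that many single fields follow and 2** means that many blocks of three fields (System.Linq is needed for Reverse()):
var stack = new Stack<string>(line.Split('|').Reverse()); // first field on top

string colA = stack.Pop();
string colB = stack.Pop();

var notes = new List<string>();
var blocks = new List<string[]>();

// e.g. "3*" -> three single fields follow
int noteCount = int.Parse(stack.Pop().TrimEnd('*'));
for (int i = 0; i < noteCount; i++)
    notes.Add(stack.Pop());

if (stack.Count > 0)
{
    // e.g. "2**" -> two blocks of three fields follow
    int blockCount = int.Parse(stack.Pop().TrimEnd('*'));
    for (int i = 0; i < blockCount; i++)
    {
        var block = new string[3];
        for (int j = 0; j < 3; j++)
            block[j] = stack.Pop();
        blocks.Add(block);
    }
}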
Something similar to the code below should work (this is untested; the counts in fields like 3* and 2** are parsed by stripping the asterisks):
// ColA|ColB|3*|Note1|Note2|Note3|2**|A1|A2|A3|B1|B2|B3
string[] columns = line.Split('|');
List<string> repeatingColumnNames = new List<string>();
List<List<string>> repeatingFieldValues = new List<List<string>>();

if (columns.Length > 2)
{
    // e.g. "3*" -> three single repeated fields follow
    int repeatingFieldCount = int.Parse(columns[2].TrimEnd('*'));
    int repeatingFieldStartIndex = 3;
    for (int i = 0; i < repeatingFieldCount; i++)
    {
        repeatingColumnNames.Add(columns[repeatingFieldStartIndex + i]);
    }

    // e.g. "2**" -> two blocks of fields follow, three fields per block
    int repeatingFieldSetCountIndex = repeatingFieldStartIndex + repeatingFieldCount;
    int repeatingFieldSetCount = int.Parse(columns[repeatingFieldSetCountIndex].TrimEnd('*'));
    int repeatingFieldSetStartIndex = repeatingFieldSetCountIndex + 1;
    const int fieldsPerSet = 3;
    for (int i = 0; i < repeatingFieldSetCount; i++)
    {
        string[] fieldSet = new string[fieldsPerSet];
        for (int j = 0; j < fieldsPerSet; j++)
        {
            fieldSet[j] = columns[repeatingFieldSetStartIndex + j + (i * fieldsPerSet)];
        }
        repeatingFieldValues.Add(new List<string>(fieldSet));
    }
}
System.IO.File.ReadAllLines("File.txt").Select(line => line.Split(new[] {'|'}))
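That one-liner yields an IEnumerable<string[]>, one array of fields per row; a sketch of consuming it (assuming System.Linq is in scope):
foreach (string[] fields in File.ReadAllLines("File.txt").Select(l => l.Split('|')))
{
    // fields[0] = ColA, fields[1] = ColB, fields[2] = the first count (e.g. "3*"), ...
}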
So, I have a huge query that I need to run on an Access DB. I am attempting to use a for loop to break it down, because I can't run the query all at once (it has an IN with 50k values). The reader is causing all kinds of problems, hanging and such. Most times, when I break the loop up into 50-10000 values, the reader will read 400 (exactly 400) values, hang for about 3 minutes, do another hundred or so, hang again, and so on ad infinitum. If I do over 10k values per query it gets to 2696, then hangs, does another 1k or so after hanging, on and on. I have never really worked with ODBC, SQL, or any type of database for that matter, so it must be something stupid, or is this expected? Maybe there's a better way to do something like this? Here's my code that is looped:
//connect to mdb
OdbcConnection mdbConn = new OdbcConnection();
mdbConn.ConnectionString = @"Driver={Microsoft Access Driver (*.mdb)};DBQ=C:\PINAL_IMAGES.mdb;";
mdbConn.Open();
OdbcCommand mdbCmd = mdbConn.CreateCommand();
mdbCmd.CommandText = @"SELECT RAW_NAME,B FROM 026_006_886 WHERE (B='CM1' OR B='CM2') AND MERGEDNAME IN " + imageChunk;
OdbcDataReader mdbReader = mdbCmd.ExecuteReader();
while (mdbReader.Read())
{
    sw.WriteLine(@"for /R %%j in (" + mdbReader[0] + @") do move %%~nj.tif .\" + mdbReader[1] + @"\done");
    linesRead++;
    Console.WriteLine(linesRead);
}
mdbConn.Close();
Here's how I populate the imageChunk variable for the IN clause, by reading up to 5000 lines (one value per line) from a text file with a StreamReader:
string imageChunk = "(";
for (int j = 0; j < 5000; j++)
{
    string image;
    if ((image = sr.ReadLine()) != null)
    {
        imageChunk += @"'" + image + @"',"; // use the line already read; calling ReadLine() again here would skip every other value
    }
    else
    {
        break;
    }
}
imageChunk = imageChunk.Substring(0, imageChunk.Length - 1);
imageChunk += ")";
Your connection to the DB and execution of the queries seems OK to me. I suspect the hanging comes from running the query many times. A couple of tips for speed: columns B and MERGEDNAME should have indexes on them, and re-factoring your table structure may also help. Are your MergedNames truly random? If so, you are probably stuck with the speed you have. As @Remou suggests, I would also compare the total runtime of uploading your MergedNames list to a table, joining to that table to get your results, then deleting the table on completion.
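As a rough sketch of that last suggestion - load the list into a table once and let the database do the matching - something like the following could be compared against the chunked IN approach (it reuses mdbConn, sr and sw from the question; the MergedNameList table name is made up here and the SQL may need tweaking for your Access schema):
// Create a scratch table for the names (one-time setup).
using (var create = mdbConn.CreateCommand())
{
    create.CommandText = "CREATE TABLE MergedNameList (MERGEDNAME TEXT(255))";
    create.ExecuteNonQuery();
}

// Insert every name from the text file with a parameterised command.
using (var insert = mdbConn.CreateCommand())
{
    insert.CommandText = "INSERT INTO MergedNameList (MERGEDNAME) VALUES (?)";
    var p = insert.Parameters.Add("@name", OdbcType.VarChar, 255);
    string image;
    while ((image = sr.ReadLine()) != null)
    {
        p.Value = image;
        insert.ExecuteNonQuery();
    }
}

// One query with a join instead of thousands of IN values.
using (var query = mdbConn.CreateCommand())
{
    query.CommandText =
        "SELECT t.RAW_NAME, t.B FROM [026_006_886] AS t " +
        "INNER JOIN MergedNameList AS m ON t.MERGEDNAME = m.MERGEDNAME " +
        "WHERE (t.B = 'CM1' OR t.B = 'CM2')";
    using (var reader = query.ExecuteReader())
    {
        while (reader.Read())
        {
            sw.WriteLine(@"for /R %%j in (" + reader[0] + @") do move %%~nj.tif .\" + reader[1] + @"\done");
        }
    }
}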
Ended up using a data adapter... Was slow but provided constant feedback instead of freezing up. Never really got a good answer why, but got some advice on smarter ways to perform a large query.