Read a .csv file in c# efficiently? [closed]


Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I'm reading huge CSV files (about 350K lines per file) like this:
using (StreamReader readFile = new StreamReader(fi))
{
    string line;
    string[] row;
    readFile.ReadLine(); // discard the first line
    while ((line = readFile.ReadLine()) != null)
    {
        row = line.Split(';');
        x = row[1];
        y = row[2];
        // More code and assignments here...
    }
}
The point is that reading a huge file line by line for every day of the month may be slow, and I think there must be a faster way to do it.

Method 1
By using LINQ:
var Lines = File.ReadLines("FilePath");
// Split each line once, using the same ';' delimiter as in the question.
var CSV = from line in Lines
          select line.Split(';');
Method 2
As Jay Riggs stated here
Here's an excellent class that will copy CSV data into a datatable using the structure of the data to create the DataTable:
A portable and efficient generic parser for flat files
It's easy to configure and easy to use. I urge you to take a look.
Method 3
Rolling your own CSV reader is a waste of time unless the files that you're reading are guaranteed to be very simple. Use a pre-existing, tried-and-tested implementation instead.
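As one illustration of a pre-existing implementation, here is a minimal sketch built on `TextFieldParser` from the `Microsoft.VisualBasic.FileIO` namespace, which handles quoted fields. The class name `CsvSketch`, the method name `ReadRecords`, and the default `;` delimiter are assumptions made for this example, not something the answers above prescribe.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class CsvSketch
{
    // Lazily yields the fields of each record. Quoted fields may contain
    // the delimiter, which a plain string.Split would break apart.
    public static IEnumerable<string[]> ReadRecords(TextReader reader, string delimiter = ";")
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

A caller could iterate with `foreach (var row in CsvSketch.ReadRecords(new StreamReader(path)))`, where `path` is a placeholder; records are produced one at a time, so the whole file is never held in memory.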

In a simple case (no quotation marks, i.e. '"', within the file) where you expect to read only part of the file, you may find this useful:
var source = File
.ReadLines(fileName)
.Select(line => line.Split(';'));
For instance, to find out whether the CSV contains a line whose third column equals 0:
var result = source
.Any(items => items[2] == "0");

Related

How do i add a list of numbers together from a text file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a text file containing numbers, both negative and positive, and I want to display their total in a TextBox. The text file is arranged as follows:
0
25
25
-10
67.5
33.33
-45
and so on. Each line can hold a different number, positive or negative, with at most 2 decimal places. I'm sure it's quite simple, but I don't know how to do it. Can anybody help?

var total = Directory.EnumerateFiles ("C:\\", "*.txt")
.Select (filePath => File.ReadLines (filePath)
.Select (x => decimal.Parse (x))
.Sum ())
.Sum ();
Of course this code needs some improvements (error handling, parsing etc.).
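A minimal sketch of what such improvements might look like, assuming one number per line with `.` as the decimal separator. The names `NumberSum` and `SumLines` are hypothetical, and silently skipping lines that do not parse is a design choice of this sketch, not a requirement from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class NumberSum
{
    // Adds up every line that parses as a decimal; other lines are skipped.
    public static decimal SumLines(IEnumerable<string> lines)
    {
        decimal total = 0m;
        foreach (string line in lines)
        {
            // InvariantCulture makes "67.5" parse the same way on every machine.
            if (decimal.TryParse(line.Trim(), NumberStyles.Number,
                                 CultureInfo.InvariantCulture, out decimal value))
            {
                total += value;
            }
        }
        return total;
    }
}
```

For the TextBox in the question, something like `textBox1.Text = NumberSum.SumLines(File.ReadLines(path)).ToString();` would display the result, where `textBox1` and `path` are placeholders.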

Assume you have text files on the D drive, each containing a single number (for example 1, 3, 33 and -44.3). You can try this:
decimal total = 0;
foreach (string file in Directory.EnumerateFiles("D:/", "*.txt"))
{
string content = File.ReadAllText(file);
total += Convert.ToDecimal(content);
}

how to Tryparse Numbers from Text file like ascii art? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
How can I get the numbers that are encoded as ASCII art made of sticks?
The numbers are in a txt file that contains this:
I must convert this txt file into
3 2 1 4 5
1 4 5
I read the text file like this:
// Requires: using System.IO; using System.Text;
StringBuilder sb = new StringBuilder();
using (StreamReader sr = new StreamReader("SourceFile.txt"))
{
    string line;
    // Read lines from the file until the end of the file is reached
    // and collect them in the StringBuilder.
    while ((line = sr.ReadLine()) != null)
    {
        sb.AppendLine(line);
    }
}
string allines = sb.ToString();
Now, following @Zotta's answer, I have to save the input in two different strings (the first 4 lines and the second 4 lines); then it will be easier.

Your numbers are 4 lines tall each => split the input into blocks of 4 lines each.
Your numbers are separated by columns of whitespace => search for columns containing only whitespace and split there.
After you have separated all the numbers, use a lookup table.
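The steps above can be sketched roughly as follows. The original glyphs are not shown in the question, so the lookup table is left to the caller: `table` maps a glyph (its rows joined with `\n`) to a digit. The names `StickDigits`, `SplitGlyphs` and `Decode` are hypothetical, and unknown glyphs are reported as `?`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StickDigits
{
    // Splits one block of lines into glyphs at columns that are blank in every row.
    public static List<string> SplitGlyphs(IList<string> block)
    {
        int width = block.Max(l => l.Length);
        // Pad so that every row has the same width.
        var rows = block.Select(l => l.PadRight(width)).ToList();

        var glyphs = new List<string>();
        int start = -1; // first column of the glyph being collected, -1 if none
        for (int col = 0; col <= width; col++)
        {
            // The position just past the last column counts as blank to flush the final glyph.
            bool blank = col == width || rows.All(r => char.IsWhiteSpace(r[col]));
            if (!blank && start < 0)
            {
                start = col;
            }
            else if (blank && start >= 0)
            {
                int s = start, len = col - start;
                glyphs.Add(string.Join("\n", rows.Select(r => r.Substring(s, len))));
                start = -1;
            }
        }
        return glyphs;
    }

    // Decodes the input block by block (glyphHeight lines each, assumed > 0).
    public static string Decode(IList<string> lines, int glyphHeight, IDictionary<string, char> table)
    {
        var output = new List<string>();
        for (int i = 0; i + glyphHeight <= lines.Count; i += glyphHeight)
        {
            var block = lines.Skip(i).Take(glyphHeight).ToList();
            var digits = SplitGlyphs(block)
                .Select(g => table.TryGetValue(g, out char c) ? c : '?');
            output.Add(string.Join(" ", digits));
        }
        return string.Join("\n", output);
    }
}
```

With the file from the question, `glyphHeight` would be 4 and `lines` could come from `File.ReadAllLines`. The table keys must match the glyphs exactly, including any spaces inside a glyph.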

I don't know why this question is down-voted so much; I think it's an interesting question. Rather than hardcoding the possible results, I'll describe a general approach that finds the characters and would work with different "ASCII fonts".
If you're looking for a library, you could search Google for captcha decoding. There is a comprehensive article here if you want to do it yourself for ASCII specifically:
http://www.boyter.org/decoding-captchas/
Also, since most libraries probably only support images, you may need to convert your ASCII art text file into a bitmap by rendering it yourself.

How will be faster get the file extension C# [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
Which is the fastest way to get the file extension?
// 1. Path.GetExtension
string ext = System.IO.Path.GetExtension(FileName);

// 2. Substring + LastIndexOf
string ext = FileName.Substring(FileName.LastIndexOf('.'));

// 3. FileInfo.Extension
System.IO.FileInfo fi = new System.IO.FileInfo(FileName);
string ext = fi.Extension;

// 4. Split on '.'
string[] temp = FileName.Split('.');
string ext = temp[temp.Length - 1];

// 5. Regex
System.Text.RegularExpressions.Regex extend = new
    System.Text.RegularExpressions.Regex(@"(?:.*\.)(.*)");
string ext = extend.Match(FileName).Groups[1].Value;

In case of such operations your first concern should be which one is more idiomatic, maintainable and readable. The first and the third version are examples of these. In the second and the last one you're trying to reinvent the wheel, making the code less readable and more error prone.
In VM frameworks performance is achieved through higher-level optimisations, like controlling the number of allocated objects, references between them etc. Things you talk about here are minor and probably irrelevant in terms of performance.
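To make the "error prone" point concrete, here is a small sketch that wraps three of the variants from the question so their behavior can be compared on awkward inputs. The class and method names are made up for this illustration.

```csharp
using System;
using System.IO;

static class ExtensionDemo
{
    // Framework method: understands directory separators and missing extensions.
    public static string ViaPath(string fileName) => Path.GetExtension(fileName);

    // Hand-rolled: throws when the name contains no '.' at all.
    public static string ViaSubstring(string fileName) =>
        fileName.Substring(fileName.LastIndexOf('.'));

    // Hand-rolled: drops the leading dot and returns the whole name when there is no '.'.
    public static string ViaSplit(string fileName)
    {
        string[] parts = fileName.Split('.');
        return parts[parts.Length - 1];
    }
}
```

For `"report.txt"` the first two return `".txt"` while the split version returns `"txt"`. For `"README"`, `ViaPath` returns an empty string, `ViaSubstring` throws an `ArgumentOutOfRangeException`, and `ViaSplit` returns `"README"`. For `"dir.v2/readme"`, `ViaPath` returns an empty string, but `ViaSubstring` returns `".v2/readme"`.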

c# compare two numbers got from two files [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
In C# I need to compare 2 numbers, one from a local file and another from a downloaded file, like a patcher does.
If I use StreamReader, C# tells me that it can't convert string to int.
Is there a solution for this?
File a contains the value "1" and file b contains the value "2",
so if b > a, then download the new files taken from another updater file.
Thanks

If that is the only number in the file, you can use File.ReadAllText (or File.ReadAllLines in a multiline file) and convert to int like this:
string[] lines = File.ReadAllLines(@"c:\t.txt");
int number = Convert.ToInt32(lines[0]);

Try using the Convert.ToInt32 method.
If your file contains only one number, you could use the File.ReadAllLines method instead of a StreamReader.

void CompareVersions()
{
WebClient client = new WebClient();
var serverVersion = client.DownloadString("http://yourwebsite.com/version.txt");
using (StreamReader sr = new StreamReader("file.txt"))
{
if (Convert.ToInt32(serverVersion) > Convert.ToInt32(sr.ReadLine()))
{
// server version bigger
}
else
{
// up to date
}
}
}
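`Convert.ToInt32` throws if either text is not a valid integer (for example an empty file or an HTML error page returned by the server). A hedged variant of the comparison using `int.TryParse` might look like the sketch below; `VersionCheck` and `IsUpdateAvailable` are hypothetical names, and treating unparsable input as "no update" is an assumption of this sketch.

```csharp
using System;
using System.Globalization;

static class VersionCheck
{
    // Returns true only when both texts are valid integers and remote > local.
    public static bool IsUpdateAvailable(string localText, string remoteText)
    {
        bool okLocal = int.TryParse(localText?.Trim(), NumberStyles.Integer,
                                    CultureInfo.InvariantCulture, out int local);
        bool okRemote = int.TryParse(remoteText?.Trim(), NumberStyles.Integer,
                                     CultureInfo.InvariantCulture, out int remote);
        return okLocal && okRemote && remote > local;
    }
}
```

It could be called with the two strings read above, e.g. `VersionCheck.IsUpdateAvailable(File.ReadAllText("file.txt"), serverVersion)`.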

Best way to read a FASTA file in c# [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have a FASTA file containing several protein sequences. The format is like
----------------------
>protein1
MYRALRLLARSRPLVRAPAAALASAPGLGGAAVPSFWPPNAAR
MASQNSFRIEYDTFGELKVPNDKYYGAQTVRSTMNFKIGGVTE
RMPTPVIKAFGILKRAAAEVNQDYGLDPKIANAIMKAADEVAE
GKLNDHFPLVVWQTGSGTQTNMNVNEVISNRAIEMLGGELGSK
IPVHPNDHVNKSQ
>protein2
MRSRPAGPALLLLLLFLGAAESVRRAQPPRRYTPDWPSLDSRP
LPAWFDEAKFGVFIHWGVFSVPAWGSEWFWWHWQGEGRPYQRF
MRDNYPPGFSYADFGPQFTARFFHPEEWADLFQAAGAKYVVLT
TKHHEGFTNW*
>protein3
MKTLLLLAVIMIFGLLQAHGNLVNFHRMIKLTTGKEAALSYGF
CHCGVGGRGSPKDATDRCCVTHDCCYKRLEKRGCGTKFLSYKF
SNSGSRITCAKQDSCRSQLCECDKAAATCFARNKTTY
-----------------------------------
Is there a good way to read in this file and store the sequences separately?
Thanks

One way to do this is:
1. Create a vector where each element holds a name and a sequence.
2. Go through the file line by line.
3. If the line starts with >, add an element to the end of the vector and save line.substring(1) to the element as the protein name. Initialize the element's sequence to "".
4. If line.length == 0, the line is blank, so do nothing.
5. Otherwise the line doesn't start with >, so it is part of the sequence: append it to the current element (element.sequence += line). This way each line between >protein2 and >protein3 is concatenated and saved to the sequence of protein2.
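The steps above translate fairly directly into C#. This is a minimal sketch under the same assumptions (header lines start with `>`, everything else is sequence data); the type names `FastaRecord` and `FastaReader` are invented for the example, and sequence lines that appear before the first header are ignored.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

sealed class FastaRecord
{
    public string Name { get; }
    public StringBuilder Sequence { get; } = new StringBuilder();
    public FastaRecord(string name) { Name = name; }
}

static class FastaReader
{
    public static List<FastaRecord> Read(TextReader reader)
    {
        var records = new List<FastaRecord>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            line = line.Trim();
            if (line.Length == 0)
            {
                continue; // blank line: do nothing
            }
            if (line[0] == '>')
            {
                // Header line: start a new record named after the text following '>'.
                records.Add(new FastaRecord(line.Substring(1)));
            }
            else if (records.Count > 0)
            {
                // Sequence line: append to the most recent record.
                records[records.Count - 1].Sequence.Append(line);
            }
        }
        return records;
    }
}
```

Usage would be along the lines of `using (var r = new StreamReader(path)) { var proteins = FastaReader.Read(r); }`, with `path` as a placeholder. A `StringBuilder` is used instead of `+=` so that long multi-line sequences are not re-copied on every line.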

I think a little more detail about the exact file structure would be helpful. Just looking at what you have (and a quick peek at the samples on Wikipedia) suggests that the name of the protein is preceded by a > and followed by at least one line break, so that would be a good place to start.
You could split the file on newline, and look for a > character to determine the name.
From there it is a little less clear because I'm not sure if the sequence data is all in one line (no linebreaks) or if it could have linebreaks. If there are none, then you should be able to just store that sequence information, and move on to the next protein name. Something like this:
var reader = new StreamReader(@"C:\myfile.fasta");
while(true)
{
var line = reader.ReadLine();
if(string.IsNullOrEmpty(line))
break;
if(line.StartsWith(">"))
StoreProteinName(line);
else
StoreSequence(line);
}
If it were me, I would probably use TDD and some sample data to build out a simple parser, and then keep plugging in samples until I felt I had covered all of the major variations in the format.

Can you use a language other than C#? There are excellent libraries for dealing with FASTA files and other biological sequences in Perl, Python, Ruby, Java, and R (off the top of my head). They're usually branded Bio* (e.g. BioPerl, BioJava, etc.).
If you're interested in C or C++, check out the answers to this question over at Biostar:
http://biostar.stackexchange.com/questions/1516/c-c-libraries-for-bioinformatics
Do yourself a favor, and don't reinvent the wheel if you don't have to.
