I have a text file which is as follows:
Ali
M*59*AB
John
M*68*B
Shirley
F*35*B
Peter
M*88*A
Fiona
F*55*O
Mary
F*46*B
How do I efficiently read two lines of data at a time from a text file and assign them to variables, where the 1st line is the name and the 2nd line is GENDER*WEIGHT*BLOODTYPE?
There are a lot of ways to accomplish this. All of them involve iterating through the lines of your text file.
Here's one such solution that plays on the new ValueTuple type available in C# 7.
// Requires: using System.Collections.Generic; using System.IO;
string path = "file path here";
Dictionary<string, (string Gender, string Weight, string BloodType)> records =
    new Dictionary<string, (string Gender, string Weight, string BloodType)>();
Stack<string> stack =
    new Stack<string>();
foreach (string line in File.ReadLines(path))
{
    // No name waiting on the stack yet, so this line must be a name.
    if (stack.Count != 1)
    {
        stack.Push(line);
        continue;
    }
    // A name is waiting, so this line holds its record fields.
    string[] fields =
        line.Split('*');
    records.Add(
        stack.Pop(),
        (Gender: fields[0],
         Weight: fields[1],
         BloodType: fields[2]));
}
This snippet streams lines from the file one at a time. First it pushes the name line onto a stack. Once there's a name on the stack, the next loop pops it off, parses the current line for record information, and adds it all to the records dictionary using the name as a key.
While this solution will get you started, there are some obvious areas in which you can improve its robustness with some insight into your data environment.
For example, this solution doesn't handle cases where either the name or the record information may be missing, nor does it handle the case where the record information may not have all three fields.
You should think carefully about how to handle such cases in your implementing code.
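The gaps described above could be handled along these lines. This is only a minimal sketch and not part of the original solution: the PersonFile class, its Parse method and the errors list are hypothetical names, and it assumes that a name never contains '*' while every record line does. It reads from a TextReader so that it works with a StreamReader over the file or with a StringReader in a test.

```csharp
using System.Collections.Generic;
using System.IO;

public static class PersonFile
{
    // Hypothetical helper: pairs each name line with the record line after it.
    // Problems are collected in 'errors' instead of being thrown.
    public static Dictionary<string, (string Gender, string Weight, string BloodType)> Parse(
        TextReader reader, List<string> errors)
    {
        var records = new Dictionary<string, (string Gender, string Weight, string BloodType)>();
        string pendingName = null; // name still waiting for its record line
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Trim().Length == 0)
                continue; // ignore blank lines

            // Assumption: only record lines contain '*'.
            bool looksLikeRecord = line.Contains("*");
            if (pendingName == null)
            {
                if (looksLikeRecord)
                    errors.Add($"Line {lineNumber}: record without a name.");
                else
                    pendingName = line;
                continue;
            }
            if (!looksLikeRecord)
            {
                // Two names in a row: the first one has no record.
                errors.Add($"Name '{pendingName}' before line {lineNumber} has no record.");
                pendingName = line;
                continue;
            }

            string[] fields = line.Split('*');
            if (fields.Length != 3)
                errors.Add($"Line {lineNumber}: expected 3 fields but found {fields.Length}.");
            else if (records.ContainsKey(pendingName))
                errors.Add($"Line {lineNumber}: duplicate name '{pendingName}'.");
            else
                records.Add(pendingName, (fields[0], fields[1], fields[2]));
            pendingName = null;
        }
        if (pendingName != null)
            errors.Add($"Name '{pendingName}' at the end of the file has no record.");
        return records;
    }
}
```

Whether a bad pair should be skipped, logged or treated as fatal depends on your data; the sketch simply skips it and records a message.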
Related
I have a CSV that looks like this. My goal is to extract each entry (notice I said entry, not line), where an entry starts from the first column and stretches to the last column, and may span multiple lines. I'd like to extract an entry without ruining the formatting. For example, I do not want the following to be considered four separate lines,
Eg. 1, One Column Multiple Lines
...,"1. copy ctor
2. copy ctor
3. declares function
4. default ctor",... // Where ... represents the columns before and after
but rather a column in one entry that can be represented as such
Eg. 2, One Column Single Line
"1. copy ctor\n2.copy ctor\ndeclares function\n4.default ctor"
When I iterate over the CSV, as such, I get Eg. 1. I'm not sure why splitting on a comma is treating a new line as a comma.
using (var streamReader = new StreamReader("results-survey111101.csv"))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        string[] splitLine = line.Split(',');
        foreach (var column in splitLine)
            Console.WriteLine(column);
    }
}
If someone can show me what I need to do to get these multi line CSV columns into one line that maintains the formatting (e.g. adds \t or \n where necessary) that would be great. Thanks!
Even assuming your source file is valid CSV, variability in the data is really hard to account for. That's all I'll say, but if you need convincing that writing your own CSV parser is a horrible task, see this other SO answer: "Reading CSV files using C#".
Let's assume you are going to take advantage of an existing CSV reader library. I'll use TextFieldParser from the Microsoft.VisualBasic library as is used in the example answer I linked.
Your task is to read your source file line by line, and validate whether the line is a complete CSV entry on its own, or if it forms part of a broken line.
If it forms part of a broken line, we need to remember the line and add the next line to it before attempting validation again.
For this we need to know one thing:
What is the expected number of fields each data entry row should have?
// Requires a reference to Microsoft.VisualBasic, plus:
// using System; using System.IO; using Microsoft.VisualBasic.FileIO;
int expectedFieldCount = 7;
string brokenLine = "";
using (var streamReader = new StreamReader("results-survey111101.csv"))
{
    string line;
    while ((line = streamReader.ReadLine()) != null) // read the next line
    {
        // if the previous line was incomplete, add it to the current line,
        // otherwise use the current line
        string csvLineData = (brokenLine.Length > 0) ? brokenLine + line : line;
        try
        {
            using (StringReader stringReader = new StringReader(csvLineData))
            using (TextFieldParser parser = new TextFieldParser(stringReader))
            {
                parser.SetDelimiters(",");
                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields(); // tests if the line is valid csv
                    if (expectedFieldCount == fields.Length)
                    {
                        // do whatever you want with the fields now.
                        foreach (var field in fields)
                        {
                            Console.WriteLine(field);
                        }
                        brokenLine = ""; // reset the brokenLine
                    }
                    else // it was valid csv, but we don't have the required number of fields yet
                    {
                        // @"\r\n" is a verbatim string: a literal backslash-r-backslash-n, not a real line break
                        brokenLine += line + @"\r\n";
                        break;
                    }
                }
            }
        }
        catch (Exception) // the current line is NOT valid csv, update brokenLine
        {
            brokenLine += line + @"\r\n";
        }
    }
}
I am replacing the line breaks that broken lines contain with \r\n literals. You can display these in your resulting one-liner field however you want. But you shouldn't expect to be able to copy paste the result into notepad and see line breaks.
This assumes you have the same number of columns in each record, that no field contains a comma, and that the multi-line field is not the last column. In the loop where you call Split, keep a running columnsReadCount. A field that continues onto the next line is split into two pieces, so add splitLine.Length for the first line of a record and splitLine.Length - 1 for each continuation line. When columnsReadCount equals the desired columnsPerRecordCount, you have read the whole record and can reset columnsReadCount to zero, ready for the next record.
I am attempting to import a .CSV file into my database which is a table export from an image management system. This system allows end-users to take images and (sometimes) split them into multiple images. There is a column in this report that signifies the file name of the image that I am tracking. If items are split in the image management system, the file name receives an underscore ("_") on the report. The previous file name is not kept. The way the items can possibly exist on the CSV are shown below:
Report 1 @ 8:00AM: ABC.PNG
Report 2 @ 8:30AM: ABC_1.PNG
                   ABC_2.PNG
Report 3 @ 9:00AM: ABC_1_1.PNG
                   ABC_1_2.PNG
                   ABC_2_1.PNG
                   ABC_2_2.PNG
Report 4 @ 9:30AM: ABC_1_1_1.PNG
                   ABC_1_1_2.PNG
                   ABC_1_2.PNG
                   ABC_2_1.PNG
                   ABC_2_2.PNG
I am importing each file name into its own record. When an item is split, I would like to identify the previous version and update the original record, then add the new split record into my database. The key to knowing if an item is split is locating an underscore ("_").
I am not sure what I should do to recreate previous child names, I have to test every previous iteration of the file name to see if it exists. My problem is interpreting the current state of the file name and rebuilding all previous possibilities. I do not need the original name, only the first possible split name up until the current name. The code below shows kind of what I am getting at, but I am not sure how to do this cleanly.
String[] splitName = theStringToSplit.Split('_');
for (int i = 1; i < splitName.Length - 1; i++)
{
    //should concat everything between 0 and i, not just 0 and i
    //not sure if this is the best way or what I should do
    MessageBox.Show(splitName[0] + "_" + splitName[i] + ".PNG");
}
What you are looking for is built into the string class.
string.Join() joins the elements of an array into a delimited string.
One overload also takes a start index and the number of items to use.
string[] s = new string[] { "2", "a", "b" };
string joined = string.Join("_", s, 0 ,3);
// joined will be "2_a_b"
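Applied to the file names in the question, that overload could be used roughly as follows. This is a sketch under assumptions, not a definitive implementation: SplitNames and PreviousNames are hypothetical names, and the base name (ABC here) is assumed to contain no underscore of its own.

```csharp
using System.Collections.Generic;
using System.IO;

public static class SplitNames
{
    // Hypothetical helper: lists every earlier split name of a file, from the
    // first split name up to (but excluding) the current name.
    public static List<string> PreviousNames(string fileName)
    {
        string extension = Path.GetExtension(fileName); // e.g. ".PNG"
        string[] parts = Path.GetFileNameWithoutExtension(fileName).Split('_');
        var names = new List<string>();
        // count = 2 joins the base name and the first split index,
        // count = 3 adds the second split index, and so on.
        for (int count = 2; count < parts.Length; count++)
        {
            names.Add(string.Join("_", parts, 0, count) + extension);
        }
        return names;
    }
}
```

For "ABC_1_1_2.PNG" this yields "ABC_1.PNG" and "ABC_1_1.PNG", which are the candidates to look up in the database; an unsplit name or a first-level split such as "ABC_1.PNG" yields an empty list.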
Maybe you are using the wrong tool for your problem. If you want to split at the last "_", you may want to use LastIndexOf() or even regular expressions. In any case, you should not take names apart and glue them back together unnecessarily. If you do, use culture-invariant rather than culture-specific operations (cultures can differ in how they treat "-" or the lowercase form of "I").
string fnwithExt = "Abc_12_23.png";
string fn = System.IO.Path.GetFileNameWithoutExtension(fnwithExt); // "Abc_12_23"
int indexOf = fn.LastIndexOf('_');
string part1 = fn.Substring(0, indexOf);  // "Abc_12", everything before the last '_'
string part2 = fn.Substring(indexOf + 1); // "23", everything after the last '_'
string part3 = System.IO.Path.GetExtension(fnwithExt); // ".png"
string original = System.IO.Path.ChangeExtension(part1 + "_" + part2, part3); // "Abc_12_23.png"
I need help, trying to take a large text document ~1000 lines and put it into a string array, line by line.
Example:
string[] s = {firstLineHere, Secondline, etc};
I also want a way to match on the first word, and only the first word, of each line, and once that first word is found, copy the entire line.
You can accomplish this with File.ReadAllLines combined with a little LINQ (to cover the addition to the question stated in the comments on Praveen's answer).
// Requires: using System.IO; using System.Linq;
string[] identifiers = { /*Your identifiers for needed lines*/ };
string[] allLines = File.ReadAllLines(@"C:\test.txt");
// Split(' ')[0] is the text before the first space, or the whole line if it has no space.
string[] neededLines = allLines.Where(c => identifiers.Contains(c.Split(' ')[0])).ToArray();
Or make it more of a one liner:
string[] lines = File.ReadAllLines("your path").Where(c => identifiers.Contains(c.Split(' ')[0])).ToArray();
This will give you array of all the lines in your document that start with the keywords you define within your identifiers string array.
There is an inbuilt method to achieve your requirement.
string[] lines = System.IO.File.ReadAllLines(@"C:\sample.txt");
If you want to read the file line by line:
// searchTerm is the text you are looking for, defined elsewhere.
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(@"C:\sample.txt"))
{
    while (reader.Peek() >= 0)
    {
        string line = reader.ReadLine();
        //Add your conditional logic to add the line to an array
        if (line.Contains(searchTerm)) {
            lines.Add(line);
        }
    }
}
Another option would be to read each line, split it into segments, and compare only the first element against the provided search term. I have provided a complete working demonstration below:
Solution:
// Requires: using System; using System.Collections.Generic; using System.IO;
class Program
{
    static void Main(string[] args)
    {
        // Get all lines that start with a given word from a file
        var result = GetLinesWithWord("The", "temp.txt");
        // Display the results.
        foreach (var line in result)
        {
            Console.WriteLine(line + "\r");
        }
        Console.ReadLine();
    }
    public static List<string> GetLinesWithWord(string word, string filename)
    {
        List<string> result = new List<string>(); // The lines whose first word is the provided search term.
        // Create a stream reader object to read a text file.
        using (StreamReader reader = new StreamReader(filename))
        {
            string line = string.Empty; // Contains a single line returned by the stream reader object.
            // While there are lines in the file, read a line into the line variable.
            while ((line = reader.ReadLine()) != null)
            {
                // If the line is empty, there are no words to compare against, so move to the next line.
                if (line != string.Empty)
                {
                    // Split the line into parts by a white space delimiter.
                    var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                    // Get only the first word element of the line, trim off any additional white space
                    // and convert it to lowercase. Compare the word element to the search term provided.
                    // If they are the same, add the line to the results list.
                    // A line of only spaces yields no parts and is skipped here.
                    if (parts.Length > 0)
                    {
                        if (parts[0].ToLower().Trim() == word.ToLower().Trim())
                        {
                            result.Add(line);
                        }
                    }
                }
            }
        }
        return result;
    }
}
Where the sample text file may contain:
How shall I know thee in the sphere which keeps
The disembodied spirits of the dead,
When all of thee that time could wither sleeps
And perishes among the dust we tread?
For I shall feel the sting of ceaseless pain
If there I meet thy gentle presence not;
Nor hear the voice I love, nor read again
In thy serenest eyes the tender thought.
Will not thy own meek heart demand me there?
That heart whose fondest throbs to me were given?
My name on earth was ever in thy prayer,
Shall it be banished from thy tongue in heaven?
In meadows fanned by heaven's life-breathing wind,
In the resplendence of that glorious sphere,
And larger movements of the unfettered mind,
Wilt thou forget the love that joined us here?
The love that lived through all the stormy past,
And meekly with my harsher nature bore,
And deeper grew, and tenderer to the last,
Shall it expire with life, and be no more?
A happier lot than mine, and larger light,
Await thee there; for thou hast bowed thy will
In cheerful homage to the rule of right,
And lovest all, and renderest good for ill.
For me, the sordid cares in which I dwell,
Shrink and consume my heart, as heat the scroll;
And wrath has left its scar--that fire of hell
Has left its frightful scar upon my soul.
Yet though thou wear'st the glory of the sky,
Wilt thou not keep the same beloved name,
The same fair thoughtful brow, and gentle eye,
Lovelier in heaven's sweet climate, yet the same?
Shalt thou not teach me, in that calmer home,
The wisdom that I learned so ill in this--
The wisdom which is love--till I become
Thy fit companion in that land of bliss?
And you wanted to retrieve every line where the first word of the line is the word 'the' by calling the method like so:
var result = GetLinesWithWord("The", "temp.txt");
Your result should then be the following:
The disembodied spirits of the dead,
The love that lived through all the stormy past,
The same fair thoughtful brow, and gentle eye,
The wisdom that I learned so ill in this--
The wisdom which is love--till I become
Hopefully this answers your question adequately enough.
I have a text file which contains lines that I need to process. Here is the format of the lines in my text file:
07 IVIN 15:37 06/03 022 00:00:14 600 2265507967 0:03
08 ITRS 15:37 06/03 022 00:00:09 603 7878787887 0:03
08 ITRS 15:37 06/03 022 00:00:09 603 2265507967 0:03
I have to read this text file line by line. As soon as I find ITRS in a line, I have to search the lines immediately above it for the number 2265507967. As soon as that number is found in one of the preceding lines, that line should be read.
I am reading the lines into strings and splitting them on spaces. Here is my code:
var strings = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
My problem is that I cannot find a way to traverse the text file upwards and search for the substring, i.e. 2265507967. Please help.
I am not aware of being able to go backwards when reading a file (other than using the seek() method) but I might be wrong...
A simpler approach would be to:
1. Create a dictionary whose keys are the long numeric values and whose values are the lines they belong to: <2265507967, 07 IVIN 15:37 06/03 022 00:00:14 600 2265507967 0:03>
2. Go through the file one line at a time and:
   a. If the line contains ITRS, get the number from the line and look it up in your dictionary. Once you have found it, clear the dictionary and go back to step 1.
   b. If it does not contain ITRS, simply add the number and the line as a key-value pair.
This should be quicker than re-reading the earlier lines for every ITRS line, and also simpler. The drawback is that it could be quite memory intensive.
EDIT: I do not have a .NET compiler handy, so treat the following as an untested sketch that better explains my answer:
// Requires: using System; using System.Collections.Generic; using System.IO;
// Initialization
Dictionary<string, string> previousLines = new Dictionary<string, string>();
using (TextReader tr = new StreamReader(filePath))
{
    string line;
    // Read the file one line at a time
    while ((line = tr.ReadLine()) != null)
    {
        // In the sample lines the long number is the 8th space-separated field (index 7).
        string[] parts = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 8)
            continue; // not a data line
        string number = parts[7];
        if (line.Contains("ITRS"))
        {
            // Use the dictionary to fetch a line you have previously read.
            string previousLine;
            if (previousLines.TryGetValue(number, out previousLine))
            {
                // ... Do your logic with previousLine here.
                // Remove the elements so that they do not interfere with the next searches.
                // I am assuming that you want to search between lines which are found between
                // ITRS tags. If you do not want this, simply omit this line.
                previousLines.Clear();
            }
        }
        else
        {
            previousLines[number] = line; // the most recent line wins if a number repeats
        }
    }
}
I'm trying to parse a text file that has a heading and the body. In the heading of this file, there are line number references to sections of the body. For example:
SECTION_A 256
SECTION_B 344
SECTION_C 556
This means, that SECTION_A starts in line 256.
What would be the best way to parse this heading into a dictionary and then when necessary read the sections.
Typical scenarios would be:
Parse the header and read only section SECTION_B
Parse the header and read the first paragraph of each section.
The data file is quite large and I definitely don't want to load all of it to the memory and then operate on it.
I'd appreciate your suggestions. My environment is VS 2008 and C# 3.5 SP1.
You can do this quite easily.
There are three parts to the problem.
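1) How to find where a line in the file starts. The only way to do this is to scan the file, keeping a list that records the start position in the file of each line. Note that a StreamReader reads ahead in blocks, so its BaseStream.Position does not correspond to the line it has just returned; count bytes yourself instead. The code below assumes an ASCII or UTF-8 file in which every line ends with '\n'. e.g.
List<long> lineMap = new List<long>();
lineMap.Add(0); // Line 0 starts at location 0 in the data file (just a dummy entry)
lineMap.Add(0); // Line 1 starts at location 0 in the data file
using (FileStream fs = File.OpenRead("DataFile.txt"))
{
    byte[] buffer = new byte[64 * 1024];
    long position = 0; // number of bytes scanned so far
    int read;
    while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            position++;
            if (buffer[i] == (byte)'\n')
                lineMap.Add(position); // the next line starts right after this newline
        }
    }
}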
2) Read and parse your index file into a dictionary.
Dictionary<string, int> index = new Dictionary<string, int>();
using (StreamReader sr = new StreamReader("IndexFile.txt"))
{
    String line;
    while ((line = sr.ReadLine()) != null)
    {
        // Break the line into the name & line number
        string[] parts = line.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        index.Add(parts[0], Convert.ToInt32(parts[1]));
    }
}
3) Then to find a line in your file, use:
int lineNumber = index["SECTION_B"]; // Convert section name into the line number
long offsetInDataFile = lineMap[lineNumber]; // Convert line number into file offset
Then open a new FileStream on DataFile.txt, Seek(offsetInDataFile, SeekOrigin.Begin) to move to the start of the line, and use a StreamReader (as above) to read line(s) from it.
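That last step might look something like the sketch below. LineSeeker and ReadLinesAt are hypothetical names, and UTF-8 text is assumed. The method takes any seekable Stream, which in real use would be a FileStream opened on DataFile.txt, with offsetInDataFile passed as the offset.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineSeeker
{
    // Hypothetical helper: reads up to 'maxLines' lines starting at byte
    // position 'offset' of a seekable stream.
    public static List<string> ReadLinesAt(Stream data, long offset, int maxLines)
    {
        var lines = new List<string>();
        data.Seek(offset, SeekOrigin.Begin);
        // A fresh reader after every seek, so no stale buffered text is returned.
        // It is deliberately not disposed here, because disposing a StreamReader
        // also closes the underlying stream, which the caller owns.
        var reader = new StreamReader(data, Encoding.UTF8);
        string line;
        while (lines.Count < maxLines && (line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

Because only the requested lines are read, memory use stays small no matter how large the data file is.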
Well, obviously you can store the name + line number into a dictionary, but that's not going to do you any good.
Well, sure, it will allow you to know which line to start reading from, but the problem is, where in the file is that line? The only way to know is to start from the beginning and start counting.
The best way would be to write a wrapper that decodes the text contents (if you have encoding issues) and can give you a line number to byte position type of mapping, then you could take that line number, 256, and look in a dictionary to know that line 256 starts at position 10000 in the file, and start reading from there.
Is this a one-off processing situation? If not, have you considered stuffing the entire file into a local database, like a SQLite database? That would allow you to have a direct mapping between line number and its contents. Of course, that file would be even bigger than your original file, and you'd need to copy data from the text file to the database, so there's some overhead either way.
Just read the file one line at a time and ignore the data until you get to the ones you need. You won't have any memory issues, but performance probably won't be great. You can do this easily in a background thread though.
Read the file until the end of the header, assuming you know where that is. Split the strings you've stored on whitespace, like so:
Dictionary<string, int> sectionIndex = new Dictionary<string, int>();
List<string> headers = new List<string>(); // fill these with readline
foreach(string header in headers) {
    var s = header.Split(new[]{' '});
    sectionIndex.Add(s[0], Int32.Parse(s[1]));
}
Find the dictionary entry you want, keep a count of the number of lines read in the file, and loop until you hit that line number, then read until you reach the next section's starting line. Dictionary<TKey, TValue> does not guarantee the order in which its keys are enumerated, so you will need to look up both the current and the next section's starting lines explicitly.
Be sure to do some error checking to make sure the section you're reading to isn't before the section you're reading from, and any other error cases you can think of.
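The counting approach could be sketched as follows. SectionReader, ReadSection and NextSectionStart are hypothetical names, and the sketch assumes that the line numbers in the header are 1-based and counted from the start of whatever the reader is reading.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SectionReader
{
    // Returns lines startLine .. endLine - 1 (1-based), reading sequentially
    // and discarding everything before the section.
    public static List<string> ReadSection(TextReader reader, int startLine, int endLine)
    {
        if (endLine <= startLine)
            throw new ArgumentException("The section must end after it starts.");
        var section = new List<string>();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (lineNumber >= endLine)
                break; // reached the next section
            if (lineNumber >= startLine)
                section.Add(line);
        }
        return section;
    }

    // Finds the smallest start line that is greater than the given section's
    // start, without relying on dictionary order. Returns int.MaxValue for the
    // last section, so that ReadSection reads to the end of the file.
    public static int NextSectionStart(Dictionary<string, int> sectionIndex, string name)
    {
        int start = sectionIndex[name];
        int next = int.MaxValue;
        foreach (int candidate in sectionIndex.Values)
        {
            if (candidate > start && candidate < next)
                next = candidate;
        }
        return next;
    }
}
```

Only the lines of the requested section are kept in memory; the cost is that every line before the section still has to be read and thrown away.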
You could read line by line until all the heading information is captured and stop (assuming all section pointers are in the heading). You would have the section and line numbers for use in retrieving the data at a later time.
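try
{
    // The using block closes the reader even if an exception is thrown.
    using (TextReader tr = new StreamReader("filename.txt"))
    {
        string dataRow;
        // ReadLine returns null at the end of the file.
        while ((dataRow = tr.ReadLine()) != null)
        {
            // Stop at the first line that is not a section pointer.
            if (!dataRow.StartsWith("SECTION_", StringComparison.Ordinal))
                break;
            //Parse line for section code and line number and log values
        }
    }
}
catch (Exception ex)
{
    MessageBox.Show(ex.Message);
}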