I'm creating an application that will read a large .dat file and return a specific selection of text from each line. Please see the example data below.
22/06/2016 22:18:21.209 Type6 -92.31435 2.06424 0.07686
22/06/2016 22:18:21.210 Type34 -91.4085 1.84464 -0.09333
I need the first three fields: the date, time and type. The values after the type go on for a while, and I have a large number of rows to collect from. I have thought about just splitting each line and taking the first three fields. Would this work, or is there an easier way to do this?
Thanks
You are on the right track (extracting just three fields); I suggest using LINQ here, e.g.
var source = File
    .ReadLines(@"C:\MyData.dat")
    .Select(line => line.Split(new char[] { ' ' }, 4))
    .Where(items => items.Length >= 3) // it seems that you have empty lines or something
    .Select(items => new {
        // Let's combine date and time into DateTime
        date = DateTime.ParseExact(items[0] + " " + items[1],
                                   @"dd/MM/yyyy H:m:s.fff",
                                   CultureInfo.InvariantCulture),
        kind = items[2] });
// .ToArray(); // you may want to add materialization (i.e. read once and put into an array)
Having got this LINQ query, you can easily filter and present the data you want, e.g.
var test = source
.Where(item => item.date > DateTime.Now.AddDays(-3)) // let's have fresh records only
.OrderByDescending(item => item.date)
.Select(item => $"{item.date} {item.kind}");
Console.Write(string.Join(Environment.NewLine, test));
You could make something just to read the first chars of each line, but the length of the line is not specified anywhere, so you have to read all the data.
You should use File.ReadLines(path) because it loads the data lazily, one line per iteration. For each line, check what data you need and save it wherever you like:
var relevantData = new List<T>();
foreach(var line in File.ReadLines(path))
{
// parse the data you need.
relevantData.Add( new T { Date = whatever, ..... });
}
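As a rough sketch, the loop above could be made concrete as follows. The in-memory `lines` array stands in for `File.ReadLines(path)` so the snippet runs without a file, and the tuple shape `(Timestamp, Kind)` is an assumption made for illustration, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Stand-in for File.ReadLines(path); the sample rows come from the question.
IEnumerable<string> lines = new[]
{
    "22/06/2016 22:18:21.209 Type6 -92.31435 2.06424 0.07686",
    "",
    "22/06/2016 22:18:21.210 Type34 -91.4085 1.84464 -0.09333",
};

var relevantData = new List<(DateTime Timestamp, string Kind)>();

foreach (var line in lines)
{
    // Split into at most 4 parts: date, time, type, and the untouched remainder.
    var parts = line.Split(new[] { ' ' }, 4);
    if (parts.Length < 3)
        continue; // skip blank or malformed lines

    // Date and time are parsed together; a line that does not match is skipped.
    if (DateTime.TryParseExact(parts[0] + " " + parts[1],
                               "dd/MM/yyyy HH:mm:ss.fff",
                               CultureInfo.InvariantCulture,
                               DateTimeStyles.None,
                               out var timestamp))
    {
        relevantData.Add((timestamp, parts[2]));
    }
}

foreach (var (ts, kind) in relevantData)
    Console.WriteLine($"{ts:yyyy-MM-dd HH:mm:ss.fff} {kind}");
```

Limiting the split to four parts means the long tail of numeric values is never broken up, which keeps the per-line work small on large files.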
If you need to parse it multiple times, you could create an index file that contains the start index of each line.
I would like to know how to do this with C#:
I have a CSV file with multiple columns as follows:
I would like to concatenate the result of all the lines of the first column to have:
Name = NDECINT, NDEC, NFAC, ORIGIN .....
You said all C#. This is done with .NET 5.
var yourData = File.ReadAllLines("yourFile.csv")
.Skip(1)
.Select(x => x.Split(','))
.Select(x => new
{
Name = x[0] //only working with Name column
,Type = int.Parse(x[1]) //Only added for reference for handling more columns
});
string namesJoined = string.Join(',', yourData.Select(x => x.Name));
This is really basic code and does not handle the crazy things that can be inside a csv like a comma in the name for example.
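For the comma-inside-a-name case mentioned above, a minimal sketch of a quote-aware splitter might look like this. `SplitCsvLine` is a hypothetical helper, the sample rows are made up (only the names NDECINT, NDEC and NFAC come from the question), and the array stands in for the file contents. It does not cover quoted fields that span several lines, so a dedicated CSV library is likely the better choice for real data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: splits one CSV line, honouring double-quoted fields.
List<string> SplitCsvLine(string line)
{
    var fields = new List<string>();
    var current = new StringBuilder();
    bool inQuotes = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (inQuotes)
        {
            if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
            {
                current.Append('"'); // "" inside quotes is an escaped quote
                i++;
            }
            else if (c == '"')
            {
                inQuotes = false;    // closing quote
            }
            else
            {
                current.Append(c);
            }
        }
        else if (c == '"')
        {
            inQuotes = true;         // opening quote
        }
        else if (c == ',')
        {
            fields.Add(current.ToString()); // field boundary
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }

    fields.Add(current.ToString()); // last field
    return fields;
}

// Stand-in for the file contents; these rows are invented for illustration.
var rows = new[]
{
    "Name,Type",
    "NDECINT,1",
    "\"NDEC, padded\",2",
    "NFAC,3",
};

// Skip the header, take the first column of each row, and join the names.
string namesJoined = string.Join(",", rows.Skip(1).Select(r => SplitCsvLine(r)[0]));
Console.WriteLine(namesJoined);
```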
This solution is for SSIS.
Add a variable called concat set equal to ""
Read the file using SSIS.
Add a script component
Pass in Row A and add variable
Set variable to concat += RowA + ","
When you are done, you will have an extra "," on the variable that needs to be removed.
Use an expression.
concat = left(concat, len(concat)-1)
As part of a data cleansing exercise I need to correct the formatting of a csv file.
Due to poor formatting/lack of quotes an extra comma in a description field is breaking my DTS package.
So, to get around this I have created a simple C# script to find any line in the csv that contains more columns than the header row.
When the row contains more columns than the header I want to merge array item [10] and [11] into one column and then write the line to my new file - keeping all the other existing columns as they are.
Code:
var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
if (headers == null) headers = new string[columns.Length];
if (columns.Length != headers.Length)
{
// TODO - Linq to write comma separated string but merge column 10 and 11 of the array
// writer.WriteLine(string.Join(delimiter, columns));
}
else
{
writer.WriteLine(string.Join(delimiter, columns));
}
Unfortunately, my Linq writing skills are somewhat lacking, can someone please help me fill in the TODO.
Simply use a list for the columns instead of an array. That will allow you to remove the unnecessary column after the merge:
var columns = splitExpression.Split(line).Where(s => s != delimiter).ToList();
if (headers == null) headers = new string[columns.Count];
if (columns.Count != headers.Length)
{
columns[10] = columns[10] + columns[11]; // combine columns here
columns.RemoveAt(11);
}
writer.WriteLine(string.Join(delimiter, columns));
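A small variation on the merge above, sketched on a three-column sample so it runs on its own. `MergeColumns` is a hypothetical helper and the sample row and headers are invented. Unlike the snippet above, it puts the delimiter back between the two halves and quotes the merged field; that assumes whatever reads the new file understands quoted fields, which the question does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: merges the column at `index` with the one after it,
// re-inserting the delimiter that the split removed.
List<string> MergeColumns(IEnumerable<string> source, int index, string delimiter)
{
    var merged = source.ToList();
    if (index < 0 || index + 1 >= merged.Count)
        return merged; // nothing to merge at that position

    merged[index] = merged[index] + delimiter + merged[index + 1];
    merged.RemoveAt(index + 1);
    return merged;
}

// Invented sample: the description "Widget, large" contains an unquoted comma.
var headers = new[] { "id", "description", "price" };
var columns = "7,Widget, large,9.99".Split(',');

var fixedColumns = columns.ToList();
if (columns.Length > headers.Length)
{
    fixedColumns = MergeColumns(columns, 1, ",");
    // Quote the merged field so the comma inside it no longer splits the row.
    fixedColumns[1] = "\"" + fixedColumns[1] + "\"";
}

Console.WriteLine(string.Join(",", fixedColumns));
```

For the file in the question the index would be 10 rather than 1.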
I have a problem using LINQ on a text file. The file has the following structure:
================ 09.01.2017 [8:51:11] created by VBScript ================
....some text
============================= END =============================
================ 16.01.2017 [9:49:09] created by VBScript ================
....some text
============================= END =============================
================ 18.01.2017 [8:43:50] created by VBScript ================
....some text
============================= END =============================
etc
So I want to select all lines from that file that start and end with "=" and get their indexes (positions) in the file.
First step: I've opened the file and converted it to a List (because it's easier to work with a list)
string filekvitErrorGroupsResource = Utils.ReadTextResource(resourceName, Assembly.GetExecutingAssembly());
string[] stringSeparators = {"\r\n"};
string[] lines = filekvitErrorGroupsResource.Split(stringSeparators, StringSplitOptions.None);
return new List<string>(lines);
Second step: I've tried a simple lambda query that filters the list by a condition:
var myQuery = lines.Where(l => l.StartsWith("=") && l.EndsWith("="))
.Select(l => new {idx = lines.IndexOf(l), body = l});
PROBLEM: As a result, I expected to receive a list of strings with unique indexes (idx), but instead I received this:
So as you can see the line with "END" isn't unique, why?
a.IndexOf(b) returns the index of the first occurrence of b within a, so the index of === END === is always the same.
Instead, you can use an overload of Select which takes Func<TSource, int, TResult> as a parameter so that you can get an index of the element.
var myQuery = lines
.Select((l, i) => new {idx = i, body = l})
.Where(l => l.body.StartsWith("=") && l.body.EndsWith("="));
You can get distinct indexes by calling Select first and then Where.
var myQuery = lines.Select((l,idx) => new {idx = idx, body = l}).Where(m => m.body.StartsWith("=") && m.body.EndsWith("="));
Here is the fiddle: https://dotnetfiddle.net/JW7S1s
Edit : Answer updated as per comment.
Your issue is that all of the END lines are identical strings, so every call to lines.IndexOf(..) returns the first matching instance. You'll need to introduce a new method (maybe name it NextIndexOf) that takes the list and keeps track of the last index it returned.
Each subsequent call to NextIndexOf would start looking from where it left off last time.
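One possible way to sketch that idea, without writing a separate NextIndexOf method, is the `List<T>.IndexOf(item, index)` overload, which starts searching at a given position. The shortened sample lines and the `searchFrom` counter are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Shortened stand-in for the file from the question.
var lines = new List<string>
{
    "=== 09.01.2017 ===",
    "some text",
    "=== END ===",
    "=== 16.01.2017 ===",
    "some text",
    "=== END ===",
};

var matches = new List<(int Index, string Body)>();
int searchFrom = 0; // position where the next search resumes

foreach (var line in lines)
{
    if (!(line.StartsWith("=") && line.EndsWith("=")))
        continue;

    // IndexOf(item, startIndex) searches from startIndex onwards,
    // so repeated "END" lines resolve to successive positions.
    int idx = lines.IndexOf(line, searchFrom);
    matches.Add((idx, line));
    searchFrom = idx + 1;
}

foreach (var (index, body) in matches)
    Console.WriteLine($"{index}: {body}");
```

The indexed Select overload shown in the other answers is simpler; this version only illustrates how a resumable search would behave.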
I have a .csv file with a list of abbreviations and their actual meaning e.g.
Laughing Out Loud, LOL
I need to be able to search for an abbreviation in a text box and replace the abbreviation with the actual words. This is what I have attempted so far to understand dictionaries but cannot work out how to read in values from the file.
Dictionary<string, string> Abbreviations = new Dictionary<string, string>();
Abbreviations.Add("Laughing Out Loud", "lol");
foreach (KeyValuePair<string, string> abbrev in Abbreviations)
{
txtinput.Text = txtinput + "<<" + abbrev.Key + ">>";
}
You can try this LINQ solution. The GroupBy handles the case where a key appears in the file multiple times.
Dictionary<string, string[]> result =
File.ReadLines("test.csv")
.Select(line => line.Split(','))
.GroupBy(arr => arr[0])
.ToDictionary(gr => gr.Key,
gr => gr.Select(s => s[1]).ToArray());
To check if the abbreviation in the TextBox exists in the Dictionary:
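foreach (KeyValuePair<string, string[]> abbrev in result)
{
    // abbrev.Value is an array, so test membership (needs System.Linq) instead of using ==.
    // Trim() drops the space that follows the comma in the file.
    if (abbrev.Value.Any(v => v.Trim() == txtinput.Text))
    {
        txtinput.Text = txtinput.Text + "<<" + abbrev.Key + ">>";
    }
}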
You can start by creating a StreamReader for your file, then loop over all the lines in the CSV and add them to the dictionary.
static void Main(string[] args)
{
    // declare your dictionary somewhere outside the loop.
    using (var csv_reader = new StreamReader(File.OpenRead(@"your_file_path")))
    {
        while (!csv_reader.EndOfStream)
        {
            // read the line and split it if you need to, e.g. with .Split(',')
            var line = csv_reader.ReadLine();
            // add to the dictionary here
        }
    }
    // Call another method for your search and replace.
    SearchAndReplace(your_input);
}
Then have the implementation of that method, search if the input exists in the dictionary and if it does replace it.
You could use LINQ to put the values of the csv into your dictionary, if that's easier for you.
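A minimal sketch of the two steps described above, filling the dictionary and then searching and replacing. The `csvLines` array stands in for the lines read from the file, the "Be Right Back, BRB" row is made up, and `SearchAndReplace` here is just one hypothetical implementation that works word by word.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the file: one "meaning, abbreviation" pair per line.
// The second row is invented for illustration.
var csvLines = new[] { "Laughing Out Loud, LOL", "Be Right Back, BRB" };

// Abbreviation -> meaning; lookups ignore case so "lol" also matches.
var abbreviations = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
foreach (var line in csvLines)
{
    int comma = line.LastIndexOf(',');
    if (comma < 0)
        continue; // not a "meaning, abbreviation" line
    abbreviations[line.Substring(comma + 1).Trim()] = line.Substring(0, comma).Trim();
}

// Hypothetical helper: swaps each space-separated word that is a known abbreviation.
string SearchAndReplace(string input)
{
    var words = input.Split(' ');
    for (int i = 0; i < words.Length; i++)
    {
        if (abbreviations.TryGetValue(words[i], out var meaning))
            words[i] = meaning;
    }
    return string.Join(" ", words);
}

Console.WriteLine(SearchAndReplace("brb in five, that was funny lol"));
```

Because the lookup is per word, an abbreviation with punctuation attached (such as "lol!") would not be replaced by this sketch.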
I'm going to assume that your input file may have commas in the actual text, and not just separating the two fields.
Now, if that were the case, then the standard CSV file format would format the file like this:
Laughing Out Loud,LOL
"I Came, I Saw, I Conquered",ICISIC
However, from your example you have a space before the "LOL", so it doesn't appear that you're using standard CSV.
So I'll work on this input:
Laughing Out Loud, LOL
"I Came, I Saw, I Conquered",ICISIC
"to, too, or two", 2
because,B/C
For this input then this code produces a dictionary:
var dictionary =
(
from line in File.ReadAllLines("FILE.CSV")
let lastComma = line.LastIndexOf(',')
let abbreviation = line.Substring(lastComma + 1).Trim()
let actualRaw = line.Substring(0, lastComma).Trim()
let actual = actualRaw.StartsWith("\"") && actualRaw.EndsWith("\"")
? actualRaw.Substring(1, actualRaw.Length - 2)
: actualRaw
select new { abbreviation, actual }
).ToDictionary(x => x.abbreviation, x => x.actual);
You can go one better than this though. It's quite possible to create a "super function" that will do all of the replaces in one go for you.
Try this:
var translate =
(
from line in File.ReadAllLines("FILE.CSV")
let lastComma = line.LastIndexOf(',')
let abbreviation = line.Substring(lastComma + 1).Trim()
let actualRaw = line.Substring(0, lastComma).Trim()
let actual = actualRaw.StartsWith("\"") && actualRaw.EndsWith("\"")
? actualRaw.Substring(1, actualRaw.Length - 2)
: actualRaw
select (Func<string, string>)(x => x.Replace(abbreviation, actual))
).Aggregate((f1, f2) => x => f2(f1(x)));
Then I can do this:
Console.WriteLine(translate("It was me 2 B/C ICISIC, LOL!"));
I get this result:
It was me to, too, or two because I Came, I Saw, I Conquered, Laughing Out Loud!
I have a file with this content :
1,2,3,4,5#
1,2,3,4,5#
How can I read all lines using ReadLine? The important thing is that I need to separate the values in each line; that is, the first line's values 1,2,3,4,5 should be separated.
Suppose I have an array named myarray that can hold all the values in the first line. The array should look like this:
myarray[0]=1
myarray[1]=2
myarray[2]=3
myarray[3]=4
myarray[4]=5
I am very new to IO in C#.
Best regards
Using LINQ you can do:
List<string[]> list = File.ReadLines("YourFile.txt")
.Select(r => r.TrimEnd('#'))
.Select(line => line.Split(','))
.ToList();
File.ReadLines would read the file line by line.
.Select(r => r.TrimEnd('#')) would remove the # from the end of the line.
.Select(line => line.Split(',')) would split the line on comma and return an array of string items.
ToList() would give you a List<string[]> back.
You can also use TrimEnd and Split in a single Select statement like below, (it would result in the same output):
List<string[]> list = File.ReadLines("YourFile.txt")
.Select(r => r.TrimEnd('#').Split(','))
.ToList();
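If the values are needed as numbers rather than strings, the same pipeline can be extended with a parse step. This is a sketch that assumes every field is a valid integer; the `fileLines` array stands in for `File.ReadLines("YourFile.txt")`.

```csharp
using System;
using System.Linq;

// Stand-in for File.ReadLines("YourFile.txt"); rows taken from the question.
var fileLines = new[] { "1,2,3,4,5#", "1,2,3,4,5#" };

int[][] rows = fileLines
    .Where(line => !string.IsNullOrWhiteSpace(line))   // ignore blank lines
    .Select(line => line.TrimEnd('#')                  // drop the trailing '#'
                        .Split(',')                    // separate the values
                        .Select(s => int.Parse(s))     // throws if a field is not an integer
                        .ToArray())
    .ToArray();

int[] myarray = rows[0]; // values of the first line
Console.WriteLine(string.Join(" ", myarray));
```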
Try this
string[] readText = File.ReadAllLines(path);
That will return an array of all the lines.
https://msdn.microsoft.com/en-us/library/s2tte0y1(v=vs.110).aspx
You can use a StreamReader to read the whole file and split its contents on a given delimiter (,).
var filename = @"C:\data.txt";
using (var sr = new StreamReader(filename))
{
    var contents = sr.ReadToEnd();
    var myarray = contents.Split(',');
}
Although I do prefer the LINQ approach answer further up.