I need to process a large amount of csv data in real time as it is spat out by a TCP port. Here is an example as displayed by Putty:
MSG,3,1920,742,4009C5,14205994,2017/01/29,20:14:27.065,2017/01/29,20:14:27.972,,8000,,,51.26582,-0.33783,,,0,0,0,0
MSG,4,1920,742,4009C5,14205994,2017/01/29,20:14:27.065,2017/01/29,20:14:27.972,,,212.9,242.0,,,0,,,,,
MSG,1,1920,742,4009C5,14205994,2017/01/29,20:14:27.065,2017/01/29,20:14:27.972,BAW469,,,,,,,,,,,
MSG,3,1920,742,4009C5,14205994,2017/01/29,20:14:27.284,2017/01/29,20:14:27.972,,8000,,,51.26559,-0.33835,,,0,0,0,0
MSG,4,1920,742,4009C5,14205994,2017/01/29,20:14:27.284,2017/01/29,20:14:27.972,,,212.9,242.0,,,0,,,,,
I need to put each line of data in string (line) into an array (linedata[]) so that I can read and process certain elements, but linedata = line.Split(','); seems to ignore the many empty elements, with the result that linedata[20], for example, may or may not exist, and if it doesn't I get an error if I try to read it. Even if element 20 in the line contains a value it won't necessarily be the 20th element in the array. And that's no good.
I can work out how to parse line character by character into linedata[], inserting an empty string where appropriate, but surely there must be a better way? Have I missed something obvious?
Many Thanks. Perhaps I'd better add that I'm quite new to C#, my past experience is all with Delphi 7. I really miss stringlists.
Edited: sorry, this is now resolved with the help of MSDN's documentation. This code works: lineData = line.Split(separators, StringSplitOptions.None); after setting "string[] separators = { "," };". My big mistake was to follow examples found on tutorial sites which didn't give any clues that the .split method had any options.
https://msdn.microsoft.com/en-us/library/system.stringsplitoptions(v=vs.110).aspx
That link has an example section, look at example 1b specifically. There is an extra parameter to Split called StringSplitOptions which does this.
For Example:
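char[] charSeparators = { ',' };
string[] linedata = line.Split(charSeparators, StringSplitOptions.None);
// The loop variable must not reuse the name "line", which already holds the input.
foreach (string field in linedata)
{
    Console.Write("<{0}>", field);
}
Console.Write("\n\n");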
The way to find this sort of information is to start with the Reference Documentation for the function, and hope it has an option or a link to a similar function.
If you want to also start validating types, handling variants in the format etc... you could move up to a CSV library. If you do not need that functionality, this is the easiest way and efficient for small files.
Some of the overloads for String.Split() take a StringSplitOptions argument, and if you use the RemoveEmptyEntries option, it will...remove the empty entries. So you can specify the None option:
linedata = line.Split(new [] { ',' }, StringSplitOptions.None);
Or better yet, use the overload that doesn't take a StringSplitOptions, which treats it as None by default:
linedata = line.Split(',');
The code in your question indicates that you are doing this, but your description of the problem suggests that you are not.
However, you're probably better off using an actual CSV parser, which would handle things like unescaping and so on.
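To make the difference between the two options concrete, here is a small sketch. The class and method names (SplitDemo, KeepEmpty, DropEmpty) are made up for illustration and are not part of any library; only String.Split and StringSplitOptions come from the framework.

```csharp
using System;

public static class SplitDemo
{
    // Keeps empty fields, so every column stays at its original index.
    public static string[] KeepEmpty(string line)
    {
        return line.Split(new[] { ',' }, StringSplitOptions.None);
    }

    // Drops empty fields, which shifts later columns to lower indexes.
    public static string[] DropEmpty(string line)
    {
        return line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the second sample line in the question (the MSG,4 one), KeepEmpty should return 22 elements with 212.9 at index 12, while DropEmpty returns only 13 elements and moves 212.9 to index 10, which matches the symptom described in the question.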
The StringReader class provides methods for reading lines, characters, or blocks of characters from a string. Hope this could be the clue
string str = @"MSG,3,1920,742,4009C5,14205994,2017/01/29,20:14:27.065,2017/01/29,20:14:27.972,,8000,,,51.26582,-0.33783,,,0,0,0,0
MSG,4,1920,742,4009C5,14205994,2017/01/29,20:14:27.065,2017/01/29,20:14:27.972,,,212.9,242.0,,,0,,,,,
MSG,1,1920,742,4009C5,14205994,2017/01/29,20:14:27.065,2017/01/29,20:14:27.972,BAW469,,,,,,,,,,,
MSG,3,1920,742,4009C5,14205994,2017/01/29,20:14:27.284,2017/01/29,20:14:27.972,,8000,,,51.26559,-0.33835,,,0,0,0,0
MSG,4,1920,742,4009C5,14205994,2017/01/29,20:14:27.284,2017/01/29,20:14:27.972,,,212.9,242.0,,,0,,,,,";
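using (StringReader reader = new StringReader(str))
{
    string line;
    // ReadLine returns null at the end of the input. Testing with Read() instead
    // would consume the first character of the next line.
    while ((line = reader.ReadLine()) != null)
    {
        string[] linedata = line.Split(',');
    }
}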
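While you should look into the various ways the String class can help you here, sometimes the quick and dirty "MAKE it fit" option is called for. In this case, that'd be to roll through the strings in advance and ensure you have at least one character between the commas. Be aware that a single Replace pass misses some pairs in a run of three or more commas (",,," becomes ", ,,"), and that the padded fields hold a space rather than an empty string.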
public static string FixIt(string s)
{
return s.Replace(",,", ", ,");
}
You should be able to:
var lineData = FixIt(line).Split(',');
Edit: In response to the question below, I'm not sure what you meant, but if you mean doing it without creating a helper method, you can do so easily. The code will be harder to read and troubleshoot if you do it in one line though. My personal rule is, if you have to do it a LOT, it should probably be a method. If you only had to do it once, this is particularly clean. I'd actually do it this way and just wrap it in a method that does all the work for you.
var lineData = line.Replace(",,", ", ,").Split(',');
As a method, it'd be:
public static string[] GiveMeAnArray(string s)
{
return s.Replace(",,", ", ,").Split(',');
}
I have an application where I have to provide number of parameters in the format Name:Value
I provide the list of parameters through the Command line arguments value under "Debug" section of the project
So it looks something like this: "MyJobName" "0" "#FullFilePath:C:\MyFile.txt" "#FileType:MyFileType" "#FileDate:20200318" "#FileID:MyAppID"
One parameter is FilePath:C:\FileDir\MyFileTxt.txt
So, when the following logic is applied:
for (int i = 2; i <= args.GetLength(0) - 1; i++)
{
L.Add(args[i].Split(':')[0], args[i].Split(':')[1]);
}
My Parameter looks like that: FilePath:C, ignoring the rest of the path.
The final parameter list that I need to pass to the Stored Procedure should have "Name:Value" format
How can I fix that?
Split lets you pass the maximum array length.
See String.Split(Char[], Int32):
Splits a string into a maximum number of substrings based on the characters in an array. You also specify the maximum number of substrings to return.
Sample:
var keyValue = args[i].Split(new char[]{ ':' }, 2);
L.Add(keyValue[0], keyValue[1]);
This way only the first : is taken. The other : that come after it are ignored and will be part of the second item in the array.
But I honestly advise you to use a proper parameter parser, because your approach is very easy to break and very very fragile.
https://github.com/commandlineparser/commandline
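If you stay with hand-rolled parsing for now, the two-item Split can be wrapped up roughly as follows. This is only a sketch: ArgParser, Parse and the startIndex parameter are hypothetical names, and silently skipping arguments without a colon is an arbitrary choice you may want to replace with an error.

```csharp
using System;
using System.Collections.Generic;

public static class ArgParser
{
    // Parses "Name:Value" arguments, splitting only at the first colon so
    // that values such as "C:\MyFile.txt" keep their own colons.
    public static Dictionary<string, string> Parse(string[] args, int startIndex)
    {
        var result = new Dictionary<string, string>();
        for (int i = startIndex; i < args.Length; i++)
        {
            string[] keyValue = args[i].Split(new[] { ':' }, 2);
            if (keyValue.Length != 2)
            {
                // No colon at all: skip the argument (assumption, see above).
                continue;
            }
            result[keyValue[0]] = keyValue[1];
        }
        return result;
    }
}
```

Calling ArgParser.Parse(args, 2) mirrors the loop in the question, which starts at the third argument.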
Have a look at DragonFruit and System.CommandLine instead of writing your arguments parser yourself.
It's a way to have type-safe arguments in your Main method.
Scott Hanselman has a great blog post about it.
The great part is that your XML comments are used to generate a help message.
The moment you use Split, you exclude the delimiter from being a valid character unless you use one of the extra overloads. So if you absolutely must use a colon as your delimiter, you can either use the Split overload suggested above or write extra code to address it; below is how I would parse it.
Of course, a much easier alternative (if possible) would be to change your delimiter to something you know the values will never contain, such as a pipe, a tilde or a backtick (|, ~, `). Then Split would work cleanly.
"#FullFilePath:C:\MyFile.txt" "#FileType:MyFileType" "#FileDate:20200318" "#FileID:MyAppID"
If your parameters always have the format #ParameterName:ParameterValue, your best bet is to parse the command line args like so:
var argumentsList = new Dictionary<string,object>();
for (int i=2; i < args.Length; i++)
{
int colonIndex = args[i].IndexOf(":");
// Substring's second argument is a length, so colonIndex covers everything before the colon.
string parameterName = args[i].Substring(0, colonIndex);
string parameterValue = args[i].Substring(colonIndex + 1);
argumentsList[parameterName] = parameterValue;
}
The scope of your question centers around how to get around the colon, so however you choose to store the parameter values is up to you, I just used the dictionary as an example to help wrap up the code.
This will skip FilePath and give you C:\FileDir\MyFileTxt.txt
string.Join(":", args[i].Split(':').Skip(1));
I'm just curious: is there a way to break up multiple strings in a GridView cell and store or display them one by one?
Earlier, when I used MessageBox.Show, it displayed the whole list of names or numbers, like
abdullah ali ashonie; adefitri; candry. What I want is to display them one by one (abdullah ali ashonie, then adefitri, then candry) and to know how to store them.
Sorry for my bad English; I'm not quite sure you'll understand what I want.
The simple way is String.Split():
var parts = GridView1.Rows[0].Cells[0].Text.Split(";".ToCharArray());
Just be warned: String.Split() has all kinds of pitfalls and gotchas. If you can't put meaningful constraints on the possible values — be absolutely certain you won't find things like new-lines or other semi-colon(;) characters as part of individual names, have quoted text, etc — you should really look into a dedicated delimited text parser. There are three (at least) built into the .Net Framework (see TextFieldParser as one option), and a plethora more on NuGet.
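As a rough illustration of the TextFieldParser route, something like the following should work. It assumes a reference to the Microsoft.VisualBasic assembly, and NameSplitter and SplitNames are names invented for this sketch.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class NameSplitter
{
    // Splits semicolon-delimited text into names. Quoted names may contain
    // semicolons themselves, which a plain String.Split cannot handle.
    public static List<string> SplitNames(string cellText)
    {
        var names = new List<string>();
        using (var parser = new TextFieldParser(new StringReader(cellText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            parser.HasFieldsEnclosedInQuotes = true;
            parser.TrimWhiteSpace = true; // removes the space after each ';'
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (fields != null)
                {
                    names.AddRange(fields);
                }
            }
        }
        return names;
    }
}
```

You would pass it the cell text, for example NameSplitter.SplitNames(GridView1.Rows[0].Cells[0].Text), and then loop over the returned list.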
Look at String.Split
Returns a string array that contains the substrings in this instance that are delimited by elements of a specified string or Unicode character array.
For example:
string text = "abdullah ali ashonie; adefitri; candry";
string[] names = text.Split(';');
foreach (string name in names)
{
System.Console.WriteLine(name);
}
Outputs:
abdullah ali ashonie
adefitri
candry
There is some more information here too
I'm not 100% sure I completely understand what you're trying to do, but this is a basic string split example:
string input = "abdullah ali ashonie; adefitri; candry";
string[] pieces = input.Split(';');
foreach (var s in pieces) {
Console.WriteLine(s.Trim());
}
Fiddle here.
:barbosza!barbosza@barbosza.tmi.twitch.tv PRIVMSG #pggamesbr :My text
I want the part after the second ':', but I can't simply split by ':' because it sometimes contains one too.
You can split and specify the maximum number of items, so that everything after the second colon ends up in the third item:
string[] parts = str.Split(new char[]{':'}, 3);
The part after the second colon is now in parts[2].
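A minimal sketch of that idea as a reusable method, with a guard for input that has fewer than two colons. IrcMessage and TextAfterSecondColon are hypothetical names, and returning an empty string for malformed input is just one possible choice.

```csharp
public static class IrcMessage
{
    // Returns everything after the second ':' in a raw line, or an empty
    // string if the line contains fewer than two colons.
    public static string TextAfterSecondColon(string raw)
    {
        // Limiting the result to 3 items leaves later colons inside parts[2].
        string[] parts = raw.Split(new[] { ':' }, 3);
        return parts.Length == 3 ? parts[2] : string.Empty;
    }
}
```

Because the line starts with a colon, parts[0] is an empty string, parts[1] holds the prefix and command, and parts[2] holds the message text with any colons it contains.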
I guess that "he contains it too" means "My text contains it too".
In that case, do this
string toFind = "#pggamesbr :";
string myText = myString.Substring(myString.IndexOf(toFind) + toFind.Length);
I like Guffa's simple Split solution, and would go with that if this is all you need here. But, just for fun...
If you run into a lot of odd cases like this -- things you wish were easier to do with strings -- you can consider adding extension methods to handle them. E.g.,
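using System;
public static class MyStringExtentions
{
    // Returns the text after the first occurrence of delimiter,
    // or an empty string if the delimiter is not found.
    public static string After(this string orig, char delimiter)
    {
        int p = orig.IndexOf(delimiter);
        if (p == -1)
            return string.Empty;
        else
            return orig.Substring(p + 1);
    }
}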
And then, in your existing code, as long as the namespace that contains MyStringExtentions is in scope (through a using directive if necessary):
string afterPart = myString.After(':').After(':');
Disclaimer: I didn't actually test this. Some tuning may be required. And it could probably be tuned to be more efficient, etc.
Again, this is probably overkill for this one problem. (See Guffa's perfectly good simple answer for that.) Just tossing it out for when you find yourself with lots of these and want a common way to make them available.
Ref. Extension Methods (C# Programming Guide)
I am reading a couple of csv files into var's as follows:
var myFullCsv = ReadFile(myFullCsvFilePath);
var masterCsv = ReadFile(csvFilePath);
Some of the line entries in each csv appear in both files and I am able to create a new var containing lines that exists in myFullCsv but not in masterCsv as follows:
var extraFilesCsv = myFullCsv.Except(masterCsv);
This is great because it's very simple. However, I now wish to identify lines in myFullCsv where a specific string appears in the line. The string will correspond to one column of the csv data. I know that I can do this by reading each line of the var and splitting it up, then comparing the field I'm interested in to the string that I am searching for. However, this seems like a very long and inefficient approach compared to my code above using the 'Except' command.
Is there some way that I can get the lines from myFullCsv with a very simple command or will I have to do it the long way? Please don't ask me to show the long way as that's what I am trying to avoid having to code although I can do it.
Sample csv data:
07801.jpg,67466,9452d316,\Folder1\FolderA\,
07802.jpg,78115,e50492d8,\Folder1\FolderB\,
07803.jpg,41486,37b6a100,\Folder1\FolderC\,
07804.jpg,93500,acdffc2b,\Folder2\FolderA\,
07805.jpg,67466,9452d316,\Folder2\FolderB\,
Sample desired output (I'm always looking for the entry in the 3rd column to match a string, in this case 9452d316):
07801.jpg,67466,9452d316,\Folder1\FolderA\,
07805.jpg,67466,9452d316,\Folder2\FolderB\,
Well you could use:
var results = myFullCsv.Where(line => line.Split(',')[2] == targetValue)
.ToList();
That's just doing the "splitting and checking" you mention in the question but it's pretty simple code. It could be more efficient if you only consider as far as the third comma, but I wouldn't worry about that until it's proved to be a problem.
Personally I'd probably parse each line to an object with meaningful properties rather than treating it as a string, but that's probably what you mean by "the long way".
Note that this doesn't perform any validation, or try to handle escaped commas, or lines with fewer columns etc. Depending on your data source, you may need to make it a lot more robust.
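For what it's worth, parsing to an object does not have to be very long. The sketch below is one way it might look; the CsvEntry and CsvFilter types and the property names (FileName, Size, Hash, Folder) are guesses at what the columns mean, and like the one-liner above it makes no attempt to handle quoted or escaped commas.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class CsvEntry
{
    public string FileName { get; private set; }
    public string Size { get; private set; }
    public string Hash { get; private set; }
    public string Folder { get; private set; }

    // Returns null for lines with fewer than four columns.
    public static CsvEntry Parse(string line)
    {
        string[] cols = line.Split(',');
        if (cols.Length < 4)
        {
            return null;
        }
        return new CsvEntry
        {
            FileName = cols[0],
            Size = cols[1],
            Hash = cols[2],
            Folder = cols[3]
        };
    }
}

public static class CsvFilter
{
    // Keeps only the entries whose third column equals targetValue.
    public static List<CsvEntry> WithHash(IEnumerable<string> lines, string targetValue)
    {
        return lines.Select(CsvEntry.Parse)
                    .Where(e => e != null && e.Hash == targetValue)
                    .ToList();
    }
}
```

With this in place, CsvFilter.WithHash(myFullCsv, "9452d316") gives you typed entries instead of raw lines, and short lines are skipped rather than causing an IndexOutOfRangeException.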
You could use a regex. It doesn't require every line to have at least 3 elements. It doesn't allocate a string array for each line. Therefore it may be faster, but you'd have to test it to prove it.
// [^,]* cannot run past a comma, so targetValue is only matched in the third column.
var regex = new Regex("^[^,]*,[^,]*," + Regex.Escape(targetValue) + ",");
var results = myFullCsv.Where(l => regex.IsMatch(l)).ToList();
My program reads a file which has thousands of lines of something like this below
"Timestamp","LiveStandby","Total1","Total2","Total3", etc..
each line is different
What is the best way to split by , and remove the quotes, as well as put the values in a list?
this is what I have
while ((line = file.ReadLine()) != null)
{
List<string> title_list = new List<string>(line.Split(','));
}
The step above is still missing the removal of the quotes. I could use a foreach, but that kind of defeats the purpose of doing the List and Split in just one line. What is the best and smartest way to do it?
The best way in my opinion is to use a library that parses CSV, such as FileHelpers.
Concretely, in your case, this would be the solution using the FileHelpers library:
Define a class that describes the structure of a record:
[DelimitedRecord(",")]
public class MyDataRecord
{
[FieldQuoted('"')]
public string TimeStamp;
[FieldQuoted('"')]
public string LiveStandby;
[FieldQuoted('"')]
public string Total1;
[FieldQuoted('"')]
public string Total2;
[FieldQuoted('"')]
public string Total3;
}
Use this code to parse the entire file:
var csvEngine = new FileHelperEngine<MyDataRecord>(Encoding.UTF8)
{
Options = { IgnoreFirstLines = 1, IgnoreEmptyLines = true }
};
var parsedItems = csvEngine.ReadFile(@"D:\myfile.csv");
Please note that this code is for illustration only and I have not compiled/run it. However, the library is pretty straightforward to use and there are good examples and documentation on the website.
Keeping it simple like this should work:
List<string> strings = new List<string>();
while ((line = file.ReadLine()) != null)
    strings.AddRange(line.Replace("\"", "").Split(','));
I'm going to clarify this a bit. If you have a user formatted file that has a predictable format (i.e. the user has generated the data out of Excel or a similar program) then you are way better off using an existing parser that is well tested.
Scenarios like the following are just a few examples that manual parsing will have problems with:
"column 1", 2, 0104400, $1,300, "This is an interestion question, he said"
... and there are more with escaping, file formats, etc. that can be a headache for a roll-your-own parser.
If you do use an existing parser, make sure you get one that can tolerate differences in the number of columns per row, as it can make a difference.
If, on the other hand, you know what's going into the data which is common in system generated files then using CSV parsers will cause more problems than they solve. For example, I have dealt with scenarios where the first part is fixed and can be strongly typed, but there are following parts in a row that are not. This can also happen if you're parsing flat file data in fixed width scenarios from legacy databases. A csv solution makes assumptions we don't want and is not the right solution in many of those cases.
If that is the case and you just want to strip out quotes after splitting on commas, then try a bit of linq. This can also be extended to replace specific characters you are worried about.
line.Split(',').Select(i => i.Replace("\"", "")).ToArray()
Hope that clears up all the conflicting advice.
You can use the Array.ConvertAll() function.
string line = "\"Timestamp\",\"LiveStandby\",\"Total1\",\"Total2\",\"Total3\"";
var list = new List<String>(Array.ConvertAll(line.Split(','), x=> x.Replace("\"","")));
Perform the Replace first, then Split into your List. Here's your code with Replace.
while ((line = file.ReadLine()) != null)
{
List<string> title_list = new List<string>(line.Replace("\"", "").Split(','));
}
Although, you're going to need a variable to hold all of the Lists, so look at using AddRange().
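A sketch of what that could look like with a single list that collects the values from every line. QuotedCsv and ReadAllValues are illustrative names, and taking a TextReader is an assumption made so that the same method works with a StreamReader or a StringReader.

```csharp
using System.Collections.Generic;
using System.IO;

public static class QuotedCsv
{
    // Reads every line, strips the double quotes, splits on commas and
    // appends the values to one list that outlives the loop.
    public static List<string> ReadAllValues(TextReader file)
    {
        var allValues = new List<string>();
        string line;
        while ((line = file.ReadLine()) != null)
        {
            allValues.AddRange(line.Replace("\"", "").Split(','));
        }
        return allValues;
    }
}
```

If you need to keep the rows separate, a List<List<string>> with Add in place of AddRange would be the obvious variation. As noted in the other answers, this only works when the values never contain commas or quotes of their own.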