I need to delimit text by a single character, a comma. But I only want to treat a comma as a delimiter if it is not enclosed in quotation marks.
An example:
Method,value1,value2
Would contain three values: Method, value1 and value2
But:
Method,"value1,value2"
Would contain two values: Method and "value1,value2"
I'm not really sure how to go about this as when splitting a string I would use:
String.Split(',');
But that would split on ALL commas. Is this possible without getting overly complicated and having to manually check every character of the string?
Thanks in advance
Copied from my comment: Use an available CSV parser like Microsoft.VisualBasic.FileIO.TextFieldParser.
As requested, here is an example for the TextFieldParser:
var allLineFields = new List<string[]>();
string sampleText = "Method,\"value1,value2\"";
var reader = new System.IO.StringReader(sampleText);
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
{
parser.Delimiters = new string[] { "," };
parser.HasFieldsEnclosedInQuotes = true; // <--- !!!
string[] fields;
while ((fields = parser.ReadFields()) != null)
{
allLineFields.Add(fields);
}
}
The list now contains a single string[] with two strings. I have used a StringReader because this sample reads from a string; if the source is a file, use a StreamReader (for example via File.OpenText).
You can try Regex.Split() to split the data up using the pattern
",|(\"[^\"]*\")"
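This splits on commas, and because the quoted part of the pattern is a capturing group, Regex.Split keeps each quoted field in the result. The empty strings produced along the way are filtered out with Where.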
Code Sample:
using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
string data = "Method,\"value1,value2\",Method2";
string[] pieces = Regex.Split(data, ",|(\"[^\"]*\")").Where(exp => !String.IsNullOrEmpty(exp)).ToArray();
foreach (string piece in pieces)
{
Console.WriteLine(piece);
}
}
}
Results:
Method
"value1,value2"
Method2
I'm currently trying to read a large piece of text from the file whose path is stored in the string TestFile, shown below. I then want to go through each line and add each word, split by the delimiters, to a list. I've managed to do this successfully with an array (with very different code of course), but I've not used lists before and I'm also trying to incorporate classes and methods into my coding. I'm aware the code below won't work because the object isn't returning any value from the method(?), but I don't know how that's meant to be done. Any help or some good resources on lists or objects would be very much appreciated.
using System;
using System.IO;
namespace Summarizer
{
class Reader
{
public static void Main(string[] args)
{
string TestFile = @"C:\Users\ollie\Desktop\Test\Target\PracticeHolmes.txt";
ReadList TestRead = new ReadList(TestFile);
}
}
class ReadList
{
private string line;
private char[] delimiters = { ' ', '.', ',', '!', '?' };
public ReadList(string TestFile)
{
StreamReader tr = new StreamReader(TestFile);
while ((line = tr.ReadLine()) != null)
{
line.Split(delimiters);
TestRead.Add(line);
}
}
}
You can read your TestFile into a list of words with one line of code via a LINQ expression:
using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
class ListReader
{
private char[] delimiters = { ' ', '.', ',', '!', '?' };
public List<string> Read(string fileName) =>
File.ReadLines(fileName).SelectMany(l => l.Split(delimiters)).ToList();
}
And to use the ListReader class, do:
var list = new ListReader().Read(TestFile);
Notes:
An array T[] represents a list of fixed size. Once allocated you can replace items in the array but you cannot expand or shrink the array.
A List<T> represents a resizable list. You would use a list to represent a sequential collection of objects that can be accessed by index and to which you might later want to add or remove items.
File.ReadLines() enumerates through all the lines in a text file without loading the entire file into memory. It thus does the same thing as your while ((line = tr.ReadLine()) != null) loop in fewer lines of code.
SelectMany() and ToList() are extension methods from the System.Linq namespace. SelectMany() projects each element of an enumeration into its own enumeration and flattens the results into one sequence (here, projecting each line into an array of words, then flattening those arrays into a single sequence of words). ToList() then materializes that sequence into a list.
If you only needed the distinct words (i.e. the list of words with duplicates removed) you could use the .Distinct() extension method like so:
public List<string> ReadDistinct(string fileName) =>
File.ReadLines(fileName).SelectMany(l => l.Split(delimiters)).Distinct().ToList();
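One thing to watch with the Split calls above: when two delimiters sit next to each other (a comma followed by a space, say), Split returns an empty string between them. Below is a minimal sketch, assuming you want those empty entries dropped. The names WordSplitter and SplitWords are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSplitter
{
    private static readonly char[] Delimiters = { ' ', '.', ',', '!', '?' };

    // Splits each line on the delimiters and drops the empty strings that
    // appear when two delimiters are adjacent (for example ", " or "! ").
    public static List<string> SplitWords(IEnumerable<string> lines) =>
        lines.SelectMany(l => l.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries))
             .ToList();
}
```

Since File.ReadLines returns an IEnumerable<string>, it can be passed straight in: WordSplitter.SplitWords(File.ReadLines(fileName)).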
There is a more convenient way to read lines from a text file,
File.ReadAllLines(TestFile)
Also, don't forget to store the words somewhere, i.e. store the returned value from line.Split(delimiters)
Here is a complete example:
string TestFile = @"C:\Users\ollie\Desktop\Test\Target\PracticeHolmes.txt";
var lines = File.ReadAllLines(TestFile);
var words = new List<string>();
char[] delimiters = { ' ', '.', ',', '!', '?' };
foreach (var line in lines)
{
words.AddRange(line.Split(delimiters));
}
As for the "using objects and methods" part of the question, you can construct a class that handles what you aim to do. Let's call it Reader. Now, what do you want the reader to do for you? The reader should be able to take some input (the file or file path) and produce some output (the parsed words). You can add a method that does precisely that; let's call it Read. With that, your code could look something like this:
class Reader
{
string[] Read(string filePath)
{
// construct and return the words;
}
}
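One way the skeleton could be filled in is sketched below. This is only an illustration, not the answerer's code: the delimiter set is copied from the question, while dropping empty entries and making Read public are assumptions on my part.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class Reader
{
    // Delimiters taken from the question.
    private static readonly char[] Delimiters = { ' ', '.', ',', '!', '?' };

    // Reads the file line by line and returns every word found in it.
    public string[] Read(string filePath)
    {
        var words = new List<string>();
        foreach (string line in File.ReadLines(filePath))
        {
            // Split returns the pieces; they have to be stored, not discarded.
            // RemoveEmptyEntries is an assumption: it skips the empty strings
            // produced by adjacent delimiters such as ", ".
            words.AddRange(line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries));
        }
        return words.ToArray();
    }
}
```

It would then be called as string[] words = new Reader().Read(TestFile); which gives the caller the result that the constructor in the question could not return.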
Fast way to replace text in text file.
From this: somename#somedomain.com:hello_world
To This: somename:hello_world
It needs to be FAST and support multiple lines of text file.
I tried splitting the string into three parts but it seems slow. Example in the code below.
public static void Conversion()
{
    List<string> list = File.ReadAllLines("ETU/Tut.txt").ToList();
    Console.WriteLine("Please wait, converting in progress !");
    foreach (string combination in list)
    {
        if (combination.Contains("#"))
        {
            write: try
            {
                using (StreamWriter sw = new
                    StreamWriter("ETU/UPCombination.txt", true))
                {
                    sw.WriteLine(combination.Split('#', ':')[0] + ":"
                        + combination.Split('#', ':')[2]);
                }
            }
            catch
            {
                goto write;
            }
        }
        else
        {
            Console.WriteLine("At least one line doesn't contain #");
        }
    }
}
So a fast way to convert every line in text file from
somename#somedomain.com:hello_world
To: somename:hello_world
then save it to a different text file.
!Remember the domain bit always changes!
Most likely not the fastest, but it is pretty fast with an expression similar to,
#[^:]+
and replace that with an empty string.
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"#[^:]+";
string substitution = @"";
string input = @"somename#somedomain.com:hello_world1
somename#some_other_domain.com:hello_world2";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
}
}
If you wish to simplify, modify or explore the expression, it is explained in the top right panel of regex101.com, where you can also see how it matches against some sample inputs.
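If the regex turns out to be the bottleneck, a plain IndexOf/Substring version may be worth measuring against it. The sketch below is an alternative technique, not the answer's method, and I have not benchmarked it. The names DomainStripper, StripDomain and ConvertFile are hypothetical. Unlike the code in the question, it copies lines without a '#' unchanged instead of skipping them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DomainStripper
{
    // Removes everything from the first '#' up to (not including) the next ':'.
    // Lines that do not have that shape are returned unchanged.
    public static string StripDomain(string line)
    {
        int hash = line.IndexOf('#');
        if (hash < 0) return line;
        int colon = line.IndexOf(':', hash);
        if (colon < 0) return line;
        return line.Substring(0, hash) + line.Substring(colon);
    }

    // Streams the input file and writes all converted lines in one call,
    // so the output file is opened once rather than once per line.
    // inputPath and outputPath must be different files.
    public static void ConvertFile(string inputPath, string outputPath) =>
        File.WriteAllLines(outputPath, File.ReadLines(inputPath).Select(l => StripDomain(l)));
}
```

Much of the slowness in the question probably comes from reopening the output file for every line, so writing everything through a single call may matter more than the choice between regex and Substring.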
I need to parse a file whose records are of the following format:
mr
Sean r.
Farrow
4 The Crescent
Eastleake
Loughborough
Leicestershire
LE12 6QH
01509 59213
07525945447
sean.farrow#seanfarrow.co.uk
Each record ends with a blank line. The two phone numbers and the email address are optional.
What is the best way of parsing this sort of record? I could write my own parser, but am hoping I don't have to!
FileHelpers expects each record to end with a new line, so you'd have to pre-parse the input before passing it to the engine. That's straightforward to do though - something like:
var lines = File.ReadAllLines(pathToImportFile);
var sb = new StringBuilder();
var separator = ","; // use a comma as field delimiter
foreach (string line in lines)
{
if (String.IsNullOrEmpty(line))
sb.AppendLine(""); // convert empty lines into line feeds
else
sb.AppendFormat("\"{0}\"{1}", line, separator); // put quotes around the field to avoid problems with nested separators
}
var engine = new FileHelperEngine<MyClass>();
engine.ReadString(sb.ToString());
and your class would look something like
[DelimitedRecord(",")]
class MyClass
{
[FieldQuoted(QuoteMode.AlwaysQuoted)]
public string Title;
[FieldQuoted(QuoteMode.AlwaysQuoted)]
public string FullName;
[FieldQuoted(QuoteMode.AlwaysQuoted)]
public string Address1;
/// ... etc
}
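Because the phone numbers and email address are optional, records can have different numbers of lines, which a fixed column layout does not handle well. A library-free way to do the pre-parsing step is to group the lines into records first and inspect each record's length afterwards. This is a minimal sketch under that assumption; RecordGrouper and Group are illustrative names, not part of FileHelpers.

```csharp
using System;
using System.Collections.Generic;

public static class RecordGrouper
{
    // Groups consecutive non-blank lines into records; a blank line ends a record.
    public static List<List<string>> Group(IEnumerable<string> lines)
    {
        var records = new List<List<string>>();
        var current = new List<string>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                // Ignore repeated blank lines so no empty records are created.
                if (current.Count > 0)
                {
                    records.Add(current);
                    current = new List<string>();
                }
            }
            else
            {
                current.Add(line.Trim());
            }
        }
        // The last record may not be followed by a blank line.
        if (current.Count > 0) records.Add(current);
        return records;
    }
}
```

Calling RecordGrouper.Group(File.ReadLines(pathToImportFile)) yields one list per record, and record.Count then tells you how many of the optional trailing fields are present.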
I want to read 4-5 CSV files in some array in C#
I know that this question has been asked before and I have gone through those answers...
But my use of CSVs is much simpler than that...
I have CSV files with columns of the following data types....
string , string
These strings are without ',' so no tension...
That's it. And they aren't much big. Only about 20 records in each.
I just want to read them into array of C#....
Is there any very very simple and direct way to do that?
To read the file, use
TextReader reader = File.OpenText(filename);
To read a line:
string line = reader.ReadLine();
then
string[] tokens = line.Split(',');
to separate them.
By using a loop around the two last example lines, you could add each array of tokens into a list, if that's what you need.
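The loop described above could look roughly like this. It is a sketch that relies on the question's guarantee that fields never contain commas; SimpleCsv and ReadRows are made-up names, and skipping blank lines is my own assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SimpleCsv
{
    // Reads every line from the reader and splits it on commas.
    // Only valid when fields never contain commas or quotes.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue; // skip blank lines (assumption)
            rows.Add(line.Split(','));
        }
        return rows;
    }
}
```

For a file, wrap the reader in a using block so it is closed afterwards: using (TextReader reader = File.OpenText(filename)) { var rows = SimpleCsv.ReadRows(reader); }. Call rows.ToArray() if an array of arrays is really needed.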
This one handles quotes and commas inside fields (it assumes you're parsing one line at a time).
using Microsoft.VisualBasic.FileIO; //For TextFieldParser
// blah blah blah
StringReader csv_reader = new StringReader(csv_line);
TextFieldParser csv_parser = new TextFieldParser(csv_reader);
csv_parser.SetDelimiters(",");
csv_parser.HasFieldsEnclosedInQuotes = true;
string[] csv_array = csv_parser.ReadFields();
Here is a simple way to read CSV content into an array of strings. The CSV file can contain double quotes and carriage return line feeds, and the delimiter is a comma.
Here are the libraries that you need:
System.IO;
System.Collections.Generic;
System.IO is for FileStream and StreamReader class to access your file. Both classes implement the IDisposable interface, so you can use the using statements to close your streams. (example below)
The System.Collections.Generic namespace is for generic collections such as IList<T> and List<T>. In this example we'll use the List<T> class, because lists are better than arrays in my honest opinion. However, before returning the outbound variable, I'll call the .ToArray() member method to return an array.
There are many ways to get content from your file, I personally prefer to use a while(condition) loop to iterate over the contents. In the condition clause, use !lReader.EndOfStream. While not end of stream, continue iterating over the file.
public string[] GetCsvContent(string iFileName)
{
List<string> oCsvContent = new List<string>();
using (FileStream lFileStream =
new FileStream(iFileName, FileMode.Open, FileAccess.Read))
{
StringBuilder lFileContent = new StringBuilder();
using (StreamReader lReader = new StreamReader(lFileStream))
{
// flag if a double quote is found
bool lContainsDoubleQuotes = false;
// a string for the csv value
string lCsvValue = "";
// loop through the file until you read the end
while (!lReader.EndOfStream)
{
// stores each line in a variable
string lCsvLine = lReader.ReadLine();
// for each character in the line...
foreach (char lLetter in lCsvLine)
{
// check if the character is a double quote
if (lLetter == '"')
{
if (!lContainsDoubleQuotes)
{
lContainsDoubleQuotes = true;
}
else
{
lContainsDoubleQuotes = false;
}
}
// if we come across a comma
// AND it's not within a double quote..
if (lLetter == ',' && !lContainsDoubleQuotes)
{
// add our string to the array
oCsvContent.Add(lCsvValue);
// null out our string
lCsvValue = "";
}
else
{
// add the character to our string
lCsvValue += lLetter;
}
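}
// end of line: inside double quotes the line break belongs to the value,
// otherwise it ends the last value on this line
if (lContainsDoubleQuotes)
{
lCsvValue += "\r\n";
}
else if (lCsvLine.Length > 0)
{
oCsvContent.Add(lCsvValue);
lCsvValue = "";
}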
}
}
}
return oCsvContent.ToArray();
}
Hope this helps! Very easy and very quick.
Cheers!
I have to port some C# code to Java and I am having some trouble converting a string splitting command.
While the actual regex is still correct, when splitting in C# the regex tokens are part of the resulting string[], but in Java the regex tokens are removed.
What is the easiest way to keep the split-on tokens?
Here is an example of C# code that works the way I want it:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
String[] values = Regex.Split("5+10", @"([\+\-\*\(\)\^\\/])");
foreach (String value in values)
Console.WriteLine(value);
}
}
Produces:
5
+
10
I don't know how C# does it, but to accomplish it in Java, you'll have to approximate it. Look at how this code does it:
public String[] split(String text) {
if (text == null) {
text = "";
}
int last_match = 0;
LinkedList<String> splitted = new LinkedList<String>();
Matcher m = this.pattern.matcher(text);
// Iterate through each match
while (m.find()) {
// Text since last match
splitted.add(text.substring(last_match,m.start()));
// The delimiter itself
if (this.keep_delimiters) {
splitted.add(m.group());
}
last_match = m.end();
}
// Trailing text
splitted.add(text.substring(last_match));
return splitted.toArray(new String[splitted.size()]);
}
This is because you are capturing the split token. C# takes this as a hint that you wish to retain the token itself as a member of the resulting array. Java does not support this.
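The effect of the capturing group on the C# side can be seen by running the same split with and without the parentheses. This is a small sketch using a simplified version of the character class from the question; SplitDemo and its method names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Capturing group: each matched operator is kept in the result array.
    public static string[] SplitKeepingOperators(string expression) =>
        Regex.Split(expression, @"([+\-*/^()])");

    // Same character class without the parentheses: operators are discarded.
    public static string[] SplitDroppingOperators(string expression) =>
        Regex.Split(expression, @"[+\-*/^()]");
}
```

For the input "5+10" the first method returns 5, + and 10, while the second returns only 5 and 10, which matches what Java's split produces for the same pattern.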