This may be a dumb question, but I'll give it a try.
A common task is importing data from ASCII files.
It's almost always the same, apart from the structure of the file:
comma-separated, line-separated, take 5 rows, take 12... whatever.
So it's always a different protocol/mapping but the same handling.
Is there a C# library that helps with this day-to-day scenario?
This is awesome: FileHelpers Library
Example:
File:
1732,Juan Perez,435.00,11-05-2002
554,Pedro Gomez,12342.30,06-02-2004
112,Ramiro Politti,0.00,01-02-2000
924,Pablo Ramirez,3321.30,24-11-2002
Create a class that maps your data.
[DelimitedRecord(",")]
public class Customer
{
    public int CustId;
    public string Name;
    public decimal Balance;
    [FieldConverter(ConverterKind.Date, "dd-MM-yyyy")]
    public DateTime AddedDate;
}
And then parse using:
var engine = new FileHelperEngine<Customer>();
// To read:
Customer[] res = engine.ReadFile("FileIn.txt");
// To write:
engine.WriteFile("FileOut.txt", res);
Enumerate:
foreach (Customer cust in res)
{
    Console.WriteLine("Customer Info:");
    Console.WriteLine(cust.Name + " - " +
        cust.AddedDate.ToString("dd/MM/yy"));
}
You may want to take a look at the FileHelpers library.
So the only thing those tasks have in common is reading a text file?
If FileHelpers is overkill for you (simple text data, etc.), standard .NET classes should be all you need (String.Split Method, Regex Class, StreamReader Class).
They let you split text on delimiter characters (String.Split) or read it line by line (StreamReader).
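For the comma-separated customer file above, a minimal sketch using only StreamReader-style reading and String.Split might look like this. It assumes the file contains no quoted fields or embedded commas; CustomerRecord, SimpleCsv and Parse are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical record type mirroring the Customer class above.
public class CustomerRecord
{
    public int CustId;
    public string Name;
    public decimal Balance;
    public DateTime AddedDate;
}

public static class SimpleCsv
{
    // Reads one record per line. Assumes no quoted fields or embedded commas.
    public static List<CustomerRecord> Parse(TextReader reader)
    {
        var result = new List<CustomerRecord>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines

            string[] parts = line.Split(',');
            result.Add(new CustomerRecord
            {
                CustId = int.Parse(parts[0], CultureInfo.InvariantCulture),
                Name = parts[1],
                Balance = decimal.Parse(parts[2], CultureInfo.InvariantCulture),
                AddedDate = DateTime.ParseExact(parts[3], "dd-MM-yyyy", CultureInfo.InvariantCulture)
            });
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        // For a real file: using (var reader = new StreamReader("FileIn.txt")) { ... }
        var sample = "1732,Juan Perez,435.00,11-05-2002\n554,Pedro Gomez,12342.30,06-02-2004";
        foreach (var c in SimpleCsv.Parse(new StringReader(sample)))
            Console.WriteLine(c.Name + " - " + c.AddedDate.ToString("dd/MM/yy"));
    }
}
```

Parsing with CultureInfo.InvariantCulture keeps "435.00" and the dates independent of the machine's regional settings; once quoting or escaping enters the picture, a library like FileHelpers is the safer choice.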
I'm developing an Android application that reads books in JSON format. To create these books I needed a desktop application for convenience, and I chose C#.
First of all, in my native language there are many characters that need to be encoded as Unicode rather than ASCII, for example:
[ə ç ş ğ ö ü and so on]
My problem is that JSON has trouble with some of these characters, so I need to use their Unicode escape sequences instead. For instance:
string text = "asdsdas";
text = ConvertToUnicode(text); // -> \u231\u213\u123...
I tried many ways to achieve this in JavaScript but couldn't. Could someone help me solve this problem in C#? Thanks in advance; any suggestion is welcome.
You can define an extension method:
using System.Text;

public static class Extension
{
    public static string ToUnicodeString(this string str)
    {
        StringBuilder sb = new StringBuilder();
        foreach (var c in str)
        {
            // Each UTF-16 code unit becomes \uXXXX (four uppercase hex digits)
            sb.Append("\\u" + ((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
which can be called as myString.ToUnicodeString().
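If you only want the special characters escaped and plain ASCII left readable, a small variant of the same idea could look like the sketch below. The method name ToUnicodeEscapedNonAscii is hypothetical, and this sketch does not escape quotes, backslashes or control characters, which JSON also requires; a JSON serializer would normally take care of those.

```csharp
using System;
using System.Text;

public static class JsonEscapeExtensions
{
    // Escapes every UTF-16 code unit above 127 as \uXXXX; ASCII characters are copied as-is.
    // Characters outside the BMP come out as two escaped surrogates, which is how JSON encodes them.
    public static string ToUnicodeEscapedNonAscii(this string str)
    {
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (c > 127)
                sb.Append("\\u").Append(((int)c).ToString("X4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine("ə ç ş ğ ö ü".ToUnicodeEscapedNonAscii());
        // prints: \u0259 \u00E7 \u015F \u011F \u00F6 \u00FC
    }
}
```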
Is there a more proper way to do the same thing?
string initialTemplate = "{0}-{1}";
string template = string.Format(initialTemplate, "first", "{0}");
string answer = string.Format(template, "second");
I also know the following approach, but unfortunately I can't use it in my current case (I think it is more proper and its logic is clearer):
string initialTemplate = "{0}-{{0}}";
string template = string.Format(initialTemplate, "first");
string answer = string.Format(template, "second");
Is there another way to do that?
UPDATE
I'm sorry; from your answers I've learned that my question wasn't clear enough, so I've added some more description.
My situation:
// This template actually lives in a *.resx file;
// I want to store only one template and use it in different situations.
public const string InitialTemplate = "{0}-{1}";

public static string GetMessage(string one, string two)
{
    return string.Format(InitialTemplate, one, two);
}

public static string GetTemplate(string one)
{
    return string.Format(InitialTemplate, one, "{0}");
}

// or a more universal way: append "{0}" after the given arguments
// (passing args and "{0}" separately would format the array itself as {0})
public static string GetTemplate(params object[] args)
{
    var all = new object[args.Length + 1];
    args.CopyTo(all, 0);
    all[args.Length] = "{0}";
    return string.Format(InitialTemplate, all);
}

static void Main(string[] args)
{
    // In almost all cases in my project I use string.Format like this:
    string message = GetMessage("one", "two");

    // But in another case I need the template with only the first
    // argument already filled in; the result must be "one-{0}".
    string getTemplateWithAssignedFirstArg = GetTemplate("one");
}
Is there a more proper way to handle this kind of situation?
If you are using C# 6 you can also use string interpolation.
https://msdn.microsoft.com/en-us/library/dn961160.aspx
var answer = $"{firstVar}-{secondVar}";
string initialTemplate = "{0}-{1}";
string answer = string.Format(initialTemplate, "first", "second");
Should do the trick. Or cut out the middle man with:
string answer = string.Format("{0}-{1}", "first", "second");
String.Format is a very useful convenience, but I'd be wary of using it to build format strings that you're going to use to create other format strings. Someone trying to maintain that code, figure out what's going on, and perhaps modify it will be baffled. Using String.Format that way is technically possible, and there could even be scenarios where it's useful, but it's probably just going to result in something that works but is very difficult to understand and debug.
My first suggestion would be to use a StringBuilder. Even when you're appending to the StringBuilder you can use String.Format if needed to create the individual strings.
I wonder if what you describe in the question is taking place across multiple methods (which is why you might be building your format string in steps). If that's the case, I recommend not building the string in steps like that. Don't start building the string until you have all of the data you need, and then build it all at once.
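One way to apply this advice to the GetTemplate case is sketched below, under the assumption that the second value becomes available later in another method. Instead of producing a half-filled format string such as "one-{0}", it keeps the first argument and returns a function that calls string.Format once, when the second value is known. The DeferredFormat class and WithFirst method are hypothetical names.

```csharp
using System;

public static class DeferredFormat
{
    // Stands in for the template stored in the .resx file.
    public const string InitialTemplate = "{0}-{1}";

    public static string GetMessage(string one, string two)
    {
        return string.Format(InitialTemplate, one, two);
    }

    // Captures the first argument; string.Format runs only once, when the second is supplied.
    // Because 'one' is never re-parsed as a format string, braces inside it are harmless.
    public static Func<string, string> WithFirst(string one)
    {
        return two => GetMessage(one, two);
    }
}

public static class Program
{
    public static void Main()
    {
        Func<string, string> pending = DeferredFormat.WithFirst("one");
        // ... later, once the rest of the data is known:
        Console.WriteLine(pending("second")); // one-second
    }
}
```

A side benefit over the two-step string.Format approach: if the first argument contains a brace (say "{x}"), formatting it into a template and then formatting again would throw a FormatException, while the deferred version simply produces "{x}-second".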
I'm trying to parse a CSV file from hell, using the fantastic FileHelpers library.
It's failing to handle a row of the form:
"TOYS R"" US"," INC.""",fld2,fld3,"<numberThousands>","<numberThousands>","<numberThousands>",fld7,
FileHelpers is very good at handling number fields in 'thousands' format (using a custom formatter), even when they are wrapped in quotes or followed by trailing commas; however, it has trouble with the first field.
"TOYS R"" US"," INC.""",fld2,...
This field includes both nested quotes and nested commas. FileHelpers doesn't know how to handle this and splits it into two separate fields, which subsequently causes an exception to be thrown.
Are there any recommended ways to handle this?
First, you need to make all of your fields optionally quoted.
[DelimitedRecord(",")]
public class contactTemplate
{
    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string CompanyName;

    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string fld2;

    // etc...
}
Then you need to replace the escaped quotes ("") with something else (e.g., a single quote) in a BeforeReadRecord event handler.
var engine = new FileHelperEngine<contactTemplate>();
engine.BeforeReadRecord += (sender, args) =>
    args.RecordLine = args.RecordLine.Replace("\"\"", "'");
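To see what the engine receives after that handler runs, the same replacement can be tried on the problem row with plain string code, without FileHelpers. This is only an illustration of the preprocessing step; the QuotePreprocess class is a hypothetical name, and the field values are shortened from the question.

```csharp
using System;

public static class QuotePreprocess
{
    // Same replacement as the BeforeReadRecord handler: each doubled quote ("") becomes '.
    // String.Replace scans left to right, so """ becomes ' followed by a single ".
    public static string ReplaceEscapedQuotes(string recordLine)
    {
        return recordLine.Replace("\"\"", "'");
    }

    public static void Main()
    {
        string line = "\"TOYS R\"\" US\",\" INC.\"\"\",fld2,fld3";
        Console.WriteLine(ReplaceEscapedQuotes(line));
        // prints: "TOYS R' US"," INC.'",fld2,fld3
    }
}
```

Note that the original double quotes inside the data now read as single quotes; if they matter downstream, they have to be restored after parsing.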
I am trying the following stemming class:
static class StemmerSteps
{
    public static string stepSufixremover(this string str, string suffex)
    {
        if (str.EndsWith(suffex))
        {
            // ...
        }
        return str;
    }

    public static string stepPrefixemover(this string str, string prefix)
    {
        if (str.StartsWith(prefix))
        {
            // ...
        }
        return str;
    }
}
This class works with a single prefix or suffix. Is there a way to pass in a list of prefixes or suffixes and compare each of them against str? Any help is really appreciated.
Instead of creating your own class from scratch (unless this is homework), I would definitely use an existing library. This answer provides example code that implements the Porter stemming algorithm:
https://stackoverflow.com/questions/7611455/how-to-perform-stemming-in-c
Put your suffix/prefixes in a collection (like a List<>), and loop through and apply each possible one. This collection would need to be passed into the method.
List<string> suffixes = ...;
foreach (string suffix in suffixes)
    if (str.EndsWith(suffix))
        str = str.Remove(str.Length - suffix.Length, suffix.Length);
EDIT
Considering your comment:
"just want to look if the string starts-/endswith any of the passed strings"
maybe something like this fits your needs:
public static string stepSufixremover(this string str, IEnumerable<string> suffex)
{
    // FirstOrDefault rather than SingleOrDefault, which would throw
    // if more than one suffix matched
    string suf = suffex.Where(x => str.EndsWith(x)).FirstOrDefault();
    if (!string.IsNullOrEmpty(suf))
    {
        str = str.Remove(str.Length - suf.Length, suf.Length);
    }
    return str;
}
If you use this like:
Console.WriteLine("hello".stepSufixremover(new string[] { "lo", "l" }));
it produces:
hel
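When several list entries can match the same word (for example "s", "es" and "ies" all match "studies"), a stemmer usually wants the longest one. A sketch of that variant for both suffixes and prefixes is below; AffixStripper, RemoveLongestSuffix and RemoveLongestPrefix are hypothetical names, and ordinal comparison is an assumption that fits plain word lists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AffixStripper
{
    // Removes the longest suffix from the list that the word ends with (if any).
    public static string RemoveLongestSuffix(this string str, IEnumerable<string> suffixes)
    {
        string match = suffixes
            .Where(s => !string.IsNullOrEmpty(s) && str.EndsWith(s, StringComparison.Ordinal))
            .OrderByDescending(s => s.Length)
            .FirstOrDefault();
        return match == null ? str : str.Substring(0, str.Length - match.Length);
    }

    // Same idea for prefixes: strip the longest one the word starts with.
    public static string RemoveLongestPrefix(this string str, IEnumerable<string> prefixes)
    {
        string match = prefixes
            .Where(p => !string.IsNullOrEmpty(p) && str.StartsWith(p, StringComparison.Ordinal))
            .OrderByDescending(p => p.Length)
            .FirstOrDefault();
        return match == null ? str : str.Substring(match.Length);
    }

    public static void Main()
    {
        var suffixes = new[] { "s", "es", "ies" };
        var prefixes = new[] { "un", "re" };
        Console.WriteLine("studies".RemoveLongestSuffix(suffixes)); // stud
        Console.WriteLine("unhappy".RemoveLongestPrefix(prefixes)); // happy
    }
}
```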
The simplest code would involve regular expressions.
For example, this would identify some English suffixes:
'^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$'
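In C#, that pattern can be used with System.Text.RegularExpressions roughly as follows. This is a sketch; RegexStemmer and Stem are hypothetical names. Because the suffix group is optional and the stem group is lazy, the match always succeeds and group 1 is the shortest prefix after which the rest of the word is one of the listed suffixes, or nothing at all.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexStemmer
{
    // The pattern from above: a lazy stem followed by an optional known suffix.
    private static readonly Regex SuffixPattern =
        new Regex("^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$");

    public static string Stem(string word)
    {
        // Group 1 holds the stem; group 2 (possibly empty) holds the stripped suffix.
        return SuffixPattern.Match(word).Groups[1].Value;
    }

    public static void Main()
    {
        foreach (var w in new[] { "jumped", "quickly", "cats", "running" })
            Console.WriteLine(w + " -> " + Stem(w));
        // jumped -> jump, quickly -> quick, cats -> cat, running -> runn
    }
}
```

The "running -> runn" result shows how crude suffix stripping is, which is the accuracy problem discussed next.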
One problem is that stemming is not as accurate as lemmatization. Lemmatization would require POS tagging for accuracy. For example, you don't want to add an -ing suffix to dove if it's a noun.
Another problem is that some suffixes also require prefixes. For example, you must add en- to -rich- to add a -ment suffix in en-rich-ment -- unlike a root like -govern- where you can add the suffix without any prefix.
At the moment our ERP/PSA software produces an EFT (Electronic Funds Transfer) .txt file containing bank and employee bank information, which is then sent to the bank.
The problem is that the EFT file is currently produced in the US standard format, which is not suitable for Canadian bank standards. I do have the required Canadian bank standard format.
The file format is defined by the columns in the file and the number of characters each one contains (if a column's data is shorter than its width, it is padded with spaces).
For example:
1011234567Joe,Bloggs 1234567
Transformed to the Canadian standard, it might look like:
A101Joe,Bloggs 1234567 1234567
where, for example, "A" needs to be added at the start of the record.
I'm wondering how to go about a task like this in C#, i.e.:
Read in text file.
Line by Line parse data in terms of start and end of characters
Assign values to variables
Rebuild new file with these variables with different ordering and additional data
I don't have my IDE open, so my syntax might be a tad off, but I'll try to point you in the right direction. Anyway, what fun would it be to give you the solution outright?
First you're going to want to get a list of lines:
// Splitting on both "\r\n" and "\n" avoids stray '\r' characters with Windows line endings
IEnumerable<string> lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
You said that the columns don't have delimiters but are fixed-width, but you didn't mention where the column sizes are defined. Generally, you'll want to pull out the text of each column with
colText = line.Substring(startOfColumn, lengthOfColumn);
For each column you'll have to calculate startOfColumn and lengthOfColumn, depending on the positions and lengths of the columns.
Hopefully that's a good enough foundation for you to get started.
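One way to keep the start/length calculations in one place is a small column layout table that drives the Substring calls. The sketch below assumes a made-up layout (Id, RoutingNumber, Name, AccountNumber with invented offsets); the FixedWidth class, UsLayout and Split are hypothetical names, and the real positions would come from the bank specifications. It also guards against short lines, where a bare Substring would throw ArgumentOutOfRangeException.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidth
{
    // Hypothetical layout: (name, start, length) per column. Replace with the real spec.
    public static readonly (string Name, int Start, int Length)[] UsLayout =
    {
        ("Id", 0, 3),
        ("RoutingNumber", 3, 7),
        ("Name", 10, 20),
        ("AccountNumber", 30, 7),
    };

    // Slices one line into named columns and trims the space padding.
    // Columns past the end of a short line become "" (or are cut short) instead of throwing.
    public static Dictionary<string, string> Split(string line, (string Name, int Start, int Length)[] layout)
    {
        var columns = new Dictionary<string, string>();
        foreach (var col in layout)
        {
            if (col.Start >= line.Length)
            {
                columns[col.Name] = "";
                continue;
            }
            int len = Math.Min(col.Length, line.Length - col.Start);
            columns[col.Name] = line.Substring(col.Start, len).TrimEnd();
        }
        return columns;
    }

    public static void Main()
    {
        string line = "101" + "1234567" + "Joe,Bloggs".PadRight(20) + "1234567";
        foreach (var kv in Split(line, UsLayout))
            Console.WriteLine(kv.Key + " = '" + kv.Value + "'");
    }
}
```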
I think that your best bet is to create a class to hold the logical data that is present in the file and have methods in this class for parsing the data from a given format and saving it back to a given format.
For example, assume the following class:
public class EFTData
{
    public string Name { get; set; }
    public string RoutingNumber { get; set; }
    public string AccountNumber { get; set; }
    public string Id { get; set; }

    public void FromUSFormat(string sLine)
    {
        this.Id = sLine.Substring(0, 3);
        this.RoutingNumber = sLine.Substring(3, 7);
        this.Name = sLine.Substring(10, 20);
        this.AccountNumber = sLine.Substring(30, 7);
    }

    public string ToCanadianFormat()
    {
        var sbText = new System.Text.StringBuilder(100);
        // Note that you can pad or trim fields as needed here
        sbText.Append("A");
        sbText.Append(this.Id);
        sbText.Append(this.Name);
        sbText.Append(this.RoutingNumber);
        sbText.Append(this.AccountNumber);
        return sbText.ToString();
    }
}
You can then read from a US file and write to a Canadian file as follows:
// Assume there is only a single line in the file
string sLineToProcess = System.IO.File.ReadAllText("usin.txt");
var oData = new EFTData();
// Parse the US data
oData.FromUSFormat(sLineToProcess);
// Write the Canadian data
using (var oWriter = new System.IO.StreamWriter("canout.txt"))
{
    oWriter.Write(oData.ToCanadianFormat());
}
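For the "pad or trim fields as needed" step, a small helper that forces each value to an exact width keeps the output columns aligned. This is a sketch under assumptions: the Canadian field order and widths below (A, Id 3, Name 20, Routing 7, Account 7) are invented for illustration, and FixedWidthWriter, Fit and ToCanadianRecord are hypothetical names.

```csharp
using System;
using System.Text;

public static class FixedWidthWriter
{
    // Pads with trailing spaces, or truncates, so the value is exactly 'width' characters.
    public static string Fit(string value, int width)
    {
        string v = value ?? "";
        return v.Length >= width ? v.Substring(0, width) : v.PadRight(width);
    }

    // Hypothetical Canadian layout: "A", Id (3), Name (20), Routing (7), Account (7).
    // Substitute the real field order and widths from the bank's specification.
    public static string ToCanadianRecord(string id, string name, string routing, string account)
    {
        var sb = new StringBuilder();
        sb.Append('A');
        sb.Append(Fit(id, 3));
        sb.Append(Fit(name, 20));
        sb.Append(Fit(routing, 7));
        sb.Append(Fit(account, 7));
        return sb.ToString();
    }

    public static void Main()
    {
        // Brackets make the trailing padding visible.
        Console.WriteLine("[" + ToCanadianRecord("101", "Joe,Bloggs", "1234567", "1234567") + "]");
    }
}
```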
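If all that's needed is an "A" prepended to every record, reading and rewriting the lines may be enough:
var lines = File.ReadAllLines(inputPath);
var results = new List<string>();
foreach (var line in lines)
{
    results.Add(string.Format("A{0}", line));
}
File.WriteAllLines(outputPath, results.ToArray());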