Resolving File Name Permutations - c#

I am attempting to import a .CSV file into my database which is a table export from an image management system. This system allows end-users to take images and (sometimes) split them into multiple images. There is a column in this report that signifies the file name of the image that I am tracking. If items are split in the image management system, the file name receives an underscore ("_") on the report. The previous file name is not kept. The way the items can possibly exist on the CSV are shown below:
Report 1 # 8:00AM: ABC.PNG
Report 2 # 8:30AM: ABC_1.PNG
ABC_2.PNG
Report 3 # 9:00AM: ABC_1_1.PNG
ABC_1_2.PNG
ABC_2_1.PNG
ABC_2_2.PNG
Report 4 # 9:30AM ABC_1_1_1.PNG
ABC_1_1_2.PNG
ABC_1_2.PNG
ABC_2_1.PNG
ABC_2_2.PNG
I am importing each file name into its own record. When an item is split, I would like to identify the previous version and update the original record, then add the new split record into my database. The key to knowing if an item is split is locating an underscore ("_").
I am not sure how to recreate the previous child names: I have to test every previous iteration of the file name to see whether it exists. My problem is interpreting the current state of the file name and rebuilding all previous possibilities. I do not need the original name, only each split name from the first possible one up to the current name. The code below shows roughly what I am getting at, but I am not sure how to do this cleanly.
String[] splitName = theStringToSplit.Split('_');
for (int i = 1; i < splitName.Length - 1; i++)
{
//should concat everything between 0 and i, not just 0 and I
//not sure if this is the best way or what I should do
MessageBox.Show(splitName[0] + "_" + splitName[i] + ".PNG");
}

What you are looking for is a way to rebuild part of a string.
string.Join() can help here: it joins an array into a delimited string.
It also has an overload that takes a start index and the number of items to use.
string[] s = new string[] { "2", "a", "b" };
string joined = string.Join("_", s, 0 ,3);
// joined will be "2_a_b"
Maybe you are using the wrong tool for your problem. If you only care about the last "_", you may want to use LastIndexOf() or even regular expressions. In any case, you should not unnecessarily rip names apart and glue them back together. If you do, do it in a culture-invariant rather than culture-specific way (there might be different interpretations of "-" or of the lowercase form of "I").
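string fnWithExt = "Abc_12_23.png";
// Work on the name without its extension so the last part does not include ".png"
string fn = System.IO.Path.GetFileNameWithoutExtension(fnWithExt); // "Abc_12_23"
int indexOf = fn.LastIndexOf('_');
string part1 = fn.Substring(0, indexOf); // "Abc_12" (everything before the last "_")
string part2 = fn.Substring(indexOf + 1); // "23"
string part3 = System.IO.Path.GetExtension(fnWithExt); // ".png"
string original = System.IO.Path.ChangeExtension(part1 + "_" + part2, part3); // "Abc_12_23.png"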
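To tie this back to the question, here is a minimal sketch of how the string.Join() overload with a start index and count could rebuild every earlier split name. The class and method names (SplitNames, PreviousNames) are hypothetical, and the sketch assumes the base name itself (ABC) never contains an underscore.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SplitNames
{
    // Returns every earlier split name of fileName, oldest first.
    // The unsplit original and the current name itself are not included.
    // Assumption: the base name (first segment) contains no underscore.
    public static List<string> PreviousNames(string fileName)
    {
        string ext = Path.GetExtension(fileName);
        string[] parts = Path.GetFileNameWithoutExtension(fileName).Split('_');
        var result = new List<string>();

        // count = 2 is the first split name (base + one suffix);
        // stop before parts.Length, which would be the current name.
        for (int count = 2; count < parts.Length; count++)
        {
            result.Add(string.Join("_", parts, 0, count) + ext);
        }
        return result;
    }
}
```

For ABC_1_1_2.PNG this yields ABC_1.PNG and ABC_1_1.PNG, which are the names to look up in the database before inserting the new record.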

Related

How can I analyse millions of strings that merge into each other?

I have millions of strings, around 8GB worth of HEX; each string is 3.2kb in length.
Each of these strings contains multiple parts of data I need to extract.
This is an example of one such string:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ$GPGGA,104646.091,,,,,0,0,,,M,,M,,*41$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test2ÿÿ3ÿÿ4ÿÿ5ÿÿ6ÿÿ7ÿÿ8ÿÿ9ÿÿ:ÿÿ;ÿÿ<ÿÿ=ÿÿ>ÿÿ?ÿÿ#ÿÿAÿÿBÿÿCÿÿDÿÿEÿÿFÿÿGÿÿHÿÿIÿÿJÿÿ$GPGGA,104647.091,,,,,0,0,,,M,,M,,*40$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header TestKÿÿLÿÿMÿÿNÿÿOÿÿPÿÿQÿÿRÿÿSÿÿTÿÿUÿÿVÿÿWÿÿXÿÿYÿÿZÿÿ[ÿÿ\ÿÿ]ÿÿ^ÿÿ_ÿÿ`ÿÿaÿÿbÿÿcÿÿ$GPGGA,104648.091,,,,,0,0,,,M,,M,,*4F$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Testdÿÿeÿÿfÿÿgÿÿhÿÿiÿÿjÿÿkÿÿlÿÿmÿÿnÿÿoÿÿpÿÿqÿÿrÿÿsÿÿtÿÿuÿÿvÿÿwÿÿxÿÿyÿÿzÿÿ{ÿÿ|ÿÿ$GPGGA,104649.091,,,,,0,0,,,M,,M,,*4E$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test}ÿÿ~ÿÿ.ÿÿ€ÿÿ.ÿÿ‚ÿÿƒÿÿ„ÿÿ…ÿÿ†ÿÿ‡ÿÿˆÿÿ‰ÿÿŠÿÿ‹ÿÿŒÿÿ.ÿÿŽÿÿ.ÿÿ.ÿÿ‘ÿÿ’ÿÿ“ÿÿ”ÿÿ•ÿÿ$GPGGA,104650.091,,,,,0,0,,,M,,M,,*46$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Head
As you can see, it is pretty much this repeated:
GPGGA,104644.091,,,,,0,0,,,M,,M,,*43$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ$GPGGA,104645.091,,,,,0,0,,,M,,M,,*42$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*32Header Test.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
I want to separate this string into two lists like this:
_GPSList
$GPGGA,104644.091,,,,,0,0,,,M,,M,,*43
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N*
$GPVTG,0.00,T,,M,0.00,N,0.00,K,N
_WavList
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ
32HeaderTest.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ.ÿÿ ÿÿ!ÿÿ"ÿÿ#ÿÿ$ÿÿ%ÿÿ&ÿÿ'ÿÿ(ÿÿ)ÿÿ*ÿÿ+ÿÿ,ÿÿ-ÿÿ.ÿÿ/ÿÿ0ÿÿ1ÿÿ
Issue 1: The repetition isn't contained within a single string; it overflows into the next string. If some data crosses the end of one string and the start of the next, how do I deal with that?
Issue 2: How do I analyse the string and extract only the parts I need?
The solution I'm providing is not a complete answer, more an idea that might help you get what you want.
Everything else I present is an assumption on my part.
//Assuming your data is stored in a file "yourdatafile"
//Splitting all the text on "$", assuming this separates the GPS sentences
string[] splitStrings = File.ReadAllText("yourdatafile").Split('$');
//The sample you provided leaves an extra string lingering
//because of the split on "$", so you have to take that into account
var GPSList = new List<string>();
var WAVList = new List<string>();
foreach (var str in splitStrings)
{
    //If the string contains "Header", separate the trailing data from the GPS data
    if (str.Contains("Header"))
    {
        string temp = str.Remove(str.IndexOf("Header"));
        int indexOfAsterisk = temp.LastIndexOf("*");
        //Everything up to and including the last "*" before "Header" is GPS data
        string stringBeforeAsterisk = str.Substring(0, indexOfAsterisk + 1);
        //Take the remainder by position; Replace() would also strip later repeats
        //of the GPS text and throws if there is no "*" at all
        string stringAfterAsterisk = str.Substring(indexOfAsterisk + 1);
        WAVList.Add(stringAfterAsterisk);
        GPSList.Add("$" + stringBeforeAsterisk);
    }
    else
        GPSList.Add("$" + str);
}
This provides exactly the output you need; the only exception is that extra string. Also, some non-standard characters might be displayed as black blocks.
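For Issue 1 (records that straddle two strings), one possible approach is to keep a carry-over buffer: whatever follows the last record marker in one string is held back and prepended to the next string. The sketch below is only an illustration and not part of the answer above; the names RecordSplitter and SplitRecords are hypothetical, and it assumes that every record starts with a fixed marker such as "$GPGGA" that never occurs inside the binary payload.

```csharp
using System;
using System.Collections.Generic;

public static class RecordSplitter
{
    // Yields complete records, each starting with `marker`, from a sequence
    // of chunks whose boundaries may fall anywhere, even inside the marker.
    // Data before the very first marker is discarded.
    public static IEnumerable<string> SplitRecords(IEnumerable<string> chunks, string marker)
    {
        string carry = "";
        foreach (string chunk in chunks)
        {
            string buffer = carry + chunk;
            int start = buffer.IndexOf(marker, StringComparison.Ordinal);
            if (start < 0)
            {
                // No marker seen yet: keep everything and wait for more data.
                carry = buffer;
                continue;
            }

            int next;
            while ((next = buffer.IndexOf(marker, start + marker.Length, StringComparison.Ordinal)) >= 0)
            {
                // A following marker proves the record at `start` is complete.
                yield return buffer.Substring(start, next - start);
                start = next;
            }

            // The last record may continue in the next chunk, so hold it back.
            carry = buffer.Substring(start);
        }

        // Flush the final record once the input is exhausted.
        if (carry.StartsWith(marker, StringComparison.Ordinal))
        {
            yield return carry;
        }
    }
}
```

Each yielded record can then be cut into its GPS part and its payload with the same LastIndexOf("*") logic as above. Because the chunks are consumed lazily, the whole 8GB never has to be in memory at once.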

Read two lines of data from a text file

I have a text file which is as follows:
Ali
M*59*AB
John
M*68*B
Shirley
F*35*B
Peter
M*88*A
Fiona
F*55*O
Mary
F*46*B
How do I effectively read two lines of data at a time from a text file and assign them to variables, where the 1st line is the name and the 2nd line is GENDER*WEIGHT*BLOODTYPE?
There are a lot of ways to accomplish this. All of them involve iterating through the lines of your text file.
Here's one such solution that plays on the new ValueTuple type available in C# 7.
string path = "file path here";
Dictionary<string, (string Gender, string Weight, string BloodType)> records =
new Dictionary<string, (string Gender, string Weight, string BloodType)>();
Stack<string> stack =
new Stack<string>();
foreach (string line in File.ReadLines(path))
{
if (stack.Count != 1)
{
stack.Push(line);
continue;
}
string[] fields =
line.Split('*');
records.Add(
stack.Pop(),
(Gender: fields[0],
Weight: fields[1],
BloodType: fields[2]));
}
This snippet streams lines from the file one at a time. First it pushes the name line onto a stack. Once there's a name on the stack, the next loop pops it off, parses the current line for record information, and adds it all to the records dictionary using the name as a key.
While this solution will get you started, there are some obvious areas in which you can improve its robustness with some insight into your data environment.
For example, this solution doesn't handle cases where either the name or the record information may be missing, nor does it handle the case where the record information may not have all three fields.
You should think carefully about how to handle such cases in your implementing code.
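As one illustration of such handling, the following sketch skips incomplete entries instead of throwing. It rests on assumptions that are not in the original: the names PersonRecords and Parse are hypothetical, the weight is taken to be an integer, and any line that does not split into exactly three '*'-separated fields is treated as a name line.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class PersonRecords
{
    // Pairs each name line with the record line that follows it.
    // Names without a valid record line are silently dropped.
    public static Dictionary<string, (string Gender, int Weight, string BloodType)> Parse(
        IEnumerable<string> lines)
    {
        var records = new Dictionary<string, (string Gender, int Weight, string BloodType)>();
        string name = null;

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue; // ignore blank lines

            string[] fields = line.Split('*');
            if (fields.Length != 3)
            {
                // Assumption: anything that is not a 3-field record is a name.
                name = line;
                continue;
            }

            // Only store the record if a name precedes it and the weight parses.
            if (name != null &&
                int.TryParse(fields[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int weight))
            {
                records[name] = (fields[0], weight, fields[2]);
            }
            name = null;
        }
        return records;
    }
}
```

It could be called as PersonRecords.Parse(File.ReadLines(path)). Whether to skip, log, or reject bad entries depends on where the file comes from.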

Can't get Messagebox to display List

I'm trying to have a MessageBox appear that shows the changelog inside my C# program
This is the text file.
Current Version 0.2.3.4
Added Hash decoder
Attempted to change code into OOP design
Cleaned up random code with ReSharper
Version 0.1.3.4 - 8/29/2016
No change logs before this point
The goal is to get the text between Current Version 0.2.3.4 and Version 0.1.3.4 - 8/29/2016.
I've tried doing this with the code below:
Regex changeLogMatch = new Regex("Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d");
Match changeLogInfo = changeLogMatch.Match(changeLog);
int changeLogCount = Regex.Matches(changeLog, "Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d").Count;
List<string> changeLogList = new List<string>();
for (int i = 0; i < changeLogCount; i++)
{
changeLogList.Add(changeLogInfo.Groups[1].Captures[i].ToString());
}
string changeLogString = string.Join(Environment.NewLine, changeLogList);
Console.WriteLine(changeLogString);
MessageBox.Show("New Changes" + Environment.NewLine + changeLogString
, "New Version Found: " + newVersion);
The issue I'm having is that changeLogString only displays Added Hash decoder and nothing else.
Any ideas on what I'm doing wrong?
In your case changeLogCount will always be 1, so changeLogList will only ever contain changeLogInfo.Groups[1].Captures[0].ToString(), which refers to the Added Hash decoder string.
You are matching against the regex "Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d"; it matches the text once as a whole. But the first group (.*?\\n) is captured 3 times. So if you count the matches of the full regex you get 1, whereas if you count the captures of the first group you get 3.
So you should fix your code in the following manner:
Regex changeLogMatch = new Regex("Current Version\\s.*?\\n(.*?\\n)+Version\\s.*?\\s\\-\\s\\d");
Match changeLogInfo = changeLogMatch.Match(changeLog);
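// Each capture still ends with its own line break, so trim it before joining (requires using System.Linq)
string changeLogString = string.Join(Environment.NewLine, changeLogInfo.Groups[1].Captures.OfType<Capture>().Select(c => c.Value.TrimEnd()));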
Console.WriteLine(changeLogString);
Note that you do not need to loop over the captures yourself; the required text ends up in changeLogString.
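The difference between the number of matches and the number of captures can be seen in a small self-contained sketch. The class and method names (ChangeLogDemo, CurrentChanges) are hypothetical; the pattern is the one from the question, written as a verbatim string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChangeLogDemo
{
    private const string Pattern =
        @"Current Version\s.*?\n(.*?\n)+Version\s.*?\s-\s\d";

    // Returns one entry per changelog line of the current version.
    public static string[] CurrentChanges(string changeLog)
    {
        Match match = Regex.Match(changeLog, Pattern);
        if (!match.Success)
        {
            return new string[0];
        }

        // Group 1 sits inside a repetition, so it holds one Capture per line.
        return match.Groups[1].Captures
            .Cast<Capture>()
            .Select(c => c.Value.Trim())
            .ToArray();
    }

    // Counts how often the whole pattern matches (once for the sample log).
    public static int MatchCount(string changeLog)
    {
        return Regex.Matches(changeLog, Pattern).Count;
    }
}
```

With the changelog from the question, MatchCount gives 1 while CurrentChanges returns the three lines between the two version headers.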

How to avoid false separators in csv / XML

I've been trying to understand how XML and CSV parsing work, without actually writing any code yet. I might have to parse a .csv file in the ongoing project and I'd like to be ready. (I'll have to convert them to .ofx files)
I'm also aware there are probably a thousand XML and CSV parsers out there, so I'm more curious than I am worried. I intend to use the XmlReader that I believe Microsoft provides.
Let's say I have the following .csv file
02/02/2016 ; myfirstname ; mylastname ; somefield ; 321654 ; commentary ; blabla
Sometimes a field will be missing. Which means, for the sake of the example, that the lastname isn't mandatory, and somefield could be right after the first name.
My questions are :
How do I avoid the confusion between somefield and lastname?
I could count the total number of fields, but in my situation two are optional, if there is only one missing, I can't be sure which one it is.
How do I avoid false "tags"? I mean, if the user first comment includes a ;, how can I be sure it's a part of his comment and not the start of the following tag?
Again, I could count the remaining fields and find out where I am, but that excludes the optional fields problem.
My questions also apply to XML: what can I do if the user starts writing XML in his form? Whether I decide to export the form as .csv or .xml, there can be trouble.
Right now I'm assuming that the C# XML reader/parser is awesome enough to deal with it; and if it is, I'm really curious about how.
Assuming the CSV/XML data has been exported properly, none of this will be a problem. Missing fields will be handled by repeated separators:
02/02/2016;myfirstname;;somefield
Semi-colons within a field will normally be handled by quoting:
02/02/2016;"myfirst;name";
Quotes are escaped within a string:
02/02/2016;"my""first""name";
With XML it's even less of an issue since the tags or attributes will all have names.
If your CSV data is NOT well-formed, then you have a much bigger problem, as it may be impossible to distinguish missing fields and non-quoted separators.
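To make the three conventions above concrete (empty fields, quoted fields, doubled quotes), here is a minimal sketch of a parser for a single line. The names CsvLine and Parse are hypothetical, and the sketch assumes a record never spans several lines; for production use an established CSV library is probably the safer choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line into fields. Separators inside double quotes are kept
    // as data, and a doubled quote ("") inside a quoted field is one literal quote.
    public static List<string> Parse(string line, char separator = ';')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote
                        i++;                 // skip the second quote
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // separators are plain data here
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == separator)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing separator
        return fields;
    }
}
```

A missing last name then simply shows up as an empty string at the expected index, so the position of somefield never shifts.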
How do I avoid false "tags"? String values should be quoted if they (can) contain separator characters. If you create the CSV file yourself, quote all string values when writing and unquote them when reading.
How do I avoid the confusion between somefield and lastname? There is no general solution for this; every case must be handled individually. Can a general algorithm decide whether the first name or the last name is missing? No.
If you know which field(s) can be omitted, you can write "intelligent" handling for them.
Use XML and all of your problems will be solved.
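On the XML side, the worry about a user typing markup into a form field is normally handled by the writer, which escapes text content. A small sketch using System.Xml.Linq illustrates this; the element name comment and the helper names (XmlEscapeDemo, ToXml, FromXml) are made up for the example.

```csharp
using System;
using System.Xml.Linq;

public static class XmlEscapeDemo
{
    // Wraps user text in an element; XElement escapes markup characters
    // in the text when the element is serialized.
    public static string ToXml(string comment)
    {
        return new XElement("comment", comment).ToString();
    }

    // Parses the element back and returns the original, unescaped text.
    public static string FromXml(string xml)
    {
        return XElement.Parse(xml).Value;
    }
}
```

Text such as <b>bold</b> & more survives a round trip unchanged, because it is stored as escaped character data rather than as child elements.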
First
How do I avoid the confusion between somefield and lastname?
There is no way to do this without changing the format of the file. For example, when "mylastname" is empty you could write a "" value (an empty string) or leave the field empty, like this: ;;
How do I avoid false "tags"? I mean, if the user first comment includes a ;, how can I be sure it's a part of his comment and not the start of the following tag?
It is simple: you have to structure the file like this:
; - separator of columns
"" - delimiter around column values
value;value;"value;;;;value";value
To split only on the separator ; while ignoring any separator inside "", you can use the following code, which has been compiled and tested:
public static string[] SplitWithDelimeter(this string line, char separator, char checkSeparator, bool eraseCheckSeparator)
{
var separatorsIndexes = new List<int>();
var open = false;
for (var i = 0; i < line.Length; i++)
{
if (line[i] == checkSeparator)
{
open = !open;
}
if (!open && line[i] == separator )
{
separatorsIndexes.Add(i);
}
}
separatorsIndexes.Add(line.Length);
var result = new string[separatorsIndexes.Count];
var first = 0;
for (var j = 0; j < separatorsIndexes.Count; j++)
{
var tempLine = line.Substring(first, separatorsIndexes[j] - first);
result[j] = eraseCheckSeparator ? tempLine.Replace(checkSeparator, ' ').Trim() : tempLine;
first = separatorsIndexes[j] + 1;
}
return result;
}
Return would be:
value
value
"value;;;;value"
value

Comma Separated text file to Generic List

Having a bit of trouble with converting a comma separated text file to a generic list. I have a class (called "Customers") defined with the following attributes:
Name (string)
City (string)
Balance (double)
CardNumber (int)
The values will be stored in a text file in this format: Name,City,Balance,CardNumber e.g. John,Memphis,10,200789. There will be multiple lines of this. What I want to do is have each line be placed in a list item when the user clicks a button.
I've worked out I can break each line up using the .Split() method, but have no idea how to make the correct value go into the correct attribute of the list. (Please note: I know how to use get/set properties, and I am not allowed to use LINQ to solve the problem).
Any help appreciated, as I am only learning and have been working on this for a while with no luck. Thanks
EDIT:
Sorry, it appears I'm not making myself clear. I know how to use .add.
If I have two lines in the text file:
A,B,1,2 and
C,D,3,4
What I don't know how to do is make the name "field" in the list item in position 0 equal "A", and the name "field" in the item in position 1 equal "C" and so on.
Sorry for the poor use of terminology, I'm only learning. Hope you understand what I'm asking (I'm sure it's really easy to do once you know)
The result of string.Split will give you an array of strings:
string[] lineValues = line.Split(',');
You can access values in an array by index:
string name = lineValues[0];
string city = lineValues[1];
You can convert strings to double or int using their respective Parse methods:
double balance = double.Parse(lineValues[2]);
int cardNumber = int.Parse(lineValues[3]);
You can instantiate the class and assign to it very simply:
Customer customerForCurrentLine = new Customer()
{
Name = name,
City = city,
Balance = balance,
CardNumber = cardNumber,
};
Simply loop over the lines, instantiate a Customer for that line, and add it to a variable you've created of the type List<Customer>
If you want your program to be bulletproof, you're going to have to do a lot of checking to skip over lines that don't have enough values, or that would fail to parse to the correct number type. For example, check lineValues.Length == 4 and use int.TryParse(...) and double.TryParse(...).
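Put together, the steps above could look like the following sketch, which avoids LINQ as the question requires. The Customer class mirrors the properties listed in the question; the CustomerLoader and Parse names are hypothetical, and skipping malformed lines silently is just one possible policy.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
    public double Balance { get; set; }
    public int CardNumber { get; set; }
}

public static class CustomerLoader
{
    // Builds one Customer per well-formed line; malformed lines are skipped.
    public static List<Customer> Parse(IEnumerable<string> lines)
    {
        var customers = new List<Customer>();
        foreach (string line in lines)
        {
            string[] values = line.Split(',');
            if (values.Length != 4) continue; // wrong number of fields

            double balance;
            int cardNumber;
            if (!double.TryParse(values[2].Trim(), NumberStyles.Float,
                    CultureInfo.InvariantCulture, out balance)) continue;
            if (!int.TryParse(values[3].Trim(), NumberStyles.Integer,
                    CultureInfo.InvariantCulture, out cardNumber)) continue;

            customers.Add(new Customer
            {
                Name = values[0].Trim(),
                City = values[1].Trim(),
                Balance = balance,
                CardNumber = cardNumber
            });
        }
        return customers;
    }
}
```

In the button's click handler this could be called as CustomerLoader.Parse(System.IO.File.ReadLines(path)). With the two lines A,B,1,2 and C,D,3,4 the item at position 0 has Name "A" and the item at position 1 has Name "C".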
Read the file and split its text on the newline character. Then loop over the lines: split each one on commas, create a new object, assign the values to its properties, and add that object to a list.
Like this:
List<Customers> lst = new List<Customers>();
string[] str = System.IO.File.ReadAllText(@"C:\CutomersFile.txt")
.Split(new string[] { Environment.NewLine },
StringSplitOptions.None);
for (int i = 0; i < str.Length; i++)
{
string[] s = str[i].Split(',');
Customers c = new Customers();
c.Name = s[0];
c.City = s[1];
c.Balance = Convert.ToDouble(s[2]);
c.CardNumber = Convert.ToInt32(s[3]);
lst.Add(c);
}
BTW class name should be Customer and not Customers
Split() generates an array of strings in the order they appeared in the source string. Thus, if your name field is the first column in the CSV file, it will always be the first index in the array.
someCustomer.Name = splitResult[0];
And so on. You'll also need to look into int.TryParse and double.TryParse for your class's numerically typed properties.
