Splitting a string at certain differing character counts - c#

I have a semi-complicated file full of lines. These lines can fall under one of two formats:
Person
Company
The specification for a company line is like so:
10 Characters = company identification, record specification and status of company
22 Characters = Blank spaces (filler)
8 Characters = number of employees and length of company name
Max of 161 characters = Company name + "<" delimiter
And the specification for a person:
12 Characters = Parent company number, appointment date and type
12 Characters = unique reference number
1 Character = corporate indicator
7 Characters = Blank spaces (filler)
16 Characters = confirmed appointment date and resignation date
8 Characters = Postcode
8 Characters = Date of Birth
4 Characters = length of variable data
Max of 1125 characters = Variable data delimited by "<"
First, I need to test the 11th character to determine the type of line. Pseudo-code:
if (string.count(11) = " ")
{
ItsACompany();
}
else
{
ItsAPerson();
}
Then I need to split each line at the widths given by its specification. So far, all I've found is a method that splits a string every nth character until the end of the string. It applies the same width repeatedly, which is not what I need.
I need an option that allows n to change per specification, and allows me to select all characters between char n and char y. Does such a thing exist?

To extract a block of text from a string, you could write an extension method like this (x and y are 1-based, inclusive positions):
namespace StringExtension
{
    public static class MyExtensions
    {
        public static string TakeBlock(this string input, int x, int y)
        {
            if (x < 1) x = 1;                        // positions are 1-based
            if (y > input.Length) y = input.Length;  // clamp the end to the string
            if (x > y) return string.Empty;          // block starts past the end
            int length = y - (x - 1);
            return input.Substring(x - 1, length);
        }
    }
}
And then you could call it from your main code like this (supposing you are inside the method that extracts data for the person line):
string parentCompany = line.TakeBlock(1, 12);
string uniqueRef = line.TakeBlock(13, 24);
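As an alternative to calling TakeBlock once per field, the widths from the question's specification can be kept in an array and walked in order. This is only a sketch: the class and method names (FixedWidth, IsCompany, Slice) are made up for illustration, and it assumes that the 11th character is blank only on company lines, as the question describes.

```csharp
using System;
using System.Collections.Generic;

public static class FixedWidth
{
    // Fixed field widths copied from the question's specification.
    // The trailing variable-length field is whatever remains after them.
    public static readonly int[] CompanyWidths = { 10, 22, 8 };
    public static readonly int[] PersonWidths = { 12, 12, 1, 7, 16, 8, 8, 4 };

    // Assumption from the question: the 11th character (index 10)
    // is a blank on company lines and non-blank on person lines.
    public static bool IsCompany(string line)
    {
        return line.Length > 10 && line[10] == ' ';
    }

    // Cuts the line into consecutive fields of the given widths and
    // appends the rest of the line (the variable data) as a final field.
    public static List<string> Slice(string line, int[] widths)
    {
        var fields = new List<string>();
        int pos = 0;
        foreach (int width in widths)
        {
            if (pos >= line.Length) break;               // line is shorter than the spec
            int take = Math.Min(width, line.Length - pos);
            fields.Add(line.Substring(pos, take));
            pos += take;
        }
        if (pos < line.Length) fields.Add(line.Substring(pos));
        return fields;
    }
}
```

A caller would pick the width table with IsCompany(line) and then pass it to Slice; the last element of the returned list holds the "<"-delimited variable data.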

Related

Split a string from last comma if character count exceed 20 C# [duplicate]

I have a string that contains an address. Sometimes its length exceeds 20 characters. When it does, I want to split it into two strings at the last comma before the 20th character.
string address = "address1, address2, address3, address4.";
This contains 39 characters, which exceeds the 20-character limit.
So I want to split it at the comma that comes after address2:
string addr1 = "address1, address2,";
string addr2 = "address3, address4.";
Update:
Here is what I have tried so far. It splits at the last comma in the whole string, which is not always the correct position.
string address = rankList[k].ADDRESS;
if (address.Length > 20) {
    int idx = address.LastIndexOf(',');
    if (idx != -1)
    {
        Console.WriteLine(address.Substring(0, idx));
        Console.WriteLine(address.Substring(idx + 1));
    }
}
LastIndexOf
Reports the zero-based index of the last occurrence of a specified
string within the current String object. The search starts at a
specified character position and proceeds backward toward the
beginning of the string. A parameter specifies the type of comparison
to perform when searching for the specified string.
StartIndex parameter : The search starting position. The search proceeds from startIndex toward the beginning of this instance.
string address = "address1, address2, address3, address4.";
if (address.Length > 20)
{
    var lastCommaPosition = address.LastIndexOf(',', 20);
    var address1 = address.Substring(0, lastCommaPosition + 1);
    var address2 = address.Substring(lastCommaPosition + 1);
}
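The same idea can be wrapped in a small helper that also covers the case where no comma exists before the limit. This is a sketch rather than a definitive implementation: the names AddressSplitter and SplitAtLastComma are hypothetical, and trimming the leading space of the second part is an assumption based on the desired output shown in the question.

```csharp
using System;

public static class AddressSplitter
{
    // Splits at the last comma found within the first `limit` characters.
    // Returns the whole address as First when it is short enough or when
    // there is no comma to split at.
    public static (string First, string Second) SplitAtLastComma(string address, int limit)
    {
        if (limit < 1 || address.Length <= limit)
            return (address, string.Empty);

        // Search backwards, starting at the last index that still fits the limit.
        int idx = address.LastIndexOf(',', limit - 1);
        if (idx < 0)
            return (address, string.Empty);

        string first = address.Substring(0, idx + 1);          // keeps the comma
        string second = address.Substring(idx + 1).TrimStart(); // drops the space after it
        return (first, second);
    }
}
```

Starting the backward search at limit - 1 guarantees that the first part, comma included, is never longer than the limit.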

Is it possible to look at defined length in a string

Is there a way in C# to look at a defined number of chars in a string?
For example, I have the number 336-5010-0000-00-10, which needs to change to 336-5993-0000-00-10.
I only want to check whether the second segment of numbers is between 2999 and 6000.
I am reading this data from a CSV, and I don't want to repeat this statement a million times:
if (columns[0].Contains("5010")) columns[0] = columns[0].Replace("5010", "5993");
string test = "336-5010-0000-00-10";
string foo = Regex.Replace(test, @"(\d+\-)(2999|[3-5][0-9]{3}|6000)((-\d+){3})", "${1}5993${3}");
Alternative crazy LINQ answer:
string foo2 = string.Join("-", from number in test.Split('-').Select((str, index) => new { str = str, index = index })
let num = Convert.ToInt32(number.str)
select number.index == 1 && num >= 2999 && num <= 6000 ? "5993" : number.str);
http://dotnetfiddle.net/yzROoK
Edit: More obvious regex.
You want to:
1) split the string on the '-' character into a string[] called stringParts
2) convert stringParts[1] to an integer
3) validate the integer value from step 2
string[] stringParts = columns[0].Split('-');
int valueOfConcern = Convert.ToInt32(stringParts[1]);
if (valueOfConcern >= 2999 && valueOfConcern <= 6000)
{
//take your action
}
string numberStr = "336-5010-0000-00-10";
int number = int.Parse(numberStr.Substring(4, 4));
numberStr.Substring(4, 4) gives you the four characters starting at zero-based index 4 (the fifth character), which int.Parse() converts into a number.
Then you can test whether that number is within your specified range, replace it, etc.
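The split, check and replace steps from the answers above can be combined into one reusable method so the statement is not repeated per value. This is a minimal sketch; SegmentReplacer and ReplaceSecondSegment are illustrative names, and it uses int.TryParse so a non-numeric segment is left untouched instead of throwing.

```csharp
using System;

public static class SegmentReplacer
{
    // Replaces the second '-'-separated segment when it parses as an
    // integer within [min, max]; otherwise returns the value unchanged.
    public static string ReplaceSecondSegment(string value, int min, int max, string replacement)
    {
        string[] parts = value.Split('-');
        int number;
        if (parts.Length > 1 && int.TryParse(parts[1], out number)
            && number >= min && number <= max)
        {
            parts[1] = replacement;
        }
        return string.Join("-", parts);
    }
}
```

With the CSV from the question it might be called as columns[0] = SegmentReplacer.ReplaceSecondSegment(columns[0], 2999, 6000, "5993");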

How to find wide characters from the given input string?

How can I find wide characters in a given input string (English letters)?
I have a business requirement to accept a last name (English letters) with a maximum length of 12, where a wide character counts as length 2 and a normal character counts as length 1. The input box should limit the number of characters accordingly.
UPDATED:
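If you are talking about Asian characters (like Japanese 全角), then here is one way.
// sjisEnc is not defined in this snippet; it is assumed to be a Shift-JIS
// encoding instance, e.g. Encoding.GetEncoding("shift_jis").
public static bool isZenkaku(string str)
{
    int num = sjisEnc.GetByteCount(str);
    return num == str.Length * 2;
}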
You would use it like this:
string test = "testTEST!+亜+123!123";
var widechars = test.Where(c => isZenkaku(c.ToString())).ToList();
foreach (var c in widechars)
{
Console.WriteLine(c); //result is TEST!+亜123
}
I was looking into this a while ago: the String class's Length property tells you the number of characters, not the number of bytes. You could return the leftmost 12 characters whenever the Length of the string is greater than 12, but that result could still contain anything up to 24 bytes.
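To enforce the limit of 12 with wide characters counting double, one option is to add up the Shift-JIS byte count character by character and stop before the limit is exceeded. The sketch below assumes that "wide" means "two bytes in Shift-JIS", as in the isZenkaku answer; WidthLimiter and TruncateToWidth are hypothetical names. On .NET Core and later the code-page provider has to be registered first, and on .NET Framework CodePagesEncodingProvider comes from the System.Text.Encoding.CodePages package, so treat the setup line as environment dependent.

```csharp
using System;
using System.Text;

public static class WidthLimiter
{
    private static readonly Encoding SjisEnc = CreateSjis();

    private static Encoding CreateSjis()
    {
        // Makes legacy code pages such as Shift-JIS available
        // (required on .NET Core / .NET 5+).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("shift_jis");
    }

    // Keeps leading characters until adding the next one would push the
    // total width (1 per narrow char, 2 per wide char) past maxWidth.
    public static string TruncateToWidth(string input, int maxWidth)
    {
        var result = new StringBuilder();
        int width = 0;
        foreach (char c in input)
        {
            int charWidth = SjisEnc.GetByteCount(c.ToString());
            if (width + charWidth > maxWidth) break;
            result.Append(c);
            width += charWidth;
        }
        return result.ToString();
    }
}
```

Characters that Shift-JIS cannot represent are encoded as a single replacement byte, so this sketch counts them as narrow.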

C# , Substring How to access last elements of an array/string using substring

I am generating 35 strings with names such as ar15220110910, khwm20110910 and so on.
Each string contains the ID (ar152, KHWM) and the date (20110910). I want to extract the ID and the date from the string and store them in a text file called StatSummary.
My code is something like this:
for (int i = 0; i < filestoextract.Count; i++)
{
    // filestoextract contains 35 strings
    string extractname = filestoextract[i].ToString();
    statSummary.WriteLine(extractname.Substring(0, 5) + "" +
        extractname.Substring(5, 4) + "" + extractname.Substring(9, 2) + "" +
        extractname.Substring(11, 2));
}
When the station ID has 5 letters, this code executes correctly, but when the station ID is KHWM or any other 4-letter name, the output is all messed up. I am running this inside a loop, so I have tried to keep the code as dynamic as possible. Could anyone help me find a way to do this without hardcoding the offsets, for instance by accessing the last 8 characters to get the date? I have searched but have not been able to find a way to do that.
For the last 8 digits, it's just:
extractname.Substring(extractname.Length - 8)
So your code could be:
int l = extractname.Length;
statSummary.WriteLine(extractname.Substring(0, l - 8) + "" +
    extractname.Substring(l - 8, 4) + "" + extractname.Substring(l - 4, 2) + "" +
    extractname.Substring(l - 2, 2));
As your ID length isn't consistent, it would probably be better to extract the date (which is always going to be 8 chars) and then treat the remainder as your ID, e.g.
UPDATED - more robust: the length of the date is calculated from the format, and the date is validated against the format to make sure the data was parsed correctly.
var dateFormat = "yyyyMMdd"; // this could be pulled from app.config or some other config source
foreach (var file in filestoextract)
{
    var dateStr = file.Substring(file.Length - dateFormat.Length);
    if (ValidateDate(dateStr, dateFormat))
    {
        // everything before the date is the ID
        var id = file.Substring(0, file.Length - dateFormat.Length);
        // do something with data
    }
    else
    {
        // handle invalid filename
    }
}
public bool ValidateDate(string date, string date_format)
{
    try
    {
        DateTime.ParseExact(date, date_format, DateTimeFormatInfo.InvariantInfo);
    }
    catch
    {
        return false;
    }
    return true;
}
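The validation above can also be done without exceptions by using DateTime.TryParseExact, which returns false on a bad date instead of throwing. The following is a sketch under the same assumption as the answer (the name always ends in a yyyyMMdd date with no separator); StationName and TrySplit are illustrative names.

```csharp
using System;
using System.Globalization;

public static class StationName
{
    private const string DateFormat = "yyyyMMdd";

    // Splits "khwm20110910" into id "khwm" and the date 2011-09-10.
    // Returns false when the name is too short or the tail is not a valid date.
    public static bool TrySplit(string name, out string id, out DateTime date)
    {
        id = null;
        date = default(DateTime);
        if (name == null || name.Length <= DateFormat.Length)
            return false;

        string datePart = name.Substring(name.Length - DateFormat.Length);
        if (!DateTime.TryParseExact(datePart, DateFormat, CultureInfo.InvariantCulture,
                                    DateTimeStyles.None, out date))
            return false;

        id = name.Substring(0, name.Length - DateFormat.Length);
        return true;
    }
}
```

Because the ID is taken as "everything before the last eight characters", 4-letter and 5-letter station IDs are handled the same way.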
You could use a Regex:
Match match = Regex.Match("khwm20110910", "(?<code>.*)(?<date>.{8})");
Console.WriteLine(match.Groups["code"]);
Console.WriteLine(match.Groups["date"]);
To explain the regex pattern (?<code>.*)(?<date>.{8}): each pair of parentheses creates a group. ?<code> names the group so you can reference it easily.
The date group takes the last eight characters of the string. . says take any character and {8} says do that eight times.
The code group takes all the remaining characters. * says take as many characters as possible.
foreach (string part in stringList)
{
    int length = part.Length;
    int start = length - 8;
    string dateString = part.Substring(start, 8);
}
That handles the variable length when getting the date. Extracting the rest most likely depends on a pattern (suggested) or on the length of the string (for example, a given total length implies that the ID is 4 characters long).
If your ID doesn't always have the same number of letters, you should separate the ID and the date with ',' or something similar, and then use this:
for (int i = 0; i < filestoextract.Count; i++)
{
    string extractname = filestoextract[i].ToString();
    int comma = extractname.IndexOf(',');
    string ID = extractname.Substring(0, comma);
    string Date = extractname.Substring(comma + 1); // skip the separator itself
    Console.WriteLine(ID + Date);
}

Remove formatting from a string: "(123) 456-7890" => "1234567890"?

I have a string when a telephone number is inputted - there is a mask so it always looks like "(123) 456-7890" - I'd like to take the formatting out before saving it to the DB.
How can I do that?
One possibility using LINQ is:
string justDigits = new string(s.Where(c => char.IsDigit(c)).ToArray());
Adding the cleaner/shorter version thanks to craigmoliver:
string justDigits = new string(s.Where(char.IsDigit).ToArray());
You can use a regular expression to remove all non-digit characters:
string phoneNumber = "(123) 456-7890";
phoneNumber = Regex.Replace(phoneNumber, @"[^\d]", "");
Then further on - depending on your requirements - you can either store the number as a string or as an integer. To convert the number to an integer type you will have the following options:
// throws if phoneNumber is null or cannot be parsed
long number = Int64.Parse(phoneNumber, NumberStyles.Integer, CultureInfo.InvariantCulture);
// same as Int64.Parse, but returns 0 if phoneNumber is null
number = Convert.ToInt64(phoneNumber);
// does not throw, but returns true on success
if (Int64.TryParse(phoneNumber, NumberStyles.Integer,
CultureInfo.InvariantCulture, out number))
{
// parse was successful
}
Since nobody did a for loop.
long GetPhoneNumber(string PhoneNumberText)
{
// Returns 0 on error
StringBuilder TempPhoneNumber = new StringBuilder(PhoneNumberText.Length);
for (int i=0;i<PhoneNumberText.Length;i++)
{
if (!char.IsDigit(PhoneNumberText[i]))
continue;
TempPhoneNumber.Append(PhoneNumberText[i]);
}
PhoneNumberText = TempPhoneNumber.ToString();
if (PhoneNumberText.Length == 0)
return 0;// No point trying to parse nothing
long PhoneNumber = 0;
if(!long.TryParse(PhoneNumberText,out PhoneNumber))
return 0; // Failed to parse string
return PhoneNumber;
}
used like this:
long phoneNumber = GetPhoneNumber("(123) 456-7890");
Update
As pr commented, many countries have numbers that begin with zeros. If you need to support that, you have to return a string, not a long. To change my code to do that, do the following:
1) Change the function's return type from long to string.
2) Make the function return null instead of 0 on error.
3) On a successful parse, make it return PhoneNumberText.
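Applied to the loop-based function, those three changes could look roughly like the sketch below. PhoneText and GetPhoneDigits are made-up names, and the check for the ASCII range '0' to '9' is a deliberate substitution for char.IsDigit, which also accepts digits from other scripts.

```csharp
using System;
using System.Text;

public static class PhoneText
{
    // Returns only the digits of the input as a string, so leading zeros
    // survive. Returns null when the input is null or contains no digits.
    public static string GetPhoneDigits(string phoneNumberText)
    {
        if (phoneNumberText == null) return null;

        var digits = new StringBuilder(phoneNumberText.Length);
        foreach (char c in phoneNumberText)
        {
            if (c >= '0' && c <= '9')   // ASCII digits only
                digits.Append(c);
        }
        return digits.Length == 0 ? null : digits.ToString();
    }
}
```

Since nothing is parsed into a numeric type, there is no overflow to worry about for long international numbers.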
You can make it work for that number with the addition of a simple regex replacement, but I'd look out for higher initial digits. For example, (876) 543-2019 will overflow an integer variable.
string digits = Regex.Replace(formatted, @"\D", String.Empty, RegexOptions.Compiled);
Aside from all of the other correct answers, storing phone numbers as integers or otherwise stripping out formatting might be a bad idea.
Here are a couple of considerations:
Users may provide international phone numbers that don't fit your expectations, so the usual groupings for standard US numbers wouldn't fit.
Users may NEED to provide an extension, e.g. (555) 555-5555 ext#343. The # key is actually on the dialer/phone, but can't be encoded in an integer. Users may also need to supply the * key.
Some devices allow you to insert pauses (usually with the character P), which may be necessary for extensions or menu systems, or for dialing into certain phone systems (e.g. overseas). These also can't be encoded as integers.
[EDIT]
It might be a good idea to store both an integer version and a string version in the database. Also, when storing strings, you could reduce all punctuation to whitespace using one of the methods noted above. A regular expression for this might be:
// (222) 222-2222 ext# 333 -> 222 222 2222 # 333
phoneString = Regex.Replace(phoneString, @"[^\d#*P]", " ");
// (222) 222-2222 ext# 333 -> 2222222222333 (information lost)
phoneNumber = Regex.Replace(phoneString, @"[^\d]", "");
// you could try to avoid losing "ext" strings as in (222) 222-2222 ext.333 thus:
phoneString = Regex.Replace(phoneString, @"ex\w+", "#");
phoneString = Regex.Replace(phoneString, @"[^\d#*P]", " ");
Try this:
string s = "(123) 456-7890";
UInt64 i = UInt64.Parse(
s.Replace("(","")
.Replace(")","")
.Replace(" ","")
.Replace("-",""));
You should be safe with this since the input is masked.
You could use a regular expression or you could loop over each character and use char.IsNumber function.
You would be better off using regular expressions. An int by definition is just a number, but you desire the formatting characters to make it a phone number, which is a string.
There are numerous posts about phone number validation, see A comprehensive regex for phone number validation for starters.
As many answers already mention, you need to strip out the non-digit characters first before trying to parse the number. You can do this using a regular expression.
Regex.Replace("(123) 456-7890", @"\D", String.Empty) // "1234567890"
However, note that the largest positive value int can hold is 2,147,483,647 so any number with an area code greater than 214 would cause an overflow. You're better off using long in this situation.
Leading zeros won't be a problem for North American numbers, as area codes cannot start with a zero or a one.
Alternative using Linq:
string phoneNumber = "(403) 259-7898";
var phoneStr = new string(phoneNumber.Where(i=> i >= 48 && i <= 57).ToArray());
This is basically a special case of C#: Removing common invalid characters from a string: improve this algorithm, where your formatting characters, including white space, are treated as "bad characters".
' You can put this in a module or inside the main form (VB.NET)
Public Function ClearFormat(ByVal Strinput As String) As String
    Dim hasil As String = ""
    Dim Hrf As Char
    For i = 0 To Strinput.Length - 1
        Hrf = Strinput(i)
        If IsNumeric(Hrf) Then
            hasil &= Hrf
        End If
    Next
    ' return the collected digits, not the original input
    Return hasil
End Function
' You can call this function like this:
' Phone = ClearFormat(Phone)
public static string DigitsOnly(this string phoneNumber)
{
return new string(
new[]
{
// phoneNumber[0], (
phoneNumber[1], // 6
phoneNumber[2], // 1
phoneNumber[3], // 7
// phoneNumber[4], )
// phoneNumber[5],
phoneNumber[6], // 8
phoneNumber[7], // 6
phoneNumber[8], // 7
// phoneNumber[9], -
phoneNumber[10], // 5
phoneNumber[11], // 3
phoneNumber[12], // 0
phoneNumber[13] // 9
});
}
