Regular Expression for allowing multiple language input - c#

Quick question regarding regular expression validation on textbox entry. Basically I have a textbox that I am using for user input in the form of a website address. The user can input anything (it doesn't have to be a valid website address such as www.facebook.com). They could enter "blah blah", and that's fine, but it will not run.
What I am after is to validate different languages, Arabic, Greek, Chinese, etc etc, because at present I only allow English characters.
The code for the method is below. I believe I will have to switch this from a whitelist to blacklist, so instead of seeing what matches, change the expression to invalid characters, and if the user enters one of these, don't allow it.
public static bool IsValidAddress(string path)
{
bool valid = false;
valid = (path.Length > 0);
if (valid)
{
string regexPattern = @"([0-9a-zA-Z*?]{1})([-0-9a-zA-Z_\.*?]{0,254})";
// Eliminate the '"' character first up so it simplifies regular expressions.
valid = (path.Contains("\"") == false);
if (valid)
{
valid = IsValidAddress(path, regexPattern);
}
if (valid)
{
// Need an additional check to determine that the address does not begin with xn--,
// which is not permitted by the Internationalized Domain Name standard.
valid = (path.IndexOf("xn--") != 0);
}
}
return valid;
}
As you can see, I have the 0-9a-zA-Z included, but by default this will eliminate other languages, whereas I wish to include the languages.
Any help is greatly appreciated. If I've confused anyone, sorry! I can give more information if it is needed.
Thanks.

I don't know why you're trying to validate URIs with Regex. .NET's Uri class is surely a much better match for your task, no?
Uri uri;
if (!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
//it's a bad URI
}
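If you do want to keep a whitelist regex rather than switch to a blacklist, one option is to swap the ASCII ranges for Unicode categories. The following is only a sketch under a few assumptions: the class name AddressValidator is made up, the pattern is anchored (the original helper IsValidAddress(path, regexPattern) isn't shown, so I can't tell whether it anchors), and combining marks (\p{M}) are allowed after the first character so that scripts which rely on them still pass.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressValidator
{
    // Hypothetical Unicode-aware variant of the question's pattern:
    // \p{L} = any letter, \p{M} = combining mark, \p{N} = any digit.
    private static readonly Regex AddressPattern = new Regex(
        @"^[\p{L}\p{N}*?][-\p{L}\p{M}\p{N}_.*?]{0,254}\z",
        RegexOptions.CultureInvariant);

    public static bool IsValidAddress(string path)
    {
        if (string.IsNullOrEmpty(path))
            return false;

        // Same rule as the original method: reject input that begins with "xn--".
        if (path.StartsWith("xn--", StringComparison.Ordinal))
            return false;

        // The '"' character is not in either character class, so it is rejected here too.
        return AddressPattern.IsMatch(path);
    }
}
```

With this, input such as пример.рф or 例え.jp passes the same check as www.facebook.com, while anything containing a quote or starting with xn-- is still refused.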

Related

How can I check if a string follows the pattern of american currency?

I'm working on a problem that wants me to get a string input from a user and run it through a method that checks every character to see whether it follows the pattern of American currency. The input to the method has to be a string. The amount can be anywhere from one dollar to a thousand, but must be entered in the format $x.xx, $xx.xx or $xxx.xx. As long as the user enters an amount consistent with these formats, my program should output that it is "valid"; anything else should produce an "invalid format" output. The first character must be '$', and I cannot use regex.
I get the user input, validate it with string.IsNullOrWhiteSpace, and then pass the string holding the user input to my method. From this point I have no idea how to continue. I've tried .ToCharArray and a long and complicated if statement, and I have researched for a few hours now, but I can't find a solid way to write this out.
class Program
{
static void Main(string[] args)
{
Console.WriteLine("enter amount between $1.00 and $1000.00");
string valueUS = Console.ReadLine();
while (string.IsNullOrWhiteSpace(valueUS))
{
Console.WriteLine("Please enter in an amount");
valueUS = Console.ReadLine();
}
currencyChecker(valueUS);
}
public static string currencyChecker(string currencyString)
{
char[] currencyArray;
currencyArray = currencyString.ToCharArray();
for (int i = 0; i < currencyArray.Length; i++)
{
if (currencyArray[0] == '$')
{
}
}
return currencyString;
}
}
The method above should check every character entered by the user and verify that it matches the pattern for American currency described above, reporting "valid" if it does; anything else should be reported back as "invalid".
Usually, you would use a regular expression for something like this.
A simple regex for this would be ^\$\d+\.\d\d$. Basically, it means the string should start with a $ sign, have at least one digit, a dot, and two more digits.
However, this can be done without regular expressions, and since it seems like a homework task, I'll give you a nudge in the right direction.
So you need to test that the string starts with $, that the 3rd char from the right is a ., and that everything else is a digit.
Your method should return a bool indicating valid / invalid results - so you should do something like this:
static bool IsCurrency(string currency)
{
// Check if the string is not null or empty - if not, return false
// check if the string is at least 5 chars long, since you need at least $x.xx - if not, return false
// Check if the first char is $ - if not, return false
// Check if the 3rd char from the end is . - if not, return false
// check if all the other chars are digits - if not, return false
// If all checks are valid -
return true;
}
Note that the order of the tests is critical. For instance, if you check that the 3rd char from the right is a . before you check that you have at least 5 chars, you might attempt to check a string that is only 2 chars long and get an exception.
Since this is (probably) homework I'm going to leave the code-writing part for you, so you would actually learn something from this.

How to limit which characters are permitted in strings?

I'm currently making a user class in c# that contains a first name, last name, username and email.
The username can only contain numbers [0-9], lower-case letters [a-z] and underscores '_'
The email can only contain [a-z], [A-Z], [0-9], as well as dot '.', comma ',', underscore '_' and hyphen '-'
How are these limitations set to strings in c#?
You could do this logic in the setters of the properties.
ex:
private string _firstName;
public string FirstName
{
get { return this._firstName; }
set
{
if (Regex.Match(value, YOUR_REGEX).Success)
this._firstName = value;
}
}
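As a concrete version of the setter idea for the username rule from the question (digits, lower-case letters and underscores), here is a rough sketch. The pattern and the decision to throw instead of silently ignoring a bad value are my own assumptions, not something the question specifies.

```csharp
using System;
using System.Text.RegularExpressions;

public class User
{
    // Assumed pattern: one or more of 0-9, a-z or '_' and nothing else.
    private static readonly Regex UsernamePattern = new Regex(@"^[0-9a-z_]+\z");

    private string _username;

    public string Username
    {
        get { return _username; }
        set
        {
            // Reject null and anything containing a character outside the allowed set.
            if (value == null || !UsernamePattern.IsMatch(value))
                throw new ArgumentException(
                    "Username may only contain 0-9, a-z and '_'.", nameof(value));
            _username = value;
        }
    }
}
```

Throwing makes the failure visible to the caller; if you would rather keep the old value quietly, as in the FirstName example above, drop the throw and only assign on a match.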
Sounds like the goal of this project is to get you to know how to use Regular Expressions. A good online resource for this which contains common patterns and testing is located at the Regular Expression Library
The simple email validation requirement you have can also be done with RegEx, but is actually incorrect as it will allow invalid addresses through; commas can be in an email address but they would need to be quoted which your pattern does not allow. The specifications for the local portion of an email address is probably one of the most poorly implemented standards on the internet.
For future reference, a cheat for checking email addresses within .Net is to use the MailAddress class of the System.Net.Mail namespace. You can use a try...catch routine to see if the submitted address can be converted. The only problems to be aware of using this is it will allow addresses through which most people would not consider to be valid even though they are; such as a server name without an extension (.com etc).
private bool isValidEmail (string emailAddress) {
bool ReturnValue;
try {
MailAddress ma = new MailAddress(emailAddress);
ReturnValue = true;
}
catch (Exception) { ReturnValue = false; }
return ReturnValue;
}

How to retrieve the locale(country) code from URL?

I have a URL, which is like http://example.com/UK/Deal.aspx?id=322
My target is to remove the locale(country) part, to make it like http://example.com/Deal.aspx?id=322
Since the URL may have other similar formats like: https://ssl.example.com/JP/Deal.aspx?id=735, using "substring" function is not a good idea.
What I can think about is to use the following method for separating them, and map them back later.
HttpContext.Current.Request.Url.Scheme
HttpContext.Current.Request.Url.Host
HttpContext.Current.Request.Url.AbsolutePath
HttpContext.Current.Request.Url.Query
And, suppose HttpContext.Current.Request.Url.AbsolutePath will be:
/UK/Deal.aspx?id=322
I am not sure how to deal with this since my boss asked me not to use "regular expression"(he thinks it will impact performance...)
Except "Regular Expression", is there any other way to remove UK from it?
p.s.: the UK part may be JP, DE, or other country code.
By the way, for USA, there is no country code, and the url will be http://example.com/Deal.aspx?id=322
Please also take this situation into consideration.
Thank you.
Assuming that you'll have TwoLetterCountryISOName in the Url, you can use the UriBuilder class to remove it from the path of the Uri without using Regex.
E.g.
var sourceUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
if (IsLocaleEnabled(sourceUri))
{
var builder = new UriBuilder(sourceUri);
builder.Path
= builder.Path.Replace(sourceUri.Segments[1] /* remove UK/ */, string.Empty);
// Construct the Uri with new path
Uri newUri = builder.Uri;
}
Update:
// Cache the instance for performance benefits.
static readonly Regex regex = new Regex(@"^[a-zA-Z]{2}/$", RegexOptions.Compiled);
/// <summary>
/// Regex to check if Url segments have the 2 letter
/// ISO code as first occurrence after root
/// </summary>
private bool IsLocaleEnabled(Uri sourceUri)
{
// Update: a compiled regex is much faster than a non-compiled one.
// Guard against URLs that have no segment after the root.
return sourceUri.Segments.Length > 1 && regex.IsMatch(sourceUri.Segments[1]);
}
For performance benefits you must cache it (that is, keep it in a static readonly field). There's no need to parse a pre-defined regex on every request. This way you'll get all the performance benefits you can get.
Result - http://example.com/Deal.aspx?id=322
It all depends on whether the country code always has the same position. If it doesn't, then some more details on the possible formats are required. Maybe you could check whether the first segment has two chars or something, to be sure it really is a country code (not sure if this is reliable though). Or you could start with the filename, if it's always in the format /[optionalCountryCode]/deal.aspx?...
How about these two approaches (on string level):
public string RemoveCountryCode()
{
Uri originalUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
string hostAndPort = originalUri.GetLeftPart(UriPartial.Authority);
// v1: if country code is always there, always has same position and always
// has format 'XX' this is definitely the easiest and fastest
string trimmedPathAndQuery = originalUri.PathAndQuery.Substring("/XX/".Length);
// v2: if country code is always there, always has same position but might
// not have a fixed format (e.g. XXX)
trimmedPathAndQuery = string.Join("/", originalUri.PathAndQuery.Split('/').Skip(2));
// in both cases you need to join it with the authority again
return string.Format("{0}/{1}", hostAndPort, trimmedPathAndQuery);
}
If the AbsolutePath will always have the format /XX/...pagename.aspx?id=### where XX is the two letter country code, then you can just strip off the first 3 characters.
Example that removes the first 3 characters:
var targetURL = HttpContext.Current.Request.Url.AbsolutePath.Substring(3);
If the country code could be different lengths, then you could find the index of the second / character and start the substring from there.
var sourceURL = HttpContext.Current.Request.Url.AbsolutePath;
var firstOccurrence = sourceURL.IndexOf('/');
// Start searching after the first '/', otherwise the same index is found again.
var secondOccurrence = sourceURL.IndexOf('/', firstOccurrence + 1);
var targetURL = sourceURL.Substring(secondOccurrence);
The easy way would be to treat it as a string, split it by the "/" separator, remove the fourth element, and then join the parts back with the "/" separator again:
string myURL = "https://ssl.example.com/JP/Deal.aspx?id=735";
// RemoveAt returns void, so it cannot be chained onto ToList().
List<string> myURLsplit = myURL.Split('/').ToList();
myURLsplit.RemoveAt(3);
myURL = string.Join("/", myURLsplit);
RESULT: https://ssl.example.com/Deal.aspx?id=735
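None of the string-level snippets above covers the USA case from the question, where there is no country code at all. A possible regex-free sketch that does is below. It assumes the country code is always the first path segment and always exactly two ASCII letters; the class and method names (LocaleUrl, RemoveCountryCode, IsCountrySegment) are made up for the example.

```csharp
using System;

public static class LocaleUrl
{
    // Assumption: a locale segment is exactly two ASCII letters plus '/', e.g. "UK/" or "JP/".
    private static bool IsCountrySegment(string segment)
    {
        return segment.Length == 3
            && IsAsciiLetter(segment[0])
            && IsAsciiLetter(segment[1])
            && segment[2] == '/';
    }

    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static Uri RemoveCountryCode(Uri source)
    {
        // Segments for "/UK/Deal.aspx" are { "/", "UK/", "Deal.aspx" }.
        string[] segments = source.Segments;
        if (segments.Length < 2 || !IsCountrySegment(segments[1]))
            return source; // e.g. the USA form, which has no country code

        var builder = new UriBuilder(source);
        // Path is "/UK/Deal.aspx"; skipping "/UK" leaves "/Deal.aspx".
        builder.Path = builder.Path.Substring(segments[1].Length);
        return builder.Uri;
    }
}
```

The query string is carried over by UriBuilder, so only the path changes. Note that a page whose first folder happens to be two letters long would also be stripped, so check this against your real URL layout.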

Code an elegant way to strip strings

I am using C#, and in one place I get a list of people's names with their email IDs in the format
name(email)\n
I came up with this substring code off the top of my head. I am looking for a more elegant, fast (in terms of access time and the operations it performs), easy-to-remember line of code to do this.
string pattern = "jackal(jackal@gmail.com)";
string email = pattern.Substring(pattern.IndexOf("("),pattern.LastIndexOf(")") - pattern.IndexOf("("));
//extra
string email = pattern.Split('(',')')[1];
I think doing the above would do sequential access to each character until it finds the index of the character. Works ok now since name is short, but would struggle when having a large name ( hope people don't have one)
A dirty hack would be to let microsoft do it for you.
try
{
new MailAddress(input);
//valid
}
catch (Exception ex)
{
// invalid
}
I hope they would do a better job than a custom reg-ex.
Maintaining a custom reg-ex that takes care of everything might involve some effort.
Refer: MailAddress
Your format is actually very close to some supported formats.
Text within () is treated as a comment, but if you replace ( with < and ) with >, you get a supported format.
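To make that replacement idea concrete, here is a small sketch. The helper name ExtractEmail is made up, and it assumes every entry really has the name(email) shape; MailAddress then does the parsing and validation.

```csharp
using System;
using System.Net.Mail;

public static class NameEmailParser
{
    public static string ExtractEmail(string entry)
    {
        // Locate the parentheses that wrap the email part.
        int open = entry.LastIndexOf('(');
        int close = entry.LastIndexOf(')');
        if (open < 0 || close < open)
            throw new FormatException("Expected the form name(email).");

        // Rewrite "name(email)" as "name <email>", a display-name form MailAddress accepts.
        string candidate = entry.Substring(0, open) + " <"
            + entry.Substring(open + 1, close - open - 1) + ">";

        // Throws FormatException if the email part is not a valid address.
        return new MailAddress(candidate).Address;
    }
}
```

This does more work per entry than a plain Substring, so it only makes sense if you also want the address validated.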
The second parameter in Substring() is the length of the string to take, not the ending index.
Your code should read:
string pattern = "jackal(jackal@gmail.com)";
int start = pattern.IndexOf("(") + 1;
int end = pattern.LastIndexOf(")");
string email = pattern.Substring(start, end - start);
Alternatively, have a look at Regular Expression to find a string included between two characters while EXCLUDING the delimiters

How to remove PROTOCOL from URI

how can I remove the protocol from URI? i.e. remove HTTP
You can use the System.Uri class like this:
System.Uri uri = new Uri("http://stackoverflow.com/search?q=something");
string uriWithoutScheme = uri.Host + uri.PathAndQuery + uri.Fragment;
This will give you stackoverflow.com/search?q=something
Edit: this also works for about:blank :-)
The best (and to me most beautiful) way is to use the Uri class for parsing the string to an absolute URI and then use the GetComponents method with the correct UriComponents enumeration to remove the scheme:
Uri uri;
if (Uri.TryCreate("http://stackoverflow.com/...", UriKind.Absolute, out uri))
{
return uri.GetComponents(UriComponents.AbsoluteUri &~ UriComponents.Scheme, UriFormat.UriEscaped);
}
For further reference: the UriComponents enumeration is decorated with the FlagsAttribute, so bitwise operations (e.g. & and |) can be used on it. In this case the &~ removes the bits for UriComponents.Scheme from UriComponents.AbsoluteUri using the AND operator in combination with the bitwise complement operator.
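Wrapped up as a complete method, the flag arithmetic could look like the sketch below. The class and method names are invented for the example, and returning the input unchanged when it is not an absolute URI is my own choice rather than part of the answer above.

```csharp
using System;

public static class SchemeStripper
{
    public static string WithoutScheme(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return url; // not an absolute URI: hand the input back untouched

        // AbsoluteUri is a combination of flags; clearing the Scheme bit keeps all the others.
        UriComponents parts = UriComponents.AbsoluteUri & ~UriComponents.Scheme;
        return uri.GetComponents(parts, UriFormat.UriEscaped);
    }
}
```

Because the remaining flags still include UriComponents.Port, a non-default port such as :8080 stays in the result.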
In the general sense (not limiting to http/https), an (absolute) uri is always a scheme followed by a colon, followed by scheme-specific data. So the only safe thing to do is cut at the scheme:
string s = "http://stackoverflow.com/questions/4517240/";
int i = s.IndexOf(':');
if (i > 0) s = s.Substring(i + 1);
In the case of http and a few others you may also want to .TrimStart('/'), but this is not part of the scheme, and is not guaranteed to exist. Trivial example: about:blank.
You could use the RegEx for this. The below sample would meet your need.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt="http://www.google.com";
string re1="((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))"; // HTTP URL 1
Regex r = new Regex(re1,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String httpurl1=m.Groups[1].ToString();
Console.Write("("+httpurl1.ToString()+")"+"\n");
}
Console.ReadLine();
}
}
}
Let me know if this helps
It's not the most beautiful way, but try something like this:
var uri = new Uri("http://www.example.com");
var scheme = uri.Scheme;
var result = uri.ToString().Substring(scheme.Length + 3);
The above answers work in most cases, but IMO it's not a complete solution:
uri.Host + uri.PathAndQuery + uri.Fragment;
drops port if specified (e.g. http://www.example.com:8080/path/ becomes www.example.com/path/ )
uri.GetComponents(UriComponents.AbsoluteUri & ~UriComponents.Scheme, UriFormat.UriEscaped)
preserves ports and seems generally better, but in some cases, (which are most likely to be incorrect, but not impossible), I got some characters escaped that shouldn't.
In both cases we get '/' added at the end, so if your url is potentially sensitive to that difference, or you care how it looks, you need to check whether it was present before and, if not, TrimEnd it.
On top of that, both of those solutions throw an exception if the Uri is considered invalid, so if your url doesn't already have the 'schema' (e.g. www.example.com), the code above fails.
If you want something really generic and working for input over which you might not have control (e.g. user input), I'd probably stick to a simpler solution, e.g:
var endOfSchemaIdx = url.IndexOf("://");
if(endOfSchemaIdx != -1)
return url.Substring(endOfSchemaIdx+3);
return url;
You can also fetch the schema via a library like FLURL (doesn't throw exception on www.example.com) and look up the first occurrence of "url.Schema" + "://", then delete it if exists. I feel safer if the rest of the url is not processed by any library, unless that is your intention.
