Convert emailaddress to valid path - c#

I created a WinForms application that creates a small SQLite database for every user. The database name contains the user's e-mail address.
Scenario:
A user with the e-mail address test@test.com
uses the application, and the following database is created:
test@test.com.sqlite
I know there are cases where an e-mail address converted to a path will not yield a valid path.
The following characters are reserved in paths: / \ ? % * : | " < > . and a space
I'm currently using the following method to filter the e-mail address:
public string ReturnFilteredUsername(string username)
{
username = username.Replace("<", "x"); username = username.Replace(">", "x"); username = username.Replace(":", "x"); username = username.Replace("\"", "x");
username = username.Replace("/", "x"); username = username.Replace(@"\", "x"); username = username.Replace("|", "x"); username = username.Replace("?", "x");
username = username.Replace("*", "x");
return username;
}
All reserved characters currently get replaced by an "x", but I want to convert all invalid characters to "%20". I could just replace the "x" with "%20", but I don't think that's a 'clean' method.
Can somebody come up with a more 'clean' method?
Thanks in advance!

Rather than using the user's email to name the database, identify each user by a numeric ID, and use that to name the databases.
In addition to always being valid as a file name, this allows users to change their email address without you having to worry about still pointing to the correct database.
Alternately, if you're set on using email addresses for the names, see this answer about which characters are valid in an email address. As long as all of the email addresses are valid, you only need to worry about escaping those characters that are both valid for emails and invalid for paths.
I'd suggest either using a valid-for-path and invalid-for-email character to start an escape sequence (different sequence for each character you'll have to escape), or selecting a less common character valid for both, and using that as an escape character (remembering to escape it as well!).
Example:
Both % and ? are listed as valid for emails and invalid for paths. Assuming we use & to escape (valid for both!), we would create a mapping like this:
"&" = &00;
"%" = &01;
"?" = &02;
You would go through each email address and replace each occurrence of an invalid character with its escaped equivalent, making the result both as unique as the email address and safe as a path name.
"a&b?100%@example.com" would become "a&00;b&02;100&01;@example.com".

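The escape mapping described above can be sketched as follows. This is a minimal sketch that assumes only the three example mappings from the answer (& to &00;, % to &01;, ? to &02;); the class and method names are hypothetical, and a real table would cover every character that is valid in an email address but invalid in a path on the target file system.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of any library.
public static class EmailPathEscaper
{
    // Example mappings taken from the answer; extend as needed.
    private static readonly Dictionary<char, string> Escapes = new Dictionary<char, string>
    {
        { '&', "&00;" }, // the escape character itself must be escaped too
        { '%', "&01;" },
        { '?', "&02;" },
    };

    // Single pass over the input, so an inserted '&' is never escaped twice.
    public static string Escape(string email)
    {
        var sb = new StringBuilder(email.Length);
        foreach (char c in email)
        {
            if (Escapes.TryGetValue(c, out string escaped))
                sb.Append(escaped);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Reverses Escape. "&00;" is restored last so that a literal "&01;"
    // in the original address is not mistaken for an escaped '%'.
    public static string Unescape(string fileName)
    {
        return fileName
            .Replace("&01;", "%")
            .Replace("&02;", "?")
            .Replace("&00;", "&");
    }
}
```

Because every literal & is itself escaped, the mapping stays reversible, so the original address can be recovered from the file name if needed.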

Regex for getting domain and subdomain in C#

I have a requirement to correctly get the domain and subdomain from the current URL. This is needed to fetch the correct data from the database and then call a web API with the correct parameters.
In particular, I am facing issues with local and production URLs. For example:
Locally, I have
http://sample.local.example.com
http://test.dev.example.com
In production, I have
http://client.example.com
http://program.live.example.com
I need
Subdomain as: sample / test / client / program
Domain as: example
So far I have tried the following C# code. It works fine locally, but I am sure it will cause an issue in production at some point. Basically, for the subdomain it takes the first part, and for the domain it takes the last part before ".com".
var host = Request.Url.Host;
var domains = host.Split('.');
var subDomain = domains[0];
string mainDomain = string.Empty;
#if DEBUG
mainDomain = domains[2];
#else
mainDomain = domains[1];
#endif
return Tuple.Create(mainDomain, subDomain);
Instead of a regex, I think LINQ should help you here. Try:
public static (string, string) GetDomains(Uri url)
{
var domains = url.Host.Substring(0, url.Host.LastIndexOf(".")).Split('.');
var subDomain = string.Join("/", domains.Take(domains.Length - 1));
var mainDomain = domains.Last();
return (mainDomain, subDomain);
}
output for "http://program.live.example.com"
example
program/live
This regex should work for you:
Match match = Regex.Match(temp, @"http://(\w+)\.?.*\.(\w+).com$");
string subdomain = match.Groups[1].Value;
string domain = match.Groups[2].Value;
http://(\w+)\. matches 1 or more word characters as group 1 before a dot and after http://
.* matches zero or more occurrences of any character
\.(\w+).com matches 1 or more word characters as group 2 before .com and after a dot
$ specifies the end of the string
\.? makes the dot optional to catch the case if there is nothing between group 1 and 2 like in http://client.example.com
You are doing it right; you can get the domain name as the second-to-last value in the array.
var host = Request.Url.Host;
var domains = host.Split('.');
string subDomain = domains[0]; // Host holds no scheme or slashes, so the first label is the subdomain
string mainDomain = domains[domains.Length-2];
return Tuple.Create(mainDomain, subDomain);
If you want all the subdomains you can put a loop here.
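The loop mentioned above might look like the following. This is a minimal sketch that assumes the host ends in a single-label top-level domain such as .com (so the main domain is the second-to-last label); the HostParser class and its Split method are hypothetical names, not an existing API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class HostParser
{
    // Assumes a single-label TLD (".com"), so the main domain is the
    // second-to-last label and everything before it is a subdomain.
    public static (string Domain, List<string> SubDomains) Split(string host)
    {
        string[] labels = host.Split('.');
        if (labels.Length < 2)
            throw new ArgumentException("Host must contain at least a domain and a TLD.", nameof(host));

        string domain = labels[labels.Length - 2];

        // Collect every label to the left of the main domain, in order.
        var subDomains = new List<string>();
        for (int i = 0; i < labels.Length - 2; i++)
            subDomains.Add(labels[i]);

        return (domain, subDomains);
    }
}
```

For program.live.example.com this yields the domain example with the subdomains program and live; taking the first element of the list gives the single subdomain the question asks for. Hosts under multi-label suffixes such as .co.uk would need a different rule.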

In c# I'm looking for a regex that will only match the first occurrence

In the following piece of code (C#), I would like to replace the value of the Order and GUID in the ContentType annotation:
[ContentType(
DisplayName = "My First Block",
Order = 133536,
GUID = "0f02e38a-a6e2-4333-9bd1-c61cf573d8d3",
Description = "Just an example block.",
GroupName = "Blocks.Content"
)]
public class MyFirstBlock : BaseBlock
{
[CultureSpecific]
[Display(
Name = "Title",
Order = 100,
Description = "The title",
GroupName = "Information")]
[Required]
public virtual XhtmlString Title { get; set; }
}
I'm using the following regular expressions to find the values:
Order: (?<=Order = )\d{4,}(?=[,)])
GUID: (?<=GUID = \").*(?=\")
And these work but they have some shortcomings. For the Order regex, I would like to not have to look for a minimum of 4 digits. I'd much rather do (?<=Order = )\d*(?=[,)]) so it will also find the right location if the current order value is less than 4 digits or even not entered at all. But this will also match the order in the Display annotation for the Title. I've tried making the expression not greedy, as is the accepted answer in just about every search result I find when googling my question, but that doesn't seem to do anything.
For the GUID, I'm running into the same problem. I can't be sure that there will not be another GUID somewhere in the document, that I don't want to replace. So for this expression the problem is basically the same, I only want to find the value of the first GUID in the document.
Another approach I've tried is to look for the Order and GUID inside the ContentType block, but I've not been able to get that to work.
A little background information to put this question in context: I'm writing a VS Extension that will generate the order number based on the text selected by the user and also replace the GUID with a newly generated GUID. I'm using EnvDTE.TextDocument.ReplacePattern() to replace the value for the order and GUID after they've been generated.
You may use the following solution:
var result = Regex.Replace(
Regex.Replace(input, @"(?s)(\[ContentType\((?:(?!\)]).)*?\bOrder\s*=\s*)\d*(.*?\)])", "${1}<<ORDER>>$2"),
@"(?s)(\[ContentType\((?:(?!\)]).)*?\bGUID\s*=\s*""?)[\w-]*(.*?\)])",
"${1}<<GUID>>$2");
See the C# online demo that shows that the Order and GUID values are only replaced in the ContentType part:
Order = <<ORDER>>,
GUID = "<<GUID>>",
Note that the replacement backreferences are made unambiguous by using curly braces since most probably your replacements will be starting with digits and that could create an invalid group reference.
The pattern matches:
(?s) - enables . to match newlines
(\[ContentType\((?:(?!\)]).)*?\bGUID\s*=\s*"?) - Group 1 capturing:
\[ContentType\( - a [ContentType( substring
(?:(?!\)]).)*? - any char not starting a )] sequence, as few as possible,
\bGUID - a whole word GUID (or Order)
\s*=\s* - a = enclosed with 0+ whitespaces
"? - an optional "
[\w-]* - 0 or more word or - chars
(.*?\)]) - Group 2: any chars as few as possible up to the first )] including them.
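To see the Order pattern in isolation, here is a minimal sketch that wraps it in a helper and inserts a concrete value. The ContentTypeRewriter class and ReplaceOrder method are hypothetical names, and the sketch assumes the new value consists of digits only, which is why the ${1} form of the group reference is needed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ContentTypeRewriter
{
    // Replaces the Order value inside the [ContentType( ... )] attribute only.
    // newOrder is assumed to contain digits only; "${1}" keeps the group
    // reference unambiguous when the replacement starts with a digit.
    public static string ReplaceOrder(string source, string newOrder)
    {
        return Regex.Replace(
            source,
            @"(?s)(\[ContentType\((?:(?!\)]).)*?\bOrder\s*=\s*)\d*(.*?\)])",
            "${1}" + newOrder + "$2");
    }
}
```

Because the match must start at [ContentType( and the tempered dot cannot cross a )] sequence, an Order inside a later [Display( ... )] attribute is left alone.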

How to limit which characters are permitted in strings?

I'm currently making a user class in c# that contains a first name, last name, username and email.
The username can only contain numbers [0-9], lower-case letters [a-z] and underscores '_'
The email can only contain [a-z], [A-Z], [0-9], as well as dot '.', comma ',', underscore '_' and hyphen '-'
How are these limitations set to strings in c#?
You could do this logic in the setters of the properties.
ex:
private string _firstName;
public string FirstName
{
get { return this._firstName; }
set
{
if (Regex.Match(value, YOUR_REGEX).Success)
this._firstName = value;
}
}
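As a concrete instance of the setter approach, here is a minimal sketch for the username rule from the question (digits, lower-case letters and underscores). The User class shape and the choice to throw on invalid input are assumptions; the snippet above silently ignores invalid values instead.

```csharp
using System;
using System.Text.RegularExpressions;

public class User
{
    // \A and \z anchor the pattern to the whole string, so a single
    // disallowed character anywhere makes the value invalid.
    private static readonly Regex UsernamePattern = new Regex(@"\A[0-9a-z_]+\z");

    private string _username;

    public string Username
    {
        get { return _username; }
        set
        {
            // Design choice (an assumption): reject bad input loudly
            // rather than keeping the old value without any feedback.
            if (value == null || !UsernamePattern.IsMatch(value))
                throw new ArgumentException("Username may contain only 0-9, a-z and '_'.", nameof(value));
            _username = value;
        }
    }
}
```

Throwing makes the failure visible to the caller; whether that or silently ignoring the value is preferable depends on how the class is used.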
Sounds like the goal of this project is to get you to know how to use Regular Expressions. A good online resource for this which contains common patterns and testing is located at the Regular Expression Library
The simple email validation requirement you have can also be done with a regex, but it is actually incorrect, as it will allow invalid addresses through: commas can appear in an email address, but they need to be quoted, which your pattern does not allow. The specification for the local part of an email address is probably one of the most poorly implemented standards on the internet.
For future reference, a shortcut for checking email addresses in .NET is the MailAddress class in the System.Net.Mail namespace. You can use a try...catch block to see whether the submitted address can be converted. The one thing to be aware of is that it accepts addresses that most people would not consider valid even though they are, such as a server name without an extension (.com etc.).
private bool isValidEmail (string emailAddress) {
bool ReturnValue;
try {
MailAddress ma = new MailAddress(emailAddress);
ReturnValue = true;
}
catch (Exception) { ReturnValue = false; }
return ReturnValue;
}

Microsoft Exchange Services - How to get exact match using Resolve

Here is a question related to a Microsoft Exchange-integration.
I am calling the Microsoft Exchange Services-method ResolveName (string):
I am passing in a username, e.g. myusername, and I get two matches: one with the username myusername and one with myusername2.
Now the question is: is there any way to make a call that returns only exact matches, so that only entries with the exact username are returned?
Here is the code:
var service = Service.GetService();
username = Regex.Replace(username, ".*\\\\(.*)", "$1", RegexOptions.None);
var resolvedNames = service.ResolveName(username);
foreach (var resolvedName in resolvedNames)
{
mailboxname = resolvedName.Mailbox.Address;
}
That method actually resolves e-mail addresses, so for an exact match you'd need to do something like this.
string username = "myUserName";
string domain = "myDomain.com";
string emailAddress = username + "@" + domain;
NameResolutionCollection resolvedContactList = _service.ResolveName(emailAddress);
If you cannot specify the 'username' any further than myusername (as Amicable's answer assumes you can do) then the only thing to do is write a wrapper around ResolveName that again matches all results against your search string, this time requiring an exact match.
And for doing so you would have to parse the domain name off again, because you get the full primary SMTP email address returned in .Mailbox.Address.
I'm doing the exact same thing in my Delphi code ;-)
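The filtering step of such a wrapper can be sketched independently of the Exchange API. This is a minimal sketch that works on plain strings; it assumes the addresses were already read from each result's .Mailbox.Address and that the local part of the primary SMTP address equals the username being searched for, which may not hold in every organisation. ExactNameMatcher and FilterExact are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Exchange Web Services API.
public static class ExactNameMatcher
{
    // addresses: primary SMTP addresses taken from the resolved names.
    // Keeps only those whose local part (the text before '@') equals username.
    public static List<string> FilterExact(IEnumerable<string> addresses, string username)
    {
        return addresses
            .Where(a => !string.IsNullOrEmpty(a))
            .Where(a =>
            {
                // Parse the domain off again, as described above.
                int at = a.IndexOf('@');
                string localPart = at >= 0 ? a.Substring(0, at) : a;
                return string.Equals(localPart, username, StringComparison.OrdinalIgnoreCase);
            })
            .ToList();
    }
}
```

With the two results from the question, only the address whose local part is exactly myusername survives the filter.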

Regular Expression for allowing multiple language input

Quick question regarding regular expression validation on textbox entry. Basically, I have a textbox that I am using for user input in the form of a website address. The user can input anything; it doesn't have to be a valid website address such as www.facebook.com. They could enter "blah blah", and that's fine, but it will not run.
What I am after is to validate different languages, Arabic, Greek, Chinese, etc etc, because at present I only allow English characters.
The code for the method is below. I believe I will have to switch this from a whitelist to blacklist, so instead of seeing what matches, change the expression to invalid characters, and if the user enters one of these, don't allow it.
public static bool IsValidAddress(string path)
{
bool valid = false;
valid = (path.Length > 0);
if (valid)
{
string regexPattern = @"([0-9a-zA-Z*?]{1})([-0-9a-zA-Z_\.*?]{0,254})";
// Eliminate the '"' character first up so it simplifies regular expressions.
valid = (path.Contains("\"") == false);
if (valid)
{
valid = IsValidAddress(path, regexPattern);
}
if (valid)
{
// Need an additional check to determine that the address does not begin with xn--,
// which is not permitted by the Internationalized Domain Name standard.
valid = (path.IndexOf("xn--") != 0);
}
}
return valid;
}
As you can see, I have the 0-9a-zA-Z included, but by default this will eliminate other languages, whereas I wish to include the languages.
Any help is greatly appreciated. If I've confused anyone, sorry! I can give more information if it is needed.
Thanks.
I don't know why you're trying to validate URIs with a regex. .NET's Uri class is surely a much better match for your task, no?
Uri uri;
if(!Uri.TryCreate(uriString, UriKind.Absolute, out uri))
{
//it's a bad URI
}
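If the whitelist approach from the question is kept, one possible way to admit other scripts is to swap the ASCII ranges for Unicode categories. This is a minimal sketch and not the asker's actual helper: it assumes the pattern is meant to match the whole input, it adds \p{M} so that combining marks are accepted after the first character, and the AddressValidator class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class AddressValidator
{
    // \p{L} = any letter, \p{N} = any number, \p{M} = combining marks.
    // Same shape as the original pattern (one leading character plus up to
    // 254 more), but anchored with \A and \z as an assumption.
    private static readonly Regex Pattern = new Regex(
        @"\A[\p{L}\p{N}*?][-\p{L}\p{M}\p{N}_.*?]{0,254}\z",
        RegexOptions.CultureInvariant);

    public static bool IsValidAddress(string path)
    {
        if (string.IsNullOrEmpty(path))
            return false;

        // Mirrors the original check: the address must not begin with "xn--".
        if (path.StartsWith("xn--", StringComparison.Ordinal))
            return false;

        // The '"' character is not in the whitelist, so it is rejected here too.
        return Pattern.IsMatch(path);
    }
}
```

Input such as παράδειγμα.gr or 例え.jp passes this check, while input containing spaces or quotes does not.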
