How to retrieve the locale(country) code from URL?

I have a URL, which is like http://example.com/UK/Deal.aspx?id=322
My target is to remove the locale(country) part, to make it like http://example.com/Deal.aspx?id=322
Since the URL may have other similar formats like: https://ssl.example.com/JP/Deal.aspx?id=735, using "substring" function is not a good idea.
What I can think about is to use the following method for separating them, and map them back later.
HttpContext.Current.Request.Url.Scheme
HttpContext.Current.Request.Url.Host
HttpContext.Current.Request.Url.AbsolutePath
HttpContext.Current.Request.Url.Query
And, suppose HttpContext.Current.Request.Url.AbsolutePath will be:
/UK/Deal.aspx?id=322
I am not sure how to handle this, since my boss asked me not to use regular expressions (he thinks they will hurt performance).
Apart from regular expressions, is there any other way to remove UK from it?
p.s.: the UK part may be JP, DE, or other country code.
By the way, for USA, there is no country code, and the url will be http://example.com/Deal.aspx?id=322
Please also take this situation into consideration.
Thank you.

Assuming that you'll have a two-letter ISO country name (TwoLetterISORegionName) in the URL, you can use the UriBuilder class to remove that part of the path from the Uri.
E.g.
var sourceUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
if (IsLocaleEnabled(sourceUri))
{
    var builder = new UriBuilder(sourceUri);
    // Drop the leading "/UK" by skipping the length of the first segment ("UK/");
    // unlike Replace, this cannot touch a later "UK/" elsewhere in the path.
    builder.Path = builder.Path.Substring(sourceUri.Segments[1].Length);
    // Construct the Uri with the new path
    Uri newUri = builder.Uri;
}
Update:
// Cache the instance for performance benefits.
static readonly Regex regex = new Regex(@"^[a-zA-Z]{2}\/$", RegexOptions.Compiled);
/// <summary>
/// Checks whether the first URL segment after the root
/// is a two-letter ISO country code.
/// </summary>
private bool IsLocaleEnabled(Uri sourceUri)
{
    // Update: a compiled regex is much faster than a non-compiled one.
    // Segments[0] is "/", so the country code, if present, is Segments[1].
    return sourceUri.Segments.Length > 1 && regex.IsMatch(sourceUri.Segments[1]);
}
For best performance, cache the regex in a static readonly field, as done here. There is no need to parse a predefined regex on every request.
Result - http://example.com/Deal.aspx?id=322
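Since the question rules out regular expressions, the same check can also be sketched with plain character tests. This is only a minimal sketch under assumptions: the names LocaleUrl, StripLocale and IsCountrySegment are made up for illustration, and any two-letter first segment is treated as a country code, which may be wrong if the site has real two-letter folders.

```csharp
using System;

public static class LocaleUrl
{
    // Hypothetical helper: true when the segment looks like "UK/"
    // (two ASCII letters followed by a slash).
    private static bool IsCountrySegment(string segment)
    {
        return segment.Length == 3
            && segment[2] == '/'
            && IsAsciiLetter(segment[0])
            && IsAsciiLetter(segment[1]);
    }

    private static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns the URL without its leading country segment, or the
    // original Uri when no such segment is present (the USA case).
    public static Uri StripLocale(Uri source)
    {
        string[] segments = source.Segments;

        // Segments[0] is always "/", so the candidate is Segments[1].
        if (segments.Length < 2 || !IsCountrySegment(segments[1]))
        {
            return source;
        }

        var builder = new UriBuilder(source);
        // Skip "/UK" (same length as "UK/") and keep the rest of the path.
        builder.Path = source.AbsolutePath.Substring(segments[1].Length);
        return builder.Uri;
    }
}
```

Calling LocaleUrl.StripLocale(new Uri("http://example.com/UK/Deal.aspx?id=322")) should give http://example.com/Deal.aspx?id=322, while the USA form of the URL is returned unchanged.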

It all depends on whether the country code always has the same position. If it doesn't, more details on the possible formats are required. You could check whether the first segment has exactly two characters to be sure it really is a country code (not sure how reliable this is, though). Or you could start from the filename, if the URL is always in the format /[optionalCountryCode]/deal.aspx?...
How about these two approaches (on string level):
public string RemoveCountryCode()
{
Uri originalUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
string hostAndPort = originalUri.GetLeftPart(UriPartial.Authority);
// v1: if country code is always there, always has same position and always
// has format 'XX' this is definitely the easiest and fastest
string trimmedPathAndQuery = originalUri.PathAndQuery.Substring("/XX/".Length);
// v2: if country code is always there, always has same position but might
// not have a fixed format (e.g. XXX)
trimmedPathAndQuery = string.Join("/", originalUri.PathAndQuery.Split('/').Skip(2));
// in both cases you need to join it with the authority again
return string.Format("{0}/{1}", hostAndPort, trimmedPathAndQuery);
}
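Neither v1 nor v2 covers the case from the question where the country code is missing (the USA URL). One possible way around that, sketched here under assumptions, is to compare the first segment against a list of known codes. The class name CountryPath and the codes in KnownCodes are placeholders, not something given in the question.

```csharp
using System;
using System.Collections.Generic;

public static class CountryPath
{
    // Assumed list of codes the site actually uses; replace with the real set.
    private static readonly HashSet<string> KnownCodes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "UK", "JP", "DE" };

    public static string RemoveCountryCode(Uri original)
    {
        string authority = original.GetLeftPart(UriPartial.Authority);
        string pathAndQuery = original.PathAndQuery; // e.g. "/UK/Deal.aspx?id=322"

        // Find the slash that ends the first segment (skipping the leading one).
        int secondSlash = pathAndQuery.IndexOf('/', 1);
        if (secondSlash > 0 &&
            KnownCodes.Contains(pathAndQuery.Substring(1, secondSlash - 1)))
        {
            // Keep everything from the second slash on, e.g. "/Deal.aspx?id=322".
            pathAndQuery = pathAndQuery.Substring(secondSlash);
        }

        return authority + pathAndQuery;
    }
}
```

With this, a URL whose first segment is not in the list, such as http://example.com/Deal.aspx?id=322, comes back unchanged.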

If the AbsolutePath will always have the format /XX/...pagename.aspx?id=### where XX is the two letter country code, then you can just strip off the first 3 characters.
Example that removes the first 3 characters:
var targetURL = HttpContext.Current.Request.Url.AbsolutePath.Substring(3);
If the country code could be different lengths, then you could find the index of the second / character and start the substring from there.
var sourceURL = HttpContext.Current.Request.Url.AbsolutePath;
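var firstOccurrence = sourceURL.IndexOf('/');
// Start searching one character past the first slash; otherwise the same slash is found again.
var secondOccurrence = sourceURL.IndexOf('/', firstOccurrence + 1);
var targetURL = sourceURL.Substring(secondOccurrence);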

The easy way would be to treat as string, split it by the "/" separator, remove the fourth element, and then join them back with the "/" separator again:
string myURL = "https://ssl.example.com/JP/Deal.aspx?id=735";
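List<string> myURLsplit = myURL.Split('/').ToList();
// RemoveAt returns void, so it cannot be chained onto ToList(); index 3 is the "JP" element.
myURLsplit.RemoveAt(3);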
myURL = string.Join("/", myURLsplit);
RESULT: https://ssl.example.com/Deal.aspx?id=735

Related

How can I make a string out of a complex URL address

I've been trying to make this URL a workable string in C#, but unfortunately using extra "" or "@" is not cutting it. Even breaking it into smaller strings is proving difficult. I want to be able to convert the entire address into a single string.
this is the full address:
<https://my.address.com/BOE/OpenDocument/opendoc/openDocument.jsp?iDocID=ATTPCi6c.mZInSt5o3t_Xr8&sIDType=CUID&&sInstance=Last&lsMZV_MAT="+URLEncode(""+[Material].[Material - Key])+"&lsIZV_MAT=>
I've also tried this:
string url = @"https://my.address.com/BOE/OpenDocument/opendoc/openDocument.jsp?iDocID=ATTPCi6c.mZInSt5o3t_Xr8&sIDType=CUID&&sInstance=Last&lsMZV_MAT=";
string url2 = @"+ URLEncode("" +[Material].[Material - Key]) + """"";
string url3 = @"&lsIZV_MAT=";
Any help is appreciated.

The simplest solution is to put additional quotes inside the string literals and use string.Concat to join them into a single URL string:
string url = @"https://my.address.com/BOE/OpenDocument/opendoc/openDocument.jsp?iDocID=ATTPCi6c.mZInSt5o3t_Xr8&sIDType=CUID&&sInstance=Last&lsMZV_MAT=";
string url2 = @"""+URLEncode(""+[Material].[Material - Key])+""";
string url3 = @"&lsIZV_MAT=";
string resultUrl = string.Concat(url, url2, url3);
NB: You can use Equals method or == operator to check if the generated string matches with desired URL string.
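To see the quoting rule in isolation, here is a small sketch that writes the same text once as a verbatim literal and once as a regular literal. The class name VerbatimDemo and the shortened text (with x standing in for the [Material] expression) are assumptions made only for this illustration.

```csharp
using System;

public static class VerbatimDemo
{
    // In a verbatim literal (@"..."), one double quote is written as "".
    public static string Verbatim()
    {
        return @"lsMZV_MAT=""+URLEncode(x)+""&lsIZV_MAT=";
    }

    // The same text as a regular literal, where a double quote is written as \".
    public static string Escaped()
    {
        return "lsMZV_MAT=\"+URLEncode(x)+\"&lsIZV_MAT=";
    }
}
```

Both methods return the same characters, so VerbatimDemo.Verbatim() == VerbatimDemo.Escaped() is true, which is the kind of comparison the note above suggests.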

This may be more of a workaround than an actual solution, but if you load the string from a text file and run to a breakpoint after it, you should be able to see how the characters are stored, or just use the loaded value directly.
You may also find that some of the spaces you've added are left over, which StringName.Replace could solve if that's causing issues.
I'd recommend first checking what exactly is produced after the third statement and then letting us know, so we can compare the result with the original.

You are missing the triple quotes at the beginning of url2
string url = @"https://my.address.com/BOE/OpenDocument/opendoc/openDocument.jsp?iDocID=ATTPCi6c.mZInSt5o3t_Xr8&sIDType=CUID&&sInstance=Last&lsMZV_MAT=";
string url2 = @"""+URLEncode(""+[Material].[Material - Key])+""";
string url3 = @"&lsIZV_MAT=";

I just made two updates
t&lsMZV_MAT=" to t&lsMZV_MAT="" AND
[Material - Key])+" to [Material - Key])+""
string s = @"<https://my.address.com/BOE/OpenDocument/opendoc/openDocument.jsp?iDocID=ATTPCi6c.mZInSt5o3t_Xr8&sIDType=CUID&&sInstance=Last&lsMZV_MAT=""+ URLEncode([Material].[Material - Key])+""&lsIZV_MAT=>";
Console.Write(s);
Console.ReadKey();

Parse Line and Break it into Variables

I have a text file that contains only the full version number of an application, which I need to extract and then parse into separate variables.
For example, let's say version.cs contains 19.1.354.6
Code I'm using does not seem to be working:
char[] delimiter = { '.' };
string currentVersion = System.IO.File.ReadAllText(@"C:\Applicaion\version.cs");
string[] partsVersion;
partsVersion = currentVersion.Split(delimiter);
string majorVersion = partsVersion[0];
string minorVersion = partsVersion[1];
string buildVersion = partsVersion[2];
string revisVersion = partsVersion[3];

Your problem is most likely with the file, which probably contains other text besides the version. Still, why not use the Version class, which is made for exactly this kind of task?
var version = new Version("19.1.354.6");
var major = version.Major; // etc..
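If the file content is not guaranteed to be clean, Version.TryParse avoids an exception on bad input. The following is a minimal sketch; the class and method names (VersionText, TryRead) are made up for illustration, and the caller is assumed to pass in the text it got from File.ReadAllText.

```csharp
using System;

public static class VersionText
{
    // Returns true and the parsed version when the text is a valid
    // dotted version such as "19.1.354.6"; returns false otherwise.
    public static bool TryRead(string text, out Version version)
    {
        // Trim() drops a trailing newline or spaces that
        // File.ReadAllText would return as part of the content.
        return Version.TryParse((text ?? string.Empty).Trim(), out version);
    }
}
```

After a successful call, version.Major, version.Minor, version.Build and version.Revision hold the four parts as integers.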

What you have works fine with the correct input, so I would suggest making sure there is nothing else in the file you're reading.
In the future, please provide error information, since we can't usually tell exactly what you expect to happen, only what we know should happen.
In light of that, I would also suggest looking into using Regex for parsing in the future. In my opinion, it provides a much more flexible solution for your needs. Here's an example of regex to use:
var regex = new Regex(@"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)");
var match = regex.Match("19.1.354.6");
if (match.Success)
{
Console.WriteLine("Match[1]: "+match.Groups[1].Value);
Console.WriteLine("Match[2]: "+match.Groups[2].Value);
Console.WriteLine("Match[3]: "+match.Groups[3].Value);
Console.WriteLine("Match[4]: "+match.Groups[4].Value);
}
else
{
Console.WriteLine("No match found");
}
which outputs the following:
// Match[1]: 19
// Match[2]: 1
// Match[3]: 354
// Match[4]: 6

Regex matching dynamic words within an html string

I have an html string to work with as follows:
string html = new MvcHtmlString(item.html.ToString()).ToHtmlString();
There are two different types of text I need to match although very similar. I need the initial ^^ removed and the closing |^^ removed. Then if there are multiple clients I need the ^ separating clients changed to a comma(,).
^^Client One- This text is pretty meaningless for this task, but it will exist in the real document.|^^
^^Client One^Client Two^Client Three- This text is pretty meaningless for this task, but it will exist in the real document.|^^
I need to be able to match each client and make it bold.
Client One- This text is pretty meaningless for this task, but it will exist in the real document.
Client One, Client Two, Client Three- This text is pretty meaningless for this task, but it will exist in the real document.
A kind Stack Overflow user provided the following, but I could not get it to work or find any matches when I tested it on an online regex tester.
const string pattern = @"\^\^(?<clients>[^-]+)(?<text>-.*)\|\^\^";
var result = Regex.Replace(html, pattern,
m =>
{
var clientlist = m.Groups["clients"].Value;
var newClients = string.Join(",", clientlist.Split('^').Select(s => string.Format("<strong>{0}</strong>", s)));
return newClients + m.Groups["text"];
});
I am very new to regex so any help is appreciated.

I'm new to C# so forgive me if I make rookie mistakes :)
const string pattern = @"\^\^([^-]+)(-[^|]+)\|\^\^";
var temp = Regex.Replace(html, pattern, "<strong>$1</strong>$2");
var result = Regex.Replace(temp, @"\^", "</strong>, <strong>");
I'm using $1 even though MSDN is vague about using that syntax to reference subgroups.
Edit: if it's possible that the text after - contains a ^ you can do this:
var result = Regex.Replace(temp, @"\^(?=.*-)", "</strong>, <strong>");
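The two steps can also be combined into one pass with a match evaluator, along the lines of the code in the question. This is a sketch under assumptions: the class name ClientMarkup is invented here, client names are assumed to contain no hyphen, and the descriptive text is assumed to contain no | character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ClientMarkup
{
    // ^^ + client list (up to the first '-') + text (up to '|') + |^^
    private static readonly Regex Pattern =
        new Regex(@"\^\^(?<clients>[^-]+)(?<text>-[^|]*)\|\^\^");

    public static string Format(string html)
    {
        return Pattern.Replace(html, m =>
        {
            // Split the client list on '^' and wrap each name in <strong>.
            var clients = m.Groups["clients"].Value
                .Split('^')
                .Select(c => "<strong>" + c + "</strong>");

            // Join the names with ", " and append the untouched text.
            return string.Join(", ", clients) + m.Groups["text"].Value;
        });
    }
}
```

For the input ^^Client One^Client Two- Some text.|^^ this returns <strong>Client One</strong>, <strong>Client Two</strong>- Some text.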

Taking params from a url

Take these two URLs:
www.mySite.com?name=ssride360
www.mySite.com/ssride360
I know that to get the name param from url 1 you would do:
string name = Request.Params["name"];
But how would I get that for the second url?
I was thinking about attempting to copy the url and remove the known information (www.mySite.com) and then from there I could set name to the remainder.
How would I do a url copy like that? Is there a better way to get 'ssride360' from the second url?
Edit Looking on SO I found some info on copying URLs
string url = HttpContext.Current.Request.Url.AbsoluteUri;
// http://localhost:1302/TESTERS/Default6.aspx
string path = HttpContext.Current.Request.Url.AbsolutePath;
// /TESTERS/Default6.aspx
Is this the best way for me? Each URL may have one additional param (mySite.com/ssride360?site=SO), for example. Also, I know that mySite.com/ssride360 would reference a folder in my project, so wouldn't I be getting that file along with it (mySite.com/ssride360/Default6.aspx)?
At this point I think there are better ways than a URL copy.
Suggestions?

Uri x = new Uri("http://www.mySite.com/ssride360");
Console.WriteLine (x.AbsolutePath);
prints /ssride360
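If only the name is wanted, without the leading slash and without anything that follows it, Uri.Segments can be used. A minimal sketch, assuming the URL is absolute (it includes http:// or similar; a bare www.mySite.com/ssride360 would make the Uri constructor throw) and using the made-up names PathName and FirstSegment:

```csharp
using System;

public static class PathName
{
    // Returns the first path segment ("ssride360") or an empty
    // string when the URL has no path beyond the root.
    public static string FirstSegment(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);

        // Segments[0] is always "/"; the name, if any, is Segments[1].
        if (uri.Segments.Length < 2)
        {
            return string.Empty;
        }

        // A folder segment ends with '/', e.g. "ssride360/".
        return uri.Segments[1].TrimEnd('/');
    }
}
```

The query string is not part of Segments, so http://www.mySite.com/ssride360?site=SO and http://www.mySite.com/ssride360/Default6.aspx both give ssride360.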

This method will allow you to get the name even if there is something after it. It is also a good model to use if you plan on putting other stuff after the name and want to get those values.
char [] delim = new char[] {'/'};
string url = "www.mySite.com/ssride360";
string name = url.Split(delim)[1];
Then if you had a URL that included an ID after the name you could do:
char [] delim = new char[] {'/'};
string url = "www.mySite.com/ssride360/abc1234";
string name = url.Split(delim)[1];
string id = url.Split(delim)[2];

URL rewriting is a common solution for this problem. You give it the patterns of the URLs you want to match and what each one should be changed into. So it would detect www.mySite.com/ssride360 and transform it into www.mySite.com?name=ssride360. The user of the website sees the original URL and doesn't know anything changed, but your code sees the transformed URL, so you can access the variables in the normal way. Another big plus is that the rules let you choose which patterns get transformed and which are passed through to actual folders and files.
http://learn.iis.net/page.aspx/461/creating-rewrite-rules-for-the-url-rewrite-module/

Like javascript? If so...
<script type="text/javascript">
function getName() {
var urlParts = window.location.pathname.split('/'); //split the URL.
return urlParts[1]; //get the value to the right of the '/'.
}
</script>

How to remove PROTOCOL from URI

how can I remove the protocol from URI? i.e. remove HTTP

You can use the System.Uri class like this:
System.Uri uri = new Uri("http://stackoverflow.com/search?q=something");
string uriWithoutScheme = uri.Host + uri.PathAndQuery + uri.Fragment;
This will give you stackoverflow.com/search?q=something
Edit: this also works for about:blank :-)

The best (and to me most beautiful) way is to use the Uri class for parsing the string to an absolute URI and then use the GetComponents method with the correct UriComponents enumeration to remove the scheme:
Uri uri;
if (Uri.TryCreate("http://stackoverflow.com/...", UriKind.Absolute, out uri))
{
return uri.GetComponents(UriComponents.AbsoluteUri &~ UriComponents.Scheme, UriFormat.UriEscaped);
}
For further reference: the UriComponents enumeration is decorated with the FlagsAttribute, so bitwise operations (e.g. & and |) can be used on it. In this case the &~ removes the bits for UriComponents.Scheme from UriComponents.AbsoluteUri using the AND operator in combination with the bitwise complement operator.
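Wrapped into a complete method, the snippet above might look like the following sketch. The names SchemeRemoval and WithoutScheme are made up, and returning the input unchanged when it is not an absolute URI is a design choice for this example, not something the answer prescribes.

```csharp
using System;

public static class SchemeRemoval
{
    public static string WithoutScheme(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            // Not an absolute URI (e.g. "www.example.com"): hand it back as-is.
            return url;
        }

        // AbsoluteUri is a combination of flags; clearing the Scheme bit
        // asks for every component except the scheme.
        return uri.GetComponents(
            UriComponents.AbsoluteUri & ~UriComponents.Scheme,
            UriFormat.UriEscaped);
    }
}
```

For http://stackoverflow.com/search?q=something this should return stackoverflow.com/search?q=something.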

In the general sense (not limiting to http/https), an (absolute) uri is always a scheme followed by a colon, followed by scheme-specific data. So the only safe thing to do is cut at the scheme:
string s = "http://stackoverflow.com/questions/4517240/";
int i = s.IndexOf(':');
if (i > 0) s = s.Substring(i + 1);
In the case of http and a few others you may also want to .TrimStart('/'), but this is not part of the scheme, and is not guaranteed to exist. Trivial example: about:blank.

You could use the RegEx for this. The below sample would meet your need.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt="http://www.google.com";
string re1="((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))"; // HTTP URL 1
Regex r = new Regex(re1,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String httpurl1=m.Groups[1].ToString();
Console.Write("("+httpurl1.ToString()+")"+"\n");
}
Console.ReadLine();
}
}
}
Let me know if this helps

It's not the most beautiful way, but try something like this:
var uri = new Uri("http://www.example.com");
var scheme = uri.Scheme;
// + 3 skips the "://" that follows the scheme
var result = uri.ToString().Substring(scheme.Length + 3);

The above answers work in most cases, but IMO it's not a complete solution:
uri.Host + uri.PathAndQuery + uri.Fragment;
drops port if specified (e.g. http://www.example.com:8080/path/ becomes www.example.com/path/ )
uri.GetComponents(UriComponents.AbsoluteUri & ~UriComponents.Scheme, UriFormat.UriEscaped)
preserves ports and seems generally better, but in some cases (most likely with incorrect input, but not impossible) I got some characters escaped that shouldn't have been.
In both cases we get '/' added at the end, so if your URL is potentially sensitive to that difference, or you care how it looks, you need to check whether it was present before and, if not, TrimEnd it.
On top of that, both solutions throw an exception if the Uri is considered invalid, so if your URL doesn't already have the scheme (e.g. www.example.com), the code above fails.
If you want something really generic and working for input over which you might not have control (e.g. user input), I'd probably stick to a simpler solution, e.g:
var endOfSchemaIdx = url.IndexOf("://");
if(endOfSchemaIdx != -1)
return url.Substring(endOfSchemaIdx+3);
return url;
You can also fetch the schema via a library like FLURL (doesn't throw exception on www.example.com) and look up the first occurrence of "url.Schema" + "://", then delete it if exists. I feel safer if the rest of the url is not processed by any library, unless that is your intention.
