C# Regex.Replace - c#

This is my set of strings inside richtextbox1:
/Category/5
/Category/4
/Category/19
/Category/22
/Category/26
/Category/27
/Category/24
/Category/3
/Category/1
/Category/15
http://example.org/Category/15/noneedtoadd
I want to replace every leading "/" with a URL like "http://example.com/".
output:
http://example.com/Category/5
http://example.com/Category/4
http://example.com/Category/19
http://example.com/Category/22
http://example.com/Category/26
http://example.com/Category/27
http://example.com/Category/24
http://example.com/Category/3
http://example.com/Category/1
http://example.com/Category/15
http://example.org/Category/15/noneedtoadd
just asking, what is the pattern for that? :)

You don't need a regular expression here. Iterate through the items in your list and use String.Format to build the desired URL.
String.Format(@"http://example.com{0}", str);
If you want to check whether one of the items in that textbox is already a fully-formed URL before prepending the string, use String.StartsWith.
if (!str.StartsWith("http://")) {
// use String.Format
}
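Putting the two pieces together, a minimal sketch of the loop might look like the following. The class and method names (UrlPrefixer, PrefixRelative) are hypothetical, and the sketch assumes the base URL is passed without a trailing slash and that "https://" lines should also be left alone.

```csharp
using System;
using System.Collections.Generic;

public static class UrlPrefixer
{
    // Prepends baseUrl to every line that is not already an absolute http(s) URL.
    // Assumption: baseUrl has no trailing slash, e.g. "http://example.com".
    public static List<string> PrefixRelative(IEnumerable<string> lines, string baseUrl)
    {
        var result = new List<string>();
        foreach (string line in lines)
        {
            bool isAbsolute =
                line.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                line.StartsWith("https://", StringComparison.OrdinalIgnoreCase);

            // Leave fully-formed URLs alone; prefix everything else.
            result.Add(isAbsolute ? line : string.Format("{0}{1}", baseUrl, line));
        }
        return result;
    }
}
```

With a WinForms RichTextBox, the input could come from its Lines property, for example UrlPrefixer.PrefixRelative(richTextBox1.Lines, "http://example.com").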

Since you're dealing with URIs, you can take advantage of the Uri Class which can resolve relative URIs:
Uri baseUri = new Uri("http://example.com/");
Uri result1 = new Uri(baseUri, "/Category/5");
// result1 == {http://example.com/Category/5}
Uri result2 = new Uri(baseUri, "http://example.org/Category/15/noneedtoadd");
// result2 == {http://example.org/Category/15/noneedtoadd}

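The raw regex pattern is ^/, which matches a slash at the beginning of a line. Because the text contains several lines, pass RegexOptions.Multiline; without it, ^ matches only at the very start of the string.
Regex.Replace(text, @"^/", "http://example.com/", RegexOptions.Multiline)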

Related

How to retrieve the locale(country) code from URL?

I have a URL, which is like http://example.com/UK/Deal.aspx?id=322
My target is to remove the locale(country) part, to make it like http://example.com/Deal.aspx?id=322
Since the URL may have other similar formats like: https://ssl.example.com/JP/Deal.aspx?id=735, using "substring" function is not a good idea.
What I can think about is to use the following method for separating them, and map them back later.
HttpContext.Current.Request.Url.Scheme
HttpContext.Current.Request.Url.Host
HttpContext.Current.Request.Url.AbsolutePath
HttpContext.Current.Request.Url.Query
And, suppose HttpContext.Current.Request.Url.AbsolutePath will be:
/UK/Deal.aspx?id=322
I am not sure how to deal with this, since my boss asked me not to use regular expressions (he thinks they will impact performance...).
Apart from regular expressions, is there any other way to remove UK from it?
p.s.: the UK part may be JP, DE, or other country code.
By the way, for USA, there is no country code, and the url will be http://example.com/Deal.aspx?id=322
Please also take this situation into consideration.
Thank you.
Assuming the URL contains a two-letter ISO country code, you can use the UriBuilder class to remove it from the path.
E.g.
var sourceUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
if (IsLocaleEnabled(sourceUri))
{
var builder = new UriBuilder(sourceUri);
// Segments[1] is "UK/"; removing it turns "/UK/Deal.aspx" into "/Deal.aspx"
builder.Path = builder.Path.Replace(sourceUri.Segments[1], string.Empty);
// Construct the Uri with new path
Uri newUri = builder.Uri;
}
Update:
// Cache the instance for performance benefits.
static readonly Regex regex = new Regex(@"^[a-zA-Z]{2}/$", RegexOptions.Compiled);
/// <summary>
/// Checks whether the first URL segment after the root
/// is a two-letter ISO code
/// </summary>
private bool IsLocaleEnabled(Uri sourceUri)
{
// Segments[0] is "/"; guard against URLs that have no further segment.
return sourceUri.Segments.Length > 1 && regex.IsMatch(sourceUri.Segments[1]);
}
For performance, cache the Regex instance in a static readonly field. There's no need to parse a fixed pattern on every request, and a compiled regex generally matches faster than a non-compiled one.
Result - http://example.com/Deal.aspx?id=322
It all depends on whether the country code always has the same position. If it doesn't, more details on the possible formats are required. Maybe you could check whether the first segment has two characters, to be sure it really is a country code (not sure if this is reliable, though). Or you could start from the filename, if the URL is always in the format /[optionalCountryCode]/deal.aspx?...
How about these two approaches (on string level):
public string RemoveCountryCode()
{
Uri originalUri = new Uri("http://example.com/UK/Deal.aspx?id=322");
string hostAndPort = originalUri.GetLeftPart(UriPartial.Authority);
// v1: if country code is always there, always has same position and always
// has format 'XX' this is definitely the easiest and fastest
string trimmedPathAndQuery = originalUri.PathAndQuery.Substring("/XX/".Length);
// v2: if country code is always there, always has same position but might
// not have a fixed format (e.g. XXX)
trimmedPathAndQuery = string.Join("/", originalUri.PathAndQuery.Split('/').Skip(2));
// in both cases you need to join it with the authority again
return string.Format("{0}/{1}", hostAndPort, trimmedPathAndQuery);
}
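The check suggested above (is the first segment exactly two characters?) can also be done without any regex. The following is only a sketch: the names LocaleStripper and RemoveCountryCode are hypothetical, and it assumes that any first path segment made of exactly two letters is a country code, which would misfire on a real two-letter directory.

```csharp
using System;

public static class LocaleStripper
{
    // Returns the URL without a leading two-letter segment such as "/UK/".
    // URLs without such a segment (e.g. the USA case) are returned unchanged.
    public static Uri RemoveCountryCode(Uri source)
    {
        string[] segments = source.Segments; // e.g. "/", "UK/", "Deal.aspx"

        // Assumption: a country code is exactly two letters followed by '/'.
        bool hasCode = segments.Length > 1
            && segments[1].Length == 3
            && char.IsLetter(segments[1][0])
            && char.IsLetter(segments[1][1])
            && segments[1][2] == '/';

        if (!hasCode) return source;

        var builder = new UriBuilder(source);
        // Drop "/XX" and keep the rest of the path, starting at the next '/'.
        builder.Path = source.AbsolutePath.Substring(3);
        return builder.Uri;
    }
}
```

Because UriBuilder is constructed from the original Uri, the scheme, host and query string are carried over, so the same method works for both the http://example.com and https://ssl.example.com forms.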
If the AbsolutePath will always have the format /XX/...pagename.aspx?id=### where XX is the two letter country code, then you can just strip off the first 3 characters.
Example that removes the first 3 characters:
var targetURL = HttpContext.Current.Request.Url.AbsolutePath.Substring(3);
If the country code could be different lengths, then you could find the index of the second / character and start the substring from there.
var sourceURL = HttpContext.Current.Request.Url.AbsolutePath;
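var firstOccurrence = sourceURL.IndexOf('/');
// Start searching one character past the first slash; otherwise the same slash is found again.
var secondOccurrence = sourceURL.IndexOf('/', firstOccurrence + 1);
var targetURL = sourceURL.Substring(secondOccurrence);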
The easy way would be to treat as string, split it by the "/" separator, remove the fourth element, and then join them back with the "/" separator again:
string myURL = "https://ssl.example.com/JP/Deal.aspx?id=735";
List<string> myURLsplit = myURL.Split('/').ToList();
// RemoveAt returns void, so it has to be called as a separate statement.
myURLsplit.RemoveAt(3);
myURL = string.Join("/", myURLsplit);
RESULT: https://ssl.example.com/Deal.aspx?id=735

Fetching a segment of the URL in the address bar

This is what I tried:
string myURL= "http://mysite.com/articles/healthrelated";
String idStr = myURL.Substring(myURL.LastIndexOf('/') + 1);
I need to fetch "healthrelated", i.e. the text after the last slash in the URL. The problem is that my URL can also look like this:
"http://mysite.com/articles/healthrelated/"
i.e. with a slash at the end of that text too. Now the last slash is the one AFTER "healthrelated", so the result I get using
String idStr = myURL.Substring(myURL.LastIndexOf('/') + 1);
is an empty string.
What should my code look like so that I always get the text "healthrelated", whether or not there is a slash at the end?
Try this.
var lastSegment = url
.Split(new string[]{"/"}, StringSplitOptions.RemoveEmptyEntries)
.ToList()
.Last();
Why don't you use Uri class of .NET and use segments property:
http://msdn.microsoft.com/en-us/library/system.uri.segments.aspx
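As a rough illustration of the Segments suggestion, here is a small sketch. The names UrlSegments and LastSegment are hypothetical, and it assumes the input is always an absolute URL (new Uri throws otherwise). Each segment except possibly the last keeps its trailing slash, so the slash is trimmed off.

```csharp
using System;

public static class UrlSegments
{
    // Returns the last path segment, ignoring a trailing slash.
    public static string LastSegment(string url)
    {
        var uri = new Uri(url);
        string[] segments = uri.Segments; // e.g. "/", "articles/", "healthrelated/"
        return segments[segments.Length - 1].TrimEnd('/');
    }
}
```

Note that segments are returned in their escaped form, so a value containing percent-encoded characters would still need Uri.UnescapeDataString.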
What you can do in this situation is either use a regex (which I'm not an expert on, but I'm sure other people here are ;) ) or a simple:
string[] urlParts = myURL.Split('/');
and take the last non-empty string in this array (with a trailing slash, the very last element is an empty string).

Need help about Regular Expression Syntax

I am trying to extract only the part after "j&q" from this link:
(http://www.google.com/aclk?sa=lai=CEvAD5thCTfHPCIq5gwe2lOWKD6n_uOIB4bzDkxm8uIhRCAAQASDrxZ0GKANQgI6s1ANgybblirSk2A-gAYem9NwDyAEBqQLN5n97JxulPqoEGk_QITE_eyPbZTKIyNFl8dQhptl05oxQ2fHjgAWQTg&sig=AGiWqtwLGY6f1Gnci0e0ojoRsLBxr9joLg&adurl=http://www.mediterraholidays.com/egypt/cairo-and-nile-cruise&rct=j&q=egpyt%20package%20trips).
I used ^.*q=.*$, but that does not give me what I want. I only need the part after j&q, if it is present.
Why don't you use System.Uri class for this:
Uri url = new Uri("http://www.google.com/aclk?sa=lai=CEvAD5thCTfHPCIq5gwe2lOWKD6n_uOIB4bzDkxm8uIhRCAAQASDrxZ0GKANQgI6s1ANgybblirSk2A-gAYem9NwDyAEBqQLN5n97JxulPqoEGk_QITE_eyPbZTKIyNFl8dQhptl05oxQ2fHjgAWQTg&sig=AGiWqtwLGY6f1Gnci0e0ojoRsLBxr9joLg&adurl=http://www.mediterraholidays.com/egypt/cairo-and-nile-cruise&rct=j&q=egpyt%20package%20trips");
var queryString = HttpUtility.ParseQueryString(url.Query);
var q = queryString["q"];
The q variable holds the value: egpyt package trips
&q=(?<data>[^&]*)
The answer needs to be at least 30 chars, so I add some joke:
“Knock, knock.”
“Who’s there?”
very long pause….
“Java.”
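For completeness, the pattern &q=(?<data>[^&]*) from this answer could be applied roughly as follows. The names QueryExtractor and ExtractQ are hypothetical. The sketch returns the raw, still percent-encoded value, and because the pattern requires a leading "&", it would not match a q parameter that comes first in the query string.

```csharp
using System.Text.RegularExpressions;

public static class QueryExtractor
{
    // Returns the raw (still percent-encoded) value of the q parameter,
    // or null when the URL contains no "&q=" part.
    public static string ExtractQ(string url)
    {
        Match m = Regex.Match(url, @"&q=(?<data>[^&]*)");
        return m.Success ? m.Groups["data"].Value : null;
    }
}
```

Passing the result through Uri.UnescapeDataString turns "egpyt%20package%20trips" into "egpyt package trips".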

trim url string. c#

How can I trim a YouTube URL so that it only returns the video ID? For example, for http://www.youtube.com/watch?v=VPqTW-9U9nU, how would I return VPqTW-9U9nU? This has to work for several URLs that are entered. I would like to use regex, but I do not understand it at all, so if somebody has a regex solution, could you explain it in a bit more detail? :)
Without doing any string manipulation you can use Uri and ParseQueryString
Uri uri = new Uri("http://www.youtube.com/watch?v=VPqTW-9U9nU");
var s = HttpUtility.ParseQueryString(uri.Query).Get("v");
No RegEx needed in this case:
string url = "http://www.youtube.com/watch?v=VPqTW-9U9nU";
string videoId = url.Substring(url.IndexOf("?v=") + 3);
Why not just stick with something simple?
string youTubeUrl = "http://www.youtube.com/watch?v=VPqTW-9U9nU";
string id = youTubeUrl.Replace("http://www.youtube.com/watch?v=", String.Empty);
Regular expressions are handy, but sometimes overkill and can make your code harder to understand when you use them in places you don't need them.
Try something like this:
string url = "http://www.youtube.com/watch?v=VPqTW-9U9nU";
string video_id = url.Substring(url.LastIndexOf('=') + 1);
The other answers look right, too.
You could also use String.Split():
url.Split(new[] { '=' }, 2)[1]

How to remove PROTOCOL from URI

how can I remove the protocol from URI? i.e. remove HTTP
You can use the System.Uri class like this:
System.Uri uri = new Uri("http://stackoverflow.com/search?q=something");
string uriWithoutScheme = uri.Host + uri.PathAndQuery + uri.Fragment;
This will give you stackoverflow.com/search?q=something
Edit: this also works for about:blank :-)
The best (and to me most beautiful) way is to use the Uri class for parsing the string to an absolute URI and then use the GetComponents method with the correct UriComponents enumeration to remove the scheme:
Uri uri;
if (Uri.TryCreate("http://stackoverflow.com/...", UriKind.Absolute, out uri))
{
return uri.GetComponents(UriComponents.AbsoluteUri &~ UriComponents.Scheme, UriFormat.UriEscaped);
}
For further reference: the UriComponents enumeration is decorated with the FlagsAttribute, so bitwise operations (e.g. & and |) can be used on it. In this case, &~ removes the bits for UriComponents.Scheme from UriComponents.AbsoluteUri by combining the AND operator with the bitwise complement operator.
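To make the flag arithmetic concrete, here is a small sketch. The names SchemeRemover, WithoutScheme and RemoveScheme are hypothetical; the sketch only wraps the GetComponents call from this answer and exposes the combined flag value so it can be inspected with HasFlag.

```csharp
using System;

public static class SchemeRemover
{
    // Every component of an absolute URI except the scheme:
    // AND-ing with the complement clears only the Scheme bit.
    public const UriComponents WithoutScheme =
        UriComponents.AbsoluteUri & ~UriComponents.Scheme;

    public static string RemoveScheme(Uri uri)
    {
        return uri.GetComponents(WithoutScheme, UriFormat.UriEscaped);
    }
}
```

SchemeRemover.WithoutScheme.HasFlag(UriComponents.Scheme) is false, while HasFlag(UriComponents.Host) and the other components remain true.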
In the general sense (not limiting to http/https), an (absolute) uri is always a scheme followed by a colon, followed by scheme-specific data. So the only safe thing to do is cut at the scheme:
string s = "http://stackoverflow.com/questions/4517240/";
int i = s.IndexOf(':');
if (i > 0) s = s.Substring(i + 1);
In the case of http and a few others you may also want to .TrimStart('/'), but this is not part of the scheme, and is not guaranteed to exist. Trivial example: about:blank.
You could use the RegEx for this. The below sample would meet your need.
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string txt="http://www.google.com";
string re1="((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))"; // HTTP URL 1
Regex r = new Regex(re1,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String httpurl1=m.Groups[1].ToString();
Console.Write("("+httpurl1.ToString()+")"+"\n");
}
Console.ReadLine();
}
}
}
Let me know if this helps
It's not the most beautiful way, but try something like this:
var uri = new Uri("http://www.example.com");
var scheme = uri.Scheme;
var result = uri.ToString().Substring(scheme.Length + 3);
The above answers work in most cases, but IMO it's not a complete solution:
uri.Host + uri.PathAndQuery + uri.Fragment;
drops port if specified (e.g. http://www.example.com:8080/path/ becomes www.example.com/path/ )
uri.GetComponents(UriComponents.AbsoluteUri & ~UriComponents.Scheme, UriFormat.UriEscaped)
preserves ports and seems generally better, but in some cases (which are most likely incorrect input, but not impossible) I got some characters escaped that shouldn't have been.
In both cases a '/' is added at the end, so if your URL is potentially sensitive to that difference, or you care how it looks, you need to check whether it was present before and, if not, TrimEnd it.
On top of that, both of those solutions throw an exception if the Uri is considered invalid, so if your URL has no scheme to begin with (e.g. www.example.com), the code above fails.
If you want something really generic that works for input over which you might not have control (e.g. user input), I'd probably stick to a simpler solution, e.g.:
var endOfSchemaIdx = url.IndexOf("://");
if(endOfSchemaIdx != -1)
return url.Substring(endOfSchemaIdx+3);
return url;
You can also fetch the scheme via a library like Flurl (which doesn't throw an exception on www.example.com), look up the first occurrence of the scheme followed by "://", and delete it if it exists. I feel safer if the rest of the URL is not processed by any library, unless that is your intention.
