C# code to linkify urls in a string - c#

Does anyone have any good c# code (and regular expressions) that will parse a string and "linkify" any urls that may be in the string?

It's a pretty simple task you can acheive it with Regex and a ready-to-go regular expression from:
http://regexlib.com/
Something like:
var html = Regex.Replace(html, #"^(http|https|ftp)\://[a-zA-Z0-9\-\.]+" +
"\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?" +
"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$",
"$1");
You may also be interested not only in creating links but in shortening URLs. Here is a good article on this subject:
Resolve and shorten URLs in C#
See also:
Regular Expression Workbench at MSDN
Converting a URL into a Link in C# Using Regular Expressions
Regex to find URL within text and make them as link
Regex.Replace Method at MSDN
The Problem With URLs by Jeff Atwood
Parsing URLs with Regular Expressions and the Regex Object
Format URLs in string to HTML Links in C#
Automatically hyperlink URL and Email in ASP.NET Pages with C#

well, after a lot of research on this, and several attempts to fix times when
people enter in http://www.sitename.com and www.sitename.com in the same post
fixes to parenthisis like (http://www.sitename.com) and http://msdn.microsoft.com/en-us/library/aa752574(vs.85).aspx
long urls like: http://www.amazon.com/gp/product/b000ads62g/ref=s9_simz_gw_s3_p74_t1?pf_rd_m=atvpdkikx0der&pf_rd_s=center-2&pf_rd_r=04eezfszazqzs8xfm9yd&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846
we are now using this HtmlHelper extension... thought I would share and get any comments:
private static Regex regExHttpLinks = new Regex(#"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&##/%?=~_()|!:,.;]*[-A-Za-z0-9+&##/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&##/%?=~_()|!:,.;]*[-A-Za-z0-9+&##/%=~_()|](?=\k<wrap>)|\b(https?://|www\.)[-A-Za-z0-9+&##/%?=~_()|!:,.;]*[-A-Za-z0-9+&##/%=~_()|]", RegexOptions.Compiled | RegexOptions.IgnoreCase);
public static string Format(this HtmlHelper htmlHelper, string html)
{
if (string.IsNullOrEmpty(html))
{
return html;
}
html = htmlHelper.Encode(html);
html = html.Replace(Environment.NewLine, "<br />");
// replace periods on numeric values that appear to be valid domain names
var periodReplacement = "[[[replace:period]]]";
html = Regex.Replace(html, #"(?<=\d)\.(?=\d)", periodReplacement);
// create links for matches
var linkMatches = regExHttpLinks.Matches(html);
for (int i = 0; i < linkMatches.Count; i++)
{
var temp = linkMatches[i].ToString();
if (!temp.Contains("://"))
{
temp = "http://" + temp;
}
html = html.Replace(linkMatches[i].ToString(), String.Format("{1}", temp.Replace(".", periodReplacement).ToLower(), linkMatches[i].ToString().Replace(".", periodReplacement)));
}
// Clear out period replacement
html = html.Replace(periodReplacement, ".");
return html;
}

protected string Linkify( string SearchText ) {
// this will find links like:
// http://www.mysite.com
// as well as any links with other characters directly in front of it like:
// href="http://www.mysite.com"
// you can then use your own logic to determine which links to linkify
Regex regx = new Regex( #"\b(((\S+)?)(#|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b", RegexOptions.IgnoreCase );
SearchText = SearchText.Replace( " ", " " );
MatchCollection matches = regx.Matches( SearchText );
foreach ( Match match in matches ) {
if ( match.Value.StartsWith( "http" ) ) { // if it starts with anything else then dont linkify -- may already be linked!
SearchText = SearchText.Replace( match.Value, "<a href='" + match.Value + "'>" + match.Value + "</a>" );
}
}
return SearchText;
}

It's not that easy as you can read in this blog post by Jeff Atwood. It's especially hard to detect where an URL ends.
For example, is the trailing parenthesis part of the URL or not:
http​://en.wikipedia.org/wiki/PCTools(CentralPointSoftware)
an URL in parentheses (http​://en.wikipedia.org) more text
In the first case, the parentheses are part of the URL. In the second case they are not!

Have found following regular expression
http://daringfireball.net/2010/07/improved_regex_for_matching_urls
for me looks very good. Jeff Atwood solution doesn't handle many cases. josefresno seem to me handle all cases. But when I have tried to understand it (in case of any support requests) my brain was boiled.

There is class:
public class TextLink
{
#region Properties
public const string BeginPattern = "((http|https)://)?(www.)?";
public const string MiddlePattern = #"([a-z0-9\-]*\.)+[a-z]+(:[0-9]+)?";
public const string EndPattern = #"(/\S*)?";
public static string Pattern { get { return BeginPattern + MiddlePattern + EndPattern; } }
public static string ExactPattern { get { return string.Format("^{0}$", Pattern); } }
public string OriginalInput { get; private set; }
public bool Valid { get; private set; }
private bool _isHttps;
private string _readyLink;
#endregion
#region Constructor
public TextLink(string input)
{
this.OriginalInput = input;
var text = Regex.Replace(input, #"(^\s)|(\s$)", "", RegexOptions.IgnoreCase);
Valid = Regex.IsMatch(text, ExactPattern);
if (Valid)
{
_isHttps = Regex.IsMatch(text, "^https:", RegexOptions.IgnoreCase);
// clear begin:
_readyLink = Regex.Replace(text, BeginPattern, "", RegexOptions.IgnoreCase);
// HTTPS
if (_isHttps)
{
_readyLink = "https://www." + _readyLink;
}
// Default
else
{
_readyLink = "http://www." + _readyLink;
}
}
}
#endregion
#region Methods
public override string ToString()
{
return _readyLink;
}
#endregion
}
Use it in this method:
public static string ReplaceUrls(string input)
{
var result = Regex.Replace(input.ToSafeString(), TextLink.Pattern, match =>
{
var textLink = new TextLink(match.Value);
return textLink.Valid ?
string.Format("{1}", textLink, textLink.OriginalInput) :
textLink.OriginalInput;
});
return result;
}
Test cases:
[TestMethod]
public void RegexUtil_TextLink_Parsing()
{
Assert.IsTrue(new TextLink("smthing.com").Valid);
Assert.IsTrue(new TextLink("www.smthing.com/").Valid);
Assert.IsTrue(new TextLink("http://smthing.com").Valid);
Assert.IsTrue(new TextLink("http://www.smthing.com").Valid);
Assert.IsTrue(new TextLink("http://www.smthing.com/").Valid);
Assert.IsTrue(new TextLink("http://www.smthing.com/publisher").Valid);
// port
Assert.IsTrue(new TextLink("http://www.smthing.com:80").Valid);
Assert.IsTrue(new TextLink("http://www.smthing.com:80/").Valid);
// https
Assert.IsTrue(new TextLink("https://smthing.com").Valid);
Assert.IsFalse(new TextLink("").Valid);
Assert.IsFalse(new TextLink("smthing.com.").Valid);
Assert.IsFalse(new TextLink("smthing.com-").Valid);
}
[TestMethod]
public void RegexUtil_TextLink_ToString()
{
// default
Assert.AreEqual("http://www.smthing.com", new TextLink("smthing.com").ToString());
Assert.AreEqual("http://www.smthing.com", new TextLink("http://www.smthing.com").ToString());
Assert.AreEqual("http://www.smthing.com/", new TextLink("smthing.com/").ToString());
Assert.AreEqual("https://www.smthing.com", new TextLink("https://www.smthing.com").ToString());
}

This works for me:
str = Regex.Replace(str,
#"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?)",
"<a target='_blank' href='$1'>$1</a>");

Related

Encode only accented characters in an HTML string

I have the following function that accepts an HTML string, for example "<p>áêö</p>":
public string EncodeString(string input)
{
// ...
return System.Net.WebUtility.HtmlEncode(input);
}
I'd like to modify that function to output the same string, but with the accented characters as HTML entities. Using System.Net.WebUtility.HtmlEncode() encodes the entire string, including the HTML tags. I'd like to preserve the HTML tags if possible, since the string is parsed and rendered elsewhere in the application. Is this something that is better solved with a regex?
This is quite possibly the oddest solution, but...
public static string EncodeString(string input)
{
string startTag = input.Substring(0, input.IndexOf(">") + 1);
string endTag = input.Substring(input.IndexOf("</"), startTag.Length + 1);
input = input.Substring(startTag.Length, input.Length - endTag.Length - startTag.Length);
return startTag + System.Net.WebUtility.HtmlEncode(input) + endTag;
}
You can use a library like AngleSharp to replace the content of an html element:
public static async Task<string> EncodeString(string input)
{
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(input));
var pElement = document.QuerySelector("p");
pElement.TextContent = System.Net.WebUtility.HtmlEncode(pElement.TextContent);
return pItem.ToHtml();
}
See it in action here: .NET Fiddle
For more general situations where you have nested elements, here's the adapted code:
public static async Task<string> EncodeString(string input)
{
var context = BrowsingContext.New(Configuration.Default);
var document = await context.OpenAsync(req => req.Content(input));
return await EncodeString(document.Body.FirstChild);
}
private static async Task<string> EncodeString(INode content)
{
foreach(var node in content.ChildNodes)
{
node.NodeValue = node.NodeType == NodeType.Text ?
System.Net.WebUtility.HtmlEncode(node.NodeValue) :
await EncodeString(node);
}
return content.ToHtml();
}

Split special string in c#

I want to split the below string with given output.
Can anybody help me to do this.
Examples:
/TEST/TEST123
Output: /Test/
/TEST1/Test/Test/Test/
Output: /Test1/
/Text/12121/1212/
Output: /Text/
/121212121/asdfasdf/
Output: /121212121/
12345
Output: 12345
I have tried string.split function but it is not worked well. Is there any idea or logic that i can implement to achieve this situation.
If the answer in regular expression that would be fine for me.
You simply want the first result of Spiting by /
string output = input.Split('/')[0];
But in case that you have //TEST/ and output should be /TEST you can use regex.
string output = Regex.Matches(input, #"\/?(.+?)\/")[0].Groups[1].Value;
For your 5th case : you have to separate the logic. for example:
public static string Method(string input)
{
var split = input.Split(new[] {'/'}, StringSplitOptions.RemoveEmptyEntries);
if (split.Length == 0) return input;
return split[0];
}
Or using regex.
public static string Method(string input)
{
var matches = Regex.Matches(input, #"\/?(.+?)\/");
if (matches.Count == 0) return input;
return matches[0].Groups[1].Value;
}
Some results using method:
TEST/54/ => TEST
TEST => TEST
/TEST/ => TEST
I think this would work:
string s1 = "/TEST/TEST123";
string s2 = "/TEST1/Test/Test/Test/";
string s3 = "/Text/12121/1212/";
string s4 = "/121212121/asdfasdf/";
string s5 = "12345";
string pattern = #"\/?[a-zA-Z0-9]+\/?";
Console.WriteLine(Regex.Matches(s1, pattern)[0]);
Console.WriteLine(Regex.Matches(s2, pattern)[0]);
Console.WriteLine(Regex.Matches(s3, pattern)[0]);
Console.WriteLine(Regex.Matches(s4, pattern)[0]);
Console.WriteLine(Regex.Matches(s5, pattern)[0]);
class Program
{
static void Main(string[] args)
{
string example = "/TEST/TEST123";
var result = GetFirstItem(example);
Console.WriteLine("First in the list : {0}", result);
}
static string GetFirstItem(string value)
{
var collection = value?.Split(new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
var result = collection[0];
return result;
}
}
StringSplitOptions.RemoveEmptyEntries is an enum which tells the Split function that when it has split the string into an array, if there are elements in the array that are empty strings, the function should not include the empty elements in the results. Basically you want the collection to contain only values.
public string functionName(string input)
{
if(input.Contains('/'))
{
string SplitStr = input.Split('/')[1];
return "/"+SplitStr .Substring(0, 1) +SplitStr.Substring(1).ToLower()+"/"
}
return input;
}
output = (output.Contains("/"))? '/' +input.Split('/')[1]+'/':input;
private void button1_Click(object sender, EventArgs e)
{
string test = #"/Text/12121/1212/";
int first = test.IndexOf("/");
int last = test.Substring(first+1).IndexOf("/");
string finall = test.Substring(first, last+2);
}
i try this code with all your examples and get correct output. try this.
The following method may help you.
public string getValue(string st)
{
if (st.IndexOf('/') == -1)
return st;
return "/" + st.Split('/')[1] + "/";
}

Replace item in querystring

I have a URL that also might have a query string part, the query string might be empty or have multiple items.
I want to replace one of the items in the query string or add it if the item doesn't already exists.
I have an URI object with the complete URL.
My first idea was to use regex and some string magic, that should do it.
But it seems a bit shaky, perhaps the framework has some query string builder class?
I found this was a more elegant solution
var qs = HttpUtility.ParseQueryString(Request.QueryString.ToString());
qs.Set("item", newItemValue);
Console.WriteLine(qs.ToString());
Lets have this url:
https://localhost/video?param1=value1
At first update specific query string param to new value:
var uri = new Uri("https://localhost/video?param1=value1");
var qs = HttpUtility.ParseQueryString(uri.Query);
qs.Set("param1", "newValue2");
Next create UriBuilder and update Query property to produce new uri with changed param value.
var uriBuilder = new UriBuilder(uri);
uriBuilder.Query = qs.ToString();
var newUri = uriBuilder.Uri;
Now you have in newUri this value:
https://localhost/video?param1=newValue2
Maybe you could use the System.UriBuilder class. It has a Query property.
I use following method:
public static string replaceQueryString(System.Web.HttpRequest request, string key, string value)
{
System.Collections.Specialized.NameValueCollection t = HttpUtility.ParseQueryString(request.Url.Query);
t.Set(key, value);
return t.ToString();
}
string link = page.Request.Url.ToString();
if(page.Request.Url.Query == "")
link += "?pageIndex=" + pageIndex;
else if (page.Request.QueryString["pageIndex"] != "")
{
var idx = page.Request.QueryString["pageIndex"];
link = link.Replace("pageIndex=" + idx, "pageIndex=" + pageIndex);
}
else
link += "&pageIndex=" + pageIndex;
This seems to work really well.
No, the framework doesn't have any existing QueryStringBuilder class, but usually the querystring information in a HTTP request is available as an iterable and searchable NameValueCollection via the Request.Querystring property.
Since you are starting off with a Uri object, however, you will need to obtain the querystring portion using the Query property of the Uri object. This will yield a string of the form:
Uri myURI = new Uri("http://www.mywebsite.com/page.aspx?Val1=A&Val2=B&Val3=C");
string querystring = myURI.Query;
// Outputs: "?Val1=A&Val2=B&Val3=C". Note the ? prefix!
Console.WriteLine(querystring);
You can then split this string on the ampersand character to differentiate it into different querystring parameters-value pairs. Then again split each parameter on the "=" character to differentiate it into a key and value.
Since your final goal is to search for a particular querystring key and if necessary create it, you should try to (re)create a collection (preferably, a generic one) that allows you easily search in the collection, similar to the facility provided by the NameValueCollection class.
I used the following code to append/replace the value of a parameter in the current request URL:
public static string CurrentUrlWithParam(this UrlHelper helper, string paramName, string paramValue)
{
var url = helper.RequestContext.HttpContext.Request.Url;
var sb = new StringBuilder();
sb.AppendFormat("{0}://{1}{2}{3}",
url.Scheme,
url.Host,
url.IsDefaultPort ? "" : ":" + url.Port,
url.LocalPath);
var isFirst = true;
if (!String.IsNullOrWhiteSpace(url.Query))
{
var queryStrings = url.Query.Split(new[] { '?', ';' });
foreach (var queryString in queryStrings)
{
if (!String.IsNullOrWhiteSpace(queryString) && !queryString.StartsWith(paramName + "="))
{
sb.AppendFormat("{0}{1}", isFirst ? "?" : ";", queryString);
isFirst = false;
}
}
}
sb.AppendFormat("{0}{1}={2}", isFirst ? "?" : ";", paramName, paramValue);
return sb.ToString();
}
Maybe this helps others when finding this topic.
Update:
Just saw the hint about UriBuilder and did a second version using UriBuilder, StringBuilder and Linq:
public static string CurrentUrlWithParam(this UrlHelper helper, string paramName, string paramValue)
{
var url = helper.RequestContext.HttpContext.Request.Url;
var ub = new UriBuilder(url.Scheme, url.Host, url.Port, url.LocalPath);
// Query string
var sb = new StringBuilder();
var isFirst = true;
if (!String.IsNullOrWhiteSpace(url.Query))
{
var queryStrings = url.Query.Split(new[] { '?', ';' });
foreach (var queryString in queryStrings.Where(queryString => !String.IsNullOrWhiteSpace(queryString) && !queryString.StartsWith(paramName + "=")))
{
sb.AppendFormat("{0}{1}", isFirst ? "" : ";", queryString);
isFirst = false;
}
}
sb.AppendFormat("{0}{1}={2}", isFirst ? "" : ";", paramName, paramValue);
ub.Query = sb.ToString();
return ub.ToString();
}
I agree with Cerebrus. Sticking to the KISS principle, you have the querystring,
string querystring = myURI.Query;
you know what you are looking for and what you want to replace it with.
So use something like this:-
if (querystring == "")
myURI.Query += "?" + replacestring;
else
querystring.replace (searchstring, replacestring); // not too sure of syntax !!
I answered a similar question a while ago. Basically, the best way would be to use the class HttpValueCollection, which the QueryString property actually is, unfortunately it is internal in the .NET framework.
You could use Reflector to grab it (and place it into your Utils class). This way you could manipulate the query string like a NameValueCollection, but with all the url encoding/decoding issues taken care for you.
HttpValueCollection extends NameValueCollection, and has a constructor that takes an encoded query string (ampersands and question marks included), and it overrides a ToString() method to later rebuild the query string from the underlying collection.
public class QueryParams : Dictionary<string,string>
{
private Uri originolUrl;
private Uri ammendedUrl;
private string schemeName;
private string hostname;
private string path;
public QueryParams(Uri url)
{
this.originolUrl = url;
schemeName = url.Scheme;
hostname = url.Host;
path = url.AbsolutePath;
//check uri to see if it has a query
if (url.Query.Count() > 1)
{
//we grab the query and strip of the question mark as we do not want it attached
string query = url.Query.TrimStart("?".ToArray());
//we grab each query and place them into an array
string[] parms = query.Split("&".ToArray());
foreach (string str in parms)
{
// we split each query into two strings(key) and (value) and place into array
string[] param = str.Split("=".ToArray());
//we add the strings to this dictionary
this.Add(param[0], param[1]);
}
}
}
public QueryParams Set(string paramName, string value)
{
if(this.ContainsKey(paramName))
{
//if key exists change value
this[paramName] = value;
return (this);
}
else
{
this.Add(paramName, value);
return this;
}
}
public QueryParams Set(string paramName, int value)
{
if (this.ContainsKey(paramName))
{
//if key exists change value
this[paramName] = value.ToString();
return (this);
}
else
{
this.Add(paramName, value);
return this;
}
}
public void Add(string key, int value)
{
//overload, adds a new keypair
string strValue = value.ToString();
this.Add(key, strValue);
}
public override string ToString()
{
StringBuilder queryString = new StringBuilder();
foreach (KeyValuePair<string, string> pair in this)
{
//we recreate the query from each keypair
queryString.Append(pair.Key + "=" + pair.Value + "&");
}
//trim the end of the query
string modifiedQuery = queryString.ToString().TrimEnd("&".ToArray());
if (this.Count() > 0)
{
UriBuilder uriBuild = new UriBuilder(schemeName, hostname);
uriBuild.Path = path;
uriBuild.Query = modifiedQuery;
ammendedUrl = uriBuild.Uri;
return ammendedUrl.AbsoluteUri;
}
else
{
return originolUrl.ToString();
}
}
public Uri ToUri()
{
this.ToString();
return ammendedUrl;
}
}
}

Path.Combine for URLs?

Path.Combine is handy, but is there a similar function in the .NET framework for URLs?
I'm looking for syntax like this:
Url.Combine("http://MyUrl.com/", "/Images/Image.jpg")
which would return:
"http://MyUrl.com/Images/Image.jpg"
Uri has a constructor that should do this for you: new Uri(Uri baseUri, string relativeUri)
Here's an example:
Uri baseUri = new Uri("http://www.contoso.com");
Uri myUri = new Uri(baseUri, "catalog/shownew.htm");
Note from editor: Beware, this method does not work as expected. It can cut part of baseUri in some cases. See comments and other answers.
This may be a suitably simple solution:
public static string Combine(string uri1, string uri2)
{
uri1 = uri1.TrimEnd('/');
uri2 = uri2.TrimStart('/');
return string.Format("{0}/{1}", uri1, uri2);
}
There's already some great answers here. Based on mdsharpe suggestion, here's an extension method that can easily be used when you want to deal with Uri instances:
using System;
using System.Linq;
public static class UriExtensions
{
public static Uri Append(this Uri uri, params string[] paths)
{
return new Uri(paths.Aggregate(uri.AbsoluteUri, (current, path) => string.Format("{0}/{1}", current.TrimEnd('/'), path.TrimStart('/'))));
}
}
And usage example:
var url = new Uri("http://example.com/subpath/").Append("/part1/", "part2").AbsoluteUri;
This will produce http://example.com/subpath/part1/part2
If you want to work with strings instead of Uris then the following will also produce the same result, simply adapt it to suit your needs:
public string JoinUriSegments(string uri, params string[] segments)
{
if (string.IsNullOrWhiteSpace(uri))
return null;
if (segments == null || segments.Length == 0)
return uri;
return segments.Aggregate(uri, (current, segment) => $"{current.TrimEnd('/')}/{segment.TrimStart('/')}");
}
var uri = JoinUriSegements("http://example.com/subpath/", "/part1/", "part2");
You use Uri.TryCreate( ... ) :
Uri result = null;
if (Uri.TryCreate(new Uri("http://msdn.microsoft.com/en-us/library/"), "/en-us/library/system.uri.trycreate.aspx", out result))
{
Console.WriteLine(result);
}
Will return:
http://msdn.microsoft.com/en-us/library/system.uri.trycreate.aspx
There is a Todd Menier's comment above that Flurl includes a Url.Combine.
More details:
Url.Combine is basically a Path.Combine for URLs, ensuring one
and only one separator character between parts:
var url = Url.Combine(
"http://MyUrl.com/",
"/too/", "/many/", "/slashes/",
"too", "few?",
"x=1", "y=2"
// result: "http://www.MyUrl.com/too/many/slashes/too/few?x=1&y=2"
Get Flurl.Http on NuGet:
PM> Install-Package Flurl.Http
Or get the stand-alone URL builder without the HTTP features:
PM> Install-Package Flurl
Ryan Cook's answer is close to what I'm after and may be more appropriate for other developers. However, it adds http:// to the beginning of the string and in general it does a bit more formatting than I'm after.
Also, for my use cases, resolving relative paths is not important.
mdsharp's answer also contains the seed of a good idea, although that actual implementation needed a few more details to be complete. This is an attempt to flesh it out (and I'm using this in production):
C#
public string UrlCombine(string url1, string url2)
{
if (url1.Length == 0) {
return url2;
}
if (url2.Length == 0) {
return url1;
}
url1 = url1.TrimEnd('/', '\\');
url2 = url2.TrimStart('/', '\\');
return string.Format("{0}/{1}", url1, url2);
}
VB.NET
Public Function UrlCombine(ByVal url1 As String, ByVal url2 As String) As String
If url1.Length = 0 Then
Return url2
End If
If url2.Length = 0 Then
Return url1
End If
url1 = url1.TrimEnd("/"c, "\"c)
url2 = url2.TrimStart("/"c, "\"c)
Return String.Format("{0}/{1}", url1, url2)
End Function
This code passes the following test, which happens to be in VB:
<TestMethod()> Public Sub UrlCombineTest()
Dim target As StringHelpers = New StringHelpers()
Assert.IsTrue(target.UrlCombine("test1", "test2") = "test1/test2")
Assert.IsTrue(target.UrlCombine("test1/", "test2") = "test1/test2")
Assert.IsTrue(target.UrlCombine("test1", "/test2") = "test1/test2")
Assert.IsTrue(target.UrlCombine("test1/", "/test2") = "test1/test2")
Assert.IsTrue(target.UrlCombine("/test1/", "/test2/") = "/test1/test2/")
Assert.IsTrue(target.UrlCombine("", "/test2/") = "/test2/")
Assert.IsTrue(target.UrlCombine("/test1/", "") = "/test1/")
End Sub
Path.Combine does not work for me because there can be characters like "|" in QueryString arguments and therefore the URL, which will result in an ArgumentException.
I first tried the new Uri(Uri baseUri, string relativeUri) approach, which failed for me because of URIs like http://www.mediawiki.org/wiki/Special:SpecialPages:
new Uri(new Uri("http://www.mediawiki.org/wiki/"), "Special:SpecialPages")
will result in Special:SpecialPages, because of the colon after Special that denotes a scheme.
So I finally had to take mdsharpe/Brian MacKays route and developed it a bit further to work with multiple URI parts:
public static string CombineUri(params string[] uriParts)
{
string uri = string.Empty;
if (uriParts != null && uriParts.Length > 0)
{
char[] trims = new char[] { '\\', '/' };
uri = (uriParts[0] ?? string.Empty).TrimEnd(trims);
for (int i = 1; i < uriParts.Length; i++)
{
uri = string.Format("{0}/{1}", uri.TrimEnd(trims), (uriParts[i] ?? string.Empty).TrimStart(trims));
}
}
return uri;
}
Usage: CombineUri("http://www.mediawiki.org/", "wiki", "Special:SpecialPages")
Based on the sample URL you provided, I'm going to assume you want to combine URLs that are relative to your site.
Based on this assumption I'll propose this solution as the most appropriate response to your question which was: "Path.Combine is handy, is there a similar function in the framework for URLs?"
Since there the is a similar function in the framework for URLs I propose the correct is: "VirtualPathUtility.Combine" method.
Here's the MSDN reference link: VirtualPathUtility.Combine Method
There is one caveat: I believe this only works for URLs relative to your site (that is, you cannot use it to generate links to another web site. For example, var url = VirtualPathUtility.Combine("www.google.com", "accounts/widgets");).
Path.Combine("Http://MyUrl.com/", "/Images/Image.jpg").Replace("\\", "/")
I just put together a small extension method:
public static string UriCombine (this string val, string append)
{
if (String.IsNullOrEmpty(val)) return append;
if (String.IsNullOrEmpty(append)) return val;
return val.TrimEnd('/') + "/" + append.TrimStart('/');
}
It can be used like this:
"www.example.com/".UriCombine("/images").UriCombine("first.jpeg");
Witty example, Ryan, to end with a link to the function. Well done.
One recommendation Brian: if you wrap this code in a function, you may want to use a UriBuilder to wrap the base URL prior to the TryCreate call.
Otherwise, the base URL MUST include the scheme (where the UriBuilder will assume http://). Just a thought:
public string CombineUrl(string baseUrl, string relativeUrl) {
UriBuilder baseUri = new UriBuilder(baseUrl);
Uri newUri;
if (Uri.TryCreate(baseUri.Uri, relativeUrl, out newUri))
return newUri.ToString();
else
throw new ArgumentException("Unable to combine specified url values");
}
An easy way to combine them and ensure it's always correct is:
string.Format("{0}/{1}", Url1.Trim('/'), Url2);
I think this should give you more flexibility as you can deal with as many path segments as you want:
public static string UrlCombine(this string baseUrl, params string[] segments)
=> string.Join("/", new[] { baseUrl.TrimEnd('/') }.Concat(segments.Select(s => s.Trim('/'))));
Combining multiple parts of a URL could be a little bit tricky. You can use the two-parameter constructor Uri(baseUri, relativeUri), or you can use the Uri.TryCreate() utility function.
In either case, you might end up returning an incorrect result because these methods keep on truncating the relative parts off of the first parameter baseUri, i.e. from something like http://google.com/some/thing to http://google.com.
To be able to combine multiple parts into a final URL, you can copy the two functions below:
public static string Combine(params string[] parts)
{
if (parts == null || parts.Length == 0) return string.Empty;
var urlBuilder = new StringBuilder();
foreach (var part in parts)
{
var tempUrl = tryCreateRelativeOrAbsolute(part);
urlBuilder.Append(tempUrl);
}
return VirtualPathUtility.RemoveTrailingSlash(urlBuilder.ToString());
}
private static string tryCreateRelativeOrAbsolute(string s)
{
System.Uri uri;
System.Uri.TryCreate(s, UriKind.RelativeOrAbsolute, out uri);
string tempUrl = VirtualPathUtility.AppendTrailingSlash(uri.ToString());
return tempUrl;
}
Full code with unit tests to demonstrate usage can be found at https://uricombine.codeplex.com/SourceControl/latest#UriCombine/Uri.cs
I have unit tests to cover the three most common cases:
As found in other answers, either new Uri() or TryCreate() can do the tick.
However, the base Uri has to end with / and the relative has to NOT begin with /; otherwise it will remove the trailing part of the base Url
I think this is best done as an extension method, i.e.
public static Uri Append(this Uri uri, string relativePath)
{
var baseUri = uri.AbsoluteUri.EndsWith('/') ? uri : new Uri(uri.AbsoluteUri + '/');
var relative = relativePath.StartsWith('/') ? relativePath.Substring(1) : relativePath;
return new Uri(baseUri, relative);
}
and to use it:
var baseUri = new Uri("http://test.com/test/");
var combinedUri = baseUri.Append("/Do/Something");
In terms of performance, this consumes more resources than it needs, because of the Uri class which does a lot of parsing and validation; a very rough profiling (Debug) did a million operations in about 2 seconds.
This will work for most scenarios, however to be more efficient, it's better to manipulate everything as strings, this takes 125 milliseconds for 1 million operations.
I.e.
public static string Append(this Uri uri, string relativePath)
{
//avoid the use of Uri as it's not needed, and adds a bit of overhead.
var absoluteUri = uri.AbsoluteUri; //a calculated property, better cache it
var baseUri = absoluteUri.EndsWith('/') ? absoluteUri : absoluteUri + '/';
var relative = relativePath.StartsWith('/') ? relativePath.Substring(1) : relativePath;
return baseUri + relative;
}
And if you still want to return a URI, it takes around 600 milliseconds for 1 million operations.
public static Uri AppendUri(this Uri uri, string relativePath)
{
//avoid the use of Uri as it's not needed, and adds a bit of overhead.
var absoluteUri = uri.AbsoluteUri; //a calculated property, better cache it
var baseUri = absoluteUri.EndsWith('/') ? absoluteUri : absoluteUri + '/';
var relative = relativePath.StartsWith('/') ? relativePath.Substring(1) : relativePath;
return new Uri(baseUri + relative);
}
I hope this helps.
I found UriBuilder worked really well for this sort of thing:
UriBuilder urlb = new UriBuilder("http", _serverAddress, _webPort, _filePath);
Uri url = urlb.Uri;
return url.AbsoluteUri;
See UriBuilder Class - MSDN for more constructors and documentation.
If you don't want to have a dependency like Flurl, you can use its source code:
/// <summary>
/// Basically a Path.Combine for URLs. Ensures exactly one '/' separates each segment,
/// and exactly on '&' separates each query parameter.
/// URL-encodes illegal characters but not reserved characters.
/// </summary>
/// <param name="parts">URL parts to combine.</param>
public static string Combine(params string[] parts) {
if (parts == null)
throw new ArgumentNullException(nameof(parts));
string result = "";
bool inQuery = false, inFragment = false;
string CombineEnsureSingleSeparator(string a, string b, char separator) {
if (string.IsNullOrEmpty(a)) return b;
if (string.IsNullOrEmpty(b)) return a;
return a.TrimEnd(separator) + separator + b.TrimStart(separator);
}
foreach (var part in parts) {
if (string.IsNullOrEmpty(part))
continue;
if (result.EndsWith("?") || part.StartsWith("?"))
result = CombineEnsureSingleSeparator(result, part, '?');
else if (result.EndsWith("#") || part.StartsWith("#"))
result = CombineEnsureSingleSeparator(result, part, '#');
else if (inFragment)
result += part;
else if (inQuery)
result = CombineEnsureSingleSeparator(result, part, '&');
else
result = CombineEnsureSingleSeparator(result, part, '/');
if (part.Contains("#")) {
inQuery = false;
inFragment = true;
}
else if (!inFragment && part.Contains("?")) {
inQuery = true;
}
}
return EncodeIllegalCharacters(result);
}
/// <summary>
/// URL-encodes characters in a string that are neither reserved nor unreserved. Avoids encoding reserved characters such as '/' and '?'. Avoids encoding '%' if it begins a %-hex-hex sequence (i.e. avoids double-encoding).
/// </summary>
/// <param name="s">The string to encode.</param>
/// <param name="encodeSpaceAsPlus">If true, spaces will be encoded as + signs. Otherwise, they'll be encoded as %20.</param>
/// <returns>The encoded URL.</returns>
public static string EncodeIllegalCharacters(string s, bool encodeSpaceAsPlus = false) {
if (string.IsNullOrEmpty(s))
return s;
if (encodeSpaceAsPlus)
s = s.Replace(" ", "+");
// Uri.EscapeUriString mostly does what we want - encodes illegal characters only - but it has a quirk
// in that % isn't illegal if it's the start of a %-encoded sequence https://stackoverflow.com/a/47636037/62600
// no % characters, so avoid the regex overhead
if (!s.Contains("%"))
return Uri.EscapeUriString(s);
// pick out all %-hex-hex matches and avoid double-encoding
return Regex.Replace(s, "(.*?)((%[0-9A-Fa-f]{2})|$)", c => {
var a = c.Groups[1].Value; // group 1 is a sequence with no %-encoding - encode illegal characters
var b = c.Groups[2].Value; // group 2 is a valid 3-character %-encoded sequence - leave it alone!
return Uri.EscapeUriString(a) + b;
});
}
I find the following useful and has the following features :
Throws on null or white space
Takes multiple params parameter for multiple Url segments
throws on null or empty
Class
public static class UrlPath
{
private static string InternalCombine(string source, string dest)
{
if (string.IsNullOrWhiteSpace(source))
throw new ArgumentException("Cannot be null or white space", nameof(source));
if (string.IsNullOrWhiteSpace(dest))
throw new ArgumentException("Cannot be null or white space", nameof(dest));
return $"{source.TrimEnd('/', '\\')}/{dest.TrimStart('/', '\\')}";
}
public static string Combine(string source, params string[] args)
=> args.Aggregate(source, InternalCombine);
}
Tests
UrlPath.Combine("test1", "test2");
UrlPath.Combine("test1//", "test2");
UrlPath.Combine("test1", "/test2");
// Result = test1/test2
UrlPath.Combine(#"test1\/\/\/", #"\/\/\\\\\//test2", #"\/\/\\\\\//test3\") ;
// Result = test1/test2/test3
UrlPath.Combine("/test1/", "/test2/", null);
UrlPath.Combine("", "/test2/");
UrlPath.Combine("/test1/", null);
// Throws an ArgumentException
So I have another approach, similar to everyone who used UriBuilder.
I did not want to split my BaseUrl (which can contain a part of the path - e.g. http://mybaseurl.com/dev/) as javajavajavajavajava did.
The following snippet shows the code + Tests.
Beware: This solution lowercases the host and appends a port. If this is not desired, one can write a string representation by e.g. leveraging the Uri Property of UriBuilder.
public class Tests
{
public static string CombineUrl (string baseUrl, string path)
{
var uriBuilder = new UriBuilder (baseUrl);
uriBuilder.Path = Path.Combine (uriBuilder.Path, path);
return uriBuilder.ToString();
}
[TestCase("http://MyUrl.com/", "/Images/Image.jpg", "http://myurl.com:80/Images/Image.jpg")]
[TestCase("http://MyUrl.com/basePath", "/Images/Image.jpg", "http://myurl.com:80/Images/Image.jpg")]
[TestCase("http://MyUrl.com/basePath", "Images/Image.jpg", "http://myurl.com:80/basePath/Images/Image.jpg")]
[TestCase("http://MyUrl.com/basePath/", "Images/Image.jpg", "http://myurl.com:80/basePath/Images/Image.jpg")]
public void Test1 (string baseUrl, string path, string expected)
{
var result = CombineUrl (baseUrl, path);
Assert.That (result, Is.EqualTo (expected));
}
}
Tested with .NET Core 2.1 on Windows 10.
Why does this work?
Even though Path.Combine will return Backslashes (on Windows atleast), the UriBuilder handles this case in the Setter of Path.
Taken from https://github.com/dotnet/corefx/blob/master/src/System.Private.Uri/src/System/UriBuilder.cs (mind the call to string.Replace)
[AllowNull]
public string Path
{
get
{
return _path;
}
set
{
if ((value == null) || (value.Length == 0))
{
value = "/";
}
_path = Uri.InternalEscapeString(value.Replace('\\', '/'));
_changed = true;
}
}
Is this the best approach?
Certainly this solution is pretty self describing (at least in my opinion). But you are relying on undocumented (at least I found nothing with a quick google search) "feature" from the .NET API. This may change with a future release so please cover the Method with Tests.
There are tests in https://github.com/dotnet/corefx/blob/master/src/System.Private.Uri/tests/FunctionalTests/UriBuilderTests.cs (Path_Get_Set) which check, if the \ is correctly transformed.
Side Note: One could also work with the UriBuilder.Uri property directly, if the uri will be used for a System.Uri ctor.
For anyone who is looking for a one-liner and simply wants to join parts of a path without creating a new method or referencing a new library or construct a URI value and convert that to a string, then...
string urlToImage = String.Join("/", "websiteUrl", "folder1", "folder2", "folder3", "item");
It's pretty basic, but I don't see what more you need. If you're afraid of doubled '/' then you can simply do a .Replace("//", "/") afterward. If you're afraid of replacing the doubled '//' in 'https://', then instead do one join, replace the doubled '/', then join the website url (however I'm pretty sure most browsers will automatically convert anything with 'https:' in the front of it to read in the correct format). This would look like:
string urlToImage = String.Join("/","websiteUrl", String.Join("/", "folder1", "folder2", "folder3", "item").Replace("//","/"));
There are plenty of answers here that will handle all the above, but in my case, I only needed it once in one location and won't need to heavily rely on it. Also, it's really easy to see what is going on here.
See: https://learn.microsoft.com/en-us/dotnet/api/system.string.join?view=netframework-4.8
My generic solution:
public static string Combine(params string[] uriParts)
{
string uri = string.Empty;
if (uriParts != null && uriParts.Any())
{
char[] trims = new char[] { '\\', '/' };
uri = (uriParts[0] ?? string.Empty).TrimEnd(trims);
for (int i = 1; i < uriParts.Length; i++)
{
uri = string.Format("{0}/{1}", uri.TrimEnd(trims), (uriParts[i] ?? string.Empty).TrimStart(trims));
}
}
return uri;
}
Here's Microsoft's (OfficeDev PnP) method UrlUtility.Combine:
const char PATH_DELIMITER = '/';
/// <summary>
/// Combines a path and a relative path.
/// </summary>
/// <param name="path"></param>
/// <param name="relative"></param>
/// <returns></returns>
public static string Combine(string path, string relative)
{
if(relative == null)
relative = String.Empty;
if(path == null)
path = String.Empty;
if(relative.Length == 0 && path.Length == 0)
return String.Empty;
if(relative.Length == 0)
return path;
if(path.Length == 0)
return relative;
path = path.Replace('\\', PATH_DELIMITER);
relative = relative.Replace('\\', PATH_DELIMITER);
return path.TrimEnd(PATH_DELIMITER) + PATH_DELIMITER + relative.TrimStart(PATH_DELIMITER);
}
Source: GitHub
// Read all above samples and as result created my self:
static string UrlCombine(params string[] items)
{
if (items?.Any() != true)
{
return string.Empty;
}
return string.Join("/", items.Where(u => !string.IsNullOrWhiteSpace(u)).Select(u => u.Trim('/', '\\')));
}
// usage
UrlCombine("https://microsoft.com","en-us")
I have an allocation-free string creation version that I've been using with great success.
NOTE:
For the first string: it trims the separator using TrimEnd(separator) - so only from the end of the string.
For the remainders: it trims the separator using Trim(separator) - so both start and end of paths
It does not append a trailing slash/separator. Though a simple modification can be done to add this ability.
Hope you find this useful!
/// <summary>
/// This implements an allocation-free string creation to construct the path.
/// This uses 3.5x LESS memory and is 2x faster than some alternate methods (StringBuilder, interpolation, string.Concat, etc.).
/// </summary>
/// <param name="str"></param>
/// <param name="paths"></param>
/// <returns></returns>
public static string ConcatPath(this string str, params string[] paths)
{
const char separator = '/';
if (str == null) throw new ArgumentNullException(nameof(str));
var list = new List<ReadOnlyMemory<char>>();
var first = str.AsMemory().TrimEnd(separator);
// get length for intial string after it's trimmed
var length = first.Length;
list.Add(first);
foreach (var path in paths)
{
var newPath = path.AsMemory().Trim(separator);
length += newPath.Length + 1;
list.Add(newPath);
}
var newString = string.Create(length, list, (chars, state) =>
{
// NOTE: We don't access the 'list' variable in this delegate since
// it would cause a closure and allocation. Instead we access the state parameter.
// track our position within the string data we are populating
var position = 0;
// copy the first string data to index 0 of the Span<char>
state[0].Span.CopyTo(chars);
// update the position to the new length
position += state[0].Span.Length;
// start at index 1 when slicing
for (var i = 1; i < state.Count; i++)
{
// add a separator in the current position and increment position by 1
chars[position++] = separator;
// copy each path string to a slice at current position
state[i].Span.CopyTo(chars.Slice(position));
// update the position to the new length
position += state[i].Length;
}
});
return newString;
}
with Benchmark DotNet output:
| Method | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen 0 | Allocated |
|---------------------- |---------:|---------:|---------:|---------:|------:|--------:|-------:|----------:|
| ConcatPathWithBuilder | 404.1 ns | 27.35 ns | 78.48 ns | 380.3 ns | 1.00 | 0.00 | 0.3347 | 1,400 B |
| ConcatPath | 187.2 ns | 5.93 ns | 16.44 ns | 183.2 ns | 0.48 | 0.10 | 0.0956 | 400 B |
A simple one liner:
public static string Combine(this string uri1, string uri2) => $"{uri1.TrimEnd('/')}/{uri2.TrimStart('/')}";
Inspired by #Matt Sharpe's answer.
Here is my approach and I will use it for myself too:
public static string UrlCombine(string part1, string part2)
{
string newPart1 = string.Empty;
string newPart2 = string.Empty;
string seperator = "/";
// If either part1 or part 2 is empty,
// we don't need to combine with seperator
if (string.IsNullOrEmpty(part1) || string.IsNullOrEmpty(part2))
{
seperator = string.Empty;
}
// If part1 is not empty,
// remove '/' at last
if (!string.IsNullOrEmpty(part1))
{
newPart1 = part1.TrimEnd('/');
}
// If part2 is not empty,
// remove '/' at first
if (!string.IsNullOrEmpty(part2))
{
newPart2 = part2.TrimStart('/');
}
// Now finally combine
return string.Format("{0}{1}{2}", newPart1, seperator, newPart2);
}
I created this function that will make your life easier:
/// <summary>
/// The ultimate Path combiner of all time
/// </summary>
/// <param name="IsURL">
/// true - if the paths are Internet URLs, false - if the paths are local URLs, this is very important as this will be used to decide which separator will be used.
/// </param>
/// <param name="IsRelative">Just adds the separator at the beginning</param>
/// <param name="IsFixInternal">Fix the paths from within (by removing duplicate separators and correcting the separators)</param>
/// <param name="parts">The paths to combine</param>
/// <returns>the combined path</returns>
public static string PathCombine(bool IsURL , bool IsRelative , bool IsFixInternal , params string[] parts)
{
if (parts == null || parts.Length == 0) return string.Empty;
char separator = IsURL ? '/' : '\\';
if (parts.Length == 1 && IsFixInternal)
{
string validsingle;
if (IsURL)
{
validsingle = parts[0].Replace('\\' , '/');
}
else
{
validsingle = parts[0].Replace('/' , '\\');
}
validsingle = validsingle.Trim(separator);
return (IsRelative ? separator.ToString() : string.Empty) + validsingle;
}
string final = parts
.Aggregate
(
(string first , string second) =>
{
string validfirst;
string validsecond;
if (IsURL)
{
validfirst = first.Replace('\\' , '/');
validsecond = second.Replace('\\' , '/');
}
else
{
validfirst = first.Replace('/' , '\\');
validsecond = second.Replace('/' , '\\');
}
var prefix = string.Empty;
if (IsFixInternal)
{
if (IsURL)
{
if (validfirst.Contains("://"))
{
var tofix = validfirst.Substring(validfirst.IndexOf("://") + 3);
prefix = validfirst.Replace(tofix , string.Empty).TrimStart(separator);
var tofixlist = tofix.Split(new[] { separator } , StringSplitOptions.RemoveEmptyEntries);
validfirst = separator + string.Join(separator.ToString() , tofixlist);
}
else
{
var firstlist = validfirst.Split(new[] { separator } , StringSplitOptions.RemoveEmptyEntries);
validfirst = string.Join(separator.ToString() , firstlist);
}
var secondlist = validsecond.Split(new[] { separator } , StringSplitOptions.RemoveEmptyEntries);
validsecond = string.Join(separator.ToString() , secondlist);
}
else
{
var firstlist = validfirst.Split(new[] { separator } , StringSplitOptions.RemoveEmptyEntries);
var secondlist = validsecond.Split(new[] { separator } , StringSplitOptions.RemoveEmptyEntries);
validfirst = string.Join(separator.ToString() , firstlist);
validsecond = string.Join(separator.ToString() , secondlist);
}
}
return prefix + validfirst.Trim(separator) + separator + validsecond.Trim(separator);
}
);
return (IsRelative ? separator.ToString() : string.Empty) + final;
}
It works for URLs as well as normal paths.
Usage:
// Fixes internal paths
Console.WriteLine(PathCombine(true , true , true , #"\/\/folder 1\/\/\/\\/\folder2\///folder3\\/" , #"/\somefile.ext\/\//\"));
// Result: /folder 1/folder2/folder3/somefile.ext
// Doesn't fix internal paths
Console.WriteLine(PathCombine(true , true , false , #"\/\/folder 1\/\/\/\\/\folder2\///folder3\\/" , #"/\somefile.ext\/\//\"));
//result : /folder 1//////////folder2////folder3/somefile.ext
// Don't worry about URL prefixes when fixing internal paths
Console.WriteLine(PathCombine(true , false , true , #"/\/\/https:/\/\/\lul.com\/\/\/\\/\folder2\///folder3\\/" , #"/\somefile.ext\/\//\"));
// Result: https://lul.com/folder2/folder3/somefile.ext
Console.WriteLine(PathCombine(false , true , true , #"../../../\\..\...\./../somepath" , #"anotherpath"));
// Result: \..\..\..\..\...\.\..\somepath\anotherpath
I found that the Uri constructor flips '\' into '/'. So you can also use Path.Combine, with the Uri constructor.
Uri baseUri = new Uri("http://MyUrl.com");
string path = Path.Combine("Images", "Image.jpg");
Uri myUri = new Uri(baseUri, path);
Why not just use the following.
System.IO.Path.Combine(rootUrl, subPath).Replace(#"\", "/")
For what it's worth, here a couple of extension methods. The first one will combine paths and the second one adds parameters to the URL.
public static string CombineUrl(this string root, string path, params string[] paths)
{
if (string.IsNullOrWhiteSpace(path))
{
return root;
}
Uri baseUri = new Uri(root);
Uri combinedPaths = new Uri(baseUri, path);
foreach (string extendedPath in paths)
{
combinedPaths = new Uri(combinedPaths, extendedPath);
}
return combinedPaths.AbsoluteUri;
}
public static string AddUrlParams(this string url, Dictionary<string, string> parameters)
{
if (parameters == null || !parameters.Keys.Any())
{
return url;
}
var tempUrl = new StringBuilder($"{url}?");
int count = 0;
foreach (KeyValuePair<string, string> parameter in parameters)
{
if (count > 0)
{
tempUrl.Append("&");
}
tempUrl.Append($"{WebUtility.UrlEncode(parameter.Key)}={WebUtility.UrlEncode(parameter.Value)}");
count++;
}
return tempUrl.ToString();
}

C# Sanitize File Name

I recently have been moving a bunch of MP3s from various locations into a repository. I had been constructing the new file names using the ID3 tags (thanks, TagLib-Sharp!), and I noticed that I was getting a System.NotSupportedException:
"The given path's format is not supported."
This was generated by either File.Copy() or Directory.CreateDirectory().
It didn't take long to realize that my file names needed to be sanitized. So I did the obvious thing:
public static string SanitizePath_(string path, char replaceChar)
{
string dir = Path.GetDirectoryName(path);
foreach (char c in Path.GetInvalidPathChars())
dir = dir.Replace(c, replaceChar);
string name = Path.GetFileName(path);
foreach (char c in Path.GetInvalidFileNameChars())
name = name.Replace(c, replaceChar);
return dir + name;
}
To my surprise, I continued to get exceptions. It turned out that ':' is not in the set of Path.GetInvalidPathChars(), because it is valid in a path root. I suppose that makes sense - but this has to be a pretty common problem. Does anyone have some short code that sanitizes a path? The most thorough I've come up with this, but it feels like it is probably overkill.
// replaces invalid characters with replaceChar
public static string SanitizePath(string path, char replaceChar)
{
// construct a list of characters that can't show up in filenames.
// need to do this because ":" is not in InvalidPathChars
if (_BadChars == null)
{
_BadChars = new List<char>(Path.GetInvalidFileNameChars());
_BadChars.AddRange(Path.GetInvalidPathChars());
_BadChars = Utility.GetUnique<char>(_BadChars);
}
// remove root
string root = Path.GetPathRoot(path);
path = path.Remove(0, root.Length);
// split on the directory separator character. Need to do this
// because the separator is not valid in a filename.
List<string> parts = new List<string>(path.Split(new char[]{Path.DirectorySeparatorChar}));
// check each part to make sure it is valid.
for (int i = 0; i < parts.Count; i++)
{
string part = parts[i];
foreach (char c in _BadChars)
{
part = part.Replace(c, replaceChar);
}
parts[i] = part;
}
return root + Utility.Join(parts, Path.DirectorySeparatorChar.ToString());
}
Any improvements to make this function faster and less baroque would be much appreciated.
To clean up a file name you could do this
private static string MakeValidFileName( string name )
{
string invalidChars = System.Text.RegularExpressions.Regex.Escape( new string( System.IO.Path.GetInvalidFileNameChars() ) );
string invalidRegStr = string.Format( #"([{0}]*\.+$)|([{0}]+)", invalidChars );
return System.Text.RegularExpressions.Regex.Replace( name, invalidRegStr, "_" );
}
A shorter solution:
var invalids = System.IO.Path.GetInvalidFileNameChars();
var newName = String.Join("_", origFileName.Split(invalids, StringSplitOptions.RemoveEmptyEntries) ).TrimEnd('.');
Based on Andre's excellent answer but taking into account Spud's comment on reserved words, I made this version:
/// <summary>
/// Strip illegal chars and reserved words from a candidate filename (should not include the directory path)
/// </summary>
/// <remarks>
/// http://stackoverflow.com/questions/309485/c-sharp-sanitize-file-name
/// </remarks>
public static string CoerceValidFileName(string filename)
{
var invalidChars = Regex.Escape(new string(Path.GetInvalidFileNameChars()));
var invalidReStr = string.Format(#"[{0}]+", invalidChars);
var reservedWords = new []
{
"CON", "PRN", "AUX", "CLOCK$", "NUL", "COM0", "COM1", "COM2", "COM3", "COM4",
"COM5", "COM6", "COM7", "COM8", "COM9", "LPT0", "LPT1", "LPT2", "LPT3", "LPT4",
"LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
};
var sanitisedNamePart = Regex.Replace(filename, invalidReStr, "_");
foreach (var reservedWord in reservedWords)
{
var reservedWordPattern = string.Format("^{0}\\.", reservedWord);
sanitisedNamePart = Regex.Replace(sanitisedNamePart, reservedWordPattern, "_reservedWord_.", RegexOptions.IgnoreCase);
}
return sanitisedNamePart;
}
And these are my unit tests
[Test]
public void CoerceValidFileName_SimpleValid()
{
var filename = #"thisIsValid.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual(filename, result);
}
[Test]
public void CoerceValidFileName_SimpleInvalid()
{
var filename = #"thisIsNotValid\3\\_3.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("thisIsNotValid_3__3.txt", result);
}
[Test]
public void CoerceValidFileName_InvalidExtension()
{
var filename = #"thisIsNotValid.t\xt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("thisIsNotValid.t_xt", result);
}
[Test]
public void CoerceValidFileName_KeywordInvalid()
{
var filename = "aUx.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("_reservedWord_.txt", result);
}
[Test]
public void CoerceValidFileName_KeywordValid()
{
var filename = "auxillary.txt";
var result = PathHelper.CoerceValidFileName(filename);
Assert.AreEqual("auxillary.txt", result);
}
string clean = String.Concat(dirty.Split(Path.GetInvalidFileNameChars()));
there are a lot of working solutions here. just for the sake of completeness, here's an approach that doesn't use regex, but uses LINQ:
var invalids = Path.GetInvalidFileNameChars();
filename = invalids.Aggregate(filename, (current, c) => current.Replace(c, '_'));
Also, it's a very short solution ;)
I'm using the System.IO.Path.GetInvalidFileNameChars() method to check invalid characters and I've got no problems.
I'm using the following code:
foreach( char invalidchar in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(invalidchar, '_');
}
I wanted to retain the characters in some way, not just simply replace the character with an underscore.
One way I thought was to replace the characters with similar looking characters which are (in my situation), unlikely to be used as regular characters. So I took the list of invalid characters and found look-a-likes.
The following are functions to encode and decode with the look-a-likes.
This code does not include a complete listing for all System.IO.Path.GetInvalidFileNameChars() characters. So it is up to you to extend or utilize the underscore replacement for any remaining characters.
private static Dictionary<string, string> EncodeMapping()
{
//-- Following characters are invalid for windows file and folder names.
//-- \/:*?"<>|
Dictionary<string, string> dic = new Dictionary<string, string>();
dic.Add(#"\", "Ì"); // U+OOCC
dic.Add("/", "Í"); // U+OOCD
dic.Add(":", "¦"); // U+00A6
dic.Add("*", "¤"); // U+00A4
dic.Add("?", "¿"); // U+00BF
dic.Add(#"""", "ˮ"); // U+02EE
dic.Add("<", "«"); // U+00AB
dic.Add(">", "»"); // U+00BB
dic.Add("|", "│"); // U+2502
return dic;
}
public static string Escape(string name)
{
foreach (KeyValuePair<string, string> replace in EncodeMapping())
{
name = name.Replace(replace.Key, replace.Value);
}
//-- handle dot at the end
if (name.EndsWith(".")) name = name.CropRight(1) + "°";
return name;
}
public static string UnEscape(string name)
{
foreach (KeyValuePair<string, string> replace in EncodeMapping())
{
name = name.Replace(replace.Value, replace.Key);
}
//-- handle dot at the end
if (name.EndsWith("°")) name = name.CropRight(1) + ".";
return name;
}
You can select your own look-a-likes. I used the Character Map app in windows to select mine %windir%\system32\charmap.exe
As I make adjustments through discovery, I will update this code.
I think the problem is that you first call Path.GetDirectoryName on the bad string. If this has non-filename characters in it, .Net can't tell which parts of the string are directories and throws. You have to do string comparisons.
Assuming it's only the filename that is bad, not the entire path, try this:
public static string SanitizePath(string path, char replaceChar)
{
int filenamePos = path.LastIndexOf(Path.DirectorySeparatorChar) + 1;
var sb = new System.Text.StringBuilder();
sb.Append(path.Substring(0, filenamePos));
for (int i = filenamePos; i < path.Length; i++)
{
char filenameChar = path[i];
foreach (char c in Path.GetInvalidFileNameChars())
if (filenameChar.Equals(c))
{
filenameChar = replaceChar;
break;
}
sb.Append(filenameChar);
}
return sb.ToString();
}
I have had success with this in the past.
Nice, short and static :-)
public static string returnSafeString(string s)
{
foreach (char character in Path.GetInvalidFileNameChars())
{
s = s.Replace(character.ToString(),string.Empty);
}
foreach (char character in Path.GetInvalidPathChars())
{
s = s.Replace(character.ToString(), string.Empty);
}
return (s);
}
Here's an efficient lazy loading extension method based on Andre's code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace LT
{
public static class Utility
{
static string invalidRegStr;
public static string MakeValidFileName(this string name)
{
if (invalidRegStr == null)
{
var invalidChars = System.Text.RegularExpressions.Regex.Escape(new string(System.IO.Path.GetInvalidFileNameChars()));
invalidRegStr = string.Format(#"([{0}]*\.+$)|([{0}]+)", invalidChars);
}
return System.Text.RegularExpressions.Regex.Replace(name, invalidRegStr, "_");
}
}
}
Your code would be cleaner if you appended the directory and filename together and sanitized that rather than sanitizing them independently. As for sanitizing away the :, just take the 2nd character in the string. If it is equal to "replacechar", replace it with a colon. Since this app is for your own use, such a solution should be perfectly sufficient.
using System;
using System.IO;
using System.Linq;
using System.Text;
public class Program
{
public static void Main()
{
try
{
var badString = "ABC\\DEF/GHI<JKL>MNO:PQR\"STU\tVWX|YZA*BCD?EFG";
Console.WriteLine(badString);
Console.WriteLine(SanitizeFileName(badString, '.'));
Console.WriteLine(SanitizeFileName(badString));
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
}
}
private static string SanitizeFileName(string fileName, char? replacement = null)
{
if (fileName == null) { return null; }
if (fileName.Length == 0) { return ""; }
var sb = new StringBuilder();
var badChars = Path.GetInvalidFileNameChars().ToList();
foreach (var #char in fileName)
{
if (badChars.Contains(#char))
{
if (replacement.HasValue)
{
sb.Append(replacement.Value);
}
continue;
}
sb.Append(#char);
}
return sb.ToString();
}
}
Based #fiat's and #Andre's approach, I'd like to share my solution too.
Main difference:
its an extension method
regex is compiled at first use to save some time with a lot executions
reserved words are preserved
public static class StringPathExtensions
{
private static Regex _invalidPathPartsRegex;
static StringPathExtensions()
{
var invalidReg = System.Text.RegularExpressions.Regex.Escape(new string(Path.GetInvalidFileNameChars()));
_invalidPathPartsRegex = new Regex($"(?<reserved>^(CON|PRN|AUX|CLOCK\\$|NUL|COM0|COM1|COM2|COM3|COM4|COM5|COM6|COM7|COM8|COM9|LPT0|LPT1|LPT2|LPT3|LPT4|LPT5|LPT6|LPT7|LPT8|LPT9))|(?<invalid>[{invalidReg}:]+|\\.$)", RegexOptions.Compiled);
}
public static string SanitizeFileName(this string path)
{
return _invalidPathPartsRegex.Replace(path, m =>
{
if (!string.IsNullOrWhiteSpace(m.Groups["reserved"].Value))
return string.Concat("_", m.Groups["reserved"].Value);
return "_";
});
}
}

Categories

Resources