I'm downloading files from the Internet inside my application. I'm dealing with multiple file types, so I need to be able to detect the file type before my application can continue. The problem I ran into is that some of the URLs the files are downloaded from contain extra parameters.
For example:
http://www.myfaketestsite.com/myaudio.mp3?id=20
Originally I was using String.EndsWith(), which obviously no longer works once a query string is appended. Any idea how to detect the file type?
Wrap the URL in a Uri class. It will split it up into different segments that you can use, or you can use the helper methods on the Uri class itself:
var uri = new Uri("http://www.myfaketestsite.com/myaudio.mp3?id=20");
string path = uri.GetLeftPart(UriPartial.Path);
// path = "http://www.myfaketestsite.com/myaudio.mp3"
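If the goal is the file type rather than the clean URL, a minimal sketch (assuming the extension in the URL path is a good enough hint, which not every server guarantees) combines Uri.AbsolutePath, which excludes both the query string and the fragment, with Path.GetExtension. The helper name GetExtensionFromUrl is hypothetical:

```csharp
using System;
using System.IO;

public static class UrlFileType
{
    // Hypothetical helper: returns the extension of the URL's path,
    // ignoring any query string (?id=20) or fragment (#...).
    public static string GetExtensionFromUrl(string url)
    {
        var uri = new Uri(url);
        // AbsolutePath holds only the path part, e.g. "/myaudio.mp3".
        return Path.GetExtension(uri.AbsolutePath);
    }
}
```

The result (".mp3" for the example URL) can then be compared with string.Equals(ext, ".mp3", StringComparison.OrdinalIgnoreCase).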
Your question is a duplicate of:
Truncating Query String & Returning Clean URL C# ASP.net
Get url without querystring
You could always split on the question mark to remove the parameters, for example:
string s = "http://www.myfaketestsite.com/myaudio.mp3?id=20";
string withoutQueryString = s.Split('?')[0];
If no question mark exists, it won't matter, as you'll still be grabbing the value from the zero index. You can then do your logic on the withoutQueryString string.
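A URL can also end with a fragment (#...), which breaks EndsWith in the same way. A small variation of the Split approach, sketched here with a hypothetical helper name, strips both before the extension is checked:

```csharp
using System;

public static class UrlStrip
{
    // Removes the query string ("?...") and the fragment ("#...") from a URL string.
    public static string WithoutQueryOrFragment(string url)
    {
        // Split at the first '?' or '#'; element 0 is everything before it,
        // or the whole string when neither delimiter is present.
        return url.Split(new[] { '?', '#' }, 2)[0];
    }
}
```

After that, a check such as WithoutQueryOrFragment(s).EndsWith(".mp3", StringComparison.OrdinalIgnoreCase) works as before.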
This is the simple C# code I wrote:
string file = @"D:\test(2021/02/10).docx";
var fileName = Path.GetFileNameWithoutExtension(file);
Console.WriteLine(fileName);
I thought I would get the string "test(2021/02/10)", but I got this result: "10)".
How can I solve such a problem?
I wonder why you would want such behavior. On Windows, slashes are treated as separators between a directory and a subdirectory (or file).
So you basically cannot create such a file name.
And since slashes are treated as described, it is very natural that the method implementation just checks what comes after the last slash and extracts only the file name.
If you are interested in how the method is implemented, take a look at the source code.
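The behavior is easy to reproduce. The sketch below shows the platform-independent result and, assuming the string is only a label that is never used as a real path, how a hypothetical helper could split on '\' alone:

```csharp
using System;
using System.IO;

public static class NameDemo
{
    // Path.GetFileNameWithoutExtension treats '/' as a directory separator
    // (on Windows both '\' and '/' are separators), so it sees "10).docx"
    // as the file name and then strips ".docx".
    public static string ViaPathApi(string s) => Path.GetFileNameWithoutExtension(s);

    // Hypothetical alternative for a label that is not a real path:
    // take the text after the last '\' and drop everything from the last '.'.
    public static string AfterLastBackslash(string s)
    {
        string name = s.Substring(s.LastIndexOf('\\') + 1);
        int dot = name.LastIndexOf('.');
        return dot > 0 ? name.Substring(0, dot) : name;
    }
}
```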
I have a problem.
I save JSON from the web in JSON files on my computer, and the name of each JSON file is the web address of the JSON.
To do that, I read the web JSON into a string and then append it to a file with File.AppendAllText(path, content).
Later, I also need to read the JSON back from this file with File.ReadAllText(path).
My problem is that sometimes two JSON documents have very similar names, for example:
*com/doc/BACr and
*com/doc/BAcr
The problem is that the paths given to the methods of the File class are not case sensitive, so I end up writing twice to the same file, corrupting it.
I've found solutions online for the same problem with File.Exists(path), but nothing to replace the methods I use to read or write.
Does anyone know a setting, or another method, that would treat the path as case sensitive?
Thank you
Edit: I'm obviously working on Windows :(
Edit 2: I can't change the file name, because some other JSON files reference the web path, and when I replay my local JSON files, a modified file name won't be found. That's why I need to both write and read with a case-sensitive path.
You need something that makes your files unique and in the same time something that allows you to rebuild this uniqueness when you want to read back these files.
Suppose that your two files are named "BAcr" and "BACr". If you get the hash codes of these two strings, you will get two different values:
string file1 = "BAcr";
int file1Hash = file1.GetHashCode(); //742971449
string file2 = "BACr";
int file2Hash = file2.GetHashCode(); //-681949991
Now if you append this hash code to your file name, you get two different files, and you can recalculate the same hash code for the same input file name:
string newFile1 = $"{file1}.{file1Hash}";
string newFile2 = $"{file2}.{file2Hash}";
You save your data in these two computed file names, and when you need to reload them, you use the same trick to get the file name used to save the data, starting from the same input "BAcr" or "BACr".
But string.GetHashCode doesn't guarantee unique results (and on .NET Core and later it is randomized per process, so the value also changes between runs). So, still using the same general idea, Jeroen Mostert uses this method to get a unique code from the input value:
string unique1 = string.Join("", file1.Select(c => char.IsUpper(c) ? "1" : "0")); // Select requires using System.Linq;
string newFileName1 = $"{file1}.{unique1}";
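Put together, the case-mask idea fits in one deterministic function that yields the same name on every run (unlike string.GetHashCode). The class and method names are hypothetical, and the input is assumed to already be a valid file name; a URL containing '/' would still need escaping:

```csharp
using System;
using System.Linq;

public static class CaseSafeNames
{
    // Appends a mask with one digit per character: '1' for an upper-case letter,
    // '0' otherwise. Names that differ only by case get different masks, so the
    // resulting file names differ even on a case-insensitive file system.
    public static string ToCaseSafeName(string name)
    {
        string mask = string.Join("", name.Select(c => char.IsUpper(c) ? "1" : "0"));
        return $"{name}.{mask}";
    }
}
```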
Windows paths are indeed case insensitive, so you cannot have both of these file names in the same folder.
One solution would be to change the file name if it already exists.
For example:
if (File.Exists(fileNameToSaveTo)){
// Note: Your example file names did not have an extension,
// but if they do, you will need to first extract that then add it back on
fileNameToSaveTo = fileNameToSaveTo + "1";
}
If you use this solution, you also have to update whatever identifier your program uses to read the file back later. Since you haven't posted any code, I can't guess what form this takes, but hopefully you get the idea.
Edit:
Upon re-reading your question, it appears you use AppendAllText. In that case, the file should not be 'corrupted' as you suggest; the contents should simply be added to the end of the file. Is this not what you observe?
Edit 2:
After reading the comments (Iomed): you could use Convert.ToBase64String on the file name in your write function before writing the file, and encode the name the same way in your read function before reading the file (Convert.FromBase64String is only needed if you want to recover the original name from a file name). This allows the file names to differ based on capitalization.
Another alternative would be to parse the JSON (both the new one AND the existing file), add the objects to an array, and write that to the file instead, avoiding your 'corruption' issue.
Given the paths PathA and patha for two files, use the Base64 trick:
string PathToFile(string url) => System.Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(url));
So:
Console.WriteLine(PathToFile("pathA")); //cGF0aEE=
Console.WriteLine(PathToFile("patha")); //cGF0aGE=
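One caveat: the standard Base64 alphabet includes '/', which Windows treats as a directory separator, and Base64 itself relies on letter case, so two different URLs could in principle map to names that differ only by case. A hex encoding avoids both problems at the cost of longer names (two characters per byte, which can hit file-name length limits for long URLs). A sketch with hypothetical method names:

```csharp
using System;
using System.Text;

public static class HexFileNames
{
    // Encodes the URL's UTF-8 bytes as upper-case hexadecimal. The output uses
    // only 0-9 and A-F, so it contains no '/' and two different inputs can never
    // produce names that differ only by letter case.
    public static string PathToFile(string url) =>
        BitConverter.ToString(Encoding.UTF8.GetBytes(url)).Replace("-", "");

    // Reverses the encoding when the original URL is needed again.
    public static string FileToPath(string fileName)
    {
        var bytes = new byte[fileName.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(fileName.Substring(i * 2, 2), 16);
        return Encoding.UTF8.GetString(bytes);
    }
}
```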
This question already has answers here:
How to check whether a string is a valid HTTP URL?
Closed 8 years ago.
I am trying to filter invalid URLs out of a list of valid ones using .NET.
I am using the Uri.TryCreate() method for this.
It has several overloads; the one I am calling has the following signature:
public static bool TryCreate(string uriString, UriKind uriKind, out Uri result)
Now I am doing this:
Uri uri = null;
var domainList = new List<string>();
domainList.Add("asas");
domainList.Add("www.stackoverflow.com");
domainList.Add("www.codera.org");
domainList.Add("www.joker.testtest");
domainList.Add("about.me");
domainList.Add("www.ma.tt");
var correctList = new List<string>();
foreach (var item in domainList)
{
if(Uri.TryCreate(item, UriKind.RelativeOrAbsolute, out uri))
{
correctList.Add(item);
}
}
When I run the above code, I expect it to remove asas and www.joker.testtest from the list, but it doesn't.
Can someone help me out with this?
Update:
I just tried Uri.IsWellFormedUriString; this didn't help either.
Another update:
List of valid URIs:
http://www.ggogle.com
www.abc.com
www.aa.org
www.aas.co
www.hhh.net
www.ma.tt
List of invalid URIs:
asas
as##SAd
this.not.valid
www.asa.toptoptop
You seem to be confused about what exactly a URL (or URI; the difference is not significant here) is. For example, http://stackoverflow.com is a valid absolute URL. On the other hand, stackoverflow.com is technically a valid relative URL, but it would refer to the file named stackoverflow.com in the current directory, not the website with that name. But stackoverflow.com is a registered domain name.
If you want to check whether a domain name is valid, you need to define what exactly you mean by “valid”:
Is it a valid domain name? Check whether the string consists of parts separated by dots, where each part can contain letters, digits and hyphens (-). For example, asas and this.not.valid are both valid domain names.
Could it be an Internet domain name? Domain names on the Internet (as opposed to an intranet) always have a TLD (top-level domain). So asas certainly isn't an Internet domain name, but this.not.valid could be.
Is it a domain name under an existing TLD? You can download the list of all TLDs and check against it. For example, this.not.valid wouldn't be considered valid under this rule, but thisisnotvalid.com would.
Is it a registered domain name?
Does the domain name resolve to an IP address? A domain name could be registered, but it still may not have an IP address in its DNS record.
Does the computer the domain name points to respond to requests? The requests that make the most sense are a simple HTTP request (e.g. trying to access http://domaininquestion/) or ping.
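The first three checks need no network access. A minimal sketch, assuming the usual host-name rules (labels of 1 to 63 characters that do not start or end with a hyphen, at most 253 characters overall) and a TLD list you download yourself; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainChecks
{
    // "Valid domain name": labels separated by dots, each made of letters,
    // digits and hyphens, 1-63 characters, not starting or ending with '-'.
    public static bool IsValidDomainName(string name)
    {
        if (string.IsNullOrEmpty(name) || name.Length > 253) return false;
        return name.Split('.').All(IsValidLabel);
    }

    private static bool IsValidLabel(string label) =>
        label.Length >= 1 && label.Length <= 63 &&
        label[0] != '-' && label[label.Length - 1] != '-' &&
        label.All(c => (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9') || c == '-');

    // "Could be an Internet domain name": also needs a TLD, i.e. at least two labels.
    public static bool IsInternetDomainName(string name) =>
        IsValidDomainName(name) && name.Contains('.');

    // "Under an existing TLD": the last label must be in a TLD list that the
    // caller has downloaded (passed in here; this code does not fetch it).
    public static bool IsUnderKnownTld(string name, ISet<string> knownTlds) =>
        IsInternetDomainName(name) &&
        knownTlds.Contains(name.Substring(name.LastIndexOf('.') + 1));
}
```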
Try this one:
public static bool IsWellFormedUriString(
string uriString,
UriKind uriKind
)
Alternatively, you can do this with a regular expression like:
^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$
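Applied with Regex.IsMatch, the pattern behaves as sketched below (the wrapper class is hypothetical). Note what it implies: it requires the http:// prefix, so a bare www.abc.com does not match, https is not accepted, and TLDs longer than three letters (such as .info) are rejected:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HttpUrlPattern
{
    // The pattern above: "http://", a host made of letters, digits, '-' and '.',
    // a final dot plus a 2-3 letter TLD, and an optional path.
    private static readonly Regex Pattern =
        new Regex(@"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$");

    public static bool Matches(string url) => Pattern.IsMatch(url);
}
```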
Take a look at this list.
The problem is that none of the URLs you have added here qualify as absolute URLs. For that, you have to prefix them with the protocol (scheme).
You can test this and find that:
www.stackoverflow.com - Relative URL
http://www.stackoverflow.com - Absolute URL
//www.stackoverflow.com - Absolute URL in .NET's Uri class (strictly, RFC 3986: "Uniform Resource Identifier (URI): Generic Syntax", Section 4.2, classifies this form as a network-path reference, a relative reference that takes only its scheme from the base URI)
The point is that you have to prefix at least // to show that it's an absolute URL.
So, in a nutshell, since all your URLs are relative URLs, they pass all your tests.
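The question's loop passes UriKind.RelativeOrAbsolute, which accepts every relative reference. A minimal sketch of a stricter check, assuming only http and https URLs should count as valid, requests UriKind.Absolute and then checks the scheme. It is still purely syntactic: http://www.joker.testtest passes because nothing verifies that the TLD exists. The class name is hypothetical:

```csharp
using System;

public static class AbsoluteUrlCheck
{
    // Accepts only absolute http/https URLs. Inputs without a scheme, such as
    // "www.stackoverflow.com", are rejected because they are relative references.
    public static bool IsAbsoluteHttpUrl(string candidate) =>
        Uri.TryCreate(candidate, UriKind.Absolute, out var uri) &&
        (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```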
All your examples are valid: some are absolute URLs and some are relative, which is why none are getting removed.
Otherwise, for each Uri you could try constructing an HttpWebRequest and then check for a valid response.
After reading the other answers, I understand that you are not looking to check whether the domains exist or respond to ping; you need to test them against the grammar (syntax) of domain names, right?
For that you need to rely on a regex test alone: write a proper rule to evaluate each domain name, and exclude the ones that fail from the list.
You can adapt these patterns, modify one to suit your needs, and then test it against every element in the list.
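As one example of such a rule, the sketch below filters a list. The rule itself is an assumption chosen to separate the question's two lists: an optional http:// or https://, one or more labels of letters, digits and inner hyphens each followed by a dot, and a 2-3 letter TLD. Real TLDs such as .info or .museum would be rejected, so adjust it to your needs. The class name is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DomainListFilter
{
    // Assumed rule: optional scheme, labels that start and end with a letter or
    // digit (hyphens allowed inside), each followed by '.', then a 2-3 letter TLD.
    private static readonly Regex Rule = new Regex(
        @"^(https?://)?([A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?\.)+[A-Za-z]{2,3}$");

    // Keeps only the items that match the rule, preserving their order.
    public static List<string> KeepValid(IEnumerable<string> items) =>
        items.Where(item => Rule.IsMatch(item)).ToList();
}
```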
All of your URIs are well-formed URIs, so TryCreate and IsWellFormedUriString will not work in your case.
From here, the solution is to try opening the URI:
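// MyClient: a custom WebClient subclass that sends HEAD requests when HeadOnly is true.
using (var client = new MyClient()) {
    client.HeadOnly = true;
    // fine, no content downloaded
    string s1 = client.DownloadString("http://www.stackoverflow.com");
    // throws a WebException (the host name cannot be resolved)
    string s2 = client.DownloadString("http://www.joker.testtest");
}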
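The MyClient class is not defined in the answer. A common way to write such a class (an assumption here, not necessarily what the answerer used) is to subclass WebClient and switch the request method to HEAD. The Responds helper and the 10-second timeout are also hypothetical additions; keep in mind that a missing response does not prove a name is syntactically invalid, only that nothing answered:

```csharp
using System;
using System.Net;

// Hypothetical MyClient: a WebClient that sends HEAD instead of GET when
// HeadOnly is true, so no response body is downloaded.
// (WebClient is marked obsolete in .NET 6+, which only produces a warning.)
public class MyClient : WebClient
{
    public bool HeadOnly { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (HeadOnly && request.Method == "GET")
            request.Method = "HEAD";
        request.Timeout = 10000; // milliseconds, so unreachable hosts fail reasonably fast
        return request;
    }

    // Hypothetical convenience wrapper: true if the URL answers without an error.
    public static bool Responds(string url)
    {
        using (var client = new MyClient { HeadOnly = true })
        {
            try
            {
                client.DownloadString(url);
                return true;
            }
            catch (WebException)
            {
                return false; // DNS failure, timeout, 404, etc.
            }
        }
    }
}
```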
In my application I build a static string when a user uploads or downloads a file. The file name in that string is passed from the frontend. This way, a user could pass something like ..\..\another file.file to tamper with the path and get data from other users. Therefore I need to filter the file name I receive to prevent this. Which characters need to be filtered to prevent tampering? Right now I filter the double dot and the back and forward slashes. Is there anything else I should take into consideration? Is there a standard way to do this in C#?
I would suggest using Path.GetInvalidFileNameChars:
public static bool IsValidFileName(string fileName)
{
return fileName.IndexOfAny(Path.GetInvalidFileNameChars()) == -1;
}
.. is typically only dangerous when preceded and/or followed by a \ or /. On Windows, both are included in the array returned by GetInvalidFileNameChars (on Linux and macOS the array contains only / and the null character, so \ is an ordinary file-name character there). By itself, .. is harmless (unless you’re specifically resolving directory paths), and you shouldn’t forbid it, since people might want to use ellipses in their file names (e.g. The A...Z of Programming.pdf).
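Character filtering can be paired with a second, independent check that is commonly used: resolve the final path and confirm it is still inside the upload folder. This also covers platform differences such as the one with '\' on Linux. A sketch with hypothetical names, assuming all uploads live under a single base directory:

```csharp
using System;
using System.IO;

public static class SafePaths
{
    // Same idea as above: reject names containing characters that are
    // invalid in a file name on the current platform.
    public static bool IsValidFileName(string fileName) =>
        !string.IsNullOrEmpty(fileName) &&
        fileName.IndexOfAny(Path.GetInvalidFileNameChars()) == -1;

    // Resolves fileName against baseDirectory and succeeds only if the final,
    // normalized path is still inside baseDirectory.
    public static bool TryResolveUnder(string baseDirectory, string fileName, out string fullPath)
    {
        string root = Path.GetFullPath(baseDirectory);
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
            root += Path.DirectorySeparatorChar;

        // Path.Combine returns fileName unchanged if it is rooted (e.g. "/etc/x"),
        // and GetFullPath collapses "..", so both escapes are caught below.
        string candidate = Path.GetFullPath(Path.Combine(root, fileName));
        bool inside = candidate.StartsWith(root, StringComparison.Ordinal);
        fullPath = inside ? candidate : string.Empty;
        return inside;
    }
}
```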
What if different users save a file with the same name? Are you creating a folder for each user?
Most likely what you should be doing is storing the name they provide in a database record, which also contains a pointer to the actual file (which uses a file name which you generate, perhaps a guid). You could also consider using the filestream data type if you'd like to save the document in the database as well.
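That approach can be sketched as a small type that keeps the user-supplied name purely as data and generates the on-disk name itself. The class and its members are hypothetical, and persisting it (a database table, a FILESTREAM column) is left out:

```csharp
using System;

// Hypothetical record of an upload: the name the user supplied is kept only as
// data (e.g. a database column); the file on disk gets a generated name.
public sealed class StoredFile
{
    public string OriginalName { get; }
    public string StoredName { get; }

    private StoredFile(string originalName, string storedName)
    {
        OriginalName = originalName;
        StoredName = storedName;
    }

    // The generated name never contains user input, so it cannot contain
    // "..", slashes or other path tricks. "N" gives 32 hex digits, no dashes.
    public static StoredFile Create(string originalName) =>
        new StoredFile(originalName, Guid.NewGuid().ToString("N"));
}
```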
Nothing good can come from letting your users determine file names on your server :)
I have a variable in code that can hold either a file path or a URL. Examples:
http://someDomain/someFile.dat
file://c:\files\someFile.dat
c:\files\someFile.dat
So there are two ways to represent a file and I can't ignore any of them.
What is the correct name for such a variable: path, url, location?
I'm using a third-party API, so I can't change the semantics or split this into more variables.
The first two are URLs; the third is a file path. Of course, a file:/// URL also refers only to a file.
When using the Uri class, you can use the IsFile and LocalPath properties to handle file:/// URIs, and in that case you should name the variable accordingly.
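A minimal sketch of that approach (the helper name is hypothetical). On Windows, Uri also accepts a plain path such as c:\files\someFile.dat as an absolute file URI, so the same branch handles it; on other platforms that form may not parse the same way:

```csharp
using System;

public static class LocationInfo
{
    // Returns true and the local file-system path when the value is a file URI;
    // returns false for other schemes such as http.
    public static bool TryGetLocalPath(string location, out string localPath)
    {
        if (Uri.TryCreate(location, UriKind.Absolute, out var uri) && uri.IsFile)
        {
            localPath = uri.LocalPath;
            return true;
        }
        localPath = string.Empty;
        return false;
    }
}
```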
Personally, I'd call the variable in question "fileName".
In fact, a formal URL would be file:///c:/files/someFile.dat (file:///c|/files/someFile.dat is an older, legacy form of the same thing).
URLs like these always start with protocol:// followed by the path and names, with '/' as the separator.
Evil Windows IE sometimes uses '\' instead of '/', but the formal usage is '/'.
Start by picking one representation to use internally. If you need to support URLs, use URLs internally everywhere, and have any method that sets the variable check whether it received a file path and coerce it to a URL immediately.
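Coercing at the boundary could look like this minimal sketch; the method name is hypothetical, and relative paths are assumed to be relative to the current directory:

```csharp
using System;
using System.IO;

public static class LocationNormalizer
{
    // Keeps absolute URIs (http://..., file:///...) as they are and turns
    // anything else into an absolute file:// URI via the full file-system path.
    public static Uri ToLocationUri(string value)
    {
        if (Uri.TryCreate(value, UriKind.Absolute, out var uri))
            return uri;
        return new Uri(Path.GetFullPath(value));
    }
}
```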
If the values are not opaque to your application you may find it better to model them as a class. Otherwise, whenever you are going to act upon the values you may find yourself writing code like this:
if (variable.StartsWith("http://") || variable.StartsWith("file://")) {
// Handle url
}
else {
// Handle file path
}
You may fold some of the functionality for handling the values into your class, but it is probably better to treat it as an immutable value type.
Use a descriptive name for your class like FileLocation or whatever fits your nomenclature. It will then be very natural to declare FileLocation variables named fileLocation or inputFileLocation or even fl if you are sloppy.
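A FileLocation along those lines might look like the sketch below. Everything in it is hypothetical: it wraps a Uri, treats anything that is not an absolute URI as a file path, and gets value semantics from IEquatable, so the StartsWith branching above becomes a check of IsLocalFile:

```csharp
using System;
using System.IO;

// Hypothetical immutable value type for "a file path or a URL".
public sealed class FileLocation : IEquatable<FileLocation>
{
    public Uri Uri { get; }

    private FileLocation(Uri uri) => Uri = uri;

    // Absolute URIs are used as-is; anything else is treated as a file path.
    public static FileLocation Parse(string value) =>
        new FileLocation(Uri.TryCreate(value, UriKind.Absolute, out var uri)
            ? uri
            : new Uri(Path.GetFullPath(value)));

    public bool IsLocalFile => Uri.IsFile;

    public bool Equals(FileLocation other) => other != null && Uri.Equals(other.Uri);
    public override bool Equals(object obj) => Equals(obj as FileLocation);
    public override int GetHashCode() => Uri.GetHashCode();
    public override string ToString() => Uri.ToString();
}
```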
If the path you are using includes the protocol "file://", then it is in fact a URL.