I have this code that takes a base-64 string and creates bytes; next I create a file name for those bytes:
byte[] bytes = System.Convert.FromBase64String(landingCells.imageBytes);
var filePath = landingCells.jobNo + DateTime.Now.ToString("yyyyMMddHHmmssffffff");
Next, I save these bytes:
System.IO.File.WriteAllBytes("C:/app/Images/" + filePath + ".jpg", bytes);
The problem I am having is that I am calling these lines of code in a loop from an iOS app, and sometimes the yyyyMMddHHmmssffffff portion is the same as the previous item in the loop. My question: how can I make the file names more unique so this does not happen?
Try this, using Guid.NewGuid():
var uniqueCode = Guid.NewGuid();
var filePath = landingCells.jobNo + DateTime.Now.ToString("yyyyMMddHHmmssffffff") + uniqueCode;
Using a date-based name limits your file creation rate to the resolution of the system clock (and is also not thread-safe), which is why you see duplicate file names when iterations of your loop complete too quickly. You have several options for making the names more unique, depending on your requirements:
Add an incrementing counter suffix to the file name when the date is the same as the date of the last file written
Incorporate a GUID into the file name (a sketch follows this list). This will be less readable than the counter suffix but will guarantee uniqueness even across a distributed system and won't require you to maintain a counter.
Incorporate some other original information about the file or its metadata into the name that when combined with the date will be unique
Come up with some custom name generation algorithm that will generate unique names for every (even repeated) input. How you do this depends on the domain you're working within and the data you're dealing with.
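As an illustration, here is a minimal sketch of the GUID option, reusing the bytes and landingCells fields from the question; keeping a timestamp is optional and purely for readability:
// Sketch only: landingCells.jobNo and bytes come from the question's code,
// everything else is standard .NET.
string fileName = landingCells.jobNo
    + DateTime.Now.ToString("yyyyMMddHHmmss")   // optional, kept for readability
    + "_" + Guid.NewGuid().ToString("N")        // 32 hex chars guarantee uniqueness
    + ".jpg";
System.IO.File.WriteAllBytes(System.IO.Path.Combine(@"C:\app\Images", fileName), bytes);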
I'm not sure what kind of app you're building, but it's worth reevaluating whether you actually need to write that many images to disk per second and, if you do, whether a video would be a better fit. Throttling the writes would not be a bad idea either, and it would also solve the naming problem.
Related
I have 12 media files containing short music clips. The files need to be distinguished somehow as having either the same content throughout (I mean the file content from beginning to end) or different content.
File names are:
a1_same.wav // from beginning to end it contains the same content
a2_diff.wav // from beginning to end it contains the different content
a3_diff.wav
a4_diff.wav
a5_same.wav
... and so on, up to 12.
Now I read all these files and check each file name to determine whether the contents are the same or different:
// just pseudocode - syntax may be wrong
foreach (var file in abcCollection)
{
    if (file.FilePath.Contains("same")) // note: the file names use lowercase "same"
    {
        // same content throughout
    }
    else
    {
        // different content
    }
}
But I am not satisfied with this kind of check (matching the file name string for "same" or "diff").
Is there any other way to do this? I mean keeping, say, a primary key in memory, or maintaining some in-memory dictionary or list, etc. Honestly, I do not have any clue :-(
If you have any idea then please share.
You could use a hashing function such as MD5 to quickly find out whether the files' physical contents are the same.
A hashing function takes a piece of input data (the file contents) and runs it through a repeatable algorithm that always returns the same value for the same input data, but returns a different value if the input data differs in any way.
This technique is commonly used by download sites and content distributors to help the downloader verify that a file has not been corrupted or tampered with, as they can compare the hash value of the received file against the published hash value provided by the file host.
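As a minimal sketch (the file names are just the examples from the question), hashing and comparing two files might look like this:
using System;
using System.IO;
using System.Security.Cryptography;

static string ComputeMd5(string path)
{
    using (var md5 = MD5.Create())
    using (var stream = File.OpenRead(path))
    {
        // Hash the raw bytes; identical bytes always produce an identical hash.
        return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
    }
}

// Usage: compare the hex strings instead of the files themselves.
bool sameContent = ComputeMd5("a1_same.wav") == ComputeMd5("a5_same.wav");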
EDIT: Note that this relies on the files being binary equal; it is not an audio comparison. It will not work for files that contain the same audio clip but have different amounts of silent lead-in or lead-out at the start and end, different bit rates, or different metadata (MP3 tags etc.) in the file.
MD5 - Wikipedia, the free encyclopedia
Recently I got the exception:
Message:
System.IO.IOException: The file 'C:\Windows\TEMP\635568456627146499.xlsx' already exists.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
This was the result of the following code I used for generating file names:
Path.Combine(Path.GetTempPath(), DateTime.Now.Ticks + ".xlsx");
After realising that it is possible to create two files in one Tick, I changed the code to:
Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".xlsx");
But I am still wondering what is the probability of the above exception in the new case?
Internally, GetRandomFileName uses RNGCryptoServiceProvider to generate an 11-character (name: 8 + ext: 3) string. The string represents a base-32 encoded number, so the total number of possible strings is 32^11, or 2^55.
Assuming uniform distribution, the chances of making a duplicate are about 2^-55, or 1 in 36 quadrillion. That's pretty low: for comparison, your chances of winning NY lotto are roughly one million times higher.
The probability of getting duplicate names with GetRandomFileName is really low, but if you look at its source, you can see that it doesn't check whether the name is a duplicate (it can't, because it cannot know the path where the file will be created).
Path.GetTempFileName, by contrast, returns a unique file name inside the Temp directory (which also removes the need to build the temp path in your code).
GetTempFileName uses the Win32 API GetTempFileName, requesting the creation of a unique file name.
The Win32 API creates the file with zero length and releases the handle, so you don't fall into concurrency scenarios. Better to use this one.
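A minimal sketch of that approach; note the rename to .xlsx is my own addition and that name is not itself reserved, so it reintroduces a tiny race:
using System.IO;

// GetTempFileName reserves the name by creating a zero-byte file,
// so two concurrent callers can never receive the same path.
string tempFile = Path.GetTempFileName();
// Assumption: swapping the extension afterwards; the .xlsx name is not reserved.
string xlsxPath = Path.ChangeExtension(tempFile, ".xlsx");
File.Move(tempFile, xlsxPath);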
GetRandomFileName() returns an 8.3-format string, which is 11 characters that can vary. Assuming it contains only letters and digits, this gives us an "alphabet" of 36 characters, so the number of variations is 36^11, which makes the probability of the above exception extremely low.
I would like to put my answer in the comment area rather than here, but I don't have enough reputation to add a comment.
For your first snippet, I think you can pre-check whether the file exists.
For the second one, the code generates a random name, but random means you still have a teeny-tiny possibility of getting the exception... I don't think you need to worry about this, though. An existence check will help.
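A sketch of that pre-check, assuming single-process access (File.Exists is still racy across processes):
using System.IO;

string path;
do
{
    // Re-roll the random name until it does not collide with an existing file.
    path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".xlsx");
} while (File.Exists(path));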
Basically I need to generate a unique number, but I don't want it to be too long, like a UUID. Something half that size (if not smaller).
Can anyone think of any ways to do this?
Basically I'm going to have an app which might be in use by multiple people and the app generates files and uploads them to the web server. Those names need to be unique.
I'm not looking to use a database table to keep track of this stuff, by the way.
Generate a UUID and take only the first half of the string.
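For example, a rough sketch in C#; note that truncating halves the entropy, so collisions become correspondingly more likely:
using System;

// "N" format = 32 hex chars without dashes; keep the first 16.
string shortId = Guid.NewGuid().ToString("N").Substring(0, 16);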
If you're concerned about generating duplicate IDs, your options are to make them non-random and auto-increment, or to check for the existence of newly generated IDs:
do {
newId = generateNewId();
} while (idExists(newId));
If you need it unique and short, go with a UUID and use a URL shortener.
Piqued my curiosity:
// create a 32-bit uid from 4 bytes at a random offset within the GUID:
var i = BitConverter.ToUInt32(Guid.NewGuid().ToByteArray(), (new Random()).Next(0, 12));
// create a 64-bit uid from 8 bytes at a random offset within the GUID:
var l = BitConverter.ToUInt64(Guid.NewGuid().ToByteArray(), (new Random()).Next(0, 8));
Of course, the following may be equally applicable, because you lose most of the features of a GUID when you truncate it and might as well resort to a random number:
l = BitConverter.ToUInt64(BitConverter.GetBytes((new Random()).NextDouble()), 0);
... if you're looking for a 64-bit integer.
I am writing a program to diff and copy entire files or segments based on changes on either end (rsync-esque... but more like Unison). The main idea is to keep my music folder (all MP3s) up to date across multiple locations.
I'd like to send segmented updates if only small portions of the file have changed, as opposed to copying the entire file. For this, I need a way to diff segments of the file.
I initially tried generating hashes for blocks of every file (every n bytes I'd hash a segment). I noticed that when I changed one attribute (the ID3v2 tag on an MP3), all the hashed blocks changed. This makes sense, as I'd guess the header grows as it acquires new information.
This leads me to my actual question. I would like to know how to determine the length of an mp3's header, so I could create 2 comparable hashes.
1) The meta info of the file (the header)
2) The actual MPEG stream with audio (this hash should remain unchanged if all I do is alter tag info)
Am I missing anything else?
Thanks!
Ty
If all you want to check the length of is the ID3v2 tag, you can find information about its structure at http://www.id3.org/id3v2.4.0-structure.
If you read the first 3 bytes and they equal "ID3", skip to the 7th byte and read the header size from there. Be careful, though, because the size is stored as a "synchsafe integer".
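A minimal sketch of that, assuming the ID3v2 tag (if any) sits at the very start of the file:
using System.IO;

static int GetId3v2TagSize(string path)
{
    using (var fs = File.OpenRead(path))
    {
        var header = new byte[10];
        if (fs.Read(header, 0, 10) < 10 ||
            header[0] != 'I' || header[1] != 'D' || header[2] != '3')
            return 0; // no ID3v2 tag present

        // Bytes 6-9 hold the tag size as a synchsafe integer:
        // four bytes with 7 significant bits each (the high bit is always 0).
        int size = (header[6] << 21) | (header[7] << 14) |
                   (header[8] << 7) | header[9];
        return 10 + size; // 10-byte header + tag body = bytes to skip
    }
}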
If you want to determine the header information, you'll either:
a) need to use an MP3 library that can do the parsing for you, or
b) go to the MP3 specification and parse it out as needed.
I wound up using TagLibSharp. developer.novell.com/wiki/index.php/TagLib_Sharp
I am trying to compare two large datasets from a SQL query. Right now the SQL query is run externally, and the results from each dataset are saved into their own CSV files. My little C# console application loads the two text/CSV files, compares them for differences, and saves the differences to a text file.
It's a very simple application that just loads all the data from the first file into an ArrayList and does a .compare() against the ArrayList as each line is read from the second CSV file, then saves the records that don't match.
The application works, but I would like to improve the performance. I figure I can greatly improve performance if I take advantage of the fact that both files are sorted, but I don't know of a data type in C# that keeps order and lets me select a specific position. There's a basic array, but I don't know how many items are going to be in each list; I could have over a million records. Is there a data type available that I should be looking at?
If the data in both of your CSV files is already sorted and the files have the same number of records, you could skip the data structure entirely and do the analysis in place:
StreamReader one = new StreamReader(@"C:\file1.csv");
StreamReader two = new StreamReader(@"C:\file2.csv");
string lineOne;
string lineTwo;
StreamWriter differences = new StreamWriter("Output.csv");
while (!one.EndOfStream)
{
    lineOne = one.ReadLine();
    lineTwo = two.ReadLine();
    // do your comparison here
    bool areDifferent = lineOne != lineTwo;
    if (areDifferent)
        differences.WriteLine(lineOne + lineTwo);
}
one.Close();
two.Close();
differences.Close();
System.Collections.Specialized.StringCollection allows you to add a range of values and, via the .IndexOf(string) method, lets you retrieve the index of a given item.
That being said, you could likely just load up a couple of byte[] from a FileStream and do a byte comparison; don't even worry about loading that stuff into a formal data structure like StringCollection or string[]. If all you're doing is checking for differences and you want speed, I would reckon byte differences are where it's at.
This is an adaptation of David Sokol's code to work with varying numbers of lines, outputting the lines that are in one file but not the other:
StreamReader one = new StreamReader(@"C:\file1.csv");
StreamReader two = new StreamReader(@"C:\file2.csv");
StreamWriter differences = new StreamWriter("Output.csv");
string lineOne = one.ReadLine();
string lineTwo = two.ReadLine();
while (lineOne != null || lineTwo != null)
{
    if (lineOne == lineTwo)
    {
        // lines match, read the next line from each and continue
        lineOne = one.ReadLine();
        lineTwo = two.ReadLine();
    }
    else if (lineTwo == null ||
             (lineOne != null && string.CompareOrdinal(lineOne, lineTwo) < 0))
    {
        // lineOne sorts first (or file two is exhausted): only in file one
        differences.WriteLine(lineOne);
        lineOne = one.ReadLine();
    }
    else
    {
        // lineTwo sorts first (or file one is exhausted): only in file two
        differences.WriteLine(lineTwo);
        lineTwo = two.ReadLine();
    }
}
Standard caveat about code written off the top of my head applies: the end-of-file handling in particular deserves testing, but I think this basic approach should do what you're looking for.
Well, there are several approaches that would work. You could write your own data structure that does this, or you could try SortedList. You could also return DataSets in code and then use .Select() on the table; granted, you would have to do this on both tables.
You can easily use a SortedList to do fast lookups. If the data you are loading is already sorted, insertions into the SortedList should not be slow.
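For instance, a hypothetical sketch keyed on the first CSV field (the column choice is my assumption):
using System.Collections.Generic;
using System.IO;

var list = new SortedList<string, string>();
foreach (string line in File.ReadLines(@"C:\file1.csv"))
    list[line.Split(',')[0]] = line; // inserting already-sorted keys appends at the end

bool known = list.ContainsKey("12345"); // binary search over the sorted keys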
If you are simply looking to see whether all lines in FileA are included in FileB, you could read both in and just compare the streams inside a loop:
File 1
Entry1
Entry2
Entry3
File 2
Entry1
Entry3
You could loop through with two counters and find omissions, going line by line through each file, and see if you get what you need.
Maybe I misunderstand, but an ArrayList will maintain its elements in the order in which you added them. This means you can compare the two ArrayLists in a single pass: just increment the two scanning indices according to the comparison results.
One question I have: have you considered "outsourcing" your comparison? There are plenty of good diff tools that you could just call out to. I'd be surprised if there weren't one that lets you specify two files and get only the differences. Just a thought.
I think the reason everyone has so many different answers is that you haven't specified your problem well enough to be answered. First off, it depends what kind of differences you want to track. Do you want the differences output as in WinDiff, where the first file is the "original" and the second file is the "modified", so you can list changes as INSERT, UPDATE, or DELETE? Do you have a primary key that will allow you to match up two lines as different versions of the same record (when fields other than the primary key are different)? Or is this some sort of reconciliation where you just want your difference output to say something like "RECORD IN FILE 1 AND NOT FILE 2"?
I think the answers to these questions will help everyone give you a suitable answer to your problem.
If you have two files that are each a million lines, as mentioned in your post, you might be using up a lot of memory. Some of the performance problem might be that you are swapping to disk. If you are simply comparing line 1 of file A to line 1 of file B, line 2 of file A to line 2 of file B, etc., I would recommend a technique that does not store so much in memory. You could read and write off of two file streams, as a previous commenter posted, and write out your results "in real time" as you find them; this would not explicitly store anything in memory. You could also dump chunks of each file into memory, say a thousand lines at a time, into something like a List. This could be fine-tuned to meet your needs.
To resolve question #1, I'd recommend looking into creating a hash of each line. That way you can compare hashes quickly and easily using a dictionary.
To resolve question #2, one quick and dirty solution would be to use an IDictionary<string, string>, with itemId as the key and the rest of the line as the value. You can then quickly find whether an itemId exists and compare the lines. This of course assumes .NET 2.0+.
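A hypothetical sketch combining both ideas, assuming the first CSV field is the itemId:
using System.Collections.Generic;
using System.IO;

var fileA = new Dictionary<string, string>();
foreach (string line in File.ReadLines(@"C:\file1.csv"))
    fileA[line.Split(',')[0]] = line; // key on the itemId column

using (var differences = new StreamWriter("Output.csv"))
{
    foreach (string line in File.ReadLines(@"C:\file2.csv"))
    {
        string id = line.Split(',')[0];
        if (!fileA.TryGetValue(id, out string match) || match != line)
            differences.WriteLine(line); // missing from file1, or the fields differ
    }
}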