Get unique files from S3 bucket - C#

I am working on a .NET C# project. My code fetches files from an S3 bucket. The bucket contains some duplicate files under different names, and I want to fetch only the unique files.
I am using this query to fetch all files from the bucket:
ListObjectsV2Request listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);
var obj = listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
    .OrderByDescending(x => x.LastModified);
How can I get only the files with unique content, avoiding duplicates?
Should I read all the files one by one and remove the duplicates myself?
Or is there an easier way to avoid duplicates?

You can group by ETag: files with the same content will have the same ETag, so grouping by ETag collects the duplicates together and you can keep just one file from each group.
ListObjectsV2Request listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);
var obj = listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
.OrderByDescending(x => x.LastModified).GroupBy(x => x.ETag)
.Select(x => x.First())
.ToList();
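One caveat worth noting: S3's ETag equals the MD5 of the content only for objects uploaded in a single PUT; multipart uploads get a different ETag format, so two identical files can end up with different ETags. If that can happen in your bucket, a fallback is to hash the downloaded content yourself. A minimal sketch of grouping by content hash (plain byte arrays stand in for downloaded object bodies; assumes .NET 5+ for Convert.ToHexString):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

class ContentDedup
{
    static void Main()
    {
        // Byte arrays standing in for downloaded S3 object bodies.
        var files = new[]
        {
            (Key: "a.pdf", Body: new byte[] { 1, 2, 3 }),
            (Key: "b.pdf", Body: new byte[] { 1, 2, 3 }), // duplicate content
            (Key: "c.pdf", Body: new byte[] { 4, 5, 6 }),
        };

        using var md5 = MD5.Create();

        // Group by the hex digest of the content and keep one file per group.
        var unique = files
            .GroupBy(f => Convert.ToHexString(md5.ComputeHash(f.Body)))
            .Select(g => g.First())
            .ToList();

        foreach (var f in unique)
            Console.WriteLine(f.Key); // a.pdf, c.pdf
    }
}
```

The trade-off is that this requires downloading every object, so prefer the ETag grouping when single-PUT uploads are guaranteed.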

Related

Microsoft Graph client - retrieve more than 15 users?

I'm using Microsoft Graph client and want to retrieve users based on a list of objectIds. So far I've managed to do it like this:
// Create filterstring based on objectId string list.
var filterString = string.Join(" or ", objectIds.Where(x => !string.IsNullOrEmpty(x)).Select(objectId => $"id eq '{objectId}'"));
// Get users by filter
var users = await _graphServiceClient.Users.Request()
.Select(x => new { x.UserPrincipalName, x.Id })
.Filter(filterString)
.GetAsync(ct).ConfigureAwait(false);
But I've hit this error here:
Too many child clauses specified in search filter expression containing 'OR' operators: 22. Max allowed: 15.
Is there another way to only get a portion of users? Or do I need to "chunk" the list up in 15 each?
You should probably split your query and send a BATCH request to the Graph API.
This will send only 1 request to the server, but allow you to query for more data at once.
https://learn.microsoft.com/en-us/graph/sdks/batch-requests?tabs=csharp
This could look something like this: (untested code)
var objectIds = new string[0]; // your list of ids
// Filter once up front; filtering inside the loop would shift the Skip index.
var validIds = objectIds.Where(x => !string.IsNullOrEmpty(x)).ToArray();
var batchRequestContent = new BatchRequestContent();
var requestList = new List<string>();
// Note: a single JSON batch supports at most 20 requests.
for (var i = 0; i < validIds.Length; i += 15)
{
    var batchObjectIds = validIds.Skip(i).Take(15);
    var filterString = string.Join(" or ", batchObjectIds.Select(objectId => $"id eq '{objectId}'"));
    var request = _graphServiceClient.Users.Request()
        .Select(x => new { x.UserPrincipalName, x.Id })
        .Filter(filterString);
    var requestId = batchRequestContent.AddBatchRequestStep(request);
    requestList.Add(requestId);
}
var batchResponse = await _graphServiceClient.Batch.Request().PostAsync(batchRequestContent);
var allUsers = new List<User>();
foreach (var it in requestList)
{
var users = await batchResponse.GetResponseByIdAsync<GraphServiceUsersCollectionResponse>(it);
allUsers.AddRange(users.Value.CurrentPage);
}
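Alternatively, the "chunk the list up in 15 each" approach from the question also works: send one filtered request per chunk of ids. A self-contained sketch of just the chunking (the commented line marks where the hypothetical Graph call would go; on .NET 6+ the built-in Enumerable.Chunk does the same thing):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ChunkDemo
{
    // Split a sequence into consecutive chunks of at most `size` items.
    static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int size)
    {
        var chunk = new List<T>(size);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size);
            }
        }
        if (chunk.Count > 0)
            yield return chunk;
    }

    static void Main()
    {
        var objectIds = Enumerable.Range(1, 22).Select(i => $"id-{i}");

        foreach (var batch in Chunk(objectIds, 15))
        {
            // One Graph request per chunk would go here, e.g.:
            // .Filter(string.Join(" or ", batch.Select(id => $"id eq '{id}'")))
            Console.WriteLine($"chunk of {batch.Count}");
        }
        // Prints: chunk of 15, then chunk of 7
    }
}
```

This costs one round trip per chunk, whereas the batch sends everything in a single HTTP request.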

get files in order of directory path

I am trying to pick files from a directory. The files have the following name format:
14094901-1_SCAN_f568aecd-5f5a-424d-bb54-b2a7ee60ca9e
14094901-2_SCAN_90b3ddf3-17f9-417d-b64d-61a175a779a3
But when the file number reaches 10, after picking file 1 it jumps to 10. I am using the code below and don't know why it does this:
string path1 = @"C:\Users\test\AppData\Local\Temp\XXXXX";
var paths = Directory.GetFiles(path1)
.OrderBy(path =>
Convert.ToInt32(
String.Concat(
path.Split('-', '.')
.Skip(3)
.Take(1)
//.Select(num => num.PadLeft(2, '0'))
.ToArray())
)
);
Please let me know how I can get the files in the proper order 1,2,3,4,5,6,7,8,9,10.
Currently I am getting 1,10,2,3,4,5,6,7,8,9.
This might help
string path1 = @"C:\Users\test\AppData\Local\Temp\XXXXX";
var files = Directory.GetFiles(path1);
var fileIndex = files.Select(a => new {Name = a, Index = Convert.ToInt32(a.Split(new[] {'-', '_'})[1])});
var orderdFileNames = fileIndex.OrderBy(a => a.Index).Select(a => a.Name);
Convert the second split value to int before .ToArray().
Please try this
string path1 = @"C:\Users\test\AppData\Local\Temp\XXXXX";
var files = Directory.GetFiles(path1);
var orderedFiles = files.OrderBy(file => Convert.ToInt32(file.Split(new []{'-', '_'})[1]));
Please try this
var orderedFiles = Directory.GetFiles(path1).OrderBy(path =>
Convert.ToInt32(
String.Concat(
path.Split('_','-')
.Skip(1).Take(1)
.ToArray())
)
);
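For context on why the sort misbehaves: Directory.GetFiles returns strings, and string ordering is lexicographic, so "10" sorts before "2". A self-contained sketch of the numeric-key fix, with sample names standing in for a real directory listing:

```csharp
using System;
using System.Linq;

class NaturalSortDemo
{
    static void Main()
    {
        // Sample file names in place of Directory.GetFiles(...) results.
        var files = new[]
        {
            "14094901-10_SCAN_aaa", "14094901-1_SCAN_bbb",
            "14094901-2_SCAN_ccc",  "14094901-9_SCAN_ddd",
        };

        // Lexicographic ordering puts "10" before "2"; parsing the
        // sequence number (the token after the first '-') as an int
        // restores numeric order.
        var ordered = files
            .OrderBy(f => int.Parse(f.Split('-', '_')[1]))
            .ToArray();

        Console.WriteLine(string.Join(", ", ordered.Select(f => f.Split('-', '_')[1])));
        // Prints: 1, 2, 9, 10
    }
}
```

Note that splitting the full path (rather than just the file name) is fragile if any directory in the path contains '-' or '_'; running the split on Path.GetFileName(f) avoids that.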

Find multiple files in the same directory

I'm trying to find, giving a path, a list of files that have same filename but different extensions (.bak and .dwg) in the same directory.
I have this code:
String[] FileNames = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories).Where(s => s.EndsWith(".bak") || s.EndsWith(".dwg")).ToArray();
var queryDupNames = from f in FileNames
group f by Path.GetFileNameWithoutExtension(f) into g
where g.Count() > 1
select new { Name = g.Key, FileNames = g };
This works great to locate files with the same filename but in the whole system. I need only to obtain those that are in the same directory.
For example:
- Dir1\filename1.bak
- Dir1\filename1.dwg
- Dir1\filename2.bak
- Dir1\filename2.dwg
- Dir1\filename3.dwg
- DiferentDir\filename1.bak
- DiferentDir\filename1.dwg
- DiferentDir\filename3.dwg
The result should be:
- Dir1\filename1.bak
- Dir1\filename1.dwg
- Dir1\filename2.bak
- Dir1\filename2.dwg
- DiferentDir\filename1.bak
- DiferentDir\filename1.dwg
But with my code, filename3 is also included, because g.Count() > 1 is true when grouping by filename alone, ignoring the directory. I tried to fix it with the code below, but I got 0 results:
String[] FileNames = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories).Where(s => s.EndsWith(".bak") || s.EndsWith(".dwg")).ToArray();
var queryDupNames = from f in FileNames
group f by new { path = Path.GetLongPath(f), filen = Path.GetFileNameWithoutExtension(f) } into g
where g.Count() > 1
select new { Name = g.Key, FileNames = g };
Any help or clue?
Thanks
System.IO.Path doesn't have a GetLongPath method. I suspect you are using an external library like AlphaFS. In any case, GetLongPath returns the full file path, not the path of the file's folder.
The file's folder path is returned by GetDirectoryName, both in System.IO and in other libraries like AlphaFS. The following snippet will return only the Dir1\filename1, Dir1\filename2 and DiferentDir\filename1 groups:
var files = new[]
{
    @"c:\Dir1\filename1.bak",
    @"c:\Dir1\filename1.dwg",
    @"c:\Dir1\filename2.bak",
    @"c:\Dir1\filename2.dwg",
    @"c:\Dir1\filename3.dwg",
    @"c:\DiferentDir\filename1.bak",
    @"c:\DiferentDir\filename1.dwg",
    @"c:\DiferentDir\filename3.dwg",
};
var duplicates = from file in files
group file by new
{
Folder = Path.GetDirectoryName(file),
Name = Path.GetFileNameWithoutExtension(file)
} into g
where g.Count()>1
select new
{
Name = g.Key,
Files = g.ToArray()
};
First find all the folders, then for each folder find the files that share a name but have different extensions. Something like this:
var list = new List<string>();
foreach (var subDirectory in Directory.EnumerateDirectories(@"C:\Temp"))
{
var files = Directory.EnumerateFiles(subDirectory);
var repeated = files.Select(Path.GetFileNameWithoutExtension)
.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => y.Key);
list.AddRange(repeated);
}
Tested on .NET 4.6.

What's the best way to search files in a directory for multiple extensions and get the last write time according to filename

I am having a hard time mixing types with LINQ in the for loop. Basically I need to search a directory for a database name, not knowing whether the file will be .bak or .7z. If there are multiple files with the same db name, I need to get the one with extension .7z. If there are multiple files with the same db name and extension .7z, I need the file with the latest write time. This is what I have so far:
string [] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
foreach (var fileName in files)
{
var dbName = "Test";
var extention7 = ".7z";
var extentionBak = ".bak";
if (fileName.Contains(dbName) && (fileName.Contains(extention7) || fileName.Contains(extentionBak)))
{
Console.WriteLine(fileName);
}
}
I wouldn't create a LINQ only solution for this - it will be too hard to understand.
Here is what I would do:
string GetDatabaseFile(string folder, string dbName)
{
var files =
Directory.EnumerateFiles(folder, dbName + "*.*")
.Select(x => new { Path = x, Extension = Path.GetExtension(x) })
.Where(x => x.Extension == ".7z" || x.Extension == ".bak")
.ToArray();
if(files.Length == 0)
return null;
if(files.Length == 1)
return files[0].Path;
var zippedFiles = files.Where(x => x.Extension == ".7z").ToArray();
if(zippedFiles.Length == 1)
return zippedFiles[0].Path;
return zippedFiles.OrderByDescending(x => File.GetLastWriteTime(x.Path))
.First().Path;
}
Please note that this doesn't take into account the case where there are no .7z files but multiple .bak files for a DB. If this scenario can occur, you need to extend the method accordingly.
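If that scenario can occur, one way to extend it (a sketch under the same assumptions; in-memory tuples stand in for real files so File.GetLastWriteTime isn't needed) is to fall back to the newest .bak when no .7z exists:

```csharp
using System;
using System.Linq;

class NewestByExtensionDemo
{
    static void Main()
    {
        // (Path, Extension, LastWrite) tuples standing in for real files.
        var files = new[]
        {
            (Path: "Test_1.bak", Ext: ".bak", LastWrite: new DateTime(2023, 1, 1)),
            (Path: "Test_2.bak", Ext: ".bak", LastWrite: new DateTime(2023, 3, 1)),
            (Path: "Test_3.bak", Ext: ".bak", LastWrite: new DateTime(2023, 2, 1)),
        };

        // Prefer .7z; when none exist, take the newest .bak instead.
        var zipped = files.Where(f => f.Ext == ".7z").ToArray();
        var candidates = zipped.Length > 0 ? zipped : files;
        var best = candidates.OrderByDescending(f => f.LastWrite).First();

        Console.WriteLine(best.Path); // Test_2.bak
    }
}
```

In the real method, the same two lines (filter, then order by last write time with a fallback set) slot in where the `zippedFiles` handling is.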
Get files in directory:
var sourceFilePaths = Directory.EnumerateFiles(sourceDirectory)
    .Where(f => Path.GetExtension(f).ToLower() == ".exe" ||
                Path.GetExtension(f).ToLower() == ".dll" ||
                Path.GetExtension(f).ToLower() == ".config");
.
.
.
File compare:
var sourceFileInfo = new FileInfo(filePath);
var destinationFileInfo = new FileInfo(destinationFilePath);
var isNewer = sourceFileInfo.LastWriteTime.CompareTo(destinationFileInfo.LastWriteTime) > 0;
Instead of packing everything into one if condition, you should handle each case separately:
var dbName = "Test";
var extention7 = ".7z";
var extentionBak = ".bak";
foreach (var fileName in files)
{
    if (!fileName.Contains(dbName)) continue; // wrong base name
    if (Path.GetExtension(fileName) == extention7)
    {
        // handle this case:
        // extract file date
        // remember latest file
    }
    else if (Path.GetExtension(fileName) == extentionBak)
    {
        // handle this case
    }
}
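Filled in, a skeleton like this might track the newest match as follows; sample (name, write time) pairs stand in for real files on disk, and the variable names follow the question:

```csharp
using System;
using System.IO;

class LatestBackupDemo
{
    static void Main()
    {
        var dbName = "Test";
        var extention7 = ".7z";
        var extentionBak = ".bak";

        // Sample (name, write time) pairs standing in for files on disk.
        var files = new (string Name, DateTime LastWrite)[]
        {
            ("Test_a.7z",  new DateTime(2023, 1, 1)),
            ("Test_b.7z",  new DateTime(2023, 2, 1)),
            ("Test_c.bak", new DateTime(2023, 3, 1)),
            ("Other.7z",   new DateTime(2023, 4, 1)),
        };

        (string Name, DateTime LastWrite)? latest7z = null, latestBak = null;

        foreach (var f in files)
        {
            if (!f.Name.Contains(dbName)) continue; // wrong base name
            var ext = Path.GetExtension(f.Name);
            if (ext == extention7 && (latest7z == null || f.LastWrite > latest7z.Value.LastWrite))
                latest7z = f;
            else if (ext == extentionBak && (latestBak == null || f.LastWrite > latestBak.Value.LastWrite))
                latestBak = f;
        }

        // A .7z match wins over .bak when both exist.
        var winner = latest7z ?? latestBak;
        Console.WriteLine(winner?.Name); // Test_b.7z
    }
}
```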

Directory.GetFiles get today's files only

There is nice function in .NET Directory.GetFiles, it's simple to use it when I need to get all files from directory.
Directory.GetFiles("c:\\Files")
But what pattern can I use to get only the files whose creation time is today, when the directory contains many files with different creation times?
Thanks!
For performance, especially if the directory search is likely to be large, the use of Directory.EnumerateFiles(), which lazily enumerates over the search path, is preferable to Directory.GetFiles(), which eagerly enumerates over the search path, collecting all matches before filtering any:
DateTime today = DateTime.Now.Date;
FileInfo[] todaysFiles = new DirectoryInfo(@"c:\foo\bar")
    .EnumerateFiles()
    .Select(x => {
        x.Refresh();
        return x;
    })
    .Where(x => x.CreationTime.Date == today || x.LastWriteTime.Date == today)
    .ToArray();
Note that the properties of FileSystemInfo and its subtypes can be (and are) cached, so they do not necessarily reflect the current state on disk. Hence the call to Refresh() to ensure the data is correct.
Try this:
var todayFiles = Directory.GetFiles("path_to_directory")
.Where(x => new FileInfo(x).CreationTime.Date == DateTime.Today.Date);
You need to get the file info for each file:
public List<String> getTodaysFiles(String folderPath)
{
    List<String> todaysFiles = new List<String>();
    foreach (String file in Directory.GetFiles(folderPath))
    {
        FileInfo fi = new FileInfo(file);
        if (fi.CreationTime.Date == DateTime.Today)
            todaysFiles.Add(file);
    }
    return todaysFiles;
}
You could use this code:
var directory = new DirectoryInfo("C:\\MyDirectory");
var myFile = (from f in directory.GetFiles()
orderby f.LastWriteTime descending
select f).First();
// or, with method syntax:
var myFile2 = directory.GetFiles()
    .OrderByDescending(f => f.LastWriteTime)
    .First();
see here: How to find the most recent file in a directory using .NET, and without looping?
using System.Linq;
DirectoryInfo info = new DirectoryInfo("");
FileInfo[] files = info.GetFiles().OrderBy(p => p.CreationTime).ToArray();
foreach (FileInfo file in files)
{
// DO Something...
}
If you want to narrow it down to a specific date, you could apply a filter:
var files = from c in directoryInfo.GetFiles()
where c.CreationTime > dateFilter
select c;
You should be able to get through this:
var loc = new DirectoryInfo("C:\\");
var fileList = loc.GetFiles().Where(x => x.CreationTime.Date == DateTime.Today);
foreach (FileInfo fileItem in fileList)
{
//Process the file
}
var directory = new DirectoryInfo(Path.GetDirectoryName(@"--DIR Path--"));
DateTime from_date = DateTime.Now.AddDays(-5);
DateTime to_date = DateTime.Now.AddDays(5);
//For Today
var filesLst = directory.GetFiles().AsEnumerable()
    .Where(file => file.CreationTime.Date == DateTime.Now.Date).ToArray();
//For date range + specific file extension
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.CreationTime.Date >= from_date.Date && file.CreationTime.Date <= to_date.Date && file.Extension == ".txt").ToArray();
//To get ReadOnly files from directory
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => file.IsReadOnly == true).ToArray();
//To get files based on it's size
int fileSizeInKB = 100;
var filesLst = directory.GetFiles().AsEnumerable()
.Where(file => (file.Length)/1024 > fileSizeInKB).ToArray();
