Faster way to match PDFs to SQL records - C#

I have a C# web application that matches PDF files to records in an Azure SQL Server database. Part of each PDF file name is the record ID. I don't have PDFs for all of the records, as they come in via SFTP. Currently, I build a list of all the file names, loop through it, extract the record ID from each name, find the matching record in the database and update it. For 2k records this takes about 10 minutes. Is there a faster or better-optimized way to handle this?
string us_retail_directory = ConfigurationManager.AppSettings["DFCUSretailloanDirectoryGet"];
string us_post_directory = ConfigurationManager.AppSettings["DFCUSLoanDirectoryPost"];
MoveFileRecords counts = new MoveFileRecords();
DirectoryInfo directory = new DirectoryInfo(us_retail_directory);
FileInfo[] Files = directory.GetFiles("*.pdf");
foreach (FileInfo file in Files)
{
    string fileSplit = Path.GetFileNameWithoutExtension(file.Name);
    string[] strings = fileSplit.Split('_');
    //string loanCode = strings[1];
    string loanCode = strings[1].Remove(strings[1].Length - 3);
    using (LoanFileEntities db = new LoanFileEntities())
    {
        IQueryable<dfc_LoanRecords> query = db.Set<dfc_LoanRecords>();
        dfc_LoanRecords record = query.FirstOrDefault(f => f.loan_code_string == loanCode && f.region == "US");
        if (record != null)
        {
            record.loan_file = file.Name;
            record.found_date = DateTime.Now;
            db.SaveChanges();
            if (!File.Exists(us_post_directory + file.Name))
            {
                File.Move(file.FullName, us_post_directory + file.Name);
            }
            else
            {
                string now = string.Format("{0:yyyyMMdd_Hmm}", DateTime.Now);
                File.Move(file.FullName, us_post_directory + "(dup" + now + ")" + file.Name);
            }
            counts.usRetailMove++;
            counts.recordCount++;
        }
        else
        {
            counts.usRetailSkip++;
            counts.recordCount++;
        }
    }
}
return counts;

Each database lookup is a separate round trip with its own latency, whereas the actual amount of data will probably not be your biggest issue.
Therefore, try to batch the requests by loading multiple records at once, even if you won't use all of the fetched records (how much overhead is acceptable has to be determined by testing).
You can do this in SQL with a list Contains (when you have a set of IDs) or by prefetching records according to some other appropriate criterion (e.g. by date).
Then match the files against the prefetched records in memory and batch the update operations.
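A minimal sketch of the batching idea in plain C#. It assumes the question's file naming: the loan code is the second "_"-separated token with its last three characters removed. The database call is abstracted as a hypothetical fetchBatch delegate, so the matching logic can run on its own. In the real application that delegate would run one query per batch. The class and method names and the batch size of 500 are placeholders, not anything from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LoanFileMatcher
{
    // Mirrors the question's parsing: second '_'-separated token of the file name,
    // minus its last three characters. Returns null if the name doesn't fit that shape.
    public static string LoanCodeFromFileName(string fileName)
    {
        string[] parts = Path.GetFileNameWithoutExtension(fileName).Split('_');
        if (parts.Length < 2 || parts[1].Length < 3)
            return null;
        return parts[1].Remove(parts[1].Length - 3);
    }

    // Fetches records for all file names in batches (one query per batch instead of
    // one per file) and returns file name -> matched record.
    // Files with no matching record are simply absent from the result.
    public static Dictionary<string, TRecord> MatchFiles<TRecord>(
        IEnumerable<string> fileNames,
        Func<IReadOnlyList<string>, IEnumerable<TRecord>> fetchBatch, // hypothetical: one DB query per batch
        Func<TRecord, string> codeOf,                                  // reads a record's loan code
        int batchSize = 500)                                           // placeholder; tune by testing
    {
        var parsed = fileNames
            .Select(name => new { Name = name, Code = LoanCodeFromFileName(name) })
            .Where(x => x.Code != null)
            .ToList();

        // Each distinct code is requested once, batchSize codes per query
        List<string> codes = parsed.Select(x => x.Code).Distinct().ToList();
        var recordByCode = new Dictionary<string, TRecord>();
        for (int i = 0; i < codes.Count; i += batchSize)
        {
            List<string> batch = codes.GetRange(i, Math.Min(batchSize, codes.Count - i));
            foreach (TRecord record in fetchBatch(batch))
            {
                string code = codeOf(record);
                // Keep the first record per code, like FirstOrDefault in the question
                if (!recordByCode.ContainsKey(code))
                    recordByCode[code] = record;
            }
        }

        // Pure in-memory matching; no further database calls
        var result = new Dictionary<string, TRecord>();
        foreach (var x in parsed)
        {
            TRecord record;
            if (recordByCode.TryGetValue(x.Code, out record))
                result[x.Name] = record;
        }
        return result;
    }
}
```

In the question's code, fetchBatch might look something like batch => db.Set<dfc_LoanRecords>().Where(r => batch.Contains(r.loan_code_string) && r.region == "US").ToList(), using a single LoanFileEntities instance. After that you would set loan_file and found_date on each matched record and call db.SaveChanges() once, not once per file. This has not been tested against the real schema.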
EDIT
In your case, you can query multiple records at once with a Contains expression on loan_code_string.
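Here is the shape of such a Contains query. So that the sketch runs on its own, it queries an in-memory list through AsQueryable(). LoanRecord is a hypothetical stand-in for dfc_LoanRecords that keeps only the two columns the question filters on. Against Entity Framework, the same expression over an IEnumerable<string> is translated into a single SQL query (typically a WHERE ... IN list) instead of one FirstOrDefault round trip per file.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's dfc_LoanRecords entity
public class LoanRecord
{
    public string loan_code_string { get; set; }
    public string region { get; set; }
}

public static class LoanQueries
{
    // Fetches every US record whose code is in 'codes' with one query.
    // 'codes' is an IEnumerable so Contains resolves to Enumerable.Contains.
    public static List<LoanRecord> FindUsRecords(IQueryable<LoanRecord> records, IEnumerable<string> codes)
    {
        return records
            .Where(r => codes.Contains(r.loan_code_string) && r.region == "US")
            .ToList();
    }
}
```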

Related

Best way to select all matching files in matching directories of the root directory

I would like to find a better way
(in terms of readability, performance, or perhaps code length)
to do the following, e.g. with a Func<>; any other approach will do too.
As you can see from the code, it grabs the matching files in DataFilePath and checks whether files with the matching pattern exist in all of the directories under _checkFilePath whose names contain today's date.
For example,
if I have two files c:\abc\abc_1_2_3_4_12345.csv and c:\abc\abc_1_2_3_4_34567.csv in DataFilePath,
I would like to check whether files matching the patterns '12345' and '34567' exist in all the directories in c:\def, such as c:\def\2019-10-25_123 and c:\def\2019-10-25_124.
Thanks guys.
files = Directory.GetFiles(DataFilePath, FilePattern);
var archiveDirs = Directory.GetDirectories(_checkFilePath + "archive","*",SearchOption.AllDirectories).Where(x => x.Contains(string.Format("{0:yyyy-MM-dd}", DateTime.Now)));
var filesToProcess = new List<string>();
foreach (var archiveDirStr in archiveDirs)
{
    foreach (var file in files)
    {
        var key = file.Split('_')[5];
        var contactFile = Directory.GetFiles(archiveDirStr, "*" + key + "*").FirstOrDefault();
        if (contactFile != null)
        {
            filesToProcess.Add(file);
        }
    }
}
files = filesToProcess.ToArray();
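A minimal sketch of one way to express the check described above, where each key must appear in every dated archive directory. In the loop above, a file is added once for each directory it is found in, so it can appear several times, and finding it in any one directory is enough. The sketch lists each directory once and keeps a file only if its key shows up in all of them. The key is taken with Path.GetFileNameWithoutExtension, so it excludes the ".csv" extension. ArchiveCheck, KeyOf and the optional listFiles parameter are hypothetical names; listFiles lets the logic be tried against an in-memory listing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ArchiveCheck
{
    // Key = text after the last '_' of the file name, without the extension,
    // e.g. "abc_1_2_3_4_12345.csv" -> "12345".
    public static string KeyOf(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        return name.Substring(name.LastIndexOf('_') + 1);
    }

    // Returns the data files whose key occurs in at least one file name in EVERY
    // archive directory. Each directory is listed once, not once per data file.
    public static List<string> FilesInAllArchives(
        IEnumerable<string> dataFiles,
        IEnumerable<string> archiveDirs,
        Func<string, IEnumerable<string>> listFiles = null)
    {
        // Default: list the real directory
        if (listFiles == null)
            listFiles = dir => Directory.EnumerateFiles(dir);

        List<List<string>> namesPerDir = archiveDirs
            .Select(dir => listFiles(dir).Select(p => Path.GetFileName(p)).ToList())
            .ToList();
        if (namesPerDir.Count == 0)
            return new List<string>(); // no dated directories: nothing qualifies

        return dataFiles
            .Where(file =>
            {
                string key = KeyOf(file);
                // Roughly the same test as the "*" + key + "*" search pattern
                return namesPerDir.All(names => names.Any(n => n.Contains(key)));
            })
            .ToList();
    }
}
```

With files and archiveDirs computed as in the question, files = ArchiveCheck.FilesInAllArchives(files, archiveDirs).ToArray() would replace the nested loops, and each qualifying file would appear once.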

Get field data outside of reporting database using Encompass360 SDK

I'm trying to build a standalone application that creates a custom report for Encompass360 without needing to put certain fields into the reporting database.
So far I have found only one way to do it, but it is extremely slow (much slower than a normal report within Encompass when retrieving data outside the reporting database). It takes almost 2 minutes to pull the data for 5 loans doing this:
int count = 5;
StringList fields = new StringList();
fields.Add("Fields.317");
fields.Add("Fields.3238");
fields.Add("Fields.313");
fields.Add("Fields.319");
fields.Add("Fields.2");
// lstLoans.Items contains the string location of the loans(i.e. "My Pipeline\Dave#6")
foreach (LoanIdentity loanID in lstLoans.Items)
{
    string[] loanIdentifier = loanID.ToString().Split('\\');
    Loan loan = Globals.Session.Loans.Folders[loanIdentifier[0]].OpenLoan(loanIdentifier[1]);
    bool fundingPlus = true; // if milestone == funding || shipping || suspended || completion;
    if (!fundingPlus)
        continue;
    bool oneIsChecked = false;
    LogMilestoneEvents msEvents = loan.Log.MilestoneEvents;
    DateTime date;
    MilestoneEvent ms = null; // better way to do this probably
    if (checkBox4.Checked)
    {
        ms = msEvents.GetEventForMilestone("Completion");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    else if (checkBox3.Checked)
    {
        ms = msEvents.GetEventForMilestone("Suspended");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    else if (checkBox2.Checked)
    {
        ms = msEvents.GetEventForMilestone("Shipping");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    else if (checkBox1.Checked)
    {
        ms = msEvents.GetEventForMilestone("Funding");
        if (ms.Completed)
        {
            oneIsChecked = true;
        }
    }
    if (!oneIsChecked)
        continue;
    string LO = loan.Fields["317"].FormattedValue;
    string LOid = loan.Fields["3238"].FormattedValue;
    string city = loan.Fields["313"].FormattedValue;
    string address = loan.Fields["319"].FormattedValue;
    string loanAmount = loan.Fields["2"].FormattedValue;
    if (loanAmount == "")
    {
        Console.WriteLine(LO);
        continue;
    }
    int numLoans = 1;
    addLoanFieldToListView(LO, numLoans, city, address, loanAmount);
    if (--count == 0)
        break;
}
I haven't been able to figure out how to use any of the pipeline methods to retrieve data outside the reporting database. When all of the fields I am looking for are in the reporting database, however, it takes barely a couple of seconds to retrieve the contents of hundreds of loans using these tools:
session.Reports.SelectReportingFieldsForLoans(loanGUIDs, fields);
session.Loans.QueryPipeline(selectedDate, PipelineSortOrder.None);
session.Loans.OpenPipeline(PipelineSortOrder.None);
What would really help me is a simple example of retrieving data outside the reporting database with the Encompass SDK that doesn't take longer than it ought to.
Note: I am aware I can add the fields to the reporting database that aren't in it currently, so this is not the answer I am looking for.
Note #2: Encompass360 doesn't have its own tag; if somebody knows of better tags for this subject, please add them.
I use the SelectFields method on Loans to retrieve loan field data that is not in the reporting database in Encompass. It performs far better than opening loans one by one, but the results are returned as strings, so some parsing is needed to get the values in their native types. Below is the example from the documentation for using this method.
using System;
using System.IO;
using EllieMae.Encompass.Client;
using EllieMae.Encompass.BusinessObjects;
using EllieMae.Encompass.Query;
using EllieMae.Encompass.Collections;
using EllieMae.Encompass.BusinessObjects.Loans;
class LoanReader
{
    public static void Main()
    {
        // Open the session to the remote server
        Session session = new Session();
        session.Start("myserver", "mary", "maryspwd");
        // Build the query criterion for all loans that were opened this year
        DateFieldCriterion dateCri = new DateFieldCriterion();
        dateCri.FieldName = "Loan.DateFileOpened";
        dateCri.Value = DateTime.Now;
        dateCri.Precision = DateFieldMatchPrecision.Year;
        // Perform the query to get the IDs of the loans
        LoanIdentityList ids = session.Loans.Query(dateCri);
        // Create a list of the specific fields we want to print from each loan.
        // In this case, we'll select the Loan Amount and Interest Rate.
        StringList fieldIds = new StringList();
        fieldIds.Add("2"); // Loan Amount
        fieldIds.Add("3"); // Rate
        // For each loan, select the desired fields
        foreach (LoanIdentity id in ids)
        {
            // Select the field values for the current loan
            StringList fieldValues = session.Loans.SelectFields(id.Guid, fieldIds);
            // Print out the returned values
            Console.WriteLine("Fields for loan " + id.ToString());
            Console.WriteLine("Amount: " + fieldValues[0]);
            Console.WriteLine("Rate: " + fieldValues[1]);
        }
        // End the session to gracefully disconnect from the server
        session.End();
    }
}
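SelectFields returns every value as a string, as noted above, so numeric fields need converting before you can calculate with them. Here is a minimal sketch of that parsing step. It assumes numeric fields such as Loan Amount (field 2) and Rate (field 3) come back as invariant-style strings like "250,000.00" or "4.125", and that a blank field comes back as an empty string. Both are assumptions, not documented SDK behavior. FieldParsing and ParseDecimalField are hypothetical names.

```csharp
using System.Globalization;

public static class FieldParsing
{
    // Converts a formatted field string to a decimal; returns null for blank or unparsable values.
    // Assumes "," as thousands separator and "." as decimal point (invariant culture).
    public static decimal? ParseDecimalField(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw))
            return null;
        decimal value;
        return decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out value)
            ? value
            : (decimal?)null;
    }
}
```

In the documentation example, FieldParsing.ParseDecimalField(fieldValues[0]) would give the loan amount as a decimal?. Null stands in for the empty-string case that the question checks with loanAmount == "".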
You will benefit greatly from adding these fields to the reporting database and using an RDB query instead. Internally, Encompass has to open and parse the loan files when you read fields that are not in the RDB, which is slow. For fields in the RDB it just runs a SELECT query, which is very fast. This tool lets you quickly check which fields are in the RDB, so that you can plan both your query and any RDB updates: https://www.encompdev.com/Products/FieldExplorer
You query the RDB via Session.Loans.QueryPipeline(), in much the same way as you use the loan query. Here's a good source code example (in VB): https://www.encompdev.com/Products/AlertCounterFieldPlugin

C# Query On MongoDB Not Returning Correct Results

I'm currently running into an issue when querying MongoDB using C#. The problem is that I am not getting the correct results or the correct number of results. I don't know the exact number of results, but it should be fewer than 100; instead, I am receiving around 350k-500k results (many of which are null). The other problem is that the program takes upwards of 10 minutes to finish processing.
You can see the problematic portion of code in the following:
public List<BsonDocument> find_All_Documents_With_pIDs()
{
    List<string> databases = new List<string>();
    List<BsonDocument> pDocs = new List<BsonDocument>();
    databases.AddRange(mongo_Server.GetDatabaseNames());
    //iterate through each db in mongo
    foreach (string dbName in databases)
    {
        List<string> collections = new List<string>();
        var database = mongo_Server.GetDatabase(dbName);
        collections.AddRange(database.GetCollectionNames());
        //iterate through each collection
        foreach (string colName in collections)
        {
            var collection = database.GetCollection(colName);
            //Iterate through each document
            foreach (var document in collection.FindAllAs<BsonDocument>())
            {
                //Get all documents that have a pID in either the main document or its sub document
                IMongoQuery query = Query.Exists(document.GetElement("_id").ToString().Remove(0,4) + ".pID");
                IMongoQuery subQuery = Query.Exists(document.GetElement("_id").ToString() + ".SubDocument.pID");
                pDocs.AddRange(collection.Find(query));
                pDocs.AddRange(collection.Find(subQuery));
            }
        }
    }
    //There's a collection used earlier in the program to back up the documents before processing. Not removing the documents from the list found in this location will result in duplicates.
    return remove_Backup_Documents_From_List(pDocs);
}
Any help is appreciated!
EDIT:
Screenshot of the data received (not reproduced here): not all of the data is null, but a very large amount is.
Your code first fetches every document in the collection with
collection.FindAllAs<BsonDocument>()
and then builds and runs two more queries for each one. That's probably why it is so slow.
As an alternative you could do the following:
foreach (string colName in collections)
{
    var collection = database.GetCollection(colName);
    //Query for all documents that have pID
    IMongoQuery query = Query.And(Query.Exists("pID"),            // The field exists
                                  Query.NE("pID", BsonNull.Value), // It is not "null"
                                  Query.NE("pID", ""));            // It is not empty i.e. = ""
    //Query for all documents that have SubDocument.pID
    IMongoQuery subQuery = Query.And(Query.Exists("SubDocument.pID"),            // The field exists
                                     Query.NE("SubDocument.pID", BsonNull.Value), // It is not "null"
                                     Query.NE("SubDocument.pID", ""));            // It is not empty i.e. = ""
    IMongoQuery totalQuery = Query.Or(query, subQuery);
    // Find returns a cursor, so materialize it (ToList needs System.Linq)
    List<BsonDocument> results = collection.Find(totalQuery).ToList();
    if (results.Count > 0)
    {
        pDocs.AddRange(results); //Only add to pDocs if query returned at least one result
    }
}
That way you build a single query per collection that returns only the documents that have either pID or SubDocument.pID set, and the filtering happens on the server.

Execute SQL scripts in a required order

I need to create a small utility to execute SQL files on SQL Server 2008 R2. I have tried the following code:
private static void ExecuteScripts()
{
    string sqlConnectionString = "UID=sa;password=passw0rd;Data Source=somesqlserver\\db01";
    DirectoryInfo info = new DirectoryInfo(@"c:\dxsh\");
    FileInfo[] fileInfos = info.GetFiles("1.8*");
    foreach (var fileInfo in fileInfos)
    {
        string script = fileInfo.OpenText().ReadToEnd();
        var conn = new SqlConnection(sqlConnectionString);
        var server = new Server(new ServerConnection(conn));
        server.ConnectionContext.ExecuteNonQuery(script);
    }
}
I will have the following files in the folder:
1. 1.8_DatabaseAndUsers.sql
2. 1.8_TablesAndTypes.sql
3. 1.8_Views.sql
4. 1.8_KeysAndIndex.sql
5. 1.8_ProceduresAndFunction.sql
I need to execute the files in exactly this order. Please help.
If you know the order in which you want to execute the files, just fetch the files in the order you expect:
string[] files = { "1.8_DatabaseAndUsers.sql", "1.8_TablesAndTypes.sql", ... };
foreach (var file in files)
{
    // Simpler way of reading files (and doesn't leave the file handle open)
    string text = File.ReadAllText(file);
    // using statement to avoid leaking resources
    using (var conn = new SqlConnection(...))
    {
        var server = new Server(new ServerConnection(conn));
        server.ConnectionContext.ExecuteNonQuery(text);
    }
}
Basically, you shouldn't rely on the order in which the files are returned by GetFiles - if you want them in a specific order, just enforce that yourself.
Another option is to use GetFiles but make sure the filenames can be ordered appropriately, e.g.
1.8_01_DatabaseAndUsers.sql
1.8_02_TablesAndTypes.sql
1.8_03_Views.sql
1.8_04_KeysAndIndex.sql
1.8_05_ProceduresAndFunction.sql
That way you don't need to hard-code the names in your program, but you can still guarantee the order, just by sorting the filenames before executing the scripts.
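Here is a minimal sketch of the sort step, assuming the 1.8_NN_Name.sql naming shown above. It sorts on the file name alone with an ordinal comparison, so the order does not depend on the current culture or on the directory part of the path. The numbers must be zero-padded (01 ... 10) for text order to equal run order. Executing the scripts is left out. ScriptOrder and InRunOrder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ScriptOrder
{
    // Sorts script paths by file name using an ordinal (culture-independent) comparison.
    // Relies on zero-padded numbers so that text order equals execution order.
    public static List<string> InRunOrder(IEnumerable<string> paths)
    {
        return paths
            .OrderBy(p => Path.GetFileName(p), StringComparer.Ordinal)
            .ToList();
    }
}
```

For example, the input could be Directory.GetFiles(@"c:\dxsh", "1.8_*.sql"). Each path in the returned order would then be read with File.ReadAllText and executed as in the question.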
I presume you do not wish to hardcode the file names into your code.
In this case, you should rename the files so that they are alphabetically ordered, which can easily be achieved by putting a number in front of their name (such as 01, 02, etc.) so the first file will be 01 1.8_DatabaseAndUsers.sql and so on.
Directory.GetFiles() often returns the files in alphabetical order, but that order is not guaranteed. Use the following code to retrieve the file names, add them to a list, and then explicitly sort the list into alphabetical order:
// Get a list of files and add them to a list
List<string> fileList = new List<string>();
foreach (string item in Directory.GetFiles(@"c:\dxsh", "* 1.8*"))
    fileList.Add(item);
fileList.Sort();
// Go through each file in the list order
for (int i = 0; i < fileList.Count; i++)
{
    string filename = fileList[i];
    string script = File.ReadAllText(filename);
    // Run your code
}

Filtering set of images from MongoDB

I have written some code to store image files in MongoDB. Now I want to filter and retrieve images from MongoDB whose names contain a certain set of characters.
For example, say I have stored aaaa_DEX.jpg, bbbb_DEX.jpg, cccc_BVX.jpg, dddd_OUI.jpg and eeee_DEX.jpg in MongoDB, and I want to get all the images that have "DEX" in their names. Is this possible with the Query builder? How can I do this?
To upload I use:
public JsonResult UploadPrimaryImage(string hotelCode)
{
    var db = _hoteldbObj.Instance();
    var primaryImageBucket = new MongoGridFS(db, new MongoGridFSSettings() { Root = "HotelPrimaryImage" });
    foreach (string httpFile in Request.Files)
    {
        var postedFile = Request.Files[httpFile];
        if (postedFile == null)
            throw new InvalidOperationException("Invalid file");
        var bytes = ReadToEnd(postedFile.InputStream);
        using (var c = primaryImageBucket.Create(hotelCode, new MongoGridFSCreateOptions() { ContentType = postedFile.ContentType }))
        {
            c.Write(bytes, 0, bytes.Length);
            c.Flush();
            c.Close();
        }
    }
    return new JsonResult();
}
Thank You
Calling .Find("ABC"), where ABC is the file name, handles this if you are querying on the full file name.
If you want to query on a substring within the file name, my suggestion would be to save the substring as part of the metadata object. See this post for an example of working with metadata.
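A minimal sketch of the metadata idea. It assumes the tag is always the part after the last underscore, so aaaa_DEX.jpg gives DEX. The hypothetical ExtractTag helper computes the value once at upload time. The value could then be stored in the file's metadata document and queried by exact match instead of by substring. The storage and query calls are not shown.

```csharp
using System.IO;

public static class ImageTags
{
    // "aaaa_DEX.jpg" -> "DEX"; returns null when the name contains no '_'.
    public static string ExtractTag(string fileName)
    {
        string name = Path.GetFileNameWithoutExtension(fileName);
        int underscore = name.LastIndexOf('_');
        return underscore < 0 ? null : name.Substring(underscore + 1);
    }
}
```

The upload action in the question names each GridFS file after hotelCode. The tag would therefore have to come from the original upload name (postedFile.FileName in ASP.NET MVC), not from the stored name.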
