Trying to sync data from a third-party API - C#

This question has probably been asked correctly before, and I'll gladly accept an answer pointing me to the right spot. The problem is I don't know how to ask the question correctly to get anything returned in a search.
I'm trying to pull data from a third-party API (ADP) and store it in my database using ASP.NET Core.
I want to take the users returned from the API and store them in my database, where I have an ADP ancillary table seeded with the majority of the data from the API.
I would then like to update or add any missing or altered records in my database FROM the API.
I'm thinking about using an AJAX call to the API to retrieve the records, then either storing the data in another table and using SQL to look for records that have changed between the two tables and make any necessary changes (this would be manually activated via a button), or using some kind of scheduled background task to perform this through methods in my C# code instead of AJAX.
The question I have is:
Is it a better fit to do this as a stored procedure in SQL, or rather have a method in my web app perform the data transformation?
I'm looking for any examples of iterating through the returned data and updating/creating records in my database.
I've only seen vague, not-quite-what-I'm-looking-for examples and nothing definitive on the best way to accomplish this. If I can find any reference material or examples, I'll gladly research, but I don't even know where to start or the correct terms to search for. I've looked into model binding, AJAX calls, and JSON serialization & deserialization. I'm probably overthinking this.
Any suggestions or tech I should look at would be appreciated. Thanks for your time in advance.
My app is written in ASP.NET Core 2.2 using EF Core.
* EDIT *
For anyone looking - https://learn.microsoft.com/en-us/dotnet/csharp/tutorials/console-webapiclient
This, together with John Wu's answer, helped me achieve what I was looking for.

If this were my project this is how I would break down the tasks, in this order.
First, start an empty console application.
Next, write a method that gets the list of users from the API. You didn't tell us anything at all about the API, so here is a dummy example that uses an HTTP client.
public async Task<List<User>> GetUsers()
{
    var client = new HttpClient();
    var response = await client.GetAsync("https://SomeApi.com/Users");
    var users = await ParseResponse(response);
    return users.ToList();
}
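ParseResponse above is just a placeholder; a minimal sketch of it, assuming the API returns a JSON array of user objects and that Newtonsoft.Json is referenced, could be:
// Requires: using System.Net.Http; using Newtonsoft.Json;
private async Task<IEnumerable<User>> ParseResponse(HttpResponseMessage response)
{
    response.EnsureSuccessStatusCode();
    var json = await response.Content.ReadAsStringAsync();
    return JsonConvert.DeserializeObject<List<User>>(json);
}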
Test the above (e.g. write a little shoestring code to run it and dump the results, or something) to ensure that it works independently. You want to make sure it is solid before moving on.
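For example, the harness can be nothing more than the sketch below (UserSync is a hypothetical class holding GetUsers, and Id/Name are assumed User properties):
// Minimal throwaway harness -- dump the results to the console and eyeball them.
public static async Task Main(string[] args)
{
    var sync = new UserSync();                  // hypothetical class holding GetUsers()
    var users = await sync.GetUsers();
    Console.WriteLine($"Retrieved {users.Count} users");
    foreach (var u in users)
        Console.WriteLine($"{u.Id}  {u.Name}"); // assumed User properties
}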
Next, create a temporary table (or tables) that matches the schema of the data objects that are returned from the API. For now you will just want to store it exactly the way you retrieve it.
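For example (a sketch only; AdpId, Name and Email are placeholders for whatever fields the ADP API actually returns):
// Hypothetical staging-table schema; run once against the database.
private const string CreateStagingTableSql = @"
    CREATE TABLE UsersStaging (
        AdpId NVARCHAR(50)  NOT NULL PRIMARY KEY,
        Name  NVARCHAR(200) NULL,
        Email NVARCHAR(200) NULL
    );";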
Next, write some code to insert records into the table(s). Again, test this independently, and review the data in the table to make sure it all worked correctly. It might look a little like this:
public async Task InsertUser(User user)
{
    using (var conn = new SqlConnection(Configuration.ConnectionString))
    {
        await conn.OpenAsync();
        var cmd = new SqlCommand { Connection = conn };
        //etc. -- set cmd.CommandText and add parameters for the staging table
        await cmd.ExecuteNonQueryAsync();
    }
}
Once you know how to pull the data and store it, you can finish the code to extract the data from the API and insert it. It might look a little like this:
public async Task DoTheMigration()
{
    var users = await GetUsers();
    var tasks = users.Select(u => InsertUser(u));
    await Task.WhenAll(tasks.ToArray());
}
As a final step, write a series of stored procedures or a DTS package to move the data from the temp tables to their final resting place. If you are using MS Access, you can write a series of queries and execute them in order with some VBA. At a high level, it would do the following (see the sketch after these steps):
Check for any records that exist in the temp table but not in the final table and insert them into the final table.
Check for any records that exist in the final table but not the temp table and remove them or mark them as deleted.
Check for any records in common that have different column values and update the final table.
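As a rough illustration only (reusing the hypothetical UsersStaging table and AdpId/Name/Email columns from the sketch above, plus an assumed IsDeleted flag; a single T-SQL MERGE could do the same job), the three steps might look like this when driven from C#:
public async Task MergeStagingIntoFinal()
{
    // Table and column names below are placeholders -- adjust to your schema.
    const string sql = @"
        -- 1) Records in staging but not in the final table: insert them.
        INSERT INTO Users (AdpId, Name, Email)
        SELECT s.AdpId, s.Name, s.Email
        FROM UsersStaging s
        LEFT JOIN Users u ON u.AdpId = s.AdpId
        WHERE u.AdpId IS NULL;

        -- 2) Records in the final table but not in staging: mark them as deleted.
        UPDATE u SET u.IsDeleted = 1
        FROM Users u
        LEFT JOIN UsersStaging s ON s.AdpId = u.AdpId
        WHERE s.AdpId IS NULL;

        -- 3) Records in common with different column values: update the final table.
        UPDATE u SET u.Name = s.Name, u.Email = s.Email
        FROM Users u
        INNER JOIN UsersStaging s ON s.AdpId = u.AdpId
        WHERE u.Name <> s.Name OR u.Email <> s.Email;";

    using (var conn = new SqlConnection(Configuration.ConnectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        await conn.OpenAsync();
        await cmd.ExecuteNonQueryAsync();
    }
}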
Each of these development activities raises its own set of questions, of course, which you can post back to Stack Overflow with details. As it is, your question doesn't have enough specificity for a more in-depth answer.

Related

Populate new field in SQLite database using existing field value and a C# function

I have a SQLite database for which I want to populate a new field based on an existing one. I want to derive the new field value using a C# function.
In pseudocode, it would be something like:
foreach (record in the SQLite database)
{
    my_new_field[record_num] = my_C#_function(existing_field_value[record_num]);
}
Having looked at other suggestions on Stack Overflow, I'm using a SqliteDataReader to read each record, and then running a SQLite "UPDATE" command based on the specific RowId to update the new field value for the same record.
It works .... but it's REALLY slow and thrashes the hard drive like crazy. Is there really no better way to do this?
Some of the databases I need to update might be millions of records.
Thanks in advance for any help.
Edit:
In response to the comment, here's some real code in a legacy language called Concordance CPL. The important point to note is that you can read and write changes to the current record in one go:
int db;
cycle(db)
{
    db->FIRSTFIELD = myfunction(db->SECONDFIELD);
}

myfunction(text input)
{
    text output;
    /// code in here to derive output from input
    return output;
}
I have a feeling there's no equivalent way to do this in SQLite as SQL is inherently transactional, whereas Concordance allowed you to traverse and update the database sequentially.
The answer to this is to wrap all of the updates into a single transaction.
There is an example here that does it for bulk inserts:
https://www.jokecamp.com/blog/make-your-sqlite-bulk-inserts-very-fast-in-c/
In my case, it would be bulk updates based on RowID wrapped into a single transaction.
It's now working, and performance is many orders of magnitude better.
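For anyone after the concrete shape of this, here is a rough sketch of the single-transaction approach with System.Data.SQLite (using System.Data for DbType), assuming a table called MyTable with columns ExistingField and NewField, and a hypothetical MyFunction that derives the new value:
// Rough sketch -- table, column and function names are placeholders.
using (var conn = new SQLiteConnection("Data Source=mydb.sqlite"))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        // Read the source values first...
        var updates = new List<(long rowId, string newValue)>();
        using (var select = new SQLiteCommand("SELECT RowId, ExistingField FROM MyTable", conn, tx))
        using (var reader = select.ExecuteReader())
        {
            while (reader.Read())
                updates.Add((reader.GetInt64(0), MyFunction(reader.GetString(1))));
        }

        // ...then run all the UPDATEs inside the same transaction.
        using (var update = new SQLiteCommand("UPDATE MyTable SET NewField = @value WHERE RowId = @id", conn, tx))
        {
            var value = update.Parameters.Add("@value", DbType.String);
            var id = update.Parameters.Add("@id", DbType.Int64);
            foreach (var (rowId, newValue) in updates)
            {
                value.Value = newValue;
                id.Value = rowId;
                update.ExecuteNonQuery();
            }
        }

        tx.Commit();   // one commit -> one journal write instead of one per record
    }
}
For millions of rows you may want to read and write in chunks rather than buffering everything in memory, but the key point is the single transaction.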
EDIT: per the helpful comment above, defining a custom C# function and then referencing it in a single UPDATE command also works well, and in some ways is better than the above, as you don't have to loop through the records in C# itself. See e.g. Create/Use User-defined functions in System.Data.SQLite?
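A sketch of that user-defined-function route with System.Data.SQLite (the function name derive_value and the body of Invoke are placeholders):
// Placeholder scalar function; register it once, then let SQLite do the loop.
[SQLiteFunction(Name = "derive_value", Arguments = 1, FuncType = FunctionType.Scalar)]
public class DeriveValue : SQLiteFunction
{
    public override object Invoke(object[] args)
    {
        var input = args[0] as string;
        return input?.ToUpperInvariant();   // stand-in for the real derivation logic
    }
}

// Usage:
// SQLiteFunction.RegisterFunction(typeof(DeriveValue));
// then a single statement replaces the whole C# loop:
// UPDATE MyTable SET NewField = derive_value(ExistingField);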

Handle a lot of data in ASP.NET MVC

I'm starting to build an application with ASP.NET MVC, with Angular for the front-end and SQL Server for the database. In some cases I have a complex query that I have to use and cannot modify because of a business restriction. I am using a structure similar to this one: using simple queries in ASP.NET MVC, but I don't know the correct way to handle a lot of data and show it in the front-end.
I have a ViewModel with the data structure of the query results, a DomainModel where the query is located, and the Controller to communicate with the front-end.
My problem is that I don't know how to approach what I am trying to do. Right now I'm creating as many objects in a list as there are rows in my query, but when this method runs my computer locks up with no error shown (I guess because it uses all the memory).
Note that the table in the front-end only has to show 25 results per page; maybe I could execute the query each time the user chooses a different page of the table, getting a different batch of results. I haven't tried this option yet.
This is part of the DomainModel:
public IEnumerable<OperationView> GetOperations()
{
    List<OperationView> Operationslist = new List<OperationView>();
    using (SqlConnection connection = new SqlConnection(connectionString))
    using (SqlCommand command = new SqlCommand("", connection))
    {
        command.CommandText = /*Query joining 8-10 tables*/;
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        while (reader.Read())
        {
            var OperationView = new OperationView();
            OperationView.IdOperacion = reader["ID_OPERACION"].ToString();
            //Loading here some other variables of OperationView
            Operationslist.Add(OperationView);
        }
        connection.Close();
    }
    return Operationslist;
}
This is part of the Controller:
public IEnumerable<OperationView> GetOperaciones()
{
    var Operation = new OperationPDomainModel();
    return Operation.GetOperations();
}
I think that my front-end and ViewModel are not important for this problem, but I can include them if needed.
Currently, if I try to execute it, the computer shuts down unexpectedly...
As your system is running out of memory, you need pagination.
This paging should be done on the database side. The UI just needs to pass the page index and the number of records displayed per page.
So your query should be something like the one below:
SELECT a, b, c FROM (
    SELECT a, b, c, ROW_NUMBER() OVER (ORDER BY a) AS rnum
    FROM foo
) paged
WHERE paged.rnum BETWEEN (25 * Page_Index) + 1 AND (25 * Page_Index) + 25
There are a few improvements you could make.
Make the call async
The operation hangs the UI because it blocks the main thread. If possible, run this operation asynchronously: use task-based programming to run the operation on a different thread. That should make things a little better, but it won't improve performance significantly.
Use pagination
Get only the number of records that you need to display on the page. This should be the best improvement based on the code you have. It would also be better to have some more filters if possible. But getting only 25 records if you need only 25 should be the way to go.
It would also help if you could use modern programming techniques like EF and LINQ instead of traditional ADO.Net
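For instance, a paged query with EF and LINQ might look like the sketch below (OperationsContext, the Operations set and the mapped properties are assumptions, not your actual model):
// Page through the data instead of loading everything; pageIndex is zero-based.
public async Task<List<OperationView>> GetOperationsPage(int pageIndex, int pageSize = 25)
{
    using (var db = new OperationsContext())   // hypothetical EF context
    {
        return await db.Operations
            .AsNoTracking()
            .OrderBy(o => o.IdOperacion)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .Select(o => new OperationView { IdOperacion = o.IdOperacion /* map other fields */ })
            .ToListAsync();
    }
}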
Use Ajax
Such large processing should be done using AJAX calls. If you do not want the user to wait for the data to be loaded, you can load the page and make the data retrieval a part of a separate AJAX call.
Also check this article on how to scroll and view millions of records:
https://www.c-sharpcorner.com/article/how-to-scroll-and-view-millions-of-records/

RethinkDB .NET update value

I'm looking at updating stored values in a RethinkDB using the C# RethinkDB.Driver library and I'm just not getting it right.
I can achieve an update by getting the result, altering that object and then making a separate call to update with that object. When there are many calls like this updating a record, there is a risk of the value being updated elsewhere while the application is working with the record.
TestingObject record = r.Db("test").Table("learning").Get("c8c54346-e35f-4025-8641-7117f12ebc5b").Run(_conn);
record.fieldNameIntValue = record.fieldNameIntValue + 1;
var result = r.Db("test").Table("learning").Get("c8c54346-e35f-4025-8641-7117f12ebc5b").Update(record).Run(_conn);
I've been trying something along these lines :
var result = r.Db("test").Table("learning").Get("c8c54346-e35f-4025-8641-7117f12ebc5b").Update(row => row["fieldNameIntValue"].Add(1)).Run(_conn);
but the result errors with Inserted value must be an OBJECT (got NUMBER):101 which suggests this is only passing the field value back instead of updating the object.
Ideally I'd like to update multiple columns at once, any advice is appreciated :)
This is an example that works in the ReQL data explorer. You can chain as many filters before the update as you want. I assume this will translate to the C# driver, but I don't have any experience with that.
r.db('database').table('tablename').update({clicks: r.row("clicks").add(1)}).run().then(function(result){ ...
Thanks T Resudek, your answer and a clearer head helped emphasise the need to map the calculation to the property.
Looking at the javadocs for update, it has a HashMap method, which I followed with the C# library, and it works.
var result = r.Db("test").Table("learning").Get("c8c54346-e35f-4025-8641-7117f12ebc5b").Update(row => r.HashMap("fieldNameIntValue",row["fieldNameIntValue"].Add(1))).Run(_conn);
I'd be interested to know if this is the right way or was a better way.
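For the multiple-columns part of the question, chaining extra pairs onto the map should work, assuming the C# driver's HashMap/MapObject supports the same With chaining as the Java driver's with() (anotherIntValue below is just a placeholder field):
// Sketch: update two fields in one call by building a map with two entries.
var result = r.Db("test").Table("learning")
    .Get("c8c54346-e35f-4025-8641-7117f12ebc5b")
    .Update(row => r.HashMap("fieldNameIntValue", row["fieldNameIntValue"].Add(1))
                    .With("anotherIntValue", row["anotherIntValue"].Add(5)))
    .Run(_conn);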

Parsing and inserting bulk data. How to keep performance and do relations?

The data
I have a collection with around 300,000 vacations. Every vacation has several categories, countries, cities, activities and other subobjects. This data needs to be inserted into a MySQL / SQL Server database. I have the luxury of being able to truncate the entire database and start clean every time the parser program is run.
What I have tried
I have tried working with Entity Framework, which is also where my preference lies. To keep Entity Framework's performance up I have created a construction where 300 items are taken out of the vacations collection, parsed and inserted by Entity Framework, and its context is disposed thereafter. The program finishes in a matter of minutes using this method. If I fill the context with all 300k vacations from the collection (and their subobjects) it's a matter of hours.
int total = vacationsObjects.Count;
for (int i = 0; i < total; i += Math.Min(300, (total - i)))
{
    var set = vacationsObjects.Skip(i).Take(300);
    int enumerator = 0;
    using (var database = InitializeContext())
    {
        foreach (VacationModel vacationData in set)
        {
            enumerator++;
            Vacations vacation = new Vacations
            {
                ProductId = vacationData.ExternalId,
                Name = vacationData.Name,
                Description = vacationData.Description,
                Price = vacationData.Price,
                Url = vacationData.Url,
            };

            foreach (string category in vacationData.Categories)
            {
                var existingCategory = database.Categories.Local.FirstOrDefault(c => c.CategoryName == category);
                if (existingCategory != null)
                    vacation.Categories.Add(existingCategory);
                else
                {
                    vacation.Categories.Add(new Category
                    {
                        CategoryName = category
                    });
                }
            }

            database.Vacations.Add(vacation);
        }
        database.SaveChanges();
    }
}
The downside (and possibly dealbreaker) with this method is figuring out the relationships. As you can see when adding a Category I check if it's already been created in the local context, and then use that. But what if it has been added in a previous set of 300? I don't want to query the database multiple times for every vacation to check whether an entity already resides within it.
Possible solution
I could keep a dictionary in memory containing the categories that have been added. I'd need to figure out how to attach these categories to the proper vacations (or vice-versa) and insert them, including their respective relations into the database.
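A rough sketch of that idea, assuming an EF context type called VacationContext (the helper name, the cache and the attach logic are illustrative only):
// One dictionary for the whole run, declared outside the 300-item batch loop.
// EF6: requires using System.Data.Entity;
var categoryCache = new Dictionary<string, Category>(StringComparer.OrdinalIgnoreCase);

Category GetOrCreateCategory(string name, VacationContext database)
{
    if (categoryCache.TryGetValue(name, out var cached))
    {
        // Created and saved by an earlier batch/context: attach it to the current one.
        if (database.Entry(cached).State == EntityState.Detached)
            database.Categories.Attach(cached);
        return cached;
    }

    var category = new Category { CategoryName = name };
    categoryCache[name] = category;
    return category;
}

// Inside the inner loop, instead of querying Categories.Local:
// vacation.Categories.Add(GetOrCreateCategory(category, database));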
Possible alternatives
Segregate the context and the transaction -
Purely theoretical, I do not know if I'm making any sense here. Maybe I could have EF's context keep track of all objects, and take manual control over the inserting part. I have messed around with this, trying to work with manual transaction scopes, to no avail.
Stored procedure -
I could write a stored procedure that handles and inserts my data. I'm not a big fan of this alternative, as I would like to keep the flexibility of switching between MySQL and SQL Server. Also, I would be in the dark as to where to begin.
Intermediary CSV file -
Instead of inserting parsed data directly into the RDBMS, I could export it into one or more CSV files and make use of importing tools such as MySQL's LOAD DATA INFILE.
Alternative database systems
Databases such as Azure Table Storage, MongoDB or RavenDB could be an option. However, I would prefer to stick to a traditional RDBMS due to compatibility with my skillset and tools.
I have been working on and researching this problem for a couple of weeks now. It seems like the best way of finding a solution that fits is by simply trying the different possibilities and observing the result. I was hoping that I could receive some pointers or tips from your personal experiences.
If you insert each record separately, the whole operation will take a lot of time. The bottleneck is the SQL queries between client and server. Each query takes time, so try to avoid using many of them. For huge amounts of data it is much better to process them locally. The best solution is to use a dedicated import tool: in MySQL you can use LOAD DATA, in MSSQL there is BULK INSERT. To import your data, you need a .csv file.
To handle foreign keys correctly, you must populate the tables manually before inserting. If the destination tables are empty, you can simply create the .csv file with predefined primary and foreign keys. Otherwise you can import existing records from the server, update them with your data, then export them back.
Time
Since you can afford to do only INSERTs, one suggestion is to try the Entity Framework Bulk Insert extension. I have used it to save up to 200K records and it works fine. Just include it in your project and write something like this:
context.BulkInsert(listOfEntities);
This should solve (or at least greatly improve on the plain EF version for) the time dimension of your problem.
Data integrity
Keeping everything in one transaction does not sound reasonable (I expect those 300K parent records to generate at least 3M records overall), so I would try the following approach:
1) do the entity insertion using bulk insert.
2) call a stored procedure to check data integrity
If the insertion takes quite long and the chance of failure is relatively high, you can check what is already loaded and have the process skip it:
1) make smaller bulk inserts for a batch of vacation records and all their child records. Ensure that each batch runs in a transaction. A single BULK INSERT runs atomically (no transaction needed); for several, it seems tricky.
2) if the process fails, you have complete vacation data in your database (no partially imported vacation)
3) restart the process, but first load the existing vacation records (parents only). With EF, a faster way is to use AsNoTracking to spare the tracking overhead (which is great for large lists):
var existingVacations = context.Vacation.Select(v => v.VacationSourceIdentifier).AsNoTracking();
As suggested by Alexei, EntityFramework.BulkInsert is a very good solution if your model is supported by this library.
You can also use Entity Framework Extensions (PRO version), which allows you to use BulkSaveChanges and bulk operations (Insert, Update, Delete and Merge).
It supports both of your providers: MySQL and SQL Server.
// Upgrade SaveChanges performance with BulkSaveChanges
var context = new CustomerContext();
// ... context code ...
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(operation => operation.BatchSize = 1000);
// Use direct bulk operation
context.BulkInsert(customers);
Disclaimer: I'm the owner of the project Entity Framework Extensions

Optimizing a RESTful Query in the client

READ FIRST before answering!
I have a RESTful service which wraps around the Entity Framework. Basically, all I did was create a database, add relations between the tables, create an Entity model around this database and finally expose the whole thing as a RESTful *.svc service. This is done, cannot be changed either.
Now I need to query data from it through a client application. All I can access is the service itself. I cannot add any server-side code, even if I wanted to. The server is now locked.
I need to retrieve data from a table called "ProductVoorwaarden" (product conditions) which is linked to three other tables (Rubriek, Categorie and Datatype). This data needs to be returned as XML, with a root node called "PRODUCTVOORWAARDEN" and every record in its own XElement called "REC". Within this REC, there's an attribute for every field in the table plus references to related tables. Here's the code I have right now:
XElement PRODUCTVOORWAARDEN()
{
    XElement Result = new XElement("PRODUCTVOORWAARDEN");
    var Brondata = COBA.Productvoorwaarden.OrderBy(O => O.Code);
    foreach (var item in Brondata)
    {
        COBA.LoadProperty(item, "Rubriek");
        COBA.LoadProperty(item, "Categorie");
        COBA.LoadProperty(item, "Datatype");
        XElement REC = new XElement("REC",
            Attribute("Rubriek", item.Rubriek.Code),
            Attribute("Categorie", item.Categorie.Naam),
            Attribute("Code", item.Code),
            Attribute("Datatype", item.Datatype.Naam),
            Attribute("Eenheid", item.Eenheid),
            Attribute("Naam", item.Naam),
            Attribute("Omschrijving", item.Omschrijving),
            Attribute("UitgebreideTekstVeld", item.UitgebreideTekstVeld),
            Attribute("Veld", item.Veld)
        );
        Result.Add(REC);
    }
    return Result;
}
This code works fine, but it's slow. It reads all ProductVoorwaarden records but then it has to make round-trips to the server again for every record to retrieve Rubriek.Code, Categorie.Naam and Datatype.Naam. (In the database, these relations are set by an auto-incremental Identity field but the XML code uses Code or Naam as reference.)
As you can imagine, every trip back to the RESTful service just eats up more time, which I'm trying to avoid. So is there any way to speed this all up a bit more just on the client-side?
The server is still under development and the next release will take a few more months. As a result, I have to deal with the options that the server provides right now. If there's no way to speed this up without modifying the server then fine. At least I've tried. There are 35 more tables that need to be processed with a deadline in a few days so if it works, then it works.
You could make each of your COBA.LoadProperty calls asynchronous and run them in parallel rather than sequentially. It will make your client code more complex since you'll have to handle the return of each async call and determine when they have all completed and you're ready to build your XML. Assuming each of your 4 REST calls is taking the same amount of time that would reduce the delay by half.
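For example, assuming COBA is a WCF Data Services DataServiceContext (an assumption on my part; it exposes BeginLoadProperty/EndLoadProperty), a rough sketch of firing the three look-ups for one item in parallel could be:
// Kick off the three property loads for one item at the same time, then wait.
var tasks = new[]
{
    Task.Factory.FromAsync(COBA.BeginLoadProperty(item, "Rubriek",   null, null), COBA.EndLoadProperty),
    Task.Factory.FromAsync(COBA.BeginLoadProperty(item, "Categorie", null, null), COBA.EndLoadProperty),
    Task.Factory.FromAsync(COBA.BeginLoadProperty(item, "Datatype",  null, null), COBA.EndLoadProperty)
};
Task.WaitAll(tasks);   // or await Task.WhenAll(tasks) in an async method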
You've probably already double checked but I have come across cases where generating the enumerator from the lambda expression can be expensive. Still it was in the hundreds of milliseconds and I get the impression your delay is larger than that. May be worth checking.
