Inserting PyDataset Records into a Database

Description

See the step-by-step creation of a reusable script function to insert PyDataset data into a database.

Video recorded using: Ignition 8.1

Transcript

[00:00] In this lesson, we'll demonstrate how to create a new Python function in Ignition that inserts the contents of a PyDataset into an existing database table. For our purposes, this PyDataset will be the output of the CSV-reading function created in a prior lesson, making this function one more step in the larger process of reading CSV data into a database using scripting. Since this function involves a database, we'll make use of the Database Query Browser to iteratively develop and test our function, so let's invoke that now by clicking Tools, Database Query Browser. The ultimate destination for our function will be a reusable project scripting library, which allows the code to be reused in other places. However, we'll use the Script Console as a development workspace to write and test our code, so let's invoke it by clicking Tools, Script Console, and let's also maximize it, since we'll be needing some screen space for this function.

[01:13] Let's start by defining the function interface. We'll call our function populateDbTableFromPyDataset, and we'll make our font a bit bigger as well. The function inputs assume we know three things to start with: a PyDataset containing the data records we're trying to insert, the destination database table, which is assumed to already exist, and a database connection name. Next, let's plan out the execution flow; note that all following lines must be indented. We'll want to clear the table each time, set up a basic query string, and then loop over all PyDataset records to do the inserts. We want to clear the table contents each time so that the table tracks the input PyDataset: the first-column primary key must be kept unique, and we want to fully replace the table's contents, not just append to them.
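A minimal sketch of that interface and plan might look like the following. The function and parameter names here are assumptions based on the narration, not a verbatim copy of the video's code.

def populateDbTableFromPyDataset(pyData, dbTable, db):
    """
    Insert the records of a PyDataset into an existing database table.

    pyData  - PyDataset holding the records to insert
    dbTable - name of the destination table (assumed to already exist)
    db      - database connection name
    """
    # Planned flow:
    # 1. Clear the table so its contents track the input PyDataset.
    # 2. Set up a basic INSERT query string.
    # 3. Loop over all PyDataset records and do the inserts.
    pass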

[02:19] Let's create the query statement, using Python string formatting syntax to embed the database table input, and then add the system library call to execute this statement. Next, let's create the basic insert query, which is central to this function. Here the question marks are data value placeholders, and we'll return to these shortly.
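Inside the function body, those two steps might look roughly like this. The TRUNCATE statement and the fixed count of five placeholders mirror the narration's example table; both are sketch assumptions, not a verbatim copy.

    # Clear the table so it always mirrors the input PyDataset.
    system.db.runUpdateQuery("TRUNCATE TABLE %s" % dbTable, db)

    # Basic insert query; each ? is a placeholder for one column value.
    queryStr = "INSERT INTO %s VALUES (?,?,?,?,?)" % dbTable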

[03:15] We'll get our data by looping over the entire PyDataset, and then to execute the query, we'll use another system library function that takes the query, the data, and the database connection name. Though this would work, it'd be hugely inefficient, since we'd be inserting one row at a time, over and over. Let's improve upon this basic structure with some initial refinements. For starters, let's carve out the placeholders and replace them in the query string. We won't always have exactly five parameters, so let's generalize this.
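As a sketch, that naive loop might read as follows; it works, but issues one prepared insert per row.

    # Naive version: one prepared INSERT per PyDataset row.
    for row in pyData:
        system.db.runPrepUpdate(queryStr, list(row), db)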

[04:07] We can use the bottom of the Script Console to do some experimenting and testing; the code above it won't execute unless called, since it's just a function definition. Let's use our prior function to get some data to work with and find how many data fields we do have. Here is one placeholder, and here are any number of them. Then to comma-separate them, we can use Python's join syntax, and finally enclose all that in parentheses. Let's print these to make sure they're correct. Looks good. Since it all works, let's combine these into one compact statement and move this new expression back into our function, as well as the actual count of parameters.
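Those Script Console experiments might look like this. The helper toolbox.pyDatasetFromCSV and the file path are hypothetical stand-ins for the prior lesson's CSV-reading function.

pyData = toolbox.pyDatasetFromCSV("C:/data/products.csv")  # hypothetical helper and path
numCols = pyData.getColumnCount()

print("?")                                    # one placeholder
print(["?"] * numCols)                        # any number of them
print(",".join(["?"] * numCols))              # comma separated
print("(" + ",".join(["?"] * numCols) + ")")  # enclosed in parentheses

# Combined into one compact statement:
pholders = "(%s)" % ",".join(["?"] * numCols)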

[05:29] Then finally, we can remove most of these test expressions at this point. Next, to make the database insertions more efficient, let's aggregate all the data and do one insert at the end. First, in the query string, add one more level of indirection: in this context, pholders is no longer the variable, but instead a named argument. Next, ahead of the for loop, define two empty lists to hold the data and the placeholder arguments. Then inside the loop, for the data, use extend to create one long list, and for the placeholders, since each insert group is already formed, append them. Then move the update query outside and after the loop, and use the aggregated query data in the insert.
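Pulling these refinements together, the improved function might look roughly like this sketch; identifier names are assumptions based on the narration.

def populateDbTableFromPyDataset(pyData, dbTable, db):
    numCols = pyData.getColumnCount()

    # Clear the table so it tracks the input PyDataset.
    system.db.runUpdateQuery("TRUNCATE TABLE %s" % dbTable, db)

    # One "(?,?,...)" group per row; pholders is now a *named*
    # formatting argument, filled in only after the loop.
    pholders = "(%s)" % ",".join(["?"] * numCols)
    queryStr = "INSERT INTO " + dbTable + " VALUES %(pholders)s"

    queryData = []  # all row values, flattened into one long list
    queryArgs = []  # one placeholder group per row

    for row in pyData:
        queryData.extend(list(row))  # extend: one long list of values
        queryArgs.append(pholders)   # append: groups stay grouped

    # One bulk insert using the aggregated query and data.
    system.db.runPrepUpdate(queryStr % {"pholders": ",".join(queryArgs)},
                            queryData, db)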

[06:29] Then finally, update the original query string to use the named argument. This might be a good place to spot-test what we have so far. We'll set up the two needed database parameters, though it should be noted that the database table name should never come from direct user input text, to help ensure database security. Then call the function above using them.
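The spot test might read like this; the connection and table names are assumptions for illustration.

# Never build dbTable from direct user input text, for database security.
dbTable = "products"  # assumed existing table
db = "MyDatabase"     # assumed database connection name

populateDbTableFromPyDataset(pyData, dbTable, db)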

[07:10] When we return to the Database Query Browser, if we select the database connection, double-click the products table, and execute the select-all query, our script changes seem to run without any issue so far, so let's move on. What we have so far is a bulk insertion of all PyDataset records at once. While this is fine for reasonably sized sets of data, trying to insert a massive number of records all at once might not be desirable or efficient from a database standpoint. So as a final update, let's batch up the inserts into fixed-size groupings to allow for user tuning. For example, maybe we want to insert in groups of ten, or a hundred, or a thousand at a time. We'll add one more function input for this, and also down below for testing. Next, we need to know how many records we are working with.

[08:06] We'll add that alongside the columns count near the beginning. Then let's use this as a check to do no inserts if there is no data. We'll apply this check to all lines after the truncate; we still clear the table, since we want it to reflect the state of the input PyDataset. Instead of handling all records at once, we'll now work with specific fixed-size slices of the PyDataset. Let's plan out the updates: we'll add indices for the first N records, update an earlier comment to reflect N-sized batches, insert those N rows, advance the indices for the next N records, and finally, address any remaining records.
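The first two of those changes, as a sketch of the relevant fragment inside the function:

    numRecs = pyData.getRowCount()  # added alongside the columns count

    # The truncate runs unconditionally, so an empty PyDataset
    # still clears the table.
    system.db.runUpdateQuery("TRUNCATE TABLE %s" % dbTable, db)

    if numRecs > 0:
        pass  # query construction, batching loop, and inserts go here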

[09:09] So, making the noted updates, we'll start with the index bounds of the first insert. We'll add an outer while loop and tab over all contained lines, not including the leftover records. Adjust the actual for loop indices, and advance both indices to the next N records. Finally, we need to handle any leftover records. All we need to do is advance the max index to the actual number of records, then simply copy the steps we already have above, and outdent as required. Now we do need to differentiate between the versions of the query string.

[10:11] The first modified one becomes queryStr1 in two places, and the copied one becomes queryStr2 in two places. Finally, we'll update the last insert to reflect a partial insert. Let's once again spot-test what we have created; in particular, let's confirm the batched inserts into the database. If we reopen our CSV file, let's add two more lines. The data can be random, but the primary key, the first field, must be unique, so we can just copy and paste the first two lines and edit only the first two values. We'll save our changes and minimize this window. Then let's add prints of the queryArgs and queryData ahead of each of the inserts.
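Putting all the described updates together, the finished function might look roughly like this sketch reconstructed from the narration; the identifier names and the TRUNCATE statement are assumptions.

def populateDbTableFromPyDataset(pyData, dbTable, db, maxRecs):
    numCols = pyData.getColumnCount()
    numRecs = pyData.getRowCount()

    # Always clear the table, even for an empty PyDataset,
    # so it reflects the state of the input.
    system.db.runUpdateQuery("TRUNCATE TABLE %s" % dbTable, db)

    if numRecs > 0:
        pholders = "(%s)" % ",".join(["?"] * numCols)
        queryStr1 = "INSERT INTO " + dbTable + " VALUES %(pholders)s"

        # Index bounds of the first maxRecs-sized batch.
        minIdx = 0
        maxIdx = maxRecs

        # Insert full maxRecs-sized batches.
        while maxIdx <= numRecs:
            queryData = []
            queryArgs = []
            for row in range(minIdx, maxIdx):
                queryData.extend(list(pyData[row]))
                queryArgs.append(pholders)
            system.db.runPrepUpdate(queryStr1 % {"pholders": ",".join(queryArgs)},
                                    queryData, db)
            # Advance both indices to the next maxRecs records.
            minIdx += maxRecs
            maxIdx += maxRecs

        # Handle any leftover records with one final partial insert.
        maxIdx = numRecs
        if minIdx < maxIdx:
            queryData = []
            queryArgs = []
            for row in range(minIdx, maxIdx):
                queryData.extend(list(pyData[row]))
                queryArgs.append(pholders)
            queryStr2 = "INSERT INTO " + dbTable + " VALUES %(pholders)s"
            system.db.runPrepUpdate(queryStr2 % {"pholders": ",".join(queryArgs)},
                                    queryData, db)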

[11:17] If at the bottom we set maxRecs to 3, we should see one insert of three records and a remainder insert of one record, which is just what we see when we execute. Our function now looks complete, and the only remaining step is to relocate it into the project library. So we'll remove the print statements, cut all lines of the function, restore our Designer, open the toolbox project library script, paste the function, and save the changes. Then for a final test, we can reopen the Script Console, reset maxRecs to, say, 1000, and replace the last function call with its toolbox version.

[12:24] And notice it's very easy to script this method now, since we chose input parameter names that align with the function input names. Now we can execute the script, go back to our Query Browser, rerun the select-all query, and inspect the products table contents in the Query Browser against the contents of our CSV file. We've now shown the entire process of creating a reusable utility function, which inserts the contents of an input PyDataset into a database in user-specified insert batches.
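That final Script Console test might read like this; the helper name, file path, table, and connection names remain the hypothetical stand-ins used above.

pyData = toolbox.pyDatasetFromCSV("C:/data/products.csv")  # hypothetical CSV reader

dbTable = "products"  # assumed existing table
db = "MyDatabase"     # assumed database connection name
maxRecs = 1000

toolbox.populateDbTableFromPyDataset(pyData, dbTable, db, maxRecs)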
