A little background
At my job, I use Tableau Prep, Desktop and Server to create reports and visualize data from our Integrated Library System (ILS). Each member library uses this system to complete transactions and store information about library materials and patrons. The ILS uses a PostgreSQL database, and we can only access read-only views of the data. The views in question have very limited data retention. The default retention period is far too short for annual reporting, so we had to figure out a way to offload the data.
In this article, I will discuss our solution, which incorporates the following Tableau products:
- Tableau Desktop
- Tableau Server
- Tableau Prep
I believe there are ways to achieve similar results without all of these services, but I am familiar with Tableau, and our organization is already paying for these services.
My first solution
My starting point on this project was to address the default data retention in the PostgreSQL views. By default, the retention is set to store only about a week's worth of anonymized circulation data. The only use for this data was for daily checkout reports. So, we started by increasing the database's circulation data retention to 380 days. This would let us collect an entire year's worth of data, plus an additional 15 days of buffer to run reports or queries. We were covered as long as we remembered to run our reports or queries in this 15-day window. The data was live, though, so once it aged out there was no way to get it back.
My second solution
Using Tableau Desktop's export feature, I could create static reports. These static reports include an entire year's worth of data. Each year, I could create a new export and a new report. The problem I ran into was that if I ever wanted to update any of the reports, I had to be very careful to avoid updating the data source. Tableau Desktop likes to run data source refreshes whenever you make any changes to the worksheets. I started off saving the exports to my local computer and then moved on to storing them as a published data source on our Tableau Server.
Again, this became cumbersome, as I could not automate connecting the reports on our Tableau Server to these exported files. I had to set myself a reminder on a certain date each year to pull an export and reread my notes on how to avoid overwriting the existing exports.
My third solution
As I was working through the previous two solutions, I thought it would be great if we could pull an entire export of the circulation data periodically and offload it to a separate database. At the time, I was a newbie when it came to ETL (extract, transform, load). I initially relayed the idea to our systems administration team, thinking they could set up a cron job to handle the process. Although it was a generally accepted idea, they never had the time to sit down and find a solution. So, I thought I would be stuck with my second solution and its limitations.
Then along came Tableau Prep. Announced in 2018, Tableau Prep provided a great set of simple ETL tools. Since our organization is a non-profit, we got a Tableau Prep license for extremely cheap. Initially, I was just using the tool to clean up and optimize my data sources. Also, around this time, I was in a database administration class and learning to use Microsoft SQL Server. Then, one day, something clicked.
I realized I could set up a Microsoft SQL Server database to store all my exports. I could use Tableau Prep to combine my exports into one file and then manually load them into the database. This was a game changer for retaining our circulation history, and it was how we operated until I learned more about automating Tableau Prep flows using Tableau Server and Incremental Refresh.
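To give a sense of what the destination side looks like, here is a minimal sketch of a table that could hold the offloaded circulation data. The table and column names are hypothetical; the real structure depends on the fields your ILS views expose.

```sql
-- Hypothetical destination table for offloaded circulation data.
-- Column names are illustrative only; match them to your ILS views.
CREATE TABLE dbo.circulation_history (
    circ_id        BIGINT      NOT NULL PRIMARY KEY,  -- transaction ID from the ILS view
    checkout_date  DATETIME2   NOT NULL,              -- also usable later as an incremental key
    item_id        BIGINT      NULL,
    patron_branch  VARCHAR(50) NULL,
    item_branch    VARCHAR(50) NULL
);
```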
Introducing Tableau Prep Incremental Refresh
After several months, I noticed the new option to output to a database table in Tableau Prep. I immediately thought about how to automate the circulation exports I had been doing manually at the end of every year. With the output to a database table, I could post the Tableau Prep flow file to our Tableau Server and have it set to run every day. Remember that I only had the ILS server set to retain 380 days, so I didn’t want to perform full refreshes of the data each day. That would wipe out my history, and the Microsoft SQL database would just be a copy of the ILS database.
Incremental Refresh in Tableau Prep to the rescue. Incremental refresh reads the destination table using a date or ID field, determines which rows exist in the source database but not yet in the destination, and appends only those new rows. Since I could schedule this to run each day, I would have an automated system that let our historical data grow as much as we needed.
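Tableau Prep handles the comparison between the PostgreSQL source and the SQL Server destination internally, but conceptually the append step behaves like the sketch below. It assumes the latest ILS extract has been staged in a hypothetical dbo.circ_staging table and that checkout_date is the incremental key.

```sql
-- Conceptual equivalent of an incremental refresh (not Tableau Prep's actual code).
-- Append only the rows newer than what the destination already holds.
INSERT INTO dbo.circulation_history (circ_id, checkout_date, item_id, patron_branch, item_branch)
SELECT s.circ_id, s.checkout_date, s.item_id, s.patron_branch, s.item_branch
FROM dbo.circ_staging AS s
WHERE s.checkout_date > (
    SELECT COALESCE(MAX(checkout_date), '1900-01-01')
    FROM dbo.circulation_history
);
```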
In my initial setup, I connected to the Microsoft SQL Server with Windows authentication. I first ran the flow as a full refresh to set the stage for the incremental refreshes. The full refresh worked fine in the Tableau Prep program, so I published the flow to the Tableau Server and set it to run on a schedule. After a few days, I returned to the Tableau Server to see if the refreshes were running, but I was met with errors. I tracked down the problems and found that, to run the incremental refresh on the server, I needed to use a regular database user account (SQL Server authentication) rather than Windows authentication.
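If you need to set up a SQL-authenticated account for a flow like this, a minimal version looks something like the following. The login name, password, database name, and granted permissions here are all placeholders; adjust them to your environment and security policies.

```sql
-- Hypothetical SQL Server authentication setup for a Tableau Prep flow.
CREATE LOGIN tableau_prep_flow WITH PASSWORD = 'ChangeMe!StrongPassword1';
GO
USE CirculationHistory;  -- hypothetical database name
GO
CREATE USER tableau_prep_flow FOR LOGIN tableau_prep_flow;
-- The flow only needs to read existing rows and append new ones.
GRANT SELECT, INSERT ON dbo.circulation_history TO tableau_prep_flow;
GO
```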
A word of warning: if you start with Microsoft SQL Server Express, you can create a free database of up to 10 GB in size. I started with this setup and, after almost a year, started to get close to the limit. So, we upgraded the database and are now paying a nominal annual fee for a standard Microsoft SQL Server license. I would advise either using a different database, such as PostgreSQL or MariaDB, or planning ahead so you know when your data might exceed the SQL Server Express limit.
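One simple way to keep an eye on that limit is to check the data file size periodically (the Express cap applies to data files, not the log). A query along these lines works in any SQL Server edition:

```sql
-- Report the current size of each data file in MB.
-- 'size' is stored in 8 KB pages, hence the conversion.
SELECT name        AS file_name,
       type_desc,
       size * 8 / 1024 AS size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
```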
Conclusion
At this point, I have two separate ILS servers and use this incremental refresh and circulation data offload procedure to store extended historical data for both. We have about 3 years' worth of data for one server and 1.5 years for the other. The export jobs do not require much maintenance. I check on them in Tableau Server every couple of weeks to ensure there aren't any connectivity errors. The good thing is that, because of our 380-day ILS database retention, as long as I resolve any errors within that window, the incremental refresh will catch up and no data will be lost.
I now know more programmatic ways to do this same thing. However, because we don’t have to pay much for our Tableau licenses, my solution works perfectly fine for completing the job. Please let me know if you have any clarifying questions about the process I have set up.