Today I want to talk a bit about accessing Tableau metadata outside of the web-based implementation you may be most accustomed to. There are several reasons to consider adding this as an option moving forward. The biggest, in my opinion, is the node limit on your queries: the web-based query tool caps results at 20,000 nodes, and depending on how your query is structured, that can be far too restrictive.
Tableau Server Client is a wonderful tool for interacting with all aspects of your Tableau environment. It can help manage dashboards, data sources, connections, and more without the GUI overhead that can sometimes get in the way.
I will be using Tableau Server Client to establish a connection from a Python notebook, send a query in paginated chunks to avoid the node limit, and finally unpack that data into a single tidy dataset ready to be used.
Step 1: Create a Personal Access Token (PAT)
Alright, the first thing we need to do to connect Python to Tableau is create our Personal Access Token. Log into your Tableau portal and navigate to 'My Account Settings' under your initials in the top-right corner of the screen. About halfway down you'll see the setting we're looking for.
Type a name for the token you want to use. Once you're happy with the name, create the token and you will see a new pop-up on screen containing the secret tied to this particular token. Copy this, as we'll need it for the next step. I recommend keeping the pop-up open until you've pasted the secret into the configuration file, since it won't be shown again.
One thing to note: Personal Access Tokens are supposed to last a year, but they may be reset by your company's IT team. Mine have been deactivated about every month or so, which caused some confusion while troubleshooting. So if you come back to an old project that uses this and it no longer works, check whether the token is still active.
Step 2: Create a Config File
Given that this is a notebook file that will likely be visible to others, it is not a great idea to hard-code your credentials for everyone to see and access. To address this, we'll use a separate configuration file that stays local to you.
We'll need a few items to get started:
- the server URL,
- your company's server redirect address,
- the name of the token you created previously,
- and finally, the token secret.
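As a sketch, a minimal config file covering those four items could look like this (the key names here are my own invention; match them to whatever your loading code expects):

```yaml
# config.yaml -- keep this file local; never commit it to version control
tableau:
  server_url: "https://tableau.example.com"
  site_name: "my-site"            # the site's content URL / redirect segment
  token_name: "metadata-notebook"
  token_secret: "<paste-token-secret-here>"
```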
Step 3: Install Tableau Server Client Python API
This is pretty simple. Run the install commands in your active Python kernel and wait for them to finish. You may need to restart the kernel afterward to ensure the package is installed and visible.
Step 4: Load the Config File
Once you have a configuration file set up that holds all of your credentials, we need to load it into our notebook. I am using a .yaml file, but you can use whatever you like best. I assign the fields in the configuration file to their own objects that I can use in the next step when making the connection request.
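A minimal sketch of that loading step, assuming a YAML file with hypothetical key names (`server_url`, `site_name`, `token_name`, `token_secret`) nested under a `tableau` block:

```python
import yaml

def load_tableau_config(path: str) -> dict:
    """Read local Tableau credentials from a YAML config file."""
    with open(path) as f:
        return yaml.safe_load(f)["tableau"]

# Usage in the notebook (file name and keys are up to you):
# creds = load_tableau_config("config.yaml")
# server_url, token_name = creds["server_url"], creds["token_name"]
```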
Step 5: Initialize a Connection to the Tableau Server with Your PAT
Tableau Server Client has a really nice API that handles connecting and querying, which we'll use for this walkthrough. Because we made a Personal Access Token earlier, we'll use it to establish the connection: we pass in the credentials we loaded and send them off to our Tableau server to verify and sign in.
Step 6: Write and Send Your Query
Now that we have the connection set up and we have access to Tableau metadata, we need to write and send our query. For this example, I am going to query some metadata about the formulas used in calculated fields across all of our published data sources. This uses the same GraphQL syntax you're used to from the GraphiQL tool, so it's mostly a copy and paste into the triple-quoted string, with a tiny bit of extra information.
Since our data has a lot of nodes, we'll want to call 'server.metadata.paginated_query()' rather than plain 'server.metadata.query()'. Pagination is controlled through the header section of the query used by Tableau Server Client, where we can specify the batch size. In this case I am setting it to 'first: 1000, after: null', which says to pull batches of 1,000 nodes starting from the beginning ('after: null'); the client then advances the cursor batch by batch until no more nodes are found.
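A sketch of the query and the paginated call, following the pattern in the Tableau Server Client docs — `$first` and `$afterToken` are the variables `paginated_query` advances for you, and the exact connection and field names here are illustrative:

```python
# One page of published data sources plus their calculated-field formulas.
# The pageInfo block is required so paginated_query knows when to stop.
query = """
query publishedDatasources($first: Int, $afterToken: String) {
  publishedDatasourcesConnection(first: $first, after: $afterToken) {
    nodes {
      name
      fields {
        name
        ... on CalculatedField { formula }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}
"""

# With the signed-in `server` from the previous step:
# results = server.metadata.paginated_query(
#     query, variables={"first": 1000, "afterToken": None}
# )
```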
Step 7: Restructure the Paginated JSON Output
The output returned by this query is a densely packed JSON object that isn't the friendliest for data consumption, so we'll have to unpack it a bit first. To get started, I transform the JSON object into a pandas DataFrame, which makes a lot of the unpacking easier in the long run: I call the pandas function 'json_normalize' on the results object created in the previous step and point it at 'pages', since we pulled the data in batches. This unifies all of the batches into a single output to work with.
From there, I select the columns I want to unpack, starting with the connection nodes and ending at the 'fields' column. This creates new columns from the nested data, and the result starts to resemble a traditional dataset. Occasionally we'll want to drop some unwanted fields that come out of this, like an unnamed index column, just to keep things tidy.
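To make the unpacking concrete, here's a sketch on a hand-built mock of the paginated result (the real object comes back from `paginated_query`; the page shape and field names are illustrative, so inspect your own results object first):

```python
import pandas as pd

# Mock of the paginated output: one page per batch, shaped like the
# GraphQL response (illustrative -- check your real results object).
results = {
    "pages": [
        {"data": {"publishedDatasourcesConnection": {"nodes": [
            {"name": "Sales", "fields": [
                {"name": "Profit Ratio", "formula": "SUM([Profit])/SUM([Sales])"}
            ]}
        ]}}},
        {"data": {"publishedDatasourcesConnection": {"nodes": [
            {"name": "HR", "fields": [
                {"name": "Tenure", "formula": "DATEDIFF('year', [Hired], TODAY())"}
            ]}
        ]}}},
    ]
}

# Flatten every page down to the connection nodes: one row per data source.
df = pd.json_normalize(
    results["pages"],
    record_path=["data", "publishedDatasourcesConnection", "nodes"],
)

# Explode the nested fields list (one row per field), then spread each
# field dict into its own columns.
exploded = df.explode("fields").reset_index(drop=True)
fields_df = pd.concat(
    [
        exploded.drop(columns="fields"),
        pd.json_normalize(exploded["fields"].tolist()).add_prefix("field_"),
    ],
    axis=1,
)
```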
Step 8: Save the Output
Once you have the data drilled down to the level you want and in a file format you like to use, all that's left is to save it. You can save it directly as a CSV file, as I am doing here, or load it straight into your data warehouse if that ends up being useful to you. (That could be another post for later.)
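The save itself is one line; `index=False` keeps that unnamed index column out of the file (the DataFrame contents here are just a stand-in):

```python
import pandas as pd

# Stand-in for the flattened fields DataFrame built in the previous step.
fields_df = pd.DataFrame(
    {"name": ["Sales"], "field_formula": ["SUM([Profit])/SUM([Sales])"]}
)

# index=False keeps the unnamed index column out of the CSV.
fields_df.to_csv("calculated_fields.csv", index=False)
```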
Conclusion
I hope this walkthrough was helpful in showing the possibilities of Python and Tableau working together. This only scratches the surface of what we can do when we join these two tools. I know I only showed metadata querying, but you can do so much more, from pulling the actual data loaded into a dashboard all the way to pulling a list of data sources and the credentials they use.