There are many reasons you may need to move or transform your application data: to improve database performance, to consolidate data, to let other teams in your organization query data safely, or for reasons specific to your use case. Setting this up yourself may seem straightforward, but it can be tedious and difficult to get right without impacting your production database. With PlanetScale Connect (now in Beta), you can easily perform ELT (Extract, Load, Transform) actions on your data to fulfill your application needs.
With PlanetScale Connect, you can integrate with existing ELT platforms to extract data from your PlanetScale database and safely load it into other destinations for analysis, transformation, and more. For the initial release of this feature, we will support Airbyte Open Source as the ELT tool of choice, with plans to expand on this in the future.
Within Airbyte, you’ll be able to select your PlanetScale database as a source. Then, you’ll choose a destination from hundreds of connectors (see the full list of Airbyte connectors), including Google BigQuery, AWS Redshift, Snowflake, and more. During this configuration, you can perform transformations on your data before loading it into its final destination. This gives you complete control to migrate your data, transform it, and load it into a new destination with just a few clicks and configurations.
For additional context into our PlanetScale Connect launch, let’s examine some key benefits of implementing an established ELT pipeline.
Offloading your application data to a more suitable data store improves how you maintain and query historical data. For example, your production application may only need readily available data from the previous two months. This means you can offload older data to a different data store, where it can be queried without impacting the performance of your main application.
Not every piece of data stored in a database is needed forever. In these cases, ELT provides a prime opportunity to get rid of unnecessary data during the transformation phase, before it is loaded into the new data store.
In addition to consolidating data, you may also need to enrich data as part of the transformation process. For example, you may pull data from internal or external APIs to add context and detail to your existing data.
Once you have created an ELT pipeline that generates the desired outcome, no further manual intervention is necessary for the process to continue. Your team can keep working on the highest priority items while your data pipeline continues to run.
By leveraging an ELT pipeline, you can keep your data consistent and accurate. This provides the flexibility for upstream application schemas to change while maintaining a consistent format for downstream applications.
For PlanetScale Connect to function as a source for an ELT platform, it needs to address three key issues.
Schema discovery
ELT sources should support discovering the schema across all keyspaces in a PlanetScale database and returning it in the various formats that ELT tools expect (specially formatted JSON documents in most cases).
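To make the idea of schema discovery more concrete, here is a minimal sketch in Python. It is not the PlanetScale Connect implementation. The input dictionary, the type mapping, and the `discover_catalog` function are all hypothetical, and the output shape is only loosely modeled on the stream catalogs that ELT tools such as Airbyte consume.

```python
import json

# Hypothetical mapping from MySQL column types to JSON Schema types.
# Anything not listed here falls back to "string" in this sketch.
MYSQL_TO_JSON_TYPE = {
    "int": "integer",
    "bigint": "integer",
    "decimal": "number",
    "varchar": "string",
    "text": "string",
    "datetime": "string",
}


def discover_catalog(keyspaces):
    """Build a JSON-serializable catalog from {keyspace: {table: {column: type}}}."""
    streams = []
    for keyspace, tables in sorted(keyspaces.items()):
        for table, columns in sorted(tables.items()):
            properties = {
                column: {"type": MYSQL_TO_JSON_TYPE.get(sql_type, "string")}
                for column, sql_type in columns.items()
            }
            streams.append(
                {
                    "name": table,
                    "namespace": keyspace,
                    "json_schema": {"type": "object", "properties": properties},
                    "supported_sync_modes": ["full_refresh", "incremental"],
                }
            )
    return {"streams": streams}


# Made-up schema description, used only for illustration.
EXAMPLE_KEYSPACES = {
    "commerce": {
        "orders": {"id": "bigint", "total": "decimal", "created_at": "datetime"},
        "customers": {"id": "bigint", "email": "varchar"},
    }
}

if __name__ == "__main__":
    print(json.dumps(discover_catalog(EXAMPLE_KEYSPACES), indent=2))
```

Each table becomes one stream, namespaced by its keyspace, so a tool reading the catalog can tell which tables exist and what shape their rows have before it requests any data.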
Initial data dump
ELT sources should be able to efficiently return a full data dump of a PlanetScale database. This is incredibly important considering the negative impact an inefficient solution would have on a production database.
Incremental data synchronization
ELT sources should be able to handle the concept of “incremental sync”, in which the source maintains a cursor that describes where and when the data was last synced. That cursor is then used to query only the data that has changed or been added since the previous sync.
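The cursor idea can be sketched in a few lines of Python. This is an illustration only, not how PlanetScale Connect tracks its position. It assumes a hypothetical `users` table with an `updated_at` column that increases on every write, and it uses an in-memory SQLite database as a stand-in for a real source.

```python
import sqlite3


def incremental_sync(conn, cursor):
    """Return rows changed after `cursor`, plus the advanced cursor value."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (cursor,),
    ).fetchall()
    # If nothing changed, keep the old cursor so the next sync starts from the same point.
    new_cursor = rows[-1][2] if rows else cursor
    return rows, new_cursor


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "ada", 100), (2, "grace", 110)],
    )
    # First sync starts from cursor 0 and therefore behaves like a full dump.
    first, cursor = incremental_sync(conn, 0)

    # One row is updated and one is added after the first sync.
    conn.execute("UPDATE users SET name = 'ada l.', updated_at = 120 WHERE id = 1")
    conn.execute("INSERT INTO users VALUES (3, 'alan', 130)")

    # Second sync returns only the rows touched since the saved cursor.
    second, cursor = incremental_sync(conn, cursor)
    conn.close()
    return first, second, cursor


if __name__ == "__main__":
    first, second, cursor = demo()
    print(first, second, cursor)
```

A timestamp cursor like this one can miss rows that share the exact cursor value with an earlier sync, so a production source would need a cursor with stronger ordering guarantees.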
If you’d like to try out PlanetScale Connect or just want to learn more, refer to the PlanetScale Connect docs. In the meantime, if you have any feedback on the feature, please let us know.