In this blog we are going to cover how to improve the performance of your Helical Insight instance. Following these steps can significantly improve performance.
- Cache preparation and speeding up the cache preparation: As soon as a user creates a database connection on our datasource page, our application starts preparing the cache. This cache contains the list of all the catalogs, schemas and table names which are present in that specific connection.
- Speeding up fetching of the tables list: As mentioned in the above point, as soon as a user creates a datasource, the Helical Insight application starts fetching the list of tables for that specific datasource. Processes run to fetch the list, and by default each process/thread fetches 25 table names. This count can be increased so that more table names are fetched in a single process/thread.
There is a file called “project.properties” located at “..\hi\apache-tomcat-7\webapps\hi-ee\WEB-INF\classes”. Open this file in edit mode and search for “metadataThreadsTableThreshold=”. It is generally present on line no 32, with a default value of 25. You can increase this up to 200, though keeping it at 150 is advisable.
Once done, please restart the server. The process will then fetch 150 table names in a single process/thread.
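For reference, the edited property in project.properties would look like the fragment below (the property name and values are from the steps above; the rest of the file is omitted):

```properties
# ..\hi\apache-tomcat-7\webapps\hi-ee\WEB-INF\classes\project.properties
# Number of table names fetched per process/thread during cache preparation.
# Default is 25; values up to 200 are supported, 150 is the advised setting.
metadataThreadsTableThreshold=150
```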
- Migration from Derby to another database: By default our application comes with a Derby database, which is a lightweight file-based database. This database stores details of the cache, cached data, user role profiles, etc.
Derby is not an enterprise-level database. By migrating from Derby to another DB such as MySQL, PostgreSQL, or SQL Server, it is possible to improve performance as well.
- Optimal hardware: Hardware sizing can be decided optimally based on the number of users accessing the application concurrently, the amount of data being transferred, and the number of users using export. The sizing will also depend on whether any middleware like Apache Drill, Dremio, etc. is being used. Users who export data also consume a large amount of memory, since exporting raw data requires the entire dataset to be exported.
- JVM: Based on the hardware allocated to Helical Insight, it is also important to allocate appropriate JVM memory to the application.
Allocate as much JVM memory to the application as the hardware allows.
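As a sketch of how this can be done on a standard Apache Tomcat setup (the heap sizes below are illustrative examples only, not recommendations; tune them to your own hardware, leaving headroom for the OS):

```shell
# ..\hi\apache-tomcat-7\bin\setenv.sh -- create this file if it does not exist
# (on Windows, use setenv.bat with "set CATALINA_OPTS=..." instead).
# -Xms sets the initial heap size, -Xmx the maximum heap size for the JVM.
export CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx6g"
```

Tomcat's startup scripts automatically source setenv.sh, so the options take effect on the next server restart.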
- Caching: We provide a caching functionality with a default cache timeout of 24 hours. If your reporting DB/DW receives new data less frequently than every 24 hours, you can increase the timeout further.
- Raw Data Export: Charts generally use aggregated datasets, so exporting them does not put much load on the BI server or DB server. However, there may also be tabular reports with very large amounts of data and no aggregation applied. When viewed at the frontend, pagination shows only the first 10 records, and the remaining records are fetched and loaded on subsequent pages.
- Load Balancing: The Helical Insight application also supports load balancing. You can have multiple servers between which you distribute your load. You can refer to the below link to understand how to implement load balancing across multiple servers.
- Database Tuning: Generally, whenever a user views a report, we depend on the DB to return the aggregated response that is shown at the frontend. Hence, having a data warehouse or an optimized reporting database with indexes etc. will automatically help ensure the queries are very fast, and thus the reports will also render very fast.
- In-Memory Mode: Helical Insight can also be used in an in-memory mode, wherein the dataset required for reporting is loaded from your transactional DB into a fast local columnar engine (like Presto, Spark, etc.), and Helical Insight in turn connects to this engine. Since everything is held locally in RAM, performance is extremely fast.
So if you make a DB connection that has a lot of tables (say 5,000 or 10,000 tables) and then immediately visit the metadata page, the Helical Insight application may not have had sufficient time to prepare the cache; the tables might not be visible and you might get errors like “could not obtain cache”. You should therefore wait until the cache is ready before creating metadata from the metadata page.
The ideal way is to create a new user at your DB level which has access only to the few tables required for creating the reports and dashboards. Since the cache then needs to be prepared for fewer tables, cache preparation and cache refresh will always be very fast. Even if you restart the server (in which case the cache gets removed) or refresh the cache, performance on the metadata page will remain very fast.
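As a minimal sketch, assuming a MySQL reporting database (the user, password, database, and table names below are hypothetical placeholders), such a restricted user could be created like this:

```sql
-- MySQL example; adapt the syntax for PostgreSQL, SQL Server, etc.
CREATE USER 'hi_reporting'@'%' IDENTIFIED BY 'choose-a-strong-password';

-- Grant read access only on the handful of tables needed for reports/dashboards.
GRANT SELECT ON sales_db.orders    TO 'hi_reporting'@'%';
GRANT SELECT ON sales_db.customers TO 'hi_reporting'@'%';
```

Using this user in the Helical Insight datasource connection means the cache only ever covers these tables.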
Please refer to the blog Overview of Home Page in Helical Insight 3.1; it has a section called Cached Datasources where you can learn more about the cache and the different cache statuses.
You can refer to the below link to know more about doing the same.
The first thing to do is size the hardware in line with the amount of data and the number of users. Please make sure the server allocated is dedicated hardware. A Linux server is generally preferable to a Windows server, since it does not run a GUI and thus leaves more resources for the application.
You can always reach out to firstname.lastname@example.org for help understanding the correct hardware sizing for your use case.
A cache timeout refresh option is also available at the report level. See: Usage of Cache Refresh Script.
Note: Disabling the cache, or refreshing the cache for each and every report at very short intervals, will lead to performance degradation.
However, if a user clicks on export for such a report, the entire dataset must be fetched from the DB server and then downloaded to the local machine. In those cases, this can result in performance issues. Hence, data export should be limited to only a few users, or scheduled via HWF (Helical Workflow) for a time when the load is lowest.
You can reach out to us at email@example.com to know more about any of the steps mentioned above.