savvyProfiler
User Guide v1.4
dwSavvy
This document walks through the use of savvyProfiler step by step.
Step 1: Go to http://localhost:444/dwsavvy/dwsavvy.php. Note: This assumes that Apache is configured to run PHP and is listening on port 444.
_files/image001.jpg)
Step 2: Click on the input menu. This menu is used for metadata input and for several other configuration settings.
Step 3: Enter the metadata of the file, table, dimension, or fact to be profiled. Click on the “Meta Unit” item in the “selection panel” on the left. Note: A wizard for this task is planned for the next release.
_files/image003.jpg)
Step 4: Create a new Meta Unit. Any object entered into the metadata repository is defined as a Meta Unit. To profile a file or a table, it must first be entered into the metadata repository as a Meta Unit. Click on the “add new row” link on the right panel, fill in the form, and hit the “create” button. Make sure “profile” is set to 'y'. To return to the list view, click “back” in the left corner of the right panel.
_files/image004.jpg)
Step 4a: To edit a Meta Unit, click its mu_id value; the mu_id of each Meta Unit acts as a link to its “edit form”. Here's an example of a Meta Unit that's a file, followed by one that's a table. Note: If the file path is not specified, savvyProfiler looks in the default server directory, which is configured in “app_constants”. Look for the constant with name = 'extract_area', which specifies the default directory from which files for profiling are picked up. Ensure Profile = 'y'.
_files/image005.jpg)
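The default-directory rule from Step 4a can be sketched in a few lines. This is only an illustration of the fallback logic, not savvyProfiler's own code: the function name, the `APP_CONSTANTS` dictionary, and the directory value are all hypothetical; only the constant name 'extract_area' comes from this guide.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the "app_constants" settings.
# The directory value below is made up for illustration.
APP_CONSTANTS = {"extract_area": "/data/extract_area"}


def resolve_profile_file(file_name, file_path=None, constants=APP_CONSTANTS):
    """Return the full path of a file Meta Unit.

    If no file path is given (None or empty), fall back to the
    directory named by the 'extract_area' constant.
    """
    base = file_path if file_path else constants["extract_area"]
    return str(PurePosixPath(base) / file_name)
```

With these assumed values, a Meta Unit named `customers.dat` with no file path resolves to a path under the 'extract_area' directory, while one with an explicit file path is looked up there instead.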
Step 4b: Example of how a “table” Meta Unit is set up.
_files/image006.jpg)
Step 5: Now that the table/file Meta Unit is set up, its columns need to be entered. Note: If the file format is fixed, set “format fixed” to 'y', otherwise to 'n'. Click on “mu_column” in the “selection list” on the left. On the right panel, select the “Meta Unit” (the file/table whose columns need to be defined) from the top section, and hit “continue”. Click on the “add new row” link to add new columns. Note: For delimited columns, each field can have a different delimiter.
_files/image007.jpg)
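To make the two file formats from Step 5 concrete, here is a minimal sketch of how a record might be split into columns in each case. It is an illustration under assumptions, not the parser savvyProfiler uses: the function names are hypothetical, and the sketch assumes that each delimited column carries its own terminating delimiter and that the last column runs to the end of the record.

```python
def split_delimited(record, delimiters):
    """Split a record in which column i is terminated by delimiters[i].

    The last column takes whatever remains, so len(delimiters) is one
    less than the number of columns.
    """
    fields = []
    rest = record
    for delim in delimiters:
        field, sep, rest = rest.partition(delim)
        if not sep:
            raise ValueError(f"delimiter {delim!r} not found in {record!r}")
        fields.append(field)
    fields.append(rest)
    return fields


def split_fixed(record, widths):
    """Split a fixed-format record into columns of the given widths."""
    fields = []
    pos = 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields
```

For example, `split_delimited("101|Smith,John;NY", ["|", ",", ";"])` yields four fields, and `split_fixed("101SmithNY", [3, 5, 2])` yields three.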
Step 6: Click on “add new row” to add a new column to the Meta Unit.
Step 7: To edit a column attribute, click on its col_id link to get to the column input form.
_files/image009.jpg)
Step 8: Because savvyProfiler runs on the scalable, cluster-based dwSavvy platform, data profiling work can be organized and managed flexibly. To this end, dwSavvy allows jobs to be grouped. Jobs within a group may or may not depend on each other; all jobs within a group that can run in parallel are automatically run in parallel. Groups can be prioritized (as can the jobs within each group) and can have dependencies (dg_level) on each other. For instance, all high-priority data profiling jobs can be put in one group (say, “highPriGroup”) and the rest in another. To reach the screen for defining groups, select “dep_group” from the “selection list” on the left panel.
_files/image010.jpg)
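The grouping rules in Step 8 can be pictured with a small sketch. It is not dwSavvy's scheduler: the function names and dictionary keys other than dg_level are hypothetical, and it assumes that a lower dg_level runs first and that a lower priority number means more urgent, neither of which this guide states.

```python
def order_groups(groups):
    """Order groups by dependency level, then by priority.

    Assumption: lower dg_level runs first, and a lower priority
    number means a more urgent group.
    """
    return sorted(groups, key=lambda g: (g["dg_level"], g["priority"]))


def parallel_waves(jobs):
    """Arrange the jobs of one group into waves that can run in parallel.

    jobs maps each job name to the set of job names (in the same
    group) that it depends on. Every job in a wave depends only on
    jobs from earlier waves.
    """
    remaining = {name: set(deps) for name, deps in jobs.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("circular or unknown dependency among jobs")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves
```

Given three jobs where only one depends on another, the two independent jobs land in the first wave and the dependent job in the second.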
Step 9: Assign your Meta Unit a dependency group. Select “mu_dep_group” from the selection panel on the left. Select your Meta Unit from the top right panel and hit “continue”. To assign a new dependency group, click on “add new row” in the bottom panel.
_files/image011.jpg)
Click the mudg_id hyperlink to edit a dependency, or the “add new row” link to add a new one.
_files/image012.jpg)
Step 10: Create a job by attaching your Meta Unit to a dwSavvy process called “jobProfiler”. Note: The platform is flexible; custom “processes/data services” can be created and applied to your Meta Unit (see the “process” input screen for a list of existing processes). Click on “job” in the selection list on the left. Click on the “add new row” link to add a new job. To edit a job, click on its “job_id”.
_files/image013.jpg)
Job create/edit screen: Note the mudg_id and process_id drop-down selection boxes. Select the Meta Unit and dependency group created earlier. The process for data profiling is “jobProfiler”.
_files/image014.jpg)
Step 11: The last step is to create a new batch to run all your jobs. As soon as the batch is “started”, the dwSavvy cluster starts processing all the scheduled jobs. Jobs that are not scheduled, i.e. that don't have a start_datetime, start right away. Select “batch” from the selection panel on the left.
To create a new batch, click on the “add new row” link. To edit a batch, click its batch_id link.
_files/image016.gif)
As soon as a new batch is created, and provided the dwSavvy server is running, your batch should start processing.
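The start rule from Step 11 (a job without a start_datetime starts right away) can be sketched as a simple filter. This is only an illustration: the function name and the job dictionary layout are hypothetical, and the sketch assumes that a scheduled job waits until its start_datetime has been reached, which this guide implies but does not spell out.

```python
from datetime import datetime


def jobs_ready_to_start(jobs, now):
    """Return the names of jobs that may start at time `now`.

    A job with no start_datetime is unscheduled and starts right away.
    Assumption: a scheduled job waits until its start_datetime.
    """
    return [
        job["name"]
        for job in jobs
        if job.get("start_datetime") is None or job["start_datetime"] <= now
    ]
```

Calling it with the current time, for example `jobs_ready_to_start(jobs, datetime.now())`, returns the unscheduled jobs plus any scheduled jobs whose start time has passed.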
Screen to list all the jobs that are ready to run on the Clover-dwSavvy platform:
Step 1: Click on “Display” and select “Job Queue”.
_files/image017.gif)
Step 2: Select your “data mart” from the selection on the left panel, and hit the “Get Data/Refresh” button. Note: You can define multiple data marts; among other uses, this product is designed from the ground up for ASPs that provide data services to multiple clients. Note the Meta Unit name and process name listed on the right.
_files/image018.jpg)
Step 3: To monitor jobs being processed in real time, select “Job Monitor” from the “Display” menu. Select a “data mart” and a “batch”, and hit the “Get Data/Refresh” button. Note: The “Refresh every” and “Refresh” drop-down boxes can be used to refresh the screen periodically.
_files/image019.jpg)
Screens to check your profile results:
Step 1: Select “profile-->Results” from the “Display” menu. This screen lists the statistics collected during data profiling. From the left selection panel, select a data mart and a batch. The batch selection lets you pick profile results from any historical run. After a batch is selected, two additional drop-downs appear, for Meta Unit and column selection. Note: These drop-downs list only the Meta Units that were profiled in the selected batch.
_files/image020.jpg)
Step 2: For profile details, select “Profile-->Detailed Results” from the “Display” menu. The same selection rules apply as on the screen above.
Check out the trending tables/charts: These tables/charts help you trend your profile results over historical batch runs. The selections are similar to those above, with the addition of PE (Profile Element).
Trending Table: From the “Display” menu, select “Profile-->Across Batches”.
_files/image022.jpg)
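Conceptually, the trending table lines up one Profile Element for one column across historical batches. The sketch below shows that idea only; the function name, the row layout, and the key names are hypothetical and do not reflect how dwSavvy stores its results.

```python
def trend_across_batches(results, meta_unit, column, pe):
    """Return (batch_id, value) pairs for one Profile Element,
    ordered by batch_id.

    results is a list of dicts with the hypothetical keys
    batch_id, meta_unit, column, pe, and value.
    """
    rows = [
        (r["batch_id"], r["value"])
        for r in results
        if r["meta_unit"] == meta_unit
        and r["column"] == column
        and r["pe"] == pe
    ]
    return sorted(rows, key=lambda row: row[0])
```

The resulting list of pairs is the kind of series a trending table or chart would display for the selected Meta Unit, column, and PE.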
Trending Chart: From the “Display” menu, select “On Charts”. The selections are similar to those above, with the addition of chart type.
(1) Line Chart:
_files/image023.jpg)
(2) 3D chart:
_files/image024.jpg)