How to create Create Data Ware house from scratch - Black keyhole

welcome to our keyhole, we provide you various knowledge from various sources such as Electric&Electronic, Health, Arduino, IOT, Diy projects, A/l past paper and model paper, Even though you can keep in touch with current technology

How to create Create Data Ware house from scratch

Share This








here I have created data warehouse using ETL tools.without wasting too much time on Introduction let's move into the procces to get the hands on dirty.

Inorder do that in your local Machine you should have to Install the following requirements in your machine

  The dataset here selected is as the IDA statement transactional data. This dataset contains the column of project Borrower ,his credidt card number,country ,region also it includes the project details which was bought by the Borrower. Also it includes some 3rd party details . further more it has first Repayment Date, last Repayment Date, AgreementDate, BoardApprovalDate etc.

 

Description about dataset

   This csv types of dataset contains multiple sectors of column. I have attached the csv files in this document.

 

Columns are in this csv file

   1.End of period

   2. Credit Number

   3. Region

   4. Country Code

   5. Country

   6. Borrower

   7. Credit Status

  8. Service Charge Rate

  9. Currency of Commitment

  10. Project ID

  11. Project Name

  12. Original Principal Amount

  13. Cancelled Amount

   14. Undisbursed Amount

   15. Disbursed Amount

   16. Repaid to IDA

    17. Due to IDA

    18. Exchange Adjustment

    19. Borrower's Obligation

   20.  Sold 3rd Party

    21. Repaid 3rd Party

     22. Due 3rd Party

     23. Credits Held

     24. First Repayment Date

     25. Agreement Signing Date

      26. Board Approval Date

     27. Effective Date (Most Recent)

      28. Closed Date (Most Recent)


Solution Architecture

 

To Create a Data ware house using SQL server , the step by step guide  as follows.

 

If you want to make query and get analysis from your database you can’t simply do with your OLTP data structure.If you will be going do with that it slow down your bussines process. In order to make analysis from historical data you need to aggrecate the database into one data base.

That what we call data warehouse. In order to create a data warehouse we have following Architecture.







pic from google




1.Data Scource

 

Here I have used my data scource as my csvfile to load into the database 

 

 

2.     Staging Area

  A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories. The ETL tool  means Extract Transfer Load. This will Extract the data from data scources and transfer and load it into the Data warehouse.

 

3.    Warehouse.

The warehouse contains all the aggregate database in a one data house in a properly structured manner.

 

4.    Data mart

The  data mart contains specific set of datas from the data warehouse. As usual the datamart is a subset of data warehouse.

 

5.    User

In the user part, the user can able to Analyze, get reports from the Historical dataset using power bi tools.

 

Now we take look at how to load the data from the data source to data warehouse

 

Data warehouse design and development

 

Here I have used csv file as data scource to load into the data warehouse.












Star schema

 

   The data could be stored in a single table where the attributes are repeated on each row (like the source for an Excel pivot table). However, that would take up much more space and makes managing the dimensions more difficult.so star schema maps the table very well structure and reduce the data redundancy.




Here is my star schema for my dataset . It contains fact tables and multiple dimensional tables. The fact table contains foreign keys of other dimensional tables. Also it includes depentat colums values that interects with other dimensional tables. The dimensional  tables contain descriptive attributes.

 

Let’s get into workplace to get hands on dirty


     


Open your  Report server  configuration manager and and click start it.  This tool helps us to customize SQL Server Reporting Service parameters and configurations





Then go to the database option then click onto the change database option.





Then click onto create a new report server database option.after that click onto next option.





Then let the options as defult and click onto the next option.





Again let the options as defult and click onto the next option.


Then click onto Apply option




Then open your Microsoft SQL server management studio to create a database. Then click on connect it.






To create a new project go to file system click on new file and click to open a new project.




Then select the directory and save your project using your wishfull name.




Click on the Database option after that create a database



Give a name for your database and click ok


Then expand the table to create new  table.




Then create other column for my previously arranged dimensional tables





Here I have set my Region_IDS as my region dimensional table’s primary key.

Following images are that created for other dimensional tables.










Then create a Fact table by placing the dimensional tables primary key into the fact table as foreign key



Then take look how we will be going to make relation those primary key with the fact table.




Right click onto the value that you want to make foreign key and click on Relationship





Click on add




Then click on the symbol shows “…”



Here you have to set the value primary key of dimensional table and another side you have to define which value that you will be going to make sure in the foreign key table.

As well n as you have to do for other values too











create ETL tools to using visual studio


First you have to install Sql server intergration service using manage extension 




 

Then now create new project by using Intergration service project




Then click on control flow layer and select the data flow task from the side bar and drag and drop it on the work place. If you wish rename it. And again double click on it.It will allow to extract the data file from scource to the destination.



Then in side bar select flat file and drag and drop it on the work flow. And rename . then double click to open it.



Then click on the new




Then select the csv file that you want to upload










From the side bar select the Data conversation and drag and drop it on the workflow. Then rename it and double click to open it



Here you can select the data column and can change the data type as well




Now here I have used Loook up tools to compare scource and destination data. It filters out the matched and unmatched data in the specific destinations. In my Warehouse table I have 5 dimensional table and one fact table. To make sure to initialize and insert the values into the fact table and dimensional table. I have used Look up tool to insert the value from scource file to dimensional table. The dimensional tables primary key has been assigned into the fact table foreign.

key. In order to refer the Dimension tables primary key into the fact table foreign key, I have to map the scource column which are corresponded to the dimensional table. So the steps are follows. 



Here I have used flate file manager to transfer the data scource column type to that corresponds with destination column data type.Here In this picture I have done for one column like wise I have to do for other remaing columns.



Now double click on the Look up here. And then select yout server,dimensional table, then click on the column for mapping. Map the corresponded column from scource to destination column. Then check the primary key which wants to refer in the fact table. Like wise you have to do for other dimensional tables.



 

Use connections Lookup for match output


Then select the OLE DB destination fromside bar and drag and drop it on work flow.then change its name. then double click to open it.



Then click on new



Then select your server name and the database name .

 


Then click on Mappings and connect the corresponded columns from scources to destination column.


Populate the data into Dimensional Table


Then again create a new file . Then click on control flow layer and select the data flow task from the side bar and drag and drop it on the work place. then double click on it .place the flat file on the data flow layer.  Then use Multicast tool from the SSIS tools. It can populate the data to one or more destination. Here I have used five de



Then map the each and every column from the source to the dimensions tables as follows





  



Then double click on the  data flow layer and click the execute button

Now I have execute the SQL command to get the updated data in region dimension  table



Then execute the previous flat file flow 


Then I have used SQL command to get the populated flat file data



 


Now you can see in the above picture the dataset from the csv file has been successfully uploaded into the flat file and other dimensional tables.

Hope you have get the idea of how to load the  data from OLTP database structure to OLAP database structure.





No comments:

Post a Comment

business queries

Pages