How to create Create Data Ware house from scratch

here I have created data warehouse using ETL tools.without wasting too much time on Introduction let's move into the procces to get the hands on dirty.

Inorder do that in your local Machine you should have to Install the following requirements in your machine

SQL Server 2019

SQL Server Management Studio

Download Microsoft SQL Server 2019 Reporting Service

visual Studio 2019

The dataset here selected is as the IDA statement transactional data. This dataset contains the column of project Borrower ,his credidt card number,country ,region also it includes the project details which was bought by the Borrower. Also it includes some 3^rd party details . further more it has first Repayment Date, last Repayment Date, AgreementDate, BoardApprovalDate etc.

Description about dataset

This csv types of dataset contains multiple sectors of column. I have attached the csv files in this document.

Columns are in this csv file

1.End of period

2. Credit Number

3. Region

4. Country Code

5. Country

6. Borrower

7. Credit Status

8. Service Charge Rate

9. Currency of Commitment

10. Project ID

11. Project Name

12. Original Principal Amount

13. Cancelled Amount

14. Undisbursed Amount

15. Disbursed Amount

16. Repaid to IDA

17. Due to IDA

18. Exchange Adjustment

19. Borrower's Obligation

20. Sold 3rd Party

21. Repaid 3rd Party

22. Due 3rd Party

23. Credits Held

24. First Repayment Date

25. Agreement Signing Date

26. Board Approval Date

27. Effective Date (Most Recent)

28. Closed Date (Most Recent)

Solution Architecture

To Create a Data ware house using SQL server , the step by step guide as follows.

If you want to make query and get analysis from your database you can’t simply do with your OLTP data structure.If you will be going do with that it slow down your bussines process. In order to make analysis from historical data you need to aggrecate the database into one data base.

That what we call data warehouse. In order to create a data warehouse we have following Architecture.

pic from google

1.Data Scource

Here I have used my data scource as my csvfile to load into the database

2. Staging Area

A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories. The ETL tool means Extract Transfer Load. This will Extract the data from data scources and transfer and load it into the Data warehouse.

3. Warehouse.

The warehouse contains all the aggregate database in a one data house in a properly structured manner.

4. Data mart

The data mart contains specific set of datas from the data warehouse. As usual the datamart is a subset of data warehouse.

5. User

In the user part, the user can able to Analyze, get reports from the Historical dataset using power bi tools.

Now we take look at how to load the data from the data source to data warehouse

Data warehouse design and development

Here I have used csv file as data scource to load into the data warehouse.

Star schema

The data could be stored in a single table where the attributes are repeated on each row (like the source for an Excel pivot table). However, that would take up much more space and makes managing the dimensions more difficult.so star schema maps the table very well structure and reduce the data redundancy.

Here is my star schema for my dataset . It contains fact tables and multiple dimensional tables. The fact table contains foreign keys of other dimensional tables. Also it includes depentat colums values that interects with other dimensional tables. The dimensional tables contain descriptive attributes.

Let’s get into workplace to get hands on dirty

Open your Report server configuration manager and and click start it. This tool helps us to customize SQL Server Reporting Service parameters and configurations

Then go to the database option then click onto the change database option.

Then click onto create a new report server database option.after that click onto next option.

Then let the options as defult and click onto the next option.

Again let the options as defult and click onto the next option.

Then click onto Apply option

Then open your Microsoft SQL server management studio to create a database. Then click on connect it.

To create a new project go to file system click on new file and click to open a new project.

Then select the directory and save your project using your wishfull name.

Click on the Database option after that create a database

Give a name for your database and click ok

Then expand the table to create new table.

Then create other column for my previously arranged dimensional tables

Here I have set my Region_IDS as my region dimensional table’s primary key.

Following images are that created for other dimensional tables.

Then create a Fact table by placing the dimensional tables primary key into the fact table as foreign key

Then take look how we will be going to make relation those primary key with the fact table.

Right click onto the value that you want to make foreign key and click on Relationship

Click on add

Then click on the symbol shows “…”

Here you have to set the value primary key of dimensional table and another side you have to define which value that you will be going to make sure in the foreign key table.

As well n as you have to do for other values too

create ETL tools to using visual studio

First you have to install Sql server intergration service using manage extension

Then now create new project by using Intergration service project

Then click on control flow layer and select the data flow task from the side bar and drag and drop it on the work place. If you wish rename it. And again double click on it.It will allow to extract the data file from scource to the destination.

Then in side bar select flat file and drag and drop it on the work flow. And rename . then double click to open it.

Then click on the new

Then select the csv file that you want to upload

From the side bar select the Data conversation and drag and drop it on the workflow. Then rename it and double click to open it

Here you can select the data column and can change the data type as well

Now here I have used Loook up tools to compare scource and destination data. It filters out the matched and unmatched data in the specific destinations. In my Warehouse table I have 5 dimensional table and one fact table. To make sure to initialize and insert the values into the fact table and dimensional table. I have used Look up tool to insert the value from scource file to dimensional table. The dimensional tables primary key has been assigned into the fact table foreign.

key. In order to refer the Dimension tables primary key into the fact table foreign key, I have to map the scource column which are corresponded to the dimensional table. So the steps are follows.

Here I have used flate file manager to transfer the data scource column type to that corresponds with destination column data type.Here In this picture I have done for one column like wise I have to do for other remaing columns.

Now double click on the Look up here. And then select yout server,dimensional table, then click on the column for mapping. Map the corresponded column from scource to destination column. Then check the primary key which wants to refer in the fact table. Like wise you have to do for other dimensional tables.

Use connections Lookup for match output

Then select the OLE DB destination fromside bar and drag and drop it on work flow.then change its name. then double click to open it.

Then click on new

Then select your server name and the database name .

Then click on Mappings and connect the corresponded columns from scources to destination column.

Populate the data into Dimensional Table

Then again create a new file . Then click on control flow layer and select the data flow task from the side bar and drag and drop it on the work place. then double click on it .place the flat file on the data flow layer. Then use Multicast tool from the SSIS tools. It can populate the data to one or more destination. Here I have used five de

Then map the each and every column from the source to the dimensions tables as follows

Then double click on the data flow layer and click the execute button

Now I have execute the SQL command to get the updated data in region dimension table

Then execute the previous flat file flow

Then I have used SQL command to get the populated flat file data

Now you can see in the above picture the dataset from the csv file has been successfully uploaded into the flat file and other dimensional tables.

Hope you have get the idea of how to load the data from OLTP database structure to OLAP database structure.

check here my previous tutorials

Check out my YouTube Channel here

2 comments:

fodatJuly 25, 2024 at 8:44 PM
بهبود سئوی وب سایت های عربی می تواند تاثیر بسزایی در بهبود جایگاه کلمات کلیدی هدف داشته باشد. با استفاده از تکنیک های بهینه سازی برای موتورهای جستجو می توان محتوای وب سایت عربی را به گونه ای تنظیم کرد که موتورهای جستجو بتوانند به راحتی آن را شناسایی و رتبه بندی کنند. این شامل بهینه سازی عناوین، توضیحات متا، استفاده از کلمات کلیدی مرتبط و بهبود ساختار لینک های داخلی و خارجی است.
fodatDecember 18, 2024 at 9:19 PM
تصنيع وإنتاج المنتجات الخشبية والأثاث وأطقم السرير تصنيع وإنتاج السمسم ... أريكة مريحة، طاولة تلفزيون، أريكة كلاسيكية، أريكة ملكية، أريكة L، أريكة فولاذية. تتضمن عملية إنتاج الأريكة فیدار عصري عدة خطوات، تتطلب كل منها عناية واهتمامًا خاصين. وتشمل هذه المراحل التصميم الأولي واختيار المواد الخام لتصنيع المنتج والانتهاء منه. سنتناول في هذا المقال مراحل إنتاج الأثاث ومكوناته لتقديم صورة واضحة عن العملية المعقدة والفنية وراء إنتاج كل قطعة أثاث. انضم إلينا للتعرف على عملية إنتاج هذه الصناعة الإبداعية والصعبة خطوة بخطوة.