Give the two types of tables involved in producing a star schema and the type of data they hold. Answer : Fact tables and dimension tables. A fact table contains measurements while dimension tables will contain data that will help describe the fact tables.
What are types of Facts?
There are three types of facts:
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table. Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
What are non-additive facts in detail?
A fact may be measure, metric or a dollar value. Measure and metric are non additive facts.
Dollar value is additive fact. If we want to find out the amount for a particular place for a particular period of time, we can add the dollar amounts and come up with the total amount.
A non additive fact, for eg measure height(s) for ‘citizens by geographical location’ , when we rollup ‘city’ data to ’state’ level data we should not add heights of the citizens rather we may want to use it to derive ‘count’
What is the difference between view and materialized view?
View - store the SQL statement in the database and let you use it as a table. Everytime you access the view, the SQL statement executes.
Materialized View - stores the results of the SQL in table form in the database. SQL statement only executes once and after that everytime you run the query, the stored result set is used. Pros include quick query results
What is active data warehousing?
An active data warehouse provides information that enables decision-makers within an organization to manage customer relationships nimbly, efficiently and proactively. Active data warehousing is all about integrating advanced decision support with day-to-day-even minute-to-minute-decision making in a way that increases quality of those customer touches which encourages customer loyalty and thus secure an organization’s bottom line. The marketplace is coming of age as we progress from first-generation “passive” decision-support systems to current- and next-generation “active” data warehouse implementations
What is SKAT?
Symbolic Knowledge Acquisition Technology (SKAT).
A system based on SKAT develops an evolving model from a set of elementary blocks, sufficient to describe an arbitrarily complex algorithm hidden in data, instead of routine searching for the best coefficients for a solution that belongs to some predetermined group of functions. Each time a better model is found, the system determines the best regression parameters for that model. In most general terms, this technology can be classified as a branch of Evolutionary Programming.
What is Memory Based Reasoning (MBR)?
To forecast a future situation, or to make a correct decision, such systems find the closest past analogs of the present situation and choose the same solution which was the right one in those past situations. That is why this method is also called the nearest neighbor method.
Give reasons for the growing popularity of Data Mining.
Reasons for the growing popularity of Data Mining
Growing Data Volume
The main reason for necessity of automated computer systems for intelligent data analysis is the enormous volume of existing and newly appearing data that require processing. The amount of data accumulated each day by various business, scientific, and governmental organizations around the world is daunting. According to information from GTE research center, only scientific organizations store each day about 1 TB (terabyte!) of new information. And it is well known that academic world is by far not the leading supplier of new data. It becomes impossible for human analysts to cope with such overwhelming amounts of data.
Limitations of Human Analysis
Two other problems that surface when human analysts process data are the inadequecy of the human brain when searching for compex multifactorial dependencies in data, and the lack of objectiveness in such an analysis. A human expert is always a hostage of the previous experience of investigating other systems. Sometimes this helps, sometimes this hurts, but it is almost impossible to get rid of this fact.
Low Cost of Machine Learning
One additional benefit of using automated data mining systems is that this process has a much lower cost than hiring an army of highly trained (and payed) professional statisticians. While data mining does not eliminate human participation in solving the task completely, it significantly simplifies the job and allows an analyst who is not a professional in statistics and programming to manage the process of extracting knowledge from data.
What is Market Basket Analysis?
Processing transactional data in order to find those groups of products that are sold together well. One also searches for directed association rules identifying the best product to be offered with a current selection of purchased products
What is Clustering?
Clustering. Sometimes called segmentation, clustering identifies people who share common characteristics, and averages those characteristics to form a “characteristic vector” or “centroid.” Clustering systems usually let you specify how many clusters to identify within a group of profiles, and then try to find the set of clusters that best represents the most profiles.Clustering is used directly by some vendors to provide reports on general characteristics of different visitor groups. These techniques require training, and suffer from drift on Web sites with dynamic Web pages.
What are cubes?
Cubes are data processing units composed of fact tables and dimensions from the data warehouse. They provide multidimensional views of data, querying and analytical capabilities to clients.
What are measures?
Measures are numeric data based on columns in a fact table. They are the primary data which end users are interested in. E.g. a sales fact table may contain a profit measure which represents profit on each sale.
What are dimensions?
Dimensions are categories by which summarized data can be viewed. E.g. a profit summary in a fact table can be viewed by a Time dimension, Region dimension, Product dimension
What are fact tables?
A fact table is a table that contains summarized numerical and historical data (facts) and a multipart index composed of foreign keys from the primary keys of related dimension tables.
What are the benefits of Data Warehousing?
Data warehouses are designed to perform well with aggregate queries running on large amounts of data.
The structure of data warehouses is easier for end users to navigate, understand and query against unlike the relational databases primarily designed to handle lots of transactions.
Data warehouses enable queries that cut across different segments of a company’s operation. E.g. production data could be compared against inventory data even if they were originally stored in different databases with different structures.
Queries that would be complex in very normalized databases could be easier to build and maintain in data warehouses, decreasing the workload on transaction systems.
Data warehousing is an efficient way to manage and report on data that is from a variety of sources, non uniform and scattered throughout a company.
Data warehousing is an efficient way to manage demand for lots of information from lots of users.
Data warehousing provides the capability to analyze large amounts of historical data for nuggets of wisdom that can provide an organization with competitive advantage.
What does data management consist of?
Data management, as it relates to data warehousing, consists of four key areas associated with improving the management, and ultimately the usability and reliability, of data. These are:
Data profiling: Understanding the data we have. Data quality: Improving the quality of data we have. Data integration: Combining similar data from multiple sources. Data augmentation: Improving the value of the data.
How is a data warehouse different from a normal database?
Every company conducting business inputs valuable information into transactional-oriented data stores. The distinguishing traits of these online transaction processing (OLTP) databases are that they handle very detailed, day-to-day segments of data, are very write-intensive by nature and are designed to maximize data input and throughput while minimizing data contention and resource-intensive data lookups.By contrast, a data warehouse is constructed to manage aggregated, historical data records, is very read-intensive by nature and is oriented to maximize data output. Usually, a data warehouse is fed a daily diet of detailed business data in overnight batch loads with the intricate daily transactions being aggregated into more historical and analytically formatted database objects. Naturally, since a data warehouse is a collection of a business entity’s historical information, it tends to be much larger in terms of size than its OLTP counterpart.
Do you know what a local lookup is?
This function is similar to a mlookup…the difference being that this funtion returns NULL when there is no record having the value that has been mentioned in the arguments of the function. If it finfs the matching record it returns the complete record..that is all the fields along with their values corresponding to the expression mentioned in the lookup local function. eg: lookuplocal( “LOOKUPFILE”,81) -> null if the key on which the lookup file is patitioned does not hold any value as mentioned.
What is ODS?
- ODS means Operational Data Store.
- A collection of operation or bases data that is extracted from operation databases and standardized, cleansed, consolidated, transformed, and loaded into an enterprise data architecture. An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse. The ODS may also be used to audit the data warehouse to assure summarized and derived data is calculated properly. The ODS may further become the enterprise shared operational database, allowing operational systems that are being reengineered to use the ODS as there operation databases.
Why should you put your data warehouse on a different system than your OLTP system?
A OLTP system is basically ” data oriented ” (ER model) and not ” Subject oriented “(Dimensional Model) .That is why we design a separate system that will have a subject oriented OLAP system…
What are conformed dimensions?
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined Ex:Date Dimensions is connected all facts like Sales facts,Inventory facts..etc
What is ETL?
ETL stands for extraction, transformation and loading.
ETL provide developers with an interface for designing source-to-target mappings, ransformation and job control parameter. · Extraction Take data from an external source and move it to the warehouse pre-processor database.
· Transformation Transform data task allows point-to-point generating, modifying and transforming data.
· Loading Load data task adds records to a database table in a warehouse.
What does level of Granularity of a fact table signify?
Granularity The first step in designing a fact table is to determine the granularity of the fact table. By granularity, we mean the lowest level of information that will be stored in the fact table. This constitutes two steps: Determine which dimensions will be included. Determine where along the hierarchy of each dimension the information will be kept. The determining factors usually goes back to the requirements
What is the Difference between OLTP and OLAP?
Main Differences between OLTP and OLAP are:-
- User and System Orientation
OLTP: customer-oriented, used for data analysis and querying by clerks, clients and IT professionals. OLAP: market-oriented, used for data analysis by knowledge workers( managers, executives, analysis).
- Data Contents
OLTP: manages current data, very detail-oriented. OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, stores information at different levels of granularity to support decision making process.
- Database Design
OLTP: adopts an entity relationship(ER) model and an application-oriented database design. OLAP: adopts star, snowflake or fact constellation model and a subject-oriented database design.
OLTP: focuses on the current data within an enterprise or department. OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores
Why are OLTP database designs not generally a good idea for a Data Warehouse?
Since in OLTP,tables are normalised and hence query response will be slow for end user and OLTP doesnot contain years of data and hence cannot be analysed.
Login to add comments on this post.