BI Future Blog: maart 2010

woensdag 31 maart 2010

Passing a parameter From a SSIS package to a storedprocedure and back again

Currently i'm developing a template package in SSIS for a new project. One thing that's important in my opinion is running package in an atomic way and his own environment (variables). So passing variables from a parent package to a child package is a no go for me. This would mean that you have to run the parent and the child to get a correct value of the variable. You can't run the package correctly when executing this separately. When running a lot packages you can determine a batchnumber once and pass it through from parent to child but this is a no goner (as is said)

So this is the case: i have a package in which i extract a table (Person) into a Import database and i want to log a batchnumber with every record, every logging and every error that could occur. So i created SQL Task in which i execute a stored procedure and pass a tablename to the stored procedure and retrieve a proces run number back. This is the result

Below a screen dump of the SSIS Parameter mapping of the SQL Task:

And below you see the variable declarations

And this is the T-SQL code:

DECLARE @RC int

DECLARE @chvParDatabase varchar(100)
DECLARE @chvParTabelNaam varchar(100)
DECLARE @intParProcesNummer int
DECLARE @intProcesNummer int

EXECUTE @RC = [MetaServices].[dbo].[uspMetaServices_GetProcesnummer] ,?, ? OUTPUT

The problem with stored procedures, SSIS and the expression language it's hard to match the datatypes unfortunately. For instance the datatype in T-SQL is Integer, in SSIS int32 and in expression language is it a Long datatype. I tried the integerdatatypes in the parameter mapping screen but errors were my part.

Greetz,
Hennie

zondag 21 maart 2010

SQL Server 2008 R2 Update for developers training kit required software

Today i downloaded the SQL Server 2008 R2 Update for developers training kit (http://www.microsoft.com/downloads/details.aspx?familyid=FFFAAD6A-0153-4D41-B289-A3ED1D637C0D&displaylang=en) and installed it into a virtual environment. All kinds of prerequisites, Beta's , Release candidates, tools were asked to install. Below you'll see the list of software i installed and the link to download the software:

Reportbuilder 3.0 (http://www.microsoft.com/downloads/details.aspx?FamilyID=f78b6b1e-8ccb-407a-bc3e-7955d60e1a6c&displaylang=en).
Powerpivot for excel (http://www.powerpivot.com/download.aspx).

32 bits
64 bits
Helpfiles

Streaminsight CTP2 (http://www.microsoft.com/downloads/details.aspx?FamilyID=A3FAA562-B6DC-4702-90C6-BF8E08DF3B8B&displaylang=en&displaylang=en).
Adventureworks R2 (http://msftdbprodsamples.codeplex.com/releases/view/24854).
Microsoft office 2010 Beta (http://www.microsoft.com/office/2010/en/download-office-professional-plus/default.aspx).
Sharepoint server 2010 Beta (http://www.microsoft.com/downloads/details.aspx?FamilyID=77c30c6c-47fc-416d-88e7-8122534b3f37&displaylang=en).
Visual Studio 2010 Beta (http://msdn.microsoft.com/en-us/vstudio/dd582936.aspx)

Quite a bunch of software! I think that i listed all the required software for using in the SQL Server 2008 R2 Update for developers training kit but may be i forgot one.

Greetz
Hennie

donderdag 11 maart 2010

SQL Server 2008 statistics

Today i want to tell you something about statistics in SQL Server 2008. Microsoft SQL Server 2008 collects statistical information about indexes and column data stored in the database. These statistics are used by the SQL Server query optimizer to choose the most efficient plan for retrieving or updating data (BOL). While in the majority of cases SQL Server takes care of statistical housekeeping automatically, it’s useful to have some understanding of what’s going on when you have to intervene.

SQL Server collects the following information:
• The number of rows and pages accupied by a table’s data
• The time that statistics were last updated
• Average length of keys in a column
• Histograms showing the distribution of data
• String summaries thart are used when performaing LIKE queries.

Suppose you have an column with following numbers Anderson, Andersen,Smith,Williams, Zhang. An histogram would look like this :

Andersen     2
Smith       1
Williams     1
Zhang        1

Suppose that you are searching for a value Anderson, Smith and Williams then the estimation, based on the above example would be 4 rows. This about 80% total rows that match the criteria. This would mean that SQL Server is going to scan the table fully instead of using the index (off course in a real world example you need more data).

I used the following code from MSDN :

-- Clean up objects from any previous runs.
IF object_id(N'Person.Contact','U') IS NOT NULL
DROP TABLE Person.Contact
GO

IF EXISTS (SELECT * FROM sys.schemas WHERE name = N'Person')
DROP SCHEMA Person
GO

-- Create a sample schema and table.
CREATE SCHEMA Person
GO

CREATE TABLE Person.Contact(
FirstName nvarchar(60),
LastName nvarchar(60),
Phone nvarchar(15),
Title nvarchar(15)
)
GO

-- Populate the table with a few rows.
INSERT INTO Person.Contact
VALUES(N'John',N'Smith',N'425-555-1234',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Erik',N'Andersen',N'425-555-1111',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Erik',N'Andersen',N'425-555-3333',N'Mr')
INSERT INTO Person.Contact
VALUES(N'Jeff',N'Williams',N'425-555-0000',N'Dr')
INSERT INTO Person.Contact
VALUES(N'Larry',N'Zhang',N'425-555-2222',N'Mr')
GO
-- Show that there are no statistics yet on the Person.Contact table.
sp_helpstats N'Person.Contact', 'ALL'
GO

-- Implicitly create statistics on LastName.
SELECT * FROM Person.Contact WHERE LastName = N'Andersen'
GO
-- Show that statistics were automatically created on LastName.
sp_helpstats N'Person.Contact', 'ALL'
GO

--DROP STATISTICS Person.Contact._WA_Sys_00000002_7D78A4E7

--Same Histograms.
DBCC SHOW_STATISTICS(N'Person.Contact', _WA_Sys_00000002_7D78A4E7)
--DBCC SHOW_STATISTICS(N'Person.Contact', IX_Contact_Lastname)

Below you can see an example of the histogram:

dinsdag 9 maart 2010

SQL Zaterdag

Last Saturday i went to ‘SQL Zaterdag’ seminar and this new seminar is great way of being informed about the (new) features of SQL Server. This seminar was organized by SQL Pass.nl, Infosupport and Microsoft Netherlands. There were two tracks: one BI track and a Not BI track:Engine track. I joined the BI track with an old co worker. These were the subjects of the BI track:

SQL Server – de BI release,
Master Data Management,
SSIS : Tuning for Performance
Building a dashboard with Powerpivot, Reporting Services and Performancepoint .

I would like to share some interesting stuff:

BI release (Hans Geurtsen)

The release date of SQL Server 2008 R2 is may 2010.
There are really neat features in Reporting services specially for supporting the Selfservice BI strategy of Microsoft. Report builder 3.0 will aid the power user more and more with self supporting functionality.
Now you have shared datasets which you can share between reports. So i can imagine that a Power user or developer build a query and when it's authorized and approved it can be shared between reports. So this will lead to less double queries if adopted in a proper way.
Shared report parts for mash up support. Mash up technologie will be a technology which will emerge in the coming years. Building in the right way organisations will be more agile because report parts can be created, dropped and changed easily. Another aspect is that when reports use the same reports parts they can be updated with the new report part. But updating the reports in a push strategy is not going to work (at this moment). You can't update all the reports all at once. Yoú'll have to load the reports and refresh the report parts manually.
2 extra pilars in the BI stack for Master Data management (stratature) and for Streaminsight (CEP).
Adjustments in the SSAS : Classic OLAP and Powerpivot support.
A new edition : Parallel datawarehousing (DatEllegro) .

Master data management (Marcel Westra)

Applications needs to be adjusted for supporting the MDM because data needs to be pushed to the MDM hub and down to the other worker hubs. So it’s gonna take a while before a successful MDM project is implemented. But when this implemented in the right way (?) the source systems will transform into a fact generator. This will have a great benefit in dataquality and will have consequences for implementing and maintaining datawarehouses.

SSIS : Tuning for performance (James Rowland-Jones):
James explained all kind of performance issues with SSIS. Stuff i haven't seen before like the log events screen. I have to look into that. Also he spoke about jumboframes(?). Below you see point that i managed to grab during his fast speed speech:

A lot of performance counters which can aid tuning. Too many to write down here. These are familiar and can also be found in the Microsoft certification books.
Some discussion about properties which can aid performance like FastParse.
Make a baseline when importing data. (use the trash destination).
Don’t use ADO.net for your source and destination. It will generate procedural based loops. Used OLEDB or SQL destination.

Building a dashboard with powerpivot, Reporting services and sharepoint (Kasper de Jonge)

Kasper showed great features with reporting, powerpivot and sharepoint. It seems to me that Powerusers, developers, etc will built components with reporting and powerprivot and than they will be uploaded to sharepoint and can be used by the users. Kasper showed all kind of possibilities and I was impressed by the new features. This demo show the integration of powerpivot, excel, reporting services and sharepoint in a great way.

Greetz,
Hennie

vrijdag 5 maart 2010

Selfservice BI

Yesterday i joined a seminar, the BI dutch year event 2010, with Wouter van Aerle (CAP Gemini) , Martijn Evers (Radboud University) and Johan van der Kooij (VLC) and the subject was Selfservice BI. This blog is about :What’s it all about and what will bring this to us?

So what is selfservice all about? So far i understand it something like this :The ability for the (power) user to gather, extract, enrich and present the information from different kinds of datasources in an organization or external sources like www. It can be used for one time or low frequent analysis prototyping and for short time to market solutions.

BUT, What’s different with the situation in organizations, now? Currently excel is installed on every machine and with a little help (power) user can connect to datasources and analyze the data and built reports. So what‘s all the fuzz about (except the marketing machine of Microsoft)?

When certain aspects are analyzed in a broader view you can see different things are happening. The world is changing and people and organizations are changing too. Workers in organizations have deadlines and need to deliver information and excuses like is not in the datawarehouse is not acceptable. Yup, start up excel (or Powerpivot in a short time) and off we go. Other reasons are faster decision making, better decision making, and the new generation Y (20 -30 years old) workers. This new generation are users who are independent, self secure and wants to do everything themselves.

So it's not a really a new tool or solution but more a new age of informationprocessing. That's all about with self service BI. It's facilitating the user with all kinds of methods, tools, ETL, etc.

Another issue is like what purpose do you want to achieve with Self service BI? I think you'll have to authorize the information in certain Selfservice BI Solutions somehow. This information validated and this information not. So quality of information will be more important. Do you want to create a ledger of huge company via a Selfservice BI solution?

These change will lead to another role for the BI consultants. Now, BI consultants are the intermediar between business and IT. They gather information and present them. In the future a (power) user will have the ability to gather, analyze and present by themselves. So what will be the role of the BI consultant in the future?

dinsdag 2 maart 2010

Fact dimension in SSAS

Today i wanted to see some degenerated dimension information in my cube in SSAS (2008). After trying some options in SSAS things were not going as planned. After a while I found out that Microsoft calls degenerated dimensions fact dimensions. In an article on MSDN it’s further explained (http://msdn.microsoft.com/en-us/library/ms175669.aspx)

Fact dimensions are standard dimensions, not based on a physical dimension table but on the physical table of the fact. In this Fact dimension you can insert attributes that are not measures but descriptive fields.

In the above example SalesOrderNumber and SalesOrderLineNumber are the Fact dimension keys and the degenerated dimensions are CarrierTrackingNumber and CustomerPONumber.

Greetz,
Hennie