June 2006 - Posts

A co-worker of mine, John Nielsen, forwarded me this post from xmldatabases.org, and I felt compelled to respond.  John is an extremely talented developer and author in the Python space.  As you might expect, he is also a little progressive in his thoughts about Computer Science.  I don’t think he sees Microsoft and Bill Gates as the Anti-Christ, but he does have a more open source bent. 

 

John and I have had many discussions about where things are going in technology, especially in regards to SOA and databases, two of my favorite technology topics.  Anyway here are some thoughts about this post.  I’ve included my thoughts in italics and bold.

 

1.      Do not think in terms of programming languages, XML is a data format. It is NOT a serialization of programming language structures so don't treat it as such. Yes, we learned this in Relational Databases long ago (well I guess I know some developers that haven’t, but on the whole…) In REST based systems you're moving and manipulating data, you're not making method calls against a remote object. The fact programming languages are involved at all is a necessary evil. The data is what matters, programming languages are just tools to help work with that data.  Yes!  Finally someone gets it!

2.      XML Schemas are to act as documentation and spot validation, do not use them as a straight jacket around the data. W3C XML Schema driven systems are horribly brittle in operation because they don't bend. If the data doesn't exactly match what is expected they blow up, even on something as simple and non-critical as an extra attribute from a different schema being added to an element. Yes validate the data when you need to, but do not overly constrain it.  This is a recipe for disaster and is a major flaw for XML databases.  In the relational database management world, we know that if you don’t constrain the data to the business rules, data quality will suffer.  XML is brittle and does not provide good constraining capabilities.

3.      For the data you receive just take what you need and pass through everything else. If there's extra data in the stream it is NOT an error as long as all the other data you need is there as well. Just ignore the extra data. This is one reason why XML Schema based systems are so fragile, they take too literal of a view about the shape of the data.  Of course, you could just limit the data you request, such as is possible with a good data access language like SQL.  XPath is a good stab at a declarative language for XML, but it still has some growing up to do.

4.      As a corollary to #3, it is also not necessarily an error when non-critical data is missing from the stream. If you can do a little extra work to make the operation succeed with missing data, do the work. Um yeah, it’s called a join in the set based world.  And the DBMS does all the heavy lifting for you.

5.      The network can and will disappear. Take advantage of the clean data packet XML provides to queue the request for later execution. Anytime there's no user sitting waiting for the data it's probably a good idea to use this kind of asynchronous architecture.   Transaction Logs are wonderful things!

6.      Your data is going to change, that's why you're using XML to begin with. Do NOT lead your clients down a path that leads them to overly coupling their systems to your changing data. I.e. do not encourage them to use some kind of XML data binding tool that's strongly coupled to a particular schema. This also means you need to be very selective in what you consider required data. Adding additional required data to an existing operation is going to cause breakage. If data is not absolutely 100% for certain required then just make it optional. Adding additional optional data should never break the system. Another reason XML Schema based systems are so horribly brittle is they tend to break on any schema change, not just required data changes.  Another area where relational/SQL based systems are still superior.  Changing data structures are easy, fast, and tend not to break well written applications. 

7.      Do not require more data then necessary in a request. Yet another limitation of schema based systems is that you can be forced to carry a bunch of context data with a request just so that it passes validation, even when the data is not truly required for the request.  See thoughts on point 3. 

8.      Use XML tools as much as possible. XPath and XSL-T are enormously powerful tools for working with XML data, traditional programming languages are generally very poor at working with XML. Trying to bend traditional programming language techniques to working with XML data is what leads to horrible systems like W3C XML Schema and static binding of programming language objects to schemas. If you want to build truly robust systems do not fall into this trap.  Yes these are powerful tools, but they have some catching up to do with SQL.  As a declarative language there are few things I CAN’T do to request, transform, and manipulate data.  Not all SQL implementations are good performers at everything that I need to do, but it’s possible and not difficult.

9.      Build software that bends. Think in terms of malleable data and disappearing services. In general, visualize the system as an evolving, changing entity that can not be constrained. Do NOT try to constrain it, if you do you'll just end up with a system that falls over at the smallest inconsistency. Validate what you need, pass through anything you don't need.  Again if you don’t constrain it, data quality will suffer.  If you are talking about programs that just pass the data through to an application, that has to be validated, too.  Often times I see people just shredding XML documents and stuffing them into relational databases, or <gasp> adding a LOB field and sticking the XML document in it.  If XML isn’t capable of constraining the data and there is no validation before sticking it into the database, problems will arise.  Your application might not break, but how about the next guy down the line or the business user that has to report on that data.  I pity the ETL developer or data steward in the Warehouse that has to cleanse that data.

10. Maximize flexibility, do not fall into the trap of believing that a fully specified and constrained system will be more robust. It won't be, it will be brittle and prone to constant operational failure in the real world. Accept that the system and the environment where it's going to operate is going to change and build for that eventuality.   Flexibility is often given up in lieu of scalability.  However, relational databases tend to give you a happy medium.

 

All this isn’t to say that I’m anti-XML.  On the contrary, I think it’s one of the most important technologies of the past decade.  It just isn’t a hammer and the world isn’t a nail.  I wouldn't use SQL to develop GUIs, would I (could I?...hmmmm.... - :)  ) Let’s use XML for what it was designed for – messaging and inter-process communications.  As an industry we ditched hierarchical and network databases many years ago (see IMS, IDMS, et al), and for good reason.   

 

Jon

 

with 2 Comments

I finally got around to installing Data Dude on my PC to try it out.  Having used a number of tools in the past, I was looking forward to having one that works well, is in the same toolset as SSMS, and is supported by MSFT.  Below is my first experience with the tool. 

Installation

At first I tried to install just the Database Professional Tools without installing VS2005 - no luck (received errors about DBProj application not installed).  You really have to install VS Team System.  I used the trial version which time bombs in 180 days.  After installing VSTS, I again tried to use the SQL 2005 Project - again no luck (Cannot Find MSSQL$Express).  There is an option to install SQL Server Express Edition with VSTS - This is a requirement for Data Dude.  I had elected not to install it because I already had a SQL 2005 developer edition running as well as a couple of SQL 2000 and Analysis Services instances running (yes, it is slow!). I was hoping to avoid yet another instance, but I gave in, killed a 2000 MSDE instance and installed the VSTS SQL 2005 Express.  Voila! Success!!!

Schema Import

This is a feature I'm very excited about.  It would be nice to have a source controlled version of schema that is up to date with my data modeles, that I can use to compare against a live database to generate ALTER scripts.  For those of you that have used ERWIN or ER/Studio, you know what a pain it is to get a ALTER TABLE command created from those tools (and don't get me started ERWIN bashing - it won't stop!).

This feature is really intuitive - all you need to do is right click on the Schema Objects folder and the context menu prompts you to import the schema. You can select an existing connection or make a new one.  For me this was the first time I'd used it, so I made a new connection.  Once that is done, the import tool works away at importing the schema. Unfortunately, this feature still has some bugs.  Essentially all the keys and stored procs were imported with errors, an '!' next to them.  The tables, views, and indexes seem to be fine, but constraints, not so much.  I expect this to get better in future versions.

Schema Compare

This comes in 3 flavors - Project to Database, Database to Database, and Project to Project.  In order to use the Project features, you first must import a schema.  Since I had a schema imported (albeit with errors), I used that, comparing it to a database similar (nearly identical) to it but on a different server from where I imported.  Since the keys were imported with errors, there were a lot of hits on missing PKs, FKs, and indexes.  Otherwise, it seemed to work OK.  I tried the reverse with the Project as the target and the Database as the source.  It did the comparison and got what seemed to be accurate results. However when I right clicked and tried to update the project, the changes seemed to go into the ol' bit bucket.

Next I tried the database to database comparison. This worked quite well, though not perfectly.  Again I tried a development database and its corresponding system test database (I knew they were identical - I built them myself yesterday :) ).  Surprisingly it came up with some hits due to '[]' around owner names that were in test and not development.  There was no way to filter that out unfortunately.  There are, however, other filters for comparison, including whitespace, case sensitivity, owners, and the like.  Unfortunately, they don't seem to be implemented yet, as nothing happened when I clicked them. The readme.htm that comes with the download confirms that filters will be included in a later build.  Since there were some diferences, I wanted to see what the output would be.  The layout has a diff window below the list of objects which is scrollable like VSS or Red-Gate SQL Compare - very nice.  Below that is a running change script window.  You can view and edit the script that Data Dude puts out.  It also provides a button at the top to export to the SSMS Query Editor - again nice work.  The script output was concise and accurate, just as I would write it by hand.

The comparison tool does seem to work very well.  I must say it is on par with tools like Embarcadero Change Manager and Red-Gate SQL Compare.  The GUI is intuitive and the results seem to be accurate.

Data Compare 

This again is a feature I was highly interested in. Red-Gate Data Compare is a fine tool, but it would be nice to have it in the box.  I again compared my DEV and TEST databases, not knowing what data the testers had mucked up. Although the database was small, this tool was fast and produced great results.  The script again was clean and accurate, and I could export it to a Query Editor window for review, saving, source control, etc.  Nice!  This will be great to have in the toolbox.

Test Data Generation

Again this is interesting to me.  I have looked mulitiple times at tools to sample full-size environments, scrub sensitive data, and populate test environments.  None of them works well and all of them require a lot of work and training (from my experience).  This tool is not one of those.  Essentially, it will generate random data to stuff into tables.  If it's a number column, the generator will produce a random number.  If it's a varchar column, you will get a randomly generated string, none of which made any sense to me.  I'm not sure this would be great for system testing and user acceptance testing, but maybe unit testing or performance testing.  Hey, it's data!

Stored Procedure Testing

I was at a bit of a loss as to how to start with this piece, so I watched the video of it on the website.  To me it seemed like a really hard way to test a stored procs for accuracy, but it seemed to have capabilities of ramping up and performance testing.  This could have promise, so I'll take some time and drill down on that later.  For now, the jury is out with me on this feature.

All in all this is a great first try from Microsoft.  I would use the DB to DB compare tool.  The data compare tool is also handy.  Other than that, I’m waiting for some things to be added or fixed before the tools are GA.

Jon

with 1 Comments

After much coaxing and prodding, Dave Rodabaugh has joined the blogsphere.  Dave is a friend and colleague that I respect highly.  In his first post, he makes some vague reference to following the advice of those smarter than him.  I assure you, he is the smarter one (Think rotating 16 sided rubix cubes - in his head)!

Anyway, Dave has just posted a follow up to my last post with some additions to his own interview style for DW/BI.  Check it out.  It will be a multi-part series that I won't miss. Dave just finished a long gig doing a DW for Home Depot, and I expect he will be blogging about his experience there.  Should be some good stuff.

 

Jon

with 0 Comments

There is a new article by Chris Shaw on SQLServerCentral.com about "Hiring a DBA".  I want to start off by saying that Mr. Shaw is right on, and thanks to him for writing this!  Having just gone through a hiring process, which was like many I've had in the past, I feel his pain. 

Hiring an employee or a consultant is always an adventure.  As Chris put it, it's more art than science.  Sometimes you just know one way or the other.  Sometimes, you are just wrong!  I recall one instance where the candidate interviewed with me and another Sr. DBA.  The guy answered many technical questions correctly, said "I don't Know" in the right places, and seemed to communicate very well.  However, when he arrived on site, he sadly did not work out due to lack of initiative and creativity and lack of ability to work in a highly available environment.  We were just wrong!

Hiring is typically a time-based task.  Very rarely can you wait around for the absolutely perfect candidate to show up on your doorstep.  I liken it to buying a house.  You can't buy the best house in the world (well, most of us can't anyway).  You buy the best house on the market when you are looking, where you are looking, and in the price range you can afford.  You recognize it might need some fixing up or doesn't have all the ameneties you were looking for, but that's OK.  You might add/fix it later or live without it.  As long as the candidate has the basics and the personality matches, that's often enough.

In Chris' article he breaks down the process into a few stages, including position definition, position posting, resume review, and interviewing.  I really liked the first two sections, and I have things I'd like to add to the last two.

Resume review is something that comes with a good bit of experience.  After talking with a number of candidates after reading their resume, I've learned to spot quite a few things, now.  Some of my favorites are

  • Search and replace of Oracle and/or DB2 with SQL Server.  This one kills me.  It's obvious on the resume when a candidate doesn't really know SQL Server.  This last round we had a candidate that had the title of SQL Server DBA, but all the tools listed were Oracle (TKPROF, OEM, etc.).  I realize that Oracle has a larger market share than Microsoft, so it's likely the person did both, but the resume didn't read that way at all.
  • Developers trying to become DBAs.  Yes, there are 2 types of DBA; Production Support and Application Development.  I am all for people expanding skillsets and working into new positions.  I just wish they were upfront about it.  I see alot of applications developers applying for Production Support roles, and the resume reads "developed SQL and stored procedures for .NET application", or "designed tables to support data storage."  When this type of candidate is asked about fragmentation or database consistency or backups, s/he really doesn't know.  It's just too far of a stretch.  That's an easy one to see on a resume, though.
  • Lots of job/contract hopping.  As Chris points out long tenure is easy to spot on a resume.  So is job hopping.  While I understand that "long tenure" has a different definition today than it did 20 years ago, it does seem reasonable that a person would catch on with a company for a couple of years at least.  There are exceptions to this, such as project related consultants.  Typically, that type of person is easy to spot, too, due to the high level of project tasks and variety of technologies used.  When I spot these types of people looking for long-term gigs, I like to phone interview and find out why.  Sometimes the reasons are legit and the person will work out.  Other times, you may be just the next hop.

Interviewing is also something that comes with a good bit of experience.  Many people think that answering technical questions and being funny or nice is all you need to get a job.  Actually, I've recommended people for hire without asking a single technical question, and been right!  More than knowing book answers or being able to quote a popular website or magazine, I like to learn about a candidate.

I prefer a two stage interview. First the phone interview, then the face-to-face.  The phone interview is crucial.  Unless a candidate comes recommended by someone I respect, I do a phone interview, and sometimes even then.  I like to ask a person to tell me a bit about him or herself and what recent projects s/he has done.  I can quickly tell about communication ability and enthusiasm or passion from these questions.  I might lob a few softball technical questions at the person on the phone, maybe more than softballs if s/he is long distance and would need to travel for a face-to-face.  BTW, pay attention to the sounds on the other end of the phone.  If you hear mouse clicking or keys tapping, the person may be googling or looking in BOL for answers (ask me how I know!).

Next is the face-to-face, and I hope you are wearing anti-perspirant for it.  It could get messy.  I prefer a group interview with some senior people, possibly even developers.  I try to push the person until I learn what I need to know to recommend or hire the person.  I'm really looking for the following:

  • How does a person handle stress?  I'll use questions or scenarios to get the person to a point of stress to see how they handle it.  My friend Scott Sawatzki likes to say people handle stress in 3 ways; Fight, Flight, or Freeze.  I avoid DBAs that freeze.  There is a lot of stress in the DBA role, typically.  If you've ever had a cubical full of managers waiting for you to restore a database or restart a service, you know what I mean.  How a person will behave in this type of situation is crucial to his/her success.
  • How does a person handle lack of knowledge.  Like Mr. Shaw, I like to hear people say "I don't Know", or "I would look that up".  Think of an interview as an adaptive test.  It will get harder as the candidate answers hard questions, and it won't stop until "I don't know" is reached. 
  • How does a person solve problems?  Does s/he work them out silently or verbally, or not at all?  If given a scenario question where a long answer with multiple steps are required, what does the person do?  There is so much problem solving for DBAs, it just requires a good problem solver.  Whether it is troubleshooting or SQL development, a candidate must be able to solve a problem on his/her own.  I tend to get a hint of organizational skills here, too.
  • How much information does a person offer when answering questions?  Does the person give short, terse answers, or does s/he prattle on and on getting off topic completely.  Like Chris Shaw mentions, this is typically a sign the candidate doesn't know and is trying to divert.  I like answers to be somewhere in the middle of these 2 extremes.  I recall a hire where I did not recommend the candidate, but he was brought in anyway.  I didn't like the fact that he talked too much in the interview.  In the end it was discovered the guy really didn't have the basic understanding needed and spent most of his day socializing.  He was let go in under 6 months.

There are many discussions about the types of questions to ask in an interview.  Given there are a number of sites that already list questions, I won't add to that.  I think this is a good starting point. http://vyaskn.tripod.com/iq.htm.  Vyas has provided questions by category, which is helpful when you might be interviewing for different roles.  Obviously, I have my own set of questions, but a candidate should at least be able to answer a set like these that are readily available. Essentially, I like to ask questions that are representative of what the person will be doing.  I like Chris Shaw's idea of making a list of tasks and projects.  That will narrow down the types of questions to ask.  I also like scenario questions where the person must give more than one word answers to demonstrate technical knowledge.

Hiring is hard.  It bogs down a staff trying to do it, and unsettles the team for a month or so afterward.  That's why its important to do it as infrequently as possible by getting the right candidate the first time.

What are your hiring or interviewing experiences?  What do you think of my additions to Chris' aritcle?

Jon

 

with 1 Comments

I have read back through some of my previous posts here, and I realized that I have had an angry, frustrated tone in recent months.  This disturbed me, and I'm betting it disturbed anyone who has read the posts.  My apologies for anyone this my have touched.

But no more.  I am vowing to turn over a new leaf.  Some of my earlier posts were more informational and helpful.  Those I liked, and I will work to have more of them in the future.  I'm currently working on some replication knowledge, since it has some great new features in 2005.  Look for some good things on that topic. 

 

Jon

with 1 Comments

So it's summer (well almost).  And in Ohio there are only a few months of the year when it's not gray everyday, so we likes our summer a lot!  In order to promote good health and well-being of our SQL Server community here, I think we are going to take a break from SQL Server user group meetings for July and August.  We'll pick back up again in September with a kick off event.  Some of the things I have in the hopper for presentations:

1. SQL Server performance troubleshooting, deep in the bowels of the internals (WaitStats, VirtualFileStats, and MaxDop) - Me

2. Project Real on 64-Bit NEC hardware - Luis G.

3. Share your favorite script night (think small church testimony time!)

4. Automated Daily Monitoring - Using T-SQL, DTS, and Reporting Services to automate daily drudgery - Me

5. Visual Studio Team System DB Pro Tools (Data Dude) - Luis G.

Of course none of these are set in stone.  This is just something to whet your appetite.  Oh, and don't forget to register for PASS Community Summit if you haven't done so already.  I will be there and hope to see others from Columbus there, too.

Jon

with 0 Comments

In my last post, I asked some very pointed questions about the delivery of the new DBPro tools.  Mostly I was concerned that these would only be available to customers willing to pay a premium.  Matt Nunn commented on my questions, and I was so pleased with the reply, I wanted to bring more attention to it.

Wanted to comment on this blog. Visual Studio Team Edition for Database Professionals will be available for everyone, ther eis no need to have even Visual Studio prior to acquiring this ediiton, it will ship with everything needed to run it. We also integrate with any source control system that implements the SCCI interface, i.e. any source control system that can plug into the Visual Studio IDE, which is pretty much all of them.

Hope this sets the record straight and answers your questions and I encourage you to try the new edition when the CTP goes live on June 11th. You wil be able to download it for
http://msdn.microsoft.com/vstudio/teamsystem/dbpro and for this particular CTP you will need to download a trial version of either Team Suite or VS Pro (or have a full version already) but moving forward that will not be the case

Thanks Matt for clearing this up.  My apologies for making some assumptions without asking the simple questions.  Once again, Microsoft is taking care of the database community in Premium fashion. 

Jon

 

with 3 Comments

So it sounds like there is a lot of buzz about the new Visual Studio Team System for Database Professional tools that are coming out in CTP on June 11.  Matt Nunn and others are blogging heavily about it on msdn.  I, too, would be very excited to have all of these tools in my hands.  In the past, I've used Red-Gate, Embarcadero, ERWIN and others to accomplish some of these tasks, so having a tool from the vendor that does it all will be exciting. 

Some questions, though.  Does it seem strange that only the people that have purchased Visual Studio Team Systems or another stand alone piece of VSTS should be able to have this nice toolset. What about folks who use other source control systems than Source Safe or the to-be Foundation Server?  And how about the the smaller shops where the DBA has to wear a number (if not all) of the hats in software development? 

My point is, since Management Studio and BIDS are VS based, now, why not include the tools in the box with SQL Server? 

I'm eager to hear other's opinions on this?  I can't be the only one to have this thought...

As my good friend Fernando Lucero would say, "It's just another example of The Man keeping us down. Darn The Man! Darn him!

Jon

 

with 2 Comments