I recently assisted in a review of data modeling tools for a customer.  I won't mention which one we chose, but I would like to give a general review of the tools.  The 3 major tools on the market for data modeling are CA Allfusion Data Modeler, Sybase PowerDesigner, and Embarcadero ER/Studio, and we looked at these 3 competitively.  Each has their own pros and cons. Some provide better SQL Server 2005 support than others.  Some are just plain easier to use than others.  I won't go into a diatribe about data modeling, but if you are looking for a new tool, or if you are looking to begin data modeling for your development projects, you might want to read on to see which tool might be a good fit for you.

CA AllFusion Data Modeler R7

This is the tool formerly known as ERWIN. If you've been in data management for very long, you've probably seen this tool at least once before, most likely as shelf-ware along with many other data management products.  That's not to say ERWIN isn't a fine tool, just that it is underutilized.  ERWIN has gone through a beleaguered history bouncing from LogicWorks to Platinum and then being consumed by Computer Associates.  It was really the first of the PC-based graphical data modeling CASE tools.  ERWIN, or I should say AllFusion Data Modeler as CA now calls it, is in it's 5th major version, but it's being called R7 (mostly I think to keep up with the Joneses - see versions of the other products being reviewed).  The latest version provides additional support for DBMS platforms other than SQL Server and it has little (if any) support for SQL Server 2005 new features. ERWIN's strengths continue to lie in green-field development.  It is very fast at creating a new logical data model, converting that to a physical data model, and generating the schema to create the database objects.  ERWIN's weaknesses continue to lie in metadata management, reporting, graphical layout of the model, and creating ALTER TABLE schema when comparing to an existing database.  The latter is the reason it is difficult to use for anything other than green-field development.  Both of the other tools provide much better means for developing enterprise databases for the long-haul.

Pros

Easy to create new structures
Long History
Full-Featured logical modeler
New Schema DDL Generation

Cons

Metadata Management
Poor Support from helpdesk or website
Graphical Layout - one method, not easy to read
CASE tool to generate ALTER DDL
Lack of SQL 2005 support
Price - it was the most expensive of the 3 solutions

Sybase PowerDesigner 12

Sybase has put a lot of effort into designing a comprehensive modeling suite. Their enterprise edition modeling tools allow for collaborative modeling through all phases of the SDLC.  The ideal scenario for PowerDesigner is to start by capturing Use Cases and modeling the results in PD.  The Use Case Model then pushes down to a number of sub-models, including a Conceptual Data Model.  Data Items can be defined by the business analyst in the Use Case, and the data modeler then adds the Data Item to the appropriate entity.  This is an incredibly empowering scenario for the business, as IT is no longer the only one with data definition capability and responsibility.  It also takes a lot of work off the data modelers' plate as now they have less work to do defining the nitty-gritty of each attribute. 

Once a Conceptual Model is complete, it can be pushed down to a Logical Model for further definition.  A complete Logical Model can be pushed further down to a physical model based on a specific DBMS.  All of these models are decoupled, in that changes to one do not affect the other(s).  PD does provide a link and synch tool to push changes up and down the model chain, allowing you to model at any level, while still keeping all the levels in synch. 

PD is an excellent end-to-end, formal modeling solution providing a collaboration repository.  Its strengths are many. The rich feature set allows the modeler to do just about anything you'd ever want to do.  Besides data modeling, the CASE tools provide Object Modeling and code generation in virtually all the major languages, making it a great place to start any project.  The downsides are just as you might expect.  The capabilities of this tool bring much complexity.  Ripping out a simple data model for a new project is not all that easy.  The tool really pushes you down a process based development pattern.  Also, the interface is not all that intuitive. I have been data modeling for many years, and it took me a  significant amount of time to be able to create entities and relationships for 20 entities.  Lastly it struggled much like ERWIN in creating ALTER DDL.  It was quick good at creating new DDL, but struggled when comparing a model to an existing database.

Pros

Complete business proccess and IT modeling tool
Rich Collaboration
Exceptional Metadata Management (Blobs and Definitions and search capability)
Flexibility in configuration
OK SQL 2005 support

Cons
Complexity
User Interface
Comparison and Script Generation

Embarcadero ER/Studio 7.1
I believe ER/Studio is the youngest of these modeling tools, but it certainly can hold its own with the older brothers.  ER/Studio provides rich data modeling capabilities, metadata management, reporting, and automation.  The last point is a feature I really like, the macros.  Users can write their own productivity and standard enforcement scripts to change models in whatever way necessary.  Embarcadero opens up the object model for access to program your own, but they also provide many helpful macros in the box.

Like PD, Embarcadero decouples the physical model from the logical model, providing compare and synch capabilities between models and the target database.  This last point makes ER/Studio stand-out.  It does a great job of creating alter DDL when comparing to an existing database.  Many tools will rename the table, recreate the table, and re-insert the data for even the simplest changes.  ER/Studio does a pretty good job of knowing when you just need an ALTER TABLE ADD COLUMN script.

The user interface and model layout features of ER/Studio are beyond belief.  There are six ways to layout a model. All of them are meanigful and fast, even with large models.  There are some areas of the user interface that take longer to do things than say ERWIN or PD, but in general it is intuitive and easy to use.

Lastly, ER/Studio now comes in a just for SQL Server edition, which is much less expensive than the competitors, which charge for all the databases, even if you never plan to use them.

Pros

Great Model Layout
Automation Macros
Good Comparison tool and DDL Generation
OK Metadata management
Great Helpdesk and feature enhancement support
SQL 2005 support
Price (SQL Server Edition)

Cons

Some tedious areas of the interface
A little buggy (needed to call support a few times)

As you can see, all of these tools can provide an organization with good data management abilities.  One thing I didn't mention in the reviews is that data management is more than implementing a tool.  It takes discipline and process enforcement.  None of these tools will do that for you. That's the people part of this industry.  Hopefully, this review will help others as they look to choose a tool to implement in their SDLC.

Jon

with 0 Comments

A co-worker of mine, John Nielsen, forwarded me this post from xmldatabases.org, and I felt compelled to respond.  John is an extremely talented developer and author in the Python space.  As you might expect, he is also a little progressive in his thoughts about Computer Science.  I don’t think he sees Microsoft and Bill Gates as the Anti-Christ, but he does have a more open source bent. 

 

John and I have had many discussions about where things are going in technology, especially in regards to SOA and databases, two of my favorite technology topics.  Anyway here are some thoughts about this post.  I’ve included my thoughts in italics and bold.

 

1.      Do not think in terms of programming languages, XML is a data format. It is NOT a serialization of programming language structures so don't treat it as such. Yes, we learned this in Relational Databases long ago (well I guess I know some developers that haven’t, but on the whole…) In REST based systems you're moving and manipulating data, you're not making method calls against a remote object. The fact programming languages are involved at all is a necessary evil. The data is what matters, programming languages are just tools to help work with that data.  Yes!  Finally someone gets it!

2.      XML Schemas are to act as documentation and spot validation, do not use them as a straight jacket around the data. W3C XML Schema driven systems are horribly brittle in operation because they don't bend. If the data doesn't exactly match what is expected they blow up, even on something as simple and non-critical as an extra attribute from a different schema being added to an element. Yes validate the data when you need to, but do not overly constrain it.  This is a recipe for disaster and is a major flaw for XML databases.  In the relational database management world, we know that if you don’t constrain the data to the business rules, data quality will suffer.  XML is brittle and does not provide good constraining capabilities.

3.      For the data you receive just take what you need and pass through everything else. If there's extra data in the stream it is NOT an error as long as all the other data you need is there as well. Just ignore the extra data. This is one reason why XML Schema based systems are so fragile, they take too literal of a view about the shape of the data.  Of course, you could just limit the data you request, such as is possible with a good data access language like SQL.  XPath is a good stab at a declarative language for XML, but it still has some growing up to do.

4.      As a corollary to #3, it is also not necessarily an error when non-critical data is missing from the stream. If you can do a little extra work to make the operation succeed with missing data, do the work. Um yeah, it’s called a join in the set based world.  And the DBMS does all the heavy lifting for you.

5.      The network can and will disappear. Take advantage of the clean data packet XML provides to queue the request for later execution. Anytime there's no user sitting waiting for the data it's probably a good idea to use this kind of asynchronous architecture.   Transaction Logs are wonderful things!

6.      Your data is going to change, that's why you're using XML to begin with. Do NOT lead your clients down a path that leads them to overly coupling their systems to your changing data. I.e. do not encourage them to use some kind of XML data binding tool that's strongly coupled to a particular schema. This also means you need to be very selective in what you consider required data. Adding additional required data to an existing operation is going to cause breakage. If data is not absolutely 100% for certain required then just make it optional. Adding additional optional data should never break the system. Another reason XML Schema based systems are so horribly brittle is they tend to break on any schema change, not just required data changes.  Another area where relational/SQL based systems are still superior.  Changing data structures are easy, fast, and tend not to break well written applications. 

7.      Do not require more data then necessary in a request. Yet another limitation of schema based systems is that you can be forced to carry a bunch of context data with a request just so that it passes validation, even when the data is not truly required for the request.  See thoughts on point 3. 

8.      Use XML tools as much as possible. XPath and XSL-T are enormously powerful tools for working with XML data, traditional programming languages are generally very poor at working with XML. Trying to bend traditional programming language techniques to working with XML data is what leads to horrible systems like W3C XML Schema and static binding of programming language objects to schemas. If you want to build truly robust systems do not fall into this trap.  Yes these are powerful tools, but they have some catching up to do with SQL.  As a declarative language there are few things I CAN’T do to request, transform, and manipulate data.  Not all SQL implementations are good performers at everything that I need to do, but it’s possible and not difficult.

9.      Build software that bends. Think in terms of malleable data and disappearing services. In general, visualize the system as an evolving, changing entity that can not be constrained. Do NOT try to constrain it, if you do you'll just end up with a system that falls over at the smallest inconsistency. Validate what you need, pass through anything you don't need.  Again if you don’t constrain it, data quality will suffer.  If you are talking about programs that just pass the data through to an application, that has to be validated, too.  Often times I see people just shredding XML documents and stuffing them into relational databases, or <gasp> adding a LOB field and sticking the XML document in it.  If XML isn’t capable of constraining the data and there is no validation before sticking it into the database, problems will arise.  Your application might not break, but how about the next guy down the line or the business user that has to report on that data.  I pity the ETL developer or data steward in the Warehouse that has to cleanse that data.

10. Maximize flexibility, do not fall into the trap of believing that a fully specified and constrained system will be more robust. It won't be, it will be brittle and prone to constant operational failure in the real world. Accept that the system and the environment where it's going to operate is going to change and build for that eventuality.   Flexibility is often given up in lieu of scalability.  However, relational databases tend to give you a happy medium.

 

All this isn’t to say that I’m anti-XML.  On the contrary, I think it’s one of the most important technologies of the past decade.  It just isn’t a hammer and the world isn’t a nail.  I wouldn't use SQL to develop GUIs, would I (could I?...hmmmm.... - :)  ) Let’s use XML for what it was designed for – messaging and inter-process communications.  As an industry we ditched hierarchical and network databases many years ago (see IMS, IDMS, et al), and for good reason.   

 

Jon

 

with 2 Comments

There is a new article by Chris Shaw on SQLServerCentral.com about "Hiring a DBA".  I want to start off by saying that Mr. Shaw is right on, and thanks to him for writing this!  Having just gone through a hiring process, which was like many I've had in the past, I feel his pain. 

Hiring an employee or a consultant is always an adventure.  As Chris put it, it's more art than science.  Sometimes you just know one way or the other.  Sometimes, you are just wrong!  I recall one instance where the candidate interviewed with me and another Sr. DBA.  The guy answered many technical questions correctly, said "I don't Know" in the right places, and seemed to communicate very well.  However, when he arrived on site, he sadly did not work out due to lack of initiative and creativity and lack of ability to work in a highly available environment.  We were just wrong!

Hiring is typically a time-based task.  Very rarely can you wait around for the absolutely perfect candidate to show up on your doorstep.  I liken it to buying a house.  You can't buy the best house in the world (well, most of us can't anyway).  You buy the best house on the market when you are looking, where you are looking, and in the price range you can afford.  You recognize it might need some fixing up or doesn't have all the ameneties you were looking for, but that's OK.  You might add/fix it later or live without it.  As long as the candidate has the basics and the personality matches, that's often enough.

In Chris' article he breaks down the process into a few stages, including position definition, position posting, resume review, and interviewing.  I really liked the first two sections, and I have things I'd like to add to the last two.

Resume review is something that comes with a good bit of experience.  After talking with a number of candidates after reading their resume, I've learned to spot quite a few things, now.  Some of my favorites are

  • Search and replace of Oracle and/or DB2 with SQL Server.  This one kills me.  It's obvious on the resume when a candidate doesn't really know SQL Server.  This last round we had a candidate that had the title of SQL Server DBA, but all the tools listed were Oracle (TKPROF, OEM, etc.).  I realize that Oracle has a larger market share than Microsoft, so it's likely the person did both, but the resume didn't read that way at all.
  • Developers trying to become DBAs.  Yes, there are 2 types of DBA; Production Support and Application Development.  I am all for people expanding skillsets and working into new positions.  I just wish they were upfront about it.  I see alot of applications developers applying for Production Support roles, and the resume reads "developed SQL and stored procedures for .NET application", or "designed tables to support data storage."  When this type of candidate is asked about fragmentation or database consistency or backups, s/he really doesn't know.  It's just too far of a stretch.  That's an easy one to see on a resume, though.
  • Lots of job/contract hopping.  As Chris points out long tenure is easy to spot on a resume.  So is job hopping.  While I understand that "long tenure" has a different definition today than it did 20 years ago, it does seem reasonable that a person would catch on with a company for a couple of years at least.  There are exceptions to this, such as project related consultants.  Typically, that type of person is easy to spot, too, due to the high level of project tasks and variety of technologies used.  When I spot these types of people looking for long-term gigs, I like to phone interview and find out why.  Sometimes the reasons are legit and the person will work out.  Other times, you may be just the next hop.

Interviewing is also something that comes with a good bit of experience.  Many people think that answering technical questions and being funny or nice is all you need to get a job.  Actually, I've recommended people for hire without asking a single technical question, and been right!  More than knowing book answers or being able to quote a popular website or magazine, I like to learn about a candidate.

I prefer a two stage interview. First the phone interview, then the face-to-face.  The phone interview is crucial.  Unless a candidate comes recommended by someone I respect, I do a phone interview, and sometimes even then.  I like to ask a person to tell me a bit about him or herself and what recent projects s/he has done.  I can quickly tell about communication ability and enthusiasm or passion from these questions.  I might lob a few softball technical questions at the person on the phone, maybe more than softballs if s/he is long distance and would need to travel for a face-to-face.  BTW, pay attention to the sounds on the other end of the phone.  If you hear mouse clicking or keys tapping, the person may be googling or looking in BOL for answers (ask me how I know!).

Next is the face-to-face, and I hope you are wearing anti-perspirant for it.  It could get messy.  I prefer a group interview with some senior people, possibly even developers.  I try to push the person until I learn what I need to know to recommend or hire the person.  I'm really looking for the following:

  • How does a person handle stress?  I'll use questions or scenarios to get the person to a point of stress to see how they handle it.  My friend Scott Sawatzki likes to say people handle stress in 3 ways; Fight, Flight, or Freeze.  I avoid DBAs that freeze.  There is a lot of stress in the DBA role, typically.  If you've ever had a cubical full of managers waiting for you to restore a database or restart a service, you know what I mean.  How a person will behave in this type of situation is crucial to his/her success.
  • How does a person handle lack of knowledge.  Like Mr. Shaw, I like to hear people say "I don't Know", or "I would look that up".  Think of an interview as an adaptive test.  It will get harder as the candidate answers hard questions, and it won't stop until "I don't know" is reached. 
  • How does a person solve problems?  Does s/he work them out silently or verbally, or not at all?  If given a scenario question where a long answer with multiple steps are required, what does the person do?  There is so much problem solving for DBAs, it just requires a good problem solver.  Whether it is troubleshooting or SQL development, a candidate must be able to solve a problem on his/her own.  I tend to get a hint of organizational skills here, too.
  • How much information does a person offer when answering questions?  Does the person give short, terse answers, or does s/he prattle on and on getting off topic completely.  Like Chris Shaw mentions, this is typically a sign the candidate doesn't know and is trying to divert.  I like answers to be somewhere in the middle of these 2 extremes.  I recall a hire where I did not recommend the candidate, but he was brought in anyway.  I didn't like the fact that he talked too much in the interview.  In the end it was discovered the guy really didn't have the basic understanding needed and spent most of his day socializing.  He was let go in under 6 months.

There are many discussions about the types of questions to ask in an interview.  Given there are a number of sites that already list questions, I won't add to that.  I think this is a good starting point. http://vyaskn.tripod.com/iq.htm.  Vyas has provided questions by category, which is helpful when you might be interviewing for different roles.  Obviously, I have my own set of questions, but a candidate should at least be able to answer a set like these that are readily available. Essentially, I like to ask questions that are representative of what the person will be doing.  I like Chris Shaw's idea of making a list of tasks and projects.  That will narrow down the types of questions to ask.  I also like scenario questions where the person must give more than one word answers to demonstrate technical knowledge.

Hiring is hard.  It bogs down a staff trying to do it, and unsettles the team for a month or so afterward.  That's why its important to do it as infrequently as possible by getting the right candidate the first time.

What are your hiring or interviewing experiences?  What do you think of my additions to Chris' aritcle?

Jon

 

with 1 Comments

As my vacation to the Mayan Riviera  approaches, I feel compelled to wish everyone Feliz Navidad.  Now if a particular song happens to get stuck in your head over the next few days, you can feel free to blame it on me! While this wish may not be as culturally sensitive as Kent Tegels, it seems especially fitting given my current bout with Short-Timer's syndrome.

As we approach the new year I wanted to also make a call for speakers to the Central Ohio SQL Server SIG.  We meet the 2nd Thursday of each month at the Microsoft office in Columbus.  If you have an interesting topic and/or want to advance your career in 2006, we would love to have you speak at one of our upcoming events. BTW...Dave Donaldson and I are currently working on a full day event in the March/April timeframe.  This would be like the Day of .NET, but adding a 2nd track for SQL Server 2005 related topics.  We could need speakers for that as well.  For those of you that attend the SQL SIG, if you don't volunteer to speak, your punishment is hearing me again, and again, and again in 2006.  Talk about motivation! :)  If you are interested, feel free to send an email.

Jon

with 2 Comments