Have You Heard About the Microsoft TechNet Wiki?
Here’s another one to add to your list of browser bookmarks!
The TechNet Wiki covers Microsoft technologies from writers throughout the community for use by the community. As with all wikis, this grassroots effort needs your help.
Microsoft is encouraging everyone to contribute to the effort – all you have to do is join. So start a whole new article, add your knowledge, or draw from your experience to improve an existing article. You can start small or large… Join in at http://social.technet.microsoft.com/wiki/ or simply use the wiki as a new resource to answer your tough technology questions.
Enjoy,
-Kev
Want Another Reason to Hate iTunes?
I’m not one to whine. Really. I’m totally not a whiner. However, I’m going to sound like one with this statement…
I fricken HATE iTunes.
There, I said it. I’m already starting to feel better.
Playing on Elizabeth Barrett Browning and her fantastic poem, “How do I love thee? Let me count the ways.”, I’m going to count some ways that iTunes is filling me with inhuman, Hulk-like rage:
- Ridiculously frequent updates. Not the “Update Tuesday” sort of thing we get from Microsoft, but the “I’m going to interrupt you all the time, any time sort of upda…” – hold on, iTunes wants me to update it.
- Genius. You’re an idiot.
- Shuffle. You don’t.
- Home Sharing. If by “sharing”, you mean making it impossible to get music onto other devices without copying and moving it manually, you’re perfect.
- Relentless Focus on Making a Buck. Yeah, I know that Apple is the biggest capitalized company since the Iron Age and that they had a better Q4 in 2011 than the rest of humanity combined. But couldn’t you give it a rest for plain ol’ music, especially if you’re a user who still uses CDs? It seems like they’d monetize punctuation marks if they had the opportunity, for Pete’s sake!
- Duplicate Songs in Library. Take a few minutes and Google on ‘Remove duplicates from iTunes library’. (See it on Let Me Google That for You). People are about to grab torches and pitchforks on this one. WHY ISN’T THIS BUILT IN?!? WHY DOESN’T THE TOOL HANDLE THIS?!!!? Someone please make an app for this!
- Duplicate Songs on Disk. So my last disk backup took a lot longer than usual. Hmmm, I wonder why? So I looked at my backup info and saw this (image). Just in case you don’t see what I saw – 70.3 GB of hard disk sucked down for music. Keep in mind that I actually have only about 6 GB of music files. So, in the vernacular, W-T-F?!?
I’m now going to spend hours of precious time burning down iTunes and rebuilding my library. And I must’ve already done this two or three times in the past few years. Look, Apple, I don’t fricken need this. I need trouble-free music that doesn’t require an IT certificate to manage. Didn’t Apple used to be the company where everything was slap-your-mammy easy? Well, it ain’t so now. And it’s pissin’ me off.
I’m sure that there are other things about this product that make your blood boil. Lay it on me! I want to hear your rant!
Enjoy!
-Kev
For Devs – Component Code Challenge and INETA Community Champs
Two quick notes from our friends over at INETA:
Component Code Challenge
Ever had these two distinct thoughts – “I have a good idea for an application, however what can I get for it?” and “I would love to go to a big conference like DevConnections or Tech·Ed, but how would I pay for it?” Haven’t we all had thoughts like that at some time or another?
Well, you are in luck. The INETA Component Code Challenge for 2012 will send one lucky winner to their choice of those aforementioned conferences for slinging some serious code. All you need to do is create an application using 2 approved controls from 2 approved vendors, create a video talking about your application, and submit it to their judging panel. The judges, Bill Reiss (blog | twitter), Nikita Polyakov (blog | twitter), Matt Hidinger (blog | twitter), and Greg Leonardo (blog | twitter), are on the lookout for innovation and creativity in the use of approved controls. To read the official rules, click here.
Grab your computer and Visual Studio and GET YOUR CODE ON!
Fine Print: INETA covers one conference ticket, hotel, and travel to the conference as is outlined in INETA’s travel policy. Please visit the site for additional rules.
INETA Community Champs
Are you a hardworking community builder in the coding world? Then INETA wants to know about you! Check out the latest round of INETA Community Champ nominations. According to INETA:
The mission of the Community Champs program is to recognize and thank those members of our community that make it all possible. The first quarter is flying by, so this is your friendly reminder that you should submit your entries for Community Champs. If there are any “Rock Stars” in your community who should get a “thank you” for their hard work, be sure to let us know. The deadline for submissions is March 31, so what are you waiting for? Go to http://www.inetachamps.com. If you have any questions about the program, feel free to contact us at noram.champs@ineta.org.
Enjoy,
-Kev
Follow me on Twitter
Oracle’s Big Data Appliance and Toad for Cloud Databases
The big Hadoop news of the week is that Oracle has partnered with Cloudera to bring their Hadoop expertise to Oracle’s Big Data Appliance. As Computer World notes, the prevailing wisdom had been that Oracle would put together their own distribution, and it may seem surprising that the world’s largest database vendor would use someone else’s database software. The next year will certainly be interesting – is this a try-before-you-buy move for Oracle? Or is it a rinse and repeat of what they did with Red Hat: partner first, then try to take them out?
Anyway, the net of all of this for Quest and Toad for Cloud Databases is positive. Quest users want to be assured that Toad has them covered whatever the database landscape looks like and however it changes. We brought our product to market early, and have broad support for the Hadoop ecosystem with HBase and Hive support, as well as having partnered with Cloudera in 2010 on the Quest Data Connector for Oracle and Hadoop, a high-speed data connector to move data between Oracle and Hadoop that, unlike Oracle’s Hadoop loader, enables you to move data in both directions. We also have support for other systems – Cassandra, MongoDB, Amazon, and Microsoft SQL Azure and Azure Table Services. In 2012 we’ll be adding support for Oracle’s NoSQL database and a couple of others that I’ll write about as our roadmap gets firmed up.
The Year that was – 2011
Looking back on 2011, I’m surprised by two occurrences. First, I got a lot of work done, despite myself. My biggest obstacles to high performance are all self-derived: procrastination, disorganization, and plain ol’ laziness. Second, I’m surprised I survived my personal travails. I’ve had my fill of frowns this year, from wayward children to caring for sick loved ones to self-inflicted injuries. I’m glad to be closing the door on a few of those chapters and look forward to better times in 2012.
Here’s a run-down on my professional activities over the course of 2011:
- Articles: 3
- Conferences Spoken At: 14
- Customer Calls: 124
- Customer Visits: 7
- Magazine Columns: 14
- PASS Chapter Presentations: 12
- Pre-cons/Full-day Seminars: 7
- SQL Saturdays: 4
- SSWUG Sessions: 8
- Webcasts: 16
Plus, I got to go on an awesome SQLCruise and was featured on Richard Campbell’s RunAsRadio show at least once (Richard’s blog | twitter). (I was thinking that I’d been on twice in 2011. But that other appearance may have been in late 2010. My records aren’t clear.)
I was also put in charge of the SQLServerPedia portion of the DBPedias sites. Some statistics there:
- 133 contributing bloggers
- 4,500 blog posts added in 2011 (out of a total 9,000 blog posts)
- 57,000 content items added in 2011 (out of a total 138,000 content items)
- Monday-Thursday all Pedias average 8,700 visits combined
- 1.9 million visits in 2011 (out of a total 3.8 million visits to Pedia sites since SQLServerPedia was started in 2008)
2011 was also my year to jump into Twitter. By year’s end, I had accumulated:
- 3,452 Tweets
- 531 Following
- 2,656 Followers
- 230 Listed
I think that my increase in tweets correlated directly with my decrease in blog posts. Ironically, I have accumulated even more topics to blog about (I’ve somewhere around 630 nascent blog posts), but I simply run out of time to put them into WordPress. My blogging activity for 2011 was down to 77 entries, about half what I wrote in 2010.
I’ll talk a little about my plans for 2012 in another post. I hope to see you following me on Twitter soon! Thanks,
Kevin
Getting started with Apache Pig
If, like me, you want to play around with data in a Hadoop cluster without having to write hundreds or thousands of lines of Java MapReduce code, you most likely will use either Hive (using the Hive Query Language HQL) or Pig.
Hive is a SQL-like language which compiles to Java map-reduce code, while Pig is a data flow language which allows you to specify your map-reduce data pipelines using high level abstractions.
The way I like to think of it is that writing Java MapReduce is like programming in assembler: you need to manually construct every low level operation you want to perform. Hive allows people familiar with SQL to extract data from Hadoop with ease and – like SQL – you specify the data you want without having to worry too much about the way in which it is retrieved. Writing a Pig script is like writing a SQL execution plan: you specify the exact sequence of operations you want to undertake when retrieving the data. Pig also allows you to specify more complex data flows than is possible using HQL alone.
As a crusty old RDBMS guy, I at first thought that Hive and HQL were the most attractive solution, and I still think Hive is critical to enterprise adoption of Hadoop since it opens up Hadoop to the world of enterprise Business Intelligence. But Pig really appeals to me as someone who has spent so much time tuning SQL. The Hive optimizer is currently at the level of the rule-based RDBMS optimizers of the early 90s. It will get better and get better quickly, but given the massive size of most Hadoop clusters, the cost of a poorly optimized HQL statement is really high. Explicitly specifying the execution plan in Pig arguably gives the programmer more control and lessens the likelihood of the “HQL statement from Hell” bringing a cluster to its knees.
So I’ve started learning Pig, using the familiar (to me) Oracle sample schema which I downloaded using SQOOP. (Hint: Pig likes tab separated files, so use the --fields-terminated-by '\t' flag in your SQOOP job.)
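As a rough illustration of why the delimiter matters, here is a minimal Python sketch (not Pig or SQOOP code) that splits tab-separated lines the way a tab-based loader would. The sample rows are made up for the example; the point is that a field containing a comma survives intact when tabs are the separator.

```python
import csv
import io

def parse_tab_separated(text):
    """Split each line on tab characters, leaving commas inside fields untouched."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t")]

# Hypothetical rows (id, name, region); the values are illustrative only.
sample = (
    "52790\tUnited States of America\tAmericas\n"
    "52776\tKorea, Republic of\tAsia\n"
)

rows = parse_tab_separated(sample)
for row in rows:
    print(row)
```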
Here’s a diagram I created showing how some of the more familiar HQL idioms are implemented in Pig:

Note how using Pig we explicitly control the execution plan: In HQL it’s up to the optimizer whether tables are joined before or after the “country_region=’Asia’” filter is applied. In Pig I explicitly execute the filter before the join. It turns out that the Hive optimizer does the same thing, but for complex data flows being able to explicitly control the sequence of events can be an advantage.
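The filter-before-join idea can be sketched outside of Pig as well. The following Python snippet is only an illustration under stated assumptions: the two tiny tables, their column names other than country_region, and the helper functions are all hypothetical. It shows that both orderings return the same rows, while filtering first makes the join produce fewer intermediate rows.

```python
# Hypothetical miniature tables; only country_region comes from the text above.
countries = [
    {"country_id": 1, "country_name": "Japan", "country_region": "Asia"},
    {"country_id": 2, "country_name": "France", "country_region": "Europe"},
    {"country_id": 3, "country_name": "India", "country_region": "Asia"},
]
customers = [
    {"cust_id": 10, "country_id": 1},
    {"cust_id": 11, "country_id": 2},
    {"cust_id": 12, "country_id": 3},
    {"cust_id": 13, "country_id": 2},
]

def join(left, right, key):
    """Simple hash join: index `right` by key, then probe with each row of `left`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def is_asia(row):
    return row["country_region"] == "Asia"

def filter_then_join():
    """Filter countries first, then join. Returns (result rows, rows produced by the join)."""
    asia = [c for c in countries if is_asia(c)]
    joined = join(customers, asia, "country_id")
    return joined, len(joined)

def join_then_filter():
    """Join everything, then filter. Returns (result rows, rows produced by the join)."""
    joined = join(customers, countries, "country_id")
    return [r for r in joined if is_asia(r)], len(joined)

early_rows, early_join_size = filter_then_join()
late_rows, late_join_size = join_then_filter()
print(early_rows == late_rows, early_join_size, late_join_size)
```

On a real cluster the difference shows up as data shuffled between map and reduce stages rather than as list lengths, but the ordering principle is the same.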
Pig is only a little more wordy than HQL and while I definitely like the familiar syntax of HQL I really like the additional control of Pig.
New in 2012 – IT Horror Stories
I do a lot of public speaking over the course of the year at many different conferences and events. I always try to carve out time during and after the presentation to take questions from the audience. While many of these questions are de rigueur, I often get questions that can only be described as “How do I handle this … <insert IT horror story here>?”

These stories often turned out to be more interesting than the question or the answer in and of themselves. For example, it’s a common public speaking best practice to repeat a question back to the attendee. This helps ensure that you fully understood the question and, in case of a session recording that’s picked up only on the microphone, that the question is also recorded. But when your immediate response, as the speaker, is “Your manager told you to do WHAT?!?”, you know you’ve hit a zinger, as in “Your manager told you that backups aren’t important?!?”
These stories came to be so fun, in the time-honored tradition of slowing down to carefully examine a car wreck on the highway to the point of clogging all other traffic, that I started to make IT Horror Stories a part of my regular presentation portfolio. And I never have to repeat myself since something new and horrible always seems to be happening and, in many situations, conference attendees specifically seek out these sessions just so they can air their grievances.
Want to share your IT Horror Story? I’ll give you a free eBook for any that I post here!
In our first installment of IT Horror Stories, I bring you a little lesson from my friend and coworker, Richard Douglas (blog | twitter), a SQL Server enthusiast living in the Maidenhead UK region. Richard writes:
The background story is that I was in a meeting with a few managers and they announced (as they tend to do) that in 20 minutes they were going to start UAT’ing on a machine I hadn’t heard of (let’s call it PC101). I asked what this machine was, as it wasn’t listed on my last estate audit using MAP (Ed: the Microsoft Assessment and Planning Toolkit, found here. I wrote about it in my SQL Server Pro magazine Tool Time column).
The manager told me that it was just a PC not a server with one spindle and only 2GB of RAM on Win 7 32bit OS to hold a suite of databases with a total size of 300GB with TDE enabled to boot – and they were going to be doing user testing on this!!!
I told them there was no way that this machine was going to be usable and the users would take a bad view of the new features because of the poor performance. So I was given the challenge of doing what I could to improve performance - in 15 minutes.
Straight away, I rushed over to desktop support to see what spare machines they had lying about. Luckily, they had some spare machines for new starters. So I managed to grab a bit of extra RAM and a hard drive from another machine. We had trouble attaching the extra drive into the machine. It just wasn’t going to fit. So we ended up putting the drive on top and taping it on so it wouldn’t get knocked. All the log files were moved to the second drive to try to eliminate some of the disk contention and we also added a USB flash drive to make use of ReadyBoost.
Of course, the users still complained about performance. But I like to think that we helped things a little and it’s a great story of British ingenuity!
Manager: Of course we can get top-of-the-line performance with a little PC under a desk somewhere with minimal RAM, CPU, and IO capabilities.
ITPro: Are you kidding me? We might’ve been able to make it fast if we’d done a little planning beforehand. But this is rolling out RIGHT NOW!
Manager: Well, see what you can do with it.
ITPro: Ok. What can I spend to upgrade components?
Manager: Nothing.
ITPro: Gurgle… < Makes clutching motion at throat as if dying>
Follow me on Twitter! Enjoy,
-Kev
Amazon Elastic Map Reduce (EMR), Hive, and TOAD
Since my first post on connecting to Amazon Elastic Map Reduce with TOAD, we’ve added quite a few features to our Hadoop support in general and our EMR support specifically, so I thought I’d summarize those features in this blog post.
Amazon Elastic Map Reduce is a cloud-based version of Hadoop hosted on Amazon Elastic Compute Cloud (EC2) instances. Using EMR, you can quickly establish a cloud-based Hadoop cluster to perform map reduce work flows.
EMR supports Hive of course, and Toad for Cloud Databases (TCD) includes Hive support, so let’s look at using that to query EMR data.
Using the Toad direct Hive client
TCD direct Hive connection support is the quickest way to establish a connection to Hive. It uses a bundled JDBC driver to establish the connection.
Below we create a new connection to a Hive server running on EMR:
- Right click on Hive connections and choose “Connect to Hive” to create a new Hive connection.
- The host address is the “Master” EC2 instance for your EMR cluster. You’ll find that on the EMR Job flow management page within your Amazon AWS console. The Hive 0.5 server is running on port 10000 by default.
- Specifying a job tracker port allows us to track the execution of our Hive jobs in EMR. The standard Hadoop jobtracker port is 50030, but in EMR it’s 9600.
- It’s possible to open up port 10000 so you can directly connect with Hive clients, but it’s usually a bad idea. Hive has negligible built-in security, so you’d be exposing your Hive data. For that reason we support an SSH mode in which you can tunnel through to your Hadoop server using the keypair file that you used to start the EMR job flow. The key name is also shown in the EMR console page, though obviously you’ll need to have an actual keypair file.
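For readers curious what tunnelling to the Hive port looks like outside of Toad, here is a minimal sketch that builds a standard OpenSSH local port forward. It is not how Toad implements its SSH mode, which the text above does not describe. The function name, host name, and keypair path are hypothetical, and the "hadoop" login name is an assumption about the cluster's SSH user.

```python
def build_tunnel_command(master_host, keypair_path, hive_port=10000, user="hadoop"):
    """Return the argument list for an OpenSSH local port forward to the Hive server.

    -i selects the private key file, -N skips running a remote command,
    and -L forwards a local port to the same port on the master node.
    """
    return [
        "ssh",
        "-i", keypair_path,
        "-N",
        "-L", f"{hive_port}:localhost:{hive_port}",
        f"{user}@{master_host}",
    ]

# Placeholder values; substitute your own master node address and keypair file.
command = build_tunnel_command("ec2-master.example.com", "my-keypair.pem")
print(" ".join(command))
```

With a tunnel like this running, a Hive client would connect to localhost on port 10000 instead of the public address, so the port never has to be opened to the outside world.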
The direct Hive client allows you to execute any legal Hive QL commands. In the example below, we create a new Hive table based on data held in an S3 bucket (The data is some UN data on homicide rates I uploaded).
Connecting Hive to the Toad data hub
It’s great to be able to use Hive to exploit Map Reduce using familiar (to me) SQL-like syntax. But the real advantage of TCD for Hive is that we link to data that might be held in other sources – like Oracle, Cassandra, SQL Server, MongoDB, etc.
Setting up a hub connection to EMR Hive is very similar to setting up a direct connection. Of course you need a data hub installed (see here for instructions), then right click on the hub node and select “map data source”:
Now that the hub knows about the EMR Hive connection, we can issue queries that access Hive and – in the same SQL – other datasources. For instance, here’s a query that joins homicide data in Hive Elastic Map Reduce with population data stored in an Oracle database (running as Amazon RDS: Relational Database Service). We can do these cross platform joins across a lot of different types of database sources, including any ODBC compliant databases, any Apache HBase or Hive connections, Cassandra, MongoDB, SimpleDB, and Azure table services:
In the version that we are just about to release, queries can be saved as views or snapshots, allowing easier access from external tools or for users who aren’t familiar with SQL. In the example above, I’m saving my query as a view.
Using other hub-enabled clients
TCD isn’t the only product that can issue hub queries. In beta today, the Quest Business Intelligence Studio can attach to the data hub, and allows you to graphically explore your data using drag and drop, click and drilldown paradigms:
It’s great to be living in Australia – one of the lowest homicide rates!
If you’re a hard core data scientist, you can even attach R through to the hub via the RODBC interface. So for instance, in the screen shot below, I’m using R to investigate the correlation between population density and homicide rate. The data comes from Hive (EMR) and Oracle (RDS), is joined in the hub, saved as a snapshot and then fed into R for analysis. Pretty cool for a crusty old stats guy like me (My very first computer program was written in 1979 on SPSS).
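The kind of analysis described here boils down to a correlation coefficient over two joined columns. The sketch below computes a Pearson correlation in plain Python rather than R, purely as an illustration. The function name is hypothetical and the numbers are made up for the example; they are not the UN homicide figures or the population data from the post.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values: population density and homicide rate per country.
density = [3.0, 35.0, 120.0, 340.0]
homicide_rate = [1.2, 5.0, 2.1, 0.5]

print(pearson(density, homicide_rate))
```

A value near +1 or -1 would suggest a strong linear relationship, while a value near 0 would suggest little linear relationship between the two columns.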
Enabling Rapid Cross-Platform Reporting & Analytics
Toad for Data Analysts Webinar – Enabling Rapid Cross-Platform Reporting & Analytics
Date: Thursday, December 15
Time: 10 a.m. PST / 1 p.m. EST
Duration: 1 hour
Register: http://bit.ly/v90vfC
Ad-hoc queries have remained basically the same over the years, but query and analysis tools have changed significantly. In this educational webcast, we’ll explore the latest advancements in that area and the one tool that offers them all. You’ll see how Toad for Data Analysts will dramatically simplify your work with its ability to rapidly connect you to all your data sources. Our product expert will also show you how it accelerates reporting and analysis – and increases accuracy.
Plus, you’ll get in on some cool tricks that deliver significant time savings. We’ll reveal easy ways to:
- Connect to multiple heterogeneous databases, spreadsheets, and BI sources from a single tool
- Develop high-performance heterogeneous SQL using Toad’s powerful SQL editors
- Profile, cleanse, and stage data faster than ever for rapid analysis
- Automate and schedule reporting workflows to eliminate repetitive tasks
Registration date and time information:
Thursday, December 15, 2011 10 a.m. PST / 1 p.m. EST
Don’t have your own copy of Toad for Data Analysts? Here’s your chance to download the leading SQL query tool on the market today, capable of everything from data profiling and data mining to predictive analytics. Get it here for free for 30 days.
SQL Server Community Gives Toad the Gold
Big thanks to our SQL Server community for naming Toad your favorite database development tool and gold medal winner of the 2011 Community Choice Awards by SQL Server Magazine.
Jason Bovberg, senior editor at Windows IT Pro said, “Our Windows IT Pro, SQL Server Magazine and DevProConnections awards programs are unique in that the product-nomination process is open to readers. We don’t just present a predefined list of products and services, thereby limiting scope. Instead, the community both nominates and votes for the best products of the year, ensuring a nice breadth of inclusion in the surveys. The winners of these awards are truly the community favorites. Our winners have earned a unique honor to stand out among their peers as winners of our Community Choice Awards.” The article can be read at http://www.sqlmag.com/content1/topic/2011-sql-server-magazine-editors-community-choice-awards-140830/catpath/awards/showprivate/1/page/4
In addition to the gold, the editors at Windows IT Pro magazine awarded Toad a silver medal in the 2011 Editors’ Best Awards. Jason Bovberg also said, “Our Editors’ Best Awards let us leverage our contributing editors’ expertise to provide well-earned recognition to products that exceed industry standards.”
And if that wasn’t enough, Toad was also named Runner Up for Best SQL Server tool at the Fall Connections 2011 show.
We are thrilled by the overwhelming support for Toad in the SQL Server community coming off the latest release of Toad for SQL Server 5.6. If you haven’t downloaded it yet, then I highly suggest you download the latest version for free for 30 days.
THANK YOU SQL SERVER COMMUNITY!


