Creating a DOCX Out of Thin Air

Suppose you were stranded on a desert island, and you wanted to create a document. A “Hello World!” document, just to let the world know you’re out there.

And let’s say you’re going to send it to a bunch of people back at the office who are all running Office 2007. (You’ve been on this island for quite a while, see, or maybe you work in DPE.) Office 2007 can open anything from a text file to a DOCX, but you’d like to send a DOCX, just to look cool and to show your support for open non-proprietary technology.

But you only have a crude computer with no application software installed. Maybe it’s running Windows XP, or maybe even Win95. Or it could be a Mac, or a Linux box, or even some old DOS machine or an Apple II or something.

Sounds like you need to create a DOCX out of thin air. No problem. You’ll need three things to do it:

1) A folder-based file system. CP/M or TSO won’t do — ideally you want something designed in the last 25 years or so. I used XP.

2) A text editor. I used Notepad.

3) Some way to create a ZIP archive, and it needs to be a tool that handles folders within the archive. I used WinZIP.

Here’s what to do …

First, create two folders. On my XP laptop, I just put them on the desktop. These folders should be named _rels and word.

Next create a text file at the same level as these folders (e.g., also on the desktop), and put this chunk of XML in it.

That’s just some XML that defines two content types: package relationships and a WordProcessingML document. Save the text file, and rename it [Content_Types].xml. You should then have two empty folders and a content-type file, all in the same folder or on your desktop.
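For anyone who’d rather script this step than type it, here is a minimal Python sketch of what the content-types file could contain. It is not the original listing from this post. The namespace and content-type strings are the ones published in the final ECMA-376 standard, which is an assumption on my part: the beta builds this post was written against may have used slightly different values. The function name write_content_types is made up for illustration.

```python
from pathlib import Path

# Assumption: these strings follow the final ECMA-376 standard; beta builds
# of Word 2007 may have expected different values.
CONTENT_TYPES_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
"""


def write_content_types(root: Path) -> Path:
    """Create the two empty folders and [Content_Types].xml under root."""
    (root / "_rels").mkdir(parents=True, exist_ok=True)
    (root / "word").mkdir(parents=True, exist_ok=True)
    target = root / "[Content_Types].xml"
    target.write_text(CONTENT_TYPES_XML, encoding="utf-8")
    return target
```

Calling write_content_types(Path("HelloWorld")) would leave you with the two empty folders and the content-types file side by side, which is the state described above.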

OK, we’ve created three of the five things we’ll need to make this document work. We’re over half done!

The next thing we need to create is the relationships file. Create a text file in the _rels folder, and put this content in it.

That’s some XML that basically says “this officeDocument’s outermost part is the document.xml file in the word folder.” Save this text file, and name it .rels. (Yes, just an extension. Note that in the content types we said, in essence, “anything with a rels extension defines package relationships.”) If you’re running Windows, you can’t rename a file to an extension only from the GUI shell, so you’ll need to go to the command prompt to do this or find some other way. Deal with it.
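One “some other way” is to let a script write the file, since a program can create a file named .rels directly. The sketch below is a hedged reconstruction rather than the original listing: the relationships namespace and the officeDocument relationship type are taken from the final ECMA-376 standard (an assumption, since beta-era values may have differed), and write_package_rels is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: namespace and relationship type as published in the final
# ECMA-376 standard. The Id value is arbitrary but must be unique in the file.
RELS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
"""


def write_package_rels(root: Path) -> Path:
    """Write the package-level relationships file to root/_rels/.rels."""
    rels_dir = root / "_rels"
    rels_dir.mkdir(parents=True, exist_ok=True)
    target = rels_dir / ".rels"  # extension-only name, created directly
    target.write_text(RELS_XML, encoding="utf-8")
    return target
```

The Target is relative to the package root, which is why it reads word/document.xml rather than just document.xml.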

The final step is to create the document.xml file in the word folder. That one should contain this chunk of WordProcessingML.
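Here is a minimal sketch of what that main document part might look like, again written from Python so it can be checked. The markup is an assumption, not the original listing: it uses the WordprocessingML namespace from the final ECMA-376 standard and the smallest structure I know of that holds one paragraph of text (document, body, paragraph, run, text). The name write_document_xml is hypothetical.

```python
from pathlib import Path

# Assumption: the "w" namespace below is the one in the final ECMA-376
# standard; beta builds of Word 2007 may have used a different URI.
DOCUMENT_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello World!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
"""


def write_document_xml(root: Path) -> Path:
    """Write the main document part to root/word/document.xml."""
    word_dir = root / "word"
    word_dir.mkdir(parents=True, exist_ok=True)
    target = word_dir / "document.xml"
    target.write_text(DOCUMENT_XML, encoding="utf-8")
    return target
```

One paragraph (w:p) holds one run (w:r), which holds one piece of text (w:t). Everything else a typical document carries is optional.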

Note that we could also have a _rels folder within the word folder, to define relationships to document.xml. But we didn’t bother, because there are no other parts in this document for document.xml to have a relationship with. When you have no relationships, life’s simple.

That’s it, you’ve created the contents of a DOCX file. Now you just need to package it up. Go back to your top level folder and ZIP the three pieces (the two folders and the content-types file) into a ZIP archive, then rename it something like HelloWorld.docx. Then open it in Word 2007, and it should look something like the image to the right.
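If you don’t have a ZIP utility handy, the packaging step can be sketched with Python’s standard zipfile module. This is just one way to do it, under the assumption that the three pieces already sit in a single top-level folder; zip_package is a made-up name. The detail that matters is that the names stored in the archive are relative to that folder, so [Content_Types].xml ends up at the root of the archive rather than nested inside an extra folder.

```python
import zipfile
from pathlib import Path


def zip_package(root: Path, docx_path: Path) -> Path:
    """Zip every file under root into docx_path, using root-relative names."""
    # Collect the files first so the output archive is never added to itself.
    files = [
        p for p in sorted(root.rglob("*"))
        if p.is_file() and p.resolve() != docx_path.resolve()
    ]
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # Part names use forward slashes and no leading folder.
            archive.write(path, path.relative_to(root).as_posix())
    return docx_path
```

For example, zip_package(Path("HelloWorld"), Path("HelloWorld.docx")) writes the archive straight to its final name, so there is no rename step.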

If you have the Word 2007 beta but don’t want to bother with the steps above, here’s a link to a HelloWorld.docx that was created exactly as outlined above.

This document has many key parts and items missing. It has no document properties and no application properties. It has no support for most common content types. It has no headers, footers, styles, themes, tables, fonts, or other typical document elements. But it’s an Office Open XML document, and Word 2007 will open it without complaint.

Suppose you want to add an image to this document. That gets a little messier. You have to add a content type for the image format (jpeg, say), then you have to define a relationship for the image and insert some WordProcessingML into document.xml to add the relationship to the document. You also have to put the image file somewhere in the document — Word puts them in a media folder, which is a good idea, but you can just throw it in the word folder if you’re feeling lazy, and make the relationship point to it there. If you do all of those steps, you get a document something like this one, which looks like this in Word 2007.
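The package-level half of those steps can be sketched as follows. This is a hedged illustration, not the markup from the linked document: the image relationship type comes from the final ECMA-376 standard, the file name image1.jpeg and the helper add_image_parts are hypothetical, and the drawing markup that has to go into document.xml to reference rId1 is deliberately left out because it is long and easy to get wrong.

```python
import shutil
from pathlib import Path

IMAGE_DEFAULT = '<Default Extension="jpeg" ContentType="image/jpeg"/>'

# Assumption: relationship type as published in the final ECMA-376 standard.
# The Target is relative to word/document.xml, so it resolves to word/media/.
DOCUMENT_RELS_XML = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.jpeg"/>
</Relationships>
"""


def add_image_parts(root: Path, image_file: Path) -> None:
    """Add the content type, the image file, and its relationship to a package folder."""
    # 1. Declare a content type for the jpeg extension.
    ct_path = root / "[Content_Types].xml"
    content_types = ct_path.read_text(encoding="utf-8")
    if 'Extension="jpeg"' not in content_types:
        content_types = content_types.replace(
            "</Types>", "  " + IMAGE_DEFAULT + "\n</Types>"
        )
        ct_path.write_text(content_types, encoding="utf-8")

    # 2. Put the image in word/media, where Word itself keeps images.
    media_dir = root / "word" / "media"
    media_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(image_file, media_dir / "image1.jpeg")

    # 3. Relationships owned by document.xml live in word/_rels/document.xml.rels.
    rels_dir = root / "word" / "_rels"
    rels_dir.mkdir(parents=True, exist_ok=True)
    (rels_dir / "document.xml.rels").write_text(DOCUMENT_RELS_XML, encoding="utf-8")
```

After this runs, document.xml still needs drawing markup that points at rId1 before Word will show the picture.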

So you can create a DOCX that Word 2007 (or any other Open XML consumer) will open, and you can do it without Office 2007, or prior versions of Office, or any other software written in the last decade for that matter. You can create it right out of thin air. This isn’t the easiest way to go about things, of course, and it’s certainly not a best practice, but it demonstrates the openness of the new Office Open XML file formats in a simple and straightforward way.

The obvious variation, to create a DOC file from thin air, will be left as an exercise for the reader. :-)

The World is Flat

As longtime readers know, I used to review the occasional book. Because I used to read a lot of books. A book or two a week, sometimes. Then I got a job at Microsoft. I’ve only read a few books in the last few months, mostly about C# or .NET.

But last week Megan got me a copy of “The World is Flat” by Thomas Friedman. And somehow it made it to the top of my “to read” pile right away, and I’ve actually been reading it.

It’s a cool book. Several people I know have read it already, so I feel a bit late to the party. And after the way I’ve spent the last few weeks in my job at Microsoft, I can see why people would be recommending this book to me.

The opening paragraph starts with a description of a tee shot on the KGA golf course in Bangalore. Here’s a picture I took of that tee shot, during lunch break at an Office 2007 workshop three weeks ago. Friedman describes aiming at Microsoft, and my photo is a shot from Microsoft back to the golf course.

I was there in Bangalore for the Office 12 workshop (where I met Pali, Tarun, Raja, Amol and others), but I also met on Friday with Sonata. Datta and Sanjay are working on some content for OpenXmlDeveloper.org.

I also took a picture from the new Microsoft building, which is nearly ready to be occupied. Looking across the golf course from the open deck outside the cafeteria, you’re looking right down the middle of a runway of the nearby airport. Looks like it will be a cool place to hang out and watch planes taking off, hauling loads of software developers to and fro.

The premise of Friedman’s book has been discussed a lot on many forums (check out the Amazon comments), but here’s my synopsis. Friedman’s premise is that the business world has fundamentally changed in the last 5 years, while most people were distracted by things like the war on terror. Individuals can now collaborate effectively from anywhere around the globe, there is inexpensive high-speed internet connectivity to previously untapped markets (such as India and China), and there is a growing body of people who look at international business in a completely different way from how previous generations viewed such possibilities. That’s the thesis, and the book is full of anecdotal and statistical evidence that supports this thesis.

My own tangential experience of some of the people and places in this book is interesting, at least to me. For example, on page 206 he tells the story of how the state of Indiana outsourced its unemployment operations to Tata. Tata is a huge Indian tech consulting firm, a competitor of Sonata. When Governor Joe Kernan discovered what a political brouhaha had been hatched by outsourcing the unemployment department to an Indian firm, he squelched the deal, but not before Tata had made roughly a million dollars on it.

So check out this picture, which has been on my web site since 1999. Mom and I were traveling in India, and we had bumped into a couple of ladies in Nepal the week before whose itinerary overlapped with ours for over a week through Kathmandu, Royal Chitwan National Park, and Varanasi. One of those ladies, Maggie Kernan (on the right in the foreground, black shirt with flowers on it), is the wife of Joe Kernan. She wasn’t the First Lady of Indiana when we met her. Joe was Lieutenant Governor then, but Governor Frank O’Bannon (who had signed off on the Tata deal) had sudden health problems that catapulted Joe into the governor’s office.

Today I was on a conference call with Microsoft’s XML MVPs, to tell them about the site that I’ve been working on with Sonata. Shortly after the call, I got an email from Jeff Julian, an MVP on that call. I followed the link to one of Jeff’s blogs, EntrepreneursWithBlogs.com, and there I saw that Jeff had a very brief review of “The World is Flat” in which he recommends reading the first 100 pages and then skipping ahead to the last 3 chapters.

I’ve made it to page 207, but I’m going to take Jeff’s advice from here and skip to the last three chapters. I’m too busy to be reading this book. I mean, I just got off a conference call with Sonata!

Anybody else who’s read this book want to comment on it?

More Tips from Savraj

I mentioned in an earlier post that Savraj Dhanjal has started doing guest posts on Jensen Harris’s blog about programming the Office 2007 clients. Client extensibility is one of the areas where Office has really been beefed up in this version, and there are lots of great options for developers to customize the user interface.

This week, Savraj explains the callback concepts for custom add-ins that dynamically alter the ribbon. If you find callbacks confusing, I think you’ll find Savraj’s Hollywood screenplay-style explanation very clear.

The Office Open XML Spec

I’ve been meaning to dig into the details of the Office Open XML specs for a while, and now that I’m moderating the OpenXmlDeveloper.org site I’ve started doing exactly that. It’s a 2000-page document, and it’s not in “published” form yet, so it’s a bit challenging to dig through all the details and figure out what really matters in real-world document assembly and application integration scenarios.

I figured as long as I’m going through this process, I might as well take some friends and colleagues along for the ride. So if you’re new to Office Open XML and want to learn more about it, check out my “Guided Tour of the Spec, Part 1: Packaging” post. Next I’m planning to cover the three specific markup languages WordProcessingML, SpreadsheetML, and PresentationML, then I’ll start posting some C# code samples that use the System.IO.Packaging API to do common tasks with the new file formats.

Mom, you may want to sit this one out, since your computer doesn’t have the latest build of Visual Studio installed. I’ll post some fun pictures this weekend for you instead.

Email Scams

There are so many email scams these days. And there are a few that just seem to never get old, recurring themes in the scam-o-sphere: free porn, “male enhancement” drugs, investment advice, and my favorite, the old “please tell us your username and password” trick. I believe the official term is phishing, although I’m pretty old-fashioned and conservative so I won’t be using that word until it shows up in my Webster’s dictionary. And I only buy a new one every 20 years or so. :-)

Anyway, we put the OpenXmlDeveloper.org site up last week, and within a few days I got this email message. It was sent to the administrator address for the site, and it’s a classic “send us your username and password” message. (OK, phishing will start looking pretty good after I type that phrase a few more times!)

Look closely at the email. The link in the middle of the message doesn’t really go to the address it shows: instead, it goes to the IP address you can see in the tooltip that popped up when I hovered the mouse over the link. And I looked at this message pretty closely, just for fun: every link on it points to that same page.
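The hover-and-compare check can be automated. Here is a rough Python sketch that flags any link whose visible text looks like a URL but whose real destination is a different host. It only uses the standard library, and the class and function names (LinkCollector, suspicious_links) are made up for illustration. A real mail filter would need to handle far more disguises than this.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_links(html):
    """Return (shown text, real href) pairs where the two hosts differ."""
    collector = LinkCollector()
    collector.feed(html)
    flagged = []
    for href, text in collector.links:
        if "://" not in text:
            continue  # the visible text is not pretending to be a URL
        shown_host = urlparse(text).hostname
        real_host = urlparse(href).hostname
        if shown_host and real_host and shown_host != real_host:
            flagged.append((text, href))
    return flagged
```

Feed it the HTML body of a message and anything it returns deserves the same skeptical hover you’d give it by hand.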

What does that page do? It asks you for your Ebay username and password, of course. In a very official-looking way that presumably looks familiar to Ebay sellers everywhere. Then they store that info, and an actual human can log in to your account later and sell your junk for less than it’s worth or send nasty emails to your friends under your name or drain your bank account, or whatever.

I say “presumably” looks familiar to Ebay sellers, because I’ve never sold anything on Ebay. They have some kind of problem with my debit card, and I have some kind of problem with companies that don’t have a human being I can call and talk to, so the Doug-Ebay business relationship got off to a rocky start and has never recovered. Too bad, really, because I tried to play along and give the scammers an Ebay login, but I had none to share.

Office 2007 Video Tour

Now that I’m using the Tech Refresh build of Office 2007 (B1TR, as we call it here in acronym-happy Redmond), I’ve been meaning to put together a series of screen shots to show everyone what it looks like. The visual details of the TR build are quite a bit more polished than the Beta 1 build I had been using since November.

But hey, as my friend Len once said about my tendency to play music whenever I get a chance, “some things are best left to professionals.” So the marketing and publicity pros around here have released a video today that saved me a bunch of time. Click here to see a slick video of the Office 2007 user interface in action.

With all the time this video has saved me, I’m going to head over to the Pro Club and work out. Got to get ready for those wedding photos!

Do No Evil, Google!

Twice in the last week, I’ve come across great posts on Dare Obasanjo’s blog. One of them made fun of Microsoft, and one made fun of Google — nicely balanced coverage of tech issues, I’d say.

First he pointed out why there’s a bit of confusion in the marketplace about our Live strategy, with this gem: One of These is Not Like the Others.

Then today, this Open Letter to the Google Blogger Team is priceless. What has the world come to, when a Microsoft guy is pointing out Google’s bugs and lack of support for open standards?

If you don’t read Dare’s blog, you’re missing out on some good stuff.

OCR and Translation

Handwriting recognition has improved to the point that it’s usable for most people. And that’s sort of miraculous for those who write longhand more than they type on a keyboard. (I don’t know anyone like that, actually, but I hear such folks still exist.)

English only involves 26 letters, 10 digits, and a few punctuation marks. Kid stuff. Consider the complexity of Kanji, with its 1800+ symbols.

My friend Tad has considered the complexity of Kanji (and English!) a time or two. He has worked as a Japanese-English translator for many years, and he now works for a major software firm in Microsoft’s back yard (hey Tad, should I say where you work?). The work he does: translating Japanese-language SDKs into English. (An SDK, or software development kit, is a set of documentation and tools related to a specific API or a general approach). In Tad’s work, it’s helpful to know both the English and Japanese ways of saying something like “calculate the hash for the string and compare it with a property of the object passed by reference.” He even translates the comments in the source code, so the job involves reading a lot of source code, too.

Tad showed me some simple handwriting recognition software for the Japanese (kanji) character set. He drew a character with the mouse (at a pace that looked like he does this often) and the system recognized it and showed various alternate ways of drawing it.

Hey Tad, what’s this character? Did I just say something obscene or anything like that?

Saturday Night at Home

It’s been a long time since I’ve even been home on a Saturday evening. I can’t remember the last Saturday I was at home. Let’s see …

Last week, I was working on OpenXmlDeveloper.org, the final push before going live Tuesday morning. Before the sun went down, I stepped outside and shot a picture of my new cell phone behind Building 16.

A week before that, I finished a long day of sightseeing in Mysore with flights from Bangalore to Mumbai, Amsterdam, and Seattle.

A week before that, I was on the way to Bangalore, after picking up my passport and Indian visa at the Sea-Tac FedEx office. My itinerary had three flights, ending with a coach-class trip from Mumbai to Bangalore at 3:00AM.

A week before that … well, OK, it wasn’t really work related, but it was exhausting and thousands of miles from home: the all-night Carnival parade through the Sambodromo in Sao Paulo. Somebody had to do it!

A week before that, I was on the way to Sao Paulo.

And a week before that, six weeks ago, was the last time I had a Saturday evening at home.

Tonight, Megan cooked veggie-stuffed peppers, delicious and healthy. Fish patrolled the table until the food arrived. It’s nice to be home.

Tools for working with Office Open XML files

Kevin Boske blogged Friday about some tools the Office team plans to release for developers who are working with the Office Open XML file formats. See his post for the details.

Kevin’s team has some great ideas, and they’ve been talking to many of the developers who are already working with the new formats to fine-tune their plans. If you’ve started digging into the formats and have opinions about what types of dev tools might be most useful, post your thoughts in the conversations on Kevin’s blog and OpenXmlDeveloper.org.