Blog

What is a Data Scientist?

November 28, 2011 in News

“What is a Data Scientist?  By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meets Columbo – starry eyed explorers and skeptical detectives.” – Monica Rogati, Senior Data Scientist at LinkedIn

Source: http://www.forbes.com/sites/danwoods/2011/11/27/linkedins-monica-rogati-on-what-is-a-data-scientist/

Give & Take

October 23, 2011 in News

Well, we lost Steve, and Apple are now Jobless!

I thought about that, and decided not to comment. There’s enough out there that it really doesn’t help to add my few cents’ worth.

What can I say?  I’m a wimp?  Besides, my comments might not be seen as complimentary.

Then, a week later, for some reason I cannot fathom, my Swiss-cheese mind delivered a quote, right out of nowhere. It was something like this:

“The New Frontier is not a promise — it’s a challenge.”

I’m an immigrant, green-card, then citizen, then kids, now college bills.
So I do think I understand the quote, both edges of the sword.
No complaints.

I thought maybe it was something he said, once upon a time, centuries ago, when size was measured in kilobytes?

I couldn’t place it, and that bothers me a lot. Attribution is important to me; it’s the only thing that gives you a sense of place in the universe.

It dogged me all week. Eventually I gave in and asked that oracle-of-all-things called Google.

And I got back, top of the list, first time out?

The New Frontier of which I speak is not a set of promises — it is a set of challenges.

It sums up not what I intend to offer the American people, but what I intend to ask of them.

WHO?  It was JFK, in his 1960 acceptance speech.

I was 8 years old!

No TV in working-class England then, I got it from the radio.

Then I suddenly came to see that I admired both of them for similar reasons: drive, ambition, commitment, quality.

BUT, tempered with just a dash of humility and compassion. (just a dash)

Like the really old Moody Blues riff goes,
“Give just a little bit more, take a little bit less from each other tonight”

Somewhere, since 1960, we all lost the “give” part.

Steve never did.

He demanded, cajoled, persuaded, never apologized and probably even bribed!
But he always gave back far more than he got, always, it was in his soul.

We all lost another monumental personality.

Just don’t lose their lessons!

Big Data: The Next Frontier for Innovation, Competition and Productivity

October 11, 2011 in News

The McKinsey Global Institute’s report, “Big data: the next frontier for innovation, competition and productivity,” looks at the vast amount of enterprise information that exists, and the challenges that organizations will face in trying to manage it.

Source: http://www.intelligentutility.com/magazine/article/237281/big-data-threat-or-opportunity

Link to the McKinsey Report: http://www.mckinsey.com/mgi/publications/big_data

Where’s the Data-Quality Monkey?

October 10, 2011 in Inspirations

We come across a lot of routine data-quality tasks in our line of work. Migrations, test data sets, mergers and acquisitions: it’s really quite an old subject by now. I thought we pretty much had it nailed: boring, tedious, and profitable!

But there’s always a surprise waiting for you just around the corner.

We’ve stumbled into an entirely new way of looking at much of this stuff over the last year or so. Here’s how it goes: you get dragged into a project that’s publishing monster volumes of data that until recently was locked away deep in the bowels of a proprietary system (or even several of them).

But government mandates, or a drive for improving productivity, or a dozen other reasons, have driven the various data owners together to publish all this stuff so that it’s more accessible.

This invariably means some kind of internet portal, perhaps some BI tools, a fancy security system, maybe even public access. Then we get to the data, and it starts to get hairy.

As we pull stuff out we find all sorts of inconsistencies, undocumented gaps, lapses, ridiculous paradoxes and so on. Of course everyone blames us, since they see us as the new owners of the new “system”.

The DQ-monkey is firmly on our back. Probably for life!

So what to do? After all, our mandate is to publish the stuff, and we want to do it in a way that seems straightforward to both the data owners and the folk that will be using it. Until recently we would end up with a set of rather elaborate procedures to get the data, ship it from one place to another, run piles of validation, then reload into the “new system”. This is classic ETL (Extract-Transform-Load), and there are even a few good tools out there to help.

But no matter how you sweat it, it’s usually slow. Your published portal lags by days, sometimes a week (or a quarter!), behind the real data. And every time there’s an inconsistency we get dragged over the coals.

So last time round we took a different tack. We didn’t really plan it, though. We had so many different systems, with different owners, schedules, availability, etc., that we just could not get it together. No one could agree on formats, and some folk had their data locked up in Fort Knox. We had to find another way so that everyone could give us the data “their way”.

So we went out to each provider (we called them “data sources”) and convinced them to let us build a little web-service right in their environment. We got them up and running, one at a time, with no need for co-ordination at all. Everyone got fairly enthusiastic as soon as they realized they would no longer be responsible for providing those weekly or monthly feeds. Just turn it on and forget it!

Then we built our portal, with a query processor that could figure out where to go and get what it needed. Since we had written all the web-services ourselves, we had a pretty consistent common API, all done with SOAP web-service calls. It was sluggish, but it worked rather well, and when one source went down the others all kept chugging away just fine.
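For the curious, here is a rough sketch of the idea in Python. It is not our actual code: the real thing used SOAP calls, and every name here (DataSource, federated_query, the region sources, the dataset names) is made up for illustration. Each source says which datasets it holds, the query processor only visits the sources that hold what it needs, and a source that is down gets noted rather than sinking the whole query.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

Record = Dict[str, Any]


@dataclass
class DataSource:
    """One provider. `fetch` is a stand-in for a call to its web-service."""
    name: str
    datasets: Set[str]                      # what this source can answer for
    fetch: Callable[[str], List[Record]]    # hypothetical service call


@dataclass
class QueryResult:
    records: List[Record] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # source name -> error


def federated_query(sources: List[DataSource], dataset: str) -> QueryResult:
    """Ask every source that holds `dataset`; isolate failures per source."""
    result = QueryResult()
    for source in sources:
        if dataset not in source.datasets:
            continue  # routing: skip sources that do not hold this dataset
        try:
            rows = source.fetch(dataset)
        except Exception as exc:
            # One source being down must not stop the others.
            result.failed[source.name] = str(exc)
            continue
        # Tag each row with where it came from.
        result.records.extend({**row, "source": source.name} for row in rows)
    return result


def _region_b_down(dataset: str) -> List[Record]:
    """Simulates a provider whose service is unavailable."""
    raise ConnectionError("service unavailable")


# Made-up sources for the example.
SOURCES = [
    DataSource("region_a", {"landings"}, lambda d: [{"dataset": d, "value": 10}]),
    DataSource("region_b", {"landings"}, _region_b_down),
    DataSource("region_c", {"permits"}, lambda d: [{"dataset": d, "value": 7}]),
]

result = federated_query(SOURCES, "landings")
print(result.records)
print(result.failed)
```

Tagging every row with its source is the bit that matters later: when something looks wrong, you already know whose data it is.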

Then a strange thing happened.

We ran into some data errors, but they weren’t ours anymore! Since the data was “live” from the horse’s mouth, the provider was the one everyone asked for an explanation.

The DQ-monkey has been chopped into little pieces and smeared all over the globe!

We now call this the “DQ Spotlight”. As soon as something pops up we shine the spotlight on it and, hey presto, the owner fixes it, or at least explains it well enough that we can document it!

So next time you’re called on to pull a collection together, think about how you can delegate! Take out your Spotlight and blind that Monkey!
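A minimal sketch of what a spotlight like this might look like in Python, assuming each record carries the name of the source it came from. The validation rules, field names and sample rows are invented for the example; real checks would depend on the data.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def check_record(row: Record) -> Optional[str]:
    """Return a description of the problem, or None if the row looks fine.

    These two rules are hypothetical placeholders.
    """
    if row.get("value") is None:
        return "missing value"
    if row["value"] < 0:
        return "negative value"
    return None


def spotlight(records: List[Record]) -> Dict[str, List[str]]:
    """Group data-quality issues by the source that owns the record."""
    issues: Dict[str, List[str]] = defaultdict(list)
    for row in records:
        problem = check_record(row)
        if problem is not None:
            issues[row["source"]].append(f"{row['id']}: {problem}")
    return dict(issues)


# Made-up rows, each tagged with its owning source.
sample = [
    {"id": "a-1", "source": "region_a", "value": 12},
    {"id": "a-2", "source": "region_a", "value": -3},
    {"id": "b-1", "source": "region_b", "value": None},
]

report = spotlight(sample)
for owner, problems in report.items():
    print(owner, problems)
```

The output is one list of problems per owner, which is exactly the list you hand to that owner to fix or explain.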

National Bycatch Report Released

September 28, 2011 in News

IA Team is proud to support the entire team at the NMFS Office of Science & Technology.  IA Team built the database components and the reporting interface for the National Bycatch Report.  Our congratulations to IA Team’s Eddie and Paul for a job well done!

National Bycatch Report Released

The National Marine Fisheries Service has released the National Bycatch Report, the agency’s first comprehensive effort to collect regional data for a nationwide report on bycatch in federal U.S. commercial fisheries. The report provides a baseline for measuring bycatch and establishes a consistent, comprehensive methodology for future studies. Now available on the NOAA Fisheries website, the report provides bycatch estimates for 481 species, including marine mammals, sea turtles, seabirds, and fish. It also presents bycatch estimates for 81 fisheries. Moving forward, the agency will continue to collect data on bycatch in preparation for the next edition of the national bycatch report scheduled for 2013. Future editions of the report will be used to monitor changes and trends in bycatch, including those resulting from new regulations, bycatch reduction devices, changes in fishing patterns, and population fluctuations in both targeted and bycatch species. Samantha Brooke, now of the Northwest Regional Office, is the program manager and has given strong attention to detail and unwavering dedication to this project since 2005.

Cloudy Weather

September 25, 2011 in Inspirations

Some thoughts on the Silver Lining in that cloud. We’ve been using the Amazon cloud for development for a couple of years now, and on the whole are very pleased. We even have a small production app up there.

But the last week has been a wakeup call for where its real advantages lie.

Imagine this. You’ve been working on a production rollout of a well-developed app. It’s all there, well tested; all we want is to go public, both on a private internal net and on the Internet.

So we order the hardware, and wait. Get the machine room ready, and order extra AC, and wait. Get the power in there (wait for the mega UPS). The sysadmins get it up, install the OS, apply the patches etc., another week passes.  Now the DB guys come in, put in the DB, while at the same time the network guys get the app server up and running, and start haggling about how it will work with the firewall(s).

And we haven’t even loaded the app yet!

All told it’s weeks, actually nearly 2 months, from the decision to go to the first successful test from outside that really runs the app! And more than a dozen people are involved, hundreds of man-hours!

Then it finally dawns on everyone that there is no DR in place at all. Sure, we have good backups, and practice restores. But if the site is lost then we are out of commission until we build a new one! So now we start the whole process over to plan a DR and bring it online.

So here I am, with all this collected wisdom about what it takes, when I get a request for a totally different, unrelated project. It’s late Wednesday afternoon on the 14th. “Paul, the OL app has to go live on the first! Make it happen! Oh, and we have to have a DR in California! We expect 1,000 online users by Christmas.”

Big gulp. “Only way to do it is a set of cloud servers.”

So we try... Thursday we fire up the DB; the security guy already has a cloud-database security group figured out. We bring up a standby in CA. We decide we don’t like that version of Linux, so we do it all again with something that’s a better fit for what we want. We load 20 GB of data in the afternoon. Friday we test failover, switchover and switchback, and get it down to 5 minutes after a couple of runs!

Monday we bring up the app servers. Big issue: the CA app server can’t talk to the East Coast DB! Frantic scrabble for a couple of hours, and it works. Tuesday morning, I’m all done! I spend the rest of Tuesday writing up a nice set of 3 install docs, and a failover plan.

Then we realize we are weeks early! So we simply stop them all. Amazon won’t bill us if it’s not up and running.

Sometime on Wednesday it suddenly strikes me.

We found the silver lining in this cloud!

We replaced 2 months’ part-time work by a team of ten or twelve with about 4 man-days, plus a half day from the security guy, with no capital budget at all.

And we are done! Industrial-strength DR, failover in 5 minutes, growth capacity available at a day’s notice!

Now if we can get a rainbow maybe I can find the crock of gold?

Something is stirring in the Forest!

September 8, 2011 in News

Have you ever woken up in the morning with that vague nagging feeling that you’ve seen it before? But you’re not sure what! Being an old-man-of-the-hills in the IT business I’ve watched a number of “sea changes” in the technology we use.

Mainframes got chased by Minicomputers, then PCs came along, then fast networks and client-server, then the Internet, now super-phones!

Each time we all watched as an interesting but immature technology gradually took hold and changed the way we did things.

Well, we all know that “digital convergence” is finally here; we’ve been talking about it for a decade. Phones, games, TV, movies are all coming together. But something strange is also happening at the same time.

The best stuff (the neat, cool, awesome bits) is happening on my kids’ toys! And they are getting cheaper every week!

I have this nagging feeling that we are all missing the point of what is now possible for an amazingly small amount of $$. Gadgets such as tablets and e-readers are now bristling with wireless, GPS, camera and video facilities that would have required a production studio crew only a decade ago! They run for ages, on batteries, stuffed in the backpacks of kids on the school bus.

So where will this path take us? That’s where it’s foggy, folks: there’s a 900-pound gorilla starting to move around in the forest. Myself, I think that when he finally pokes his head out and says “boo,” we’ll suddenly see he’s the cookie-monster! Someone we never expected!

Welcome!

September 1, 2011 in News

Welcome to the new IA Team website. Please take a look around and get to know us, and learn about what Mojo is and how we infuse it into our projects.

Also, be sure to check out photos of our team enjoying the beauty of this great Earth, and our new, mysterious flotsam section.
