All right, so we’re gonna start with the bad news: chances are your data sucks. I found this out the hard way; it’s how I wound up at Thinknum. Today we’ve got people who know the difference between data that is great and data that sucks. It is our editor-in-chief, Josh Fruhlinger, joined by Adweek’s platforms reporter Shoshana Wodinsky, who covers the financial and societal impacts of major social networks. We also have Ben Gilbert over on this end, the Business Insider senior technology correspondent; Thornton McEnery, the executive editor of Dealbreaker; and right in the middle, Ross Fadely, who’s the Chief of Data Science at the Wall Street Journal. Josh?

Thanks, Jon. Hey everyone,
welcome back. So as Jon mentioned, we’re going to be talking today to a roomful of people who work in the media world, just like I do myself, and we’re talking about the ways in which we use data to tell stories, whether it’s breaking news or trends in business, like we do at Thinknum Media. So I wanted to talk a little bit about these terms we’re hearing lately. “Fake news.” We often hear, “What’s your source?” We often hear that the ways surveys are taken are biased and therefore the data is no good. These are all terms that sound ridiculous to us on their own, but when we put political baggage on them, they make people think that it’s just crappy data, and that’s not always the case. That’s why I wanted to talk to this crew of people. In my daily coverage of everything from business to technology to retail to culture at Thinknum Media, what I do is always include my data source, partially to show that I’ve done my homework, but more than anything to invite the reader to make his or her own interpretation of what’s happening, so they can actually look at those numbers and at the source of those data. That’s one of the things that’s amazing about external, objective alternative data: anyone can go and look at those numbers if they don’t think you’re right. They may have a different interpretation of, say, Apple hiring more people than they did before, but the reality is that the data is objective. It’s very real. But I’m not the only one. This is a group of people who all use data in various ways within their coverage, so I wanted to talk to
all of them. Shoshana, I wanted to start with you, because you’re to my left and you’re stealing my water. As Jon mentioned, she’s Adweek’s platforms reporter, where she covers the financial and societal impacts of major social networks. You’re also the person who discovered in Facebook’s code, and maybe you can tell us about this, that they were tracking app uninstalls, to see which apps people were downloading but, more importantly, which ones they were deleting. So tell us about what you do and how you use data to do some storytelling.

Sure. So I use kind of a weird subset of data.
I call myself a coder, not a developer, because I am terrible at coding, but I like to tell stories with it, if that makes sense. When you’re a tech reporter covering companies like Facebook or Twitter or Instagram, they’ll often make things like APIs or software development kits, SDKs, public, because they want people to download them so they can siphon data from different apps and different services for themselves. I cover a lot of Facebook because pretty much all of their stuff is public, it’s out there, and they’re always screwing up somehow. So the story that you mentioned, which I sent over either yesterday or the day before: if you go through their API, you can typically find, if not what the company is planning to do or what they’re already doing, then how they’re using data and what their thought process is, the same way you can glean a thought process from a company’s hires or the moves they’re making in that space, if that makes sense. What I found out was that Facebook was regularly tracking the apps that people were installing, and I also found out that for a while they were piloting a service that could track which ones were being deleted as well. When I reported it out, they were like, “Nope, this program doesn’t exist anymore,” and they shut it down. But to that point, I
learned pretty early on that when you’re covering a company like Facebook or Twitter or Instagram or Snapchat or anything like that, you need to double-check, triple-check, and quadruple-check your work, and you always need to save everything in some sort of hard file. In this particular case, Facebook instantly took down the developer documents I was covering, and I had them archived. I went into the Wayback Machine and I’m like, “You are not taking these down, because I need people to read this and make sure I’m right.” And I was right. But that just goes to show that it’s so important to have people check your work, because ultimately, once these documents are taken down, these companies can really say anything, and you are only as good as your word, if that makes sense.
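
The archiving habit described here can also be checked programmatically. What follows is a minimal, hypothetical Python sketch, not the reporter’s actual workflow: it asks the Internet Archive’s public Wayback availability endpoint for the closest saved snapshot of a page. The helper names (`availability_url`, `closest_snapshot`, `find_snapshot`) and the example URL are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (returns JSON).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived URL from a parsed availability response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Network call: look up the closest archived copy of page_url."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))


# Hypothetical usage (requires network access):
# print(find_snapshot("example.com/docs", timestamp="20190101"))
```

A `None` result only means no snapshot was found; it is a prompt to save a copy yourself before the page changes.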

And so you actually gave me a nice segue to talk to Ross, because when you speak of checking or double-checking your work, even if it’s external data, even if it’s numbers and ones and zeros that anyone can see, including, you know, a corp comms person at Facebook or something like that, who’s gonna come to you and say, “Where’d you get this information from?” And you say, “It came from you.” And then they say, “Well, we took it down.” And then you have to be able to say, “Well, it was objectively out there.”

It’s very funny, because I have had calls with people saying, “We’re not running this program,” and then I’m like, “Well, how do you explain this?” And typically it’s not a case of anything malicious, at least in my experience. It’s usually just the left hand not talking to the right hand, and these companies are so huge that they just don’t know. In any of the transactions that they’re making, they’re leaving trails of data that exist, especially on the internet.

So I also wanted to introduce Ross Fadely.
He’s Chief of Data Science at the Wall Street Journal, and they obviously are always checking their work. Ross, what does that mean, Chief of Data Science? What are you collecting, and how are you interacting with people like ourselves who are looking for numbers and data and double-checking their work?

Yeah, so I would say there are two rough buckets of work that my teams and I do. The first is more internal: us understanding how audiences are engaging with our content and how we make better decisions around that. Pretty much the only external pieces for that type of work come from what’s happening in the news cycle, because we often want to make better content release strategies. The other bucket of work is more aligned with this, which is that we do a variety of both deep and relatively quick data analyses that go into content and into stories. So, for
instance, a really recent one: our team built a tool they called Talk 2020, where they scraped all the transcripts and public speeches that candidates have given going back to 1994 and then built a whole bunch of machine learning models on top of that, so that reporters and journalists can dive in, surface the topics the candidates are talking about, and get a more cohesive picture of how the conversation has transformed over time, both historically and leading up to this current cycle. So there are project-driven ones, where it helps to surface information to our journalists, but also just regular investigations. For instance, a team was recently looking into customs data, and often they have to work with a third party to get that data, and in that transaction they can find out that the data is often heavily redacted. So then how do you model that when you have some censoring of the data, and do it in a way that you feel confident you’ve delivered the actual journalism? So those are some of the rough flavors of things we’ve worked on.

How do you normalize that data and still keep, I guess, the veracity of, say, the vector of change within that data, and make it still meaningful without necessarily affecting where you’re trying to get with that story, I guess you could say?

Exactly.

So, moving further down (God, you guys are so far away), this is Thornton McEnery. He’s the executive editor of Dealbreaker. You may have heard of them: daily coverage of finance, usually with a lot more, with a higher metabolism. Can you also talk to us about that? I mean, you guys are hitting these financial
stories hard on a daily basis.

Yeah. I mean, our approach to data is, our audience is almost exclusively people on the Street who sort of want to have their personas reflected back at them in a less flattering light. It’s a very masochistic news website. But our take, especially in the last couple of years, has been: we don’t do data stories per se, but we reflect data stories back and say, “Well, the data is saying this, so why are you all doing that?” That has really been our bread and butter for the last year, when you’re seeing banks and funds coming out with these large data dumps saying, “Here’s how we see things going,” and then one Bloomberg headline gets tweeted out and everything goes the other way. That’s sort of been a recurring joke that’s lost its humor over the whole period; it becomes more terrifying. But I think with Wall Street especially, our take on data is that you’re dealing with an audience that is probably the most aware in the world that data can say what you want it to say if you’re clever enough to play it that way. So we try to go down the middle a little more and make more of a statement, via dark humor, about where data ends and maybe truth and logic begin. And you can plumb those depths a lot lately. WeWork, I think, has sort of become the logical thesis; it’s sort of the end of our story at this point. Whatever data led JPMorgan to come out and say months ago that this thing could be worth a hundred billion dollars, and then later say, “Well, this thing’s worth eight,” that’s where we’re like, “So where were the data guys here?” And that’s how we approach it.

And you have an interesting, I guess, space in here, because we’ve worked together on a few stories.

Yeah.

And we ran a story together a couple of weeks ago about JUUL e-cigarettes, the company that was literally on every single corner, that was worth billions of dollars, and they were selling millions of little cartridges. They came under heavy regulatory attack and political attack and that kind of thing, probably rightly so. But we saw their hiring data absolutely tank, and ran this story on Dealbreaker, with an engaged investment community, and something interesting happened: you had people who are reading stories about finance and looking for data about investing, perhaps, but they were kind of swimming upstream against the data, and they were furious.

Yeah, I mean, names are called. It gets wild. When it comes to data and our audience, again, it becomes this sort of strange inflection point where if it’s disproving your thesis, it becomes useless. And we find that the financial space is riven with this phenomenon of, “Well, no, I don’t like that data, so let’s discard it.”
And something you’ve done with us, something we’ve written about, and you guys came to us with a really good data set, was: if everyone on Wall Street is so hunky-dory about this never-ending bull market, why are all the investment banks firing everybody? Why is Goldman Sachs deciding that trading is no longer a business? And you can see that in the people they’re getting rid of. You can literally see, you guys had it, they’re getting lighter. So our argument, or question, to our audience was: if you guys like trading so much, why is no one being paid to do it? And those stories are met with a lot of yin and yang, and that becomes sort of a self-reflection, yelling-in-the-mirror thing, which is fun to write about. But again, [inaudible].

Well, we’ll talk about how we defend the data,
and this is why we called this session “Why Your Data Sucks,” because in many cases, when we’re reporting on objective data, people will come to us and say, “Your data sucks,” and we’ll say, “Well, here it is. It’s real.” So whether it sucks or not, you think it sucks because you don’t like it, which is your point.

And then way down there, about a quarter of a mile away, is Ben Gilbert. He joins us from Business Insider, where he leads tech news; as a senior correspondent he covers video games, technology, food, and culture. Previously at Philly Weekly, Joystiq, and Engadget; also one of the founding members of Tech Insider. So welcome, Ben.

RIP Tech Insider. Throw that out there.

It was, but I feel like it’s still there in spirit.

Yeah, via brand.

Yeah. You’re there, so it lives through you. Well, tell us about what you’re covering on a daily basis, and we’ll talk about that story we did together.

I’ll echo some of what my
colleagues here have said: a lot of Facebook, a lot of WeWork. My focus is more general tech news, so a lot of it is figuring out how to turn somewhat arcane stories into something readable for a general consumer. Sometimes it’s investors, sometimes it’s Wall Street, but a lot of times it’s trying to hit that SEO crowd, people coming in through Google, coming in through Facebook: trying to make something boring and arcane into something interesting and broad. So my use of numbers across the last year has been a little bit more traditional. Maybe it’s just peeling stories off of the WeWork S-1 that kind of showed the writing on the wall. This morning, first thing, I was going through a Google executive’s stock sales to determine how much money he had gotten in granted stock units in the last year, and figuring out how to turn that extremely boring financial filing into a story about this sort of controversial figure within Alphabet, Google, getting a big payday. So it’s stuff like that. It’s a little bit less pulling APIs or trying to read the market, and more about trying to present something kind of complex and boring to a large audience.

Well, you tapped into something just a second ago when you talked about taking data that is, on its surface, boring. Probably not for most of the people in this room, but for a lot of people, staring at a spreadsheet full of data or a database full of data is, on its surface, boring. A lot of us are handed a bunch of data and have to make sense out of it, but at the end of the day, what we have to do is turn that into a story. We have to do some storytelling, and it has to be truthful. So it’s a good segue into what I want to get to, which is: when you’re reporting, what makes data interesting? What’s that sort of moment? I know it when I feel it, but it’s really hard to describe, when you know that you’re looking at a bit of data, a change in a market or something in a header file or something like that. What is it about data when you know that there’s a story in this, when you know that there’s something that people didn’t already know? And this is for anyone.

I think the WeWork S-1 is this great example of something that’s
written so boring, so bland. It was like a nightmare to read, and it’s something that’s assigned as, like, “The building is on fire, everybody figure out stories from this right now.” And you’re reading through this soup. It’s intentionally written to be not very interesting and to not stand out, but it’s talking about how the company has loaned the CEO however many millions of dollars so he can buy houses without having to pay a bunch of interest on them, or whatever else. It stands out when you actually stop and look at a relatively boring sentence about that loan and think, “Oh, well, a company lent their CEO many millions of dollars so he could buy something that he really didn’t need, and he’s already a billionaire,” and blah blah. And then you start thinking about that story, right? The story is that the CEO bought this thing he didn’t need. It speaks to needless wealth, it speaks to all these other things, and it kind of stands out. But on its face, the sentence is banal.

I would quibble with the boringness of
the WeWork S-1. I think it might be one of the greatest pieces of unintentional financial comedy ever written. If we’re talking about data, we’re talking about what’s important. For two to three years, maybe since the Etsy IPO, I think we’ve had this amazing line of IPOs where a company is going public and telling investors, “We don’t make a profit. In fact, we lose an exorbitant amount of money, and we can’t tell you when that’s gonna stop. However, we’d love your public market money.” Etsy did it, Blue Apron did it, Lyft and Uber, and this kept going on and on. Their S-1s were boring, and Etsy’s was the most interesting, because, I mean, not to be glib, but it was a lot of, “We’re gonna give beads to children for free, and you’re gonna love it.” And the public markets were like, “That’s adorable, but can you sell the beads? Because we like profits. We’re the public markets.” So these things were boring documents on purpose, to sort of hide this, not to push forward the idea that “we lose money,” just sort of let it hang there as a premise. The WeWork S-1 was stunning. I think it’s a beautiful document
because it is not only not vague about it, it is bragging about how much money they lose: “This might never end, but don’t worry, because we’re amazing. We are such a transformative, beautiful company that we’re gonna lose maybe more billions down the line, and we’re just gonna keep asking for more, and it’s gonna be tremendous.” And I think that was sort of when Wall Street was like, pardon my French, “Jesus Christ, what have we done?” I think most parents have this experience: they look down like, “Oh my God, I’ve allowed this to become a reality. I have to stop this.” That was the WeWork S-1. It was, “Oh no.” And I think that’s when the whole thing sort of fell apart. And I think that was a data story, because finally we were seeing it. Other people who watched WeWork were like, “When they go public, this is gonna be wild,” and then they put it in the S-1. It was just their data, right there: there’s no money here. And it was fascinating.

Well, I bet I
knew we had a story that day. So we all deal with this on a daily basis as reporters, where we see something in the data that’s true. It’s in the S-1, or it’s in the type of hiring that they’re doing, or, Ben, where they’ve put their stores; we did that story together about GameStop. It’s out there, it’s true, but the markets either don’t care or they ignore it, or, to your point, there’s so much momentum that it just keeps moving despite all of that. I guess the question to you guys, and maybe I’m just sitting up here asking for advice from four brilliant people for my own purposes, is: how do we write around that? How do we craft stories where we’re writing about data that is true, that is potentially troubling or very serious or simply against everything that everyone is doing, and how do we craft it without going into clickbait territory?

I can kind of speak to this. Well, first of all, I
completely agree. It does feel kind of Sisyphean (I don’t even know how to pronounce it), like you are rolling a boulder up a hill and it’s only going to come crashing back down on top of you. So, for example, I write about things like ad fraud a lot. Ad fraud is a multi-billion-dollar problem. It’s not great. But the thing is, when you’re writing about something like this, you’re writing with the knowledge that it’s going to keep happening. Even if you crack down on one spam ring or break something up, it’s just gonna move somewhere else most of the time, because it’s just too dang profitable to really stop. So for the most part, in order to make things meaningful for readers, I need to be like, “Hey, this is a big problem and it’s getting worse,” and I need to tie it back to them in some way: here’s how you guys are losing out because of this multi-billion-dollar problem, here’s why you should care, here’s the scale of what this problem is. And when you’re writing about major tech companies or major issues in tech that are multi-billion-dollar problems, generally scale is enough to get people to feel like, “This isn’t something that I thought about.”

And how do we, I guess, exemplify or
show that scale? It’s hard. You’re not going to read the S-1 back to an audience; you’re going to read it for them and interpret it for them. And I’m not going to list every single person that Apple is hiring when I tell them that they’re hiring fifteen hundred engineers, because it’s part of a story. But perhaps there needs to be some way for us to say, “Here it is, if you want to check it.” How do you guys do that in your reporting?

Hyperlinks. And I swear I’ll stop talking in a second, but visual journalism is really great for this: showing people graphs to illustrate how things change over time, showing things like a pie chart or something like that. But at least when I’m writing things, I always say, “Here are the documents if you want to look at them. Here’s the filing.” For example, I think it was ProPublica that recently wrote a piece about how Intuit was hiding some sort of free tax filing thing from Google search results, and they basically said, “Hey, if you want to see this page, here you go, here’s the URL.” So when you’re a tech reporter, or you’re writing something that touches on tech, it’s being able to link to a legal document, or include some sort of link to an API, and being able to point readers to that and say, “If you really want to, you can check this yourself.”

Yeah, I would agree with all
that. I would also add that one of the strategies the Journal is really taking is recognizing that there are a lot of different ways that people tackle their trust in data and storytelling, and so thinking a lot about doing different video explainers, doing podcasts, interactives, really trying to reach people in the way that resonates with them, so that we can show not only “here’s the why behind it and here are the decisions that went into it,” but also “here’s our source for that data, if you really want to dive deeper.”

I think it’s harder to do
when you’re telling a more coherent story. A way to do it is to shock people. I wonder if anyone from CNBC is here, but I think CNBC does a great job of making everything look dramatic, and I have nothing but respect for how they do it. But they’ll take a two-day chart that looks like this and be like, “This is massive.” At Dealbreaker, we like to have fun with those stories, and we’ll put up a six-month chart: it’s just a flat line if you peel back, so let’s not get too carried away. But that is the way to do it. It’s sort of a parlor trick, but it works for people. You can watch the market move on CNBC blasting out this vertiginous chart for the last hour and a half, and people freak out. It’s amazing, and they do a great job with that. I think Bloomberg has learned a little bit of that too; they’re delving into that, which is fun to watch. We’re on the pulling-back side; we throw cold water on it more than we engage in it. But it is a way that works for people. People really respond to these small graphs that look really dramatic.

And sure, especially in this environment, you have to show that this data or this information
that you have is impactful and very exciting. At certain times you need to do that with very exciting chyrons and some graphics and some sound effects and that kind of thing.

The word “turmoil” as much as possible.

Yeah. But doesn’t it also have to be... The CNBC example is a good one, because all they’re really using is just the stock market moving, right? It’s verifiable. It’s objective data. It’s not like they’re pulling new information that we didn’t already have; it’s the way that they’ve presented it. But I wanted to back away from that a little bit and maybe get your take on this, both Ross and Shoshana, because you’re sort of in that first party of actually finding data, whether it’s looking at trends and showing your reporters what’s working and what isn’t, or, in your case, you’re like, “Oh my God, I found something.” But you have to be able to verify, right? And what makes data verifiable? I guess, what makes data that we report on objective? At what point do we know that it’s objective and verifiable?

Yeah, that’s a really challenging question. For me, I would say,
for a lot of pieces, our journalists really rely on finding trusted sources, where they not only trust the sourcing from that entity but also know people who are intimately familiar with the collection process and the details within it, who know if there are things being omitted that maybe aren’t necessarily clear. So that’s one vector. Another vector that we’re heavily investing in is actually getting the data ourselves and knowing that full end-to-end process. A really good example is that we just recently built a bot farm that is tackling all the search engines across the internet, because we have a suspicion that there are some interesting differences that could be really good journalism that we want to deliver on. I can’t say too much, but there should be some interesting pieces coming out on some of the major search providers.

I might have to follow up with you.

Well, actually, to kind of connect to that point,
it cannot be stressed enough how valuable it is to be able to grab data yourself. Something I’ll do is decompile an app on your phone to see what kind of ad networks it’s sending data to and what kind of data is being sent out, for things like privacy stories. Because often what will happen is I’ll read coverage like, “Oh, this app is sending all of your data to Facebook, it’s doing that, it’s doing this.” And believe me, I am one of Facebook’s biggest critics, but a lot of the time that coverage gets very basic facts wrong. So what I’ve started doing instead is I will double-check: I’ll pull the apps myself and be like, “Okay, what is being sent out? What is happening here?” And with a little bit of basic code, you can generally find out, “Okay, there isn’t actually that much data being shared with this actual behemoth of a company.” That’s it for me.

So, well, speaking of verifiable, Ben, I wanted to
talk to you about a story that you actually picked up from us a month or two ago, because you’re covering the video game industry pretty closely. We were covering some data around GameStop, the retail video game store, and we did a proximity analysis. GameStop was in trouble; people are questioning the future of physical sales of video games and that kind of thing.

I don’t think it’s a question.

Yeah, it’s not a question.

Well, it’s over.

Yeah. But what we saw in the data, when we did this proximity analysis, showed how close the stores were to one another, and it was like, well, they can afford to close all these stores.

Oh yes, and they’d better
hurry. Yeah, I mean, they have huge problems. They spent years buying a bunch of other companies, so they have a ton of duplication, and markets where they have way too many stores that are way too close to each other. And the market is disappearing for physical, old media in the same way it disappeared for CDs and DVDs before; it’s happening with video games, just more slowly. New York is a perfect example, where there are like 25 GameStops, or some ridiculous number, across the five boroughs, and a lot of them are very close to each other. Their leases are thankfully running out, and they’re letting go of a lot of that. They’re barely treading water right now. Their stock is in the gutter. They’re trying to just survive until the next set of consoles comes out, and hopefully that’ll keep them afloat for a little bit; at least that’s what they’re betting. But they’re a great example of a company where you can see a lot of their data out there and see why they’re failing, whether it’s their stores being too close together, their bottom line being in a bad place, or the fact that they’re in a market where you can track the sales of physical media going down across the board. They can talk all the game they want, and they continue to try doing that on their quarterly investor calls, but they’re treading water until the inevitability when they go away, just like Blockbuster and Tower Records and whatever else before them.

Yeah.
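
A proximity analysis of the kind mentioned in this exchange can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used for the GameStop story: it takes store coordinates, computes great-circle (haversine) distances for every pair of stores, and lists the pairs closer than a chosen threshold. The store names, coordinates, and the 1 km cutoff are all made up.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical store locations as (latitude, longitude) in degrees.
EXAMPLE_STORES = {
    "store_a": (40.7500, -73.9900),
    "store_b": (40.7550, -73.9900),
    "store_c": (40.8500, -73.9900),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def close_pairs(stores, max_km):
    """Return (name_a, name_b, km) for every pair within max_km, nearest first."""
    pairs = []
    for (name_a, pos_a), (name_b, pos_b) in combinations(stores.items(), 2):
        distance = haversine_km(pos_a, pos_b)
        if distance <= max_km:
            pairs.append((name_a, name_b, distance))
    return sorted(pairs, key=lambda pair: pair[2])


# Hypothetical usage: which stores sit within 1 km of each other?
# for a, b, km in close_pairs(EXAMPLE_STORES, max_km=1.0):
#     print(f"{a} and {b} are {km:.2f} km apart")
```

Checking every pair is fine for a few thousand locations; a larger data set would call for a spatial index, which this sketch leaves out.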
From a business reporting standpoint, everyone talks about the retail apocalypse and that kind of thing, and traditionally it was very hard to see what a business was doing, in a lot of ways, unless you were literally going behind the store and counting the receipts. But now that they’re transacting online, and they’re publishing their sales ranks, you can see things declining, or we see the number of stores expanding or contracting over time. It tells totally different and very real, objective stories about how these companies are doing from an economic standpoint.

Amazon has probably been this huge benefit to retail reporters everywhere, right? So much of their data is public, so much of it is searchable, just as a measure of having an online store. You can see what’s the most popular item in every category for the biggest retailer in the world. That’s huge.

Yeah, and some of the people who are selling that stuff hate it. It doesn’t allow them to spin things. So, we’ve quickly run out of time, because this is what happens when you put a room full of writers in the same space: we will talk forever. So thank you very much for your insights and your brilliance, guys.