I’m going to keep adding items to this list periodically. It’s not comprehensive, but it’s pretty good.
I’ve always agreed with Rommel: “The battle is fought and decided by the quartermasters, long before the shooting begins.”
I think this is true of the web as well. The internet is a gangster’s paradise, and I’ve always believed you should be a mobster to grow your business.
If I were building a company like Clearbit, FullContact, or any of the other companies trying to collect all the world’s contact information, here are some of the things I would do:
- Hire a skilled backend developer and make sure they know how to scrape/crawl, use rotating IPs, and get around anything thrown their way. You basically want to remain undetectable…obfuscate your existence as a company that is scraping.
- Use commoncrawl.org to find publicly available email addresses in a massive body of text.
- Go to the dark web and buy the full hacked LinkedIn database from 2012 for $500-$5,000. I don’t know what the current market price is. Talk to your Eastern European or Russian digital marketers, and I’m sure they’ll know how to acquire it.
- Scan Twitter every 5 seconds for strings formatted like email addresses. Zapier allows you to do this. Matching phrases like “email me @” will take some complex scripting, but it is doable. If you need to scale, switch from Zapier to another platform or code it from scratch.
- Issue FOIA requests in all 50 states to find the publicly available Secretary of State data on all businesses/non-profits in the state or scrape it and parse it with other databases.
- Use the Bing search API to query @nameofcompany.com iteratively across LinkedIn. Bing indexes LinkedIn very heavily.
- Do all of the above, and you should be able to build a 1K-20K/month company within 3–12 months.
- Platforms: Look at every marketplace application, for example Doximity. Doximity is illegal to scrape, but I guarantee you that any European or Russian scraping shop has been asked to scrape the site for all of its MDs and doctors. I won’t go down a list of all the platforms where this is possible, but it’s not hard.
- Find every about.me or multi-grouping page of Twitter/LinkedIn profiles.
- Scrape Secretary of State listings in the USA and auto-deduce or guess at permutations of emails (firstname.lastname@, etc.). Validate against SMTP.
- Find companies that are dead, and ask them for their email databases.
- Make an app that deduplicates contact information for consumers and take the data.
- NLP Email Chatbots: I’d make application-specific NLP chatbots that ask for the price of an item, or a good email address to reach someone, just off of Craigslist postings. Plenty of ways to do this at scale. You can get their phone number and email address.
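The deduplication item in the list above is the most mechanical of the bunch, so here is a minimal sketch of it. It assumes contact records are plain dictionaries with hypothetical `name`, `email`, and `phone` keys, and it treats a normalized email address as the identity of a contact. That matching rule is my assumption, not how any particular product works; a real deduper would likely need fuzzier matching.

```python
import re


def normalize_email(email):
    """Lowercase and strip whitespace so trivially different spellings match."""
    return email.strip().lower()


def normalize_phone(phone):
    """Keep digits only, so '(555) 010-1234' and '555.010.1234' compare equal."""
    return re.sub(r"\D", "", phone)


def dedupe_contacts(contacts):
    """Merge records that share a normalized email; first non-empty value wins."""
    merged = {}
    for contact in contacts:
        key = normalize_email(contact.get("email") or "")
        if not key:
            continue  # assumption: this sketch skips records without an email
        record = merged.setdefault(key, {"name": "", "email": key, "phone": ""})
        if not record["name"]:
            record["name"] = (contact.get("name") or "").strip()
        if not record["phone"]:
            record["phone"] = normalize_phone(contact.get("phone") or "")
    return list(merged.values())


# Hypothetical sample data, for illustration only.
sample_contacts = [
    {"name": "Ada Example", "email": "Ada@Example.com ", "phone": ""},
    {"name": "", "email": "ada@example.com", "phone": "(555) 010-1234"},
    {"name": "Bob Example", "email": "bob@example.com", "phone": "555.010.9999"},
]
deduped = dedupe_contacts(sample_contacts)
```

With the sample data, the two Ada records collapse into one that keeps the name from the first and the phone number from the second. Keying on email alone is the simplest choice; it will miss the same person listed under two different addresses.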
I think there’s going to be some sort of implosion of sales automation/lead marketing companies as the data becomes more publicized during or after hacks. I wouldn’t be surprised if some of these leads companies are buying leads from hackers in other countries.
Buy the data and don’t ask where it came from. If you’re building this type of company, ask yourself “What Would Pablo Do?”
It’s not hard to imagine.
Step 1: A hacker gets into a Gmail account and clicks export contacts.
Step 2: Rinse and Repeat
If you have a sales/leads automation company, your best bet isn’t to raise, but to opportunistically sell the venture, unless you love extreme competition.
The number of companies in this space is astounding. If you’re going to fund companies in this space, it’s extremely advisable to fund overqualified Eastern European developer teams, not overpriced SV teams.
This is commoditized code wrangling, nothing unique here. If the founders can clearly delineate how they will crush, kill, and destroy their competition by scraping at low cost, or selling at scale quickly, then they can win and monopolize the market.
Make no mistake though, this space gets harder as you get further into it, not easier. Growing the size of the database’s network graph only marginally increases its value.