Diffbot can augment data streams for SO MANY industries/use cases. Within ours, we're able to keep track of news mentions of universities (from literally all over the web) and enrich leads for outreach. I'm sure there's a ton more we could be doing with Diffbot. But even with just those uses, the service has paid for itself many times over. It doesn't take many saved work hours to justify the $299 price tag... Review collected by and hosted on G2.com.
To tap into the full power of Diffbot's offerings, you do need a technical team member. (But for what service is this not the case?) Basically, you can work with pre-extracted sites (of which there seem to be millions) through the Knowledge Graph and Enhance. If you want to crawl a specific site repeatedly, you'll need to at least know how to make an API call.
Diffbot's Extraction APIs and Crawlbot API provide an incredibly valuable, versatile, and simple-to-use pipeline for acquiring crucial information from web pages that may not have been visited before. The Analyze API makes it a snap to determine whether the page in question is a product page, and the wide array of elements that Diffbot returns from most pages is exceptionally useful!
In our space, we tend to cover a large percentage of the e-commerce world, and that takes us to many domains that are irregular, outdated, or less than perfect in terms of function. We've noticed that for those pages, or for domains with sophisticated or aggressive bot-blocking techniques, Diffbot will often fail to return a result (or at least not within a minute or two). This can be problematic for a company like ours that explores tens of thousands of domains each day, since it can slow down the discovery pipeline we use to find new listings and e-commerce domains.
We needed a content sourcing solution for our product, Tanjo Animated Personas, or TAPs. Tanjo Animated Personas are simulated customers that learn and evolve over time. Our personas need to read a continual stream of articles to evolve and function properly. Diffbot gives us an easy way to source that content.
We have been a Diffbot customer for over 5 years, and have used all of their products, including Crawlbot and Knowledge Graph. Before Diffbot, we mainly relied on RSS feeds and custom scrapers to import articles into our system. The results were often inconsistent, with misread or malformed text blocks. It was tedious and unsustainable. Diffbot provided an almost limitless set of sources with high quality data.
Implementing Diffbot has greatly improved the scalability, efficiency, and quality of how we feed internet articles into our platform. They are always willing to work with us if we encounter any issues. They take customer feedback seriously and are willing to hear out suggestions for features that could be improved or added. We appreciate Diffbot's flexibility in working with us on our needs.
Diffbot has always been open to hearing our suggestions for what could be improved or added to their website. I don't think it would be fair to "dislike" anything since they have taken our feedback seriously in the past and iterated on their platform. If we think things could be better, we let Diffbot know.
Their support team is very helpful. Even without purchasing their support plan to get an SLA, they usually get back to you within a week and provide thorough responses. Sometimes they'll even look at your API configuration, adjust it for you, and explain why the new setting is better.
I would highly recommend Diffbot for their robust and dependable products, supportive sales and customer support staff, and transparent pricing plans. Even their base plans make it easy for any company or team of any size to test it and determine what their positive ROI looks like.
Documentation could be improved a bit. New users who aren't familiar with HTML and CSS may find it hard to figure out how to apply specific filters and selectors. My recommendation here is to provide templates or additional documentation on best practices for scraping data from popular sources such as Wikipedia.
Another small thing they could improve is providing better visibility into usage statistics for accounts with multiple tokens that are all tied to one parent account.
Diffbot provides a simple, well-documented API that allows for mind-boggling web scraping with brain-dead code. By finding what's important on nearly every kind of webpage, Diffbot helped launch my project further than I could have imagined, saving me hours of writing code that would have only been able to understand a few websites.
One suggestion for them: there are probably individuals and small businesses out there that can't afford the plans they offer but could still get a lot out of Diffbot, so maybe they should consider adding a smaller plan. But as a user, I haven't encountered anything to dislike yet. Really! I haven't had a single issue using the API, and it was really easy to get started with all of their help.
Diffbot is powerful and simple to use. Users from basic to advanced levels of technical expertise can use Diffbot and extract content from the web with ease. Diffbot is highly scalable because extracting content from the web is so easy. The pricing is better than other software we have used before. The customer support has been superb. We almost always receive responses from the support team within 24 hours of submitting a request. The support team works hard to give timely and accurate suggestions and fixes for issues we face. The onboarding process was very smooth. Diffbot provided us with a generous trial amount that really allowed us to evaluate Diffbot and see that it was the right solution for us. The user interface is simple and sleek. Many tasks on Diffbot can be automated, making management of hundreds of crawlers or other extraction APIs fairly effortless. Diffbot has been everything we hoped for in web extraction.
Monitoring the success of crawlers is challenging, since there are no notifications when a crawler hasn't delivered results for a while or is running into a lot of errors.
We have used Diffbot for several years; their API for text extraction is extremely powerful and accurate. It has become an important part of our data processing pipeline. Their APIs allow us to convert unstructured HTML data into information we can ingest and store.
Their support is also very responsive and has always provided us with valuable answers and feedback when needed.
They also provide a web interface for defining custom rules. That functionality has also proved very useful, although its UI can sometimes be unintuitive.
1) Enrichment data
2) Ability to query data in aggregate
1) Being charged based on entities
2) Being charged as we go (I wish there were a way to limit my queries)
We've been a happy customer for about 6 years now, and we tend to forget Diffbot is there, since their data flows seamlessly. Our work depends a lot on data processing, and we don't want to worry about how data sources provide their data, or when they change their process along the way. With Diffbot, we can really focus on processing.
Nothing worth mentioning. The few glitches we had in the past were promptly dealt with by their support.
There are multiple ways to "extract" data with Diffbot. We use the Knowledge Graph (which doesn't really require any knowledge of extraction or web scraping on our end) for exploratory analysis, and the automatic extraction API for more repetitive scrapes. The documentation is solid, and the Knowledge Graph works right out of the box.
There is a bit of a learning curve with DQL.