Web Scraper for Blinkist

https://github.com/jljacoblo/BlinkistScrapper

Blinkist API

Blinkist data can be accessed from https://www.blinkist.com/api/

For example: 

https://www.blinkist.com/api/books/onboarding
https://www.blinkist.com/api/categories
https://www.blinkist.com/api/categories?includes=book_list
https://www.blinkist.com/api/books/trending
https://www.blinkist.com/api/books/everyday-vitality-en
https://www.blinkist.com/api/books/everyday-vitality-en/similar
https://www.blinkist.com/api/books/everyday-vitality-en/chapters
https://www.blinkist.com/api/books/everyday-vitality-en/chapters/5294dd4d396433000c020000
https://www.blinkist.com/api/books/everyday-vitality-en/chapters/5294dd4d396433000c020000/audio
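
The per-book URLs follow a simple pattern: a book slug, optionally followed by a chapter id. Below is a minimal sketch that builds them, assuming the pattern in the examples above holds for every book. `book_endpoints` and `API_BASE` are hypothetical names, not part of the repository.

```python
from typing import Dict, Optional

# Base path taken from the example URLs above.
API_BASE = "https://www.blinkist.com/api"


def book_endpoints(slug: str, chapter_id: Optional[str] = None) -> Dict[str, str]:
    """Return per-book API URLs following the pattern of the examples above."""
    book = f"{API_BASE}/books/{slug}"
    urls = {
        "book": book,
        "similar": f"{book}/similar",
        "chapters": f"{book}/chapters",
    }
    # Chapter-level URLs only make sense once a chapter id is known.
    if chapter_id is not None:
        chapter = f"{book}/chapters/{chapter_id}"
        urls["chapter"] = chapter
        urls["audio"] = f"{chapter}/audio"
    return urls


urls = book_endpoints("everyday-vitality-en", "5294dd4d396433000c020000")
print(urls["audio"])
```

The chapter ids come from the `/chapters` response, so in practice the book-level URLs are fetched first and the chapter-level ones afterwards.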

Looking at Blinkist's source code

You can download all of a page's HTML, scripts, CSS and node_modules with the Chrome extension Save All Resources:

https://github.com/up209d/ResourcesSaverExt
https://chrome.google.com/webstore/detail/save-all-resources/abpdnfjocnmdomablahdcfnoggeeiedb?hl=en-US

This is how I worked out the “infinite loop that crashes the developer console” and their API calls.

Browser Developer Console

Handle crashes

Blinkist puts infinite loops in its source code, and the page only crashes when the developer console is open. An excerpt from their source code:

    return Fv().wrap((function(e) {
        for (;;) switch (e.prev = e.next) {
            case 0:
                return n = t.commit, e.prev = 1, e.next = 4, xo.organisations.me();
            case 4:

Option 1: Don't stay on a book reading page

Option 2: Use Brave browser

Hacking steps:

  1. Turn on the Brave browser “Block scripts” feature
  2. Go to the Blinkist main page
  3. Open the developer console
  4. Turn off “Block scripts”
  5. Wait a while for the CPU usage to calm down
  6. Then you can go to book reading pages

Experiment with turning the Block scripts feature on and off; I believe this is how to get around the infinite loops.

load jQuery

Use Tampermonkey to inject the jQuery library into the webpage:

https://github.com/jljacoblo/ActiveTabWebpageCrawlerSynchronous/tree/main/tampermonkey

Scraping

Run the code inside the developer console.

Categories

Example API call: https://www.blinkist.com/api/categories?includes=book_list

Developer console:

Code not shown, sorry

Sample output, categories_book_list.json:

{
    "categories": [
        {
            "id": "5b868435b238e1000726ccba",
            "title": "Career & Success",
            "slug": "career-and-success-en",
            "url": "/en/content/categories/career-and-success-en",
            "priority": 26,
            "sprite": "career-and-success",
            "books_count": 483,
            "books": [
              ...
            ]
        },
        ...
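
A minimal sketch for reading this file offline, assuming it is saved exactly in the shape above. The book objects inside `books` are elided in the sample, so the sketch relies only on the `slug`, `title` and `books_count` fields. `summarize_categories` is a hypothetical helper, not part of the repository.

```python
import json
from typing import List, Tuple


def summarize_categories(raw: str) -> List[Tuple[str, str, int]]:
    """Return (slug, title, books_count) for each category, largest first."""
    data = json.loads(raw)
    rows = [(c["slug"], c["title"], c["books_count"]) for c in data["categories"]]
    # Sort by book count, descending, so the biggest categories come first.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# One category copied from the sample above; its book list is left empty.
sample = json.dumps({"categories": [
    {"id": "5b868435b238e1000726ccba", "title": "Career & Success",
     "slug": "career-and-success-en", "books_count": 483, "books": []},
]})
for slug, title, count in summarize_categories(sample):
    print(f"{title} ({slug}): {count} books")
```

A book can plausibly appear in more than one category, so summing `books_count` over all categories may overstate the number of distinct books.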

Books

Data for a total of 5,500 books.

trending

Example API call: https://www.blinkist.com/api/books/trending

Developer console:

Code not shown, sorry

Sample output, currentBooks.json:

{
  "the-4-hour-workweek-en": {
    "id": "5282267434613800112a0000",
    "kind": "book",
    "slug": "the-4-hour-workweek-en",
    "title": "The 4-Hour Workweek",
    "subtitle": "Escape 9–5, Live Anywhere, and Join the New Rich",
    "subtitleHtmlSafe": "Escape 9–5, Live Anywhere, and Join the New Rich",
    "aboutTheBook": "<p><em>The 4-Hour Workweek </em>(2009) describes ...",
    "buyOnAmazonUrl": "/en/books/the-4-hour-workweek-en/purchase",
    "author": "Tim Ferriss",
    "truncatedAuthor": "Tim Ferriss",
    "sourceAuthor": "Tim Ferriss",
    "url": "https://www.blinkist.com/en/books/the-4-hour-workweek-en",
    "browseUrl": "/en/app/books/the-4-hour-workweek-en",
    "previewUrl": "/en/books/the-4-hour-workweek-en",
    "readUrl": "/en/nc/reader/the-4-hour-workweek-en",
    "playUrl": "/en/nc/reader/the-4-hour-workweek-en?play=1",
    "readingDuration": 28,
    "minutesToRead": 28,
    "publishedAt": "2012-10-16T11:52:22.000+00:00",
    "isAudio": true,
    "readCount": "40.3k",
    "image": {
      "sources": [
        ...
      ]
    },
    "audioUrl": "",
    "chaptersLength": 12,
    "hasAudio": true,
    "language": "en",
    "freeDaily": null,
    "isFree": false,
    "category": {
      "title": "Money & Investments",
      "sprite": "money-and-investments",
      "slug": "money-and-investments-en"
    },
    "averageRating": 4.3,
    "totalRatings": 1671,
    "categories": [
      ...
    ]
  },
  ...
}
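
currentBooks.json is keyed by book slug, which makes it easy to merge results from several endpoints (trending, latest, similar) without duplicates. A minimal sketch, assuming each API response has already been turned into a list of book objects that carry a `slug` field. The exact response envelope is not shown here, so the hypothetical `merge_books` helper takes that list directly.

```python
from typing import Any, Dict, Iterable

Book = Dict[str, Any]


def merge_books(current: Dict[str, Book], books: Iterable[Book]) -> int:
    """Add books to a slug-keyed dict in place; return how many were new."""
    added = 0
    for book in books:
        slug = book["slug"]
        if slug not in current:
            added += 1
        # A later response overwrites an earlier one for the same slug.
        current[slug] = book
    return added


current_books: Dict[str, Book] = {}
trending = [{"slug": "the-4-hour-workweek-en", "title": "The 4-Hour Workweek"}]
print(merge_books(current_books, trending))  # 1
print(merge_books(current_books, trending))  # 0
```

Keying by slug also matches how the later steps look books up (`/api/books/<slug>/chapters`).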

latest

Same as trending, but use the API link /api/books/trending.

similar

Example API call: https://www.blinkist.com/api/books/everyday-vitality-en/similar

Developer console:

Code not shown, sorry

Chapter Summary

Example API calls: https://www.blinkist.com/api/books/everyday-vitality-en/chapters

Needs currentBooks.json

Developer console:

Code not shown, sorry

Sample output, curBookChapters.json:

{
  "the-4-hour-workweek-en": {
    "book": {
      "id": "5282267434613800112a0000",
      "slug": "the-4-hour-workweek-en",
      "title": "The 4-Hour Workweek",
      "author": "Tim Ferriss",
      "time": 28,
      "cover": {
        "default": {
          "src": "https://images.blinkist.io/images/books/5282267434613800112a0000/1_1/470.jpg",
          "srcset": {
            "2x": "https://images.blinkist.io/images/books/5282267434613800112a0000/1_1/640.jpg"
          }
        },
        "sources": [...]
      },
      "freeDaily": false
    },
    "chapters": [
      {
        "id": "5282270c3334640008020000",
        "order_no": 0,
        "action_title": "What’s in it for me? Learn to make time for the important things in your life."
      },
      {
        "id": "528227343334640008040000",
        "order_no": 1,
        "action_title": "For the New Rich, wealth means luxury in the here and now."
      },
      ...
    ],
    "current_chapter_id": null
  },
  ...
}
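
The next step fetches each chapter by id, so the useful part of this file is the list of chapter ids per book, in reading order. A minimal sketch, assuming the shape shown above. `ordered_chapter_ids` is a hypothetical helper. Sorting on `order_no` guards against chapters arriving out of order, which may or may not ever happen.

```python
from typing import Any, Dict, List


def ordered_chapter_ids(book_chapters: Dict[str, Any]) -> List[str]:
    """Return the chapter ids of one book, sorted by order_no."""
    chapters = sorted(book_chapters["chapters"], key=lambda ch: ch["order_no"])
    return [ch["id"] for ch in chapters]


# Trimmed-down entry from the sample above, deliberately listed out of order.
entry = {
    "book": {"slug": "the-4-hour-workweek-en"},
    "chapters": [
        {"id": "528227343334640008040000", "order_no": 1},
        {"id": "5282270c3334640008020000", "order_no": 0},
    ],
    "current_chapter_id": None,
}
print(ordered_chapter_ids(entry))
# ['5282270c3334640008020000', '528227343334640008040000']
```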

Example API calls: https://www.blinkist.com/api/books/everyday-vitality-en/chapters/5294dd4d396433000c020000

Requires curBookChapters.json and currentBooks.json.

Developer console:

Code not shown, sorry

Sample output, booksAllChapters.json:

{
  "the-4-hour-workweek-en": [
    {
      "id": "5282270c3334640008020000",
      "order_no": 0,
      "action_title": "What’s in it for me? Learn to make time for the important things in your life.",
      "text": "<p>The four-hour workweek. It sounds amazing, right. It sounds like a dream. Instead of working for 40 hours a week, you only ...",
      "audio_url": "https://hls.blinkist.io/bibs/5282267434613800112a0000/5282270c3334640008020000-T1632499808.m4a",
      "signed_audio_url": "https://hls.blinkist.io/bibs/5282267434613800112a0000/5282270c3334640008020000-T1632499808.m4a?Expires=1673380008&Signature=HaaOrmx3vWsagkZL1dwDvWztHt-DrBUp5Q1XTveLk7MQBYy1FHJdewpFDZVIgRrPEjBMRk5EJFR1JQB0SIToHbiHL10ol1U18NRhiXjTq-DLnxDwtIrnhsdvdeTQpHpV2oTTtf6ubAdhMesemHXc5sqOMq5EVeSShr7NgLfCHiwp-S6Y3nrwb5~Y7~7RPYHXXhp0z2eJcoV-XJc5sqN8-3l9l8JzHj-pFiN-uL6PS14ufbW7j6mlyN6vTePQG1xckh9QzhdbaNqUptUKwYNQjcZeGnNDPwS6pNeoPoMUKJ65OQsIbpxD5q5HsvywuHJmu5B8akr7~JrA2U8fJ18Etg__&Key-Pair-Id=APKAJXJM6BB7FFZXUB4A"
    },
    {
      "id": "528227343334640008040000",
      "order_no": 1,
      "action_title": "For the New Rich, wealth means luxury in the here and now.",
      "text": "",
      "audio_url": "",
      "signed_audio_url": ""
    },
    ...
  ],
  ...
}
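
In the sample above, some chapters have an empty `text`. The sketch below assumes an empty string marks a chapter that has not been fetched yet, although it may simply have been trimmed for the example. Under that assumption, it lists which chapters still need to be pulled. `missing_chapters` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def missing_chapters(all_chapters: Dict[str, List[Dict[str, Any]]]) -> List[Tuple[str, str]]:
    """Return (book slug, chapter id) pairs whose text is still empty."""
    missing = []
    for slug, chapters in all_chapters.items():
        for ch in chapters:
            # Treat a missing or empty "text" field as "not fetched yet".
            if not ch.get("text"):
                missing.append((slug, ch["id"]))
    return missing


data = {
    "the-4-hour-workweek-en": [
        {"id": "5282270c3334640008020000", "order_no": 0, "text": "<p>The four-hour workweek..."},
        {"id": "528227343334640008040000", "order_no": 1, "text": ""},
    ]
}
print(missing_chapters(data))
# [('the-4-hour-workweek-en', '528227343334640008040000')]
```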

Download all book covers

Requires currentBooks.json.

In downloadAllBookCover.py:

Code not shown, sorry
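
downloadAllBookCover.py itself is not shown. Below is a minimal sketch of the same idea under two assumptions. First, currentBooks.json is loaded as a slug-keyed dict whose values have an `id` field, as in the sample above. Second, every cover follows the `images.blinkist.io/images/books/<id>/1_1/<size>.jpg` pattern seen in the curBookChapters.json sample. `cover_url`, `download_covers` and `COVER_URL` are hypothetical names.

```python
import os
import urllib.request
from typing import Any, Dict

# Assumed pattern, copied from the cover URLs in the curBookChapters.json sample.
COVER_URL = "https://images.blinkist.io/images/books/{book_id}/1_1/{size}.jpg"


def cover_url(book_id: str, size: int = 640) -> str:
    """Build a cover URL for a book id (470 and 640 appear in the sample)."""
    return COVER_URL.format(book_id=book_id, size=size)


def download_covers(books: Dict[str, Dict[str, Any]], out_dir: str = "covers", size: int = 640) -> None:
    """Save one cover per book as <slug>.jpg, skipping files that already exist."""
    os.makedirs(out_dir, exist_ok=True)
    for slug, book in books.items():
        path = os.path.join(out_dir, f"{slug}.jpg")
        if os.path.exists(path):
            continue  # lets an interrupted run resume where it stopped
        urllib.request.urlretrieve(cover_url(book["id"], size), path)


print(cover_url("5282267434613800112a0000"))
```

Skipping existing files keeps reruns cheap when downloading thousands of covers.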

Download all chapters' audio

Download the audio for every chapter (blink) of every book from the in-browser developer console.

The audio download links are only valid for a limited time.

The current method is to download each file through the browser's “Save as” dialog.

We use AutoHotkey to automate this process, downloading the audio files one by one.

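API calls

Call:

// Ask the API for one chapter's time-limited audio link (run on www.blinkist.com with jQuery loaded)
$.ajax('/api/books/everyday-vitality-en/chapters/617033a56cee0700087aa566/audio', {
  type: 'GET',
  success: function (data, status, xhr) {
    console.log(data);
  }
});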

Result:

{
    "url": "https://hls.blinkist.io/bibs/617033a36cee0700087aa564/617033a56cee0700087aa566-T1634743314.m4a?Expires=1673360760&Signature=XIEsN3WN-WrBz9tWEENgiY9gzG6dqB8dz-vvSuL1sHLIKDkjCloUMvzpUeJKEXsrZfN7rbQV9nw6tYLFXXNh-s5dNOHg0Fsv2PUUd6Mck9pg-OjbSEbtHtSmzrg2PFqxEqRC1jL8eNBVHUDmP4NH0-5uhVJXaTF73CNjR7ritgo98Z4l-y3~wnLtuq1aXQIua224sdBVwzDM9wOx9AUXddz7CGjvlx4R7W9ShRgHe2TzJQS0x~TxDNzHHnigyvs8FSD3EaT9UjyAXz8D7Bt7aX0FU6zHr4RPRNMPb0X2VyDMhVZCRVOKtVPKEAKQ0NDUDZLPkoYye~NN4ZAab1QsZA__&Key-Pair-Id=APKAJXJM6BB7FFZXUB4A"
}
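
The time limit appears to come from the `Expires` query parameter. It looks like a Unix timestamp (1673360760 falls in January 2023). Together with `Signature` and `Key-Pair-Id`, the link has the shape of an Amazon CloudFront signed URL. Assuming that reading is right, the small sketch below checks whether a link is still usable before trying to save it. `signed_url_expiry` and `is_expired` are hypothetical helpers.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> Optional[int]:
    """Return the Unix timestamp in the URL's Expires query parameter, if any."""
    values = parse_qs(urlparse(url).query).get("Expires")
    return int(values[0]) if values else None


def is_expired(url: str, now: Optional[float] = None) -> bool:
    """True if the Expires time has passed; URLs without Expires count as valid."""
    expiry = signed_url_expiry(url)
    current = time.time() if now is None else now
    return expiry is not None and current >= expiry


# The result above, with Signature and Key-Pair-Id shortened.
url = ("https://hls.blinkist.io/bibs/617033a36cee0700087aa564/"
       "617033a56cee0700087aa566-T1634743314.m4a"
       "?Expires=1673360760&Signature=...&Key-Pair-Id=...")
print(signed_url_expiry(url))  # 1673360760
```

Because the links expire, it makes sense to request the audio URL right before each download rather than collecting all of them up front.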

step 1

Create AutoHotkey script downloadAudio.ahk.
It downloads the target of an href link through the browser's right-click “Save as” dialog.

step 2

Convert downloadAudio.ahk to downloadAudio.exe.

step 3

Create a custom URL scheme so that the browser can launch that exe by opening an autohotkey:// URL, along the lines of:
https://dev.to/pybash/making-a-custom-protocol-handler-and-uri-scheme-part-3-3fji

In practice, this means adding url_scheme.reg to your Windows Registry.

step 4

Now, inside the Chrome developer console:

Requires curBookChapters.json. In pullEachAudioRecursive.js:

Code not shown, sorry
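
pullEachAudioRecursive.js is not shown. Its name suggests it handles one chapter at a time and moves on to the next when the current download finishes, which fits the one-by-one constraint above. The part that can be shown independently is building the ordered queue of audio calls from curBookChapters.json. Below is a sketch in Python, where `audio_jobs` is a hypothetical helper and the path format is copied from the API call example above.

```python
from typing import Any, Dict, Iterator, Tuple


def audio_jobs(cur_book_chapters: Dict[str, Dict[str, Any]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (slug, chapter id, audio API path), book by book, in chapter order."""
    for slug, entry in cur_book_chapters.items():
        for ch in sorted(entry["chapters"], key=lambda c: c["order_no"]):
            path = f"/api/books/{slug}/chapters/{ch['id']}/audio"
            yield slug, ch["id"], path


# Trimmed-down curBookChapters.json entry from the sample above.
sample = {
    "the-4-hour-workweek-en": {
        "chapters": [
            {"id": "528227343334640008040000", "order_no": 1},
            {"id": "5282270c3334640008020000", "order_no": 0},
        ],
    }
}
for job in audio_jobs(sample):
    print(job)
```

Using a generator means the queue can be consumed lazily, one job per finished download.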

Convert to Anki note cards

jsonToAnki.py converts all the books' blinks into Anki notes.
Each Anki note is one chapter of a book.
Each book becomes its own Anki deck, nested as a sub-deck under a deck for its category.

Requires currentBooks.json, curBookChapters.json and booksAllChapters.json.

In jsonToAnki.py:

convertBlinkistJson2Anki('categories','currentBooks', 'booksAllChapters')

Output File: Blinkist.apkg
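
jsonToAnki.py itself is not shown. The deck layout described above maps naturally onto Anki's `::` sub-deck naming. Below is a minimal sketch of that mapping under several assumptions. The sub-deck name comes from the book's `category.title` in currentBooks.json. The chapter's `action_title` and `text` become the front and back of the note. There is a top-level "Blinkist" deck. `build_notes` is hypothetical, and writing the actual .apkg file would need a separate library (for example the genanki package), which is not shown here.

```python
from typing import Any, Dict, List, Tuple

Note = Tuple[str, str, str]  # (deck name, front, back)


def build_notes(current_books: Dict[str, Any],
                all_chapters: Dict[str, List[Dict[str, Any]]]) -> List[Note]:
    """One note per chapter, in deck Blinkist::<category>::<book title>."""
    notes: List[Note] = []
    for slug, chapters in all_chapters.items():
        book = current_books[slug]
        # Anki treats "::" in a deck name as a sub-deck separator.
        deck = f"Blinkist::{book['category']['title']}::{book['title']}"
        for ch in sorted(chapters, key=lambda c: c["order_no"]):
            notes.append((deck, ch["action_title"], ch["text"]))
    return notes


current_books = {"the-4-hour-workweek-en": {
    "title": "The 4-Hour Workweek", "category": {"title": "Money & Investments"}}}
all_chapters = {"the-4-hour-workweek-en": [{
    "id": "5282270c3334640008020000", "order_no": 0,
    "action_title": "What’s in it for me? Learn to make time for the important things in your life.",
    "text": "<p>The four-hour workweek..."}]}
notes = build_notes(current_books, all_chapters)
print(notes[0][0])  # Blinkist::Money & Investments::The 4-Hour Workweek
```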