Extract and Format ChatGPT Chat History as JSON #JavaScript #MDX

A practical way to extract and format data using Chrome Developer Tools and JavaScript. The most common way to save ChatGPT conversation history is the official built-in export ('Data Controls' tab → 'Export'), but it has drawbacks.

Shou Arisaka
4 min read
Aug 25, 2024

I think the most common way to save ChatGPT conversation history is the official built-in feature: exporting a backup via the “Data Controls” tab → “Export”.

I used to save my conversation history this way and extract data from the JSON files in the archive, but the approach has some annoying drawbacks:

  • It exports the entire conversation history, so it takes a long time for the email with the zip file to arrive from OpenAI.
  • You can’t realistically do it several times a day. The whole flow (requesting the export, checking email, downloading, extracting, and checking the files) is tedious to repeat each time.
  • Because the entire history is exported, you may run out of memory in some cases (in Node.js, errors such as “JavaScript heap out of memory”).
  • Naturally, the processing is heavy and the code gets long, which eats up mental energy that could be spent elsewhere.

Ultimately, the main problem is that extracting and processing the entire history every time, instead of just the conversation you need, is costly in both computer time and human time.

To address this, let’s extract and format the data (for example, into an array in MDX format) simply and smartly using Chrome Developer Tools and JavaScript. That is the purpose of this article.

Tangent

In the Google Chrome extension store, there appear to be several extensions specialized in ChatGPT export and history saving, such as “Save ChatGPT” (currently seems to be unavailable) and “ChatGPT to Markdown”.

If these meet your needs, the JavaScript extraction method introduced in this article may not be strictly necessary.

Specific Method

Open any page in ChatGPT, then open Chrome Developer Tools there.

The following steps are:

Below is a specific example of JavaScript code. It creates an array of objects, each with a title and a content field.

[Shown with subscription]
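    .map(a => a?.content?.parts) // parts of each message (undefined if missing)
    .flat()
    .filter(a => typeof a === 'string') // skip empty and non-text parts
    .map(a => a.match(/### (.*)\n\n([\s\S]+)/)) // "### title", blank line, body
    .filter(a => a)
    .map(a => ({ title: a[1], content: a[2] }))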

We’ve extracted it cleanly.
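To make the shape of the result concrete, here is a minimal, self-contained sketch of the same idea that runs in the console or in Node.js. The `sampleMessages` array and the `extractSections` name are made up for illustration; real message objects come from the page data, and only the `content.parts` shape is taken from the code above.

```javascript
// Hypothetical sample data: each element mimics a message object with content.parts.
const sampleMessages = [
  { content: { parts: ['### First title\n\nFirst body.\n\nSecond paragraph.'] } },
  { content: { parts: ['No heading here, so this part is skipped.'] } },
  null, // some entries may have no message at all
  { content: { parts: [{ asset: 'image' }, '### Second title\n\nSecond body.'] } },
];

// Turn message objects into [{ title, content }], using the "### " heading as the title.
function extractSections(messages) {
  return messages
    .flatMap(m => m?.content?.parts ?? [])        // collect every part, tolerating gaps
    .filter(part => typeof part === 'string')     // ignore non-text parts
    .map(part => part.match(/### (.*)\n\n([\s\S]+)/))
    .filter(Boolean)                              // drop parts without a "### " heading
    .map(([, title, content]) => ({ title, content }));
}

const sections = extractSections(sampleMessages);
console.log(JSON.stringify(sections, null, 2));
```

Only the parts that contain a "### " heading followed by a blank line survive, so this sample yields two objects.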

(Bonus) Let’s also convert this to MDX format. While we’re at it, let’s save it to files.

// Assuming the above result is stored in the variable obj.
// The summary is the first paragraph of the content (empty if there is no blank line).
const today = new Date().toISOString().split('T')[0];
// In single-quoted YAML strings, a literal ' must be written as ''.
const q = s => s.replace(/'/g, "''");
obj.map(a => {
    const summary = a.content.match(/(.+)\n\n/)?.[1] ?? '';
    return `---\ntitle: '${q(a.title)}'\ndate: ${today}\nlastmod: ${today}\ntags: [drawing tools, AI-assisted]\ndraft: false\nsummary: '${q(summary)}'\nimages: []\n---\n\n${a.content}`;
}).forEach((a, i) => {
    // Save as a file from the browser
    const blob = new Blob([a], { type: 'text/plain' });
    const url = URL.createObjectURL(blob);
    const aTag = document.createElement('a');
    aTag.href = url;
    aTag.download = `chatgpt-${i}.mdx`;
    aTag.click();
    URL.revokeObjectURL(url);
});

(The above code has not been tested, so it may not work in some cases)
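Since the front matter is the part most likely to break (for example, when a title contains a quote), it may help to isolate it in a small function that can be checked without a browser. The following is only a sketch: `yamlQuote` and `toMdx` are names invented here, and the tags are simply copied from the example above.

```javascript
// Quote a value as a YAML single-quoted scalar (a literal ' is written as '').
function yamlQuote(value) {
  return `'${String(value).replace(/'/g, "''")}'`;
}

// Build one MDX document from a { title, content } item.
// The tags are copied from the article's example; adjust them for your own site.
function toMdx(item, date = new Date().toISOString().split('T')[0]) {
  // First paragraph as the summary; fall back to the first line if there is no blank line.
  const summary = item.content.match(/(.+)\n\n/)?.[1] ?? item.content.split('\n')[0];
  return [
    '---',
    `title: ${yamlQuote(item.title)}`,
    `date: ${date}`,
    `lastmod: ${date}`,
    'tags: [drawing tools, AI-assisted]',
    'draft: false',
    `summary: ${yamlQuote(summary.trim())}`,
    'images: []',
    '---',
    '',
    item.content,
  ].join('\n');
}

const mdx = toMdx({ title: "It's a test", content: 'One line only' }, '2024-08-25');
console.log(mdx);
```

Passing the date as a parameter keeps the output reproducible, which makes the function easier to test than a template that calls `new Date()` inline.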

That’s it: a way to extract ChatGPT conversation history as JSON and format it into any shape using JavaScript and Chrome Developer Tools.

Continue Processing in Node.js (Added)

I happened to write a quick Node.js script while writing this article, so I’ll share it.


const fs = require('fs');

(async () => {

  // obj.json holds the title/content array produced in the browser
  const objRaw = fs.readFileSync(`${process.cwd()}/obj.json`, 'utf8');
  const obj = JSON.parse(objRaw);

  const today = new Date().toISOString().split('T')[0];

  const mdx_strings = obj.map((item) => {
    // First paragraph as the summary; empty when the content has no blank line
    const summary = item.content.match(/(.+)\n\n/)?.[1] ?? '';
    return `---
title: ${item.title}
date: ${today}
lastmod: ${today}
tags: [drawing tools, AI-assisted]
draft: false
summary: ${summary}
images: []
---

${item.content}`
  }
  ).join("\n\n\n\n");

  // split by \n\n\n\n+
  const mdx_array = mdx_strings.split(/\n\n\n\n+/);
//   ...omitted
})();

The tedious part of creating site articles with MDX is having to think about and set filenames (URL slugs).

So recently, I’ve been automating this with the DeepL API.

Please also refer to a recent article I wrote about DeepL: Creating Glossaries with DeepL API to Specify Translation Results #Node.js

Node.js program:

  • Using “deepl-node”: “^1.13.0”
  • Node.js version is v22.5.1, but it should work fine with v16.6.0 or later

The script translates each title to English for use as the filename and saves the result as an .mdx file.

After waiting about 2 minutes for it to finish, I was able to create about 120 mdx files from the title and content object array mentioned earlier.
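The script itself is not reproduced here, but the filename step could look roughly like the following sketch. Everything in it is an assumption rather than the author's code: `slugify`, `toFilename`, and the `DEEPL_AUTH_KEY` environment variable are illustrative names, and the translator is passed in as a plain function so the example runs without an API key.

```javascript
// Convert an English title into a URL-safe slug, e.g. for an .mdx filename.
function slugify(title) {
  return title
    .normalize('NFKD')                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens
}

// translate is any async function (text) => English text.
async function toFilename(title, translate) {
  const english = await translate(title);
  const slug = slugify(english) || 'untitled'; // fallback if nothing usable remains
  return `${slug}.mdx`;
}

// Stub translator for demonstration only. With deepl-node it could be wired as:
//   const deepl = require('deepl-node');
//   const translator = new deepl.Translator(process.env.DEEPL_AUTH_KEY); // assumed variable name
//   const translate = async (text) => (await translator.translateText(text, null, 'en-US')).text;
const fakeTranslate = async (text) => text;

toFilename('How to Draw: Brushes & Layers!', fakeTranslate).then(console.log);
// prints "how-to-draw-brushes-layers.mdx"
```

Keeping the slug logic separate from the translation call means the filename rules can be checked offline, and the API is only hit once per title.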

  • Want to format ChatGPT-generated text (text, CSV, spreadsheets, JSON, HTML, Markdown, etc.)

For example, ChatGPT’s output varies from conversation to conversation, so it doesn’t always start with a “##” title.

We can handle data processing of such non-standard, informal output with JavaScript, Node.js, and Python.
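As a small illustration of that kind of processing, the sketch below accepts a heading of any level and falls back to the first line when there is no heading at all. The function name `splitTitleAndContent` and the fallback rule are assumptions chosen for this example, not a fixed recipe.

```javascript
// Split one generated text into { title, content }.
// Accepts "#" to "######" headings; without a heading, the first line becomes the title.
function splitTitleAndContent(text) {
  const trimmed = text.trim();
  const heading = trimmed.match(/^#{1,6} +(.+)\n+([\s\S]+)/);
  if (heading) {
    return { title: heading[1].trim(), content: heading[2].trim() };
  }
  const [firstLine, ...rest] = trimmed.split('\n');
  return {
    title: firstLine.replace(/^#+\s*/, '').trim(), // strip stray "#" marks, if any
    content: rest.join('\n').trim(),
  };
}

console.log(splitTitleAndContent('## Brush basics\n\nStart with a round brush.'));
console.log(splitTitleAndContent('Brush basics\n\nStart with a round brush.'));
```

Both calls produce the same title and content, which is the point: downstream code no longer depends on how a particular response happened to be formatted.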

  • Of course, in addition to formatting ChatGPT-generated text, we also accept application development using ChatGPT.

  • As in this article, feel free to consult about work efficiency and automation for ChatGPT and AI tools (Phind, Perplexity, Claude, etc.).

The fee is a flat rate of 3000 yen/30 minutes. (We may not be able to respond during busy periods, so please make advance reservations or consultations.)
