
Getting Transcripts for an Individual Episode

Note: Getting transcripts requires a Plus or Pro plan.

Podchaser generates or collects transcripts for new episodes of the top 5,000 podcasts. Podchaser also collects some transcripts discovered in podcasts' RSS feeds. You can retrieve these transcripts through our API. They are available on the Episode object, which is returned in a variety of places, including within a Podcast object and in results from the episodes query.

If you haven't already, get your authorization token by following our "Your First Podchaser Query" guide.

Available Transcript Formats

raw_JSON

This is a JSON file containing an array of word-level objects. Each object holds up to a few words, along with a speaker label and a pair of start and end timestamps.

Example content (shortened here for brevity):

[
  {"speaker": "1 - voice", "timestamp": ["00:00:01.480", "00:00:01.840"], "utterance": "Support"},
  {"speaker": "1 - voice", "timestamp": ["00:00:01.840", "00:00:02.080"], "utterance": "for"},
  {"speaker": "1 - voice", "timestamp": ["00:00:02.080", "00:00:02.440"], "utterance": "pivot"},
  {"speaker": "1 - voice", "timestamp": ["00:00:02.440", "00:00:02.720"], "utterance": "comes"},
  {"speaker": "1 - voice", "timestamp": ["00:00:02.720", "00:00:03.000"], "utterance": "from"},
  {"speaker": "1 - voice", "timestamp": ["00:00:03.000", "00:00:03.500"], "utterance": "NerdWallet."},
  ... the rest of the transcript ...
]
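Because each raw_JSON entry holds only a word or two, you will usually want to merge entries before displaying them. The following Python sketch assumes the timestamps always use the HH:MM:SS.mmm form shown in the example; timestamp_to_seconds and group_by_speaker are hypothetical helper names, not part of the Podchaser API. It merges consecutive entries from the same speaker into longer segments with start and end times in seconds.

```python
import json


def timestamp_to_seconds(ts: str) -> float:
    """Convert an "HH:MM:SS.mmm" timestamp string to seconds (format assumed from the example)."""
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def group_by_speaker(words: list) -> list:
    """Merge consecutive word-level entries from the same speaker into segments (hypothetical helper)."""
    segments = []
    for word in words:
        start = timestamp_to_seconds(word["timestamp"][0])
        end = timestamp_to_seconds(word["timestamp"][1])
        if segments and segments[-1]["speaker"] == word["speaker"]:
            # Same speaker as the previous entry: extend the current segment.
            segments[-1]["end"] = end
            segments[-1]["text"] += " " + word["utterance"]
        else:
            # New speaker (or first entry): start a new segment.
            segments.append(
                {"speaker": word["speaker"], "start": start, "end": end, "text": word["utterance"]}
            )
    return segments


# The first three entries of the example above.
raw = json.loads('''[
  {"speaker": "1 - voice", "timestamp": ["00:00:01.480", "00:00:01.840"], "utterance": "Support"},
  {"speaker": "1 - voice", "timestamp": ["00:00:01.840", "00:00:02.080"], "utterance": "for"},
  {"speaker": "1 - voice", "timestamp": ["00:00:02.080", "00:00:02.440"], "utterance": "pivot"}
]''')

for seg in group_by_speaker(raw):
    print(f'{seg["start"]:.2f}-{seg["end"]:.2f} {seg["speaker"]}: {seg["text"]}')
# prints: 1.48-2.44 1 - voice: Support for pivot
```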

beautified_JSON

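This is a JSON file containing a single object with two keys: metadata, which identifies the episode and podcast (IDs, Podchaser episode URL, titles, and air date), and utterances, an array of objects that each cover a short span of speech (a few seconds).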

Example content (shortened here for brevity):

{
  "metadata": {
    "podchaser_episode_id": 163052393,
    "podchaser_podcast_id": 731600,
    "podchaser_episode_url": "https://www.podchaser.com/podcasts/731600/episodes/163052393",
    "podcast_title": "Pivot",
    "episode_title": "AI Blunders, The GOP Cries Censorship, and Adam Neumann\u2019s Comeback",
    "episode_air_date": "2023-02-10 11:00:00"
  },
  "utterances": [
    {
      "speaker": "1 - voice",
      "timestamp": ["00:00:01.470", "00:00:04.720"],
      "utterance": "Support for pivot comes from NerdWallet. If you've ever"
    },
    {
      "speaker": "1 - voice",
      "timestamp": ["00:00:04.720", "00:00:07.760"],
      "utterance": "tried searching for a great cashback credit card, you know"
    },
    {
      "speaker": "1 - voice",
      "timestamp": ["00:00:07.760", "00:00:10.990"],
      "utterance": "how paralyzing that hunt can be. NerdWallet can help"
    }
    ... the rest of the transcript ...
  ]
}
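A beautified_JSON file already groups words into short segments, so rendering it as readable text is mostly a matter of walking the utterances array. This Python sketch assumes the file has been downloaded and parsed into a dictionary with the metadata and utterances keys shown above; beautified_to_text is a hypothetical helper.

```python
import json


def beautified_to_text(transcript: dict) -> str:
    """Render a parsed beautified_JSON transcript as plain text (hypothetical helper)."""
    meta = transcript["metadata"]
    # Header line built from the episode metadata.
    lines = [f'{meta["podcast_title"]}: {meta["episode_title"]} ({meta["episode_air_date"]})']
    for item in transcript["utterances"]:
        start, _end = item["timestamp"]
        lines.append(f'[{start}] {item["speaker"]}: {item["utterance"]}')
    return "\n".join(lines)


# A trimmed copy of the example above (raw string so json.loads decodes the \u2019 escape).
sample = json.loads(r'''{
  "metadata": {
    "podchaser_episode_id": 163052393,
    "podchaser_podcast_id": 731600,
    "podcast_title": "Pivot",
    "episode_title": "AI Blunders, The GOP Cries Censorship, and Adam Neumann\u2019s Comeback",
    "episode_air_date": "2023-02-10 11:00:00"
  },
  "utterances": [
    {"speaker": "1 - voice", "timestamp": ["00:00:01.470", "00:00:04.720"],
     "utterance": "Support for pivot comes from NerdWallet. If you've ever"}
  ]
}''')

print(beautified_to_text(sample))
```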

Important Note: The transcripts field on the Episode object is an unordered list. As we add new formats and types, a given format is not guaranteed to stay at the same position in the list, so check each transcript's transcriptType field to find the one you want. Also, because the transcript data available from podcasters varies, not every episode that has transcripts will have every format.
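Because the list is unordered and some formats may be missing, looking a transcript up by transcriptType is safer than indexing by position. In this minimal Python sketch, find_transcript is a hypothetical helper, and the example.com URLs are placeholders standing in for real transcript URLs.

```python
from typing import Optional


def find_transcript(transcripts: Optional[list], transcript_type: str) -> Optional[dict]:
    """Return the first transcript whose transcriptType matches, or None if that format is absent."""
    # Treat a missing or null transcripts field as an empty list.
    for transcript in transcripts or []:
        if transcript.get("transcriptType") == transcript_type:
            return transcript
    return None


# Placeholder data in the same shape as the API's transcripts list.
transcripts = [
    {"url": "https://example.com/beautified.json", "source": "podchaser", "transcriptType": "beautified_JSON"},
    {"url": "https://example.com/raw.json", "source": "podchaser", "transcriptType": "raw_JSON"},
]

raw = find_transcript(transcripts, "raw_JSON")
print(raw["url"] if raw else "raw_JSON is not available for this episode")
# prints: https://example.com/raw.json
```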

Retrieving Episode Transcripts Within a Podcast Query

To retrieve an individual podcast, use our podcast query. It accepts an identifier, which can be one of the following:

  1. Apple Podcasts ID
  2. Spotify ID
  3. RSS feed URL
  4. Podchaser ID

You can review the documentation for how to structure the identifier and which type values are accepted.

This query can return many fields, so we encourage you to review the Podcast object documentation.

For example, a basic podcast query would look like this:

POST https://api.podchaser.com/graphql

Authorization: Bearer YOURTOKEN
query {
  podcast(identifier: {id: "1073226719", type: APPLE_PODCASTS}) {
    title
    description
  }
}
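If you are calling the API from code rather than a GraphQL client, the request above can be sent with Python's standard library. This sketch assumes the endpoint accepts the common GraphQL-over-HTTP convention of a JSON body with a query key and a Content-Type: application/json header; build_request and run_query are hypothetical helper names, and YOURTOKEN stands for your own authorization token.

```python
import json
import urllib.request

API_URL = "https://api.podchaser.com/graphql"


def build_request(query: str, token: str) -> urllib.request.Request:
    """Build a POST request carrying a GraphQL query as a JSON body (assumed convention)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(query: str, token: str) -> dict:
    """Send the query and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx responses, such as an invalid token.
    """
    with urllib.request.urlopen(build_request(query, token)) as response:
        return json.loads(response.read().decode("utf-8"))


PODCAST_QUERY = """
query {
  podcast(identifier: {id: "1073226719", type: APPLE_PODCASTS}) {
    title
    description
  }
}
"""

# Uncomment and supply a real token to send the request:
# result = run_query(PODCAST_QUERY, "YOURTOKEN")
```

With a valid token, run_query should return the decoded response as a dictionary; the call is left commented out so the sketch does not reach the network when run as-is.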

One of the available fields is episodes, which returns the podcast's episodes. Each returned Episode object contains a variety of fields, including transcripts, which returns a list of the transcript formats we have available for that episode.

You can learn more about these objects in our Episode object documentation and our Transcript object documentation.

A sample query to get a podcast's five most recent episodes along with their transcripts (if we have any) looks like this:

POST https://api.podchaser.com/graphql

Authorization: Bearer YOURTOKEN
query {
  podcast(identifier: {id: "1073226719", type: APPLE_PODCASTS}) {
    title
    description
    episodes(first: 5, sort: {sortBy: AIR_DATE, direction: DESCENDING}) {
      data {
        title
        description
        transcripts {
          url
          source
          transcriptType
        }
      }
    }
  }
}
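To reuse this query for other podcasts or a different number of episodes, you can build the query text in code. This Python sketch only covers Apple Podcasts IDs (the one identifier type shown here), and episodes_with_transcripts_query is a hypothetical helper. It uses json.dumps to quote the ID, which yields a valid GraphQL string literal for ordinary IDs.

```python
import json


def episodes_with_transcripts_query(apple_podcasts_id: str, count: int = 5) -> str:
    """Build a podcast query returning the most recent episodes with their transcripts (hypothetical helper)."""
    # json.dumps adds the surrounding quotes and escapes any quote or backslash characters.
    podcast_id = json.dumps(apple_podcasts_id)
    return f"""
query {{
  podcast(identifier: {{id: {podcast_id}, type: APPLE_PODCASTS}}) {{
    title
    description
    episodes(first: {int(count)}, sort: {{sortBy: AIR_DATE, direction: DESCENDING}}) {{
      data {{
        title
        description
        transcripts {{
          url
          source
          transcriptType
        }}
      }}
    }}
  }}
}}
"""


print(episodes_with_transcripts_query("1073226719", count=3))
```

Send the returned string as the query in the POST body, with the same Authorization header used for the other queries on this page.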

The sample query returns a response like this (shortened here for brevity):

{
  "data": {
    "podcast": {
      "title": "Pivot",
      "description": "Every Tuesday and Friday, tech journalist Kara Swisher and NYU Professor Scott Galloway offer sharp, unfiltered insights into the biggest stories in tech, business, and politics. They make bold predictions, pick winners and losers, and bicker and banter like no one else. After all, with great power comes great scrutiny. From New York Magazine and the Vox Media Podcast Network.",
      "episodes": {
        "data": [
          {
            "title": "AI Blunders, The GOP Cries Censorship, and Adam Neumann’s Comeback",
            "transcripts": [
              {
                "url": "https://transcripts-cleaned.s3.amazonaws.com/163052393/163052393-transcript.json?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIPDEVVMPSLJMI3KA%2F20230215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230215T165257Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Signature=ec6277d560cd933023e0077cb5a8612065b30f680e6b5e3a9125a2c22c12bb66",
                "source": "podchaser",
                "transcriptType": "raw_JSON"
              },
              {
                "url": "https://transcripts-cleaned.s3.amazonaws.com/163052393/beautified_JSON/transcript.json?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIPDEVVMPSLJMI3KA%2F20230316%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230316T215136Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Signature=6b9672d6ccf6ca59d92ea8b5599b3521d8f6bcc921bbe1f994c7c2f3531343b4",
                "source": "podchaser",
                "transcriptType": "beautified_JSON"
              }
            ]
          },
          {
            "title": "Balloon-Gate, The AI Arms Race, and Guest Jerry Saltz",
            "transcripts": [
              {
                "url": "https://transcripts-cleaned.s3.amazonaws.com/162758832/162758832-transcript.json?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIPDEVVMPSLJMI3KA%2F20230215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230215T165257Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Signature=244a5e593992ee7c32b62f5648ab4eea774487a1e69297a15ee03301d53a9fad",
                "source": "podchaser",
                "transcriptType": "raw_JSON"
              },
              {
                "url": "https://transcripts-cleaned.s3.amazonaws.com/163052393/beautified_JSON/transcript.json?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIPDEVVMPSLJMI3KA%2F20230316%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230316T215136Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Signature=6b9672d6ccf6ca59d92ea8b5599b3521d8f6bcc921bbe1f994c7c2f3531343b4",
                "source": "podchaser",
                "transcriptType": "beautified_JSON"
              }
            ]
          },
          {
            "title": "Sick Day! Tinder Changed the Game - from Land of the Giants: Dating Games",
            "transcripts": [
              {
                "url": "https://transcripts-cleaned.s3.amazonaws.com/162184436/162184436-transcript.json?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIPDEVVMPSLJMI3KA%2F20230215%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230215T165257Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Signature=38a50585acb6bda173b9481a638d99d0e09e0182fcd2401044fe64a28529c6ab",
                "source": "podchaser",
                "transcriptType": "raw_JSON"
              },
              {
                "url": "https://transcripts-cleaned.s3.amazonaws.com/163052393/beautified_JSON/transcript.json?X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIPDEVVMPSLJMI3KA%2F20230316%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230316T215136Z&X-Amz-SignedHeaders=host&X-Amz-Expires=600&X-Amz-Signature=6b9672d6ccf6ca59d92ea8b5599b3521d8f6bcc921bbe1f994c7c2f3531343b4",
                "source": "podchaser",
                "transcriptType": "beautified_JSON"
              }
            ]
          }
          ...other episodes...
        ]
      }
    }
  }
}
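The transcript URLs in this response appear to be Amazon S3 presigned URLs. Under AWS Signature Version 4, X-Amz-Expires is the number of seconds a URL stays valid after its X-Amz-Date timestamp, so X-Amz-Expires=600 suggests each link works for roughly ten minutes; download transcripts promptly, or re-run the query to get fresh links. This Python sketch, built on the hypothetical helpers transcript_urls and presigned_url_expiry, collects the URL of one format per episode (skipping episodes where that format is missing) and computes when each link expires. The sample response is a trimmed stand-in with a shortened URL.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def transcript_urls(response: dict, transcript_type: str) -> list:
    """Return (episode title, URL) pairs for episodes that have the requested format."""
    episodes = response["data"]["podcast"]["episodes"]["data"]
    pairs = []
    for episode in episodes:
        # An episode may have no transcripts, or only some formats.
        for transcript in episode.get("transcripts") or []:
            if transcript.get("transcriptType") == transcript_type:
                pairs.append((episode["title"], transcript["url"]))
                break
    return pairs


def presigned_url_expiry(url: str) -> datetime:
    """Compute when a SigV4 presigned URL expires: X-Amz-Date plus X-Amz-Expires seconds."""
    params = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


# Trimmed stand-in for the response above; the URL keeps only the two parameters used here.
sample_response = {
    "data": {"podcast": {"title": "Pivot", "episodes": {"data": [
        {"title": "Episode A", "transcripts": [{
            "url": "https://transcripts-cleaned.s3.amazonaws.com/163052393/163052393-transcript.json"
                   "?X-Amz-Date=20230215T165257Z&X-Amz-Expires=600",
            "source": "podchaser",
            "transcriptType": "raw_JSON",
        }]},
        {"title": "Episode B", "transcripts": []},
    ]}}}
}

for title, url in transcript_urls(sample_response, "raw_JSON"):
    expires = presigned_url_expiry(url)
    still_valid = datetime.now(timezone.utc) < expires
    print(title, "expires", expires.isoformat(), "(still valid)" if still_valid else "(expired)")
# prints: Episode A expires 2023-02-15T17:02:57+00:00 (expired)
```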