Transforming the Bible into a Dataframe

Introduction to Transforming the Bible

In this first post of a series on Bible NLP analysis, we will transform the Bible into a dataframe. Part 1 focuses on ingesting the text file into a broadly usable Pandas dataframe.

I used a .txt file of the King James Bible from the website sacred-texts.com. The file's formatting made it much easier to ingest the text in a logical way than other sources I have found.

Python & Bible NLP

My language of choice for NLP is Python. Python has many excellent modules for a wide range of programming tasks, NLP included. Pandas is one such module in the data science realm. Links to many of the resources I use can be found here on the Better Biblos resources page. Among many other functions, Pandas allows for easy ingestion of the .txt file using:

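# Notebook-only shell command; run once to install pandas
!pip install pandas
import pandas as pd
# header=None keeps the first verse as data and labels the columns 0, 1, ...
# sep='~' is an assumption: each line of the file appears to end with '~',
# which would explain the empty second column (NaN) in the output below
bible = pd.read_csv(r'...\kjvdat.txt', sep='~', header=None)
bible.head()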

Inspecting the object with bible.head(), we observe the following tabular structure:

0                                                  1
Gen|1|1| In the beginning God created the heav…    NaN
Gen|1|2| And the earth was without form, and v…    NaN
Gen|1|3| And God said, Let there be light: and…    NaN
Gen|1|4| And God saw the light, that it was go…    NaN
Gen|1|5| And God called the light Day, and the…    NaN

Not too bad, but we can use the Bible's logical structure of books, chapters, and verses to create a more accessible object. In particular, we will split each row on the pipe (“|”) delimiter conveniently included in the file, and then explicitly name the automatically generated columns produced by the .split call in the first line below:

# Split column 0 on the pipe delimiter; expand=True returns one column per field
bible = bible[0].str.split('|', n=-1, expand=True)
# Replace the default integer labels (0-3) with descriptive names
bible.columns = ['book','chapter','verse_number','verse']
bible.head()

book    chapter    verse_number    verse
Gen     1          1               In the beginning God created the heaven and t…
Gen     1          2               And the earth was without form, and void; and…
Gen     1          3               And God said, Let there be light: and there w…
Gen     1          4               And God saw the light, that it was good: and …
Gen     1          5               And God called the light Day, and the darknes…
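
One detail worth noting: str.split returns strings, so chapter and verse_number hold text rather than integers, and each verse keeps the leading space that follows the last pipe in the raw file. A minimal sketch of tidying this is shown here on a three-row in-memory sample instead of the real file. The names raw and sample and the abbreviated verse texts are illustrative assumptions, not part of the original workflow:

```python
import pandas as pd

# Hypothetical stand-in for the single-column frame produced by read_csv
# (verse texts abbreviated for the example).
raw = pd.DataFrame({0: [
    "Gen|1|1| In the beginning God created the heaven and the earth.",
    "Gen|1|2| And the earth was without form, and void.",
    "Exo|1|1| Now these are the names of the children of Israel.",
]})

# Same split as in the post: one column per pipe-delimited field.
sample = raw[0].str.split('|', n=-1, expand=True)
sample.columns = ['book', 'chapter', 'verse_number', 'verse']

# str.split yields strings, so convert the numeric fields explicitly.
sample['chapter'] = sample['chapter'].astype(int)
sample['verse_number'] = sample['verse_number'].astype(int)

# Drop the leading space left over after the final pipe.
sample['verse'] = sample['verse'].str.strip()

print(sample.dtypes)
```

With integer columns, sorting and range filters (for example, chapters 1 through 3) behave numerically rather than alphabetically.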

We can now very easily query subsets of data by books, chapters, and verses.
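
As a rough illustration of such queries, the sketch below filters by book, chapter, and verse with boolean masks. It runs on a small hand-built frame (bible_sample, a hypothetical miniature with abbreviated verses) and assumes the chapter and verse_number columns are still strings, as they are straight after the split:

```python
import pandas as pd

# Hypothetical miniature of the dataframe built above (verses abbreviated).
bible_sample = pd.DataFrame(
    [
        ['Gen', '1', '1', ' In the beginning God created the heaven and the earth.'],
        ['Gen', '1', '2', ' And the earth was without form, and void.'],
        ['Gen', '2', '1', ' Thus the heavens and the earth were finished.'],
        ['Exo', '1', '1', ' Now these are the names of the children of Israel.'],
    ],
    columns=['book', 'chapter', 'verse_number', 'verse'],
)

# All verses in one book.
genesis = bible_sample[bible_sample['book'] == 'Gen']

# One chapter of one book; chapter values are strings here, hence '1'.
genesis_1 = bible_sample[
    (bible_sample['book'] == 'Gen') & (bible_sample['chapter'] == '1')
]

# A single verse, pulled out as plain text.
gen_1_1 = bible_sample.loc[
    (bible_sample['book'] == 'Gen')
    & (bible_sample['chapter'] == '1')
    & (bible_sample['verse_number'] == '1'),
    'verse',
].iloc[0]

# Number of verses per book.
verses_per_book = bible_sample.groupby('book').size()

print(gen_1_1)
```

If the numeric columns have been converted to integers, the comparisons would use 1 rather than '1'.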
