Transforming the Bible into a Dataframe

Introduction to Transforming the Bible

In this first post of a series on our Bible NLP analysis, we will transform the Bible into a dataframe. Part 1 focuses on ingesting the file into a broadly usable Pandas dataframe.

I used a .txt file of the King James Bible from the website sacred-texts.com. Its formatting made it much easier to ingest the text in a logical way than other sources I have found.

Python & Bible NLP

My language of choice for NLP is Python. Python has many excellent modules for a wide range of tasks, NLP included. Pandas is one such module in the data science realm. Links to many of the resources I use are on the Better Biblos resources page. Among many other functions, Pandas allows for easy ingestion of the .txt file using:

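!pip install pandas  # notebook syntax; in a terminal, run: pip install pandas
import pandas as pd
# header=None keeps the first verse as data and labels the columns 0, 1, ...
# sep='~' is an assumption: each line of the file appears to end with a tilde,
# which would explain the empty (NaN) column 1 in the output below.
bible = pd.read_csv(r'...\kjvdat.txt', sep='~', header=None)
bible.head()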

Inspecting the object with bible.head(), we see the following tabular structure:

0                                                 1
Gen|1|1| In the beginning God created the heav…   NaN
Gen|1|2| And the earth was without form, and v…   NaN
Gen|1|3| And God said, Let there be light: and…   NaN
Gen|1|4| And God saw the light, that it was go…   NaN
Gen|1|5| And God called the light Day, and the…   NaN

Not too bad, but we can use the Bible's logical structure of books, chapters, and verses to create a more accessible object. The file conveniently separates these fields with the pipe (“|”) character, so we split each row on it (line 1 below) and then explicitly name the automatically generated columns (line 2):

bible = bible[0].str.split('|', n=-1, expand=True)
bible.columns = ['book','chapter','verse_number','verse']
bible.head()
book  chapter  verse_number  verse
Gen   1        1             In the beginning God created the heaven and t…
Gen   1        2             And the earth was without form, and void; and…
Gen   1        3             And God said, Let there be light: and there w…
Gen   1        4             And God saw the light, that it was good: and …
Gen   1        5             And God called the light Day, and the darknes…
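
One detail worth noting: str.split returns every piece as a string, so chapter and verse_number are text rather than integers, and the verse text keeps the space that follows the last pipe. Below is a minimal sketch of one way to tidy this up. It uses a small hand-built frame in place of the real file; the name sample and its two rows are illustrative assumptions, not part of the original code.

```python
import pandas as pd

# Hypothetical stand-in for the raw single-column frame read from the file.
sample = pd.DataFrame({0: [
    'Gen|1|1| In the beginning God created the heaven and the earth.',
    'Gen|1|2| And the earth was without form, and void;',
]})

# Same split-and-rename step as in the post.
bible = sample[0].str.split('|', n=-1, expand=True)
bible.columns = ['book', 'chapter', 'verse_number', 'verse']

# str.split yields strings, so convert the numeric columns explicitly.
bible['chapter'] = bible['chapter'].astype(int)
bible['verse_number'] = bible['verse_number'].astype(int)

# Remove the leading space left over after the final pipe.
bible['verse'] = bible['verse'].str.strip()

print(bible.dtypes)
```

With integer columns, comparisons such as bible['chapter'] == 1 and numeric sorting behave as expected.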

We can now very easily query subsets of data by books, chapters, and verses.
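
As a rough illustration of such queries, the sketch below builds a tiny dataframe with the same four columns and filters it with boolean masks. The rows are typed in by hand for the example, and the 'Exo' abbreviation is an assumption about how the file labels Exodus. It also assumes chapter and verse_number hold integers; if they are still strings, compare against '1' rather than 1.

```python
import pandas as pd

# Hand-typed sample with the same columns as the bible dataframe (illustrative only).
bible = pd.DataFrame({
    'book': ['Gen', 'Gen', 'Gen', 'Exo'],  # 'Exo' is an assumed abbreviation
    'chapter': [1, 1, 2, 1],
    'verse_number': [1, 2, 1, 1],
    'verse': [
        'In the beginning God created the heaven and the earth.',
        'And the earth was without form, and void; and darkness was upon the face of the deep.',
        'Thus the heavens and the earth were finished, and all the host of them.',
        'Now these are the names of the children of Israel, which came into Egypt;',
    ],
})

# All verses in one book.
genesis = bible[bible['book'] == 'Gen']

# All verses in one chapter of that book.
gen_1 = bible[(bible['book'] == 'Gen') & (bible['chapter'] == 1)]

# A single verse, returned as a plain string.
is_gen_1_1 = (
    (bible['book'] == 'Gen')
    & (bible['chapter'] == 1)
    & (bible['verse_number'] == 1)
)
gen_1_1 = bible.loc[is_gen_1_1, 'verse'].iloc[0]

print(len(genesis), len(gen_1))
print(gen_1_1)
```

Each condition is wrapped in parentheses because & binds more tightly than == in Python.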

