Introduction to Transforming the Bible
In this first post of a series on our Bible NLP analysis, we will transform the Bible into a dataframe. Part 1 focuses on ingesting the text file into a broadly usable Pandas dataframe.
I used a .txt file of the King James Bible from sacred-texts.com. Its formatting made it much easier to ingest the text in a logical way than any other source I have found.
Python & Bible NLP
My language of choice for NLP is Python, which has many excellent libraries for a wide range of programming tasks, NLP included. Pandas is one such library in the data science realm. Links to many of the resources I use are on the Better Biblos resources page. Among its many other functions, Pandas makes it easy to ingest the .txt file:
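# Jupyter/IPython shell escape; in a terminal, run: pip install pandas
!pip install pandas
import pandas as pd
# header=None: the file has no header row, so the columns are numbered 0, 1, ...
# sep='~': assumes each line ends with a '~' verse marker; splitting on it keeps the
# pipe-delimited record whole in column 0 instead of breaking verses at commas
bible = pd.read_csv(r'...\kjvdat.txt', header=None, sep='~')
bible.head()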
Inspecting the object with bible.head(), we observe the following tabular structure:
0 | 1 |
Gen|1|1| In the beginning God created the heav… | NaN |
Gen|1|2| And the earth was without form, and v… | NaN |
Gen|1|3| And God said, Let there be light: and… | NaN |
Gen|1|4| And God saw the light, that it was go… | NaN |
Gen|1|5| And God called the light Day, and the… | NaN |
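The NaN column in this output deserves a brief explanation. The sketch below reproduces the read on an in-memory string instead of the downloaded file. It assumes each line of kjvdat.txt follows the pattern book|chapter|verse| text~, with a trailing '~' marking the end of the verse. The SAMPLE string is hypothetical test data typed in from Genesis 1:1 to 1:3 in that assumed layout.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed kjvdat.txt layout: book|chapter|verse| text~
SAMPLE = (
    "Gen|1|1| In the beginning God created the heaven and the earth.~\n"
    "Gen|1|2| And the earth was without form, and void; and darkness was upon "
    "the face of the deep. And the Spirit of God moved upon the face of the waters.~\n"
    "Gen|1|3| And God said, Let there be light: and there was light.~\n"
)

# header=None: no header row, so pandas numbers the columns 0, 1, ...
# sep="~": split only on the assumed end-of-verse marker, so commas inside
# verses do not break rows; the empty field after "~" becomes column 1 (NaN)
bible = pd.read_csv(io.StringIO(SAMPLE), header=None, sep="~")
print(bible)
```

Under that assumption, splitting on '~' keeps each pipe-delimited record whole in column 0 and leaves the empty remainder as an all-NaN column 1, which is consistent with the table above. Without header=None, pandas would treat the first verse as a row of column names.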
Not too bad, but we can exploit the Bible's logical division into books, chapters, and verses to create a more accessible object. Specifically, we split each line on the pipe (“|”) delimiter conveniently included in the file, and then explicitly name the columns that .str.split generates:
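# Split each record on '|' into separate columns (n=-1 means no limit on the number of splits)
bible = bible[0].str.split('|', n=-1, expand=True)
# Replace the automatically numbered columns 0 to 3 with descriptive names
bible.columns = ['book', 'chapter', 'verse_number', 'verse']
bible.head()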
book | chapter | verse_number | verse |
Gen | 1 | 1 | In the beginning God created the heaven and t… |
Gen | 1 | 2 | And the earth was without form, and void; and… |
Gen | 1 | 3 | And God said, Let there be light: and there w… |
Gen | 1 | 4 | And God saw the light, that it was good: and … |
Gen | 1 | 5 | And God called the light Day, and the darknes… |
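After the split, every column still holds text, and each verse keeps the space that followed the last pipe. A small cleanup step, sketched below as an optional addition rather than part of the workflow above, strips that whitespace and converts chapter and verse_number to integers so that numeric comparisons and sorting behave as expected. The rows are a hypothetical sample shaped like the split output.

```python
import pandas as pd

# Hypothetical rows shaped like the output of bible[0].str.split('|', expand=True)
bible = pd.DataFrame(
    [
        ["Gen", "1", "1", " In the beginning God created the heaven and the earth."],
        ["Gen", "1", "3", " And God said, Let there be light: and there was light."],
        ["Gen", "2", "1", " Thus the heavens and the earth were finished, and all the host of them."],
    ],
    columns=["book", "chapter", "verse_number", "verse"],
)

# Drop the leading space that followed the final pipe in each raw line
bible["verse"] = bible["verse"].str.strip()

# Chapter and verse numbers arrive as strings ("10" sorts before "2");
# cast them to integers so comparisons and sorting are numeric
bible = bible.astype({"chapter": int, "verse_number": int})

print(bible.dtypes)
```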
We can now easily query subsets of the text by book, chapter, and verse.
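A few common ways to run such queries are sketched below on a small hypothetical sample. The sketch assumes chapter and verse_number have already been converted to integers (with the raw string columns from .str.split, compare against '1' instead of 1), and the "Exo" book code is an assumed abbreviation used only for illustration.

```python
import pandas as pd

# Hypothetical sample with integer chapter and verse numbers
bible = pd.DataFrame(
    {
        "book": ["Gen", "Gen", "Gen", "Exo"],
        "chapter": [1, 1, 2, 1],
        "verse_number": [1, 3, 1, 1],
        "verse": [
            "In the beginning God created the heaven and the earth.",
            "And God said, Let there be light: and there was light.",
            "Thus the heavens and the earth were finished, and all the host of them.",
            "Now these are the names of the children of Israel, which came into Egypt; "
            "every man and his household came with Jacob.",
        ],
    }
)

# Every verse in Genesis chapter 1, using boolean masks combined with &
gen_1 = bible[(bible["book"] == "Gen") & (bible["chapter"] == 1)]

# A single verse, using DataFrame.query with a string expression
gen_1_3 = bible.query("book == 'Gen' and chapter == 1 and verse_number == 3")

# Index by (book, chapter, verse) for label-based lookups; sorting the
# MultiIndex keeps .loc lookups efficient
indexed = bible.set_index(["book", "chapter", "verse_number"]).sort_index()
print(indexed.loc[("Gen", 2, 1), "verse"])
```

Boolean masks are the most explicit option, query() reads more naturally for ad hoc filters, and the MultiIndex suits repeated lookups of specific references.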