Bible NLP

What does Bible NLP mean?

Bible NLP refers to the application of natural language processing (NLP) techniques to the Bible as a dataset. NLP is, broadly, the interdisciplinary field spanning linguistics and computer science that is concerned with the interaction between human language and computer systems. The Bible is a monumentally influential ancient text that serves as the foundation for several major world religions.

Using techniques from natural language processing, we can produce visualizations of the Bible. For this analysis we examine the King James Version (KJV). These techniques are rewarding in other domains too, but they become interesting quickly when applied to texts of great meaning, and the Bible is chief among texts valued for wisdom, lessons, and historical documentation. On this page we document several adventures in Bible NLP visualization.

Overview of Bible NLP Analysis

As technical background, we imported a newline-separated file of the King James Bible and performed text preprocessing, with the string content of each verse on a separate row. To start, we use the word2vec algorithm to generate a semantic space, though many other options exist. One can think of a semantic space as a multidimensional representation of the relationships between units of text, based on how each unit co-occurs with others.
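The co-occurrence idea behind a semantic space can be illustrated without word2vec itself. The sketch below is a simplified stand-in, not the model used for this analysis: it tokenizes a few verses (one per line, as in the preprocessing described above), counts which words appear near each other within a small window, and compares words by the cosine similarity of their count vectors. The sample verses, the window size, and the function names `tokenize`, `cooccurrence_vectors`, and `cosine` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Three KJV verses, one per line, standing in for the full newline-separated file.
SAMPLE_TEXT = """In the beginning God created the heaven and the earth.
And God said, Let there be light: and there was light.
And God saw the light, that it was good: and God divided the light from the darkness."""


def tokenize(verse):
    """Lowercase a verse and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", verse.lower())


def cooccurrence_vectors(verses, window=2):
    """Count, for each word, the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for verse in verses:
        tokens = tokenize(verse)
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


verses = SAMPLE_TEXT.splitlines()
vectors = cooccurrence_vectors(verses)
print(cosine(vectors["heaven"], vectors["earth"]))
```

Word2vec learns dense vectors by prediction rather than by raw counting, but the intuition is the same: words that keep similar company end up close together.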

Exploratory Data Analysis (EDA)

Usually the first step of any data project is to better understand the nature of the data. Before transforming the data further, we conducted a number of experiments to that end.
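The specific experiments are not listed here, so the following is only a hedged illustration of what this kind of exploration might look like: a few basic statistics such as verse count, token count, vocabulary size, and the most frequent words. The sample verses and the helper name `summarize` are assumptions.

```python
import re
from collections import Counter

# A few KJV verses standing in for the full one-verse-per-row dataset.
verses = [
    "In the beginning God created the heaven and the earth.",
    "And God said, Let there be light: and there was light.",
    "And God saw the light, that it was good: and God divided the light from the darkness.",
]


def summarize(verses, top_n=3):
    """Return simple corpus statistics for a list of verse strings."""
    tokenized = [re.findall(r"[a-z]+", v.lower()) for v in verses]
    counts = Counter(token for tokens in tokenized for token in tokens)
    total_tokens = sum(counts.values())
    return {
        "verses": len(verses),
        "tokens": total_tokens,
        "vocabulary": len(counts),
        "mean_verse_length": total_tokens / len(verses),
        "most_common": counts.most_common(top_n),
    }


stats = summarize(verses)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Even on a tiny sample, function words such as "the" and "and" dominate the counts, which is one reason preprocessing choices matter before building a semantic space.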

Bible NLP visualization with characters

The image below represents an analysis of character names from the Bible. More specifically, it is a two-dimensional compression of the multidimensional semantic space of the KJV English Bible. Distances between points represent semantic difference: the closer two names appear, the more similar the contexts in which they occur.
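The method used to compress the semantic space into two dimensions is not stated here; PCA, t-SNE, and UMAP are all common choices. As one possibility, the sketch below assumes principal component analysis (PCA) and implements it with power iteration using only the standard library. The vectors are made-up toy values, not embeddings from the actual model, and the function names are hypothetical. In practice a library such as scikit-learn would normally be used for this step.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[dot(ci, cj) / (n - 1) for cj in cols] for ci in cols]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Approximate the dominant eigenvector by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # matrix has (numerically) no remaining variance
            break
        v = [x / norm for x in w]
    return v


def project_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    size = len(cov)
    first = top_eigenvector(cov, seed=0)
    eigenvalue = dot(first, [dot(row, first) for row in cov])
    # Deflate: remove the first component so power iteration finds the second.
    deflated = [
        [cov[i][j] - eigenvalue * first[i] * first[j] for j in range(size)]
        for i in range(size)
    ]
    second = top_eigenvector(deflated, seed=1)
    return [(dot(row, first), dot(row, second)) for row in centered]


# Made-up 4-dimensional vectors for illustration only.
toy_vectors = {
    "word_a": [2.0, 0.5, 0.1, 0.0],
    "word_b": [1.8, 0.7, 0.0, 0.2],
    "word_c": [-1.5, 1.0, 0.3, 0.1],
    "word_d": [-2.0, -1.2, 0.2, 0.0],
    "word_e": [0.1, -1.0, -0.4, 0.3],
}
points = project_2d(list(toy_vectors.values()))
for word, (x, y) in zip(toy_vectors, points):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```

Any projection to two dimensions discards some information, so distances in the plot are an approximation of distances in the full semantic space.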

One immediately interesting observation is the closeness of Jonah, Noah, and Adam, which brings to mind their shared theme of nature as a conduit of God’s power and mission on Earth.

Arguably the closest point to Jesus in this visual is David, which makes sense given how frequently the Davidic lineage is mentioned and how important it is in establishing Jesus’ historical and prophetic role as Messiah.

[Figure: Bible NLP visualization of character names]

Interestingly, these observations held true for a second run of the model:

[Figure: Bible NLP visualization of character names, second model run]

Turning our analysis to concepts

We can apply the same model to conceptual terms and find semantic meaning in the observations below.

Prophet, Israel, and Jerusalem form a semantic cluster, reflecting the many events involving the nation of Israel that take place in the Old Testament.

Sign, heal, and Messiah appear together. Notably, signs are understood to point to God, and it is interesting to observe the relative closeness of sign and prophet.

Bread, blood, and wine form a distinct cluster, with lamb as a tangential member given its notable role as a vehicle for the significance of the other three.

Interestingly, heaven occupies a space of its own, with sign arguably the closest semantic concept, pointing up to it.

[Figure: Bible NLP visualization of concept terms]

We can broaden further to collections of content

[Figure: Bible NLP visualization of words from Genesis chapter 1]

The above visualization represents the unique set of words from Genesis chapter 1 of the King James Bible.

Several notable relationships can be spotted. God is connected to the rest of the story in the plot through nearby words such as bless, spirit, life, given, sign, and man. Heaven and Earth, though relatively close, stand distinct.

The above output is now captured in a function that allows for easier investigation into the semantic closeness of words in the King James Bible.
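The function itself is not shown on this page. As a rough idea of what such a helper could look like, the sketch below defines a hypothetical `closest_words` that ranks words by cosine similarity to a query word. The vectors are made-up toy values chosen for illustration, not output from the actual model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_words(query, vectors, top_n=3):
    """Rank the other words in `vectors` by cosine similarity to `query`."""
    if query not in vectors:
        raise KeyError(f"{query!r} is not in the vocabulary")
    scores = [
        (word, cosine_similarity(vectors[query], vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Made-up 3-dimensional vectors for illustration only.
toy_vectors = {
    "heaven": [1.0, 0.0, 0.0],
    "earth": [0.8, 0.6, 0.0],
    "bread": [0.0, 1.0, 0.0],
    "wine": [0.0, 0.8, 0.6],
}
for word, score in closest_words("heaven", toy_vectors):
    print(f"{word}: {score:.2f}")
```

With a trained model, the dictionary of toy vectors would be replaced by the learned word vectors, and the same ranking logic would apply.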

Additional Detail on Bible NLP

Transforming the Bible into a dataframe

Generating a Bible Semantic Space