Elasticsearch throwing mapper parsing exception when indexing JSON array of integers and strings


I am attempting to use Python to pull a JSON array from a file and index it into Elasticsearch. The array looks as follows:
{"name": [["string1", 1, "string2"],["string3", 2, "string4"], ... (variable length) ... ["string n-1", 3, "string n"]]}
Elasticsearch throws a TransportError(400, mapper_parsing_exception, failed to parse) when attempting to index the array. I discovered that it sometimes throws the same error when I feed it an array containing both strings and integers. So, for example, the following will sometimes crash and sometimes succeed:
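import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
index = "test-index"  # placeholder: the original post does not show the index name
test = json.loads('{"test": ["a", 1, "b"]}')  # array mixing strings and an integer
print(test)
es.index(index=index, body=test)  # raises TransportError(400) when the mapping conflicts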
This is the minimal code left after commenting out everything I safely could without breaking the program. I put the JSON in the program instead of having it read from a file. The actual strings I'm indexing are quite long (otherwise I would just post them) and always crash the program. Changing the JSON to "test": ["a"] makes it work. The current setup crashes if it last crashed, and works if it last worked. What is going on? Will some sort of mapping setup fix this? I haven't figured out how to define a mapping for an array of variable length. I'd prefer to take advantage of the schema-less input, but I'll take whatever works.
It is possible you are running into type conflicts with your mapping. Since you have expressed a desire to stay "schema-less", I am assuming you have not explicitly provided a mapping for your index. That works fine, but recognize that the first document containing a given field determines that field's type in the index mapping. In every document you index afterwards, fields with the same name must conform to the type assigned at that point.
Elasticsearch has no issues with arrays of values. In fact, under the hood it treats every field as an array of values. What is concerning is the example array you chose, which mixes string and numeric types. Each value in your array gets mapped to the field named "test", and that field may only have one type. If the first value of the first document Elasticsearch processes is numeric, it will likely map that field as a long. Any later document containing a string that does not parse as a number will then cause an exception. Conversely, if the first value is a string, the field is mapped as a string and numeric values are generally accepted. That would explain why the same code keeps failing or keeps working depending on what the index saw first.
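To find out which fields in a document are exposed to this kind of conflict, one option is to scan for arrays that mix types before indexing. The following is a minimal sketch in plain Python, not part of the Elasticsearch client: find_mixed_type_arrays and _flatten are hypothetical helper names, and the check is deliberately strict (it treats int and float as different types). Nested arrays are flattened first, since Elasticsearch maps all of their leaf values to the same field.

```python
import json


def _flatten(values):
    """Yield the leaf values of an arbitrarily nested list."""
    for v in values:
        if isinstance(v, list):
            for inner in _flatten(v):
                yield inner
        else:
            yield v


def find_mixed_type_arrays(doc, prefix=""):
    """Return dotted field paths whose arrays hold more than one Python type."""
    mixed = []
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the dotted path.
            mixed.extend(find_mixed_type_arrays(value, path + "."))
        elif isinstance(value, list):
            # Collect the type names of all non-null leaf values.
            kinds = {type(v).__name__ for v in _flatten(value) if v is not None}
            if len(kinds) > 1:
                mixed.append(path)
    return mixed


doc = json.loads('{"test": ["a", 1, "b"], "ok": ["x", "y"]}')
print(find_mixed_type_arrays(doc))
```

Any field this reports is a candidate for an explicit mapping or for normalizing its values before indexing.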
Have a look at the documentation on Dynamic Mapping.
It can be nice to go schema-less, but in your scenario you may have more success by explicitly declaring a mapping on your index for at least some of the fields in your documents. If you plan to index arrays of mixed datatypes, you are better off declaring that field as a string type (called text or keyword in Elasticsearch 5.0 and later).
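A rough sketch of that suggestion follows. The mapping body declares the "test" field from the question as a string-typed field, and a small helper converts array entries to strings on the client side so every document is consistent. Several things here are assumptions: stringify_arrays is a hypothetical helper, the index name is a placeholder, and the mapping uses Elasticsearch 7-style syntax (older versions use "string" instead of "text" and nest "properties" under a document type). The client calls are commented out because they need a running cluster and their keyword arguments vary between client versions.

```python
# Mapping body declaring "test" as a string-typed field.
# Assumption: Elasticsearch 7-style syntax; adjust for your version.
mapping_body = {
    "mappings": {
        "properties": {
            "test": {"type": "text"}
        }
    }
}


def stringify_arrays(doc):
    """Return a copy of doc with every non-null scalar inside a list converted to str."""
    def convert(value, in_list=False):
        if isinstance(value, dict):
            return {k: convert(v) for k, v in value.items()}
        if isinstance(value, list):
            return [convert(v, in_list=True) for v in value]
        if in_list and value is not None:
            return str(value)
        return value
    return convert(doc)


doc = {"test": ["a", 1, "b"], "name": [["string1", 1, "string2"]]}
clean = stringify_arrays(doc)
print(clean)

# With a running cluster, something like this would apply it
# (shown for the 7.x Python client; "test-index" is a placeholder):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.create(index="test-index", body=mapping_body)
# es.index(index="test-index", body=clean)
```

Converting on the client side keeps the index usable even with dynamic mapping, at the cost of losing numeric semantics (range queries, numeric sorting) on those values.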
