python


ElasticSearch throwing mapper parsing exception when indexing JSON array of integers and strings


I am attempting to use python to pull a JSON array from a file and input it into ElasticSearch. The array looks as follows:
{"name": [["string1", 1, "string2"],["string3", 2, "string4"], ... (variable length) ... ["string n-1", 3, "string n"]]}
ElasticSearch throws a TransportError(400, mapper_parsing_exception, failed to parse) when attempting to index the array. I discovered that ElasticSearch sometimes throws the same error whenever I try to feed it an array containing both strings and integers. So, for example, the following will sometimes crash and sometimes succeed:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
test = json.loads('{"test": ["a", 1, "b"]}')
print(test)
# "test-index" is a placeholder; the original snippet left the index name undefined
es.index(index="test-index", body=test)
This is the minimal code left after commenting out everything I safely could without breaking the program. I put the JSON inline instead of having it read from a file. The actual strings I'm indexing are quite long (or else I would just post them) and will always crash the program. Changing the JSON to "test": ["a"] will cause it to work. The current setup crashes if it last crashed, and works if it last worked. What is going on? Will some sort of mapping setup fix this? I haven't figured out how to define a mapping for an array of variable length. I'd prefer to take advantage of the schema-less input, but I'll take whatever works.
It is possible you are running into type conflicts with your mapping. Since you have expressed a desire to stay "schema-less", I am assuming you have not explicitly provided a mapping for your index. That works fine, just recognize that the first document you index that contains a given field will determine the type of that field in your index. Every document you index afterwards that has the same field (by name) must supply values that conform to that type.
Elasticsearch has no issues with arrays of values. In fact, under the hood it treats every field as holding an array (with zero or more entries). What is slightly concerning is the example array you chose, which mixes string and numeric types. Each value in your array gets mapped to the field named "test", and that field may only have one type, so if the first value of the first document ES processes is numeric, it will likely map that field as a long type. Future documents that contain a string that does not parse cleanly into a number will then cause an exception in Elasticsearch.
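To make that concrete, here is a small pure-Python imitation of the behaviour described above. It is only a sketch of the explanation, not Elasticsearch's actual code: ToyIndex, infer_type and fits are hypothetical names, only the long and string types are modelled, and the coercion rules are simplified assumptions.

```python
def infer_type(value):
    """Pick a field type for a JSON scalar (toy rule: integers become "long")."""
    if isinstance(value, int) and not isinstance(value, bool):
        return "long"
    return "string"


def fits(value, field_type):
    """Return True if `value` can be stored in a field of `field_type`."""
    if field_type == "string":
        return True  # numbers are simply stored as their string form
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    if isinstance(value, str):
        try:
            int(value)  # "3" coerces to a number, "a" does not
            return True
        except ValueError:
            return False
    return False


class ToyIndex:
    """Hypothetical stand-in for an index with dynamic mapping enabled."""

    def __init__(self):
        self.mapping = {}  # field name -> type, fixed once assigned

    def index(self, doc):
        mapping = dict(self.mapping)  # commit only if the whole document parses
        for field, values in doc.items():
            if not isinstance(values, list):
                values = [values]  # a single value behaves like a one-element array
            for value in values:
                field_type = mapping.setdefault(field, infer_type(value))
                if not fits(value, field_type):
                    raise ValueError(
                        "failed to parse [%s]: %r is not a %s" % (field, value, field_type)
                    )
        self.mapping = mapping


fresh = ToyIndex()
fresh.index({"test": ["a", 1, "b"]})  # "a" comes first, so "test" becomes a string field
print(fresh.mapping)

used = ToyIndex()
used.index({"test": [1, 2]})  # an earlier document made "test" a long field
try:
    used.index({"test": ["a", 1, "b"]})
except ValueError as err:
    print("rejected:", err)
```

The same document is accepted by the fresh index and rejected by the one whose "test" field was already mapped as long, which matches the "crashes if it last crashed, works if it last worked" pattern in the question.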
Have a look at the documentation on Dynamic Mapping.
It can be nice to go schema-less, but in your scenario you may have more success by explicitly declaring a mapping on your index for at least some of the fields in your documents. If you plan to index arrays full of mixed datatypes, you are better off declaring that field as a string type (text or keyword in Elasticsearch 5.0 and later).
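A minimal sketch of both options follows: building a mapping body that declares the field as a string, and, alternatively, converting every array element to a string on the client before indexing. The helper names string_mapping and stringify are made up for illustration, and the mapping layout assumes the typeless format of Elasticsearch 7 and later; older releases used the "string" type and nested the properties under a document type name.

```python
import json


def string_mapping(field):
    """Build an index-creation body that declares `field` as a string field.

    Assumes the typeless mapping layout of Elasticsearch 7+, where "keyword"
    is the exact-value string type.
    """
    return {"mappings": {"properties": {field: {"type": "keyword"}}}}


def stringify(value):
    """Recursively convert every scalar in a JSON value to a string."""
    if value is None:
        return None  # leave nulls alone rather than storing the text "None"
    if isinstance(value, list):
        return [stringify(v) for v in value]
    if isinstance(value, dict):
        return {key: stringify(v) for key, v in value.items()}
    return str(value)


doc = json.loads('{"test": ["a", 1, "b"]}')
print(json.dumps(string_mapping("test")))
print(json.dumps(stringify(doc)))
```

The mapping body would be passed to the client's index-creation call (es.indices.create in elasticsearch-py) before the first document is indexed. The type of an existing field cannot be changed in place, so an index that already mapped the field as long would have to be deleted and recreated, or the data written to a new index.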
