Elasticsearch throws mapper_parsing_exception when indexing a JSON array of integers and strings


I am using Python to read a JSON array from a file and index it into Elasticsearch. The array looks like this:
{"name": [["string1", 1, "string2"],["string3", 2, "string4"], ... (variable length) ... ["string n-1", 3, "string n"]]}
Elasticsearch throws TransportError(400, mapper_parsing_exception, failed to parse) when I try to index the array. I found that Elasticsearch sometimes throws the same error whenever I feed it an array containing both strings and integers. For example, the following sometimes crashes and sometimes succeeds:
import json
from elasticsearch import Elasticsearch

es = Elasticsearch()
index = "test-index"  # placeholder: the real value of `index` was not shown
test = json.loads('{"test": ["a", 1, "b"]}')
print(test)
es.index(index=index, body=test)
This is what remains after commenting out everything I could without breaking the program. I put the JSON directly in the program instead of reading it from a file. The real strings I'm indexing are quite long (otherwise I would just post them), and they always crash the program. Changing the JSON to "test": ["a"] makes it work. The outcome also seems sticky: the current setup crashes if the previous run crashed, and works if the previous run worked. What is going on? Would some sort of mapping fix this? I haven't figured out how to define a mapping for an array of variable length. I'd prefer to take advantage of schema-less input, but I'll take whatever works.
It is possible you are running into type conflicts with your mapping. Since you have expressed a desire to stay "schema-less", I am assuming you have not explicitly provided a mapping for your index. That works fine; just recognize that the first document you index determines the schema for your index. More precisely, each field gets its type from the first document that contains it, and every document indexed afterwards must use a compatible type for any field with the same name. That mapping is stored with the index and persists between runs of your script until the index is deleted, which is likely why the outcome seems to depend on earlier runs.
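To check which type a field actually received, and to start over from a clean slate, a minimal sketch is below. It assumes the elasticsearch-py client and the typeless response layout of 7.x and later clusters ({index: {"mappings": {"properties": ...}}}); older clusters nest the properties under a document type name. The helper names field_types and reset_index, and the index name "test-index", are hypothetical.

```python
def field_types(es, index_name):
    """Return {field_name: mapped_type} for the top-level fields of index_name.

    Assumes the typeless layout returned by 7.x+ clusters:
    {index_name: {"mappings": {"properties": {...}}}}.
    """
    response = es.indices.get_mapping(index=index_name)
    properties = response[index_name]["mappings"].get("properties", {})
    # Object fields carry "properties" instead of "type"; report them as "object".
    return {name: spec.get("type", "object") for name, spec in properties.items()}


def reset_index(es, index_name):
    """Delete index_name if it exists, so the next document is mapped from scratch.

    Returns True if an index was deleted, False if there was nothing to delete.
    """
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
        return True
    return False


# Usage (requires a running cluster; "test-index" is a placeholder name):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# print(field_types(es, "test-index"))
# reset_index(es, "test-index")
```

Deleting the index throws away its data as well as its mapping, so this is only suitable while experimenting.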
Elasticsearch has no issues with arrays of values. In fact, under the hood it treats all values as arrays (with one or more entries). What is slightly concerning is the example array you chose, which mixes string and numeric types. Every value in your array is mapped to the field named "test", and that field can have only one type. If the first value of the first document ES processes is numeric, it will likely map that field as long. After that, any document containing a string that does not parse as a number will cause an exception.
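The rule described above can be illustrated without a cluster. The following is a deliberately simplified, pure-Python simulation, not Elasticsearch's actual parser: the first value of a field's (flattened) array picks the type, and later values must be coercible to it. It approximates the handling of nested arrays such as your "name" field by flattening them, ignores date and numeric string detection, and only applies new field types when the whole document is accepted. All function names are hypothetical.

```python
def flatten(value):
    """Yield the leaf values of arbitrarily nested lists, left to right."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def infer_type(value):
    """Rough stand-in for dynamic type detection (ignores date/numeric detection)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    return "text"


def accepts(field_type, value):
    """Simplified check of whether a field of field_type can store value."""
    if field_type == "text":
        return True  # numbers are indexed using their string form
    if field_type == "long":
        if isinstance(value, bool):
            return False
        if isinstance(value, int):
            return True
        try:
            int(str(value))  # integer strings such as "7" are coerced
            return True
        except ValueError:
            return False
    return infer_type(value) == field_type  # other types: exact match only


def index_document(mapping, doc):
    """Check doc against mapping, adding types for new fields only if doc is accepted."""
    new_fields = {}
    for field, raw in doc.items():
        values = [v for v in flatten(raw) if v is not None]
        if not values:
            continue
        if field in mapping:
            field_type = mapping[field]
        else:
            # The first value of the (flattened) array decides the field type.
            field_type = infer_type(values[0])
            new_fields[field] = field_type
        for value in values:
            if not accepts(field_type, value):
                raise ValueError(
                    f"failed to parse field [{field}] of type [{field_type}]: {value!r}"
                )
    mapping.update(new_fields)
```

In this model, {"test": ["a", 1, "b"]} maps "test" as text and succeeds, while {"test": [1, "a"]} fails; and once "test" is mapped as long by some earlier document, even {"test": ["a"]} is rejected.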
Have a look at the documentation on Dynamic Mapping.
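If you want to stay mostly schema-less, one option covered by the dynamic mapping documentation is a dynamic template, which changes the type that newly seen fields receive. The sketch below, assuming a 7.x cluster and elasticsearch-py 7.x client (where indices.create accepts a body), maps every dynamically detected integer as keyword, so a numeric first value no longer locks a field to long. The template name longs_as_keywords and both helper names are hypothetical.

```python
def longs_as_keywords_body():
    """Index-creation body with a dynamic template mapping detected integers as keyword."""
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    "longs_as_keywords": {
                        # JSON integers are detected as "long" by dynamic mapping.
                        "match_mapping_type": "long",
                        "mapping": {"type": "keyword"},
                    }
                }
            ]
        }
    }


def create_schemaless_index(es, index_name):
    """Create index_name with the template; all fields are still added dynamically."""
    return es.indices.create(index=index_name, body=longs_as_keywords_body())


# Usage (requires a running cluster; "test-index" is a placeholder name):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# create_schemaless_index(es, "test-index")
```

The trade-off is that genuinely numeric fields also become keyword, so range queries and numeric aggregations on them are lost.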
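It can be nice to go schema-less, but in your scenario you may have more success by explicitly declaring a mapping on your index for at least some of the fields in your documents. If you plan to index arrays full of mixed datatypes, you are better off declaring that field as a string type (split into text and keyword in Elasticsearch 5.x and later). A mapping does not need to say anything about array length: any field can hold a single value or an array of any size.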
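A sketch of declaring the mapping up front, assuming an Elasticsearch 7.x cluster and elasticsearch-py 7.x client (8.x clients prefer mappings= over body=, and 6.x nests "properties" under a type name). It uses keyword, which indexes each value as an exact string, so integers in the array should be indexed by their string form; text would be the choice if you need full-text search. The helper names and the field list are hypothetical.

```python
def keyword_mapping_body(field_names):
    """Index-creation body declaring each listed field as keyword (7.x typeless layout)."""
    return {
        "mappings": {
            "properties": {name: {"type": "keyword"} for name in field_names}
        }
    }


def create_index_with_mapping(es, index_name, field_names):
    """Create the index before any document arrives, so dynamic mapping never picks these types."""
    return es.indices.create(index=index_name, body=keyword_mapping_body(field_names))


# Usage (requires a running cluster; "test-index" is a placeholder name):
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# create_index_with_mapping(es, "test-index", ["name", "test"])
# es.index(index="test-index", body={"test": [1, "a", 2]})
```

Because the index must exist before the first document is indexed, run this once (for example after deleting the old index) rather than on every start of the script.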
