

Index JSON files in Elasticsearch using Python?


I have a bunch of JSON files (100), named merged_file 1.json, merged_file 2.json, and so on.
How do I index all these files into Elasticsearch using Python (elasticsearch_dsl)?
I am using this code, but it doesn't seem to work:
from elasticsearch_dsl import Elasticsearch
import json
import os
import sys

es = Elasticsearch()
json_docs = []
directory = sys.argv[1]
for filename in os.listdir(directory):
    if filename.endswith('.json'):
        with open(filename, 'r') as open_file:
            json_docs.append(json.load(open_file))
es.bulk("index_name", "type_name", json_docs)
The JSON looks like this:
{"one":["some data"],"two":["some other data"],"three":["other data"]}
What can I do to make this work?
For this task you should be using elasticsearch-py (pip install elasticsearch):
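from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    """Use a generator, no need to load all files in memory."""
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # os.listdir() returns bare names, so join them with the directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

# doc_type applies to older Elasticsearch versions; mapping types were removed in 8.x
helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')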
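
Two things likely break the code in the question. The loop opens `filename` without the directory in front of it, and the low-level `es.bulk` call expects a body of alternating action and source lines, not a list of documents. As a rough sketch of what `helpers.bulk` consumes, the generator below wraps each file in an explicit action dict. The names `iter_actions` and `index_directory`, the use of the file name as `_id`, and the `http://localhost:9200` URL are assumptions made for illustration, not something from the question.

```python
import json
import os


def iter_actions(directory, index_name):
    """Yield one bulk action per .json file found in `directory`."""
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".json"):
            continue
        # os.listdir() returns bare names, so build the full path first
        path = os.path.join(directory, filename)
        with open(path, "r", encoding="utf-8") as handle:
            document = json.load(handle)
        yield {
            "_index": index_name,
            # Assumption: reuse the file name (without extension) as the id
            "_id": os.path.splitext(filename)[0],
            "_source": document,
        }


def index_directory(directory, index_name="my-index"):
    """Send every action to a cluster (needs `pip install elasticsearch`)."""
    # Imported here so iter_actions() can be used without the client installed
    from elasticsearch import Elasticsearch, helpers

    # Assumption: a local, unsecured node on the default port
    es = Elasticsearch("http://localhost:9200")
    # helpers.bulk returns a tuple: (number of successes, list of errors)
    return helpers.bulk(es, iter_actions(directory, index_name))
```

Calling `index_directory(sys.argv[1])` would then index the whole folder. Setting `_id` explicitly means that re-running the script overwrites the same documents instead of creating duplicates. Drop that key if auto-generated ids are preferred.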
