pandas parse csv with newlines


A while ago I wrapped my data in quote characters on both sides and read it into pandas (see my earlier question "pandas parse csv with left and right quote chars"). Now I also need to support newlines and some unusual characters inside the fields.
Minimal sample below: the first string (temp) parses just fine, but the second one does not parse properly.
import pandas as pd
from io import StringIO  # pandas.compat.StringIO is not available in recent pandas versions
temp=u"""<first>$$><$$<second>$$><$$<first>
<foo>$$><$$<bar>$$><$$<baz>"""
temp=u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""
big_df = pd.read_csv(StringIO(temp),
                     encoding='utf8',
                     sep=r'\$\$><\$\$',
                     decimal=',',
                     engine='python')  # we can't use pandas' optimized C parser because of our special delimiters
big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)
big_df = big_df.replace(['^<', '>$'], ['', ''], regex=True)
big_df.columns = big_df.columns.to_series().replace(['^<', '>$', r'>\$\$'], ['', '', ''], regex=True)
big_df
Edit
As outlined in the comment, when everything is put onto a single line it works just fine.
How could I automate this, maybe via sed/awk?
awk '{printf("%s ",$0)} END{print ""}' sample.csv removes all newlines and concatenates everything into a single line. I would rather remove only the problematic newlines.
awk -F, 'NF < 4 {getline nextline; $0 = $0 nextline} 1' sample.csv already removes the normal newlines, but the additional blank lines are still there.
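The awk idea of gluing a physical line to the next one until the record has enough fields can also be done in Python right before read_csv. This is a minimal sketch, not a CSV parser: it assumes the header line carries the full set of `$$><$$` separators, and treats a record as complete once it contains that many separators and ends with `>`. The helper `join_physical_lines` and its `joiner` parameter are hypothetical names. A multi-line value in the last column whose inner line happens to end with `>` would still be cut early.

```python
from io import StringIO

import pandas as pd

SEP = "$$><$$"  # literal field separator from the question's data


def join_physical_lines(text, sep=SEP, joiner=" "):
    """Glue physical lines together until each logical record is complete.

    Heuristic assumption: a record is complete once it contains as many
    separators as the header line and ends with ">".
    """
    lines = text.splitlines()
    expected = lines[0].count(sep)  # the header defines separators per record
    records, buf = [lines[0]], None
    for line in lines[1:]:
        buf = line if buf is None else buf + joiner + line
        if buf.count(sep) >= expected and buf.rstrip().endswith(">"):
            records.append(buf)
            buf = None
    if buf is not None:
        records.append(buf)  # keep an incomplete tail instead of dropping it
    return "\n".join(records)


# Second sample string from the question.
temp = u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""

joined = join_physical_lines(temp)
df = pd.read_csv(StringIO(joined), sep=r"\$\$><\$\$", engine="python")
# Strip the "<" / ">" quote characters from values and column names.
df = df.apply(lambda col: col.str.replace(r"^<|>$", "", regex=True))
df.columns = df.columns.str.replace(r"^<|>$", "", regex=True)
print(df)
```

Joining with a space keeps the pieces of the multi-line value readable (`... ;sdf{ } ;ADD{ ]`); pass `joiner=""` to concatenate them directly, as the awk `$0 = $0 nextline` version does.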
Your "real" newlines are marked with $$>\n. Read the file into a string, replace $$>\n with a temporary placeholder, remove all remaining newlines, reinsert the "real" newlines, and then pass the result to read_csv().
temp = temp.replace('$$>\n', '%%NEWLINE%%').replace('\n','').replace('%%NEWLINE%%', '\n')
big_df = pd.read_csv(StringIO(temp), ...)
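The three chained replace calls can be wrapped in a small helper and checked end to end. The sample in the question does not actually end its lines with `$$>`, so the input below is hypothetical: it assumes every real record ends with `$$>` (as the question's `str.replace('\$\$>$', '')` cleanup suggests). The helper name `keep_only_real_newlines` is illustrative. Note that the first replace consumes the `$$>` marker together with the newline, so the cleaned lines end up in the same `<...>$$><$$<...>` shape as the question's sample.

```python
from io import StringIO

import pandas as pd


def keep_only_real_newlines(text, marker="$$>\n", placeholder="%%NEWLINE%%"):
    """Keep only the newlines that end a record; drop the ones inside fields.

    Assumes `placeholder` never occurs in the data. As in the answer, the
    "$$>" part of each marker is consumed along with the newline.
    """
    return (
        text.replace(marker, placeholder)  # protect the real record breaks
        .replace("\n", "")                 # drop newlines embedded in fields
        .replace(placeholder, "\n")        # restore the real record breaks
    )


# Hypothetical input: the question's format with "$$>" at the end of every record.
raw = (
    "<first>$$><$$<second>$$>\n"
    "<foo>$$><$$<Green; aaaa\n;sdf{\n}>$$>\n"
)

clean = keep_only_real_newlines(raw)
df = pd.read_csv(StringIO(clean), sep=r"\$\$><\$\$", engine="python")
# Strip the "<" / ">" quote characters, as in the question.
df = df.apply(lambda col: col.str.replace(r"^<|>$", "", regex=True))
df.columns = df.columns.str.replace(r"^<|>$", "", regex=True)
print(df)
```

The placeholder trick is only safe if the placeholder string cannot appear in the data; a character that never occurs in the file (for example a control character) is a more robust choice than a readable token.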
