pandas parse csv with newlines


A while ago I wrapped my data in quote markers on both sides and read it into pandas (see "pandas parse csv with left and right quote chars"). Now I also need to support newlines and some unusual characters inside the fields.
Minimal sample below: the first string (temp) parses just fine, but the second one does not parse properly.
import pandas as pd
import os
from io import StringIO  # pandas.compat.StringIO was removed from pandas
temp=u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>"""
temp=u"""<first>$$><$$<second>$$><$$<third>
<foo>$$><$$<bar>$$><$$<baz>
<foo>$$><$$<Green; kkkk 101; aaaa, bbb; [foo<1>>aaa<123>>xxx<1>>zzz<1.17989207 | 18187681 | asdf |>>
;sdf{
}
;ADD{
]>$$><$$<baz>"""
big_df = pd.read_csv(StringIO(temp),
                     encoding='utf8',
                     sep=r'\$\$><\$\$',
                     decimal=',',
                     engine='python')  # the optimized C parser cannot handle our multi-character regex delimiter
# Strip the trailing "$$>" record terminator from the last column.
big_df.iloc[:, -1] = big_df.iloc[:, -1].str.replace(r'\$\$>$', '', regex=True)
# Strip the leading "<" and trailing ">" wrappers from the values and the column names.
big_df = big_df.replace([r'^<', r'>$'], ['', ''], regex=True)
big_df.columns = big_df.columns.to_series().replace([r'^<', r'>$', r'>\$\$'], ['', '', ''], regex=True)
big_df
Edit
As outlined in the comment, parsing works just fine once each record is put on a single line.
How could I automate this, maybe via sed/awk?
awk '{printf("%s ",$0)} END{print ""}' sample.csv removes all newlines and concatenates everything into a single line. I only want to remove the problematic newlines.
awk -F, 'NF < 4 {getline nextline; $0 = $0 nextline} 1' sample.csv already removes the ordinary embedded newlines, but the additional blank lines remain.
So your "real" newlines are marked with $$>\n. Read your file into a string, replace $$>\n with a temporary placeholder, remove any remaining newlines, reinsert the "real" newlines, then pass the result to read_csv().
temp = temp.replace('$$>\n', '%%NEWLINE%%').replace('\n','').replace('%%NEWLINE%%', '\n')
big_df = pd.read_csv(StringIO(temp), ...)
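
To see the placeholder trick in isolation, here is a minimal sketch that uses only the standard library. The sample data, the helper name flatten_records, and the guard against the placeholder already occurring in the input are illustrative assumptions, not part of the question; in particular the sample assumes every record really ends with $$> immediately before its newline.

```python
import re

# Hypothetical sample: every record ends with "$$>" right before its newline,
# and the second record has newlines embedded inside a field.
raw = (
    "<first>$$><$$<second>$$><$$<third>$$>\n"
    "<foo>$$><$$<multi\nline\nvalue>$$><$$<baz>$$>\n"
)

MARKER = "%%NEWLINE%%"  # placeholder; assumed never to occur in the data


def flatten_records(text, terminator="$$>\n", marker=MARKER):
    """Keep only the newlines that directly follow the record terminator."""
    if marker in text:
        raise ValueError("placeholder already occurs in the data")
    # 1) protect real record ends, 2) drop embedded newlines, 3) restore record ends
    return text.replace(terminator, marker).replace("\n", "").replace(marker, "\n")


flat = flatten_records(raw)

# Quick check without pandas: split each physical line on the field delimiter.
rows = [re.split(r"\$\$><\$\$", line) for line in flat.splitlines()]
print(rows)
```

Note that the embedded newlines are dropped, not preserved, and that the $$> terminator disappears together with the protected newline. The flattened string can then be wrapped in io.StringIO and passed to pd.read_csv() with the same sep and engine='python' arguments as in the question.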
